How to Convert JSON to a Pandas DataFrame
This article will introduce how to convert JSON to a Pandas DataFrame.
JSON stands for JavaScript Object Notation. It is based on the format of objects in JavaScript and is an encoding technique for representing structured data. It is widely used these days, especially for sharing data between servers and web applications.
Due to its simplicity and influence from programming language data structures, JSON is becoming immensely popular. It’s relatively easy to understand, and the following is a simple example of a JSON response from an API.
{
"Results":
[
{ "id": "01", "Name": "Jay" },
{ "id": "02", "Name": "Mark" },
{ "id": "03", "Name": "Jack" }
],
"status": ["ok"]
}
As you can see in our example, JSON appears to be somewhat a combination of nested lists and dictionaries; therefore, it is relatively easy to extract data from JSON files and even store it as a Pandas DataFrame.
Pandas and JSON libraries in Python can help in achieving this. We have two functions read_json()
and json_normalize()
which can help in converting JSON string to a DataFrame.
JSON to Pandas DataFrame Using json_normalize()
The json_normalize()
function is very widely used to read the nested JSON string and return a DataFrame. To use this function, we need first to read the JSON string using json.loads()
function in the JSON library in Python. Then we pass this JSON object to the json_normalize()
, which will return a Pandas DataFrame containing the required data.
import pandas as pd
import json
from pandas import json_normalize
data = """
{
"Results":
[
{ "id": "1", "Name": "Jay" },
{ "id": "2", "Name": "Mark" },
{ "id": "3", "Name": "Jack" }
],
"status": ["ok"]
}
"""
info = json.loads(data)
df = json_normalize(info["Results"]) # Results contain the required data
print(df)
Output:
id Name
0 1 Jay
1 2 Mark
2 3 Jack
JSON to Pandas DataFrame Using read_json()
Another Pandas function to convert JSON to a DataFrame is read_json()
for simpler JSON strings. We can directly pass the path of a JSON file or the JSON string to the function for storing data in a Pandas DataFrame. read_json()
has many parameters, among which orient
specifies the format of the JSON string.
The downside is that it is difficult to use with nested JSON strings. So for using read_json()
, we will use a much simpler example as shown below:
import pandas as pd
data = """
{
"0":{
"Name": "Jay",
"Age": "17"
},
"1":{
"Name": "Mark",
"Age": "15"
},
"2":{
"Name": "Jack",
"Age":"16"
}
}
"""
df = pd.read_json(data, orient="index")
print(df)
Output:
Name Age
0 Jay 17
1 Mark 15
2 Jack 16
We set orient
to be 'index'
because the JSON string fromat matchs the pattern as {index : {column: value}}
.
Manav is a IT Professional who has a lot of experience as a core developer in many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.
LinkedInRelated Article - Pandas DataFrame
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Convert a Float to an Integer in Pandas DataFrame
- How to Sort Pandas DataFrame by One Column's Values
- How to Get the Aggregate of Pandas Group-By and Sum