circuit

Data Extraction: Parse a 3-Nested JSON Object and Convert it to a pandas dataframe




JSON, or JavaScript Object Notation, is a human-readable text-based format for data exchange between a server and web application, as an alternative to XML.

Querying API in Python with JSON Output

As a follow-up to my previous blog which mentions rapidapi.com as one of the data sources to extract OHLC (Open, High, Low and Close price) data for cryptocurrency, let's look at querying an API in Python.

We can use the requests package for this purpose. To get the data as a JSON output, we call the **request.get **method. The response variable in the below code, stores the returned values, and the JSON output is parsed using the json( ) method.

API request to rapidapi.comAPI request to rapidapi.com

The request.get method has the following parameter values: requests.get(url, params={key: value}, args).

Detailed documentation can be looked at here: Python Requests get Method *❮ Requests Module Make a request to a web page, and return the status code: import requests x =…*www.w3schools.com

What is a Nested-JSON ?

A JSON object is a collection of key and value pairs.

For example, { 'data’ : [ {'screen_ID’ : '63’, in this instance 'data’ and 'screen_ID’ are the keys, and '63’ is the value. Keys must be strings, and values must be a valid JSON data type (string, number, object, array, boolean or null).

So, what is a Nested-JSON? A Nested-JSON is a JSON object which has other JSON objects or Javascript arrays as its values.

Nested-JSON Types (Source:www.digitalocean.com)Nested-JSON Types (Source:www.digitalocean.com)

The API request to rapidapi.com produces a nested JSON output with Javascript arrays, structured as follows:

JSON output of API request to rapidapi.comJSON output of API request to rapidapi.com

JSON Output to Pandas Dataframe

Each nested JSON object has a unique access path. To get first-level keys, we can use the json.keys( ) method. In this case, it returns 'data’ which is the first level key and can be seen from the above image of the JSON output.

**pd.json_normalize **is a function of pandas that comes in handy in flattening the JSON output into a datatable. So, using the first level key in the following code format returns a datatable like below:

df stores flattened table of json outputdf stores flattened table of json output

Our problem statement here is to extract the keys and corresponding values of 'color’, 'date’, 'high’, 'low’, 'open’, 'perc_chg’, 'price’, 'vol’ to a data table format. These values are contained in 'screen_data.data’( To access nested fields, concatenate the field names with a . (dot) as separator) column of the df, dataframe, using below code we extract this column to a datatable:

dt stores df['screen_data.data’]dt stores df['screen_data.data’]

But here all the values are stored as a list. To transform each element of this list to a row in the dataframe, we can use pd.dataframe.explode. This returns exploded lists as rows of the subset columns; index will be duplicated for these rows as below.

Here, dt stores exploded dataframe, dtHere, dt stores exploded dataframe, dt

Now that we have a dictionary in each row of the datatable dt, we can use pandas.Series to convert the dictionaries to a Pandas series format and apply** pandas.Dataframe** to further convert it into type dataframe.

Final datafram df_finalFinal datafram df_final

“Data extraction through API requests and web scraping is especially useful for analyzing or automating data service on web-based applications.”

The JSON format is already compatible in itself and pandas functions add extra flexibility in making the parsing and readability of this data more user-friendly.




Continue Learning