
Pandas AI: A Step-by-Step Guide to Exploratory Data Analysis Powered by AI

Pandas AI is a Python library that integrates generative AI into Pandas, making data analysis conversational. It is a powerful tool for exploring and understanding your data using natural language.


Pandas AI is free to use for everyone. There are no paid plans or subscriptions. To use Pandas AI, all you need is a Python environment (e.g., a Jupyter notebook) and an API key for a large language model (LLM).

OpenAI provides API keys for its LLMs, but they are not free. In this tutorial, I will show you how to get a free API key without setting up a payment method, and how to use Pandas AI to perform simple exploratory data analysis (EDA) tasks.

Step 1: Getting a free API key

To get a free API key, we will use the Hugging Face platform. Follow the steps below:

  • Go to huggingface.co and sign up with your email address.
  • After your account has been created, go to your profile and click on Settings.
  • Choose Access Tokens, give the token a name, set it to the WRITE option, and click Generate token.

Generating free access token
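Rather than pasting the token directly into a notebook, one option is to keep it in an environment variable and read it at runtime. The sketch below assumes a variable named HF_API_TOKEN, a name chosen here purely for illustration. load_hf_token is likewise a hypothetical helper and not part of Pandas AI.

```python
import os


def load_hf_token(env_var: str = "HF_API_TOKEN") -> str:
    """Return the token stored in env_var, with surrounding whitespace removed.

    HF_API_TOKEN is an assumed variable name; use whatever name you set.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face token"
        )
    return token
```

Keeping the token out of the notebook means it is not exposed when the notebook is shared or committed to version control.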

Step 2: Installing Pandas AI

To get started, install the latest version of Pandas AI by running the following command in a Jupyter notebook cell.

!pip install pandasai

Now that Pandas AI is installed, we can start using it for exploratory data analysis.
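If you want to confirm that the install worked and see which version you got, a quick check with the standard library is enough. This is a minimal sketch; installed_version is a hypothetical helper name, and it relies only on importlib.metadata (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pandasai version, or None if the install did not succeed.
print(installed_version("pandasai"))
```

Knowing the version is useful because the imports shown in this tutorial may differ between Pandas AI releases.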

Step 3: Importing necessary libraries

Next, import the libraries needed to use Pandas AI for exploratory data analysis.

from pandasai import SmartDataframe
import pandas as pd
import numpy as np

Step 4: Creating a new DataFrame

Below, we create the DataFrame used throughout this tutorial.

df = pd.DataFrame({
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})
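Before handing the data to an LLM, it can help to check it with plain pandas first. The sketch below is one way to do that; quick_summary is a hypothetical helper, not part of pandas or Pandas AI, and it reports the shape, the number of missing values, and which columns are numeric.

```python
import pandas as pd


def quick_summary(frame: pd.DataFrame) -> dict:
    """Collect a few basic facts about a DataFrame."""
    return {
        "shape": frame.shape,                       # (rows, columns)
        "missing": int(frame.isna().sum().sum()),   # total count of missing cells
        "numeric_columns": frame.select_dtypes("number").columns.tolist(),
    }


# In the notebook, call it on the DataFrame created above:
# quick_summary(df)
```

For the DataFrame in this tutorial, you would expect 10 rows, 3 columns, no missing values, and gdp and happiness_index listed as numeric.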

Step 5: Importing the LLM

Because Pandas AI is powered by an LLM, we need to import the one we would like to use.

This is where we need the API token created on Hugging Face.

from pandasai.llm import Starcoder
starcoder_llm = Starcoder(api_token="YOUR TOKEN")

Replace "YOUR TOKEN" with the API token you generated on huggingface.co.

Now that we have instantiated the LLM, we can finally instantiate the SmartDataframe using the code below.

sdf = SmartDataframe(df, config={"llm": starcoder_llm})
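It is easy to forget to swap out the placeholder token, and the resulting error may only appear once the first request is sent. A small guard can catch this earlier. The sketch below wraps the two calls shown above in a hypothetical helper, build_smart_dataframe. It assumes the same Starcoder and SmartDataframe imports used in this tutorial, which may differ in other Pandas AI versions.

```python
def build_smart_dataframe(df, api_token: str):
    """Create a SmartDataframe backed by Starcoder, rejecting empty or placeholder tokens."""
    if not api_token.strip() or api_token == "YOUR TOKEN":
        raise ValueError("Pass the real API token generated on huggingface.co")

    # Imported here so the check above runs even before Pandas AI is loaded.
    from pandasai import SmartDataframe
    from pandasai.llm import Starcoder

    llm = Starcoder(api_token=api_token)
    return SmartDataframe(df, config={"llm": llm})


# In the notebook:
# sdf = build_smart_dataframe(df, "hf_...")
```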

Now we can start our exploratory data analysis on the dataset.

  • Top 5 countries by GDP
sdf.chat("Return the top 5 countries by GDP")

Top 5 countries by GDP

  • Sum of the GDP of the 2 unhappiest countries
sdf.chat("What's the sum of the gdp of the 2 unhappiest countries?")

Sum of the GDP of the 2 unhappiest countries

  • Plotting chart of GDP by country
sdf.chat("Plot a chart of the gdp by country")

GDP by country

  • Plotting GDP per country using different colors for each bar
sdf.chat("Plot a histogram of the gdp by country, using a different color for each bar")

GDP per country using different colors for each bar
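Because the answers come from code generated by an LLM, it is worth cross-checking them against plain pandas. The sketch below reproduces the prompts above by hand. The function names are hypothetical, and the plot uses a bar chart with matplotlib's tab10 palette, which is my choice for illustration rather than what Pandas AI necessarily generates.

```python
import pandas as pd


def top_countries_by_gdp(frame: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Rows with the n largest GDP values, largest first."""
    return frame.nlargest(n, "gdp")


def gdp_sum_of_unhappiest(frame: pd.DataFrame, n: int = 2) -> int:
    """Total GDP of the n countries with the lowest happiness_index."""
    return int(frame.nsmallest(n, "happiness_index")["gdp"].sum())


def plot_gdp_by_country(frame: pd.DataFrame):
    """Bar chart of GDP per country, with a different color for each bar."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    palette = plt.get_cmap("tab10").colors
    colors = [palette[i % len(palette)] for i in range(len(frame))]
    fig, ax = plt.subplots()
    ax.bar(frame["country"], frame["gdp"], color=colors)
    ax.set_ylabel("GDP")
    ax.tick_params(axis="x", rotation=45)
    return ax


# In the notebook:
# top_countries_by_gdp(df)
# gdp_sum_of_unhappiest(df)
# plot_gdp_by_country(df)
```

With the DataFrame from this tutorial, the top five countries by GDP are the United States, China, Japan, Germany, and the United Kingdom. The two unhappiest countries are China and Japan, whose GDPs sum to 19012600725504. If the chat answers differ from these, rephrasing the prompt is a reasonable first step.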

Conclusion

Pandas AI is a powerful tool for exploring and understanding your data in a conversational way. It is free to use for everyone, so there is no reason not to give it a try.

Here are some of the benefits of using Pandas AI:

  • It is easy to use, even if you are not familiar with Generative AI or with Pandas.
  • It can be used to perform a wide variety of tasks, including data exploration, analysis, visualization, cleaning, imputation, and feature engineering.
  • It can help you to understand your data more deeply and to make better decisions.

You can learn more about Pandas AI from the link below. I encourage you to check it out.

Link: https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing#scrollTo=U5pJRgyY5QR2



