PandasAI lets you leverage LLMs to query, filter, and create graphs over pandas DataFrames.

Let’s start off by defining our LLM inside PandasAI:

```python
from pandasai.llm import AzureOpenAI
from dotenv import load_dotenv
from os import getenv

load_dotenv()

gpt4o = AzureOpenAI(
    api_token=getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=getenv("OPENAI_VERSION"),
    deployment_name=getenv("GPT4O_NAME"),
)
```
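The `getenv` calls above assume a local `.env` file. The variable names come from the snippet itself; the values below are placeholders, not real credentials or endpoints:

```
AZURE_OPENAI_API_KEY=<your-key>
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
OPENAI_VERSION=<api-version, e.g. 2024-02-01>
GPT4O_NAME=<your-gpt-4o-deployment-name>
```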

Now let’s define our SmartDataframe:

```python
from pandasai import SmartDataframe
from pandas import DataFrame

config = {
    # Modules that LLM-generated code is allowed to import
    "custom_whitelisted_dependencies": [
        "collections",
        "re",
        "wordcloud",
        "nltk",
        "sklearn",
        "random",
    ],
    "save_charts": True,
    "verbose": False,
    "llm": gpt4o,
}

def filter_dataframe(query: str, df: DataFrame):
    """Run a natural-language query against a DataFrame via PandasAI."""
    pandas_df = SmartDataframe(df=df, config=config)
    response_df = pandas_df.chat(query=query)
    return response_df
```
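A minimal usage sketch, assuming the Azure deployment above is configured. The sample data and query here are my own illustration, not from the original:

```python
from pandas import DataFrame

# Illustrative data; the column names and query below are hypothetical
sales = DataFrame({
    "region": ["EMEA", "APAC", "EMEA", "AMER"],
    "revenue": [1200, 950, 780, 1500],
})

# With credentials configured, a natural-language query is enough:
# result = filter_dataframe("Only keep rows where revenue is above 1000", sales)

# Equivalent hand-written pandas, for comparison with what the LLM generates
expected = sales[sales["revenue"] > 1000]
```

The LLM translates the query into pandas code and executes it, so `result` should match `expected` up to column ordering.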

<aside> 💡

Just a heads up that I’m using my own fork, as I needed to disable a security check for uninterrupted use: `pip install git+https://github.com/Galahad-OSS/pandas-ai`

</aside>

If you think about it, though, the check is a valid concern:

https://github.com/Galahad-OSS/pandas-ai/blob/main/pandasai/safe_libs/base_restricted_module.py#L3-L13

All of these packages allow executing OS-level commands or importing an otherwise blacklisted library. Perhaps we’re better off using the original package at https://github.com/sinaptik-ai/pandas-ai