PandasAI lets you leverage LLMs to query, filter, and create charts over Pandas DataFrames.
Let’s start by defining our LLM inside PandasAI:
```python
from pandasai.llm import AzureOpenAI
from dotenv import load_dotenv
from os import getenv

load_dotenv()

gpt4o = AzureOpenAI(
    api_token=getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=getenv("OPENAI_VERSION"),
    deployment_name=getenv("GPT4O_NAME"),
)
```
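Since every one of these values comes from the environment, a missing variable silently becomes `None`. As a quick sanity check, you can verify they are all set before constructing the client (a minimal sketch; `missing_env_vars` is a hypothetical helper, not part of PandasAI):

```python
from os import environ, getenv

# The four variables the AzureOpenAI client above depends on.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_VERSION",
    "GPT4O_NAME",
]

def missing_env_vars(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not getenv(name)]

# For demonstration only: fill in placeholders so nothing is reported missing.
for name in REQUIRED_VARS:
    environ.setdefault(name, "placeholder")

print(missing_env_vars())  # []
```

Failing fast here is much easier to debug than an opaque authentication error from the Azure endpoint later on.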
Now let’s define our `SmartDataframe`:
```python
from pandasai import SmartDataframe
from pandas import DataFrame

config = {
    "custom_whitelisted_dependencies": [
        "collections",
        "re",
        "wordcloud",
        "nltk",
        "sklearn",
        "random",
    ],
    "save_charts": True,
    "verbose": False,
    "llm": gpt4o,
}

def filter_dataframe(query: str, df: DataFrame):
    pandas_df = SmartDataframe(df=df, config=config)
    response_df = pandas_df.chat(query=query)
    return response_df
```
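To make the flow concrete: for a simple filtering query, the LLM behind `chat()` generates and executes pandas code roughly equivalent to the following (hypothetical data; the actual generated code varies from run to run):

```python
import pandas as pd

# A toy DataFrame standing in for whatever you would pass to filter_dataframe.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cara"], "score": [91, 70, 85]})

# A query like "Show rows where score is above 80" would have the LLM
# produce and run a boolean filter roughly equivalent to this:
result = df[df["score"] > 80]
print(result["name"].tolist())  # ['Ann', 'Cara']
```

The point of `SmartDataframe` is that you supply only the natural-language query; the equivalent pandas code is written, sandboxed, and executed for you.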
<aside> 💡
Just a heads-up: I am using my own fork because I needed to disable a security check for uninterrupted use.

```shell
pip install git+https://github.com/Galahad-OSS/pandas-ai
```
</aside>
If you think about it, the check I disabled guards against a valid concern:
https://github.com/Galahad-OSS/pandas-ai/blob/main/pandasai/safe_libs/base_restricted_module.py#L3-L13
All of these modules allow running system-level commands or importing an otherwise blacklisted library. Unless you explicitly accept that risk, you are better off using the original package at https://github.com/sinaptik-ai/pandas-ai
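To see why those restrictions exist: any LLM-generated code that can reach `subprocess` (or `os.system`) can execute arbitrary programs on the host. This illustration launches the current Python interpreter rather than a shell command, to keep it platform-independent:

```python
import subprocess
import sys

# If generated code can import subprocess, it can run any program it likes.
# Here we simply spawn another Python process to demonstrate the capability.
out = subprocess.run(
    [sys.executable, "-c", "print('arbitrary code ran')"],
    capture_output=True,
    text=True,
)
print(out.stdout.strip())  # arbitrary code ran
```

PandasAI's restricted-module list exists precisely to stop code it generates and executes from getting this far, which is why disabling it should be a deliberate, informed choice.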