Have you ever had your data stored in a pandas
DataFrame, and wanted to use SQL to query the data? Then you have to check out the pandasql
library. pandasql
is a Python library that allows you to use SQL syntax to work with data in the pandas
DataFrame
structure. It provides simple access to powerful data analysis functionality and makes it easier for users familiar with SQL syntax to utilize the power of pandas
.
Just open and fork the canvas below to check out our example using car sales transactions, or read on for an in-depth walkthrough of how to get started with pandasql
.
pandasql
Getting started with # Imports and setup
__requires__= 'SQLAlchemy==1.4.46'
%pip install SQLAlchemy==1.4.46
%pip install pandasql
import sqlalchemy
from pandasql import *
import pandas as pd
First make sure that you have installed the library and imported the functions. Keep in mind that a particular version of SQLAlchemy
is a dependency. pandasql
was first released back in 2017 by a company called Yhat, Inc. that focused on deploying self service data science tools. Yhat has since been acquired by Alteryx.
pandasql.sqldf()
Write a SQL query using # NOTE: because we're using Einblick, we don't include Line 2, as the CSV is already saved natively into the Einblick platform as df when you upload the dataset.
# df = pd.read_csv("car_sales_transactions.csv")
query = """SELECT Model, Type, ExteriorColor, CarYear, Price
FROM df
WHERE df.Model NOT LIKE '%C%'"""
df2 = sqldf(query = query, env = locals())
Now that the packages have been installed and imported, and the DataFrame has been stored as df
, you can write your query. The main function in pandasql
is sqldf()
. The function can two arguments:
query
: a formatted SQL query stored as a stringenv
: eitherlocals()
orglobals()
to give the function access to environment variables**
The query in this example filters out car models of type C.
Check queried data
# Plot queried data, and check if columns were subset
import seaborn as sns
sns.set_theme()
sns.countplot(data = df2, x = "Model")
print(df2.columns)
Output:

Based on the output, we can see that the SQL query worked! We only have car models A and B. Happy coding!
env
and df
**NOTE on You may have noticed in the code above that we never read in a DataFrame using pd.read_csv()
or a similar command. This is because we uploaded the CSV file directly into Einblick's platform, which automatically saves those CSV files as pandas
DataFrames. By linking the car_sales_transaction
table to the Python cell, the platform saved the data in the table as a DataFrame called df
.
If you're not using Einblick, just make sure you've added in a line of code that instantiates a DataFrame object.

About
Einblick is an AI-native data science platform that provides data teams with an agile workflow to swiftly explore data, build predictive models, and deploy data apps. Founded in 2020, Einblick was developed based on six years of research at MIT and Brown University. Einblick is funded by Amplify Partners, Flybridge, Samsung Next, Dell Technologies Capital, and Intel Capital. For more information, please visit www.einblick.ai and follow us on LinkedIn and Twitter.