How to Execute SQL Query in Pandas
-
Use
pandasql
to Run SQL Queries in Python -
Use
DuckDB
to Run SQL Queries in Python -
Use
Fugue
to Run SQL Queries in Python
SQL stands for Structured Query Language; it is a well-known language used to interact with a relational database. There are many ways to run SQL queries in Python.
Use pandasql
to Run SQL Queries in Python
This package has an sqldf
method like the sqldf
in R
. The pandasql
provides a more familiar way to perform CRUD operations on the data frame.
Before we use pandasql
, we have to install it first using the following command.
#Python 3.x
pip install -U pandasql
We will import the sqldf
method from the pandasql
module to run a query. Then we will call the sqldf
method that takes two arguments.
The first argument is a SQL query in string format. The second argument is a set of session/environment variables (globals()
or locals()
).
In the following code, first, we have uploaded the Student.csv
data to our current working directory of the Jupyter
notebook. Then we have read the student’s records using the traditional SQL Select
query over the data frame; it will show all records in the data frame.
# Python 3.x
import pandas as pd
from pandasql import sqldf
def mysql(q):
return sqldf(q, globals())
df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df")
Output:
We have used the WHERE
clause in the SQL query to display only selected records that satisfy the given condition in the following code.
# Python 3.x
import pandas as pd
from pandasql import sqldf
def mysql(q):
return sqldf(q, globals())
df = pd.read_csv("Student.csv")
mysql("SELECT * FROM df WHERE Department = 'SE'")
Output:
Use DuckDB
to Run SQL Queries in Python
DuckDB
is a Python API and a database management system that uses SQL queries to interact with the database.
To use DuckDB
, we should install it first using the following command.
#Python 3.x
pip install duckdb
In the following code, we have imported the duckdb
and Pandas package, read the CSV file and run the query by calling the query()
method with duckdb
. We will pass the query (as an argument) to the query()
method.
The code will return the result as a data frame. We can write any SQL query of our choice according to the data frame.
# Python 3.x
import pandas as pd
import duckdb
df = pd.read_csv("Student.csv")
duckdb.query("SELECT * FROM df").df()
Output:
Use Fugue
to Run SQL Queries in Python
Fugue
is a unified interface for distributed computing that allows users to run Python, Pandas, and SQL code on Spark and Dask without rewriting.
We have to install it first using the following command to use fugue
.
#Python 3.x
pip install fugue[sql]
We have imported Pandas and fugue
packages in the following code and loaded the data frame with the CSV file. Then we will pass our SQL query to the fsql()
method and call the run()
method with it.
# Python 3.x
import pandas as pd
from fugue_sql import fsql
df = pd.read_csv("Student.csv")
query = "SELECT * FROM df PRINT"
fsql(query).run()
Output:
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.
LinkedIn