Pandas read_sql_query in Python
SQL is a query language used to create and maintain databases in a Relational Database Management System (RDBMS). We can build many database applications using Python.
Libraries such as sqlite3 and SQLAlchemy can be used to create connections to SQL database engines installed on the device.
In Python, we can store tables as DataFrame objects using the pandas library, and we can work with SQL queries through this library as well.
This tutorial will demonstrate how to read the results of an SQL query using the pandas.read_sql_query() function.
Python Pandas read_sql_query
Using the pandas.read_sql_query() function, we can read the result of a query directly into a DataFrame. To use this function, we first need a connection to an SQL database.
In our example, we will use the sqlite3 library, which lets us work with the SQLite database engine.
The connect() method is used to create a connection, and the name of the database file is passed to it. We will create the connection and execute a SELECT statement to load data into the DataFrame.
See the code below.
import pandas as pd
import sqlite3

# Connect to (or create) the SQLite database file
connection = sqlite3.connect("delftstack.db")
crsr = connection.cursor()

# Create a sample table and insert two rows
ct_sql = """CREATE TABLE data3 (
    number INTEGER,
    name VARCHAR(20));"""
crsr.execute(ct_sql)
crsr.execute("""INSERT INTO data3 VALUES (1, 'GEORGE');""")
crsr.execute("""INSERT INTO data3 VALUES (2, 'KEVIN');""")
connection.commit()

# Read the query result directly into a DataFrame
df = pd.read_sql_query("""SELECT number, name FROM data3""", con=connection)
print(df)

connection.close()
Output:
   number    name
0       1  GEORGE
1       2   KEVIN
In the above example, we read data from the data3 table in the delftstack.db database. The result of the query is stored in the df DataFrame.
We can also use some convenient parameters with this function. The index_col parameter specifies which column to use as the DataFrame index, and the params parameter lets us safely bind values to placeholders in the query; selecting columns by name through a columns parameter is only supported by the related read_sql_table function, not by read_sql_query.
The coerce_float parameter, enabled by default, attempts to convert non-string, non-numeric values, such as decimal.Decimal, to floating-point numbers.
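As a minimal sketch (assuming the data3 table created above and the same open connection), index_col and params can be combined like this; the WHERE clause and its placeholder value are purely illustrative.
df_indexed = pd.read_sql_query(
    "SELECT number, name FROM data3 WHERE number > ?",
    con=connection,
    index_col="number",  # use the "number" column as the DataFrame index
    params=(0,),         # value bound safely to the "?" placeholder by sqlite3
)
print(df_indexed)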
The most useful parameter here is chunksize. We may encounter huge tables and datasets in SQL, and reading them into a DataFrame all at once consumes a lot of memory internally.
This can be avoided with the chunksize parameter, which splits the result into chunks of the specified number of rows. The function then returns an iterator of such chunks, which can be looped over to work with the data piece by piece.
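A minimal sketch of this behavior, again assuming the data3 table and connection from above; with chunksize set, read_sql_query returns an iterator of DataFrames instead of a single DataFrame.
# chunksize=1 yields one DataFrame per row of the result set
for chunk in pd.read_sql_query(
    "SELECT number, name FROM data3", con=connection, chunksize=1
):
    print(chunk)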
pandas.read_sql is another function available in the pandas library that can read the result of an SQL query into a DataFrame. However, it is just a wrapper around the read_sql_query and read_sql_table functions, kept for backward compatibility.
Whenever the pandas.read_sql() function receives an SQL query, the call is routed to the read_sql_query function discussed in this tutorial.
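A short illustration, assuming the same sqlite3 connection and data3 table from earlier; with a plain DBAPI connection such as sqlite3, pandas.read_sql only supports queries (reading by bare table name requires an SQLAlchemy connectable).
# pd.read_sql routes this query to read_sql_query internally
df_same = pd.read_sql("SELECT number, name FROM data3", con=connection)
print(df_same)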
An alternative to this function is the cursor's fetchall() method, which fetches all the rows returned by an SQL query.
We will execute the SELECT statement, retrieve the rows with this method, and store them in a DataFrame using the pandas.DataFrame constructor.
See the following example.
import pandas as pd
import sqlite3

# Connect to the SQLite database and create a sample table
connection = sqlite3.connect("delftstack.db")
crsr = connection.cursor()
ct_sql = """CREATE TABLE data4 (
    number INTEGER,
    name VARCHAR(20));"""
crsr.execute(ct_sql)
crsr.execute("""INSERT INTO data4 VALUES (1, 'GEORGE');""")
crsr.execute("""INSERT INTO data4 VALUES (2, 'KEVIN');""")
connection.commit()

# Execute the query, fetch all rows, and build the DataFrame manually
crsr.execute("""SELECT number, name FROM data4""")
df_new = pd.DataFrame(crsr.fetchall(), columns=["number", "name"])
print(df_new)

connection.close()
Output:
   number    name
0       1  GEORGE
1       2   KEVIN