How to Read Specific Rows From CSV in Pandas
- Use In-Place Assignment to Read Specific Rows From CSV in Pandas
-
Use the
query
Function to Read Specific Rows From CSV in Pandas -
Use
chunksize
to Read Specific Rows From CSV in Pandas -
Use
loc
to Read Specific Rows From CSV in Pandas -
Use
iloc
to Read Specific Rows From CSV in Pandas
Not all the data we have access to is needed when working with data. Often, we might need just a part of the data.
With Python, we can work with specific sections of data.
If we work with CSV files, we can work and read specific rows from CSV in Pandas. This article will discuss how we will work with specific rows from CSV in Pandas.
Use In-Place Assignment to Read Specific Rows From CSV in Pandas
We can’t read specific rows from CSV during the reading process (via the read_csv
function). Still, after completing the read operation, we can use an in-place assignment to select specific rows from the dataframe we want.
To illustrate, we will use a CSV file attached here. First, let’s read the CSV file - leverage.csv
.
import pandas as pd
df = pd.read_csv("leverage.csv")
print(df.head())
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq datacqtr \
0 1300 20150331 2015 1 INDL C D STD USD 2015Q1
1 1300 20150630 2015 2 INDL C D STD USD 2015Q2
2 1300 20150930 2015 3 INDL C D STD USD 2015Q3
3 1300 20151231 2015 4 INDL C D STD USD 2015Q4
4 1300 20160331 2016 1 INDL C D STD USD 2016Q1
datafqtr atq cshoq dlttq lctq costat prccq
0 2015Q1 45357.0 781.707 5661.0 15432.0 A 104.31
1 2015Q2 46412.0 781.762 5562.0 15574.0 A 101.97
2 2015Q3 46625.0 770.691 5599.0 16367.0 A 94.69
3 2015Q4 49316.0 770.400 5554.0 18371.0 A 103.57
4 2016Q1 50365.0 762.115 9700.0 15659.0 A 112.05
With this dataframe, we can now select (or read) specific rows. For example, if we want only rows where the fyearq
is above 2017
, we can use an in-place assignment.
df = df[df["fyearq"] > 2017]
print(df.head())
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
59 1690 20171231 2018 1 INDL C D STD USD
107 2111 20171231 2018 1 INDL C D STD USD
179 2663 20171031 2018 1 INDL C D STD USD
335 3980 20171231 2018 1 INDL C D STD USD
670 6547 20171231 2018 1 INDL C D STD USD
datacqtr datafqtr atq cshoq dlttq lctq costat prccq
59 2017Q4 2018Q1 406794.0 5081.651 103922.0 115788.0 A 169.23
107 2017Q4 2018Q1 55363.0 266.242 22095.0 4895.0 A 214.06
179 2017Q3 2018Q1 7746.0 301.000 2269.0 2583.0 A 47.37
335 2017Q4 2018Q1 97734.0 1500.000 20082.0 19875.0 A 107.51
670 2017Q4 2018Q1 6701.1 48.340 2030.0 1211.3 A 75.15
Use the query
Function to Read Specific Rows From CSV in Pandas
Instead of using an in-place assignment, we can use the query
function, which allows us to pass a query string with a Boolean expression and whichever rows fulfill the expression is selected.
We can achieve the same effect as in the last section by passing the string "fyearq > 2017"
to the query
function.
df = df.query("fyearq > 2017")
print(df)
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
59 1690 20171231 2018 1 INDL C D STD USD
107 2111 20171231 2018 1 INDL C D STD USD
179 2663 20171031 2018 1 INDL C D STD USD
335 3980 20171231 2018 1 INDL C D STD USD
670 6547 20171231 2018 1 INDL C D STD USD
...
Use chunksize
to Read Specific Rows From CSV in Pandas
However, if we need to work with the CSV file in batches (due to choice, constraints, or memory), we can use the chunksize
parameter to read only some rows. Therefore, within the read_csv
function, we specify the chunksize
parameter, and the result with be an iterator object that we can loop through to access the chunks of the dataset.
Here, we specified that the chunksize
be 60, so only 60 rows are loaded.
chunksize = 60
filename = "leverage.csv"
with pd.read_csv(filename, chunksize=chunksize) as reader:
for chunk in reader:
print(chunk)
print("--------------------------------")
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
0 1300 20150331 2015 1 INDL C D STD USD
1 1300 20150630 2015 2 INDL C D STD USD
2 1300 20150930 2015 3 INDL C D STD USD
3 1300 20151231 2015 4 INDL C D STD USD
...
57 1690 20170630 2017 3 INDL C D STD USD
58 1690 20170930 2017 4 INDL C D STD USD
59 1690 20171231 2018 1 INDL C D STD USD
--------------------------------
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
60 1837 20150331 2015 1 INDL C D STD USD
61 1837 20150630 2015 2 INDL C D STD USD
62 1837 20150930 2015 3 INDL C D STD USD
...
We used the ---
to indicate that only 60 rows are loaded at a time.
Use loc
to Read Specific Rows From CSV in Pandas
Just like the query
function or the in-place assignment, we can make use of the loc
operator to pass a Boolean expression that will define the specific rows that are read.
df = df.loc[df["fyearq"] > 2017]
print(df)
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
59 1690 20171231 2018 1 INDL C D STD USD
107 2111 20171231 2018 1 INDL C D STD USD
179 2663 20171031 2018 1 INDL C D STD USD
335 3980 20171231 2018 1 INDL C D STD USD
670 6547 20171231 2018 1 INDL C D STD USD
...
Use iloc
to Read Specific Rows From CSV in Pandas
If we need a set of rows and know their indexes, we can use the iloc
operator and pass multiple indexes within a []
. Then, all the rows with the indexes are read.
Here we read only the rows with the indexes 12
, 13
, and 45
.
df = df.iloc[[12, 13, 45]]
print(df)
Output:
gvkey datadate fyearq fqtr indfmt consol popsrc datafmt curcdq \
12 1440 20150331 2015 1 INDL C D STD USD
13 1440 20150630 2015 2 INDL C D STD USD
45 1487 20170630 2017 2 INDL C D STD USD
Olorunfemi is a lover of technology and computers. In addition, I write technology and coding content for developers and hobbyists. When not working, I learn to design, among other things.
LinkedIn