How to Filter Data in a Pandas DataFrame
- Filter Data in a Pandas DataFrame Based on Single Condition
- Filter Data in a Pandas DataFrame Based on Multiple Conditions
- Filter Data in a Pandas DataFrame Based on Values of Multiple Columns
This tutorial demonstrates how to filter data in a Pandas DataFrame based on single or multiple conditions.
Boolean indexing means selecting subsets of data, or filtering data, based on conditions. It works with the actual values in the DataFrame rather than with their row or column labels or integer positions.
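To make the idea concrete, here is a minimal sketch of the boolean vector behind boolean indexing. Because Student.csv is not available here, the DataFrame below is hypothetical sample data; the Name column in particular is an assumption, while Department and Marks follow the examples in this tutorial.

```python
import pandas as pd

# Hypothetical sample data standing in for Student.csv (the "Name" column is assumed)
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Hina"],
        "Department": ["CS", "EE", "CS", "EE"],
        "Marks": [85, 72, 55, 91],
    }
)

# Comparing a column with a value gives a boolean Series (a mask)
# with one True/False entry per row, aligned to the DataFrame's index
mask = df["Marks"] > 60
print(mask.tolist())  # [True, True, False, True]

# Indexing the DataFrame with the mask keeps only the rows where it is True
passed = df[mask]
print(passed["Name"].tolist())  # ['Ali', 'Sara', 'Hina']
```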
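In boolean indexing, a boolean vector (one True or False value per row) decides which rows are kept. Conditions are written with comparison operators such as == (EQUAL) and combined with | (OR), & (AND), and ~ (NOT). Wrap each individual condition in parentheses: in Python, & and | bind more tightly than comparison operators such as == and >, so an unparenthesized expression is not evaluated the way it reads.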
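The following sketch shows each combining operator on the same hypothetical sample data (not the real Student.csv), along with what happens when the parentheses are left out.

```python
import pandas as pd

# Hypothetical sample data standing in for Student.csv
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Hina"],
        "Department": ["CS", "EE", "CS", "EE"],
        "Marks": [85, 72, 55, 91],
    }
)

# & (AND): a row is kept only if both conditions hold
both = df[(df["Department"] == "CS") & (df["Marks"] > 60)]
print(both["Name"].tolist())  # ['Ali']

# | (OR): a row is kept if either condition holds
either = df[(df["Department"] == "CS") | (df["Marks"] > 90)]
print(either["Name"].tolist())  # ['Ali', 'Omar', 'Hina']

# ~ (NOT): inverts a mask, here keeping every department except CS
not_cs = df[~(df["Department"] == "CS")]
print(not_cs["Name"].tolist())  # ['Sara', 'Hina']

# Without parentheses, & binds first, so Python reads this as the chained
# comparison df["Marks"] > (60 & df["Marks"]) < 90, which needs the truth
# value of a whole Series and therefore raises ValueError
unparenthesized_failed = False
try:
    df[df["Marks"] > 60 & df["Marks"] < 90]
except ValueError:
    unparenthesized_failed = True
print(unparenthesized_failed)  # True
```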
Filter Data in a Pandas DataFrame Based on Single Condition
We can filter the data using a single column’s value by applying a single condition.
In the following code, we have students’ data, and we filter the records by applying a single condition to the Department column. Only the records of students whose department is CS are displayed.
Example code:
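# Python 3.x
import pandas as pd
from IPython.display import display  # needed outside Jupyter; notebooks provide display() already

# Load the students' records
df = pd.read_csv("Student.csv")
display(df)

# Keep only the rows whose Department is "CS"
df_filtered = df[(df["Department"] == "CS")]
display(df_filtered)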
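Since Student.csv is not included with this tutorial, here is a self-contained sketch of the same single-condition filter on a small hypothetical DataFrame. It also shows that the filtered result keeps the original index labels, which can be reset if consecutive labels are needed.

```python
import pandas as pd

# Hypothetical in-memory stand-in for Student.csv so the example runs anywhere
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Hina"],
        "Department": ["CS", "EE", "CS", "EE"],
        "Marks": [85, 72, 55, 91],
    }
)

# Keep only the rows whose Department is "CS"
df_filtered = df[df["Department"] == "CS"]
print(df_filtered["Name"].tolist())  # ['Ali', 'Omar']

# The original index labels are kept; reset them if consecutive labels are needed
print(df_filtered.index.tolist())  # [0, 2]
renumbered = df_filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```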
Filter Data in a Pandas DataFrame Based on Multiple Conditions
We can also apply multiple conditions to the same column, for example, to select values that fall within a range.
If we want to display only the records of students whose marks are greater than 60 but less than 90, we join two conditions with the & operator.
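An important thing to remember is to use the operators &, |, and ~ instead of Python’s and, or, and not keywords, respectively. The keywords try to reduce a whole Series to a single True or False value, which pandas rejects with a ValueError, whereas the operators combine the conditions element by element, row by row.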
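A minimal sketch of the difference, using a hypothetical Marks column: the and keyword raises a ValueError, while & produces the intended row-by-row result.

```python
import pandas as pd

# Hypothetical marks standing in for the Marks column of Student.csv
df = pd.DataFrame({"Marks": [85, 72, 55, 91]})

# Python's `and` calls bool() on the first Series, which pandas rejects
raised = False
try:
    df[(df["Marks"] > 60) and (df["Marks"] < 90)]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# The element-wise & operator combines the two masks row by row instead
in_range = df[(df["Marks"] > 60) & (df["Marks"] < 90)]
print(in_range["Marks"].tolist())  # [85, 72]
```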
Example code:
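# Python 3.x
import pandas as pd
from IPython.display import display  # needed outside Jupyter; notebooks provide display() already

# Load the students' records
df = pd.read_csv("Student.csv")
display(df)

# Keep rows whose Marks are greater than 60 and less than 90
df_filtered = df[(df["Marks"] > 60) & (df["Marks"] < 90)]
display(df_filtered)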
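Note that > and < are strict, so marks of exactly 60 or 90 are excluded by the filter above. The sketch below uses hypothetical marks placed on the boundaries to show the difference between strict and inclusive comparisons.

```python
import pandas as pd

# Hypothetical marks chosen to sit on and around the 60 and 90 boundaries
df = pd.DataFrame({"Marks": [60, 75, 90, 88, 45]})

# Strict comparisons (> and <) exclude the boundary values themselves
strict = df[(df["Marks"] > 60) & (df["Marks"] < 90)]
print(strict["Marks"].tolist())  # [75, 88]

# Use >= and <= when the boundaries should be included
inclusive = df[(df["Marks"] >= 60) & (df["Marks"] <= 90)]
print(inclusive["Marks"].tolist())  # [60, 75, 90, 88]
```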
Filter Data in a Pandas DataFrame Based on Values of Multiple Columns
We can also filter the data using conditions on the values of multiple columns.
In the following code, we filter the records so that only students whose department is EE and whose marks are greater than or equal to 80 are displayed. Each condition is wrapped in its own parentheses.
Filtering on multiple columns always involves multiple conditions, at least one per column, combined with & or |.
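When several columns are involved, it can help readability to build one mask per condition and then combine them. The sketch below does this on hypothetical sample data; the variable names is_ee and high_marks are illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for Student.csv
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Hina"],
        "Department": ["CS", "EE", "CS", "EE"],
        "Marks": [85, 72, 55, 91],
    }
)

# One named mask per column-level condition
is_ee = df["Department"] == "EE"
high_marks = df["Marks"] >= 80

# AND keeps rows that meet both conditions; OR keeps rows that meet either
both = df[is_ee & high_marks]
either = df[is_ee | high_marks]
print(both["Name"].tolist())    # ['Hina']
print(either["Name"].tolist())  # ['Ali', 'Sara', 'Hina']
```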
Example code:
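# Python 3.x
import pandas as pd
from IPython.display import display  # needed outside Jupyter; notebooks provide display() already

# Load the students' records
df = pd.read_csv("Student.csv")
display(df)

# Keep rows in the EE department whose Marks are at least 80
df_filtered = df[(df["Department"] == "EE") & (df["Marks"] >= 80)]
display(df_filtered)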
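The same boolean mask can also be passed to .loc, which accepts a row selector and a column selector together, so the filtered result can be limited to the columns of interest in one step. This is a sketch on hypothetical data (Student.csv is not available here); the Name column is an assumption.

```python
import pandas as pd

# Hypothetical sample data; "Bilal" scores exactly 80 to show that >= includes it
df = pd.DataFrame(
    {
        "Name": ["Ali", "Sara", "Omar", "Hina", "Bilal"],
        "Department": ["CS", "EE", "CS", "EE", "EE"],
        "Marks": [85, 72, 55, 91, 80],
    }
)

# Same conditions as the EE example: department is EE and marks are at least 80
mask = (df["Department"] == "EE") & (df["Marks"] >= 80)

# .loc takes the mask for rows and a list of labels for columns
df_filtered = df.loc[mask, ["Name", "Marks"]]
print(df_filtered)
print(df_filtered.columns.tolist())  # ['Name', 'Marks']
print(df_filtered["Name"].tolist())  # ['Hina', 'Bilal']
```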
I am Fariba Laiq from Pakistan, an Android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement, and convey my knowledge to others.