How to Filter Rows That Contain a Specific String in Pandas
- Install Prerequisite Libraries
- Create a Pandas DataFrame
- Use str.contains() to Filter Rows That Contain a Specific String
- Use str.contains() to Filter Rows That Contain a String in a List
The Pandas library is a complete tool for handling text data in addition to numbers. Many data analysis tasks, as well as machine learning exploration and pre-processing steps, require selecting or excluding rows based on their text content.
The DataFrame is the primary data structure in the Pandas module. It is used for storing and processing data in tabular form.
One common operation on tabular data is filtering the dataframe by a substring so that the relevant rows can be extracted from it. This article goes through a step-by-step procedure for this operation.
Install Prerequisite Libraries
Before we can filter a Pandas dataframe, we need to install the Pandas library. We can do this by running the following command in a terminal:
pip install pandas
It is also essential to ensure we work with the correct Python version. In this article, we are using version 3.10.4.
We can check the currently installed Python version by running the following command in the terminal:
python --version
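The same check can also be done from inside Python, which confirms that Pandas imports correctly as well. This is only a small sketch; the version numbers it prints depend on your environment.

```python
import sys

import pandas as pd

# Version strings for the running interpreter and the installed Pandas package.
python_version = sys.version.split()[0]
pandas_version = pd.__version__

print("Python:", python_version)
print("Pandas:", pandas_version)
```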
Create a Pandas DataFrame
To perform the dataframe filtering operation, we will need an example dataframe; hence, we will generate a dataframe for our article using the code below. It shows us the names of five students being graded for two subjects, Biology and Chemistry, out of 100.
Example Code:
import pandas as pd
data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data)
print(data_frame)
The code above is straightforward. We begin by importing the Pandas library and then initialize the data variable as a dictionary containing the information we want in the resulting dataframe.
We then pass the data dictionary to the pd.DataFrame() constructor to generate our dataframe.
When we run the code, it prints the dataframe: five rows, indexed 0 to 4, under the columns Student_Name, Biology, and Chemistry.
Use str.contains() to Filter Rows That Contain a Specific String
Now that we've created our dataframe, we can move on to the filtering step. Suppose we want to select the data for the student Suharwardy; the result should be all the information stored against Suharwardy.
We can perform this operation using the str.contains() method. In the snippet below, we access the dataframe column Student_Name and use str.contains() to build a Boolean mask that is True for every row whose name contains Suharwardy. Indexing the dataframe with that mask returns the matching rows.
Example Code:
import pandas as pd
data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame["Student_Name"].str.contains("Suharwardy")]
print(df)
The output is a single row, the one at index 1: Suharwardy, with 73 in Biology and 98 in Chemistry.
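To see why this works, it may help to look at the intermediate result. str.contains() returns a Boolean Series (a mask) with one entry per row, and indexing the dataframe with that mask keeps only the rows marked True. The sketch below also uses the case and na parameters of str.contains(); the lowercase search term and the variable names mask and matches are illustrative choices, not part of the original example.

```python
import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data)

# One True/False value per row: True where the name contains the substring.
# case=False ignores letter case; na=False treats missing names as non-matches.
mask = data_frame["Student_Name"].str.contains("suhar", case=False, na=False)
print(mask.tolist())  # [False, True, False, False, False]

# Indexing with the mask keeps only the rows where it is True.
matches = data_frame[mask]
print(matches)
```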
A more concise way of performing this operation is to access the Student_Name column with the dot operator. This works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. We get the same results.
Example Code:
import pandas as pd
data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame.Student_Name.str.contains("Suharwardy")]
print(df)
The output is again the single row for Suharwardy (index 1, Biology 73, Chemistry 98).
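Attribute access is convenient, but it is not always available. The sketch below uses two hypothetical column names, one containing a space and one called count, to show where the dot operator falls short; bracket access works in both cases.

```python
import pandas as pd

# Hypothetical columns: "Student Name" contains a space, and "count"
# has the same name as the existing DataFrame.count() method.
df = pd.DataFrame({"Student Name": ["Anil", "Karen"], "count": [1, 2]})

# Dot access resolves to the DataFrame.count method, not the column.
print(callable(df.count))  # True

# Bracket access always returns the column, even with spaces in the name.
counts = df["count"].tolist()
names = df[df["Student Name"].str.contains("Kar")]["Student Name"].tolist()
print(counts)  # [1, 2]
print(names)   # ['Karen']
```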
The str.contains() method also has a regex parameter. Setting it to False makes Pandas treat the pattern as a literal string instead of a regular expression, which can be faster for plain substring searches.
Example Code:
import pandas as pd
data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data=data)
df = data_frame[data_frame.Student_Name.str.contains("Suharwardy", regex=False)]
print(df)
The output is once more the single row for Suharwardy (index 1, Biology 73, Chemistry 98).
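The regex parameter also changes what the pattern means, not just how quickly it is matched. With the default regex=True, the pattern is interpreted as a regular expression, so characters such as . have a special meaning; with regex=False, the pattern is matched literally. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up strings: only the first one contains a literal dot.
s = pd.Series(["a.b", "acb", "xyz"])

# Default (regex=True): "." matches any single character, so every row matches.
as_regex = s.str.contains(".").tolist()

# regex=False: "." is a plain character, so only "a.b" matches.
as_literal = s.str.contains(".", regex=False).tolist()

print(as_regex)    # [True, True, True]
print(as_literal)  # [True, False, False]
```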
This is how we can filter a Pandas dataframe using the str.contains() method and specify exactly which rows we want to extract.
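The same mask can be inverted with the ~ operator when the goal is to drop the matching rows rather than keep them. The variable name others is an assumption made for this sketch.

```python
import pandas as pd

data = {
    "Student_Name": ["Anil", "Suharwardy", "Fatina", "John", "Karen"],
    "Biology": [68, 73, 87, 58, 78],
    "Chemistry": [78, 98, 89, 73, 87],
}
data_frame = pd.DataFrame(data)

# True for rows whose name contains the literal substring.
mask = data_frame["Student_Name"].str.contains("Suharwardy", regex=False)

# ~mask flips True and False, so only the non-matching rows are kept.
others = data_frame[~mask]
print(others["Student_Name"].tolist())  # ['Anil', 'Fatina', 'John', 'Karen']
```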
Use str.contains() to Filter Rows That Contain a String in a List
The code below shows how to filter for dataframe rows that contain ID1 or ID2 in the ID column. The | character in the pattern acts as a regular-expression OR.
Example Code:
import pandas as pd
d1 = {
    "ID": ["ID1", "ID1", "ID2", "ID2", "ID3", "ID3"],
    "Names": ["Harry", "Petter", "Daniel", "Ron", "Sofia", "Kelvin"],
    "marks": [70, 80, 90, 70, 60, 90],
}
df = pd.DataFrame(d1)
print(df)
s = df[df["ID"].str.contains("ID1|ID2")]
print("use of str.contains() : ")
print(s)
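The output shows the full dataframe first, followed by the filtered result: the four rows whose ID is ID1 or ID2 (Harry, Petter, Daniel, and Ron).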
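The pattern "ID1|ID2" above is written by hand. When the search terms are stored in a Python list, one way to build the same pattern is to escape each term with re.escape() and join the results with |. The list name wanted_ids is an assumption made for this sketch. If you need whole-value matches rather than substring matches, Series.isin() is an alternative.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["ID1", "ID1", "ID2", "ID2", "ID3", "ID3"],
        "Names": ["Harry", "Petter", "Daniel", "Ron", "Sofia", "Kelvin"],
        "marks": [70, 80, 90, 70, 60, 90],
    }
)

# Hypothetical list of search terms.
wanted_ids = ["ID1", "ID2"]

# Escape each term so regex metacharacters are matched literally,
# then join the terms with "|" (regular-expression OR).
pattern = "|".join(re.escape(term) for term in wanted_ids)
by_substring = df[df["ID"].str.contains(pattern)]

# isin() keeps rows whose ID equals one of the list items exactly.
by_exact_value = df[df["ID"].isin(wanted_ids)]

print(pattern)  # ID1|ID2
print(by_substring["Names"].tolist())    # ['Harry', 'Petter', 'Daniel', 'Ron']
print(by_exact_value["Names"].tolist())  # ['Harry', 'Petter', 'Daniel', 'Ron']
```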
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.