How to Filter Pandas DataFrame Rows by Regex
- Filter Pandas DataFrame Rows in Python
- Filter Pandas DataFrame Rows by Regex
- Filter Pandas DataFrame Rows By String
In this article, we will learn how to filter a Pandas dataframe with the help of regular expressions and string functions. We will also learn how to apply the filter() function to a Pandas dataframe in Python.
Filter Pandas DataFrame Rows in Python
In our code, we first import pandas. Then, we load pokemon_data from a CSV file; it holds the data for the first 50 original Pokemon.
We see various information such as the Name, the Type 1, the Type 2, and the different attributes they have, like special attacks, speed, and so on.
import pandas as pd
POK_Data = pd.read_csv("pokemon_data.csv")
POK_Data
There are various options to filter the dataframe. Here, we want to filter the rows, that is, keep only the Pokemon that satisfy a condition. We could do this for an Attack above 80 using the following code.
PK_Filtered_Data = POK_Data[POK_Data["Attack"] > 80]
PK_Filtered_Data
To filter, we use square brackets and put a condition on a column inside them; in this case, our column is Attack.
By doing this, we keep only the rows whose Attack is greater than 80. If we execute this, we can see that we now have a different dataframe.
Looking at the Attack column, we will see that all values are now above 80. We store the filtered dataframe in another variable called PK_Filtered_Data.
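To make the mechanics of the bracket filter visible, here is a minimal sketch. It assumes a small made-up dataframe called demo in place of pokemon_data.csv; the names and numbers are invented for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; the values are illustrative only.
demo = pd.DataFrame(
    {
        "Name": ["Alpha", "Bravo", "Charlie", "Delta"],
        "Attack": [49, 82, 100, 80],
    }
)

# The comparison produces a Boolean Series with one True/False per row.
mask = demo["Attack"] > 80

# Indexing the dataframe with that Series keeps only the rows marked True.
strong = demo[mask]
print(strong)
```

The comparison is strict, so the row with an Attack of exactly 80 is dropped.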
Filter Pandas DataFrame Rows by Regex
We can also do this for other columns or combine conditions on several columns. Suppose we filter the data greater than 80 under the Attack column and, simultaneously, filter the Sp. Atk to be above 100.
We do not need bracket notation to select the column. Another option is attribute access, as in POK_Data[POK_Data.Attack>80]. This only works for column names that are valid Python identifiers, so a column such as Sp. Atk still needs the bracket form.
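The article describes the combined Attack and Sp. Atk filter without showing it, so here is one way it could look. This is a sketch on an invented dataframe (demo and its values are assumptions, not the real Pokemon data).

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; the values are illustrative only.
demo = pd.DataFrame(
    {
        "Name": ["Alpha", "Bravo", "Charlie", "Delta"],
        "Attack": [49, 82, 100, 95],
        "Sp. Atk": [65, 120, 90, 135],
    }
)

# Wrap each condition in parentheses: & binds more tightly than >.
both = demo[(demo["Attack"] > 80) & (demo["Sp. Atk"] > 100)]

# Attribute access works for Attack, but "Sp. Atk" contains a dot and a
# space, so it is not a valid attribute name and needs bracket access.
same = demo[(demo.Attack > 80) & (demo["Sp. Atk"] > 100)]
print(both)
```

Use & for "and" and | for "or" here; the Python keywords and/or do not work element by element on a Series.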
We can also filter the dataframe using the filter() function and apply a regex.
Regular expressions can filter the columns of a dataframe using the filter() function. We specify the regular expression using the regex parameter of this function. Note that filter() matches labels (column names or index labels), not the values stored in the dataframe.
We pass e$ to regex to keep all the columns whose names end with the letter e; the dollar symbol anchors the match to the end of the name. Since we are working at the column level, we also need to specify axis=1.
POK_Data.filter(regex="e$", axis=1)
It returns all the rows of the dataframe but only the columns whose names end with e.
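As a quick check of what filter() does with a regex, here is a small sketch. The dataframe demo and its column set are assumptions chosen for illustration; the real CSV has more columns.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; the values are illustrative only.
demo = pd.DataFrame(
    {
        "Name": ["Alpha", "Bravo"],
        "Type 1": ["Grass", "Fire"],
        "Attack": [49, 82],
        "Defense": [49, 60],
    }
)

# The regex is searched for in each column label; "e$" keeps labels ending in "e".
ends_with_e = demo.filter(regex="e$", axis=1)
print(list(ends_with_e.columns))

# "^T" anchors at the start of the label instead.
starts_with_t = demo.filter(regex="^T", axis=1)
print(list(starts_with_t.columns))
```

Only the set of columns changes; every row is still present in both results.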
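To filter rows by a regular expression, we apply str.contains() to a column instead. In this case, we keep the rows whose Name starts with M; the caret (^) anchors the match to the start of the string.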
POK_Data[POK_Data["Name"].str.contains("^M")]
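The same row filter can be tried on a few invented names. This sketch assumes a made-up Name column that also contains a missing value, to show the na parameter of str.contains(); whether the real CSV has missing names is not stated in the article.

```python
import pandas as pd

# Hypothetical names, including one missing value; not the real Pokemon data.
demo = pd.DataFrame({"Name": ["Mira", "Tomo", "Murk", None, "Azur"]})

# str.contains() treats the pattern as a regex by default; "^M" matches only
# at the start of the string. na=False counts missing names as non-matches
# so that the mask contains only True/False.
starts_with_m = demo[demo["Name"].str.contains("^M", na=False)]

# str.startswith() does the same with a literal prefix, no regex involved.
same = demo[demo["Name"].str.startswith("M", na=False)]
print(starts_with_m)
```

The match is case-sensitive, so the lowercase m inside Tomo does not count.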
Filter Pandas DataFrame Rows By String
We can also filter the rows of the dataframe by a plain substring. The filter() function does not work for this because it matches labels rather than cell values, so we need to name the specific column and use the contains() function.
In this case, the filter is applied to the Name column, and inside the contains() function, we pass ur as a string. We go through the .str accessor because contains() is a string method and POK_Data['Name'] itself is a Pandas series; .str applies the method to every element of the series.
POK_Data[POK_Data["Name"].str.contains("ur")]
Now, if we execute this, we see that only the few Pokemon whose names contain ur remain.
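A few options of str.contains() are worth knowing for substring filters. The sketch below uses invented names (demo is an assumption, not the real data) to show the default case-sensitive match, the case parameter, and regex=False for literal patterns.

```python
import pandas as pd

# Hypothetical names chosen to exercise each option; not the real Pokemon data.
demo = pd.DataFrame({"Name": ["Azur", "Murk", "URsa", "Bolt", "A.B"]})

# Default: case-sensitive match anywhere in the string.
has_ur = demo[demo["Name"].str.contains("ur")]

# case=False also accepts "UR", "Ur", and "uR".
any_case = demo[demo["Name"].str.contains("ur", case=False)]

# regex=False treats the pattern literally, so "." matches only a real dot
# instead of acting as the regex wildcard.
literal_dot = demo[demo["Name"].str.contains(".", regex=False)]
print(has_ur, any_case, literal_dot, sep="\n\n")
```

Setting regex=False is a reasonable default whenever the search text comes from user input and may contain characters such as ., +, or (.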
Full Python Code:
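import pandas as pd
# Load the Pokemon data from the CSV file
POK_Data = pd.read_csv("pokemon_data.csv")
POK_Data
# Keep the rows whose Attack is above 80 (bracket notation)
PK_Filtered_Data = POK_Data[POK_Data["Attack"] > 80]
PK_Filtered_Data
# Same filter using attribute access
PK_Filtered_Data = POK_Data[POK_Data.Attack > 80]
PK_Filtered_Data
# Keep only the columns whose names end with "e"
POK_Data.filter(regex="e$", axis=1)
# Keep the rows whose Name starts with "M"
POK_Data[POK_Data["Name"].str.contains("^M")]
# Keep the rows whose Name contains "ur"
POK_Data[POK_Data["Name"].str.contains("ur")]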
Hello! I am Salman Bin Mehmood (Baum), a software developer, and I help organizations address complex problems. My expertise lies in back-end development, data science, and machine learning. I am a lifelong learner, currently working on the metaverse and enrolled in a course on building an AI application with Python. I love solving problems and developing bug-free software for people. I write content related to Python and hot technologies.