How to Filter Pandas DataFrame Rows by Regex

Salman Mehmood Feb 02, 2024

Filter Pandas DataFrame Rows in Python
Filter Pandas DataFrame Rows by Regex
Filter Pandas DataFrame Rows By String

In this article, we will learn how to filter a Pandas dataframe using regular expressions and string functions. We will also learn how to apply the filter() function to a Pandas dataframe in Python.

Filter Pandas DataFrame Rows in Python

In our code, we first import pandas. Then we load the Pokemon data from a CSV file, pokemon_data.csv, which contains the first 50 original Pokemon.

We see various information such as the Name, the Type 1, Type 2, and the different attributes they have, like special attacks, speed, and so on.

import pandas as pd

POK_Data = pd.read_csv("pokemon_data.csv")
POK_Data

There are various ways to filter the dataframe. Suppose we want to keep only the rows that meet a condition, for instance, all the Pokemon with an Attack above 80. We could do this using the following code.

PK_Filtered_Data = POK_Data[POK_Data["Attack"] > 80]
PK_Filtered_Data

To filter, we use square brackets and place a condition on a column inside them; in this case, the column is Attack.

The condition POK_Data["Attack"] > 80 is evaluated for every row, and only the rows where it is true are kept. If we execute this, we can see that we now have a different dataframe.

Looking at the Attack column, we see that all values are now above 80. We store the filtered dataframe in another variable called PK_Filtered_Data.
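To make the mechanics visible without the CSV file, here is a minimal sketch on a small hand-built dataframe. The name sample and the values in it are assumptions used only for illustration; they stand in for POK_Data.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; values are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charizard", "Machamp", "Pikachu"],
        "Attack": [49, 84, 130, 55],
    }
)

# The comparison produces a boolean Series with one True/False per row.
mask = sample["Attack"] > 80

# Indexing the dataframe with that Series keeps only the True rows.
strong = sample[mask]
print(strong)
```

Splitting the mask into its own variable is optional; it is done here only to show that the expression inside the brackets is an ordinary Series of booleans.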

Filter Pandas DataFrame Rows by Regex

We can also do this for other columns or combine conditions on several columns. Suppose we want to keep the rows where Attack is greater than 80 and, at the same time, Sp. Atk is above 100. Each condition goes in its own parentheses, and the conditions are joined with the & operator.
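The two-column case described above could be sketched as follows. The dataframe sample and its values are assumptions for illustration, not the contents of pokemon_data.csv.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; values are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["Charizard", "Machamp", "Alakazam", "Mewtwo"],
        "Attack": [84, 130, 50, 110],
        "Sp. Atk": [109, 65, 135, 154],
    }
)

# Each comparison is wrapped in parentheses because & binds more tightly
# than >. Use & for "and" and | for "or"; the Python keywords and/or do
# not work element-wise on a Series.
both = sample[(sample["Attack"] > 80) & (sample["Sp. Atk"] > 100)]
print(both)
```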

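We also do not need bracket notation to select the column. Another option is attribute access, as in POK_Data[POK_Data.Attack>80]. This only works when the column name is a valid Python identifier, so a column such as Sp. Atk still needs brackets.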
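A short sketch comparing the two ways of selecting a column, again on an assumed sample dataframe with illustrative values:

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; values are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["Charizard", "Machamp", "Pikachu"],
        "Attack": [84, 130, 55],
        "Sp. Atk": [109, 65, 50],
    }
)

# Bracket notation and attribute access select the same column.
by_bracket = sample[sample["Attack"] > 80]
by_attribute = sample[sample.Attack > 80]
print(by_bracket.equals(by_attribute))

# "Sp. Atk" contains a period and a space, so it cannot be written as an
# attribute; bracket notation is the only option for this column.
high_sp_atk = sample[sample["Sp. Atk"] > 100]
print(high_sp_atk)
```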

We can also filter the dataframe using the filter() function and a regular expression.

The filter() function selects rows or columns by their labels, not by the values they contain. We specify the regular expression using the regex parameter of this function.

We pass e$ to regex; the dollar symbol anchors the match to the end of the label, so only the columns whose names end with the letter e are kept. Since we are working at the column level, we also need to specify axis=1.

POK_Data.filter(regex="e$", axis=1)

It returns all the rows of the dataframe, but only the columns whose names end with e.
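The column-level behaviour can be checked on a small example. The dataframe sample below is an assumption made for illustration; only its column names matter here.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; values are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Pikachu"],
        "Type 1": ["Grass", "Electric"],
        "Attack": [49, 55],
        "Defense": [49, 40],
    }
)

# filter() matches the regex against column labels, not cell values.
ends_with_e = sample.filter(regex="e$", axis=1)
print(list(ends_with_e.columns))

# A caret anchors the pattern to the start of the label instead.
starts_with_a = sample.filter(regex="^A", axis=1)
print(list(starts_with_a.columns))
```

Every row is kept in both results, which is why filter() on its own cannot select rows by the contents of a column.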

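To filter rows by a regular expression, we apply str.contains() to a column, which treats its argument as a regular expression by default. Here, we keep the rows whose Name starts with M; the caret (^) anchors the pattern to the start of the string.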

POK_Data[POK_Data["Name"].str.contains("^M")]
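As a hedged illustration of the same row filter, the sketch below uses an assumed sample dataframe that includes a missing name. The na=False argument is an addition not used in the article's code: it makes missing values count as non-matches so the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv, with one missing name.
sample = pd.DataFrame(
    {"Name": ["Machamp", "Mewtwo", "Primeape", None, "Meowth"]}
)

# ^M matches names that start with an uppercase M.
starts_with_m = sample[sample["Name"].str.contains("^M", na=False)]
print(starts_with_m)

# str.match() only matches at the start of the string, so the caret
# is not needed there.
same_rows = sample[sample["Name"].str.match("M", na=False)]
print(same_rows.equals(starts_with_m))
```

Primeape is excluded because its m is lowercase and not at the start of the name.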

Filter Pandas DataFrame Rows By String

We can also filter the rows of the dataframe by a plain string. The filter() function shown earlier does not work for this because it matches labels rather than values, so we need to select the specific column. To filter on the values in that column, we use the contains() function.

In this case, the filter is applied to the Name column, and inside the contains() function, we pass ur as a string. We use the .str accessor because contains() is a string method, while POK_Data['Name'] itself is a Pandas series. Since ur has no special regex characters, it is matched as a literal substring.

POK_Data[POK_Data["Name"].str.contains("ur")]

Now, if we execute this, we see that only a few Pokemon contain ur.
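When the search text might contain characters that are special in a regular expression, such as a period, one option is to pass regex=False so the text is matched literally. The sketch below assumes a small sample dataframe with illustrative names; regex=False and case=False are parameters of str.contains() that the article's code does not use.

```python
import pandas as pd

# Hypothetical stand-in for pokemon_data.csv; names are illustrative only.
sample = pd.DataFrame(
    {"Name": ["Bulbasaur", "Venusaur", "Pikachu", "Mr. Mime", "Mew"]}
)

# Plain substring match, with regex handling switched off.
contains_ur = sample[sample["Name"].str.contains("ur", regex=False)]
print(contains_ur)

# With regex=False the period is a literal character; as a regex it
# would match any character and select every non-empty name.
literal_dot = sample[sample["Name"].str.contains(".", regex=False)]
print(literal_dot)

# case=False ignores upper and lower case when matching.
any_case = sample[sample["Name"].str.contains("MEW", case=False)]
print(any_case)
```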

Full Python Code:

import pandas as pd

# Load the Pokemon data from the CSV file
POK_Data = pd.read_csv("pokemon_data.csv")
POK_Data

# Keep the rows with an Attack above 80 (bracket notation)
PK_Filtered_Data = POK_Data[POK_Data["Attack"] > 80]
PK_Filtered_Data

# Same filter using attribute access
PK_Filtered_Data = POK_Data[POK_Data.Attack > 80]
PK_Filtered_Data

# Keep only the columns whose names end with e
POK_Data.filter(regex="e$", axis=1)

# Keep the rows whose Name starts with M
POK_Data[POK_Data["Name"].str.contains("^M")]

# Keep the rows whose Name contains ur
POK_Data[POK_Data["Name"].str.contains("ur")]
