How to Filter Dataframe Rows Based on Column Values in Pandas
- Select Pandas Rows Based on Specific Column Value
- Select Pandas Rows Which Do Not Contain Specific Column Value
- Select Pandas Rows With Column Values Greater Than or Smaller Than Specific Value
- Select Pandas Rows Based on Multiple Column Values
- Select DataFrame Rows With Multiple Conditions
We can select rows of DataFrame
based on single or multiple column values. We can also get rows from DataFrame
satisfying or not satisfying one or more conditions. This can be accomplished using boolean indexing
, positional indexing
, label indexing
, and query()
method.
Select Pandas Rows Based on Specific Column Value
We can select pandas rows from a DataFrame that contains or does not contain the specific value for a column. It is widely used in filtering the DataFrame based on column value.
Select Pandas Rows Which Contain Specific Column Value
Filter Using Boolean Indexing
In Boolean indexing, we at first generate a mask which is just a series of boolean values representing whether the column contains the specific element or not.
df_mask = df["col_name"] == "specific_value"
We then apply this mask to our original DataFrame to filter the required values.
filtered_df = df[df_mask]
This returns the filtered DataFrame containing only rows that have the specific_value
for column col_name
.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
df_mask = df["Sales"] == 300
filtered_df = df[df_mask]
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
4 April-14 300 3
5 April-16 300 2
This gives all the rows in the df
whose Sales
values is 300
.
Filter Using Positional Indexing
It is similar to boolean indexing but involves one extra step. In this method, we first create a boolean mask and then find positions at which boolean mask has True
value. Then we use pass all the positions with True
value in the mask to the iloc()
method so that all the desired rows are only selected.
import pandas as pd
import numpy as np
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
df_mask = df["Sales"] == 300
positions = np.flatnonzero(df_mask)
filtered_df = df.iloc[positions]
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
4 April-14 300 3
5 April-16 300 2
This also gives all the rows in the df
whose Sales
values is 300
.
Pandas Chaining
We also can use Pandas Chaining to filter pandas dataframe filter by column value. In this method, we use pandas.DataFrame.eq()
method for the DataFrame column whose values are to be checked to compare element-wise equality in DataFrame.
import pandas as pd
import numpy as np
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
filtered_df = df[df.Sales.eq(300)]
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
4 April-14 300 3
5 April-16 300 2
pandas.DataFrame.query()
We could use pandas.DataFrame.query()
to select rows by column value in Pandas.
import pandas as pd
import numpy as np
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
filtered_df = df.query("Sales == 300")
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
4 April-14 300 3
5 April-16 300 2
If we wish to update the existing DataFrame, we can set inplace=True
in query method.
Select Pandas Rows Which Do Not Contain Specific Column Value
The method to select Pandas rows that don’t contain specific column value is similar to that in selecting Pandas rows with specific column value. The only thing we need to change is the condition that the column does not contain specific value by just replacing ==
with !=
when creating masks or queries.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
df_mask = df["Sales"] != 300
filtered_df = df[df_mask]
print(filtered_df)
Output:
Date Sales Price
0 April-10 200 3
2 April-12 400 2
3 April-13 200 4
This selects all the rows of df
whose Sales
values are not 300
.
Select Pandas Rows With Column Values Greater Than or Smaller Than Specific Value
To select Pandas rows with column values greater than or smaller than specific value, we use operators like >
, <=
, >=
while creating masks or queries.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
df_mask = df["Sales"] >= 300
filtered_df = df[df_mask]
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
2 April-12 400 2
4 April-14 300 3
5 April-16 300 2
This results in DataFrame with values of Sales
greater than or equal to 300
.
Select Pandas Rows Based on Multiple Column Values
We have introduced methods of selecting rows based on specific values of column in DataFrame. In this section, we will discuss methods to select Pandas rows based on multiple column values.
Select Pandas Rows Which Contain Any One of Multiple Column Values
To select Pandas rows that contain any one of multiple column values, we use pandas.DataFrame.isin(values)
which returns DataFrame of booleans showing whether each element in the DataFrame is contained in values or not. The DataFrame of booleans thus obtained can be used to select rows.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
values = [200, 400]
filtered_df = df[df.Sales.isin(values)]
print(filtered_df)
Output:
Date Sales Price
0 April-10 200 3
2 April-12 400 2
3 April-13 200 4
It filters all the rows from DataFrame whose Sales
value is either 200
or 400
.
Select Pandas Rows Which Does Not Contain Any One of Multiple Specified Column Values
To select the rows of a DataFrame which does not contain any one of multiple specified column values, we will negate the DataFrame of booleans returned from pandas.DataFrame.isin(values)
by placing ~
sign at the front.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
values = [200, 400]
filtered_df = df[~df.Sales.isin(values)]
print(filtered_df)
Output:
Date Sales Price
1 April-11 300 1
4 April-14 300 3
5 April-16 300 2
It filters all the rows from DataFrame whose Sales
value is neither 200
nor 400
.
Select DataFrame Rows With Multiple Conditions
If we want to filter rows considering row values of multiple columns, we make multiple conditions and combine them with &
operators. Now, the row is only selected when it satisfies conditions for all the columns.
import pandas as pd
dates = ["April-10", "April-11", "April-12", "April-13", "April-14", "April-16"]
sales = [200, 300, 400, 200, 300, 300]
prices = [3, 1, 2, 4, 3, 2]
df = pd.DataFrame({"Date": dates, "Sales": sales, "Price": prices})
values_1 = [200, 400]
values_2 = [2, 3]
filtered_df = df[df.Sales.isin(values_1) & ~df.Price.isin(values_2)]
print(filtered_df)
Output:
Date Sales Price
3 April-13 200 4
It filters all the rows from DataFrame whose Sales
value is either 200
or 400
and Price
value is neither 2 nor 3. The row in the output only satisfies both conditions in entire DataFrame.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedInRelated Article - Pandas DataFrame
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Convert a Float to an Integer in Pandas DataFrame
- How to Sort Pandas DataFrame by One Column's Values
- How to Get the Aggregate of Pandas Group-By and Sum