How to Drop Rows With NaN in Pandas
- Pandas Drop Rows With NaN Using the DataFrame.notna() Method
- Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
- Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
- Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
This tutorial explains how to drop rows with NaN values from a DataFrame using the DataFrame.notna() and DataFrame.dropna() methods.
We will use the DataFrame in the example code below.
import pandas as pd

# None entries in numeric columns are stored as NaN
data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print(data)
Output:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN
Pandas Drop Rows With NaN Using the DataFrame.notna() Method
The DataFrame.notna() method returns a boolean object with the same number of rows and columns as the caller DataFrame. Each element that is not NaN is mapped to True, and each NaN element is mapped to False.
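To make the shape of that boolean object concrete, here is a minimal sketch on a tiny DataFrame. The frame `df` and its column names `a` and `b` are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show what notna() returns
df = pd.DataFrame({"a": [1.0, None], "b": [None, "x"]})

# mask has the same shape as df: True where a value is present, False where it is missing
mask = df.notna()
print(mask)
```

Each cell of `mask` lines up with the cell at the same position in `df`, which is what makes it usable for filtering.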
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")
# Keep only the rows whose Income($) value is not NaN
data = data[data["Income($)"].notna()]
print("DataFrame after removing rows with NaN value in Income Field:")
print(data)
Output:
Initial DataFrame:
      Name   Age  Income($)  Expense($)
0    Alice  19.0     4000.0      3000.0
1   Steven   NaN     5000.0      2000.0
2  Neesham  18.0        NaN      2500.0
3    Chris  21.0     3500.0     25000.0
4    Alice   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Income Field:
     Name   Age  Income($)  Expense($)
0   Alice  19.0     4000.0      3000.0
1  Steven   NaN     5000.0      2000.0
3   Chris  21.0     3500.0     25000.0
Here, we apply the notna() method to the Income($) column, which returns a Series of True or False values depending on whether each value in the column is present. When we pass this boolean Series as an index to the original DataFrame, we get only the rows without NaN values in the Income($) column. Rows with NaN in other columns, such as row 1, are kept.
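The same boolean indexing idea extends to more than one column. The sketch below is one possible way to do it, assuming we want rows where both Age and Income($) are present; the variable names `both_present` and `filtered` are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# Combine two boolean Series element-wise with &
both_present = data["Age"].notna() & data["Income($)"].notna()

# Only rows where both conditions hold survive the filter
filtered = data[both_present]
print(filtered)
```

The parentheses-free form works here because each operand is already a Series; with comparison expressions, each condition would need its own parentheses.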
Pandas Drop Rows Only With NaN Values for All Columns Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")
# Drop a row only when every column in that row is NaN
data = data.dropna(how="all")
print("DataFrame after removing rows with NaN value in All Columns:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in All Columns:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
It removes only the rows that have NaN values in every field of the DataFrame. We set how='all' in the dropna() method so that a row is dropped only if all of its column values are NaN. Rows 1 and 2 each contain a single NaN, so they are kept; only row 4 is removed.
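As a rough cross-check, dropping all-NaN rows can also be written with the notna() approach from the earlier section. The sketch below assumes the same example data and compares the two results; `dropped` and `masked` are illustrative names.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# Approach 1: let dropna() remove rows where every value is NaN
dropped = data.dropna(how="all")

# Approach 2: keep rows that have at least one non-NaN value
masked = data[data.notna().any(axis=1)]

# Both approaches should select the same rows
print(dropped.equals(masked))
```

Which form to prefer is mostly a matter of readability; dropna(how="all") states the intent more directly.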
Pandas Drop Rows Only With NaN Values for a Particular Column Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")
# Only the Id column is checked for NaN when deciding which rows to drop
data = data.dropna(subset=["Id"])
print("DataFrame after removing rows with NaN value in Id Column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in Id Column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
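It drops all the rows in the DataFrame that have a NaN value in the Id column. NaN values in other columns are ignored, so rows 1 and 2 are kept even though they contain NaN in Age and Income($).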
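The subset parameter accepts a list, so more than one column can be checked at once. The following sketch, using the same example data, shows how subset might be combined with the how parameter; `any_missing` and `both_missing` are names picked for this illustration.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# Drop rows where Age OR Income($) is NaN (how="any" is the default)
any_missing = data.dropna(subset=["Age", "Income($)"])

# Drop rows only where Age AND Income($) are both NaN
both_missing = data.dropna(subset=["Age", "Income($)"], how="all")

print(any_missing)
print(both_missing)
```

With the default how, a NaN in either listed column is enough to drop the row; with how="all", only row 4 qualifies because it is the only row missing both values.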
Pandas Drop Rows With NaN Values for Any Column Using DataFrame.dropna() Method
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)
print("Initial DataFrame:")
print(data)
print("")
# With no arguments, dropna() drops every row containing at least one NaN
data = data.dropna()
print("DataFrame after removing rows with NaN value in any column:")
print(data)
Output:
Initial DataFrame:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
1  645.0   NaN     5000.0      2000.0
2  210.0  18.0        NaN      2500.0
3  345.0  21.0     3500.0     25000.0
4    NaN   NaN        NaN         NaN

DataFrame after removing rows with NaN value in any column:
      Id   Age  Income($)  Expense($)
0  621.0  19.0     4000.0      3000.0
3  345.0  21.0     3500.0     25000.0
By default, the dropna() method removes every row that has at least one NaN value, which is equivalent to passing how='any'.
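Note that the surviving rows keep their original index labels (0 and 3 in the output above), and that dropna() returns a new DataFrame rather than modifying the caller. If a consecutive index is wanted, one option is to chain reset_index(drop=True); the sketch below assumes the same example data, and `cleaned` is an illustrative name.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Id": [621, 645, 210, 345, None],
        "Age": [19, None, 18, 21, None],
        "Income($)": [4000, 5000, None, 3500, None],
        "Expense($)": [3000, 2000, 2500, 25000, None],
    }
)

# dropna() returns a new DataFrame; reset_index(drop=True) renumbers rows from 0
cleaned = data.dropna().reset_index(drop=True)

print(len(data))     # the original frame still has all of its rows
print(cleaned)
```

Whether to reset the index depends on the use case: keeping the original labels makes it easier to trace rows back to the source data.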
Suraj Joshi is a backend software engineer at Matrice.ai.