How to Check if NaN Exists in Pandas DataFrame
NaN stands for Not a Number and represents missing values in Pandas. To detect NaN values in a Pandas DataFrame, we can use the isnull() and isna() methods of DataFrame objects.
pandas.DataFrame.isnull() Method
We can check for NaN values in a DataFrame using the pandas.DataFrame.isnull() method. The method returns a DataFrame of bool values with the same shape as the original: an element is True if the corresponding element in the checked DataFrame is NaN, and False otherwise.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)
df_check = df.isnull()
print(df_check)
Output:
   Student  Height  Weight
0    False   False    True
1    False   False   False
2    False    True   False
3    False    True    True
4    False   False   False
Here, the False values in the output mark entries of the DataFrame df that are not NaN, and the True values mark the NaN entries of df.
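The boolean mask can also be reduced along one axis to see which rows or columns contain missing values. The following is a minimal sketch that reuses the same example data; the variable names mask, rows_with_nan and cols_with_nan are illustrative only and do not come from the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)

# Boolean DataFrame: True where the original entry is NaN
mask = df.isnull()

# Keep only the rows that contain at least one NaN
rows_with_nan = df[mask.any(axis=1)]

# Names of the columns that contain at least one NaN
cols_with_nan = mask.columns[mask.any(axis=0)].tolist()

print(rows_with_nan)
print(cols_with_nan)
```

With axis=1 the reduction runs across each row, and with axis=0 it runs down each column, so the same mask answers both questions.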
If we want to know whether the DataFrame contains any NaN value at all, we can use the expression isnull().values.any(). It returns True if there is at least one NaN value in the DataFrame and False if there is not a single NaN entry.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)
check_for_nan = df.isnull().values.any()
print(check_for_nan)
Output:
True
df.isnull().values returns the boolean DataFrame as a NumPy array, and the any() method of that array returns True if any of its elements evaluates to True.
Therefore, df.isnull().values.any() is True if any NaN exists in the dataframe.
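To make the two steps visible, the expression can be split into its parts. This is only a sketch on the same example data; the names mask_array, has_nan and student_only_has_nan are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)

# Step 1: the boolean mask as a plain 2-D NumPy array
mask_array = df.isnull().values
print(type(mask_array))
print(mask_array.dtype, mask_array.shape)

# Step 2: reduce the whole array to a single boolean
has_nan = mask_array.any()
print(has_nan)

# The "Student" column has no missing values, so the same check gives False
student_only_has_nan = df[["Student"]].isnull().values.any()
print(student_only_has_nan)
```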
df.isnull().any().any() to Check if Any NaN Exists
df.any() reports whether any of the elements is True. When df is a dataframe, it returns a pd.Series with one boolean per column; when df is a pd.Series, it returns a single boolean value.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)
check_for_nan = df.isnull().any().any()
print(check_for_nan)
Output:
True
The two chained any() calls after isnull() in the above example return True if any element in the dataframe is NaN.
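The intermediate result of the first any() call is often useful on its own, because it shows which columns contain NaN. A small sketch on the same data follows; per_column, any_nan and height_has_nan are illustrative names, not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)

# First any(): one boolean per column (a pd.Series)
per_column = df.isnull().any()
print(per_column)

# Second any(): collapses the Series into a single boolean
any_nan = per_column.any()
print(any_nan)

# On a single column (a pd.Series), one any() is already enough
height_has_nan = df["Height"].isnull().any()
print(height_has_nan)
```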
isnull().sum().sum() to Check if Any NaN Exists
If we wish to count the total number of NaN values in a particular DataFrame, df.isnull().sum().sum() is the right solution. The first sum() counts the NaN values in each column, and the second sum() adds those counts to give the total number of NaN values in the entire DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)
total_nan_values = df.isnull().sum().sum()
print(total_nan_values)
Output:
4
If the result is greater than 0, at least one NaN exists in the dataframe.
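Splitting the expression shows the per-column counts before they are added up. This is a minimal sketch on the same data; nan_per_column, total_nan and has_nan are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)

# First sum(): number of NaN values in each column (True counts as 1)
nan_per_column = df.isnull().sum()
print(nan_per_column)

# Second sum(): total number of NaN values in the DataFrame
total_nan = nan_per_column.sum()
print(total_nan)

# The count doubles as an existence check
has_nan = total_nan > 0
print(has_nan)
```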
pandas.DataFrame.isna() Method
The pandas.DataFrame.isna() method works exactly like pandas.DataFrame.isnull(). isnull() is an alias of isna(), so the two methods differ in name only.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)
df_check = df.isna()
check_for_any_nan = df.isna().values.any()
# Or
check_for_any_nan = df.isna().any().any()
total_nan_values = df.isna().sum().sum()
print(df_check)
print("NaN Presence:" + str(check_for_any_nan))
print("Total Number of NaN values:" + str(total_nan_values))
Output:
   Student  Height  Weight
0    False   False    True
1    False   False   False
2    False    True   False
3    False    True    True
4    False   False   False
NaN Presence:True
Total Number of NaN values:4
Here, df.isna() returns a DataFrame of boolean values that mark the NaN entries in df. Similarly, df.isna().values.any() and df.isna().any().any() report whether any NaN value is present in df, and df.isna().sum().sum() returns the number of NaN entries in df.
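As a quick sanity check, the results of the two methods can be compared directly. The sketch below uses the same example data; same_mask and same_column_mask are illustrative names only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry"],
        "Height": [1.63, 1.5, np.nan, np.nan, 1.4],
        "Weight": [np.nan, 56, 73, np.nan, 44],
    }
)

# Compare the boolean masks produced by isna() and isnull()
same_mask = df.isna().equals(df.isnull())
print(same_mask)

# The same holds for a single column (a pd.Series)
same_column_mask = df["Weight"].isna().equals(df["Weight"].isnull())
print(same_column_mask)
```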
Suraj Joshi is a backend software engineer at Matrice.ai.