Pandas DataFrame.dropna() Function
- Syntax of pandas.DataFrame.dropna()
- Example Codes: DataFrame.dropna() to Drop Row
- Example Codes: DataFrame.dropna() to Drop Column
- Example Codes: DataFrame.dropna() With how=all
- Example Codes: DataFrame.dropna() With a Specified Subset or Thresh
- Example Codes: DataFrame.dropna() With inplace=True
The pandas.DataFrame.dropna() function removes null values (missing values) from the DataFrame by dropping the rows or columns that contain them. NaN (Not a Number), NaT (Not a Time), and None represent null values. DataFrame.dropna() detects these values and filters the DataFrame accordingly.
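As a minimal sketch of which values count as null, the snippet below builds a small DataFrame (the column names and data are hypothetical, chosen only for illustration) with one NaN, one None, and one NaT, then checks them with isna() before calling dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each column holds one kind of missing marker.
df = pd.DataFrame(
    {
        "score": [1.0, np.nan, 3.0, 4.0],  # NaN in row 1
        "label": ["a", "b", None, "d"],  # None in row 2
        "seen": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", None]),  # NaT in row 3
    }
)

# isna() flags NaN, None and NaT alike.
missing_mask = df.isna()
print(missing_mask)

# dropna() removes every row that has at least one flagged cell.
cleaned = df.dropna()
print(cleaned)
```

Only row 0 has no missing value in any column, so it should be the only row left in cleaned.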
Syntax of pandas.DataFrame.dropna()
DataFrame.dropna(axis, how, thresh, subset, inplace)
Parameters
axis | Determines whether rows or columns are dropped. If it is 0 or 'index', the function drops the rows containing missing values. If it is 1 or 'columns', it drops the columns containing missing values. By default, its value is 0.
how | Determines the condition for dropping a row or column. It accepts only two strings, 'any' or 'all'. By default, it is set to 'any'. 'any' drops the row or column if it contains at least one null value. 'all' drops the row or column only if all of its values are missing.
thresh | An integer giving the minimum number of non-missing values a row or column must have to be kept. In recent pandas versions it cannot be combined with how.
subset | An array-like of labels along the other axis to consider. When dropping rows, these are column names; when dropping columns, these are row labels. Only missing values under the listed labels are taken into account.
inplace | A Boolean value. If set to True, the function modifies the caller DataFrame instead of returning a new one. By default, its value is False.
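To make the axis parameter concrete, here is a short sketch showing that the integer and string forms select the same behavior. The two-column DataFrame and the variable names are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical two-column frame with a single missing value in column "a".
df = pd.DataFrame({"a": [1.0, None], "b": [2.0, 3.0]})

# axis=0 and axis="index" both drop rows that contain a missing value.
rows_by_number = df.dropna(axis=0)
rows_by_name = df.dropna(axis="index")

# axis=1 and axis="columns" both drop columns that contain a missing value.
cols_by_number = df.dropna(axis=1)
cols_by_name = df.dropna(axis="columns")

print(rows_by_number.equals(rows_by_name))
print(cols_by_number.equals(cols_by_name))
print(list(cols_by_name.columns))
```

Dropping rows removes row 1, while dropping columns removes column a and leaves only b.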
Return
It returns a filtered DataFrame with rows or columns dropped according to the passed parameters. If inplace is set to True, it modifies the caller DataFrame and returns None instead.
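The following sketch illustrates the default return behavior: the call produces a new DataFrame and leaves the caller untouched. The data and the names df and result are hypothetical.

```python
import pandas as pd

# Hypothetical frame: rows 1 and 2 each contain one missing value.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

# With the default inplace=False, dropna() returns a new object.
result = df.dropna()

print(result is df)  # False: a separate DataFrame is returned
print(len(df))  # the caller still has all of its rows
print(len(result))  # only the complete rows remain in the result
```

Because the original is not changed, the result has to be assigned to a name to be used later.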
Example Codes: DataFrame.dropna() to Drop Row
By default, axis is 0 (rows), so calling DataFrame.dropna() without an axis argument drops rows.
import pandas as pd
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)
print(dataframe)
The example DataFrame is as follows.
   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
1         NaN    John            75.0
2        80.0   Laura            82.0
3         NaN     Ben            64.0
4        95.0   Kevin             NaN
All the parameters of this function are optional. If we pass no parameter, the function drops every row that contains at least one null value.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna()
print(dataframe1)
Output:
   Attendance   Name  Obtained Marks
2        80.0  Laura            82.0
It has dropped every row that contains at least one missing value.
Example Codes: DataFrame.dropna() to Drop Column
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1)
print(dataframe1)
Output:
     Name
0  Olivia
1    John
2   Laura
3     Ben
4   Kevin
It has dropped every column that contains at least one missing value because we set axis=1 in the DataFrame.dropna() method.
Example Codes: DataFrame.dropna() With how=all
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
1         NaN    John            75.0
2        80.0   Laura            82.0
3         NaN     Ben            64.0
4        95.0   Kevin             NaN
No column is dropped here, even though some columns contain missing values, because how is set to all and axis is 1: a column is dropped only if every value in it is null.
If all the values along the specified axis are missing, as in the Obtained Marks column of the next example, the DataFrame.dropna() method drops that column when how is set to all.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: None, 2: None, 3: None, 4: None},
}
)
print(dataframe)
print("--------")
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
   Attendance    Name Obtained Marks
0        60.0  Olivia           None
1         NaN    John           None
2        80.0   Laura           None
3         NaN     Ben           None
4        95.0   Kevin           None
--------
   Attendance    Name
0        60.0  Olivia
1         NaN    John
2        80.0   Laura
3         NaN     Ben
4        95.0   Kevin
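The same how="all" rule applies to rows when axis is left at its default. The sketch below uses a shortened, hypothetical variant of the example data in which row 1 is entirely empty; only that row should be dropped.

```python
import pandas as pd

# Hypothetical variant of the example data: row 1 has no values at all,
# while row 0 is only partially filled.
dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80},
        "Name": {0: "Olivia", 1: None, 2: "Laura"},
        "Obtained Marks": {0: None, 1: None, 2: 82},
    }
)

# With the default axis=0, how="all" drops a row only if every cell is null.
dataframe1 = dataframe.dropna(how="all")
print(dataframe1)
```

Row 0 still has one missing value but survives, because how="all" requires the whole row to be missing.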
Example Codes: DataFrame.dropna() With a Specified Subset or Thresh
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(thresh=3)
print(dataframe1)
Output:
   Attendance   Name  Obtained Marks
2        80.0  Laura            82.0
The value of thresh is 3, which means a row must have at least 3 non-null values to be kept. Only row 2 has values in all three columns, so it is the only row left.
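To see how the threshold interacts with the data, this sketch counts the non-null values in each row with count(axis=1) and compares two thresholds on the same example DataFrame. The names non_null_per_row, kept_with_2, and kept_with_3 are illustrative only.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Number of non-null cells in each row; thresh is compared against this.
non_null_per_row = dataframe.count(axis=1)
print(non_null_per_row)

# Every row has at least 2 non-null values, so nothing is dropped.
kept_with_2 = dataframe.dropna(thresh=2)
print(len(kept_with_2))

# Only row 2 reaches 3 non-null values.
kept_with_3 = dataframe.dropna(thresh=3)
print(list(kept_with_3.index))
```

Lowering thresh makes the filter more lenient: a row is kept as long as it has at least that many non-null values.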
We can also restrict the check to specific columns with the subset parameter.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(subset=["Attendance", "Name"])
print(dataframe1)
Output:
   Attendance    Name  Obtained Marks
0        60.0  Olivia             NaN
2        80.0   Laura            82.0
4        95.0   Kevin             NaN
It drops rows that have a missing value in the Attendance or Name column. Rows whose only missing values are in other columns, Obtained Marks here, are kept.
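When axis=1 is used, subset refers to row labels rather than column names. The sketch below, using the same example data, drops columns based only on the values found in the listed rows; the variable names are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Look only at row 0: "Obtained Marks" is missing there, so that column goes.
by_row_0 = dataframe.dropna(axis=1, subset=[0])
print(list(by_row_0.columns))

# Look only at row 2: it has no missing value, so every column is kept.
by_row_2 = dataframe.dropna(axis=1, subset=[2])
print(list(by_row_2.columns))
```

Missing values in rows outside the subset are ignored, which is why Attendance survives the first call even though rows 1 and 3 are empty in that column.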
Example Codes: DataFrame.dropna() With inplace=True
DataFrame.dropna() changes the caller DataFrame in-place if inplace is set to True.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(inplace=True)
print(dataframe1)
Output:
None
With inplace=True, the method modifies the caller DataFrame in-place and returns None. That is why dataframe1 is None; the filtered rows are in dataframe itself.
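As a short sketch of where the filtered data ends up, the snippet below keeps a copy of the example DataFrame, applies inplace=True to the original, and compares it with the result of the ordinary reassignment form. The names original_copy, returned, and reassigned are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Keep an untouched copy so the two approaches can be compared.
original_copy = dataframe.copy()

# In-place: the return value is None and `dataframe` itself is filtered.
returned = dataframe.dropna(inplace=True)
print(returned)
print(dataframe)

# Reassignment: the default call returns a new, filtered DataFrame.
reassigned = original_copy.dropna()
print(reassigned.equals(dataframe))
```

Both forms leave the same rows; the reassignment form simply makes the result explicit instead of relying on a side effect.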