How to Replace All the NaN Values With Zeros in a Column of a Pandas DataFrame
When working with large data sets, you will sometimes find NaN values that you want to replace with an average or another suitable value. For example, suppose you have a grading list of students, and some students did not attempt the quiz, so the system automatically entered NaN instead of 0.0. Listed below are the different ways to achieve this task.
We will use the following DataFrame in the next sections:
import pandas as pd
import numpy as np
data = {
    "name": ["Oliver", "Harry", "George", "Noah"],
    "percentage": [90, 99, 50, 65],
    "grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
print(df)
The following is the DataFrame with NaN in the grade column.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    NaN
2  George          50   95.0
3    Noah          65    NaN
df.fillna() Method to Replace All NaN Values With Zeros
Let’s replace the NaN values with the help of the df.fillna() method.
import pandas as pd
import numpy as np
data = {
    "name": ["Oliver", "Harry", "George", "Noah"],
    "percentage": [90, 99, 50, 65],
    "grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
# fillna() returns a new DataFrame, so assign the result back to df
df = df.fillna(0)
print(df)
The following is the output with NaN replaced with zero.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    0.0
2  George          50   95.0
3    Noah          65    0.0
The df.fillna() method fills the NaN values with the given value. By default, it doesn’t modify the original object but returns a new DataFrame, unless the inplace parameter is set to True.
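To see that df.fillna() leaves the original DataFrame untouched by default, the following minimal sketch counts the NaN values before and after the call. The variable name filled is illustrative only and is not part of the examples in this article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, np.nan, 95, np.nan],
    }
)

# fillna() returns a new DataFrame; `filled` is an illustrative name.
filled = df.fillna(0)

# The original DataFrame still has its two NaN values in "grade".
print(df["grade"].isna().sum())      # prints 2
# The returned DataFrame has none left.
print(filled["grade"].isna().sum())  # prints 0
```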
We could rewrite the fillna() example above with the inplace parameter set to True.
import pandas as pd
import numpy as np
data = {
    "name": ["Oliver", "Harry", "George", "Noah"],
    "percentage": [90, 99, 50, 65],
    "grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
# Modify df directly; with inplace=True, fillna() returns None
df.fillna(0, inplace=True)
print(df)
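The examples so far fill NaN in every column of the DataFrame. To restrict the replacement to a single column, one possible approach is sketched below, together with a variation that fills with the column mean instead of zero. The variable names grade_zeroed, df_copy, grade_mean, and mean_filled are illustrative and do not come from the examples above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, np.nan, 95, np.nan],
    }
)

# Option 1: pass a dict that maps column names to fill values.
# Columns not listed in the dict are left as they are.
grade_zeroed = df.fillna({"grade": 0})

# Option 2: fill the column as a Series and assign it back.
df_copy = df.copy()
df_copy["grade"] = df_copy["grade"].fillna(0)

# Variation: fill with the column mean instead of zero.
# mean() skips NaN by default, so this is the mean of 88 and 95.
grade_mean = df["grade"].mean()
mean_filled = df.fillna({"grade": grade_mean})

print(grade_zeroed)
print(mean_filled)
```

Because only the grade column contains NaN in this DataFrame, df.fillna(0) and the column-specific forms give the same result here. The difference only shows when other columns also have missing values that should stay untouched.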
df.replace() Method
This method works the same way as df.fillna() for replacing NaN with 0. df.replace() can also be used to replace other values. Let’s take a look at the code.
import pandas as pd
import numpy as np
data = {
    "name": ["Oliver", "Harry", "George", "Noah"],
    "percentage": [90, 99, 50, 65],
    "grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
# Replace every NaN with 0; replace() returns a new DataFrame
nan_replaced = df.replace(np.nan, 0)
print(nan_replaced)
The following is the output.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    0.0
2  George          50   95.0
3    Noah          65    0.0
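As a short sketch of replacing other values, replace() can also be called on a single column (a Series), so that only that column is affected. The names grade_fixed and percentage_fixed, and the replacement value 55, are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, np.nan, 95, np.nan],
    }
)

# Restrict the replacement to one column by calling replace() on the Series.
grade_fixed = df["grade"].replace(np.nan, 0)

# replace() is not limited to NaN: here 50 becomes 55 in "percentage".
# (55 is an arbitrary illustrative value.)
percentage_fixed = df["percentage"].replace(50, 55)

print(grade_fixed.tolist())       # prints [88.0, 0.0, 95.0, 0.0]
print(percentage_fixed.tolist())  # prints [90, 99, 55, 65]
```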