How to Replace All the NaN Values With Zeros in a Column of a Pandas DataFrame
Large data sets often contain NaN values that you want to replace with an average or another suitable value. For example, in a list of student grades, the system may have recorded NaN instead of 0.0 for students who did not attempt the quiz. The sections below show different ways to replace those NaN values with zeros.
The following sections all use the same DataFrame:
import pandas as pd
import numpy as np
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
print(df)
The following is the DataFrame with NaN in grade.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    NaN
2  George          50   95.0
3    Noah          65    NaN
df.fillna() Method to Replace All NaN Values With Zeros
Let’s replace the NaN values using the df.fillna() method.
import pandas as pd
import numpy as np
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
df = df.fillna(0)
print(df)
The following is the output with NaN replaced with zero.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    0.0
2  George          50   95.0
3    Noah          65    0.0
The df.fillna() method fills NaN values with the given value. By default it does not modify the original DataFrame; it returns a new one, unless the inplace parameter is set to True.
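The default behavior can be checked with a small sketch. It uses a reduced, single-column DataFrame, and the variable names filled, original_nan_count and filled_nan_count are illustrative, not part of the example above.

```python
import numpy as np
import pandas as pd

# Reduced version of the article's DataFrame (only the grade column).
df = pd.DataFrame({"grade": [88, np.nan, 95, np.nan]})

# Without inplace=True, fillna() returns a new DataFrame.
filled = df.fillna(0)

# Count the NaN values remaining in each object.
original_nan_count = int(df["grade"].isna().sum())   # df still holds its NaN values
filled_nan_count = int(filled["grade"].isna().sum())  # the returned copy has none

print(original_nan_count, filled_nan_count)
```

If the result of fillna() is not assigned to a name, the filled DataFrame is simply discarded and df keeps its NaN values.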
The example above can be rewritten with the inplace parameter set to True, so that df is modified directly.
import pandas as pd
import numpy as np
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)
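The calls above fill every column of the DataFrame. To restrict the fill to a single column, one possible approach is sketched below. For illustration only, it assumes a modified DataFrame in which percentage also contains a NaN, so that the untouched column is visible; the names zero_filled, assigned and mean_filled are likewise illustrative. The last variant fills with the column mean rather than zero, as an example of the "average value" case mentioned in the introduction.

```python
import numpy as np
import pandas as pd

# Assumption for illustration: percentage also has a missing value here,
# unlike the article's DataFrame.
df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, np.nan, 50, 65],
        "grade": [88, np.nan, 95, np.nan],
    }
)

# A dict maps column names to fill values; columns not listed are left alone.
zero_filled = df.fillna({"grade": 0})

# Alternative single-column form: fill the Series and assign it back.
assigned = df.copy()
assigned["grade"] = assigned["grade"].fillna(0)

# Variant: fill with the column mean instead of zero.
# Series.mean() skips NaN values by default.
mean_filled = df.copy()
mean_filled["grade"] = mean_filled["grade"].fillna(mean_filled["grade"].mean())

print(zero_filled)
print(mean_filled)
```

In both zero-filled results the NaN in percentage is preserved, because only the grade column was targeted.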
df.replace() Method
For replacing NaN with 0, this method gives the same result as df.fillna(). Unlike df.fillna(), df.replace() can also be used to replace values other than NaN. Let’s take a look at the code.
import pandas as pd
import numpy as np
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, np.nan, 95, np.nan],
}
df = pd.DataFrame(data)
nan_replaced = df.replace(np.nan, 0)
print(nan_replaced)
The following will be the output.
     name  percentage  grade
0  Oliver          90   88.0
1   Harry          99    0.0
2  George          50   95.0
3    Noah          65    0.0
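As a rough sketch of the other uses of df.replace() mentioned in this section, the example below replaces an ordinary number and then limits the NaN replacement to one column. The value 50 and the names fifty_replaced and grade_only are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, np.nan, 95, np.nan],
    }
)

# Replace an ordinary number: every 50 in the DataFrame becomes 0.
# The NaN values in grade are not affected by this call.
fifty_replaced = df.replace(50, 0)

# Restrict the NaN replacement to one column by calling replace() on the Series.
grade_only = df.copy()
grade_only["grade"] = grade_only["grade"].replace(np.nan, 0)

print(fifty_replaced)
print(grade_only)
```

Like df.fillna(), df.replace() returns a new object by default, so df itself is unchanged after both calls.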