How to Replace NA Values in Multiple Columns using Pandas fillna()
This article explains how to use the fillna()
function to replace the NaN
values with numeric ones. We will also learn how to replace the NaN
values from the Pandas dataframe with strings.
Replace Multiple Columns of NaN Values With Any Data Type Using fillna()
in Pandas
The Pandas fillna()
function can replace the NaN
values with a specified value. The function can propagate this value within a column or row or replace NaN
values with different values based on the column.
We will make a new script with the Pandas library imported as pd
followed by the NumPy library imported as np
. Next, we will create a new dataframe called City_Temp
and set this equal to the Pandas dataframe.
Let’s say we are tracking the temperatures for 2 cities, Tulsa and Dallas, for one week, but the data is incomplete. We will create two rows of data containing temperature values for each city for 5 days.
The first row will contain the key of Tulsa
with some numeric values and np.nan
within a list, and the second row with the key of Dallas
also contains a few numeric values and np.nan
. The np.nan
value creates a null value in the dataframe.
Next, we will add the index
argument with a list containing the abbreviated names from Monday through Friday. When the result is printed to the console, we would see a dataframe with temperature values for two cities over 5 separate days.
import numpy as np
import pandas as pd
City_Temp = pd.DataFrame(
{
"Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
"Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
},
index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(City_Temp)
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 NaN
Thu NaN NaN
Fri NaN 92.1
To fill the empty values within the City_Temp
dataframe, we can use the fillna()
function from Pandas within a print
statement. In the fillna()
function, we will add a single value of 80.0, and when the result is printed to the console, we will see all NaN
values from the dataframe have been replaced with the number 80.0.
print(City_Temp.fillna(80.0))
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 80.0
Thu 80.0 80.0
Fri 80.0 92.1
As an alternative to providing a specific value, we can use the method
argument with the value ffill
to tell the function to fill the NaN
values with the value above it in the same column. Here, the NaN
values for Tulsa
are replaced by 75.1, and the values for Dallas
are replaced by 93.1.
print(City_Temp.fillna(method="ffill"))
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 93.1
Thu 75.1 93.1
Fri 75.1 92.1
Similarly, we can perform a backfill when the value of the method
argument is set to bfill
. The result shows the NaN
values in the Dallas
column are filled with the value 92.1, but the values in the Tulsa
column are not replaced.
This is because there is no valid value below the row Friday
that can be used to replace the NaN
values.
print(City_Temp.fillna(method="bfill"))
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 92.1
Thu NaN 92.1
Fri NaN 92.1
If we want to fill the first NaN
element occurrence in the dataframe for each city, we can use the ffil
value for the method
type and add the limit
argument with a value of 1. The result shows that only the first occurrence of the NaN
value is replaced with a forward fill.
print(City_Temp.fillna(method="ffill", limit=1))
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 93.1
Thu 75.1 NaN
Fri NaN 92.1
Fill the NaN
Elements With the Average
Lastly, we can specify different values for the NaN
elements in different columns.
Let’s say we want to fill the NaN
elements with the average temperature value for that city. To do this, we will make a dictionary using curly braces within the fillna()
function.
In the dictionary, we will add the key Tulsa
with the average temperature value rounded to 2 decimal places using the np.round()
function with the np.mean()
function within it.
Within the mean()
function, we will add the City_Temp
dataframe as the first argument and specify the Tulsa
column within square brackets followed by the integer 2 to round the result to 2 decimal places. This will be repeated for the Dallas
key and value pair.
We can reassign the dataframe so that it will be changed permanently.
City_Temp = City_Temp.fillna(
{
"Tulsa": np.round(np.mean(City_Temp["Tulsa"]), 2),
"Dallas": np.round(np.mean(City_Temp["Dallas"]), 2),
}
)
print(City_Temp)
When the result is printed to the console, we see that the average value for Tulsa, 77.87, has replaced the NaN
elements for Thursday and Friday, and the average value for Dallas, 89.47, has replaced the NaN
elements for Wednesday and Thursday.
Tulsa Dallas
Mon 78.50 83.20
Tue 80.00 93.10
Wed 75.10 89.47
Thu 77.87 89.47
Fri 77.87 92.10
Replace the NaN
Values With a String
In your case, the data type could be anything, but in that case, you can also replace the NaN
value or null value using the following code.
City_Temp = pd.DataFrame(
{
"Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
"Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
},
index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(City_Temp)
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 NaN
Thu NaN NaN
Fri NaN 92.1
Consider the following code to replace the NaN
values with a string.
df = City_Temp.fillna({"Tulsa": ".", "Dallas": "."}).fillna(0)
print(df)
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 .
Thu . .
Fri . 92.1
We can also use the following code snippet to replace the NaN
values with a string, which will work the same as the code we mentioned above.
print(City_Temp.fillna("."))
Output:
Tulsa Dallas
Mon 78.5 83.2
Tue 80.0 93.1
Wed 75.1 .
Thu . .
Fri . 92.1
Full Code:
# In[1]:
import numpy as np
import pandas as pd
City_Temp = pd.DataFrame(
{
"Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
"Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
},
index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(City_Temp)
# In[2]:
print(City_Temp.fillna(80.0))
# In[3]:
print(City_Temp.fillna(method="ffill"))
# In[4]:
print(City_Temp.fillna(method="bfill"))
# In[5]:
print(City_Temp.fillna(method="ffill", limit=1))
# In[6]:
City_Temp = City_Temp.fillna(
{
"Tulsa": np.round(np.mean(City_Temp["Tulsa"]), 2),
"Dallas": np.round(np.mean(City_Temp["Dallas"]), 2),
}
)
print(City_Temp)
# ##### Replace nan with string
# In[7]:
City_Temp = pd.DataFrame(
{
"Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
"Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
},
index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(City_Temp)
# In[8]:
df = City_Temp.fillna({"Tulsa": ".", "Dallas": "."}).fillna(0)
print(df)
# In[9]:
print(City_Temp.fillna("."))
You can read here for more solutions.
Hello! I am Salman Bin Mehmood(Baum), a software developer and I help organizations, address complex problems. My expertise lies within back-end, data science and machine learning. I am a lifelong learner, currently working on metaverse, and enrolled in a course building an AI application with python. I love solving problems and developing bug-free software for people. I write content related to python and hot Technologies.
LinkedIn