How to Replace NA Values in Multiple Columns using Pandas fillna()

Salman Mehmood Feb 02, 2024
How to Replace NA Values in Multiple Columns using Pandas fillna()

This article explains how to use the fillna() function to replace the NaN values with numeric ones. We will also learn how to replace the NaN values from the Pandas dataframe with strings.

Replace Multiple Columns of NaN Values With Any Data Type Using fillna() in Pandas

The Pandas fillna() function can replace the NaN values with a specified value. The function can propagate this value within a column or row or replace NaN values with different values based on the column.

We will make a new script with the Pandas library imported as pd followed by the NumPy library imported as np. Next, we will create a new dataframe called City_Temp and set this equal to the Pandas dataframe.

Let’s say we are tracking the temperatures for 2 cities, Tulsa and Dallas, for one week, but the data is incomplete. We will create two rows of data containing temperature values for each city for 5 days.

The first row will contain the key of Tulsa with some numeric values and np.nan within a list, and the second row with the key of Dallas also contains a few numeric values and np.nan. The np.nan value creates a null value in the dataframe.

Next, we will add the index argument with a list containing the abbreviated names from Monday through Friday. When the result is printed to the console, we would see a dataframe with temperature values for two cities over 5 separate days.

import numpy as np
import pandas as pd

City_Temp = pd.DataFrame(
    {
        "Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
        "Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

print(City_Temp)

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1     NaN
Thu    NaN     NaN
Fri    NaN    92.1

To fill the empty values within the City_Temp dataframe, we can use the fillna() function from Pandas within a print statement. In the fillna() function, we will add a single value of 80.0, and when the result is printed to the console, we will see all NaN values from the dataframe have been replaced with the number 80.0.

print(City_Temp.fillna(80.0))

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1    80.0
Thu   80.0    80.0
Fri   80.0    92.1

As an alternative to providing a specific value, we can use the method argument with the value ffill to tell the function to fill the NaN values with the value above it in the same column. Here, the NaN values for Tulsa are replaced by 75.1, and the values for Dallas are replaced by 93.1.

print(City_Temp.fillna(method="ffill"))

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1    93.1
Thu   75.1    93.1
Fri   75.1    92.1

Similarly, we can perform a backfill when the value of the method argument is set to bfill. The result shows the NaN values in the Dallas column are filled with the value 92.1, but the values in the Tulsa column are not replaced.

This is because there is no valid value below the row Friday that can be used to replace the NaN values.

print(City_Temp.fillna(method="bfill"))

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1    92.1
Thu    NaN    92.1
Fri    NaN    92.1

If we want to fill the first NaN element occurrence in the dataframe for each city, we can use the ffil value for the method type and add the limit argument with a value of 1. The result shows that only the first occurrence of the NaN value is replaced with a forward fill.

print(City_Temp.fillna(method="ffill", limit=1))

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1    93.1
Thu   75.1     NaN
Fri    NaN    92.1

Fill the NaN Elements With the Average

Lastly, we can specify different values for the NaN elements in different columns.

Let’s say we want to fill the NaN elements with the average temperature value for that city. To do this, we will make a dictionary using curly braces within the fillna() function.

In the dictionary, we will add the key Tulsa with the average temperature value rounded to 2 decimal places using the np.round() function with the np.mean() function within it.

Within the mean() function, we will add the City_Temp dataframe as the first argument and specify the Tulsa column within square brackets followed by the integer 2 to round the result to 2 decimal places. This will be repeated for the Dallas key and value pair.

We can reassign the dataframe so that it will be changed permanently.

City_Temp = City_Temp.fillna(
    {
        "Tulsa": np.round(np.mean(City_Temp["Tulsa"]), 2),
        "Dallas": np.round(np.mean(City_Temp["Dallas"]), 2),
    }
)
print(City_Temp)

When the result is printed to the console, we see that the average value for Tulsa, 77.87, has replaced the NaN elements for Thursday and Friday, and the average value for Dallas, 89.47, has replaced the NaN elements for Wednesday and Thursday.

     Tulsa  Dallas
Mon  78.50   83.20
Tue  80.00   93.10
Wed  75.10   89.47
Thu  77.87   89.47
Fri  77.87   92.10

Replace the NaN Values With a String

In your case, the data type could be anything, but in that case, you can also replace the NaN value or null value using the following code.

City_Temp = pd.DataFrame(
    {
        "Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
        "Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

print(City_Temp)

Output:

     Tulsa  Dallas
Mon   78.5    83.2
Tue   80.0    93.1
Wed   75.1     NaN
Thu    NaN     NaN
Fri    NaN    92.1

Consider the following code to replace the NaN values with a string.

df = City_Temp.fillna({"Tulsa": ".", "Dallas": "."}).fillna(0)
print(df)

Output:

    Tulsa Dallas
Mon  78.5   83.2
Tue  80.0   93.1
Wed  75.1      .
Thu     .      .
Fri     .   92.1

We can also use the following code snippet to replace the NaN values with a string, which will work the same as the code we mentioned above.

print(City_Temp.fillna("."))

Output:

    Tulsa Dallas
Mon  78.5   83.2
Tue  80.0   93.1
Wed  75.1      .
Thu     .      .
Fri     .   92.1

Full Code:

# In[1]:

import numpy as np
import pandas as pd

City_Temp = pd.DataFrame(
    {
        "Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
        "Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

print(City_Temp)

# In[2]:

print(City_Temp.fillna(80.0))

# In[3]:

print(City_Temp.fillna(method="ffill"))

# In[4]:

print(City_Temp.fillna(method="bfill"))

# In[5]:

print(City_Temp.fillna(method="ffill", limit=1))

# In[6]:

City_Temp = City_Temp.fillna(
    {
        "Tulsa": np.round(np.mean(City_Temp["Tulsa"]), 2),
        "Dallas": np.round(np.mean(City_Temp["Dallas"]), 2),
    }
)
print(City_Temp)

# ##### Replace nan with string

# In[7]:

City_Temp = pd.DataFrame(
    {
        "Tulsa": [78.5, 80.0, 75.1, np.nan, np.nan],
        "Dallas": [83.2, 93.1, np.nan, np.nan, 92.1],
    },
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(City_Temp)

# In[8]:

df = City_Temp.fillna({"Tulsa": ".", "Dallas": "."}).fillna(0)
print(df)
# In[9]:
print(City_Temp.fillna("."))

You can read here for more solutions.

Salman Mehmood avatar Salman Mehmood avatar

Hello! I am Salman Bin Mehmood(Baum), a software developer and I help organizations, address complex problems. My expertise lies within back-end, data science and machine learning. I am a lifelong learner, currently working on metaverse, and enrolled in a course building an AI application with python. I love solving problems and developing bug-free software for people. I write content related to python and hot Technologies.

LinkedIn

Related Article - Pandas Column