How to Fill Missing Values in Pandas DataFrame
Datasets often contain missing values, and pandas offers several ways to replace them.
The ffill() (forward fill) method is one of them. It replaces each NaN with the last valid value that precedes it, taken from the previous row or, optionally, from the previous column.
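To make the idea concrete before looking at dataframes, here is a minimal sketch on a single Series. The variable names and values are made up for illustration.

```python
import pandas as pd

# A Series with a two-value gap in the middle and one gap at the end.
readings = pd.Series([1.0, None, None, 4.0, None])

# Each NaN takes the last valid value that appears before it.
filled = readings.ffill()
print(filled.tolist())  # [1.0, 1.0, 1.0, 4.0, 4.0]
```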
Syntax of the ffill() Method in Pandas
# Python 3.x
dataframe.ffill(axis, inplace, limit, downcast)
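The ffill() method takes four optional arguments, which should be passed by keyword:
axis specifies the direction of the fill. The value 0 (or "index", the default) fills down each column using the value from the previous row, and 1 (or "columns") fills across each row using the value from the previous column.
inplace can be either True or False (the default). True modifies the current dataframe, whereas False returns a new dataframe with the filled values and leaves the original unchanged.
limit specifies the maximum number of consecutive missing values to fill along the axis. By default, there is no limit.
downcast specifies a dictionary that maps items to the data types they should be downcast to, if possible. This argument is deprecated in recent pandas releases.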
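As a rough illustration of the axis argument, the sketch below checks that the integer and string spellings select the same direction. The small dataframe is invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, None, 3.0], "B": [None, 5.0, None]})

# Default call: fills down each column, same as axis=0 or axis="index".
down = df.ffill()
print(down.equals(df.ffill(axis=0)), down.equals(df.ffill(axis="index")))

# axis=1 and axis="columns" both fill across each row instead.
across = df.ffill(axis=1)
print(across.equals(df.ffill(axis="columns")))
```

Passing the arguments by keyword, as done here, keeps the call readable and works across pandas versions.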
Fill Missing Values in the DataFrame Using the ffill() Method in Pandas
Fill Missing Values Along the Row Axis
The following code creates a dataframe whose missing values are written as None, which pandas stores as NaN. It displays the original dataframe and then the result of applying the ffill() method.
By default, the ffill() method fills missing values along the row (index) axis: each NaN is replaced with the value from the previous row in the same column.
In the output, the first row of C2 still contains NaN because there is no preceding row to copy from.
Example code:
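# Python 3.x
import pandas as pd

# None entries become NaN in the resulting float columns.
df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [None, 2, None, 3],
        "C3": [2, None, 6, 5],
        "C4": [5, 2, 8, None],
    }
)
print(df)

# Fill each NaN with the value from the previous row (default axis=0).
df2 = df.ffill()
print(df2)
Output:
    C1   C2   C3   C4
0  2.0  NaN  2.0  5.0
1  7.0  2.0  NaN  2.0
2  NaN  NaN  6.0  8.0
3  4.0  3.0  5.0  NaN
    C1   C2   C3   C4
0  2.0  NaN  2.0  5.0
1  7.0  2.0  2.0  2.0
2  7.0  2.0  6.0  8.0
3  4.0  3.0  5.0  8.0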
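As a quick check on which cells a forward fill cannot reach, the sketch below rebuilds the same dataframe and counts the NaN values left in each column. The helper name remaining is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [None, 2, None, 3],
        "C3": [2, None, 6, 5],
        "C4": [5, 2, 8, None],
    }
)

filled = df.ffill()

# Count the NaN values that survive the fill, column by column.
remaining = {col: int(n) for col, n in filled.isna().sum().items()}
print(remaining)  # {'C1': 0, 'C2': 1, 'C3': 0, 'C4': 0}
```

Only C2 keeps a NaN, because its missing value sits in the first row.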
Fill Missing Values Along the Column Axis
Here, we specify axis=1. Each missing value is now filled with the value from the previous column in the same row.
In the output, every value is filled except two, both in row 2. The value in C1 stays NaN because C1 is the first column and has no previous column.
The value in C2 stays NaN because the corresponding cell in the preceding column, C1, is also NaN.
Example code:
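# Python 3.x
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [None, 2, None, 3],
        "C3": [2, None, 6, 5],
        "C4": [5, 2, 8, None],
    }
)
print(df)

# Fill each NaN with the value from the previous column in the same row.
df2 = df.ffill(axis=1)
print(df2)
Output:
    C1   C2   C3   C4
0  2.0  NaN  2.0  5.0
1  7.0  2.0  NaN  2.0
2  NaN  NaN  6.0  8.0
3  4.0  3.0  5.0  NaN
    C1   C2   C3   C4
0  2.0  2.0  2.0  5.0
1  7.0  2.0  2.0  2.0
2  NaN  NaN  6.0  8.0
3  4.0  3.0  5.0  5.0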
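One way to think about axis=1 is as a row-wise fill on the transposed dataframe. The sketch below, which assumes every column holds the same numeric type as in this example, compares the two approaches.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [None, 2, None, 3],
        "C3": [2, None, 6, 5],
        "C4": [5, 2, 8, None],
    }
)

# Fill across columns directly.
across = df.ffill(axis=1)

# Transpose so columns become rows, fill down, then transpose back.
via_transpose = df.T.ffill().T

print(across.equals(via_transpose))
```

The direct axis=1 call is the simpler choice in practice; the transposed version is shown only to clarify the direction of the fill.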
Use limit to Limit the Number of Consecutive NaN to Fill
We can use the limit parameter to cap the number of consecutive missing values that are filled along the row or column axis.
In the following code, column C2 of the dataframe has consecutive NaN’s in its last three rows. If we specify limit=2, no more than two successive NaN’s are filled along the row axis.
That’s why the NaN in the last row of C2 is still not filled.
Example code:
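# Python 3.x
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [4, None, None, None],
        "C3": [6, 6, 6, 5],
        "C4": [None, 2, 8, None],
    }
)
print(df)

# Fill down each column, but at most two consecutive NaN values per gap.
df2 = df.ffill(axis=0, limit=2)
print(df2)
Output:
    C1   C2  C3   C4
0  2.0  4.0   6  NaN
1  7.0  NaN   6  2.0
2  NaN  NaN   6  8.0
3  4.0  NaN   5  NaN
    C1   C2  C3   C4
0  2.0  4.0   6  NaN
1  7.0  4.0   6  2.0
2  7.0  4.0   6  8.0
3  4.0  NaN   5  8.0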
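To see how the limit value changes the result, the following sketch applies two different limits to the same gap of three NaN values. The Series is invented for illustration.

```python
import pandas as pd

# One valid value, a gap of three NaN values, then another valid value.
s = pd.Series([1.0, None, None, None, 5.0])

one = s.ffill(limit=1)  # fills only the first NaN of the gap
two = s.ffill(limit=2)  # fills the first two NaN values of the gap

print(one.tolist())  # [1.0, 1.0, nan, nan, 5.0]
print(two.tolist())  # [1.0, 1.0, 1.0, nan, 5.0]
```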
Use inplace to Fill Values in the Original DataFrame
Suppose we want to modify the original dataframe instead of storing the filled values in a new dataframe. In that case, we can pass inplace=True.
Example code:
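# Python 3.x
import pandas as pd

df = pd.DataFrame(
    {
        "C1": [2, 7, None, 4],
        "C2": [4, None, None, None],
        "C3": [6, 6, 6, 5],
        "C4": [None, 2, 8, None],
    }
)
print(df)

# Modify df itself, so the result does not need to be assigned.
df.ffill(inplace=True)
print(df)
Output:
    C1   C2  C3   C4
0  2.0  4.0   6  NaN
1  7.0  NaN   6  2.0
2  NaN  NaN   6  8.0
3  4.0  NaN   5  NaN
    C1   C2  C3   C4
0  2.0  4.0   6  NaN
1  7.0  4.0   6  2.0
2  7.0  4.0   6  8.0
3  4.0  4.0   5  8.0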
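The difference between the two modes can be checked by counting NaN values in the original dataframe before and after each call. This is a small sketch on a two-column subset of the data above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"C2": [4, None, None, None], "C4": [None, 2, 8, None]})

# Default call: the result is a new dataframe and df keeps its NaN values.
filled_copy = df.ffill()
nan_before = int(df.isna().sum().sum())

# inplace=True: df itself is modified, so no assignment is needed.
df.ffill(inplace=True)
nan_after = int(df.isna().sum().sum())

print(nan_before, nan_after)  # 5 1
```

The one remaining NaN is the first row of C4, which has no preceding row to copy from.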
I am Fariba Laiq from Pakistan, an Android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement, and convey my knowledge to others.