How to Replace NA With Zero in R

Gustavo du Mortier Feb 02, 2024
  1. Replace NA With Zero in Bigger R Data Frames
  2. Replace NA With Zero in a Subset of R Data Frame

There is a simple way to replace NA values with zeroes in an R data frame. Suppose you have a data frame called my_data. To replace all of its NA values with zeroes, run this statement:

my_data[is.na(my_data)] <- 0

For example, suppose my_data has the following content:

   C1  C2    C3  C4  C5
1   4   3  <NA>   3   7
2   9   8   ABC   5  10
3   1   1   XYZ   3   6
4  NA   4  <NA>   7  10
5   1   2   ZC1  NA   2

After you run my_data[is.na(my_data)] <- 0, the data frame’s content changes to this. Because C3 is a character column, its former NA cells now hold the string "0" rather than a number:

   C1  C2    C3  C4  C5
1   4   3     0   3   7
2   9   8   ABC   5  10
3   1   1   XYZ   3   6
4   0   4     0   7  10
5   1   2   ZC1   0   2
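The following is a minimal, self-contained sketch of the base R approach. It rebuilds the sample data frame from the tables above, with column values copied from them. The stringsAsFactors = FALSE argument keeps C3 as a character column on R versions older than 4.0.0. The final str() call lets you check how each column type was affected by the replacement.

```r
# Rebuild the sample data frame from the tables above
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE  # keep C3 as character (the default since R 4.0.0)
)

# is.na() on a data frame returns a logical matrix of the same shape,
# so it can be used to address every NA cell at once
my_data[is.na(my_data)] <- 0

print(my_data)
str(my_data)  # C3 is still character; its former NAs now hold the string "0"
```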

Replace NA With Zero in Bigger R Data Frames

The previous solution uses base R subset assignment, which works fine for relatively small data frames. For bigger data sets, however, you might want a faster alternative, such as the hybrid evaluation approach used by the dplyr package.

With hybrid evaluation, dplyr recognizes certain complete expressions and evaluates them with C++ code rather than R code. On big data frames, this can make transforms up to 30% faster, although the actual gain depends on your data, the expressions involved, and the dplyr version.
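If you want to check how much dplyr helps on your own machine, a rough timing sketch like the one below compares the base R statement with the dplyr mutate_all() call covered in this section. The data frame size, the roughly 10% share of NA cells, and the seed are arbitrary assumptions. Your timings will vary with hardware, data, and dplyr version, so treat the result as indicative only. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Arbitrary synthetic data: 1 million rows x 10 numeric columns,
# with about 10% of the cells set to NA
set.seed(123)
n_rows <- 1e6
m <- matrix(rnorm(n_rows * 10), ncol = 10)
m[runif(length(m)) < 0.1] <- NA
big <- as.data.frame(m)  # columns are named V1 to V10

# Base R: logical-matrix subset assignment
base_time <- system.time({
  base_result <- big
  base_result[is.na(base_result)] <- 0
})

# dplyr: scoped mutate_all() with a purrr-style formula
dplyr_time <- system.time({
  dplyr_result <- mutate_all(big, ~replace(., is.na(.), 0))
})

# Elapsed (wall-clock) seconds for each approach
print(c(base = base_time[["elapsed"]], dplyr = dplyr_time[["elapsed"]]))
```

Both approaches should produce the same values; only the elapsed times are expected to differ.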

To replace NA values with zeroes using dplyr, you can call mutate_all(), the _all scoped variant of mutate(), together with base R's replace() function written as a purrr-style formula, as in the following example:

my_data <- mutate_all(my_data, ~replace(., is.na(.), 0))

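The purrr-style formula ~replace(., is.na(.), 0) is shorthand for an anonymous function in which . stands for the current column. mutate_all() applies it to every column of the data frame, and replace() then sets each NA element of that column to 0.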
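To make the per-column behavior concrete, here is a base R sketch that does roughly what mutate_all() does with this formula: it applies the same function to each column with lapply(). The helper name replace_na_with_zero is hypothetical, chosen for illustration only, and the small data frame keeps just two of the sample columns.

```r
# Two columns from the sample data: one numeric, one character
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper equivalent to the formula ~replace(., is.na(.), 0);
# `col` plays the role of `.` and receives one whole column vector
replace_na_with_zero <- function(col) replace(col, is.na(col), 0)

# Apply the helper to each column; assigning into my_data[] keeps the
# data frame structure instead of turning the result into a plain list
my_data[] <- lapply(my_data, replace_na_with_zero)
my_data
```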

Replace NA With Zero in a Subset of R Data Frame

Instead of mutate_all(), you can use mutate_at(), the _at scoped variant, to restrict the replacement to specific columns. To do that, pass a vector with the names of the columns where you want the replacement applied. For example, to replace NA values only in columns C1 and C4 of the original data frame, run the following command:

my_data <- mutate_at(my_data, c("C1", "C4"), ~replace(., is.na(.), 0))

This way, only the NAs in columns C1 and C4 are replaced with 0, resulting in the following data frame:

   C1  C2    C3  C4  C5
1   4   3  <NA>   3   7
2   9   8   ABC   5  10
3   1   1   XYZ   3   6
4   0   4  <NA>   7  10
5   1   2   ZC1   0   2
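If you would rather not depend on dplyr, base R can do the same column-restricted replacement. In this sketch, which uses the sample data from the first table, my_data[cols] selects only the chosen columns, so is.na() produces a mask for those columns alone.

```r
# Sample data from the first table
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE
)

cols <- c("C1", "C4")  # columns in which NA should become 0

# my_data[cols] is a smaller data frame; is.na() on it returns a logical
# matrix of matching shape, so only NAs inside those columns are replaced
my_data[cols][is.na(my_data[cols])] <- 0
my_data
```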

In the mutate_all() example, the replacement also put zero values into the character column C3. You might prefer to replace NA with zeroes only in numeric columns, to avoid adding zeroes to alphanumeric columns such as C3. In that case, instead of listing the columns to modify, you can use the mutate_if() function with the is.numeric predicate, which tells R to replace NA with zeroes only in numeric columns. The following example contains the complete code to try this out, from installing the dplyr package and populating the data frame to performing the replacements and displaying the results. Note that in this data set, C5 also has an NA in its third row.

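# Install dplyr (only needed once per R installation)
install.packages("dplyr")
library(dplyr)
# Build the sample data frame; C3 holds character values
C1 <- c(4, 9, 1, NA, 1)
C2 <- c(3, 8, 1, 4, 2)
C3 <- c(NA, 'ABC', 'XYZ', NA, 'ZC1')
C4 <- c(3, 5, 3, 7, NA)
C5 <- c(7, 10, NA, 10, 2)
my_data <- data.frame(C1, C2, C3, C4, C5)
# Replace NA with 0, but only in the numeric columns (C3 is skipped)
my_data <- mutate_if(my_data, is.numeric, ~replace(., is.na(.), 0))
my_data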

Output:

   C1  C2    C3  C4  C5
1   4   3  <NA>   3   7
2   9   8   ABC   5  10
3   1   1   XYZ   3   0
4   0   4  <NA>   7  10
5   1   2   ZC1   0   2
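For comparison, a base R sketch of the same numeric-only replacement uses vapply() to find the numeric columns and then applies the is.na() mask to those columns only. It uses the same data as the complete example, including the NA in C5.

```r
# Same sample data as the complete example (C5 has an NA in row 3)
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, NA, 10, 2),
  stringsAsFactors = FALSE
)

# TRUE for each numeric column (here C1, C2, C4, and C5)
num_cols <- vapply(my_data, is.numeric, logical(1))

# Replace NA with 0 only inside the numeric columns; C3 keeps its NAs
my_data[num_cols][is.na(my_data[num_cols])] <- 0
my_data
```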

You can find more info on the mutate() function and its variants in the R Documentation.
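Since dplyr 1.0.0, the scoped variants mutate_all(), mutate_at(), and mutate_if() are marked as superseded: they still work, but the dplyr documentation recommends using across() inside mutate() instead. The sketch below shows how the three replacements in this article might be written with across(), assuming dplyr 1.0.0 or later is installed.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which introduced across()

my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, NA, 10, 2),
  stringsAsFactors = FALSE
)

# Counterpart of mutate_at(my_data, c("C1", "C4"), ...)
only_c1_c4 <- mutate(my_data, across(c(C1, C4), ~ replace(.x, is.na(.x), 0)))

# Counterpart of mutate_if(my_data, is.numeric, ...)
numeric_only <- mutate(my_data, across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

# Counterpart of mutate_all(my_data, ...); NAs in C3 become the string "0"
all_cols <- mutate(my_data, across(everything(), ~ replace(.x, is.na(.x), 0)))
```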
