How to Replace NA With Zero in R
There is a simple way to replace NA values with zeroes in a data frame in R. Suppose you have a data frame called my_data. To replace all NA values with zeroes in that data frame, you can execute this statement:
my_data[is.na(my_data)] <- 0
For example, suppose my_data has the following content:
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  6
4 NA  4 <NA>  7 10
5  1  2  ZC1 NA  2
When you execute my_data[is.na(my_data)] <- 0, the data frame's content changes to this:
  C1 C2  C3 C4 C5
1  4  3   0  3  7
2  9  8 ABC  5 10
3  1  1 XYZ  3  6
4  0  4   0  7 10
5  1  2 ZC1  0  2
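As a minimal sketch, the example above can be reproduced end to end in base R. The column values are copied from the printed table; the stringsAsFactors = FALSE argument and the helper names na_before, na_after, and column_classes are assumptions added for illustration.

```r
# Rebuild the example data frame (values taken from the printed table).
# stringsAsFactors = FALSE keeps C3 as character; with a factor column,
# assigning 0 would raise a warning and leave the NA in place.
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE
)

# Count the missing values per column before the replacement.
na_before <- colSums(is.na(my_data))

# Replace every NA in the data frame with zero.
my_data[is.na(my_data)] <- 0

# Count again to confirm that no NA is left.
na_after <- colSums(is.na(my_data))

# Inspect the column types after the replacement.
column_classes <- sapply(my_data, class)
print(my_data)
print(column_classes)
```

Because C3 is a character column, the zeroes written into it are the string "0" rather than the number 0. That is one reason to restrict the replacement to numeric columns.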
Replace NA With Zero in Bigger R Data Frames
The previous solution uses base R subset assignment, which works fine for relatively small data frames. For bigger data sets, you might want a faster alternative, such as the hybrid evaluation approach implemented in the dplyr package.
With hybrid evaluation, dplyr recognizes certain entire expressions and evaluates them with C++ code. In this way, you can achieve up to 30% faster transforms when processing big data frames, although the actual gain depends on your dplyr version and your data.
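How much faster dplyr is in practice depends on the data and the package version, so it can be worth measuring on your own machine. The sketch below times both approaches on a hypothetical data frame. The name big_data, its size, and the share of NA values are assumptions for illustration, and it uses the mutate_all call that the next paragraph introduces.

```r
library(dplyr)

# Hypothetical test data: 1e5 rows, 10 numeric columns, about 10% NA values.
set.seed(42)
n_rows <- 1e5
big_data <- as.data.frame(replicate(10, {
  x <- rnorm(n_rows)
  x[sample(n_rows, n_rows / 10)] <- NA
  x
}))

# Time the base R subset assignment.
base_time <- system.time({
  base_result <- big_data
  base_result[is.na(base_result)] <- 0
})

# Time the dplyr scoped verb with a purrr-style lambda.
dplyr_time <- system.time({
  dplyr_result <- mutate_all(big_data, ~replace(., is.na(.), 0))
})

print(base_time)
print(dplyr_time)

# Both approaches should produce the same values.
same_result <- isTRUE(all.equal(base_result, dplyr_result))
print(same_result)
```

A single system.time run is noisy, so repeat the measurement several times before drawing conclusions.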
To replace NA values with zeroes using the dplyr package, you can use the mutate function with the _all scoped verb (mutate_all) and the replace function written in the purrr lambda format, as in the example below:
my_data <- mutate_all(my_data, ~replace(., is.na(.), 0))
The purrr notation (a formula starting with ~, in which . stands for the current column) lets mutate_all apply the replace function to each column of the data frame.
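To make the lambda notation more concrete, the sketch below writes the same logic as an ordinary function and checks that both forms give the same result. The helper name na_to_zero and the two-column data frame are hypothetical, chosen only for illustration.

```r
library(dplyr)

# A small two-column data frame, used only for this illustration.
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same logic as ~replace(., is.na(.), 0),
# written as an ordinary function of one column x.
na_to_zero <- function(x) replace(x, is.na(x), 0)

# On a single vector, only the NA positions change.
single_vector <- na_to_zero(c(1, NA, 3))
print(single_vector)

# mutate_all calls the function once per column.
with_lambda   <- mutate_all(my_data, ~replace(., is.na(.), 0))
with_function <- mutate_all(my_data, na_to_zero)

# Compare the two results.
print(identical(with_lambda, with_function))
```

A named function can be easier to reuse and test than a lambda when the same replacement is applied in several places.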
Replace NA With Zero in a Subset of an R Data Frame
Instead of the _all scoped verb, you can use the _at scoped verb (mutate_at) to restrict the replacement to specific columns. To do that, pass a vector with the names of the columns where you want the replacement to be applied. Using the previous data frame, if you need to replace NA values only in columns C1 and C4, you can use the following command:
my_data <- mutate_at(my_data, c("C1", "C4"), ~replace(., is.na(.), 0))
In this way, only the NAs in columns C1 and C4 get replaced by 0, resulting in a data frame like the one below:
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  6
4  0  4 <NA>  7 10
5  1  2  ZC1  0  2
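If dplyr is not available, the same column-restricted replacement can be sketched in base R by applying the subset assignment to the chosen columns only. The variable name cols is an assumption for illustration, and the data values are copied from the tables above.

```r
# Rebuild the example data frame (values taken from the printed table).
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, 6, 10, 2),
  stringsAsFactors = FALSE
)

# Columns in which NA should become zero.
cols <- c("C1", "C4")

# Apply the subset assignment to the selected columns only;
# every other column, including C3, is left untouched.
my_data[cols][is.na(my_data[cols])] <- 0

print(my_data)
```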
In the previous example, you might have wanted to replace NA with zeroes only in numeric columns, to avoid putting zero values in alphanumeric columns such as C3. In that case, instead of naming the columns where you want to apply the replacement, you can use the mutate_if function with the is.numeric condition to tell R to replace NA with zeroes only in numeric columns. The following example gives the complete code to try this out, from installing the dplyr package and populating the data frame to performing the replacements and displaying the results. Note that this version of the data also has an NA in column C5.
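# Install dplyr (only needed once), then load it.
install.packages("dplyr")
library(dplyr)
# Build the example data frame; C3 is the only non-numeric column.
C1 <- c(4, 9, 1, NA, 1)
C2 <- c(3, 8, 1, 4, 2)
C3 <- c(NA, 'ABC', 'XYZ', NA, 'ZC1')
C4 <- c(3, 5, 3, 7, NA)
C5 <- c(7, 10, NA, 10, 2)
my_data <- data.frame(C1, C2, C3, C4, C5)
# Replace NA with zero in the numeric columns only.
my_data <- mutate_if(my_data, is.numeric, ~replace(., is.na(.), 0))
# Print the result.
my_data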
Output:
  C1 C2   C3 C4 C5
1  4  3 <NA>  3  7
2  9  8  ABC  5 10
3  1  1  XYZ  3  0
4  0  4 <NA>  7 10
5  1  2  ZC1  0  2
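Since dplyr 1.0.0, the scoped verbs (mutate_all, mutate_at, mutate_if) are marked as superseded in favor of across(). They still work, but newer code usually prefers across(). The sketch below shows one possible across() equivalent for each of the three calls used in this article, assuming dplyr 1.0.0 or later; the helper name na_to_zero and the result names are hypothetical.

```r
library(dplyr)

# Same data as in the complete example above.
my_data <- data.frame(
  C1 = c(4, 9, 1, NA, 1),
  C2 = c(3, 8, 1, 4, 2),
  C3 = c(NA, "ABC", "XYZ", NA, "ZC1"),
  C4 = c(3, 5, 3, 7, NA),
  C5 = c(7, 10, NA, 10, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the NA values of one column with zero.
na_to_zero <- function(x) replace(x, is.na(x), 0)

# Equivalent of mutate_all: every column.
all_cols <- mutate(my_data, across(everything(), na_to_zero))

# Equivalent of mutate_at: only the named columns.
some_cols <- mutate(my_data, across(all_of(c("C1", "C4")), na_to_zero))

# Equivalent of mutate_if: only the numeric columns.
numeric_cols <- mutate(my_data, across(where(is.numeric), na_to_zero))

print(numeric_cols)
```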
You can find more information on the mutate() function and its variants in the R documentation.