How to Check for Missing Values Using a Boolean Operator in R
- Error While Checking for Existing Values and Missing Values in R
- Use the is.na() Function to Look for Missing Values in R
- Conclusion
When analyzing data, we may import data from an external source, such as a CSV file. The data may contain missing values marked as NA.
If we need to test for particular values in our data, or for NA itself, we must handle the missing values explicitly to avoid errors. We will see how to do that in this article.
We will create a simple vector to demonstrate the problem and the solution.
Sample Code:
myVec = c(50, 60, NA, 40, 80)
Error While Checking for Existing Values and Missing Values in R
First, let us check for the value 60, which we know exists in the vector.
After that, we will check for a missing value. Both loops fail with the same error.
Sample Code:
# Value 60 exists.
for(i in 1:length(myVec))
if(myVec[i] == 60) {print("Present")}
# A missing value, NA, exists.
for(i in 1:length(myVec))
if(myVec[i] == NA) {print("Missing")}
Output:
> # Value 60 exists.
> for(i in 1:length(myVec))
+ if(myVec[i] == 60) {print("Present")}
[1] "Present"
Error in if (myVec[i] == 60) { : missing value where TRUE/FALSE needed
> # A missing value, NA, exists.
> for(i in 1:length(myVec))
+ if(myVec[i] == NA) {print("Missing")}
Error in if (myVec[i] == NA) { : missing value where TRUE/FALSE needed
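We got that error because the Boolean condition in the if statement compares either a value to NA or NA to NA. Such conditions evaluate to NA rather than TRUE or FALSE, and if cannot branch on NA. This is also why the first loop prints "Present" for the second element and then stops at the third element, which is NA.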
Sample Code:
# This evaluates to NA rather than TRUE.
NA == NA
# This evaluates to NA rather than FALSE.
NA != NA
# Therefore, the following code raises the error:
# "missing value where TRUE/FALSE needed".
if(NA) print("Correct")
Output:
> # This evaluates to NA rather than TRUE.
> NA == NA
[1] NA
>
> # This evaluates to NA rather than FALSE.
> NA != NA
[1] NA
>
> # Therefore, the following code raises the error:
> # "missing value where TRUE/FALSE needed".
> if(NA) print("Correct")
Error in if (NA) print("Correct") : missing value where TRUE/FALSE needed
Use the is.na() Function to Look for Missing Values in R
To get around the problem caused by missing values, we need to identify them using the is.na() function. Once identified, the missing values can be handled using a sequence of if and else conditions or nested if and else conditions.
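As a quick illustration of what is.na() returns, the short sketch below applies it to the same sample vector (redefined here so the snippet runs on its own). The name naFlags is only illustrative. Unlike a comparison with ==, is.na() yields TRUE or FALSE for every element, so its result is safe to use as an if condition.

```r
# Same sample vector as earlier, redefined so this snippet is self-contained.
myVec <- c(50, 60, NA, 40, 80)

# is.na() returns TRUE or FALSE for each element, never NA.
naFlags <- is.na(myVec)
print(naFlags)

# A single element can therefore be tested directly inside if.
if (is.na(myVec[3])) print("Found NA")
```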
The basic requirements are below.
- The NA values must be matched separately from all other values.
- When checking for other values, we need to exclude NA values explicitly.
Sample Code:
# Using a sequence of if and else conditions.
for(i in 1:length(myVec)){
if(!is.na(myVec[i]) & myVec[i] == 60){
print("Match found")} else
if(!is.na(myVec[i]) & myVec[i] != 60){
print("Match not found")} else
if(is.na(myVec[i])) {
print("Found NA")}
}
# Using a nested if.
for(i in 1:length(myVec)){
if(!is.na(myVec[i])){
if(myVec[i]==60){
print("Match Found")} else {
print("Match not found")}
} else {
print("Found NA")}
}
Output:
> # Using a sequence of if and else conditions.
> for(i in 1:length(myVec)){
+ if(!is.na(myVec[i]) & myVec[i] == 60){
+ print("Match found")} else
+ if(!is.na(myVec[i]) & myVec[i] != 60){
+ print("Match not found")} else
+ if(is.na(myVec[i])) {
+ print("Found NA")}
+ }
[1] "Match not found"
[1] "Match found"
[1] "Found NA"
[1] "Match not found"
[1] "Match not found"
>
> # Using a nested if.
> for(i in 1:length(myVec)){
+ if(!is.na(myVec[i])){
+ if(myVec[i]==60){
+ print("Match Found")} else {
+ print("Match not found")}
+ } else {
+ print("Found NA")}
+ }
[1] "Match not found"
[1] "Match Found"
[1] "Found NA"
[1] "Match not found"
[1] "Match not found"
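The same logic can also be written more compactly. The sketch below is one possible variation rather than part of the original example: it wraps the NA check in a small helper, loops with seq_along(), and then shows a vectorized form built on is.na() and ifelse(). The names checkValue and resultLabels are hypothetical and chosen only for illustration.

```r
# Same sample vector as earlier, redefined so this snippet is self-contained.
myVec <- c(50, 60, NA, 40, 80)

# Hypothetical helper: classify one element.
# The NA check comes first, so NA never reaches the == comparison.
checkValue <- function(x, target) {
  if (is.na(x)) {
    "Found NA"
  } else if (x == target) {
    "Match found"
  } else {
    "Match not found"
  }
}

# seq_along() gives the element indices and is also safe for empty vectors.
for (i in seq_along(myVec)) {
  print(checkValue(myVec[i], 60))
}

# Vectorized variant: label every element at once, testing for NA first.
resultLabels <- ifelse(is.na(myVec), "Found NA",
                       ifelse(myVec == 60, "Match found", "Match not found"))
print(resultLabels)
```

Both forms follow the two requirements listed earlier: NA is matched on its own, and the comparison with 60 only decides the result for non-missing elements.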
Conclusion
Whenever there is a chance that our data may have missing values, we must write code that separates the missing values from other values.