How to Use Vectorized if Function With Multiple Conditions in R
- Limitation of the if Statement in R
- The Vectorized ifelse() Function in R
- The if_else() Function of the dplyr Package in R
- Use Multiple Conditions in the if_else() Function in R
- Conclusion
A common data analysis task is to create or update a data frame column using one or multiple conditions based on the other columns of the same row.
If we try to do this using an if statement, only the first row is used to test the condition, and the entire column is updated based on that row.
When working with a data frame, we need tools and techniques that work on multiple rows. This article covers the vectorized if functions and the vectorized AND and OR operators used to combine multiple conditions.
We will first create a small data frame for illustration.
# Create two vectors.
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)
# Create a data frame.
cond_df = data.frame(Col1, Col2)
# View the data frame.
cond_df
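As a quick check of how the data is laid out, the sketch below prints the two vectors on their own. It uses only base R and the same variable names as above. In rep(), the each argument repeats every element first, and times then repeats the resulting pattern.

```r
# each = 2 repeats every element twice; times = 2 then repeats that whole pattern.
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)

print(Col1)  # "A" "A" "B" "B" "A" "A" "B" "B"
print(Col2)  # "x" "x" "x" "x" "y" "y" "y" "y"

# Every combination of Col1 and Col2 appears in exactly two rows.
print(table(Col1, Col2))
```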
Limitation of the if Statement in R
According to the documentation, the if statement takes a length-one logical vector that is not NA … only the first element is used.
In the following example, we will create a column based on a condition that uses another column.
# Try to use the if statement on the whole column (this is the misuse being demonstrated).
cond_df$NewCol = if (Col1 == "B") "Col1 was B" else "Col1 was not B"
# View the result.
cond_df
Output:
Col1 Col2 NewCol
1 A x Col1 was not B
2 A x Col1 was not B
3 B x Col1 was not B
4 B x Col1 was not B
5 A y Col1 was not B
6 A y Col1 was not B
7 B y Col1 was not B
8 B y Col1 was not B
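R warned when we executed this if statement but still created the column. (R 4.2.0 and later stop with an error instead of a warning, so the column is not created at all.) The result is not what we wanted.
Only the first row was evaluated, and the result was applied to all the data frame rows.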
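To see why the if statement cannot handle a column, it helps to look at the condition itself. The following is a minimal sketch using base R only; the names cond, first_label, and row_labels are chosen here for illustration. The comparison produces one logical value per row, while if needs exactly one value, so we either pick a single element or apply if to each element in turn.

```r
Col1 = rep(c("A", "B"), times = 2, each = 2)

# The comparison is vectorized: it returns one logical value per row.
cond = Col1 == "B"
length(cond)  # 8

# if works when the condition has length one.
first_label = if (cond[1]) "Col1 was B" else "Col1 was not B"
first_label  # "Col1 was not B"

# Applying if to each element separately gives the per-row result,
# at the cost of an explicit loop over the elements.
row_labels = vapply(
  cond,
  function(x) if (x) "Col1 was B" else "Col1 was not B",
  character(1)
)
row_labels
```

The vectorized functions in the following sections produce the same per-row result without the explicit loop.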
The Vectorized ifelse() Function in R
Base R includes a vectorized ifelse() function, which we can use to conditionally update a data frame column.
According to the documentation, this function “…returns a value with the same shape as test….”, and this makes it suitable for use on a data frame.
The syntax of the function is: ifelse(test, value_if_true, value_if_false). In the R documentation, these three arguments are named test, yes, and no. The following code illustrates the use of this function.
# Create a new data frame using the same vectors.
vect_df = data.frame(Col1, Col2)
# Use the vectorized ifelse() function.
vect_df$NewCol = ifelse(Col1 == "B", "Col1 was B", "F")
# view the result.
vect_df
Output:
> vect_df
Col1 Col2 NewCol
1 A x F
2 A x F
3 B x Col1 was B
4 B x Col1 was B
5 A y F
6 A y F
7 B y Col1 was B
8 B y Col1 was B
This function worked as expected. We can use it to create or update a data frame column using conditions based on values from other columns.
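The documentation's remark about shape can be checked with a test that is not a plain vector. This is a small sketch in base R; the matrix m and the labels "big" and "small" are made up for illustration.

```r
# A 2 x 3 matrix, filled column by column with the values 1 to 6.
m = matrix(1:6, nrow = 2)

# The test m > 3 is a logical matrix, and the result keeps its dimensions.
res = ifelse(m > 3, "big", "small")
res
dim(res)  # 2 3
```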
But this function has a limitation. The documentation states that ifelse() strips attributes. This is important when working with Dates and factors.
Let us see an example of the problem. We will:
- Create a vector of dates.
- Create a new vector using the ifelse() function on the first vector. The change caused by the ifelse() function is unexpected.
# Create and view a vector of dates.
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
datevec
class(datevec)
# Create a new vector of dates using the ifelse() function on the previous vector. View it.
mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
mod_datevec # Not expected result.
class(mod_datevec) # Not date.
Output:
> datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
> datevec
[1] "2022-01-01" "2022-01-02" "2022-01-03" "2022-01-04" "2022-01-05"
> class(datevec)
[1] "Date"
>
> mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))
> mod_datevec
[1] 18993 18994 19024 19024 19024
> class(mod_datevec)
[1] "numeric"
We find that the dates have changed to numbers. Because ifelse() dropped the Date class, we see the underlying day counts since 1970-01-01. The ifelse() function does not work as expected on dates and factor variables.
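When only base R is available, one possible workaround is to convert the numbers back to dates afterwards. The sketch below shows this, and also demonstrates the same attribute loss on a factor. The names restored, f, and f_res are illustrative and not part of the original example.

```r
datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")
mod_datevec = ifelse(datevec < as.Date("2022-01-03"), datevec, as.Date("2022-02-01"))

# The numbers are day counts since 1970-01-01, so they can be converted back.
restored = as.Date(mod_datevec, origin = "1970-01-01")
restored
class(restored)  # "Date"

# The same attribute loss affects factors: the labels are dropped
# and only the underlying integer codes remain.
f = factor(c("low", "high", "low"))
f_res = ifelse(f == "low", f, f)
f_res
is.factor(f_res)  # FALSE
```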
Let us now look at a solution offered by the dplyr package.
The if_else() Function of the dplyr Package in R
The if_else() function from the dplyr package addresses some of the issues associated with base R’s ifelse() function.
- It ensures that value_if_true and value_if_false are of the same type.
- It takes all other attributes from value_if_true.
Let us use this function in an example.
# First load the dplyr package.
library(dplyr)
# Create another data frame from the two vectors.
dplyr_df = data.frame(Col1, Col2)
# Use the vectorized if_else() function.
dplyr_df$NewCol = if_else(Col1 == "B", "Col1 was B", "F")
# view the result.
dplyr_df
We can inspect the output and see that the function worked as expected, like base R’s ifelse().
How does it work on dates? Let us check.
# Create a new vector using if_else() based on the vector created earlier. View it.
dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
dplyr_datevec
Output:
> dplyr_datevec = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
> dplyr_datevec
[1] "2022-01-01" "2022-01-02" NA NA NA
We find that dplyr’s if_else() function works correctly on dates.
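Both properties listed earlier can be checked directly. This is a minimal sketch assuming dplyr is installed; the vector x and the labels are invented for illustration. It uses the missing argument of if_else(), which supplies a value where the condition is NA, and it relies on if_else() raising an error when the true and false values have incompatible types.

```r
library(dplyr)

datevec = seq(from = as.Date("2022-01-01"), to = as.Date("2022-01-05"), by = "day")

# The Date class survives, unlike with base R's ifelse().
kept = if_else(datevec < as.Date("2022-01-03"), datevec, as.Date(NA))
class(kept)  # "Date"

# The missing argument supplies a value where the condition itself is NA.
x = c(5, -2, NA)
sign_label = if_else(x > 0, "positive", "not positive", missing = "unknown")
sign_label  # "positive" "not positive" "unknown"

# Mixing incompatible types (character and number) is an error
# rather than a silent coercion.
mixed = tryCatch(
  if_else(x > 0, "positive", 0),
  error = function(e) "type error"
)
mixed  # "type error"
```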
Use Multiple Conditions in the if_else() Function in R
We can combine multiple conditions using the vectorized & and | operators, representing AND and OR.
These can be used in both ifelse() and if_else(). In our example, we will use if_else() because it preserves types and attributes.
# Create a data frame from the same two vectors.
mult_df = data.frame(Col1, Col2)
# Create a new column based on multiple conditions combined with AND, using &.
mult_df$AND_Col = if_else((Col1 == "A" & Col2 == "y"), "AND", "F")
# View the data frame with the added column.
mult_df
# Create another column based on multiple conditions combined with OR, using |.
mult_df$OR_Col = if_else((Col1 == "A" | Col2 == "y"), "OR", "F")
# View the data frame with the added column.
mult_df
The output of the last command:
> mult_df
Col1 Col2 AND_Col OR_Col
1 A x F OR
2 A x F OR
3 B x F F
4 B x F F
5 A y AND OR
6 A y AND OR
7 B y F OR
8 B y F OR
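Since the same operators work in base R's ifelse(), the example can also be written without dplyr. The following sketch refers to the columns through the data frame rather than through the free-standing vectors, which is how the code would need to look if the vectors were not in the workspace. The name base_df is chosen here for illustration.

```r
Col1 = rep(c("A", "B"), times = 2, each = 2)
Col2 = rep(c("x", "y"), times = 1, each = 4)
base_df = data.frame(Col1, Col2)

# AND: both conditions must hold in the same row.
base_df$AND_Col = ifelse(base_df$Col1 == "A" & base_df$Col2 == "y", "AND", "F")

# OR: at least one of the conditions must hold in the row.
base_df$OR_Col = ifelse(base_df$Col1 == "A" | base_df$Col2 == "y", "OR", "F")

base_df
```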
Remember that R has vectorized and non-vectorized versions of the AND and OR operators. We used the vectorized & and | operators to combine two conditions because we wanted to test the conditions for each row.
The & and | operators are vectorized and compare element by element; && and || are non-vectorized and work on single logical values, which suits an if statement.
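The difference between the two families of operators is easiest to see side by side. A small sketch in base R, using two made-up logical vectors a and b:

```r
a = c(TRUE, TRUE, FALSE)
b = c(TRUE, FALSE, FALSE)

# Vectorized operators compare element by element and return a vector.
a & b  # TRUE FALSE FALSE
a | b  # TRUE  TRUE FALSE

# The non-vectorized operators are meant for single values,
# for example inside an if statement.
a[1] && b[1]  # TRUE
a[3] || b[3]  # FALSE
```

Passing vectors longer than one element to && or || is an error in R 4.3.0 and later, so these operators are not suitable for row-wise conditions.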
References and Help:
In RStudio, for more information about the if statement, the ifelse() function, or the if_else() function, click Help > Search R Help and type the statement/function name in the search box without parentheses.
Alternatively, type a question mark followed by the function name at the command prompt in the R Console, for example ?ifelse. Because if is a reserved word, it must be quoted: ?"if".
Conclusion
Statements, functions and operators that work with a single variable may not work with data frames. We need to use the appropriate tools for the task.
To create/update a column of a data frame conditionally, we used the vectorized ifelse() function and its stricter dplyr counterpart, if_else().
We used the vectorized AND and OR operators, & and |, to combine multiple conditions.