How to Filter a Data Frame With Multiple Conditions in R
- Filter a Data Frame With Multiple Conditions in R
- Use of Boolean Operators
- Order of Precedence in Evaluation of Expressions
- Specify Desired Combinations Using Parentheses
-
Use the
%in%
Operator - Reference
Filtering the rows of a data frame is a common step in data analysis. In many cases, we need to restrict the rows of the result to those that meet more than one condition.
When such multiple conditions are involved, we need to combine our knowledge of several pieces of information to get the desired results. These pertain to:
- Use of Boolean operators.
- Order of precedence in the evaluation of expressions.
- Use of parentheses to specify the desired order of evaluation.
Filter a Data Frame With Multiple Conditions in R
To begin, we will create a sample data frame for this article. We will also load the dplyr
package to use its filter()
function for our demonstrations.
Example Code:
# Create a data frame for the article.
SN = 1:7
Alph = c("M", "M", "N", "N", "K", "K", "M")
Nmbr = c(11, 22, 11, 22, 11, 22, 0)
tb = data.frame(SN, Alph, Nmbr)
tb
# Load the dplyr package.
library(dplyr)
Output:
> tb
SN Alph Nmbr
1 1 M 11
2 2 M 22
3 3 N 11
4 4 N 22
5 5 K 11
6 6 K 22
7 7 M 0
Use of Boolean Operators
We will use Boolean operators to combine multiple conditions.
- The
AND
operator,&
, returnsTRUE
only when both the conditions on either side evaluate toTRUE
. - The
OR
operator,|
, returnsTRUE
if any of the conditions on either side or even both evaluate toTRUE
. - The
NOT
operator,!
, convertsFALSE
conditions toTRUE
and vice-versa.
We will use relational operators to specify conditions. To compare equality, the operator is ==
.
Example Code:
# Filter rows where Alph is "M" OR "N".
tb %>% filter(Alph=="M" | Alph == "N")
# Filter rows where Alph is "M" AND Nmbr == 22.
tb %>% filter(Alph=="M" & Nmbr == 22)
# Filter rows where Alph is NOT "M".
tb %>% filter(!Alph=="M")
Output:
> # Filter rows where Alph is "M" OR "N".
> tb %>% filter(Alph=="M" | Alph == "N")
SN Alph Nmbr
1 1 M 11
2 2 M 22
3 3 N 11
4 4 N 22
5 7 M 0
> # Filter rows where Alph is "M" AND Nmbr == 22.
> tb %>% filter(Alph=="M" & Nmbr == 22)
SN Alph Nmbr
1 2 M 22
> # Filter rows where Alph is NOT "M".
> tb %>% filter(!Alph=="M")
SN Alph Nmbr
1 3 N 11
2 4 N 22
3 5 K 11
4 6 K 22
- In the first example, we filtered (selected) rows for which
Alph
isM
orN
. Note that we have to specify the full condition on both sides of theOR
operator. - In the second example, we filtered rows for which
Alph
isM
andNmbr
is22
simultaneously. - In the third example, we filtered rows for which the column
Alph
is notM
.
Order of Precedence in Evaluation of Expressions
In the context of combining multiple conditions, we need to note the following order of precedence of operators.
NOT
operator,!
.AND
operator,&
.OR
operator,|
.- Operands are grouped from left to right, subject to the above unless overridden by parentheses.
Example Code:
# The ! operator only applies to the Alph column.
tb %>% filter(!Alph=="M" & Nmbr==11)
# The & operator only combines its two neighbors.
tb %>% filter(Alph=="M" & Nmbr==11 | Nmbr==22)
# The | operator gets applied AFTER &.
tb %>% filter(Nmbr==11 | Nmbr==22 & Alph=="M")
Output:
> # The ! operator only applies to the Alph column.
> tb %>% filter(!Alph=="M" & Nmbr==11)
SN Alph Nmbr
1 3 N 11
2 5 K 11
> # The & operator only combines its two neighbors.
> tb %>% filter(Alph=="M" & Nmbr==11 | Nmbr==22)
SN Alph Nmbr
1 1 M 11
2 2 M 22
3 4 N 22
4 6 K 22
> # The | operator gets applied AFTER &.
> tb %>% filter(Nmbr==11 | Nmbr==22 & Alph=="M")
SN Alph Nmbr
1 1 M 11
2 2 M 22
3 3 N 11
4 5 K 11
Because of the order of precedence, we saw that:
- The
!
operator got applied toAlph
before the&
operator combined the conditions in the first example. - The
&
operator combined the two conditions on either side before passing the result to the left side of the|
operator. - In the third example, the
&
got applied first. On the right side of|
, we have the result of first applying&
to its two operands.
Specify Desired Combinations Using Parentheses
We can use parentheses to group multiple conditions to get the desired result. Using parentheses, we can control the order of operations (precedence).
We will use the same examples in the previous section but modify them using parentheses to get different results.
Example Code:
# The ! operator now applies to the result of the & operation.
tb %>% filter(!(Alph=="M" & Nmbr==11))
# The & operator now has the result of the | operator on its right.
tb %>% filter(Alph=="M" & (Nmbr==11 | Nmbr==22))
# Now, the | operator gets applied first, and the result is passed to &.
tb %>% filter((Nmbr==11 | Nmbr==22) & Alph=="M")
Output:
> # The ! operator now applies to the result of the & operation.
> tb %>% filter(!(Alph=="M" & Nmbr==11))
SN Alph Nmbr
1 2 M 22
2 3 N 11
3 4 N 22
4 5 K 11
5 6 K 22
6 7 M 0
> # The & operator now has the result of the | operator on its right.
> tb %>% filter(Alph=="M" & (Nmbr==11 | Nmbr==22))
SN Alph Nmbr
1 1 M 11
2 2 M 22
> # Now, the | operator gets applied first, and the result is passed to &.
> tb %>% filter((Nmbr==11 | Nmbr==22) & Alph=="M")
SN Alph Nmbr
1 1 M 11
2 2 M 22
Because of the parentheses, we now get different results.
-
In the first example, the
!
operator is now applied after the&
operator combined the conditions. -
In the second example, the
|
operation is carried out first. The result is then passed as the operand on the right of&
. -
In the third example, once again, the
|
is carried out first. Notice that we get the same result as in the previous example.Compare this to the different results for these in the previous section.
Use the %in%
Operator
We can use R’s %in%
operator to filter rows for which the column value is any of the values mentioned in the vector passed to it. This is the equivalent of combining multiple conditions on the same column using the OR
operator.
Example Code:
# Filter rows where Alph is M or K.
tb %>% filter(Alph %in% c("M", "K"))
Output:
> # Filter rows where Alph is M or K.
> tb %>% filter(Alph %in% c("M", "K"))
SN Alph Nmbr
1 1 M 11
2 2 M 22
3 5 K 11
4 6 K 22
5 7 M 0
Reference
See section 10.4.2 Infix and prefix operators of the R Language Definition for more information about the order of precedence in the evaluation of expressions.