How to Split String by Delimiter in R
This article discusses how to split a string by a delimiter in R.
Use strsplit to Split String by Delimiter in R
strsplit is part of base R, so it is available without installing additional packages. It splits each element of a character vector into substrings at the given delimiter, which is itself passed as a character vector and is interpreted as a regular expression unless fixed = TRUE is set. The first argument is the character vector to split, and the second is the delimiter. In the following example, we pass a single space to separate the words of a sentence. Note that the output is a list of character vectors, with one list element per input string.
# strsplit() is part of base R, so no packages need to be loaded
str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."
# Split the sentence at every space
strsplit(str, " ")
Output:
> strsplit(str, " ")
[[1]]
[1] "Lorem" "Ipsum" "is" "simply" "dummied" "text"
[7] "of" "the" "printing" "and" "typesetting" "industry."
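Because strsplit returns a list and treats the delimiter as a regular expression by default, two small adjustments come up often: extracting the character vector with [[1]], and passing fixed = TRUE when the delimiter contains regex metacharacters such as a dot. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```r
# Base R only; no packages are required.
csv_line <- "alpha,beta,gamma"   # hypothetical sample input

# strsplit() returns a list; [[1]] extracts the character vector
# that belongs to the first (and here only) input string.
parts <- strsplit(csv_line, ",")[[1]]
print(parts)    # "alpha" "beta" "gamma"

# As a regular expression, "." would match any character.
# fixed = TRUE makes strsplit() treat it as a literal dot instead.
version <- "4.3.1"               # hypothetical sample input
pieces <- strsplit(version, ".", fixed = TRUE)[[1]]
print(pieces)   # "4" "3" "1"
```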
Use str_split to Split String by Delimiter in R
Alternatively, the str_split function from the stringr package can be used to split a string by a delimiter. It works in almost the same way as strsplit: in both functions, the pattern is interpreted as a regular expression by default. In the following example, the pattern is a single space, which matches literally. Note that the function optionally takes a third argument, n, which sets the maximum number of substrings to return.
# str_split() comes from the stringr package
library(stringr)
str <- "Lorem Ipsum is simply dummied text of the printing and typesetting industry."
# Split the sentence at every space
str_split(str, " ")
Output:
> str_split(str, " ")
[[1]]
[1] "Lorem" "Ipsum" "is" "simply" "dummied" "text"
[7] "of" "the" "printing" "and" "typesetting" "industry."
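The optional third argument and the regular-expression pattern can be illustrated with a short sketch. It assumes the stringr package is installed; the sample strings and variable names are made up for illustration.

```r
library(stringr)

sentence <- "one two three four"        # hypothetical sample input

# n = 2 caps the result at two pieces; the second piece keeps
# the remainder of the string unsplit.
limited <- str_split(sentence, " ", n = 2)[[1]]
print(limited)   # "one" "two three four"

# The pattern is a regular expression, so one call can split on
# several delimiters: a comma or semicolon plus any trailing whitespace.
mixed <- "apples, oranges;pears"        # hypothetical sample input
items <- str_split(mixed, "[,;]\\s*")[[1]]
print(items)     # "apples" "oranges" "pears"
```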
Another optional parameter of the str_split function is simplify, which is the fourth argument. It is FALSE by default, in which case the function returns the substrings as a list of character vectors. If we set it to TRUE, str_split returns a character matrix instead, with one row per input string and shorter rows padded with empty strings.
# Only stringr is needed for str_split()
library(stringr)
fruits <- c(
"apples and oranges and pears and bananas",
"pineapples and mangos and raspberries"
)
str_split(fruits, " and ")
str_split(fruits, " and ", simplify = TRUE)
Output:
> str_split(fruits, " and ")
[[1]]
[1] "apples" "oranges" "pears" "bananas"
[[2]]
[1] "pineapples" "mangos" "raspberries"
> str_split(fruits, " and ", simplify = TRUE)
[,1] [,2] [,3] [,4]
[1,] "apples" "oranges" "pears" "bananas"
[2,] "pineapples" "mangos" "raspberries" ""
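The matrix form is convenient when the pieces are needed by position. The sketch below reuses the fruits vector from the example above and assumes the stringr package is installed; the variable names pieces, first_piece, and n_pieces are chosen only for illustration.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and raspberries"
)

pieces <- str_split(fruits, " and ", simplify = TRUE)

# One row per input string, one column per piece
print(dim(pieces))     # 2 4

# Column indexing returns the first piece of every input string
first_piece <- pieces[, 1]
print(first_piece)     # "apples" "pineapples"

# Empty strings are padding, so counting non-empty cells per row
# recovers how many pieces each input actually produced.
n_pieces <- rowSums(pieces != "")
print(n_pieces)        # 4 3
```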