How to Remove the First Character From a String in R
- Techniques to Remove the First Character From a String
- Use the substr() Function in R
- Use the sub() Function With a Regular Expression in R
- Use the stringr Package in R
- Getting Help
- Conclusion
R is well known as a programming environment for statistical analysis. It means analyzing numbers, but statistics is not just about numbers. We may need to count instances of a word, or we may need to remove the first character from a string.
R provides many functions to analyze and manipulate strings of characters. While some functionality is built into base R, more is available through packages.
This article focuses on three techniques to remove the first character from a string.
Before diving into the techniques, it’s important to note these two points.
- In R, the indexing of vector elements starts at 1, not 0. This differs from many programming languages but matches the indexing of vectors and matrices in mathematics. The end index of a range is included in the range.
- All the example code in this article uses valid values. This keeps the article’s focus on the main concept being explained.
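As a quick illustration of the first point, the sketch below shows 1-based indexing and an inclusive range. The vector name letters3 is hypothetical and is used only for this illustration; it is not part of the article's later examples.

```r
# letters3 is a hypothetical example vector used only for this illustration.
letters3 <- c('a', 'b', 'c')

# Indexing starts at 1, so this selects the first element.
first <- letters3[1]

# The range 2:3 includes both end points, so this selects two elements.
lastTwo <- letters3[2:3]

print(first)
print(lastTwo)
```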
Techniques to Remove the First Character From a String
We will explore three techniques to remove the first character from a string or a vector of strings.
- Base R substr() function.
- Base R sub() function with a regular expression.
- The str_sub() function from the stringr package.
To make each technique easier to follow, we will demonstrate it with both a single string and a vector of strings.
We will use the word dictionary as our example string. We will use R’s combine function, c(), to create a vector of three strings.
The words that will make up our vector are dictionary, thesaurus, and diary. The following code creates the vector.
Example Code:
myVector = c('dictionary', 'thesaurus', 'diary')
Output:
> myVector = c('dictionary', 'thesaurus', 'diary')
Use the substr() Function in R
The first technique will demonstrate the substr() function of base R to remove the first character from a string. The main points about the use of this function are as follows.
- The substr() function extracts a part of a string.
- It takes three arguments.
- The first argument is the string of text. In R terminology, a single string is a character vector of length one, and each of its characters has a position index.
- The second argument is the index of the character at which to start the substring. The first character’s index is 1.
- The third argument is the index of the character at which to end the substring. The last character’s index is the same as the length of the string.
The example code below removes the string’s first character because we start the substring at index two and end the substring at the index position of the last character.
Example Code:
substr('dictionary', 2,10)
Output:
> substr('dictionary', 2,10)
[1] "ictionary"
In most practical cases, we will not work with one individual string. We will have a vector of strings, and we need to remove the first character from each string in the vector.
To demonstrate the substr() function on a vector of strings, we will introduce a new function, nchar(), from base R.
The nchar() function gives us the number of characters of each element of a vector of strings.
Why do we need this function?
To extract a substring using substr(), we need to pass the index of the character at which to stop the substring.
In our previous example, we only wanted to remove the first character; we used the last character’s index position of the string for this purpose. In our example of a single word, we gave a specific numerical value for the position of the last character.
When we have a vector of strings of different lengths, we need a general way to specify the index position of the last character of each string. The nchar() function allows us to do that.
For example, nchar('dictionary') gives us 10; it’s the number of characters in the string dictionary. And since indexing starts at 1 in R, the index position of the last character of this string is 10.
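Applied to the whole example vector, nchar() returns one count per element. The sketch below recreates myVector so that it runs on its own; the name charCounts is introduced here only for illustration.

```r
# Recreate the example vector so this snippet is self-contained.
myVector <- c('dictionary', 'thesaurus', 'diary')

# nchar() returns the number of characters in each element.
# charCounts is a hypothetical name used only in this sketch.
charCounts <- nchar(myVector)

print(charCounts)
```

Because indexing starts at 1, each of these counts is also the index position of the last character of the corresponding string.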
We continue to use the same substr() function to process a vector of strings. However, note two major differences in this case.
- The first argument to substr() is now a vector of strings. (In R terms, it is a character vector with more than one element.)
- Both substr() and nchar() are vectorized. nchar() returns the number of characters in each element, and substr() uses the corresponding value as the end index for each element. Unlike in many other programming languages, it’s unnecessary to write a loop to iterate through each vector element.
The code below removes the first character from each vector element.
Example Code:
substr(myVector, 2, nchar(myVector))
Output:
> substr(myVector, 2, nchar(myVector))
[1] "ictionary" "hesaurus" "iary"
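To see what the vectorized call saves us, the following sketch writes out the equivalent explicit loop and checks that both give the same result. The loop is shown only for comparison; the names vectorized and looped are assumptions made for this illustration.

```r
# Recreate the example vector so this snippet is self-contained.
myVector <- c('dictionary', 'thesaurus', 'diary')

# Vectorized form, as used in the example above.
vectorized <- substr(myVector, 2, nchar(myVector))

# Explicit loop over the elements, written out only for comparison.
looped <- character(length(myVector))
for (i in seq_along(myVector)) {
  looped[i] <- substr(myVector[i], 2, nchar(myVector[i]))
}

# Both approaches should produce the same character vector.
print(identical(vectorized, looped))
```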
Use the sub() Function With a Regular Expression in R
Regular expressions are an elaborate system for matching patterns in strings. They are a powerful tool for identifying patterns in text.
A general presentation about regular expressions is beyond the scope of this article; several books have been written on the subject.
Nonetheless, since we need to use a regular expression with the sub() function, only the essential features will be introduced.
- The dot, or full stop, matches a single instance of any character.
- The caret, ^, placed at the start of a pattern, anchors the match to the start of the string.
Using these two characters, we make the following two regular expression patterns:
- The pattern . matches any one character.
- The pattern ^. matches any one character at the beginning of the string.
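The sketch below shows what the patterns . and ^. match in the word dictionary. It uses the base R functions regexpr(), gregexpr(), and regmatches(), which this article does not otherwise cover; they appear here only to display the matched text, and the variable names are assumptions made for this illustration.

```r
word <- 'dictionary'

# regexpr() locates the first match; regmatches() extracts the matched text.
firstAny <- regmatches(word, regexpr('.', word))
firstAnchored <- regmatches(word, regexpr('^.', word))

# gregexpr() locates every match; with '.', that is every character.
allAny <- regmatches(word, gregexpr('.', word))[[1]]

print(firstAny)        # first match of '.'
print(firstAnchored)   # match of '^.' at the start of the string
print(length(allAny))  # number of matches of '.' in the whole word
```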
The sub() function of base R is a pattern matching and replacement function in which we will use a regular expression to remove the first character from a string.
The main points concerning the use of this function are as follows:
- The function is of the form sub('searchpattern', 'replacement', 'string').
- The first argument is the pattern that we search for.
- The second is the string that replaces the search pattern’s first instance once it is found.
- The third is the string in which we search for the pattern and replace it.
- By default, this function is case-sensitive.
- This function matches and replaces only the first instance of the search pattern.
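The last two points can be sketched with a literal letter as the search pattern. The variable names are assumptions made for this illustration; ignore.case is an optional argument of sub() that is not used elsewhere in this article.

```r
# Only the first 'i' is replaced, even though the word contains three.
firstOnly <- sub('i', 'I', 'dictionary')

# The search is case-sensitive by default, so 'D' does not match 'd'.
caseSensitive <- sub('D', 'X', 'dictionary')

# With ignore.case = TRUE, 'D' matches the lowercase 'd'.
ignoreCase <- sub('D', 'X', 'dictionary', ignore.case = TRUE)

print(firstOnly)
print(caseSensitive)
print(ignoreCase)
```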
The following code illustrates the sub() function with the dot as the regular expression (the search pattern). The first character, which is the first match, gets replaced with A.
Example Code:
sub('.','A','dictionary')
Output:
> sub('.','A','dictionary')
[1] "Aictionary"
Our main task is to remove the first character from a string rather than replace it.
To remove the string’s first character using this technique, we need to use an empty string, '', as the second argument. Look at the code.
Example Code:
sub('.','','dictionary')
Output:
> sub('.','','dictionary')
[1] "ictionary"
The next example demonstrates the sub() function with the vector of strings that we have already created. This code removes the first character from each element of the vector.
Example Code:
sub('.','', myVector)
Output:
> sub('.','', myVector)
[1] "ictionary" "hesaurus" "iary"
Before moving on to the next technique, a word of caution is in order. Base R has another pattern matching and replacement function, gsub().
gsub() matches and replaces all instances of the search pattern, unlike the sub() function, which matches and replaces only the first instance of the search pattern.
You may have noticed that while this section introduces two regular expressions, the example code only used one.
The reason is that with the sub() function, the dot alone is sufficient to match the first character because sub() only matches the first instance of the search pattern. gsub() behaves differently; the dot matches every character.
Example Code:
gsub('.','A','dictionary')
Output:
> gsub('.','A','dictionary')
[1] "AAAAAAAAAA"
We find that gsub() has replaced every character with the replacement string, 'A' in this case.
To force gsub() to only match the search pattern at the start of the string, we need to precede the dot with the caret, as in the following example.
Example Code:
gsub('^.','A','dictionary')
Output:
> gsub('^.','A','dictionary')
[1] "Aictionary"
For the task of removing just the first character from a string or a vector of strings, the sub() function is a simpler option compared to its close counterpart, gsub().
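As a small check of the points above, the sketch below applies the anchored pattern ^. with both functions and compares the results with the unanchored sub() call. The variable names are assumptions made for this illustration.

```r
# Recreate the example vector so this snippet is self-contained.
myVector <- c('dictionary', 'thesaurus', 'diary')

# With the anchored pattern, sub() and gsub() both touch only the first character.
withSub <- sub('^.', '', myVector)
withGsub <- gsub('^.', '', myVector)

# With sub(), the unanchored dot gives the same result as the anchored pattern.
unanchored <- sub('.', '', myVector)

print(withSub)
print(identical(withSub, withGsub))
print(identical(withSub, unanchored))
```

Writing the caret explicitly, even with sub(), can make the intent clearer to a reader of the code, although it is not required.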
Use the stringr Package in R
The stringr package provides the str_sub() function to remove the first character from a string.
Unlike the base R substr() function, we don’t need another function such as nchar() to get the index position of the last character when we use the str_sub() function on a vector of strings.
However, since the stringr package provides this function, users will first need to install that package (a one-time task) and load it before using it (in each session).
The main points concerning the use of the str_sub() function are as follows:
- The function takes three arguments.
- The first argument is the string, or vector of strings.
- The second argument is the index position from where to start the substring. To remove the first character, we need to start at index position 2.
- The third argument is the index position at which to end the substring. To keep all the characters till the end of the string, we need to give it the index position of the last character, which is -1. It’s what makes this function particularly useful.
When using the substr() function of base R, we calculated the index position of the last character of each string using the nchar() function.
However, the str_sub() function accepts negative integers, which specify index positions counted from the end of a string.
The index position of the last character is -1, the index position of the second-last character is -2, and so on. This feature allows us to specify our substring using this function alone.
For example, str_sub('thesaurus',2,-1) starts extracting (keeping) the substring from index position 2 of the original string, that is, from the letter h, and keeps all characters till index position -1 of the original string, that is, the last character, s. It thus returns the string hesaurus.
This code illustrates the function’s use with the single string dictionary and the vector of strings that we created.
Example Code:
# Install the stringr package using the install.packages() function.
# This is a one-time task.
install.packages("stringr")
# Load the stringr package in each R session using the library() function.
library(stringr)
# Use of the str_sub() function on a string -- a character vector.
str_sub('dictionary', 2,-1)
# Use of the str_sub() function on a vector of strings.
str_sub(myVector, 2,-1) # Removes the first character.
Output (after installing and loading the package):
> # Use of the str_sub() function on a string -- a character vector.
> str_sub('dictionary', 2,-1)
[1] "ictionary"
>
> # Use of the str_sub() function on a vector of strings.
> str_sub(myVector, 2,-1) # Removes the first character.
[1] "ictionary" "hesaurus" "iary"
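The negative index positions described above can be sketched as follows. The requireNamespace() guard is an addition made here only so that the snippet runs without error where stringr is not installed; the variable names are assumptions made for this illustration. The last call relies on the end argument of str_sub() defaulting to -1, so it can be omitted.

```r
# Run the examples only if the stringr package is available.
if (requireNamespace('stringr', quietly = TRUE)) {
  # -1 is the last character of the string.
  lastChar <- stringr::str_sub('thesaurus', -1, -1)

  # -2 is the second-last character, so -2 to -1 keeps the last two characters.
  lastTwo <- stringr::str_sub('thesaurus', -2, -1)

  # The end argument defaults to -1, so giving only the start has the same effect.
  defaultEnd <- stringr::str_sub('thesaurus', 2)

  print(lastChar)
  print(lastTwo)
  print(defaultEnd)
}
```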
Getting Help
RStudio makes it convenient to get more information about any function or command.
Click Help > R Help to bring up the Help pane in the Files / Plots / Packages / Help / Viewer window at the bottom right of the RStudio interface.
Search for any of the functions mentioned in this article using the search box at the top of this pane. Don’t type the parentheses after the function name in the search box.
Clicking Help > Search R Help places the cursor in the search box of this pane.
Conclusion
Several techniques are available to remove the first character from a string. The base R substr() function is readily available but needs another base R function, the nchar() function, for a vector of strings.
The sub() function is powerful, but the simple task of removing a string’s first character doesn’t require all its power and complexity. The str_sub() function is very convenient but needs the installation and loading of the stringr package.
Each of these techniques delivers the expected result. The choice is with the user.