How to Convert Multiple Columns From Integer to Numeric Type in R
- Convert Multiple Columns From Integer to Numeric Type in R
- Use the lapply() Function to Convert Multiple Columns From Integer to Numeric Type in R
- Use the dplyr Package Functions to Convert Multiple Columns From Integer to Numeric Type in R
- Convert Multiple Columns From Factor to Numeric Type in R
- Conclusion
R has vectorized functions that convert multiple columns from integer to numeric type with a single line of code and without resorting to loops. This article explores two approaches to this task.
In both cases, the actual conversion of each column is done by the as.numeric() function.
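As a quick illustration of what the conversion changes, the sketch below compares an integer vector with the result of as.numeric(). The variable names are only illustrative. Note that R already treats integers as numeric in the is.numeric() sense; what as.numeric() changes is the storage type, from integer to double.

```r
# An integer vector; the L suffix marks integer literals.
v_int = c(1L, 2L, 3L)

# as.numeric() returns a double-precision copy.
v_num = as.numeric(v_int)

class(v_int)       # "integer"
class(v_num)       # "numeric"
typeof(v_num)      # "double"
is.numeric(v_int)  # TRUE: integers already count as numeric
```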
Convert Multiple Columns From Integer to Numeric Type in R
First, we will create some sample data.
Example code:
# Create vectors.
n = letters[1:5]
p = as.integer(c(11:15))
q = as.integer(c(51:55))
# Create a data frame.
df = data.frame(Names = n, Col1 = p, Col2 = q)
df
# See the structure of the data frame.
# Note that two columns are of integer type.
str(df)
Output:
> df
Names Col1 Col2
1 a 11 51
2 b 12 52
3 c 13 53
4 d 14 54
5 e 15 55
>
> # See the structure of the data frame.
> # Note that two columns are of integer type.
> str(df)
'data.frame': 5 obs. of 3 variables:
$ Names: chr "a" "b" "c" "d" ...
$ Col1 : int 11 12 13 14 15
$ Col2 : int 51 52 53 54 55
Use the lapply() Function to Convert Multiple Columns From Integer to Numeric Type in R
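Base R’s lapply() function applies a function to each element of a list. Since a data frame is a list of columns, we can use it to apply the as.numeric() function to the columns we select.
The documentation of the lapply() function notes that it is often safer to call primitive functions, such as as.numeric(), through a wrapper function rather than passing the function name directly.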
Example code:
# First, we will create a copy of our data frame.
df1 = df
# Columns 2 and 3 are integer type.
# We will convert these to numeric.
# We will use a wrapper function as recommended.
df1[2:3] = lapply(df1[2:3], FUN = function(y){as.numeric(y)})
# Check that the columns are converted to numeric.
str(df1)
Output:
> df1[2:3] = lapply(df1[2:3], FUN = function(y){as.numeric(y)})
>
> # Check that the columns are converted to numeric.
> str(df1)
'data.frame': 5 obs. of 3 variables:
$ Names: chr "a" "b" "c" "d" ...
$ Col1 : num 11 12 13 14 15
$ Col2 : num 51 52 53 54 55
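If the column positions are not known in advance, one possible variation is to pick the integer columns by type before calling lapply(). The sketch below rebuilds the sample data so that it runs on its own; the names df_demo and int_cols are illustrative and not part of the example above.

```r
# Recreate the sample data frame.
df_demo = data.frame(Names = letters[1:5], Col1 = 11:15, Col2 = 51:55)

# Logical vector marking the integer columns.
int_cols = sapply(df_demo, is.integer)

# Convert only those columns, using a wrapper around as.numeric().
df_demo[int_cols] = lapply(df_demo[int_cols], function(y) as.numeric(y))

# Check that the integer columns are now numeric.
str(df_demo)
```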
Use the dplyr Package Functions to Convert Multiple Columns From Integer to Numeric Type in R
We can use dplyr’s mutate() and across() functions to convert integer columns to numeric. The advantage of this approach is that the entire family of tidyselect functions is available to select columns.
In the example code, we will first select columns by position, using standard list syntax, and then select them by type with the tidyselect function where().
Example code:
# Load the dplyr package.
library(dplyr)
# USING STANDARD LIST SYNTAX.
# Convert the columns.
df2 = df %>% mutate(across(.cols=2:3, .fns=as.numeric))
# Check that the columns are converted.
str(df2)
# USING TIDYSELECT WHERE FUNCTION.
# Convert ALL integer columns to numeric.
df3 = df %>% mutate(across(.cols=where(is.integer), .fns=as.numeric))
# Check that the columns are converted.
str(df3)
Output:
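> # USING STANDARD LIST SYNTAX.
> # Convert the columns.
> df2 = df %>% mutate(across(.cols=2:3, .fns=as.numeric))
> # Check that the columns are converted.
> str(df2)
'data.frame': 5 obs. of 3 variables:
$ Names: chr "a" "b" "c" "d" ...
$ Col1 : num 11 12 13 14 15
$ Col2 : num 51 52 53 54 55
>
> # USING TIDYSELECT WHERE FUNCTION.
> # Convert ALL integer columns to numeric.
> df3 = df %>% mutate(across(.cols=where(is.integer), .fns=as.numeric))
> # Check that the columns are converted.
> str(df3)
'data.frame': 5 obs. of 3 variables:
$ Names: chr "a" "b" "c" "d" ...
$ Col1 : num 11 12 13 14 15
$ Col2 : num 51 52 53 54 55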
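Other tidyselect helpers work the same way. As a sketch, assuming the integer columns share a name prefix as they do in the sample data, starts_with() can select them by name. The names df_demo and df_by_name are illustrative.

```r
# Load the dplyr package.
library(dplyr)

# Recreate the sample data frame.
df_demo = data.frame(Names = letters[1:5], Col1 = 11:15, Col2 = 51:55)

# Select the columns whose names begin with "Col" and convert them.
df_by_name = df_demo %>% mutate(across(.cols = starts_with("Col"), .fns = as.numeric))

# Check that the columns are converted.
str(df_by_name)
```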
Convert Multiple Columns From Factor to Numeric Type in R
Sometimes, factor levels are coded with numbers, mostly integers. We usually do not want to convert such columns.
However, at other times, columns that really hold integers may be represented as factors in R. Converting such columns to numbers poses a challenge because as.numeric() returns the factor’s internal level codes rather than the values shown as labels.
The example code shows what happens when a factor column is converted to numeric.
Example code:
# Create a factor vector.
x = factor(c(15,15,20,25,30,30,30))
# See that these are 4 levels of factors.
# They are not numbers.
str(x)
# Convert the factor vector to numeric.
as.numeric(x) # This is not the result we want.
Output:
> # Create a factor vector.
> x = factor(c(15,15,20,25,30,30,30))
>
> # See that these are 4 levels of factors.
> # They are not numbers.
> str(x)
Factor w/ 4 levels "15","20","25",..: 1 1 2 3 4 4 4
>
> # Convert the factor vector to numeric.
> as.numeric(x) # This is not the result we want.
[1] 1 1 2 3 4 4 4
When an integer column happens to be wrongly represented as factors, we need to add one preliminary step to convert it to numeric correctly.
We must first convert the factors to a character type and then convert the character to a numeric type.
Example code:
# First, convert the factor vector to a character type.
# Then convert the character type to numeric.
# Both the above can be done in a single step, as follows.
y = as.numeric(as.character(x))
y
# Check that y is numeric.
str(y)
Output:
> y = as.numeric(as.character(x))
> y
[1] 15 15 20 25 30 30 30
>
> # Check that y is numeric.
> str(y)
num [1:7] 15 15 20 25 30 30 30
Let us see an example with a data frame. We will use the dplyr approach.
Example code:
# Create a factor vector.
f = factor(c(20,20,30,30,30))
# Create a data frame.
df4 = data.frame(Name=n, Col1=p, Col2=q, Fac=f)
df4
# Check the structure.
str(df4)
# We will use the dplyr approach.
# First only convert integer type columns.
df5 = df4 %>% mutate(across(.cols=where(is.integer), .fns=as.numeric))
# Factor column did not get converted.
str(df5)
# Now, we will START AGAIN, and convert the factor column as well.
# To modify an existing column by name, we will give it the SAME name.
df6 = df4 %>% mutate(across(.cols=where(is.integer), .fns=as.numeric), Fac=as.numeric(as.character(Fac)))
df6
# Check that the factor column has also got converted.
str(df6)
Output:
> # Create a factor vector.
> f = factor(c(20,20,30,30,30))
>
> # Create a data frame.
> df4 = data.frame(Name=n, Col1=p, Col2=q, Fac=f)
> df4
Name Col1 Col2 Fac
1 a 11 51 20
2 b 12 52 20
3 c 13 53 30
4 d 14 54 30
5 e 15 55 30
>
> # Check the structure.
> str(df4)
'data.frame': 5 obs. of 4 variables:
$ Name: chr "a" "b" "c" "d" ...
$ Col1: int 11 12 13 14 15
$ Col2: int 51 52 53 54 55
$ Fac : Factor w/ 2 levels "20","30": 1 1 2 2 2
>
> # We will use the dplyr approach.
>
> # First only convert integer type columns.
> df5 = df4 %>% mutate(across(.cols=where(is.integer), .fns=as.numeric))
> # Factor column did not get converted.
> str(df5)
'data.frame': 5 obs. of 4 variables:
$ Name: chr "a" "b" "c" "d" ...
$ Col1: num 11 12 13 14 15
$ Col2: num 51 52 53 54 55
$ Fac : Factor w/ 2 levels "20","30": 1 1 2 2 2
>
> # Now, we will START AGAIN, and convert the factor column as well.
> # To modify an existing column by name, we will give it the SAME name.
> df6 = df4 %>% mutate(across(.cols=where(is.integer), .fns=as.numeric), Fac=as.numeric(as.character(Fac)))
> df6
Name Col1 Col2 Fac
1 a 11 51 20
2 b 12 52 20
3 c 13 53 30
4 d 14 54 30
5 e 15 55 30
> # Check that the factor column has also got converted.
> str(df6)
'data.frame': 5 obs. of 4 variables:
$ Name: chr "a" "b" "c" "d" ...
$ Col1: num 11 12 13 14 15
$ Col2: num 51 52 53 54 55
$ Fac : num 20 20 30 30 30
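When several factor columns hold numeric labels, naming each one inside mutate() becomes tedious. One possible variation, sketched below with illustrative names (df_demo, df_conv, Fac1, Fac2), selects them with where(is.factor). This assumes that every factor column in the data frame really holds numeric labels; a factor with text labels would produce NA values instead.

```r
# Load the dplyr package.
library(dplyr)

# A data frame with one integer column and two factor columns.
df_demo = data.frame(
  Name = letters[1:5],
  Col1 = 11:15,
  Fac1 = factor(c(20, 20, 30, 30, 30)),
  Fac2 = factor(c(5, 7, 7, 9, 9)),
  stringsAsFactors = FALSE
)

# Convert integer columns directly, and factor columns via character.
df_conv = df_demo %>%
  mutate(
    across(.cols = where(is.integer), .fns = as.numeric),
    across(.cols = where(is.factor), .fns = function(x) as.numeric(as.character(x)))
  )

# Check that all three columns are now numeric.
str(df_conv)
```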
The tidyselect functions are documented at the tidyselect selection language web page. Refer to R’s documentation of the lapply() function to understand the need for a wrapper function.
R’s documentation of the as.numeric() and factor() functions covers a second approach to convert integers represented as factors to a numeric type.
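That second approach converts the factor’s level labels once and then indexes the result by the factor itself. A minimal sketch, reusing the factor from the earlier example (the name y2 is illustrative):

```r
# Create a factor vector.
x = factor(c(15, 15, 20, 25, 30, 30, 30))

# Convert the level labels to numbers, then index them by the factor's codes.
y2 = as.numeric(levels(x))[x]
y2

# Check that y2 is numeric.
str(y2)
```

Both approaches should give the same values; this one converts only the distinct levels rather than every element.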
Conclusion
Before converting integer columns to a numeric type, we need to check whether the columns are actually of integer type. If they are represented as factors and we want to convert them to numeric, we need to take one additional step, converting them to character first, to ensure proper conversion.
The conversion can be done using base R’s lapply() function, or a combination of dplyr’s mutate() and across() functions. The actual conversion is done using the as.numeric() function.
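One way to carry out the type check mentioned above is to list the class of every column with sapply(). The data frame and the name col_classes below are illustrative.

```r
# A small data frame with an integer column and a factor column.
df_demo = data.frame(
  Col1 = 11:15,
  Fac = factor(c(20, 20, 30, 30, 30))
)

# Report the class of each column before deciding how to convert it.
col_classes = sapply(df_demo, class)
col_classes
```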