How to Drop a Column by Name in R
- How to Drop a Column by Name From a Data Frame in R Using the dplyr Package
- How to Drop a Column by Name From a Data Frame in R Using the names() Function
- How to Drop a Column by Name From a Data Frame in R Using the subset() Function
- How to Drop a Column by Name From a Data Frame in R Using the data.table Package
- Conclusion
Data frames are fundamental structures in R for storing and manipulating data, commonly used in data analysis, statistics, and machine learning tasks. In many real-world scenarios, it becomes necessary to remove specific columns from a data frame, either due to redundancy or irrelevance, or to streamline data processing pipelines.
One common operation is dropping a column by its name. In this article, we will explore various methods to achieve this task in R.
We’ll delve into techniques leveraging popular packages such as dplyr and data.table, as well as built-in functions like names() and subset().
How to Drop a Column by Name From a Data Frame in R Using the dplyr Package
dplyr is an R package designed to make data manipulation easier and more intuitive. It provides a set of functions that streamline common tasks such as filtering, selecting, arranging, and summarizing data.
One of the key functions in the dplyr package is select(), which allows us to keep or exclude columns of a data frame based on their names.
The syntax for dropping a column by name using the select() function from the dplyr package is as follows:
select(dataframe, -column_name)
Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the name of the column we wish to remove. By using the - sign before the column name, we specify that we want to drop that particular column from the data frame.
The function returns a new data frame without the specified column; the original data frame is not modified unless we assign the result back to it.
Before we proceed to the example, ensure you have the dplyr package installed. If not, you can install it from CRAN using:
install.packages("dplyr")
Let’s demonstrate the usage of the select() function to drop a column by name from a data frame. Consider a data frame containing employee information with columns for Name, LastName, Id, and Designation.
library(dplyr)

# Example data frame with employee information
Delftstack <- data.frame(
  Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
  LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
  Id = c(101, 102, 103, 104, 105),
  Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

# select() with a minus sign returns a copy without the Name column
print("The dataframe after dropping the column:-")
print(select(Delftstack, -Name))
Initially, we load the dplyr library to access its data frame manipulation functions. Then, we create a data frame named Delftstack containing employee information with columns for Name, LastName, Id, and Designation.
Before making any modifications, we print the data frame to understand its structure.
Moving on to the actual operation, we use the select() function from the dplyr package to drop the Name column from the Delftstack data frame. This function takes the data frame as its first argument, and the column name prefixed with a minus sign (-) as its second argument.
The minus sign signifies that we want to exclude the specified column from the resulting data frame. Upon execution, the select() function returns a new data frame without the Name column; Delftstack itself is unchanged because the result is only printed, not assigned.
To validate the operation, we print the updated data frame after dropping the column. This allows us to verify that the modification was successful.
Output:
In the printed output, we observe that the Name column is absent from the data frame, while the other columns (LastName, Id, and Designation) remain intact. This confirms that the select() function removed the specified column as intended.
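When the column names are stored in a character vector rather than typed directly, select() can be combined with the tidyselect helpers all_of() and any_of(). The following is a minimal sketch under that assumption; the data frame employees and the vector cols_to_drop are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical data, mirroring the structure of the article's example
employees <- data.frame(
  Name = c("Jack", "John"),
  LastName = c("Danials", "Cena"),
  Id = c(101, 102)
)

# Column names held in a character vector
cols_to_drop <- c("Name", "Id")

# all_of() is strict: it raises an error if a listed column is missing
strict_result <- select(employees, -all_of(cols_to_drop))

# any_of() is lenient: names that do not exist are silently ignored
lenient_result <- select(employees, -any_of(c("Name", "Salary")))

print(names(strict_result))
print(names(lenient_result))
```

Choosing between the two helpers is mostly a question of whether a missing column should stop the script or be skipped quietly.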
How to Drop a Column by Name From a Data Frame in R Using the names() Function
Another approach to dropping a column by name from a data frame involves using the names() function together with base R subsetting. This method provides a straightforward way to specify columns to be removed.
The syntax for dropping a column by name using the names() function is as follows:
dataframe <- dataframe[, !(names(dataframe) %in% columns_to_drop), drop = FALSE]
Here, dataframe refers to the data frame from which we want to drop the column, and columns_to_drop is a character vector containing the names of the columns to be removed.
The names() function retrieves the column names of a data frame as a character vector. The %in% operator marks each column name that appears in columns_to_drop as TRUE, and the negation operator ! flips that result so that the columns to keep are TRUE.
The resulting logical vector is used for subsetting the data frame, effectively removing the specified columns. The drop = FALSE argument keeps the result a data frame even when only one column remains.
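To make the intermediate steps concrete, here is a small sketch that builds the logical vector one step at a time. The data frame df and the variable names in_drop_list and keep are hypothetical and chosen only for illustration.

```r
# Hypothetical three-column data frame
df <- data.frame(
  Name = c("Jack", "John"),
  Id = c(101, 102),
  Role = c("CEO", "Dev")
)
columns_to_drop <- c("Name", "Id")

# TRUE for every column name that is listed in columns_to_drop
in_drop_list <- names(df) %in% columns_to_drop
print(in_drop_list)

# Negating the vector marks the columns to keep instead
keep <- !in_drop_list
print(keep)

# Subset by the logical vector; drop = FALSE keeps the data frame structure
result <- df[, keep, drop = FALSE]
print(names(result))
```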
Let’s demonstrate the usage of the names() function to drop columns by name from a data frame. We’ll utilize the same example data frame containing employee information.
# Example data frame with employee information
Delftstack <- data.frame(
  Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
  LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
  Id = c(101, 102, 103, 104, 105),
  Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

# Keep only the columns whose names are not in columns_to_drop;
# drop = FALSE guarantees the result stays a data frame
columns_to_drop <- c("Name", "Id")
Delftstack <- Delftstack[, !(names(Delftstack) %in% columns_to_drop), drop = FALSE]

print("The dataframe after dropping the column:-")
print(Delftstack)
In this code snippet, we begin by creating a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation. Before proceeding, we print the data frame to observe its structure.
Moving on to the actual operation, we define a vector columns_to_drop containing the names of the columns we wish to remove, which in this case are "Name" and "Id".
We then use the names() function to retrieve the column names of the data frame and, with %in%, create a logical vector indicating which columns appear in the drop list. By negating this vector, we mark the columns to retain.
Finally, we subset the data frame using this logical vector, resulting in a data frame without the specified columns. We print the updated data frame to confirm the success of the operation.
Output:
The output confirms that the Name and Id columns have been successfully dropped from the data frame. The updated data frame now contains only the LastName and Designation columns, reflecting the desired operation.
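One detail worth knowing with this kind of base R subsetting: when only a single column survives, the [, j] form simplifies the result to a plain vector unless drop = FALSE is supplied. The sketch below, using a hypothetical two-column data frame df, illustrates the difference.

```r
# Hypothetical two-column data frame
df <- data.frame(Name = c("Jack", "John"), Id = c(101, 102))
keep <- !(names(df) %in% "Name")

# Default behaviour: a single remaining column is simplified to a vector
as_vector <- df[, keep]
print(class(as_vector))

# drop = FALSE preserves the data frame structure
as_frame <- df[, keep, drop = FALSE]
print(class(as_frame))

# Indexing with a single argument (list-style) also returns a data frame
also_frame <- df[keep]
print(class(also_frame))
```

Code that passes the result on to functions expecting a data frame is usually safer with one of the last two forms.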
How to Drop a Column by Name From a Data Frame in R Using the subset() Function
In R, the subset() function provides another method to drop a column by name from a data frame. This approach allows for concise and intuitive data frame manipulation.
The syntax for dropping a column by name using the subset() function is as follows:
subset(dataframe, select = -c(column_name))
Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the unquoted name of the column we wish to remove.
The subset() function is primarily used to filter the rows of a data frame based on specified conditions. However, its select argument can also be used to keep or drop columns by name.
By specifying the select argument as -c(column_name), we indicate that we want to exclude the specified column from the resulting data frame. The function returns a new data frame without that column.
Let’s demonstrate the usage of the subset() function to drop columns by name from a data frame. We’ll use the same example data frame.
# Example data frame with employee information
Delftstack <- data.frame(
  Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
  LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
  Id = c(101, 102, 103, 104, 105),
  Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

print("The dataframe before dropping the column:-")
print(Delftstack)

# Exclude the Name and Id columns (names are given unquoted)
Delftstack <- subset(Delftstack, select = -c(Name, Id))

print("The dataframe after dropping the column:-")
print(Delftstack)
In this code snippet, we start by creating a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation. Before proceeding, we print the data frame to observe its structure.
Next, we use the subset() function to drop the Name and Id columns from the Delftstack data frame. We specify the columns to be excluded by setting the select argument as -c(Name, Id). This instructs the function to create a data frame excluding the specified columns.
Finally, we print the updated data frame to confirm the success of the operation.
Output:
The output confirms that the Name and Id columns have been successfully dropped from the data frame. The updated data frame now contains only the LastName and Designation columns, reflecting the desired operation.
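Because subset() evaluates the select expression with column names standing in for their positions, a range of adjacent columns can also be excluded with the : operator. Here is a minimal sketch, using a hypothetical data frame df:

```r
# Hypothetical four-column data frame
df <- data.frame(
  Name = c("Jack", "John"),
  LastName = c("Danials", "Cena"),
  Id = c(101, 102),
  Designation = c("CEO", "Project Manager")
)

# Drop a single column by its unquoted name
without_id <- subset(df, select = -Id)
print(names(without_id))

# Drop every column from Name through Id (positions 1 to 3)
only_designation <- subset(df, select = -(Name:Id))
print(names(only_designation))
```

Note that subset() relies on non-standard evaluation, and its documentation describes it as a convenience function intended for interactive use. Inside functions, bracket-based subsetting is generally the more predictable choice.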
How to Drop a Column by Name From a Data Frame in R Using the data.table Package
In R, the data.table package offers efficient methods for data manipulation, including dropping a column by name. It provides a concise and fast alternative to working with plain data frames.
The syntax for dropping a column by name using the data.table package is as follows:
setDT(dataframe)
dataframe <- dataframe[, !"column_name", with = FALSE]
Here, dataframe refers to the data frame from which we want to drop the column, and column_name is the name of the column we wish to remove.
The data.table package enhances data manipulation capabilities in R by providing a variety of functions optimized for performance. To drop a column by name, we first convert the data frame to a data.table using the setDT() function, which performs the conversion in place without copying the data.
Then, we specify the column to be dropped inside the [ operator, as a quoted column name prefixed with a ! to indicate negation. The with = FALSE argument tells data.table to treat that argument as column names to select rather than as an expression to evaluate using the columns as variables.
Let’s demonstrate the usage of the data.table package to drop a column by name from a data frame. We’ll utilize the same example data frame containing employee information.
library(data.table)

# Example data frame with employee information
Delftstack <- data.frame(
  Name = c("Jack", "John", "Mike", "Michelle", "Jhonny"),
  LastName = c("Danials", "Cena", "Chandler", "McCool", "Nitro"),
  Id = c(101, 102, 103, 104, 105),
  Designation = c("CEO", "Project Manager", "Senior Dev", "Junior Dev", "Intern")
)

# Convert the data frame to a data.table in place
setDT(Delftstack)

print("The dataframe before dropping the column:-")
print(Delftstack)

# Select every column except Name
Delftstack <- Delftstack[, !"Name", with = FALSE]

print("The dataframe after dropping the column:-")
print(Delftstack)
In this code snippet, we first load the data.table package to access its functions for efficient data manipulation. Then, we create a data frame called Delftstack containing employee information with columns for Name, LastName, Id, and Designation.
Next, we convert the data frame to a data.table using the setDT() function. This allows us to use data.table-specific operations for efficient manipulation.
Before dropping anything, we print the data.table to observe its structure.
We then drop the Name column from the Delftstack data.table by specifying the column name within square brackets, prefixed with a ! for negation. The with = FALSE argument ensures that the name is treated as a column to select rather than as an expression.
Finally, we print the updated data.table to confirm the success of the operation.
Output:
The output confirms that the Name column has been successfully dropped. The updated data.table now contains only the LastName, Id, and Designation columns, reflecting the desired operation.
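The approach shown in this section builds a new data.table and assigns it back to the same name. data.table can also remove columns in place with the := operator, which avoids copying the table. The following is a minimal sketch; the table dt and the vector cols_to_drop are hypothetical names used for illustration.

```r
library(data.table)

# Hypothetical table, mirroring the structure of the article's example
dt <- data.table(
  Name = c("Jack", "John"),
  LastName = c("Danials", "Cena"),
  Id = c(101, 102),
  Designation = c("CEO", "Project Manager")
)

# Remove a single column by reference; no reassignment is needed
dt[, Name := NULL]
print(names(dt))

# Remove several columns whose names are stored in a character vector.
# Wrapping the variable in parentheses makes := use its contents
# rather than treating cols_to_drop as a literal column name.
cols_to_drop <- c("Id", "Designation")
dt[, (cols_to_drop) := NULL]
print(names(dt))
```

Because := modifies the table in place, any other variable that refers to the same data.table will see the columns disappear as well; copy() can be used first when the original needs to be preserved.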
Conclusion
In conclusion, dropping a column by name from a data frame in R is a common task in data manipulation. Throughout this article, we explored several methods to accomplish this task using both base R functions and add-on packages.
We discussed how to drop a column using the dplyr package’s select() function, the names() function, the subset() function, and the data.table package. Each method has its own syntax and approach, providing flexibility for users to choose the one that best fits their workflow and preferences.
Whether it’s through the concise syntax of dplyr, the simplicity of names(), the convenience of subset(), or the performance optimization of data.table, R provides versatile tools for data frame manipulation. By understanding and using these methods, analysts and data scientists can efficiently manage and process data, ensuring it meets the requirements of their analysis or modeling tasks.
Sheeraz is a Doctorate fellow in Computer Science at Northwestern Polytechnical University, Xian, China. He has 7 years of Software Development experience in AI, Web, Database, and Desktop technologies. He writes tutorials in Java, PHP, Python, GoLang, R, etc., to help beginners learn the field of Computer Science.