How to Drop Columns by Index in Pandas DataFrame
DataFrames can be very large, with hundreds of rows and columns, so it pays to be proficient in basic maintenance operations such as dropping multiple columns. The dataframe.drop() method removes rows or columns from a DataFrame depending on the axis specified: 0 for rows and 1 for columns. It identifies the elements to remove by their labels. For example, we will drop column 'a' from the following DataFrame.
import pandas as pd
df = pd.DataFrame(
[[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
print(df)
df.drop(["a"], axis=1, inplace=True)
print(df)
Output:
    a  b   c   d
0  10  6   7   8
1   1  9  12  14
2   5  8  10   6
   b   c   d
0  6   7   8
1  9  12  14
2  8  10   6
Notice the use of the inplace parameter in the drop function. With inplace set to True, the columns are removed from the original DataFrame; otherwise, the original is left unchanged and a new DataFrame without those columns is returned.
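As a minimal sketch of that difference, the snippet below calls drop() without inplace on the same sample data and checks which object changed. The variable name trimmed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# Without inplace=True, drop() returns a new DataFrame and leaves df unchanged.
trimmed = df.drop(["a"], axis=1)

print(list(df.columns))       # the original still has all four columns
print(list(trimmed.columns))  # the returned frame no longer has 'a'
```

Assigning the result to a new name keeps the original data available, which can be convenient when several variants of the same DataFrame are needed.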
In our example, we removed column 'a' by passing its label to the dataframe.drop() function. When dealing with large datasets, it is often more practical to drop many columns at once and to refer to them by position rather than by name.
We can achieve this with the dataframe.columns attribute, which holds the column labels of a DataFrame. Indexing it with a list of positions gives the corresponding labels, which we then pass to the dataframe.drop() function. The following code snippet shows how to do this.
import pandas as pd
df = pd.DataFrame(
[[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
df.drop(df.columns[[1, 2]], axis=1, inplace=True)
print(df)
Output:
    a   d
0  10   8
1   1  14
2   5   6
It drops the columns at positions 1 and 2 (counting from 0), that is, columns 'b' and 'c'.
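For a contiguous run of columns, a slice of the column labels may be more convenient than a list of positions. The following is a small sketch under that assumption; the names labels and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

# Slicing the column labels by position yields the labels at positions 1 and 2.
labels = df.columns[1:3]
print(list(labels))

# Pass the selected labels to drop(); df itself is not modified here.
result = df.drop(labels, axis=1)
print(list(result.columns))
```

As with any Python slice, the end position is exclusive, so 1:3 covers positions 1 and 2.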
We can also omit the axis parameter by passing the labels through the columns parameter of the dataframe.drop() function, which indicates that columns are to be deleted. Example:
import pandas as pd
df = pd.DataFrame(
[[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)
df.drop(columns=df.columns[[1, 2]], inplace=True)
print(df)
Output:
    a   d
0  10   8
1   1  14
2   5   6
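If this pattern is needed in several places, it could be wrapped in a small function. The sketch below is one possible way to do so; drop_columns_by_position is a hypothetical helper written for this example and is not part of pandas.

```python
import pandas as pd


def drop_columns_by_position(frame, positions):
    """Return a new DataFrame without the columns at the given 0-based positions.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Translate the positions into column labels, then drop by label.
    labels = frame.columns[list(positions)]
    return frame.drop(columns=labels)


df = pd.DataFrame(
    [[10, 6, 7, 8], [1, 9, 12, 14], [5, 8, 10, 6]], columns=["a", "b", "c", "d"]
)

result = drop_columns_by_position(df, [0, 3])
print(list(result.columns))
```

Because drop() works on labels, a DataFrame with duplicated column names would lose every column that shares a selected label, so this approach is best suited to DataFrames whose column labels are unique.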
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.