How to Select Multiple Columns in Pandas DataFrame
- Use __getitem__ Syntax ([]) to Select Multiple Columns
- Use iloc and loc to Select Multiple Columns in Pandas
Selecting several columns from a Pandas DataFrame can be confusing, mainly because it is tempting to treat the DataFrame like a 2-dimensional array. To select multiple columns, we can either pass a list of column names to the __getitem__ syntax ([]), or use the iloc and loc indexers provided by the Pandas library. For this tutorial, we will select multiple columns from the following DataFrame.
Example DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])
print(df)
Output:
a b c d
0 0.255086 0.282203 0.342223 0.263599
1 0.744271 0.591687 0.861554 0.871859
2 0.420066 0.713664 0.770193 0.207427
3 0.014447 0.352515 0.535801 0.119759
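Because np.random.rand draws new values on every call, the numbers printed on your machine will differ from the output above. A minimal sketch of a reproducible version follows; the seed value 0 and the use of np.random.default_rng are assumptions added here for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (0) is added here so repeated runs give the same values.
rng = np.random.default_rng(0)

# Same shape and column names as the tutorial's DataFrame.
df = pd.DataFrame(rng.random((4, 4)), columns=["a", "b", "c", "d"])
print(df)
```

With a fixed seed, every example in this tutorial can rebuild the same DataFrame and print matching values.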
Use __getitem__ Syntax ([]) to Select Multiple Columns
By storing the names of the columns to be extracted in a list and passing that list to [], we can select multiple columns from the DataFrame. The following code selects columns a and c from the DataFrame shown above.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])
print(df[["a", "c"]])
Output:
a c
0 0.255086 0.342223
1 0.744271 0.861554
2 0.420066 0.770193
3 0.014447 0.535801
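A detail that is easy to miss is the difference between passing a single name and passing a list of names to []. The sketch below illustrates it on the same example DataFrame; the variable names single, subset, and reordered are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

single = df["a"]            # a single string key returns a Series
subset = df[["a", "c"]]     # a list of names returns a DataFrame
reordered = df[["c", "a"]]  # columns come back in the order given in the list

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame
print(list(reordered.columns)) # ['c', 'a']
```

The double brackets are therefore not special syntax: the outer pair is the [] operator and the inner pair is an ordinary Python list.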
Use iloc and loc to Select Multiple Columns in Pandas
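We can also use iloc and loc to select multiple columns. Both are indexers rather than ordinary methods, so they are used with square brackets: the first position selects rows and the second selects columns.
When we want to select columns by their integer positions, we can use iloc, as shown in the example below: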
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])
print(df.iloc[:, [0, 2]])
Output:
a c
0 0.255086 0.342223
1 0.744271 0.861554
2 0.420066 0.770193
3 0.014447 0.535801
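Besides a list of positions, iloc also accepts Python-style slices for the column position. The following is a small sketch on the same example DataFrame; the variable names are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

# Positional slices follow Python rules: the end position is excluded.
first_three = df.iloc[:, 0:3]  # positions 0, 1, 2
last_two = df.iloc[:, -2:]     # the last two columns
every_other = df.iloc[:, ::2]  # positions 0 and 2

print(list(first_three.columns))  # ['a', 'b', 'c']
print(list(last_two.columns))     # ['c', 'd']
print(list(every_other.columns))  # ['a', 'c']
```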
Similarly, we can use loc when we want to select columns by their names, as shown below:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])
print(df.loc[:, ["a", "c"]])
Output:
a c
0 0.255086 0.342223
1 0.744271 0.861554
2 0.420066 0.770193
3 0.014447 0.535801
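loc also accepts label slices and can combine a row condition with a column list in one step. The sketch below assumes the same example DataFrame; the 0.5 threshold and the variable names are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=["a", "b", "c", "d"])

# Label slices with loc include BOTH endpoints, unlike positional slices.
a_to_c = df.loc[:, "a":"c"]

# Row filter and column selection together: rows where a > 0.5, columns a and c.
filtered = df.loc[df["a"] > 0.5, ["a", "c"]]

print(list(a_to_c.columns))  # ['a', 'b', 'c']
print(filtered)
```

Since the data is random, the number of rows in filtered changes from run to run, so no fixed output is shown for it.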
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.