How to Select Multiple Columns in Pandas DataFrame

  1. Using Getitem Syntax
  2. Using iloc()
  3. Using loc()
  4. Conclusion
  5. FAQ

Selecting multiple columns from a Pandas DataFrame is a fundamental skill for anyone working with data in Python. Whether you’re analyzing data for insights or preparing it for visualization, knowing how to efficiently access multiple columns can save you time and streamline your workflow.

In this tutorial, we’ll explore three different approaches: the getitem syntax (the [] operator), the position-based iloc indexer, and the label-based loc indexer. Although they are often written as iloc() and loc(), both are indexers that take square brackets, as in df.iloc[...] and df.loc[...], not methods called with parentheses. Each approach suits different scenarios. By the end of this article, you’ll know how to select multiple columns in a Pandas DataFrame, with a practical code example for each approach.

Using Getitem Syntax

The simplest way to select multiple columns in a Pandas DataFrame is by using the getitem syntax, which involves enclosing the column names in a list within square brackets. This method is intuitive and straightforward, making it a favorite among many data analysts.

Here’s how you can do it:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
}

df = pd.DataFrame(data)

selected_columns = df[['Name', 'Salary']]
print(selected_columns)

Output:

      Name  Salary
0    Alice   70000
1      Bob   80000
2  Charlie   65000
3    David   90000

In this example, we created a DataFrame with four columns: Name, Age, City, and Salary. Using the getitem syntax, we selected only the ‘Name’ and ‘Salary’ columns. The outer brackets perform the selection and the inner brackets form the list of names, so the result is a new DataFrame whose columns follow the order of the list. This method works best when you know the exact names of the columns you want; a name that does not exist raises a KeyError. It is highly readable and is often the first choice for column selection in Pandas.
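As a minimal sketch of the points above, the snippet below rebuilds the same DataFrame, keeps the list of names in a variable (wanted is a name chosen only for this illustration), and compares double brackets with single brackets. The column ‘Country’ is deliberately absent to show the error case.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
})

# The list of names can live in a variable ("wanted" is a hypothetical name)
wanted = ['Name', 'Salary']
subset = df[wanted]

# A list inside the brackets returns a DataFrame, even for one column;
# a bare name returns a Series
one_col_frame = df[['Name']]
one_col_series = df['Name']

print(type(subset).__name__)          # DataFrame
print(type(one_col_frame).__name__)   # DataFrame
print(type(one_col_series).__name__)  # Series

# 'Country' is not a column, so the selection raises KeyError
try:
    df[['Name', 'Country']]
except KeyError as err:
    print('KeyError:', err)
```

The exact wording of the KeyError message depends on the Pandas version, so it is not reproduced here.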

Using iloc()

The iloc indexer is another way to select multiple columns from a DataFrame. Despite the parentheses in the heading, it is used with square brackets, as in df.iloc[rows, columns]. It is position-based: you specify the zero-based integer positions of the columns you want to select. This is useful when the column names are unknown, awkward to type, or likely to change while the column order stays fixed.

Here’s an example of using iloc():

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
}

df = pd.DataFrame(data)

selected_columns = df.iloc[:, [0, 3]]
print(selected_columns)

Output:

      Name  Salary
0    Alice   70000
1      Bob   80000
2  Charlie   65000
3    David   90000

In this example, we used iloc to select the first and fourth columns of the DataFrame, which correspond to ‘Name’ and ‘Salary’, respectively. The colon (:) before the comma selects all rows. The list [0, 3] gives the zero-based positions of the columns to select. Selecting by position is convenient when a DataFrame has many columns or when you do not want to type out the column names, but keep in mind that the same positions point at different data if the column order changes.
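Positions do not have to be given as a list. The sketch below, which assumes the same DataFrame as above, shows a position slice, a negative slice that counts from the end, and one way to look up positions from names with df.columns.get_loc. The variable names are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
})

# Slice: positions 1 up to, but not including, 3
middle = df.iloc[:, 1:3]

# Negative positions count from the end: the last two columns
last_two = df.iloc[:, -2:]

# Translate column names into positions, then select by position
positions = [df.columns.get_loc(name) for name in ['Name', 'Salary']]
by_position = df.iloc[:, positions]

print(list(middle.columns))       # ['Age', 'City']
print(list(last_two.columns))     # ['City', 'Salary']
print(positions)                  # [0, 3]
print(list(by_position.columns))  # ['Name', 'Salary']
```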

Using loc()

The loc indexer is the label-based counterpart of iloc. It also takes square brackets, as in df.loc[rows, columns], but selects columns by name rather than by position. It is particularly useful when you want to filter rows by a condition and select specific columns in the same step.

Here’s how to use loc():

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
}

df = pd.DataFrame(data)

selected_columns = df.loc[:, ['City', 'Age']]
print(selected_columns)

Output:

          City  Age
0     New York   24
1  Los Angeles   30
2      Chicago   22
3      Houston   35

In the example above, we used loc to select the ‘City’ and ‘Age’ columns. The colon (:) again selects all rows, and the list ['City', 'Age'] names the columns to retrieve; the result follows the order of the list, so ‘City’ comes before ‘Age’. Because the row indexer can also be a boolean condition, loc can filter rows and select columns at once. For example, replacing the colon with df['Age'] > 25 keeps only the rows where the age is greater than 25.
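The row filter mentioned above can be sketched as follows, assuming the same DataFrame. The variable name older is chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
})

# Row indexer: a boolean condition; column indexer: a list of labels
older = df.loc[df['Age'] > 25, ['City', 'Age']]
print(older)

# Output:
#           City  Age
# 1  Los Angeles   30
# 3      Houston   35
```

The original row labels (1 and 3) are preserved in the result; calling reset_index(drop=True) on it would renumber them from 0 if that is preferred.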

Conclusion

Selecting multiple columns in a Pandas DataFrame is a core skill for data manipulation and analysis. The getitem syntax is the most readable choice when you know the column names, iloc selects by integer position, and loc selects by label and can filter rows at the same time. Practice these approaches with your own datasets, and you’ll find that selecting multiple columns becomes second nature.

FAQ

  1. How do I select columns by their names in Pandas?
    You can use the getitem syntax by enclosing the column names in a list within square brackets, like this: df[['column1', 'column2']].

  2. What is the difference between iloc and loc in Pandas?
    iloc is position-based and selects columns by their zero-based integer positions, while loc is label-based and selects columns by their names. Both are indexers used with square brackets, not methods called with parentheses.

  3. Can I select both rows and columns using loc?
    Yes, you can specify both rows and columns using the syntax df.loc[row_indexer, column_indexer].

  4. Is it possible to select columns based on conditions?
    Yes, you can first filter the DataFrame based on your conditions and then select the desired columns using either loc or iloc. With loc, both steps fit in one expression, such as df.loc[df['column1'] > 0, ['column1', 'column2']].

  5. What should I do if I want to select a large number of columns?
    If the columns are adjacent, a slice is usually shorter than a list: df.loc[:, 'first':'last'] includes both endpoints, while df.iloc[:, 0:10] stops before position 10. Otherwise, pass a list of column names using the getitem syntax.
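For the last question, two possible shortcuts are sketched below on the tutorial's four-column DataFrame: a label slice with loc, and a list built from df.columns that keeps everything except one column. The variable names are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 65000, 90000]
})

# Label slice with loc: both 'Age' and 'Salary' are included
age_to_salary = df.loc[:, 'Age':'Salary']

# Build the list of names programmatically: every column except 'City'
keep = [col for col in df.columns if col != 'City']
all_but_city = df[keep]

print(list(age_to_salary.columns))  # ['Age', 'City', 'Salary']
print(list(all_but_city.columns))   # ['Name', 'Age', 'Salary']
```

When only a few columns need to be excluded, df.drop(columns=['City']) gives the same result as the list comprehension.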

Author: Manav Narula

Manav is an IT professional who has a lot of experience as a core developer in many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.