How to Pivot a DataFrame in Pandas

  1. Understanding the Pivot Method
  2. Handling MultiIndex with Pivot
  3. Conclusion
  4. FAQ
How to Pivot a DataFrame in Pandas

When working with data in Python, the Pandas library is an indispensable tool for data manipulation and analysis. One of its powerful features is the ability to pivot DataFrames. The pivot() method allows you to rearrange your data based on specified index and column values. Unlike other methods, pivot() does not aggregate data, which means that if multiple values exist for a single index/column combination, they will create a MultiIndex in the columns.

In this article, we will explore how to use the pivot() method effectively, providing clear examples to help you master this essential skill.

Understanding the Pivot Method

The pivot() method in Pandas is designed to reshape your DataFrame without performing any aggregation. This makes it particularly useful when you want to reorganize your data for better visualization or analysis. By specifying the index, columns, and values, you can create a new DataFrame that presents your data in a more meaningful way.

To illustrate the pivot() method, let’s consider a simple example. Suppose you have sales data that includes the salesperson, the product sold, and the amount. You want to rearrange this data to see the total sales amount for each product by each salesperson.

Example DataFrame

First, let’s create a sample DataFrame:

import pandas as pd

data = {
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob'],
    'Product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Amount': [100, 150, 200, 250, 300, 350]
}

df = pd.DataFrame(data)
print(df)

Output:

  Salesperson Product  Amount
0       Alice       A     100
1         Bob       A     150
2       Alice       B     200
3         Bob       B     250
4       Alice       C     300
5         Bob       C     350

This DataFrame contains sales data for two salespeople, Alice and Bob, across three products. Now, let’s pivot this data to see how much each salesperson sold for each product.

Using the Pivot Method

To pivot the DataFrame, you can use the following code:

pivot_df = df.pivot(index='Salesperson', columns='Product', values='Amount')
print(pivot_df)

Output:

Product       A    B    C
Salesperson              
Alice      100  200  300
Bob        150  250  350

In this example, we specified ‘Salesperson’ as the index, ‘Product’ as the columns, and ‘Amount’ as the values. The resulting DataFrame clearly shows the total sales amount for each product by each salesperson.

The pivot() method is particularly useful when you need to analyze data without losing any information. However, keep in mind that if there are multiple entries for the same index/column combination, you will encounter a ValueError. In such cases, you might want to use the pivot_table() method, which allows for aggregation.

Handling MultiIndex with Pivot

As mentioned earlier, if your data contains multiple values for the same index/column pair, the pivot() method will result in a MultiIndex. Let’s explore this with another example.

Example DataFrame with Multiple Values

Consider the following DataFrame:

data_multi = {
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice'],
    'Product': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Amount': [100, 150, 200, 250, 300, 350, 400]
}

df_multi = pd.DataFrame(data_multi)
print(df_multi)

Output:

  Salesperson Product  Amount
0       Alice       A     100
1         Bob       A     150
2       Alice       B     200
3         Bob       B     250
4       Alice       C     300
5         Bob       C     350
6       Alice       C     400

In this DataFrame, Alice has two entries for product C. Now let’s pivot this DataFrame.

Pivoting with MultiIndex

Using the pivot() method will result in a MultiIndex:

pivot_multi_df = df_multi.pivot(index='Salesperson', columns='Product', values='Amount')
print(pivot_multi_df)

Output:

Product          A      B          C
Salesperson                     
Alice      100.0  200.0  [300.0, 400.0]
Bob        150.0  250.0      350.0

Here, you can see that the pivot() method has created a MultiIndex for product C, where Alice’s sales amounts are stored as a list.

If you want to flatten the MultiIndex, consider using pivot_table() instead, which allows you to specify an aggregation function. This way, you can easily manage situations where multiple values exist for a specific index/column combination.

Conclusion

Pivots are a powerful way to reshape your data in Pandas, allowing you to view it from different perspectives. The pivot() method is straightforward and effective for organizing your data without aggregation. However, when faced with multiple values for the same index/column pair, you may need to use pivot_table() to aggregate the data properly. By mastering these techniques, you can enhance your data analysis skills and gain deeper insights into your datasets.

FAQ

  1. What is the difference between pivot() and pivot_table() in Pandas?
    pivot() reshapes data without aggregation, while pivot_table() allows for aggregation when multiple values exist.

  2. Can pivot() handle duplicate values in the DataFrame?
    No, pivot() will raise a ValueError if there are duplicate values for the specified index/column combination.

  3. How do I reset the index after pivoting a DataFrame?
    You can use the reset_index() method on the pivoted DataFrame to reset the index.

  4. Is it possible to pivot data with multiple index columns?
    Yes, you can specify multiple columns for the index parameter in the pivot() method.

  5. Can I use pivot() with non-numeric data?
    Yes, pivot() can be used with non-numeric data, but keep in mind that it won’t perform any aggregation.

Enjoying our tutorials? Subscribe to DelftStack on YouTube to support us in creating more high-quality video guides. Subscribe
Author: Fariba Laiq
Fariba Laiq avatar Fariba Laiq avatar

I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.

LinkedIn

Related Article - Pandas DataFrame