How to GroupBy and Aggregate Multiple Columns in Pandas
The Pandas library is a powerful data analysis library in Python. We can perform many different types of manipulation on a dataframe using Pandas in Python.
groupby()
is a method that splits the data into multiple groups based on specific criteria. After that, we can perform certain operations on the grouped data.
Apply the groupby()
and the aggregate()
Functions on Multiple Columns in Pandas Python
Sometimes we need to group the data from multiple columns and apply some aggregate()
methods. The aggregate()
methods are those methods that combine the values from multiple rows and return a single value, for example, count()
, size()
, mean()
, sum()
, mean()
, etc.
In the following code, we have the students’ data that contains redundant values for some columns. If you want to group the data based on the student’s Name
and Section
to get their total marks, we will group the data according to the name and section and then calculate the total marks using the aggregate()
method.
We have stored the returned result and displayed it.
Example Code:
# Python 3.x
import pandas as pd
student = {
"Name": ["Jhon", "Alia", "Jhon", "Alia", "Sam", "Smith", "Sam", "Smith"],
"Section": ["A", "A", "A", "A", "B", "B", "B", "B"],
"Course": ["Java", "Java", "Dart", "Dart", "Java", "Java", "Dart", "Dart"],
"Marks": [70, 80, 90, 60, 80, 90, 50, 80],
}
df = pd.DataFrame(student)
display(df)
result = df.groupby(["Name", "Section"]).aggregate("sum")
display(result)
Output:
We can also perform multiple aggregate
operations at a time. We will pass the list of operation names to the aggregate()
method.
Here, we have calculated students’ mean and total marks at once using the aggregate()
method by passing the list of operation names.
Example Code:
# Python 3.x
import pandas as pd
student = {
"Name": ["Jhon", "Alia", "Jhon", "Alia", "Sam", "Smith", "Sam", "Smith"],
"Section": ["A", "A", "A", "A", "B", "B", "B", "B"],
"Course": ["Java", "Java", "Dart", "Dart", "Java", "Java", "Dart", "Dart"],
"Marks": [70, 80, 90, 60, 80, 90, 50, 80],
}
df = pd.DataFrame(student)
display(df)
result = df.groupby(["Name", "Section"]).aggregate(["mean", "sum"])
display(result)
Output:
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.
LinkedIn