Pandas Groupby Count
-
Count Value of Unique Row Values Using
Series.value_counts()
Method -
Count Values of DataFrame Groups Using
DataFrame.groupby()
Function -
Get Multiple Statistics Values of Each Group Using
pandas.DataFrame.agg()
Method
This tutorial explains how we can get statistics like count
, sum
, max
and much more for groups derived using the DataFrame.groupby()
method.
We will use the automobile_data_df
shown in the above example to explain the concepts. The DataFrame consists of employees, and the car and bike brands used by them.
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
print(automobile_data_df)
Output:
Car Brand Motorbike Brand
Steven Chevrolet Yamaha
Robert Toyota Ducati
Jimmy Volkswagen BMW
Mike Cherlovet Kawasaki
Phoenix Toyota Ducati
Scott Toyota BMW
John Volkswagen BMW
Count Value of Unique Row Values Using Series.value_counts()
Method
If we want counts of each unique value of a single column in a DataFrame, we can use the Series.value_counts()
method.
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
print(automobile_data_df["Car Brand"].value_counts())
Output:
Toyota 3
Volkswagen 2
Cherlovet 1
Chevrolet 1
Name: Car Brand, dtype: int64
It will print all the unique values in the Car Brand
column of automobile_data_df
along with their counts. The method is only applicable if we want to see value counts of only one column at a time.
Count Values of DataFrame Groups Using DataFrame.groupby()
Function
We use the DataFrame.groupby()
function to split a DataFrame into multiple groups and then use the size()
method to count each created group’s values.
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])
print(group.size().reset_index(name="counts"))
Output:
Car Brand Motorbike Brand counts
0 Cherlovet Kawasaki 1
1 Chevrolet Yamaha 1
2 Toyota BMW 1
3 Toyota Ducati 2
4 Volkswagen BMW 2
It prints a DatFrame with all unique combinations of values of Car Brand
and Motorbike Brand
columns from the automobile_data_df
along with the count of occurrences of these combinations in the automobile_data_df
DataFrame.
The groupby()
method splits the automobile_data_df
into groups. The rows with the same values of Car Brand
and Motorbike Brand
columns will be placed in the same group. The size()
method will give the count of values in each group and finally we generate DataFrame from the count of values in each group.
Get Multiple Statistics Values of Each Group Using pandas.DataFrame.agg()
Method
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha", 4000],
["Toyota", "Ducati", 5000],
["Volkswagen", "BMW", 4500],
["Cherlovet", "Kawasaki", 3800],
["Toyota", "Ducati", 4500],
["Toyota", "BMW", 5000],
["Volkswagen", "BMW", 6000],
]
column_names = ["Car Brand", "Motorbike Brand", "Monthly Salary"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])
print(group.agg({"Monthly Salary": [min, max, sum]}))
Output:
Monthly Salary
min max sum
Car Brand Motorbike Brand
Cherlovet Kawasaki 3800 3800 3800
Chevrolet Yamaha 4000 4000 4000
Toyota BMW 5000 5000 5000
Ducati 4500 5000 9500
Volkswagen BMW 4500 6000 10500
It displays the minimum, maximum, and sum of Monthly Salary
values for each generated group.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn