How to Groupby Two Columns in Pandas
This tutorial explains how we can use the DataFrame.groupby()
method in Pandas for two columns to separate the DataFrame into groups. We can also gain much more information from the created groups.
We will use the below DataFrame in this article.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
Output:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Pandas Groupby Multiple Columns
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(["Gender", "Employed"])
for group_key, group_value in groups:
group = groups.get_group(group_key)
print(group)
print("")
Output:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Groups in DataFrame:
Name Gender Employed Age
3 Emma Female No 24
Name Gender Employed Age
0 Jennifer Female Yes 30
4 Luna Female Yes 28
Name Gender Employed Age
1 Travis Male No 28
5 Anish Male No 25
Name Gender Employed Age
2 Bob Male Yes 27
It creates 4 groups from the DataFrame. All the rows with the same value of Gender
and Employed
column are placed in the same group.
Count Number of Rows in Each Group Pandas
To count the number of rows in each created group using the DataFrame.groupby()
method, we can use the size()
method.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
print("Count of Each group:")
grouped_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")
print(grouped_df)
Output:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Count of Each group:
Gender Employed Count
0 Female No 1
1 Female Yes 2
2 Male No 2
3 Male Yes 1
It displays the DataFrame, groups created from the DataFrame, and the number of entries in each group.
If we want the largest count value for each value in the Employed
column, we can form another group from the created group above and count values and then get the maximum value of count using the max()
method.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
groups = data.groupby(["Gender", "Employed"]).size().groupby(level=1)
print(groups.max())
Output:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Employed
No 2
Yes 2
dtype: int64
It shows the maximum count of values of the Employed
column among created groups from Gender
and Employed
columns.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedInRelated Article - Pandas DataFrame Column
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Get the Sum of Pandas Column
- How to Change the Order of Pandas DataFrame Columns
- How to Convert DataFrame Column to String in Pandas