Pandas 由两列来 groupby
Suraj Joshi
2023年1月30日
本教程介绍了如何在 Pandas 中使用 DataFrame.groupby()
方法将两列的 DataFrame 分成若干组。我们还可以从创建的组中获得更多的信息。
我们将在本文中使用下面的 DataFrame。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
输出:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Pandas Groupby 多列分组
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(["Gender", "Employed"])
for group_key, group_value in groups:
group = groups.get_group(group_key)
print(group)
print("")
输出:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Groups in DataFrame:
Name Gender Employed Age
3 Emma Female No 24
Name Gender Employed Age
0 Jennifer Female Yes 30
4 Luna Female Yes 28
Name Gender Employed Age
1 Travis Male No 28
5 Anish Male No 25
Name Gender Employed Age
2 Bob Male Yes 27
它从 DataFrame 中创建了 4 个组。所有 Gender
和 Employed
列值相同的行都会被放在同一个组。
计算每组的行数 Pandas
要使用 DataFrame.groupby()
方法统计每个创建的组的行数,我们可以使用 size()
方法。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
print("Count of Each group:")
grouped_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")
print(grouped_df)
输出:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Count of Each group:
Gender Employed Count
0 Female No 1
1 Female Yes 2
2 Male No 2
3 Male Yes 1
它显示 DataFrame,从 DataFrame 中创建的组,以及每个组的元素数。
如果我们想得到 Employed
列中每个值的最大计数值,我们可以从上面创建的组再组成一个组,并对值进行计数,然后使用 max()
方法得到计数的最大值。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
data = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
"Age": [30, 28, 27, 24, 28, 25],
}
)
print(data)
print("")
groups = data.groupby(["Gender", "Employed"]).size().groupby(level=1)
print(groups.max())
输出:
Name Gender Employed Age
0 Jennifer Female Yes 30
1 Travis Male No 28
2 Bob Male Yes 27
3 Emma Female No 24
4 Luna Female Yes 28
5 Anish Male No 25
Employed
No 2
Yes 2
dtype: int64
它显示了从 Gender
和 Employed
列创建的组中,Employed
列值的最大计数。
作者: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn相关文章 - Pandas DataFrame Column
- 如何将 Pandas DataFrame 列标题获取为列表
- 如何删除 Pandas DataFrame 列
- 如何在 Pandas 中将 DataFrame 列转换为日期时间
- 如何获得 Pandas 列中元素总和
- 如何更改 Panas DataFrame 列的顺序
- 如何在 Pandas 中将 DataFrame 列转换为字符串