Pandas Groupby on Two Columns
Suraj Joshi
January 30, 2023
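This tutorial explains how to use the DataFrame.groupby() method in Pandas to split a DataFrame into groups based on the values of two columns. We can also extract further information from the resulting groups.
We will use the following DataFrame throughout this article.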
import pandas as pd

# Sample data: six people with their gender, employment status, and age
data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
Output:
       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25
Pandas Groupby on Multiple Columns
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("Groups in DataFrame:")

# Group rows by each (Gender, Employed) combination
groups = data.groupby(["Gender", "Employed"])

# Iterating yields the group key and the sub-DataFrame for that key
for group_key, group_value in groups:
    print(group_value)
    print("")
Output:
       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27
This creates 4 groups from the DataFrame. All rows that share the same values in both the Gender and Employed columns are placed in the same group.
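When only the group keys or a single group are needed, looping over every group is not required. The following is a minimal sketch, assuming the same sample DataFrame as above; the variable names n_groups, group_keys, and female_employed are illustrative and not part of the original example.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

groups = data.groupby(["Gender", "Employed"])

# Number of distinct (Gender, Employed) combinations present in the data
n_groups = groups.ngroups

# Each key is a (Gender, Employed) tuple
group_keys = sorted(groups.groups.keys())

# Fetch one group directly by its key instead of looping over all of them
female_employed = groups.get_group(("Female", "Yes"))

print(n_groups)
print(group_keys)
print(female_employed)
```

Here the key passed to get_group() is a tuple with one value per grouping column, in the same order as the list given to groupby().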
Count the Rows in Each Group in Pandas
To count the number of rows in each group created by the DataFrame.groupby() method, we can use the size() method.
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")
print("Count of Each group:")

# size() counts the rows per group; reset_index() turns the result
# into a regular DataFrame with the counts in a "Count" column
grouped_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")
print(grouped_df)
Output:
       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1
The output shows the DataFrame, followed by each group created from it and the number of rows in that group.
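The same counts can also be arranged as a small cross-table, which some readers may find easier to scan. The following is a rough sketch, assuming the same sample data; it swaps reset_index() for Series.unstack(), and the names counts and count_table are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

# size() returns a Series indexed by (Gender, Employed)
counts = data.groupby(["Gender", "Employed"]).size()

# Move the inner index level (Employed) into columns;
# fill_value=0 covers combinations that never occur in the data
count_table = counts.unstack(fill_value=0)
print(count_table)
```

In the resulting table, the rows are the Gender values and the columns are the Employed values, so a single count can be looked up with count_table.loc["Female", "Yes"].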
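If we want the largest group count for each value of the Employed column, we can group the per-group counts again, this time by the Employed index level, and then use the max() method to get the maximum count.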
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)
print(data)
print("")

# Count rows per (Gender, Employed) group, then regroup those counts
# by index level 1, which is the Employed level
groups = data.groupby(["Gender", "Employed"]).size().groupby(level=1)
print(groups.max())
Output:
       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64
For each value of the Employed column, it shows the largest row count among the groups created from the Gender and Employed columns.
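The max() result gives only the count, not which Gender produced it. One possible way to recover that, sketched here under the assumption of the same sample data, is to keep the counts in a flat DataFrame and select the row with the highest count per Employed value using idxmax(). The names counts_df and top_rows are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Employed": ["Yes", "No", "Yes", "No", "Yes", "No"],
        "Age": [30, 28, 27, 24, 28, 25],
    }
)

# Flat table with one row per (Gender, Employed) group and its row count
counts_df = data.groupby(["Gender", "Employed"]).size().reset_index(name="Count")

# For each Employed value, idxmax() returns the row label of the largest Count
top_labels = counts_df.groupby("Employed")["Count"].idxmax()

# Select those rows to see the Gender alongside the maximum count
top_rows = counts_df.loc[top_labels]
print(top_rows)
```

With this data, the largest group for Employed "No" is Male and the largest for Employed "Yes" is Female, each with 2 rows. If two groups were tied, idxmax() would return only the first one it encounters.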
Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.