Pandas groupby 計數
Suraj Joshi
2023年1月30日
-
使用
Series.value_counts()
方法計算唯一行值的數值 -
使用
DataFrame.groupby()
函式對 DataFrame 組的值進行計數 -
使用
pandas.DataFrame.agg()
方法獲取每組的多個統計值
本教程解釋瞭如何從 DataFrame.groupby()
方法中獲取像 count
、sum
、max
等派生組的統計資料。
我們將用上面例子中所示的 automobile_data_df
來解釋這些概念。DataFrame 由員工和他們使用的汽車和自行車品牌組成。
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
print(automobile_data_df)
輸出:
Car Brand Motorbike Brand
Steven Chevrolet Yamaha
Robert Toyota Ducati
Jimmy Volkswagen BMW
Mike Cherlovet Kawasaki
Phoenix Toyota Ducati
Scott Toyota BMW
John Volkswagen BMW
使用 Series.value_counts()
方法計算唯一行值的數值
如果我們想要一個 DataFrame 中單列的每個唯一值的計數,我們可以使用 Series.value_counts()
方法。
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
print(automobile_data_df["Car Brand"].value_counts())
輸出:
Toyota 3
Volkswagen 2
Cherlovet 1
Chevrolet 1
Name: Car Brand, dtype: int64
它將列印 automobile_data_df
中 Car Brand
列中所有的唯一值及其計數。該方法只適用於我們希望一次只看到一列的數值計數。
使用 DataFrame.groupby()
函式對 DataFrame 組的值進行計數
我們使用 DataFrame.groupby()
函式將一個 DataFrame 分割成多個組,然後使用 size()
方法計算每個建立組的值。
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha"],
["Toyota", "Ducati"],
["Volkswagen", "BMW"],
["Cherlovet", "Kawasaki"],
["Toyota", "Ducati"],
["Toyota", "BMW"],
["Volkswagen", "BMW"],
]
column_names = ["Car Brand", "Motorbike Brand"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])
print(group.size().reset_index(name="counts"))
輸出:
Car Brand Motorbike Brand counts
0 Cherlovet Kawasaki 1
1 Chevrolet Yamaha 1
2 Toyota BMW 1
3 Toyota Ducati 2
4 Volkswagen BMW 2
它列印一個 DatFrame,其中包含了所有來自 automobile_data_df
的 Car Brand
和 Motorbike Brand
列的值的唯一組合,以及這些組合在 automobile_data_df
DataFrame 中出現的次數。
groupby()
方法將 automobile_data_df
分割成組。Car Brand
和 Motorbike Brand
兩列數值相同的行將被放置在同一組中。size()
方法會給出每組的值的數量,最後我們根據每組的值的數量生成 DataFrame。
使用 pandas.DataFrame.agg()
方法獲取每組的多個統計值
import pandas as pd
automobile_data = [
["Chevrolet", "Yamaha", 4000],
["Toyota", "Ducati", 5000],
["Volkswagen", "BMW", 4500],
["Cherlovet", "Kawasaki", 3800],
["Toyota", "Ducati", 4500],
["Toyota", "BMW", 5000],
["Volkswagen", "BMW", 6000],
]
column_names = ["Car Brand", "Motorbike Brand", "Monthly Salary"]
index_names = ["Steven", "Robert", "Jimmy", "Mike", "Phoenix", "Scott", "John"]
automobile_data_df = pd.DataFrame(
automobile_data, index=index_names, columns=column_names
)
group = automobile_data_df.groupby(["Car Brand", "Motorbike Brand"])
print(group.agg({"Monthly Salary": [min, max, sum]}))
輸出:
Monthly Salary
min max sum
Car Brand Motorbike Brand
Cherlovet Kawasaki 3800 3800 3800
Chevrolet Yamaha 4000 4000 4000
Toyota BMW 5000 5000 5000
Ducati 4500 5000 9500
Volkswagen BMW 4500 6000 10500
它顯示每個生成組的 Monthly Salary
值的最小值、最大值和總和。
作者: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn