Count Unique Values Per Group in Pandas
Ahmed Waheed
January 30, 2023
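When working with large datasets, we often need to apply an operation to specific groups of the data. For example, suppose we have a dataset of countries and the private codes each country uses for private matters, and we want to count how many distinct codes each country uses. The methods below show different ways to count unique values per group.
In the following sections, we will use the same DataFrame, shown below: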
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
print(df)
The output is shown below.
code Countries
0 999 Switzerland
1 113 Switzerland
2 112 Japan
3 112 Switzerland
4 113 Canada
5 114 Japan
6 100 Germany
7 114 Japan
8 115 Germany
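Note that the pair (114, Japan) appears twice in this data, so counting rows per country is not the same as counting unique codes. As a minimal sketch of that distinction (the variable names `rows_per_country` and `repeated_rows` are illustrative, not part of the original example), the following counts rows per group and lists the repeated rows:

```python
import pandas as pd

data = [
    [999, "Switzerland"], [113, "Switzerland"], [112, "Japan"],
    [112, "Switzerland"], [113, "Canada"], [114, "Japan"],
    [100, "Germany"], [114, "Japan"], [115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])

# Number of rows per country, duplicates included.
rows_per_country = df.groupby("Countries").size()

# Rows that repeat an earlier (code, Countries) pair.
repeated_rows = df[df.duplicated()]

print(rows_per_country)
print(repeated_rows)
```

Japan has three rows here but only two distinct codes, which is why a plain row count would overstate the number of codes in use.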
df.groupby().nunique() Method
Let's see how df.groupby().nunique() groups the rows by country and counts the unique codes in each group.
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby("Countries")["code"].nunique()
print(result)
Output:
Countries
Canada 1
Germany 2
Japan 2
Switzerland 3
Name: code, dtype: int64
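This shows that Canada uses one code, Germany uses two codes, and so on. Japan has three rows in the data, but code 114 appears twice, so it is counted only once.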
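One detail worth knowing is how nunique() treats missing values. By default it ignores NaN, and passing dropna=False counts NaN as one additional value. The sketch below uses a small made-up frame, `df_missing`, which is an assumption for illustration and not part of the dataset above:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing codes, for illustration only.
df_missing = pd.DataFrame(
    {
        "code": [112, np.nan, 114, 113, np.nan],
        "Countries": ["Japan", "Japan", "Japan", "Canada", "Canada"],
    }
)

grouped = df_missing.groupby("Countries")["code"]

# Default behaviour: NaN is not counted as a value.
without_nan = grouped.nunique()

# dropna=False: NaN is counted as one distinct value per group.
with_nan = grouped.nunique(dropna=False)

print(without_nan)
print(with_nan)
```

Which behaviour is appropriate depends on whether a missing code should be treated as "unknown" or as a category of its own.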
df.groupby().agg() Method
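This method produces the same counts as df.groupby().nunique(). Instead of calling nunique() directly, we pass the nunique function to agg(). Because as_index=False is used here, the result is a DataFrame with Countries as a regular column rather than as the index.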
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby(by="Countries", as_index=False).agg({"code": pd.Series.nunique})
print(result)
Output:
Countries code
0 Canada 1
1 Germany 2
2 Japan 2
3 Switzerland 3
The call .agg({'code': pd.Series.nunique}) aggregates the code column using the pd.Series.nunique function.
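As an alternative spelling, agg() also accepts named aggregation, where the function is given by its string name and the output column gets an explicit label. The following is a minimal sketch; the column name `unique_codes` is chosen here for illustration and does not come from the original example:

```python
import pandas as pd

data = [
    [999, "Switzerland"], [113, "Switzerland"], [112, "Japan"],
    [112, "Switzerland"], [113, "Canada"], [114, "Japan"],
    [100, "Germany"], [114, "Japan"], [115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])

# Named aggregation: output_column=(input_column, function_name).
result = df.groupby("Countries", as_index=False).agg(
    unique_codes=("code", "nunique")
)
print(result)
```

Naming the output column can make the result easier to read when several aggregations are computed at once.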
df.groupby().unique() Method
This method is useful when you want to see which codes each country uses, rather than only how many.
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby("Countries")["code"].unique()
print(result)
Output:
Countries
Canada [113]
Germany [100, 115]
Japan [112, 114]
Switzerland [999, 113, 112]
Name: code, dtype: object
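Each entry in this result is a NumPy array of the group's unique codes, in order of first appearance. As a rough sketch of how those arrays relate to the counts from the earlier sections (the names `codes`, `counts`, and `as_lists` are illustrative), the length of each array gives the number of unique codes, and the arrays can be converted to plain Python lists if needed:

```python
import pandas as pd

data = [
    [999, "Switzerland"], [113, "Switzerland"], [112, "Japan"],
    [112, "Switzerland"], [113, "Canada"], [114, "Japan"],
    [100, "Germany"], [114, "Japan"], [115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])

# One array of unique codes per country.
codes = df.groupby("Countries")["code"].unique()

# The length of each array is the number of unique codes in that group.
counts = codes.apply(len)

# Convert each NumPy array to a plain Python list.
as_lists = codes.apply(lambda arr: arr.tolist())

print(counts)
print(as_lists)
```

If only the counts are needed, calling nunique() directly is simpler; unique() is the better fit when the values themselves matter.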