How to Count Unique Values Per Group(s) in Pandas
When we are working with large data sets, sometimes we have to apply some function to a specific group of data. For example, we have a data set of countries
and the private code
they use for private matters. We want to count the number of codes a country uses. Listed below are the different methods from groupby()
to count unique values.
We will use the same DataFrame
in the next sections as follows,
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
print(df)
The following will be the output.
code Countries
0 999 Switzerland
1 113 Switzerland
2 112 Japan
3 112 Switzerland
4 113 Canada
5 114 Japan
6 100 Germany
7 114 Japan
8 115 Germany
df.groupby().nunique()
Method
Let’s see how df.groupby().nunique()
function will groupby
our countries.
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby("Countries")["code"].nunique()
print(result)
The following will be output.
Countries
Canada 1
Germany 2
Japan 2
Switzerland 3
Name: code, dtype: int64
This shows that Canada is using one code, Germany is using two codes, and so on.
df.groupby().agg()
Method
This method works same as df.groupby().nunique()
. We need pass nunique()
function to agg()
function.
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
df = pd.DataFrame(data, columns=["code", "Countries"])
result = df.groupby(by="Countries", as_index=False).agg({"code": pd.Series.nunique})
print(result)
Output:
Countries code
0 Canada 1
1 Germany 2
2 Japan 2
3 Switzerland 3
.agg({'code': pd.Series.nunique})
It gggregates using function pd.Series.nunique
over the column code
.
df.groupby().unique()
Method
This method is useful when you want to see which country is using which codes.
import pandas as pd
data = [
[999, "Switzerland"],
[113, "Switzerland"],
[112, "Japan"],
[112, "Switzerland"],
[113, "Canada"],
[114, "Japan"],
[100, "Germany"],
[114, "Japan"],
[115, "Germany"],
]
result = df.groupby("Countries")["code"].unique()
print(result)
The following will be output.
Countries
Canada [113]
Germany [100, 115]
Japan [112, 114]
Switzerland [999, 113, 112]
Name: code, dtype: object