Pandas DataFrame 的唯一值計數

Suraj Joshi 2023年1月30日
  1. 使用 Series.value_counts() 計算 DataFrame 中的唯一值
  2. 使用 DataFrame.nunique() 計算 DataFrame 中的唯一值
Pandas DataFrame 的唯一值計數

本教程解釋瞭如何使用 Series.value_counts()DataFrame.nunique() 方法獲得 DataFrame 中所有唯一值的計數。

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df)

輸出:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

我們將使用 DataFrame patients_df,其中包含患者的姓名、預約日期和年齡,來解釋如何獲得 DataFrame 中所有唯一值的計數。

使用 Series.value_counts() 計算 DataFrame 中的唯一值

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print("The DataFrame is:")
print(patients_df, "\n")

print("No of appointments for each date:")
print(patients_df["Date"].value_counts())

輸出:

The DataFrame is:
       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

No of appointments for each date:
2020-12-02    3
2020-12-01    2
2020-12-03    1
Name: Date, dtype: int64

它顯示 DataFrame 中 Date 列的每個唯一值的計數。

使用 DataFrame.nunique() 計算 DataFrame 中的唯一值

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df, "\n")

print(patients_df.groupby("Date").Name.nunique())

輸出:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

Date
2020-12-01    2
2020-12-02    3
2020-12-03    1
Name: Name, dtype: int64

它根據 Date 列的值將 DataFrame 分割開來,即把 Date 值相同的行放在同一組,然後計算每一個名字在某一組中的出現次數,以瞭解 DataFrame 中 Date 列的每一個唯一值的數量。

作者: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn