Pandas Count Unique Values

Suraj Joshi January 30, 2023
  1. Count Unique Values in a DataFrame Using Series.value_counts()
  2. Count Unique Values in a DataFrame Using DataFrame.nunique()
This tutorial explains how to count the unique values in a DataFrame using the Series.value_counts() and DataFrame.nunique() methods.

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df)

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

We will use the patients_df DataFrame, which holds the patients' names, appointment dates, and ages, to show how to count the unique values in a DataFrame.

Count Unique Values in a DataFrame Using Series.value_counts()

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print("The DataFrame is:")
print(patients_df, "\n")

print("No of appointments for each date:")
print(patients_df["Date"].value_counts())

Output:

The DataFrame is:
       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

No of appointments for each date:
2020-12-02    3
2020-12-01    2
2020-12-03    1
Name: Date, dtype: int64

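This shows how many times each unique value occurs in the Date column of the DataFrame, sorted from the most frequent value to the least frequent. In pandas 2.0 and later, the resulting Series is named count rather than Date, so the last line of the output reads Name: count.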
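As a small extension of the example above, the following sketch shows two optional variations of value_counts(): normalize=True returns relative frequencies instead of raw counts, and sort_index() orders the result by date instead of by frequency. The variable names date_counts, date_shares, and counts_by_date are chosen only for this illustration.

```python
import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

# Raw counts, most frequent date first (the default ordering)
date_counts = patients_df["Date"].value_counts()

# Relative frequencies: each count divided by the number of rows
date_shares = patients_df["Date"].value_counts(normalize=True)

# Same counts, but ordered chronologically by the date labels
counts_by_date = date_counts.sort_index()

print(date_shares, "\n")
print(counts_by_date)
```

Sorting by index is often easier to read when the values are dates, while the default frequency ordering is handy for spotting the busiest day.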

Count Unique Values in a DataFrame Using DataFrame.nunique()

import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

print(patients_df, "\n")

print(patients_df.groupby("Date").Name.nunique())

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

Date
2020-12-01    2
2020-12-02    3
2020-12-03    1
Name: Name, dtype: int64

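This splits the DataFrame by the value of the Date column, so that rows with the same Date fall into the same group, and then counts the number of distinct names in each group. The result is the number of different patients seen on each date. Because every name in patients_df appears only once, these counts match the ones returned by value_counts(); if a patient had two appointments on the same date, nunique() would count that patient only once.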
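The example above applies nunique() to a single grouped column. As a complementary sketch, nunique() can also be called directly on the whole DataFrame to get the number of distinct values in every column, or on one column to get a single number. The variable names unique_per_column and unique_dates are chosen only for this illustration.

```python
import pandas as pd

patients_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Date": [
            "2020-12-01",
            "2020-12-01",
            "2020-12-02",
            "2020-12-02",
            "2020-12-02",
            "2020-12-03",
        ],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

# Number of distinct values in each column, returned as a Series
unique_per_column = patients_df.nunique()

# Number of distinct values in a single column, returned as an integer
unique_dates = patients_df["Date"].nunique()

print(unique_per_column, "\n")
print("Distinct appointment dates:", unique_dates)
```

Note that nunique() ignores missing values by default; passing dropna=False counts NaN as a value of its own.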


Suraj Joshi is a backend software engineer at Matrice.ai.
