Pandas loc vs iloc

Suraj Joshi, January 30, 2023

Select a Particular Value from a DataFrame by Index and Column Label Using .loc
Select Particular Columns from a DataFrame Using .loc
Filter Rows by Applying a Condition to Columns Using .loc
Filter Rows by Integer Position Using iloc
Filter Particular Rows and Columns from a DataFrame Using iloc
Filter a Range of Rows and Columns from a DataFrame Using iloc

This tutorial explains how to filter data from a Pandas DataFrame using loc and iloc in Python. With iloc, we select entries by the integer position of rows and columns (counting from 0); with loc, we select entries by row and column labels.
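The difference is easiest to see when the index labels are integers that do not match the row positions. Below is a minimal sketch using a small hypothetical DataFrame, df, whose index labels are 10, 20 and 30:

```python
import pandas as pd

# Hypothetical example frame: integer index labels that differ from row positions
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

# .loc looks up the row whose *label* is 20 and the column labelled "value"
by_label = df.loc[20, "value"]

# .iloc looks up the row at *position* 0 and the column at position 0
by_position = df.iloc[0, 0]

print(by_label)     # b
print(by_position)  # a
```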

To demonstrate filtering with loc and iloc, we will use the DataFrame created in the following example.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Select a Particular Value from a DataFrame by Index and Column Label Using `.loc`

We can pass a row (index) label and a column label to .loc, in the form student_df.loc[row_label, column_label], to extract the value at that row and column.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

This selects the value in the DataFrame whose index label is 504 and whose column label is Grade. The first argument to .loc is the row (index) label, and the second is the column label.
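The same cell can also be reached by position with iloc: index label 504 is the fourth row (position 3) and Grade is the fourth column (position 3). A minimal sketch, rebuilding student_df so it runs on its own; it also shows that passing the label 504 to iloc fails, because 504 is not a valid position:

```python
import pandas as pd

roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

# Label-based: row label 504, column label "Grade"
by_label = student_df.loc[504, "Grade"]

# Position-based: 4th row (position 3), 4th column (position 3)
by_position = student_df.iloc[3, 3]

print(by_label, by_position)  # A- A-

# 504 is a label, not a position, so iloc raises an IndexError
raised = False
try:
    student_df.iloc[504, 3]
except IndexError:
    raised = True
print(raised)  # True
```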

Select Particular Columns from a DataFrame Using `.loc`

We can also select specific columns from the DataFrame using .loc. To do so, we pass the list of the required column names as the second argument to .loc.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first argument to .loc is :, which selects all rows in the DataFrame. The second argument, ["Name", "Age"], selects only the Name and Age columns.
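Whether the column selector is a single label or a list affects the type of the result: a single label returns a one-dimensional Series, while a list (even one with a single element) returns a DataFrame. A minimal sketch on a trimmed-down version of student_df:

```python
import pandas as pd

# Trimmed-down version of the tutorial's DataFrame
student_df = pd.DataFrame(
    {"Name": ["Alice", "Steven", "Neesham"], "Age": [17, 20, 18]},
    index=[501, 502, 503],
)

# A single column label gives a one-dimensional Series
names_series = student_df.loc[:, "Name"]

# A list of labels (here with one element) gives a two-dimensional DataFrame
names_frame = student_df.loc[:, ["Name"]]

print(type(names_series).__name__)  # Series
print(type(names_frame).__name__)   # DataFrame
```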

Filter Rows by Applying a Condition to Columns Using `.loc`

We can also use .loc to filter the rows whose column values satisfy a given condition. The condition produces a boolean Series, and .loc keeps only the rows where it is True.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

This selects all students in the DataFrame with Grade A. The student with A- is not included, because the comparison is an exact string match.
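Several conditions can be combined with & (and) and | (or). Each comparison needs its own parentheses, because these operators bind more tightly than comparisons such as == and >. A column list can be passed as the second argument at the same time. A minimal sketch on student_df; the age threshold of 16 is an arbitrary choice for illustration:

```python
import pandas as pd

roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

# Boolean mask: older than 16 AND a grade starting with "A" (so A or A-)
mask = (student_df["Age"] > 16) & (student_df["Grade"].str.startswith("A"))

# Keep the rows where the mask is True, and only the Name and City columns
result = student_df.loc[mask, ["Name", "City"]]
print(result)
```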

Filter Rows by Integer Position Using `iloc`

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

This selects the second and third rows of the DataFrame.

We pass a list of integer row positions to iloc to select those rows. Here, the positions of the second and third rows are 1 and 2, respectively, because positions start at 0.
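Positions are independent of the index labels: positions 1 and 2 happen to hold the rows labelled 502 and 503. A minimal sketch on student_df confirming this, and showing that negative positions count from the end, as with Python lists:

```python
import pandas as pd

roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

# Positions 1 and 2 correspond to index labels 502 and 503
by_position = student_df.iloc[[1, 2]]
by_label = student_df.loc[[502, 503]]
print(by_position.equals(by_label))  # True

# Position -1 is the last row (label 505)
last_row = student_df.iloc[-1]
print(last_row["Name"])  # Alice
```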

Filter Particular Rows and Columns from a DataFrame Using `iloc`

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

This selects the first and last columns, that is, Name and Grade, from the second, third, and fourth rows of the DataFrame. We pass a list of integer row positions as the first argument to iloc and a list of integer column positions as the second.
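Because iloc accepts only integer positions, column names cannot be passed to it directly. One way to combine positional rows with named columns is to convert the names to positions first with Index.get_indexer. A minimal sketch on student_df, which also checks the result against the equivalent label-based .loc call:

```python
import pandas as pd

roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

# Translate column names into their integer positions
col_positions = student_df.columns.get_indexer(["Name", "Grade"])
print(col_positions)  # [0 3]

# Positional rows combined with the translated column positions
mixed = student_df.iloc[[1, 2, 3], col_positions]

# The same selection expressed entirely with labels
same_with_loc = student_df.loc[[502, 503, 504], ["Name", "Grade"]]
print(mixed.equals(same_with_loc))  # True
```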

Filter a Range of Rows and Columns from a DataFrame Using `iloc`

To select a range of rows and columns, we can use Python slice notation, passing one slice for the rows and one for the columns to iloc.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

This selects the second, third, and fourth rows and the first and second columns of the DataFrame. The slice 1:4 covers the rows at positions 1 to 3, because the stop value 4 is excluded. Similarly, 0:2 covers the columns at positions 0 and 1.
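Slices behave differently in the two indexers: an iloc slice excludes its stop position, as in ordinary Python slicing, whereas a .loc slice by label includes both endpoints. A minimal sketch on student_df showing that the iloc slice above selects the same block as a .loc slice that ends at label 504 and column Age:

```python
import pandas as pd

roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

# Positional slices: stop positions 4 and 2 are excluded
by_position = student_df.iloc[1:4, 0:2]

# Label slices: both 504 and "Age" are included
by_label = student_df.loc[502:504, "Name":"Age"]

print(by_position.equals(by_label))  # True
print(by_label.shape)                # (3, 2)
```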