Pandas loc vs iloc

Suraj Joshi, 30 January 2023

Select a particular value from the DataFrame by specifying the index and column label using .loc
Select particular columns from the DataFrame using .loc
Filter rows by applying a condition to columns using .loc
Filter rows by integer position using iloc
Filter particular rows and columns from the DataFrame
Filter a range of rows and columns from the DataFrame using iloc
Pandas loc vs iloc

This tutorial explains how to filter data from a Pandas DataFrame using loc and iloc in Python. With iloc, we select rows and columns by integer position; with loc, we select them by row and column label.

To demonstrate filtering with loc and iloc, we will use the DataFrame defined in the following example.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A
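
To make the label versus position distinction concrete, the following minimal sketch uses the same student_df and fetches the same row both ways. The variable names by_label, by_position, and label_one_found are illustrative only.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# loc selects the row whose index label is 502.
by_label = student_df.loc[502]

# iloc selects the row at integer position 1, i.e. the second row.
by_position = student_df.iloc[1]

# Both expressions return the same row (Steven's record).
print(by_label.equals(by_position))

# 1 is a valid position but not a label in this index,
# so loc raises a KeyError for it.
try:
    student_df.loc[1]
    label_one_found = True
except KeyError:
    label_one_found = False
print(label_one_found)
```

Because the index here holds roll numbers rather than 0, 1, 2, ..., a label and a position for the same row are different integers, which is where the two indexers are most easily confused.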

Select a particular value from the DataFrame by specifying the index and column label using `.loc`

We can pass an index label and a column label inside the square brackets of .loc to extract the value stored at that row and column.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

This selects the value in the DataFrame at index label 504 and column label Grade. The first entry passed to .loc is the index label, and the second is the column name.

Select particular columns from the DataFrame using `.loc`
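
The shape of what .loc returns depends on how the labels are passed. The sketch below, which assumes the same student_df, compares a single label pair, a single row label, and lists of labels; the names cell, row, and sub are hypothetical.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# A row label plus a column label returns a single value.
cell = student_df.loc[504, "Grade"]

# A row label on its own returns the whole row as a Series.
row = student_df.loc[504]

# Lists of labels return a DataFrame, even for one row and one column.
sub = student_df.loc[[504], ["Grade"]]

print(cell)
print(type(row).__name__)
print(type(sub).__name__, sub.shape)
```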

We can also select specific columns from the DataFrame using .loc. To do so, we pass the list of required column names as the second entry inside the brackets of .loc.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first entry passed to .loc is :, which denotes all rows of the DataFrame. The second entry, ["Name", "Age"], selects only the Name and Age columns.

Filter rows by applying a condition to columns using `.loc`

We can also use .loc to select the rows whose column values satisfy a given condition.
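
Lists of labels can be used for rows and columns at the same time, and the result follows the order of the lists rather than the order in the DataFrame. The following is a small sketch under the same student_df; the name reordered is illustrative.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Rows 505 and 501, columns Age and Name, in exactly the order requested.
reordered = student_df.loc[[505, 501], ["Age", "Name"]]
print(reordered)
```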

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

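This selects all students in the DataFrame whose Grade is A. The expression student_df.Grade == "A" produces a boolean Series, and .loc keeps only the rows where it is True.

Filter rows by integer position using `iloc`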
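
Conditions can also be combined, and a column list can be supplied alongside the row condition. The sketch below is one possible illustration with the same student_df; the age threshold of 16 and the name older_a_students are arbitrary choices made for this example.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Each condition needs its own parentheses because & binds more
# tightly than the comparison operators.
mask = (student_df["Grade"] == "A") & (student_df["Age"] > 16)

# Keep only the matching rows, and only the Name and City columns.
older_a_students = student_df.loc[mask, ["Name", "City"]]
print(older_a_students)
```

Only roll number 501 satisfies both conditions, since the student with roll number 505 has Grade A but is 15.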

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

This selects the second and third rows of the DataFrame.

We pass the integer positions of the rows to iloc to select them. Here, the positions of the second and third rows are 1 and 2 respectively, because positions start at 0.

Filter particular rows and columns from the DataFrame
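
Since iloc works on positions, it also accepts negative integers that count from the end, as Python lists do. A brief sketch with the same student_df follows; last_row and first_and_last are illustrative names.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# A single position returns that row as a Series; -1 is the last row.
last_row = student_df.iloc[-1]

# A list of positions returns a DataFrame with those rows.
first_and_last = student_df.iloc[[0, -1]]

print(last_row["City"])
print(first_and_last)
```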

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

This selects the first and last columns, Name and Grade, for the second, third, and fourth rows of the DataFrame. We pass the list of integer row positions as the first entry and the list of integer column positions as the second entry to iloc.

Filter a range of rows and columns from the DataFrame using `iloc`

To select a range of rows and columns, we can use slice notation and pass one slice for the rows and one for the columns to iloc.
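
The same selection can be written with labels instead of positions. The following sketch, assuming the same student_df, translates the positions into labels through student_df.index and student_df.columns and checks that both routes agree; the variable names are hypothetical.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Selection by integer position.
by_position = student_df.iloc[[1, 2, 3], [0, 3]]

# Look up the labels that sit at those positions.
row_labels = student_df.index[[1, 2, 3]]      # 502, 503, 504
col_labels = student_df.columns[[0, 3]]       # Name, Grade

# Selection by label gives the same sub-table.
by_label = student_df.loc[row_labels, col_labels]

print(by_position.equals(by_label))
```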

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

This selects the second, third, and fourth rows and the first and second columns of the DataFrame. The slice 1:4 covers the rows at positions 1 through 3, because the end position 4 is excluded. Likewise, 0:2 covers the columns at positions 0 and 1.
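
One difference worth keeping in mind is that label slices with loc include the end label, whereas position slices with iloc exclude the end position. The sketch below, again using the same student_df, shows two slices that select the same block; pos_slice and label_slice are illustrative names.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Position slice: rows 1, 2, 3 and columns 0, 1 (end positions excluded).
pos_slice = student_df.iloc[1:4, 0:2]

# Label slice: rows 502 through 504 and columns Name through Age
# (end labels included).
label_slice = student_df.loc[502:504, "Name":"Age"]

print(pos_slice.equals(label_slice))
```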