Pandas loc vs iloc

Suraj Joshi 2023年1月30日 Pandas Pandas Filter
  1. 使用 .loc() 方法从 DataFrame 中选择指定索引和列标签的特定值
  2. 使用 .loc() 方法从 DataFrame 中选择特定的列
  3. 使用 .loc() 方法通过对列应用条件来过滤行
  4. 使用 iloc 通过索引来过滤行
  5. 从 DataFrame 中过滤特定的行和列
  6. 使用 iloc 方法从 DataFrame 中过滤行和列的范围
  7. Pandas lociloc 的比较
Pandas loc vs iloc

本教程介绍了如何使用 Python 中的 lociloc 从 Pandas DataFrame 中过滤数据。要使用 iloc 从 DataFrame 中过滤元素,我们使用行和列的整数索引,而要使用 loc 从 DataFrame 中过滤元素,我们使用行名和列名。

为了演示使用 loc 的数据过滤,我们将使用下面例子中描述的 DataFrame。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

输出:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

使用 .loc() 方法从 DataFrame 中选择指定索引和列标签的特定值

我们可以将索引标签和列标签作为参数传递给 .loc() 方法,以提取给定索引和列标签对应的值。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

输出:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

在 DataFrame 中选择索引标签为 504 且列标签为 Grade 的值。.loc() 方法的第一个参数代表索引名,第二个参数是指列名。

使用 .loc() 方法从 DataFrame 中选择特定的列

我们还可以使用 .loc() 方法从 DataFrame 中过滤所需的列。我们将所需的列名列表作为第二个参数传递给 .loc() 方法来过滤指定的列。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

输出:

The DataFrame of students with marks is:
        Name Age      City Grade
501    Alice   17 New York     A
502   Steven   20 Portland    B-
503 Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name Age
501    Alice   17
502   Steven   20
503 Neesham   18
504    Chris   21
505    Alice   15

.loc() 的第一个参数是:,它表示 DataFrame 中的所有行。同样,我们将 ["Name", "Age"] 作为第二个参数传递给 .loc() 方法,表示只选择 DataFrame 中的 NameAge 列。

使用 .loc() 方法通过对列应用条件来过滤行

我们也可以使用 .loc() 方法过滤满足指定条件的列值的行。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

输出:

The DataFrame of students with marks is:
        Name Age      City Grade
501    Alice   17 New York     A
502   Steven   20 Portland    B-
503 Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name Age      City Grade
501 Alice   17 New York     A
505 Alice   15    Austin     A

它选择了 DataFrame 中所有成绩为 A 的学生。

使用 iloc 通过索引来过滤行

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

输出:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

它从 DataFrame 中过滤第 2 和第 3 行。

我们将行的整数索引作为参数传递给 iloc 方法,以便从 DataFrame 中过滤行。在这里,第二和第三行的整数索引分别是 12,因为索引从 0 开始。

从 DataFrame 中过滤特定的行和列

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

输出:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

它从 DataFrame 中过滤第 2、3、4 行的第一列和最后一列,即 NameGrade。我们将行的整数索引列表作为第一个参数,列的整数索引列表作为第二个参数传递给 iloc 方法。

使用 iloc 方法从 DataFrame 中过滤行和列的范围

为了过滤行和列的范围,我们可以使用列表切片,并将每行和每列的切片作为参数传递给 iloc 方法。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

输出:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

它从 DataFrame 中选择第 2、3、4 行和第 1、2 列。1:4 代表索引范围从 13 的行,4 在范围内是排他性的。同理,0:2 代表索引范围从 01 的列。

Pandas lociloc 的比较

要使用 loc() 从 DataFrame 中过滤行和列,我们需要传递要过滤掉的行和列的名称。同样,我们需要传递要过滤掉的行和列的整数索引以使用 iloc() 来过滤值。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame using loc:")
iloc_filtered_values = student_df.loc[[502, 503, 504], ["Name", "Age"]]
print(iloc_filtered_values)
print("")
print("Filtered values from the DataFrame using iloc:")
iloc_filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(iloc_filtered_values)
The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame using loc:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

Filtered values from the DataFrame using iloc:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

它显示了我们如何使用 lociloc 从 DataFrame 中过滤相同的值。

Enjoying our tutorials? Subscribe to DelftStack on YouTube to support us in creating more high-quality video guides. Subscribe
作者: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

相关文章 - Pandas Filter