Pandas loc vs iloc

Suraj Joshi | January 30, 2023 | Pandas, Pandas Filter
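  1. Use .loc to Select a Specific Value by Index and Column Label
  2. Use .loc to Select Specific Columns from a DataFrame
  3. Use .loc to Filter Rows by Applying a Condition to a Column
  4. Use iloc to Filter Rows by Integer Position
  5. Filter Specific Rows and Columns from a DataFrame
  6. Use iloc to Filter a Range of Rows and Columns
  7. Comparing Pandas loc and iloc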

This tutorial explains how to filter data from a Pandas DataFrame using loc and iloc in Python. iloc selects elements by the integer positions of rows and columns, whereas loc selects them by row and column labels.

To demonstrate the filtering, we use the DataFrame built in the example below, which is indexed by student roll number.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print(student_df)

Output:

        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Use .loc to Select a Specific Value by Index and Column Label

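We can pass an index label and a column label to .loc to extract the value at that row and column. Note that .loc is an indexer, so its arguments go in square brackets rather than parentheses.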

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)
print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The Grade of student with Roll No. 504 is:")
value = student_df.loc[504, "Grade"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The Grade of student with Roll No. 504 is:
A-

This selects the value whose index label is 504 and whose column label is Grade. The first argument to .loc is the index label and the second is the column label.
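Because .loc looks rows up by label, an integer passed to it is treated as an index label rather than a row position. The following is a minimal sketch on a shortened copy of the same DataFrame; the variable names are illustrative, and .at is shown only as another pandas accessor for reading a single value by label.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# .loc treats 504 as an index label, not as a row position.
grade_loc = student_df.loc[504, "Grade"]

# .at also reads a single value by row and column label.
grade_at = student_df.at[504, "Grade"]

# 0 is not one of the index labels (501 to 505), so .loc raises KeyError.
try:
    student_df.loc[0, "Grade"]
    label_found = True
except KeyError:
    label_found = False

print(grade_loc, grade_at, label_found)
```

If the index were the default 0, 1, 2, ... range, the label 0 would exist and the same lookup would succeed, which is why the two indexers are easy to confuse on unlabelled data.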

Use .loc to Select Specific Columns from a DataFrame

We can also use .loc to select the columns we need from a DataFrame. We pass the list of required column labels as the second argument.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("The name and age of students in the DataFrame are:")
value = student_df.loc[:, ["Name", "Age"]]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

The name and age of students in the DataFrame are:
        Name  Age
501    Alice   17
502   Steven   20
503  Neesham   18
504    Chris   21
505    Alice   15

The first argument to .loc is :, which selects every row in the DataFrame. The second argument, ["Name", "Age"], restricts the selection to the Name and Age columns.
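One detail worth checking when selecting columns is the type of the result. As a small sketch (the variable names are illustrative), a single column label gives a Series, while a list of labels gives a DataFrame even when the list has one element:

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
    },
    index=[501, 502, 503, 504, 505],
)

# A single column label returns a one-dimensional Series.
name_series = student_df.loc[:, "Name"]

# A list of labels, even with one element, returns a DataFrame.
name_frame = student_df.loc[:, ["Name"]]

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(name_frame.shape)
```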

Use .loc to Filter Rows by Applying a Condition to a Column

We can also use .loc to keep only the rows whose column values satisfy a given condition.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Students with Grade A are:")
value = student_df.loc[student_df.Grade == "A"]
print(value)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Students with Grade A are:
      Name  Age      City Grade
501  Alice   17  New York     A
505  Alice   15    Austin     A

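The comparison student_df.Grade == "A" produces a Boolean Series, and .loc keeps the rows where it is True, so the result contains all students whose grade is exactly A.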
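The same idea extends to several conditions at once. The sketch below is an illustration rather than part of the original example: it builds two Boolean masks (the names is_adult and has_a_grade are made up here), combines them with the element-wise & operator, and also passes a column list so that only Name and Grade are returned.

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# One Boolean mask per condition.
is_adult = student_df["Age"] >= 18
has_a_grade = student_df["Grade"].str.startswith("A")

# & combines the masks element-wise; the second argument picks the columns.
adult_a_students = student_df.loc[is_adult & has_a_grade, ["Name", "Grade"]]
print(adult_a_students)
```

Only roll number 504 (Chris, grade A-) is both 18 or older and has a grade starting with A, so the result has a single row.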

Use iloc to Filter Rows by Integer Position

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("2nd and 3rd rows in the DataFrame:")
filtered_rows = student_df.iloc[[1, 2]]
print(filtered_rows)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

2nd and 3rd rows in the DataFrame:
        Name  Age      City Grade
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+

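This selects the 2nd and 3rd rows of the DataFrame.

We pass the integer positions of the rows to iloc to select them. Here the 2nd and 3rd rows have integer positions 1 and 2, because positions are counted from 0.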
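Since iloc works purely on positions, it also accepts negative integers that count from the end, and it rejects positions that do not exist. A brief sketch, with illustrative variable names, on a shortened copy of the DataFrame:

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
    },
    index=[501, 502, 503, 504, 505],
)

# A single integer returns that row as a Series; -1 is the last row.
last_row = student_df.iloc[-1]

# A list of integers returns a DataFrame, here the first and last rows.
first_and_last = student_df.iloc[[0, -1]]

# Position 5 does not exist in a five-row DataFrame, so iloc raises IndexError.
try:
    student_df.iloc[5]
    in_bounds = True
except IndexError:
    in_bounds = False

print(last_row["Name"], last_row["Age"])
print(first_and_last.index.tolist())
print(in_bounds)
```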

Filter Specific Rows and Columns from a DataFrame

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

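This selects the first and last columns, Name and Grade, of the 2nd, 3rd, and 4th rows. We pass the list of row positions as the first argument to iloc and the list of column positions as the second.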
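Hard-coded positions such as [0, 3] are easy to get wrong when columns are added or reordered. One possible approach, sketched here as an illustration and not taken from the original example, is to look the positions up from labels with Index.get_indexer and then pass them to iloc:

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Translate labels into integer positions.
row_positions = student_df.index.get_indexer([502, 503, 504])
column_positions = student_df.columns.get_indexer(["Name", "Grade"])

# Use the looked-up positions with iloc.
filtered_values = student_df.iloc[row_positions, column_positions]

print(list(row_positions), list(column_positions))
print(filtered_values)
```

Note that get_indexer returns -1 for a label it cannot find, so a real script would probably want to check for that before indexing.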

Use iloc to Filter a Range of Rows and Columns

To select a range of rows and columns, we can use slice notation and pass one slice for the rows and one for the columns to iloc.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame:")
filtered_values = student_df.iloc[1:4, 0:2]
print(filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

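This selects the 2nd, 3rd, and 4th rows and the 1st and 2nd columns of the DataFrame. 1:4 covers the rows at positions 1 to 3, because the end position 4 is exclusive. Likewise, 0:2 covers the columns at positions 0 and 1.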
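Slices behave differently with loc: a label slice includes its end label, whereas an iloc slice excludes its end position. The sketch below (variable names are illustrative) selects the same block both ways and compares the results:

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=[501, 502, 503, 504, 505],
)

# Position slices: the end positions 4 and 2 are excluded.
by_position = student_df.iloc[1:4, 0:2]

# Label slices: the end labels 504 and "Age" are included.
by_label = student_df.loc[502:504, "Name":"Age"]

print(by_position.equals(by_label))
```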

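Comparing Pandas loc and iloc

To select rows and columns with loc, we pass the labels of the rows and columns we want. Likewise, to select values with iloc, we pass the integer positions of the rows and columns we want.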

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
        "City": ["New York", "Portland", "Boston", "Seattle", "Austin"],
        "Grade": ["A", "B-", "B+", "A-", "A"],
    },
    index=roll_no,
)

print("The DataFrame of students with marks is:")
print(student_df)
print("")
print("Filtered values from the DataFrame using loc:")
loc_filtered_values = student_df.loc[[502, 503, 504], ["Name", "Age"]]
print(loc_filtered_values)
print("")
print("Filtered values from the DataFrame using iloc:")
iloc_filtered_values = student_df.iloc[[1, 2, 3], [0, 3]]
print(iloc_filtered_values)

Output:

The DataFrame of students with marks is:
        Name  Age      City Grade
501    Alice   17  New York     A
502   Steven   20  Portland    B-
503  Neesham   18    Boston    B+
504    Chris   21   Seattle    A-
505    Alice   15    Austin     A

Filtered values from the DataFrame using loc:
        Name  Age
502   Steven   20
503  Neesham   18
504    Chris   21

Filtered values from the DataFrame using iloc:
        Name Grade
502   Steven    B-
503  Neesham    B+
504    Chris    A-

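This shows how loc and iloc select the same rows, 502 to 504, by label and by integer position respectively. Note that the column selections differ: the loc call picks Name and Age, whereas the iloc call picks Name and Grade (positions 0 and 3). To reproduce the loc result with iloc, pass [0, 1] as the column positions.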
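The difference between the two indexers becomes most visible once the row order changes. As a hedged illustration (the sorting step and variable names are additions, not part of the original example), sorting by Age moves the rows around, so iloc[0] now returns a different student, while loc[501] still returns the student with roll number 501:

```python
import pandas as pd

student_df = pd.DataFrame(
    {
        "Name": ["Alice", "Steven", "Neesham", "Chris", "Alice"],
        "Age": [17, 20, 18, 21, 15],
    },
    index=[501, 502, 503, 504, 505],
)

# Sorting reorders the rows but keeps each row's index label.
sorted_df = student_df.sort_values("Age")

# iloc follows position: the first row is now the youngest student.
first_by_position = sorted_df.iloc[0]

# loc follows the label: roll number 501 is still the same student.
row_by_label = sorted_df.loc[501]

print(first_by_position.name, first_by_position["Age"])
print(row_by_label.name, row_by_label["Age"])
```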

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
