在 Pandas join 方法中指定字尾

Suraj Joshi 2023年1月30日
  1. 使用 DataFrame.join() 方法連線兩個 DataFrame
  2. 使用 DataFrame.join() 方法連線具有共同列名的 DataFrame
在 Pandas join 方法中指定字尾

本教程解釋瞭如何在 Pandas 中使用 DataFrame.join() 方法加入兩個 DataFrame,並在加入時指定字尾。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df)

輸出:

Student DataFrame:
       Name  Gender  Age
0  Jennifer  Female   17
1    Travis    Male   18
2       Bob    Male   17
3      Emma  Female   16
4      Luna  Female   18
5     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+

我們將通過演示連線 students_dfgrades_df 兩個 DataFrame 來解釋 DataFrame.join() 方法。

使用 DataFrame.join() 方法連線兩個 DataFrame

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

joined_df = student_df.join(grades_df)

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Joined DataFrame:")
print(joined_df, "\n")

輸出:

Student DataFrame:
       Name  Gender  Age
0  Jennifer  Female   17
1    Travis    Male   18
2       Bob    Male   17
3      Emma  Female   16
4      Luna  Female   18
5     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Joined DataFrame:
       Name  Gender  Age  Roll No Grades
0  Jennifer  Female   17      501      A
1    Travis    Male   18      502     B+
2       Bob    Male   17      503     A-
3      Emma  Female   16      504      A
4      Luna  Female   18      505      B
5     Anish    Male   16      506     A+ 

它將 student_dfgrades_df 連線起來,並建立 joined_df。預設情況下,join() 方法使用兩個 DataFrame 的索引來連線它們。加入方法預設為 Left Join。在這裡,左邊 DataFrame 中的所有行,即 student_df 被儲存在 joined_df 中,而右邊 DataFrame 中的一行與左邊 DataFrame 中的一行具有相同的索引值,被加入並放置在同一行中。

使用 DataFrame.join() 方法連線具有共同列名的 DataFrame

如果我們使用 DataFrame.join() 方法試圖連線的兩個 DataFrames 中都有一個名稱相同的列,我們會得到一個錯誤資訊 ValueError: columns overlap but no suffix specified。我們可以在 DataFrame.join() 方法中設定 lsuffixrsuffix 引數的值來解決這個錯誤。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

joined_df = student_df.join(grades_df, lsuffix="_left", rsuffix="_right")

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Joined DataFrame:")
print(joined_df, "\n")

輸出:

Student DataFrame:
   Roll No      Name  Gender  Age
0      501  Jennifer  Female   17
1      502    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Joined DataFrame:
   Roll No_left      Name  Gender  Age  Roll No_right Grades
0           501  Jennifer  Female   17            501      A
1           502    Travis    Male   18            502     B+
2           503       Bob    Male   17            503     A-
3           504      Emma  Female   16            504      A
4           505      Luna  Female   18            505      B
5           506     Anish    Male   16            506     A+ 

它將 grades_df 加入到 student_df 的右邊。DataFrame.join() 並不能合併各個 DataFrame,也就是說,即使 Roll No 列是兩個 DataFrame 共同的,它們也會在 join() 之後被作為單獨的欄位放置。為了區分具有共同名稱的列名,我們使用 lsuffixrsuffix 引數為左右兩個 DataFrame 中的列提供字尾。

另外,我們也可以通過 DataFrame.merge() 方法,將常用的列名作為 on 引數傳入該方法中來解決這個問題。

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

merged_df = student_df.merge(grades_df, on="Roll No")

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Merged DataFrame:")
print(merged_df, "\n")

輸出:

Student DataFrame:
   Roll No      Name  Gender  Age
0      501  Jennifer  Female   17
1      502    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Merged DataFrame:
   Roll No      Name  Gender  Age Grades
0      501  Jennifer  Female   17      A
1      502    Travis    Male   18     B+
2      503       Bob    Male   17     A-
3      504      Emma  Female   16      A
4      505      Luna  Female   18      B
5      506     Anish    Male   16     A+ 

它將 DataFrames student_dfgrades_df 合併成一個 DataFrame。在這種情況下,Roll No 列將被合併為兩個 DataFrame 的單一列。

作者: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn