Pandas DataFrame DataFrame.merge() 函式

Suraj Joshi 2023年1月30日
  1. pandas.DataFrame.merge() 語法
  2. 示例程式碼:DataFrame.merge() 函式合併兩個 DataFrame
  3. 示例程式碼:在 merge 方法中設定 how 引數,使用各種技術合併 DataFrame
  4. 示例程式碼:在 Pandas 中使用 DataFrame.merge() 函式只合並特定的列
  5. 示例程式碼:使用索引作為連線鍵來合併 DataFrame
Pandas DataFrame DataFrame.merge() 函式

Python Pandas DataFrame.merge() 函式合併 DataFrame 或命名的 Series 物件。

pandas.DataFrame.merge() 語法

DataFrame.merge(
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes="_x",
    "_y",
    copy=True,
    indicator=False,
    validate=None,
)

引數

right DataFrame 或命名的 Series。要合併的物件
how left, right,innerouter。如何進行合併操作
on 標籤或列表。要合併的列或索引名稱
left_on 標籤或列表。要合併在左側 DataFrame 中的列名或索引名。
right_on 標籤或列表。列名或索引名要合併到右邊的 DataFrame 中。
left_index 布林型。使用左側 DataFrame 的索引作為連線鍵(left_index=True)
right_index 布林型。使用右側 DataFrame 的索引作為連線鍵(right_index=True)
sort 布林型。在輸出中按字母順序對連線鍵進行排序(sort=True)
suffixes 字尾分別應用於左側和右側重疊的列名
copy 布林型。避免複製 copy=False
indicator 在輸出的 DataFrame 中增加一列名為 _merge 的列,其中包含每行的來源資訊(indicator=True),並在輸出的 DataFrame 中增加一列名為 string 的列(indicator=string)
validate 檢查合併是否為指定型別的引數

返回值

它返回一個合併給定物件的 DataFrame

示例程式碼:DataFrame.merge() 函式合併兩個 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)

輸出:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
    Name  Working Hours  Pay
0  Suraj              1    5
1  Alish              3    7

它使用 SQL 的內連線技術將 df1df2 合併為一個 DataFrame

對於 inner-join 方法,我們必須確保兩個 DataFrame 至少有一列是共同的。

在這裡,merge() 函式將把具有相同值的公共列的行連線到兩個 DataFrame

示例程式碼:在 merge 方法中設定 how 引數,使用各種技術合併 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)

輸出:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
     Name  Working Hours  Pay
0   Suraj            1.0    5
1   Alish            3.0    7
2    Zack            NaN    6
3  Raphel            NaN   8 

它使用 SQLright-join 技術將 df1df2 合併為一個 DataFrame

在這裡,merge() 函式從右邊的 DataFrame 返回所有的行。然而,只存在於左側 DataFrame 中的行將得到 NaN 值。

同樣,我們也可以使用 how 引數的 leftouter 值。

示例程式碼:在 Pandas 中使用 DataFrame.merge() 函式只合並特定的列

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)

輸出:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_x  Pay Position_y
0  Suraj              1   Salesman    5   Salesman
1  Alish              3    Manager    7    Manager

它只合並 df1df2Name 列。由於預設的連線方法是內連線,因此只有兩個 DataFrame 的共同行才會被連線。position 列是兩個 DataFrame 共同的,因此有兩個位置列,即 Position_xPosition_y

預設情況下,_x_y 字尾被附加到重疊列的名稱中。我們可以使用 suffixes 引數指定字尾。

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)

輸出:

1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_left  Pay Position_right
0  Suraj              1      Salesman    5       Salesman
1  Alish              3       Manager    7        Manager

示例程式碼:使用索引作為連線鍵來合併 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(
    df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)

輸出:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
  Name_left  Working Hours Position_left Name_right  Pay Position_right
0     Suraj              1      Salesman      Suraj    5       Salesman
1     Zeppy              2           CEO       Zack    6            CEO
2     Alish              3       Manager      Alish    7        Manager
3     Sarah              5    Sales Head     Raphel    8     Sales Head

它合併兩個 DataFrame 的相應行,不考慮列的相似性。如果兩個 DataFrame 上出現相同的列名,則在合併後將字尾附加到列名上,使之成為不同的列。

作者: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

相關文章 - Pandas DataFrame