Pandas DataFrame DataFrame.merge() 函数

Suraj Joshi 2023年1月30日
  1. pandas.DataFrame.merge() 语法
  2. 示例代码:DataFrame.merge() 函数合并两个 DataFrame
  3. 示例代码:在 merge 方法中设置 how 参数,使用各种技术合并 DataFrame
  4. 示例代码:在 Pandas 中使用 DataFrame.merge() 函数只合并特定的列
  5. 示例代码:使用索引作为连接键来合并 DataFrame
Pandas DataFrame DataFrame.merge() 函数

Python Pandas DataFrame.merge() 函数合并 DataFrame 或命名的 Series 对象。

pandas.DataFrame.merge() 语法

DataFrame.merge(
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes="_x",
    "_y",
    copy=True,
    indicator=False,
    validate=None,
)

参数

right DataFrame 或命名的 Series。要合并的对象
how left, right,innerouter。如何进行合并操作
on 标签或列表。要合并的列或索引名称
left_on 标签或列表。要合并在左侧 DataFrame 中的列名或索引名。
right_on 标签或列表。列名或索引名要合并到右边的 DataFrame 中。
left_index 布尔型。使用左侧 DataFrame 的索引作为连接键(left_index=True)
right_index 布尔型。使用右侧 DataFrame 的索引作为连接键(right_index=True)
sort 布尔型。在输出中按字母顺序对连接键进行排序(sort=True)
suffixes 后缀分别应用于左侧和右侧重叠的列名
copy 布尔型。避免复制 copy=False
indicator 在输出的 DataFrame 中增加一列名为 _merge 的列,其中包含每行的来源信息(indicator=True),并在输出的 DataFrame 中增加一列名为 string 的列(indicator=string)
validate 检查合并是否为指定类型的参数

返回值

它返回一个合并给定对象的 DataFrame

示例代码:DataFrame.merge() 函数合并两个 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)

输出:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
    Name  Working Hours  Pay
0  Suraj              1    5
1  Alish              3    7

它使用 SQL 的内连接技术将 df1df2 合并为一个 DataFrame

对于 inner-join 方法,我们必须确保两个 DataFrame 至少有一列是共同的。

在这里,merge() 函数将把具有相同值的公共列的行连接到两个 DataFrame

示例代码:在 merge 方法中设置 how 参数,使用各种技术合并 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)

输出:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
     Name  Working Hours  Pay
0   Suraj            1.0    5
1   Alish            3.0    7
2    Zack            NaN    6
3  Raphel            NaN   8 

它使用 SQLright-join 技术将 df1df2 合并为一个 DataFrame

在这里,merge() 函数从右边的 DataFrame 返回所有的行。然而,只存在于左侧 DataFrame 中的行将得到 NaN 值。

同样,我们也可以使用 how 参数的 leftouter 值。

示例代码:在 Pandas 中使用 DataFrame.merge() 函数只合并特定的列

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)

输出:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_x  Pay Position_y
0  Suraj              1   Salesman    5   Salesman
1  Alish              3    Manager    7    Manager

它只合并 df1df2Name 列。由于默认的连接方法是内连接,因此只有两个 DataFrame 的共同行才会被连接。position 列是两个 DataFrame 共同的,因此有两个位置列,即 Position_xPosition_y

默认情况下,_x_y 后缀被附加到重叠列的名称中。我们可以使用 suffixes 参数指定后缀。

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)

输出:

1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_left  Pay Position_right
0  Suraj              1      Salesman    5       Salesman
1  Alish              3       Manager    7        Manager

示例代码:使用索引作为连接键来合并 DataFrame

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(
    df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)

输出:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
  Name_left  Working Hours Position_left Name_right  Pay Position_right
0     Suraj              1      Salesman      Suraj    5       Salesman
1     Zeppy              2           CEO       Zack    6            CEO
2     Alish              3       Manager      Alish    7        Manager
3     Sarah              5    Sales Head     Raphel    8     Sales Head

它合并两个 DataFrame 的相应行,不考虑列的相似性。如果两个 DataFrame 上出现相同的列名,则在合并后将后缀附加到列名上,使之成为不同的列。

作者: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn

相关文章 - Pandas DataFrame