Pandas DataFrame DataFrame.merge() 函式
Suraj Joshi
2023年1月30日
-
pandas.DataFrame.merge()
語法 -
示例程式碼:
DataFrame.merge()
函式合併兩個DataFrame
-
示例程式碼:在
merge
方法中設定how
引數,使用各種技術合併DataFrame
-
示例程式碼:在 Pandas 中使用
DataFrame.merge()
函式只合並特定的列 -
示例程式碼:使用索引作為連線鍵來合併
DataFrame
Python Pandas DataFrame.merge()
函式合併 DataFrame
或命名的 Series 物件。
pandas.DataFrame.merge()
語法
DataFrame.merge(
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes="_x",
"_y",
copy=True,
indicator=False,
validate=None,
)
引數
right |
DataFrame 或命名的 Series 。要合併的物件 |
how |
left , right ,inner 或 outer 。如何進行合併操作 |
on |
標籤或列表。要合併的列或索引名稱 |
left_on |
標籤或列表。要合併在左側 DataFrame 中的列名或索引名。 |
right_on |
標籤或列表。列名或索引名要合併到右邊的 DataFrame 中。 |
left_index |
布林型。使用左側 DataFrame 的索引作為連線鍵(left_index=True ) |
right_index |
布林型。使用右側 DataFrame 的索引作為連線鍵(right_index=True ) |
sort |
布林型。在輸出中按字母順序對連線鍵進行排序(sort=True ) |
suffixes |
字尾分別應用於左側和右側重疊的列名 |
copy |
布林型。避免複製 copy=False 。 |
indicator |
在輸出的 DataFrame 中增加一列名為 _merge 的列,其中包含每行的來源資訊(indicator=True ),並在輸出的 DataFrame 中增加一列名為 string 的列(indicator=string ) |
validate |
檢查合併是否為指定型別的引數 |
返回值
它返回一個合併給定物件的 DataFrame
。
示例程式碼:DataFrame.merge()
函式合併兩個 DataFrame
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)
輸出:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1 5
1 Alish 3 7
它使用 SQL
的內連線技術將 df1
和 df2
合併為一個 DataFrame
。
對於 inner-join
方法,我們必須確保兩個 DataFrame
至少有一列是共同的。
在這裡,merge()
函式將把具有相同值的公共列的行連線到兩個 DataFrame
。
示例程式碼:在 merge
方法中設定 how
引數,使用各種技術合併 DataFrame
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)
輸出:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1.0 5
1 Alish 3.0 7
2 Zack NaN 6
3 Raphel NaN 8
它使用 SQL
的 right-join
技術將 df1
和 df2
合併為一個 DataFrame
。
在這裡,merge()
函式從右邊的 DataFrame
返回所有的行。然而,只存在於左側 DataFrame
中的行將得到 NaN
值。
同樣,我們也可以使用 how
引數的 left
和 outer
值。
示例程式碼:在 Pandas 中使用 DataFrame.merge()
函式只合並特定的列
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)
輸出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_x Pay Position_y
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
它只合並 df1
和 df2
的 Name
列。由於預設的連線方法是內連線,因此只有兩個 DataFrame
的共同行才會被連線。position
列是兩個 DataFrame
共同的,因此有兩個位置列,即 Position_x
和 Position_y
。
預設情況下,_x
和 _y
字尾被附加到重疊列的名稱中。我們可以使用 suffixes
引數指定字尾。
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)
輸出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_left Pay Position_right
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
示例程式碼:使用索引作為連線鍵來合併 DataFrame
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(
df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)
輸出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name_left Working Hours Position_left Name_right Pay Position_right
0 Suraj 1 Salesman Suraj 5 Salesman
1 Zeppy 2 CEO Zack 6 CEO
2 Alish 3 Manager Alish 7 Manager
3 Sarah 5 Sales Head Raphel 8 Sales Head
它合併兩個 DataFrame
的相應行,不考慮列的相似性。如果兩個 DataFrame
上出現相同的列名,則在合併後將字尾附加到列名上,使之成為不同的列。
作者: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn