Pandas DataFrame DataFrame.merge() Function
-
Syntax of
pandas.DataFrame.merge()
: -
Example Codes:
DataFrame.merge()
Function to Merge TwoDataFrames
-
Example Codes: Set
how
Parameter in themerge
Method to Merge DataFrames Using Various Techniques -
Example Codes:Merge Only Specific Columns Using
DataFrame.merge()
Function in Pandas -
Example Codes: Use Index as the Join Keys for Merging DataFrames
DataFrame.merge()
Function
Python Pandas DataFrame.merge()
function merges DataFrame
or named Series objects.
Syntax of pandas.DataFrame.merge()
:
DataFrame.merge(
right,
how="inner",
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes="_x",
"_y",
copy=True,
indicator=False,
validate=None,
)
Parameters
right |
DataFrame or named Series. Object to merge with |
how |
left , right , inner or outer . How to perform merge operation |
on |
label or list. Column or index names to merge |
left_on |
label or list. Column or index names to merge in the left DataFrame |
right_on |
label or list. Column or index names to merge in the right DataFrame |
left_index |
Boolean. Use the index from the left DataFrame as the join key(left_index=True ) |
right_index |
Boolean. Use the index from the right DataFrame as the join key(right_index=True ) |
sort |
Boolean. Sort the join keys alphabetically in the output(sort=True ) |
suffixes |
Suffix to be applied to overlapping column names in the left and right side, respectively |
copy |
Boolean. Avoid copy for copy=False |
indicator |
add a column to output DataFrame called _merge with information on the source of each row(indicator=True ) and a column named string will be added to output DataFrame (indicator=string ) |
validate |
parameter to check if the merge is of a specified type. |
Return
It returns a DataFrame
merging the given objects.
Example Codes: DataFrame.merge()
Function to Merge Two DataFrames
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1 5
1 Alish 3 7
It merges df1
and df2
to a single DataFrame
using the inner-join
technique of SQL.
For the inner-join
method, we must ensure there is at least one column common to both DataFrames
.
Here, the merge()
function will join the rows having the same values of the common column to both DataFrames
.
Example Codes: Set how
Parameter in the merge
Method to Merge DataFrames Using Various Techniques
import pandas as pd
df1 = pd.DataFrame(
{"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1.0 5
1 Alish 3.0 7
2 Zack NaN 6
3 Raphel NaN 8
It merges df1
and df2
to a single DataFrame
using the right-join
technique of SQL.
Here, the merge()
function returns all rows from the right DataFrame
. However, the rows only present in the left DataFrame
will get the NaN
value.
Similarly, we can also use left
and outer
values of the how
parameter.
Example Codes:Merge Only Specific Columns Using DataFrame.merge()
Function in Pandas
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_x Pay Position_y
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
It merges only the Name
column of df1
and df2
. Since the default join method is inner-join
only the common rows to both DataFrame
will be joined. The Position
column is common to both DataFrames
and hence two-position columns viz. Position_x
and Position_y
.
By default, _x
and _y
suffixes are appended to the name of the overlapping column. We can specify suffixes using the suffixes
parameter.
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_left Pay Position_right
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
Example Codes: Use Index as the Join Keys for Merging DataFrames DataFrame.merge()
Function
import pandas as pd
df1 = pd.DataFrame(
{
"Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
"Working Hours": [1, 2, 3, 5],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
df2 = pd.DataFrame(
{
"Name": ["Suraj", "Zack", "Alish", "Raphel"],
"Pay": [5, 6, 7, 8],
"Position": ["Salesman", "CEO", "Manager", "Sales Head"],
}
)
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df = df1.merge(
df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)
Output:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name_left Working Hours Position_left Name_right Pay Position_right
0 Suraj 1 Salesman Suraj 5 Salesman
1 Zeppy 2 CEO Zack 6 CEO
2 Alish 3 Manager Alish 7 Manager
3 Sarah 5 Sales Head Raphel 8 Sales Head
It merges the corresponding rows of two DataFrames
without considering column similarities. If the same column name appears on both DataFrame
, the suffixes are attached to column names and made different columns after merging.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn