Pandas DataFrame DataFrame.merge() Function

Suraj Joshi Jan 30, 2023 Pandas Pandas DataFrame

Syntax of pandas.DataFrame.merge():
Example Codes: DataFrame.merge() Function to Merge Two DataFrames
Example Codes: Set how Parameter in the merge Method to Merge DataFrames Using Various Techniques
Example Codes:Merge Only Specific Columns Using DataFrame.merge() Function in Pandas
Example Codes: Use Index as the Join Keys for Merging DataFrames DataFrame.merge() Function

Pandas DataFrame DataFrame.merge() Function

Python Pandas DataFrame.merge() function merges DataFrame or named Series objects.

Syntax of `pandas.DataFrame.merge()`:

DataFrame.merge(
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes="_x",
    "_y",
    copy=True,
    indicator=False,
    validate=None,
)

Parameters


`right`	`DataFrame` or named Series. Object to merge with
`how`	`left`, `right`, `inner` or `outer`. How to perform merge operation
`on`	label or list. Column or index names to merge
`left_on`	label or list. Column or index names to merge in the left `DataFrame`
`right_on`	label or list. Column or index names to merge in the right `DataFrame`
`left_index`	Boolean. Use the index from the left `DataFrame` as the join key(`left_index=True`)
`right_index`	Boolean. Use the index from the right `DataFrame` as the join key(`right_index=True`)
`sort`	Boolean. Sort the join keys alphabetically in the output(`sort=True`)
`suffixes`	Suffix to be applied to overlapping column names in the left and right side, respectively
`copy`	Boolean. Avoid copy for `copy=False`
`indicator`	add a column to output `DataFrame` called `_merge` with information on the source of each row(`indicator=True`) and a column named `string` will be added to output `DataFrame`(`indicator=string`)
`validate`	parameter to check if the merge is of a specified type.

Return

It returns a DataFrame merging the given objects.

Example Codes: `DataFrame.merge()` Function to Merge Two `DataFrames`

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)

Output:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
    Name  Working Hours  Pay
0  Suraj              1    5
1  Alish              3    7

It merges df1 and df2 to a single DataFrame using the inner-join technique of SQL.

For the inner-join method, we must ensure there is at least one column common to both DataFrames.

Here, the merge() function will join the rows having the same values of the common column to both DataFrames.

Example Codes: Set `how` Parameter in the `merge` Method to Merge DataFrames Using Various Techniques

import pandas as pd

df1 = pd.DataFrame(
    {"Name": ["Suraj", "Zeppy", "Alish", "Sarah"], "Working Hours": [1, 2, 3, 5]}
)
df2 = pd.DataFrame({"Name": ["Suraj", "Zack", "Alish", "Raphel"], "Pay": [5, 6, 7, 8]})

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, how="right")
print("Merged DataFrame:")
print(merged_df)

Output:

1st DataFrame:
    Name  Working Hours
0  Suraj              1
1  Zeppy              2
2  Alish              3
3  Sarah              5
2nd DataFrame:
     Name  Pay
0   Suraj    5
1    Zack    6
2   Alish    7
3  Raphel    8
Merged DataFrame:
     Name  Working Hours  Pay
0   Suraj            1.0    5
1   Alish            3.0    7
2    Zack            NaN    6
3  Raphel            NaN   8

It merges df1 and df2 to a single DataFrame using the right-join technique of SQL.

Here, the merge() function returns all rows from the right DataFrame. However, the rows only present in the left DataFrame will get the NaN value.

Similarly, we can also use left and outer values of the how parameter.

Example Codes:Merge Only Specific Columns Using `DataFrame.merge()` Function in Pandas

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name")
print("Merged DataFrame:")
print(merged_df)

Output:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_x  Pay Position_y
0  Suraj              1   Salesman    5   Salesman
1  Alish              3    Manager    7    Manager

It merges only the Name column of df1 and df2. Since the default join method is inner-join only the common rows to both DataFrame will be joined. The Position column is common to both DataFrames and hence two-position columns viz. Position_x and Position_y.

By default, _x and _y suffixes are appended to the name of the overlapping column. We can specify suffixes using the suffixes parameter.

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(df2, on="Name", suffixes=("_left", "_right"))
print("Merged DataFrame:")
print(merged_df)

Output:

1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
    Name  Working Hours Position_left  Pay Position_right
0  Suraj              1      Salesman    5       Salesman
1  Alish              3       Manager    7        Manager

Example Codes: Use Index as the Join Keys for Merging DataFrames `DataFrame.merge()` Function

import pandas as pd

df1 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zeppy", "Alish", "Sarah"],
        "Working Hours": [1, 2, 3, 5],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)
df2 = pd.DataFrame(
    {
        "Name": ["Suraj", "Zack", "Alish", "Raphel"],
        "Pay": [5, 6, 7, 8],
        "Position": ["Salesman", "CEO", "Manager", "Sales Head"],
    }
)

print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)

merged_df = df1.merge(
    df2, left_index=True, right_index=True, suffixes=("_left", "_right")
)
print("Merged DataFrame:")
print(merged_df)

Output:


1st DataFrame:
    Name  Working Hours    Position
0  Suraj              1    Salesman
1  Zeppy              2         CEO
2  Alish              3     Manager
3  Sarah              5  Sales Head
2nd DataFrame:
     Name  Pay    Position
0   Suraj    5    Salesman
1    Zack    6         CEO
2   Alish    7     Manager
3  Raphel    8  Sales Head
Merged DataFrame:
  Name_left  Working Hours Position_left Name_right  Pay Position_right
0     Suraj              1      Salesman      Suraj    5       Salesman
1     Zeppy              2           CEO       Zack    6            CEO
2     Alish              3       Manager      Alish    7        Manager
3     Sarah              5    Sales Head     Raphel    8     Sales Head

It merges the corresponding rows of two DataFrames without considering column similarities. If the same column name appears on both DataFrame, the suffixes are attached to column names and made different columns after merging.

Enjoying our tutorials? Subscribe to DelftStack on YouTube to support us in creating more high-quality video guides. Subscribe

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.

Syntax of pandas.DataFrame.merge():

Parameters

Return

Example Codes: DataFrame.merge() Function to Merge Two DataFrames

Example Codes: Set how Parameter in the merge Method to Merge DataFrames Using Various Techniques

Example Codes:Merge Only Specific Columns Using DataFrame.merge() Function in Pandas

Example Codes: Use Index as the Join Keys for Merging DataFrames DataFrame.merge() Function

Related Article - Pandas DataFrame

Syntax of `pandas.DataFrame.merge()`:

Example Codes: `DataFrame.merge()` Function to Merge Two `DataFrames`

Example Codes: Set `how` Parameter in the `merge` Method to Merge DataFrames Using Various Techniques

Example Codes:Merge Only Specific Columns Using `DataFrame.merge()` Function in Pandas

Example Codes: Use Index as the Join Keys for Merging DataFrames `DataFrame.merge()` Function