How to Merge on Multiple Pandas Columns

Suraj Joshi Feb 02, 2024
  1. Default Pandas DataFrame Merge Without Any Key Column
  2. Set Value of on Parameter to Specify the Key Value for Merge in Pandas
  3. Merge DataFrames Using left_on and right_on
How to Merge on Multiple Pandas Columns

This tutorial explains how we can merge two DataFrames in Pandas using the DataFrame.merge() method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [500, 501, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

print("1st DataFrame:")
print(student_df, "\n")

print("2nd DataFrame:")
print(grades_df, "\n")

print("Merged df:")
print(merged_df)

Output:

1st DataFrame:
   Roll No      Name  Gender  Age
0      500  Jennifer  Female   17
1      501    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

2nd DataFrame:
   Roll No      Name Grades
0      501  Jennifer      A
1      502    Travis     B+
2      503       Bob     A-
3      504      Emma      A
4      505      Luna      B
5      506     Anish     A+ 

We will be using the DataFrames student_df and grades_df to demonstrate the working of DataFrame.merge().

Default Pandas DataFrame Merge Without Any Key Column

If we use only pass two DataFrames to be merged to the merge() method, the method will collect all the common columns in both DataFrames and replace each common column in both DataFrame with a single one.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [500, 501, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

merged_df = pd.merge(student_df, grades_df)

print("1st DataFrame:")
print(student_df, "\n")

print("2nd DataFrame:")
print(grades_df, "\n")

print("Merged df:")
print(merged_df)

Output:

1st DataFrame:
   Roll No      Name  Gender  Age
0      500  Jennifer  Female   17
1      501    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

2nd DataFrame:
   Roll No      Name Grades
0      501  Jennifer      A
1      502    Travis     B+
2      503       Bob     A-
3      504      Emma      A
4      505      Luna      B
5      506     Anish     A+ 

Merged df:
   Roll No   Name  Gender  Age Grades
0      503    Bob    Male   17     A-
1      504   Emma  Female   16      A
2      505   Luna  Female   18      B
3      506  Anish    Male   16     A+

It merges the DataFrames student_df and grades_df and assigns to merged_df. We have the columns Roll No and Name common to both the DataFrames but the merge() function will merge each common column into a single column.

Set Value of on Parameter to Specify the Key Value for Merge in Pandas

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [500, 501, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

merged_df = pd.merge(student_df, grades_df, on="Roll No")

print("1st DataFrame:")
print(student_df, "\n")

print("2nd DataFrame:")
print(grades_df, "\n")

print("Merged df:")
print(merged_df)

Output:

1st DataFrame:
   Roll No      Name  Gender  Age
0      500  Jennifer  Female   17
1      501    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

2nd DataFrame:
   Roll No      Name Grades
0      501  Jennifer      A
1      502    Travis     B+
2      503       Bob     A-
3      504      Emma      A
4      505      Luna      B
5      506     Anish     A+ 

Merged df:
   Roll No  Name_x  Gender  Age    Name_y Grades
0      501  Travis    Male   18  Jennifer      A
1      503     Bob    Male   17       Bob     A-
2      504    Emma  Female   16      Emma      A
3      505    Luna  Female   18      Luna      B
4      506   Anish    Male   16     Anish     A+

Here, we set on="Roll No" and the merge() function will find Roll No named column in both DataFrames and we have only a single Roll No column for the merged_df. Although the column Name is also common to both the DataFrames, we have a separate column for the Name column of left and right DataFrame represented by Name_x and Name_y as Name is not passed as on parameter.

Merge DataFrames Using left_on and right_on

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [500, 501, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {"Id": [501, 502, 503, 504, 505, 506], "Grades": ["A", "B+", "A-", "A", "B", "A+"]}
)

merged_df = pd.merge(student_df, grades_df, left_on="Roll No", right_on="Id")

print("1st DataFrame:")
print(student_df, "\n")

print("2nd DataFrame:")
print(grades_df, "\n")

print("Merged df:")
print(merged_df)

Output:

1st DataFrame:
   Roll No      Name  Gender  Age
0      500  Jennifer  Female   17
1      501    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

2nd DataFrame:
    Id Grades
0  501      A
1  502     B+
2  503     A-
3  504      A
4  505      B
5  506     A+ 

Merged df:
   Roll No    Name  Gender  Age   Id Grades
0      501  Travis    Male   18  501      A
1      503     Bob    Male   17  503     A-
2      504    Emma  Female   16  504      A
3      505    Luna  Female   18  505      B
4      506   Anish    Male   16  506     A+

If we have different column names in DataFrames to be merged for a column on which we want to merge, we can use left_on and right_on parameters. The left_on will be set to the name of the column in the left DataFrame and right_on will be set to the name of the column in the right DataFrame.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn