How to Merge on Multiple Pandas Columns
- Default Pandas DataFrame Merge Without Any Key Column
-
Set Value of
on
Parameter to Specify the Key Value for Merge in Pandas -
Merge DataFrames Using
left_on
andright_on
This tutorial explains how we can merge two DataFrames in Pandas using the DataFrame.merge()
method.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [500, 501, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
print("1st DataFrame:")
print(student_df, "\n")
print("2nd DataFrame:")
print(grades_df, "\n")
print("Merged df:")
print(merged_df)
Output:
1st DataFrame:
Roll No Name Gender Age
0 500 Jennifer Female 17
1 501 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
2nd DataFrame:
Roll No Name Grades
0 501 Jennifer A
1 502 Travis B+
2 503 Bob A-
3 504 Emma A
4 505 Luna B
5 506 Anish A+
We will be using the DataFrames student_df
and grades_df
to demonstrate the working of DataFrame.merge()
.
Default Pandas DataFrame Merge Without Any Key Column
If we use only pass two DataFrames to be merged to the merge()
method, the method will collect all the common columns in both DataFrames and replace each common column in both DataFrame with a single one.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [500, 501, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
merged_df = pd.merge(student_df, grades_df)
print("1st DataFrame:")
print(student_df, "\n")
print("2nd DataFrame:")
print(grades_df, "\n")
print("Merged df:")
print(merged_df)
Output:
1st DataFrame:
Roll No Name Gender Age
0 500 Jennifer Female 17
1 501 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
2nd DataFrame:
Roll No Name Grades
0 501 Jennifer A
1 502 Travis B+
2 503 Bob A-
3 504 Emma A
4 505 Luna B
5 506 Anish A+
Merged df:
Roll No Name Gender Age Grades
0 503 Bob Male 17 A-
1 504 Emma Female 16 A
2 505 Luna Female 18 B
3 506 Anish Male 16 A+
It merges the DataFrames student_df
and grades_df
and assigns to merged_df
. We have the columns Roll No
and Name
common to both the DataFrames but the merge()
function will merge each common column into a single column.
Set Value of on
Parameter to Specify the Key Value for Merge in Pandas
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [500, 501, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
merged_df = pd.merge(student_df, grades_df, on="Roll No")
print("1st DataFrame:")
print(student_df, "\n")
print("2nd DataFrame:")
print(grades_df, "\n")
print("Merged df:")
print(merged_df)
Output:
1st DataFrame:
Roll No Name Gender Age
0 500 Jennifer Female 17
1 501 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
2nd DataFrame:
Roll No Name Grades
0 501 Jennifer A
1 502 Travis B+
2 503 Bob A-
3 504 Emma A
4 505 Luna B
5 506 Anish A+
Merged df:
Roll No Name_x Gender Age Name_y Grades
0 501 Travis Male 18 Jennifer A
1 503 Bob Male 17 Bob A-
2 504 Emma Female 16 Emma A
3 505 Luna Female 18 Luna B
4 506 Anish Male 16 Anish A+
Here, we set on="Roll No"
and the merge()
function will find Roll No
named column in both DataFrames and we have only a single Roll No
column for the merged_df
. Although the column Name
is also common to both the DataFrames, we have a separate column for the Name
column of left and right DataFrame represented by Name_x
and Name_y
as Name
is not passed as on
parameter.
Merge DataFrames Using left_on
and right_on
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [500, 501, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{"Id": [501, 502, 503, 504, 505, 506], "Grades": ["A", "B+", "A-", "A", "B", "A+"]}
)
merged_df = pd.merge(student_df, grades_df, left_on="Roll No", right_on="Id")
print("1st DataFrame:")
print(student_df, "\n")
print("2nd DataFrame:")
print(grades_df, "\n")
print("Merged df:")
print(merged_df)
Output:
1st DataFrame:
Roll No Name Gender Age
0 500 Jennifer Female 17
1 501 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
2nd DataFrame:
Id Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Merged df:
Roll No Name Gender Age Id Grades
0 501 Travis Male 18 501 A
1 503 Bob Male 17 503 A-
2 504 Emma Female 16 504 A
3 505 Luna Female 18 505 B
4 506 Anish Male 16 506 A+
If we have different column names in DataFrames to be merged for a column on which we want to merge, we can use left_on
and right_on
parameters. The left_on
will be set to the name of the column in the left DataFrame and right_on
will be set to the name of the column in the right DataFrame.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn