How to Specify Suffix in Pandas join() Method
-
Join Two DataFrames Using the
DataFrame.join()
Method -
Join DataFrames With a Common Column Name Using
DataFrame.join()
Method
This tutorial explains how we can join two DataFrames in Pandas using the DataFrame.join()
method and specify the suffix when joining.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df)
Output:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
We will explain the DataFrame.join()
method by demonstrating the joining of students_df
and grades_df
DataFrame.
Join Two DataFrames Using the DataFrame.join()
Method
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
joined_df = student_df.join(grades_df)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
Output:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Name Gender Age Roll No Grades
0 Jennifer Female 17 501 A
1 Travis Male 18 502 B+
2 Bob Male 17 503 A-
3 Emma Female 16 504 A
4 Luna Female 18 505 B
5 Anish Male 16 506 A+
It joins the student_df
and grades_df
, and creates joined_df
. By default, the join()
method uses the index of both DataFrames to join them. The join method is Left Join
by default. Here, all the rows of left DataFrame i.e student_df
are kept in the joined_df
, and a row with right DataFrame having the same value of the index as of row in left DataFrame are joined and placed in the same row.
Join DataFrames With a Common Column Name Using DataFrame.join()
Method
If we have a column with the same name in both the DataFrames we are trying to join using the DataFrame.join()
method, we get an error with the message ValueError: columns overlap but no suffix specified
. We can set the values of lsuffix
and rsuffix
parameters in the DataFrame.join()
method to solve the error.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
joined_df = student_df.join(grades_df, lsuffix="_left", rsuffix="_right")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
Output:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Roll No_left Name Gender Age Roll No_right Grades
0 501 Jennifer Female 17 501 A
1 502 Travis Male 18 502 B+
2 503 Bob Male 17 503 A-
3 504 Emma Female 16 504 A
4 505 Luna Female 18 505 B
5 506 Anish Male 16 506 A+
It joins grades_df
at the right of the student_df
. The DataFrame.join()
does not merge the individual DataFrames i.e. even if the Roll No
column is common to both the DataFrames, they will be placed as separate fields after join()
. To distinguish the column name with a common name, we provide suffix for both columns in the left and right DataFrame using the lsuffix
and rsuffix
parameters.
Alternatively, we can also the DataFrame.merge()
method to solve the issue by passing the common column’s name as an on
parameter into the method.
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
"Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
"Age": [17, 18, 17, 16, 18, 16],
}
)
grades_df = pd.DataFrame(
{
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"],
}
)
merged_df = student_df.merge(grades_df, on="Roll No")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Merged DataFrame:")
print(merged_df, "\n")
Output:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Merged DataFrame:
Roll No Name Gender Age Grades
0 501 Jennifer Female 17 A
1 502 Travis Male 18 B+
2 503 Bob Male 17 A-
3 504 Emma Female 16 A
4 505 Luna Female 18 B
5 506 Anish Male 16 A+
It merges the DataFrames student_df
and grades_df
into a single DataFrame. In this case, the column Roll No
will be merged into a single column for both the DataFrames.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn