How to Specify Suffix in Pandas join() Method

Suraj Joshi Feb 02, 2024
  1. Join Two DataFrames Using the DataFrame.join() Method
  2. Join DataFrames With a Common Column Name Using DataFrame.join() Method
How to Specify Suffix in Pandas join() Method

This tutorial explains how we can join two DataFrames in Pandas using the DataFrame.join() method and specify the suffix when joining.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df)

Output:

Student DataFrame:
       Name  Gender  Age
0  Jennifer  Female   17
1    Travis    Male   18
2       Bob    Male   17
3      Emma  Female   16
4      Luna  Female   18
5     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+

We will explain the DataFrame.join() method by demonstrating the joining of students_df and grades_df DataFrame.

Join Two DataFrames Using the DataFrame.join() Method

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

joined_df = student_df.join(grades_df)

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Joined DataFrame:")
print(joined_df, "\n")

Output:

Student DataFrame:
       Name  Gender  Age
0  Jennifer  Female   17
1    Travis    Male   18
2       Bob    Male   17
3      Emma  Female   16
4      Luna  Female   18
5     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Joined DataFrame:
       Name  Gender  Age  Roll No Grades
0  Jennifer  Female   17      501      A
1    Travis    Male   18      502     B+
2       Bob    Male   17      503     A-
3      Emma  Female   16      504      A
4      Luna  Female   18      505      B
5     Anish    Male   16      506     A+ 

It joins the student_df and grades_df, and creates joined_df. By default, the join() method uses the index of both DataFrames to join them. The join method is Left Join by default. Here, all the rows of left DataFrame i.e student_df are kept in the joined_df, and a row with right DataFrame having the same value of the index as of row in left DataFrame are joined and placed in the same row.

Join DataFrames With a Common Column Name Using DataFrame.join() Method

If we have a column with the same name in both the DataFrames we are trying to join using the DataFrame.join() method, we get an error with the message ValueError: columns overlap but no suffix specified. We can set the values of lsuffix and rsuffix parameters in the DataFrame.join() method to solve the error.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

joined_df = student_df.join(grades_df, lsuffix="_left", rsuffix="_right")

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Joined DataFrame:")
print(joined_df, "\n")

Output:

Student DataFrame:
   Roll No      Name  Gender  Age
0      501  Jennifer  Female   17
1      502    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Joined DataFrame:
   Roll No_left      Name  Gender  Age  Roll No_right Grades
0           501  Jennifer  Female   17            501      A
1           502    Travis    Male   18            502     B+
2           503       Bob    Male   17            503     A-
3           504      Emma  Female   16            504      A
4           505      Luna  Female   18            505      B
5           506     Anish    Male   16            506     A+ 

It joins grades_df at the right of the student_df. The DataFrame.join() does not merge the individual DataFrames i.e. even if the Roll No column is common to both the DataFrames, they will be placed as separate fields after join(). To distinguish the column name with a common name, we provide suffix for both columns in the left and right DataFrame using the lsuffix and rsuffix parameters.

Alternatively, we can also the DataFrame.merge() method to solve the issue by passing the common column’s name as an on parameter into the method.

import pandas as pd

roll_no = [501, 502, 503, 504, 505]

student_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
        "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
        "Age": [17, 18, 17, 16, 18, 16],
    }
)

grades_df = pd.DataFrame(
    {
        "Roll No": [501, 502, 503, 504, 505, 506],
        "Grades": ["A", "B+", "A-", "A", "B", "A+"],
    }
)

merged_df = student_df.merge(grades_df, on="Roll No")

print("Student DataFrame:")
print(student_df, "\n")

print("Grades DataFrame:")
print(grades_df, "\n")

print("Merged DataFrame:")
print(merged_df, "\n")

Output:

Student DataFrame:
   Roll No      Name  Gender  Age
0      501  Jennifer  Female   17
1      502    Travis    Male   18
2      503       Bob    Male   17
3      504      Emma  Female   16
4      505      Luna  Female   18
5      506     Anish    Male   16 

Grades DataFrame:
   Roll No Grades
0      501      A
1      502     B+
2      503     A-
3      504      A
4      505      B
5      506     A+ 

Merged DataFrame:
   Roll No      Name  Gender  Age Grades
0      501  Jennifer  Female   17      A
1      502    Travis    Male   18     B+
2      503       Bob    Male   17     A-
3      504      Emma  Female   16      A
4      505      Luna  Female   18      B
5      506     Anish    Male   16     A+ 

It merges the DataFrames student_df and grades_df into a single DataFrame. In this case, the column Roll No will be merged into a single column for both the DataFrames.

Author: Suraj Joshi
Suraj Joshi avatar Suraj Joshi avatar

Suraj Joshi is a backend software engineer at Matrice.ai.

LinkedIn