What Is the Difference Between Join and Merge in Pandas
In Pandas, Series
or DataFrame
can easily join or combine using various operations such as join
and merge
. These operations combine two DataFrames based on the indexes and column name. Both join
and merge
methods can combine two DataFrames. The main difference between the join and merge operation is that the join
method combines two DataFrames based on their indexes, whereas in the merge
method, we need to specify columns to combine both DataFrames.
This article will discuss the difference between the join
and merge
methods in pandas python.
Pandas DataFrame .join
Method
The join
method joins the two dataframs
on their indexes. Let’s take an example to show the working of the join
method. We have taken two DataFrames: left_df
and right_df
. Using the left_df.join(right_df)
code, we have combined two DataFrames.
Example Code:
import pandas as pd
# create two dataframe
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index(
"Name"
)
print(df_left)
print(df_right)
# join two dataframes
df_left.join(df_right)
Output:
Score
Name
X 10
Y 8
Z 9
Steals
Name
X 4
Y 5
Z 2
Score Steals
Name
X 10 4
Y 8 5
Z 9 2
If we have overlapping columns in both DataFrames, in this case, the join will want you to add a suffix to the overlapping or common column name from the left dataframe. In the following DataFrames, the overlapping column name is C
.
Example Code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# join two dataframes
joined_df = df_left.join(df_right, lsuffix="_")
print(joined_df)
Output:
C D
A x 1
B y 2
C F
A u 3
B v 4
C_ D C F
A x 1 u 3
B y 2 v 4
As you can see in the above output, the index is preserved with four columns. We can also specify a particular column separately on the left dataframe by using the on parameter as the join key.
Pandas DataFrame .merge
Method
The merge
method is also used to combine two DataFrames. But, the merge method requires the column name as the merge key for combining the two DataFrames. In the following example, we implement the simple merge function to combine two dataframes without using any parameter.
Example Code:
import pandas as pd
# create two dataframe
df_left = pd.DataFrame({"Name": ["X", "Y", "Z"], "Score": [10, 8, 9]}).set_index("Name")
df_right = pd.DataFrame({"Name": ["X", "Y", "Z"], "Steals": [4, 5, 2]}).set_index(
"Name"
)
print(df_left)
print(df_right)
# merge two dataframes
df_left.merge(df_right, on="Name")
Output:
Score
Name
X 10
Y 8
Z 9
Steals
Name
X 4
Y 5
Z 2
Score Steals
Name
X 10 4
Y 8 5
Z 9 2
We can specify the overlapping column name with the on
parameter in the merge
method. In the following example, we specify the overlapping column name C
to perform the merge operation on two DataFrames.
Example Code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
# merge dataframes
merged_df = df_left.merge(df_right, on="C", how="outer")
print(merged_df)
Output:
C D
A x 1
B y 2
C F
A u 3
B v 4
C D F
0 x 1.0 NaN
1 y 2.0 NaN
2 u NaN 3.0
3 v NaN 4.0
To separately specify it using right_on
and left_on
parameters. See the following example, in which we have used different parameters such as on
, left_on
, right_on
for better understanding.
Example Code:
import pandas as pd
# Creating the two dataframes
df_left = pd.DataFrame([["x", 1], ["y", 2]], list("AB"), list("CD"))
df_right = pd.DataFrame([["u", 3], ["v", 4]], list("AB"), list("CF"))
print(df_left)
print(df_right)
merged_df = df_left.merge(
df_right, left_index=True, right_index=True, suffixes=["_", ""]
)
print(merged_df)
Output:
C D
A x 1
B y 2
C F
A u 3
B v 4
C_ D C F
A x 1 u 3
B y 2 v 4
Conclusion
We demonstrated the difference between the join
and merge
in pandas with the help of some examples. We have seen both methods, join
and merge
are used for a similar purpose, combining the DataFrames in pandas. But, the difference is that the join
method combines two DataFrames on their indexed
, whereas in the merge
method, we specify the column name to combine two DataFrames.