How to Apply Transform With Groupby in Pandas
- Difference Between apply() and transform() in Python
- Use the apply() Method in Python Pandas
- Use the transform() Method in Python Pandas
groupby() is a powerful pandas method that splits the data into separate groups according to some criterion, so that calculations can be run on each group and the results compared.
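As a minimal sketch of the splitting step, the snippet below groups a small, made-up table of student records by department. The data and the column names Name, Department, and Marks are assumptions standing in for a real data set.

```python
import pandas as pd

# Hypothetical student records (illustrative data only)
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Omar", "Hina"],
    "Department": ["CS", "CS", "EE", "EE"],
    "Marks": [85, 92, 78, 88],
})

# Split the rows into one group per distinct Department value
grouped = df.groupby("Department")

# Number of rows that landed in each group
group_sizes = grouped.size()

# Iterating yields (group key, sub-DataFrame) pairs
for department, group in grouped:
    print(department, list(group["Name"]))
```

Nothing is computed until a method such as size(), apply(), or transform() is called on the grouped object.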
Difference Between apply() and transform() in Python
apply() and transform() are two methods commonly called on the result of groupby(). They differ in what is passed to the supplied function and in what is returned.
The apply() method passes each group to the function as a DataFrame, and the function can return a scalar, a Series, or a DataFrame. It therefore lets us operate on a group's columns, its rows, or the entire group at once.
The transform() method passes a single column of each group to the function as a Series, and the result must have the same length as that input (a scalar result is broadcast to every row of the group). Therefore, the function can operate on only one column of each group at a time.
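The contrast can be sketched on a small, made-up data frame (the data and column names are assumptions). Here apply() collapses each group to one value, while transform() returns one value per original row.

```python
import pandas as pd

# Hypothetical data (illustrative only)
df = pd.DataFrame({
    "Department": ["CS", "CS", "EE", "EE"],
    "Marks": [85, 92, 78, 88],
})

# apply(): the function receives each group as a DataFrame;
# returning a scalar gives one value per group
spread = df.groupby("Department")[["Marks"]].apply(
    lambda g: g["Marks"].max() - g["Marks"].min()
)

# transform(): the function receives the Marks column of each group as a Series;
# the result has the same length and index as the original column
centered = df.groupby("Department")["Marks"].transform(lambda s: s - s.mean())

print(spread)    # one entry per department
print(centered)  # one entry per row of df
```

Because the result of transform() is aligned with the original index, it can be assigned straight back to the data frame as a new column.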
Use the apply() Method in Python Pandas
In the following code, we load a CSV file of student records and use apply() to find the highest score in each department.
First, we group the rows by department using the groupby() method. Then, inside the function passed to apply(), we find each department's maximum score using the max() method.
The output is returned as a Series. apply() can also operate on multiple columns or on the entire data frame.
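# Python 3.x
import pandas as pd

df = pd.read_csv("Student.csv")
display(df)  # display() is provided by Jupyter/IPython; use print(df) in a plain script


def f(my_df):
    # my_df holds the rows of one department; return its highest mark
    return my_df.Marks.max()


df.groupby("Department").apply(f)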
Output:
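Since Student.csv is not reproduced here, the following is a self-contained sketch of the same idea on made-up records (the names and marks are assumptions, not the contents of the original file). It also includes a second function that uses two columns at once, something apply() allows because each group arrives as a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for Student.csv (illustrative data only)
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Omar", "Hina"],
    "Department": ["CS", "CS", "EE", "EE"],
    "Marks": [85, 92, 78, 88],
})


def highest_marks(group):
    # group is a DataFrame holding one department's rows
    return group["Marks"].max()


def top_student(group):
    # Uses two columns: find the row with the highest mark, return its Name
    return group.loc[group["Marks"].idxmax(), "Name"]


# Select the columns the functions need, then apply each function per group
by_department = df.groupby("Department")[["Name", "Marks"]]
max_marks = by_department.apply(highest_marks)
top_names = by_department.apply(top_student)

print(max_marks)
print(top_names)
```

Both results are Series indexed by department, with one entry per group.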
Use the transform() Method in Python Pandas
In the next example, we add another column, Mean_Marks, to the data frame. We group the rows by department using the groupby() method and compute each department's mean marks by passing the string "mean" to transform().
The output shows each department's mean score repeated on every row of that department.
Here, the transform() method operates on a single column, in our case Marks.
# Python 3.x
import pandas as pd
df = pd.read_csv("Student.csv")
display(df)
df["Mean_Marks"] = df.groupby("Department")["Marks"].transform("mean")
display(df)
Output:
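The same step can be tried without the CSV file. The sketch below uses made-up records (an assumption, not the original data) and adds a hypothetical Above_Mean column to show why a row-aligned group mean is convenient.

```python
import pandas as pd

# Hypothetical stand-in for Student.csv (illustrative data only)
df = pd.DataFrame({
    "Name": ["Ali", "Sara", "Omar", "Hina"],
    "Department": ["CS", "CS", "EE", "EE"],
    "Marks": [85, 92, 78, 88],
})

# The mean of each department, broadcast back to every row of that department
df["Mean_Marks"] = df.groupby("Department")["Marks"].transform("mean")

# Because Mean_Marks lines up with Marks row by row, the two can be compared directly
df["Above_Mean"] = df["Marks"] > df["Mean_Marks"]

print(df)
```

With apply() or an aggregation, the per-department means would first have to be merged back onto the rows before such a comparison.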
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.