How to Calculate Weighted Average in Pandas
- Calculate the Weighted Average of Pandas DataFrame
-
Use
Groupby
Function to Group the Weighted Average in Pandas
In this article, we’ll learn how to calculate a weighted average of Pandas DataFrame. We also discuss how to group the weighted average of Pandas DataFrame.
Calculate the Weighted Average of Pandas DataFrame
After importing pandas as pd
, we will create a simple DataFrame. Let us imagine you are a teacher and evaluating your students’ scores.
Overall, there are three different assessments: Quiz_1
, Quiz_2
and Quiz_3
.
Code Example:
import pandas as pd
import numpy as np
Student_DF = pd.DataFrame(
{
"Student_Score": [30, 60, 90],
"Std_score_Weight": [1, 2, 3],
"Student_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3"],
}
)
Student_DF
Output:
Student_Score Std_score_Weight Student_Assessment
0 30 1 Quiz_1
1 60 2 Quiz_2
2 90 3 Quiz_3
Code Example:
Student_Average = Student_DF["Student_Score"].mean()
Student_Average
Output:
60.0
Those assessments should influence the overall score differently depending on their weight. So, we want to calculate the weighted average instead of the sample mean.
First, we multiply the Student_Score
by the values, then we need to divide the result by the total sum of the weights, and this is also how we could implement it in Pandas.
As the Pandas library allows us to do vectorized calculations, we can multiply the Student_Score
by the weight and calculate the sum. Then we need to divide the result by the sum of the weights.
Code Example:
Std_weighted_avg = (
Student_DF["Student_Score"] * Student_DF["Std_score_Weight"]
).sum() / Student_DF["Std_score_Weight"].sum()
Std_weighted_avg
For this DataFrame, the weighted average would be the following.
70.0
Use Groupby
Function to Group the Weighted Average in Pandas
For the next example, we added another column for the different students. So, we have got students John
& Jack
here.
Student_DF = pd.DataFrame(
{
"Student_Score": [30, 50, 90, 40, 50, 20],
"Std_score_Weight": [1, 2, 3, 1, 2, 3],
"Two_Students": ["John", "John", "John", "Jack", "Jack", "Jack"],
"Students_Assessment": [
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
],
}
)
Student_DF
Output:
Student_Score Std_score_Weight Two_Students Students_Assessment
0 30 1 John Quiz_1
1 50 2 John Quiz_2
2 90 3 John Quiz_3
3 40 1 Jack Quiz_1
4 50 2 Jack Quiz_2
5 20 3 Jack Quiz_3
Let’s assume we want to calculate the weighted average only for student Jack
. In that case, we could filter our data, as we did here with the query()
method.
The filtered DataFrame will look like this.
Filtered_by_Jack = Student_DF.query("Two_Students == 'Jack'")
Filtered_by_Jack
Output:
Student_Score Std_score_Weight Two_Students Students_Assessment
3 40 1 Jack Quiz_1
4 50 2 Jack Quiz_2
5 20 3 Jack Quiz_3
With that in place, we can apply the same calculation as before, but this time on the filtered DataFrame.
Std_weighted_avg = (
Filtered_by_Jack["Student_Score"] * Filtered_by_Jack["Std_score_Weight"]
).sum() / Filtered_by_Jack["Std_score_Weight"].sum()
Std_weighted_avg
Output:
33.333333333333336
However, this method can get tedious, especially when dealing with a larger dataset. For example, 100 students, and you want to calculate the weighted average for each student.
In our case, we have just added one more student to the dataset.
Student_DF = pd.DataFrame(
{
"Student_Score": [20, 40, 90, 80, 60, 10, 5, 60, 90],
"Std_score_Weight": [1, 2, 3, 1, 2, 3, 1, 2, 3],
"Three_Student": [
"John",
"John",
"John",
"Jack",
"Jack",
"Jack",
"Harry",
"Harry",
"Harry",
],
"Students_Assessment": [
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
],
}
)
Student_DF
Output:
Student_Score Std_score_Weight Three_Student Students_Assessment
0 20 1 John Quiz_1
1 40 2 John Quiz_2
2 90 3 John Quiz_3
3 80 1 Jack Quiz_1
4 60 2 Jack Quiz_2
5 10 3 Jack Quiz_3
6 5 1 Harry Quiz_1
7 60 2 Harry Quiz_2
8 90 3 Harry Quiz_3
But, the following method will also work regardless of many students the dataset might contain. This time, we will write a small helper function called Groupby_weighted_avg()
.
The function takes three parameters: the values
, weighted_value
and the column name called Group_Cols
, which we want to group the weighted average. The calculation method is very similar to before; the only difference is that we combine it with the groupby()
method.
def Groupby_weighted_avg(values, weighted_value, Group_Cols):
return (values * weighted_value).groupby(Group_Cols).sum() / weighted_value.groupby(
Group_Cols
).sum()
With our Groupby_weighted_avg()
function in place, we can now pass in the values, which are the Student_Score
in our example, and the Std_score_Weight
. We want to group the result by Three_Student
.
Groupby_weighted_avg(
Student_DF["Student_Score"],
Student_DF["Std_score_Weight"],
Student_DF["Three_Student"],
)
After executing the above line, we will have a new DataFrame containing the weighted average for each student.
Three_Student
Harry 65.833333
Jack 38.333333
John 61.666667
dtype: float64
Complete Source Code:
# In[1]:
import pandas as pd
import numpy as np
Student_DF = pd.DataFrame(
{
"Student_Score": [30, 60, 90],
"Std_score_Weight": [1, 2, 3],
"Student_Assessment": ["Quiz_1", "Quiz_2", "Quiz_3"],
}
)
Student_DF
# In[2]:
Student_Average = Student_DF["Student_Score"].mean()
Student_Average
# In[3]:
Std_weighted_avg = (
Student_DF["Student_Score"] * Student_DF["Std_score_Weight"]
).sum() / Student_DF["Std_score_Weight"].sum()
Std_weighted_avg
# In[4]:
# groupby
Student_DF = pd.DataFrame(
{
"Student_Score": [30, 50, 90, 40, 50, 20],
"Std_score_Weight": [1, 2, 3, 1, 2, 3],
"Two_Students": ["John", "John", "John", "Jack", "Jack", "Jack"],
"Students_Assessment": [
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
],
}
)
Student_DF
# In[5]:
Filtered_by_Jack = Student_DF.query("Two_Students == 'Jack'")
Filtered_by_Jack
# In[6]:
Std_weighted_avg = (
Filtered_by_Jack["Student_Score"] * Filtered_by_Jack["Std_score_Weight"]
).sum() / Filtered_by_Jack["Std_score_Weight"].sum()
Std_weighted_avg
# In[7]:
Student_DF = pd.DataFrame(
{
"Student_Score": [20, 40, 90, 80, 60, 10, 5, 60, 90],
"Std_score_Weight": [1, 2, 3, 1, 2, 3, 1, 2, 3],
"Three_Student": [
"John",
"John",
"John",
"Jack",
"Jack",
"Jack",
"Harry",
"Harry",
"Harry",
],
"Students_Assessment": [
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
"Quiz_1",
"Quiz_2",
"Quiz_3",
],
}
)
Student_DF
# In[8]:
def Groupby_weighted_avg(values, weighted_value, Group_Cols):
return (values * weighted_value).groupby(Group_Cols).sum() / weighted_value.groupby(
Group_Cols
).sum()
# In[9]:
Groupby_weighted_avg(
Student_DF["Student_Score"],
Student_DF["Std_score_Weight"],
Student_DF["Three_Student"],
)
Hello! I am Salman Bin Mehmood(Baum), a software developer and I help organizations, address complex problems. My expertise lies within back-end, data science and machine learning. I am a lifelong learner, currently working on metaverse, and enrolled in a course building an AI application with python. I love solving problems and developing bug-free software for people. I write content related to python and hot Technologies.
LinkedIn