How to Get the Average of a Column of a Pandas DataFrame
When working with large datasets, we often need the average (mean) of a column. For example, given a list of students and their grades, you may want to know the average grade or the average of some other column. The sections below show different ways to do this.
We will use the following DataFrame in the next sections:
import pandas as pd
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, 76, 95, 79],
}
df = pd.DataFrame(data)
Below is the example DataFrame.
     name  percentage  grade
0  Oliver          90     88
1   Harry          99     76
2  George          50     95
3    Noah          65     79
df.mean() Method to Calculate the Average of a Pandas DataFrame Column
Let’s take the mean of the grade column in our dataset.
import pandas as pd
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, 76, 95, 79],
}
df = pd.DataFrame(data)
mean_df = df["grade"].mean()
print(mean_df)
Output:
84.5
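As a quick sanity check, the mean of a column is its sum divided by the number of values, so the result above can be reproduced by hand. The sketch below assumes the same data as in this article; the variable name manual_mean is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, 76, 95, 79],
    }
)

# Mean = sum of values / number of values: (88 + 76 + 95 + 79) / 4
manual_mean = df["grade"].sum() / df["grade"].count()

print(manual_mean)         # 84.5
print(df["grade"].mean())  # 84.5
```

Both lines print the same value, which confirms that mean() is doing nothing more than this division.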
Let’s take another example and apply the df.mean() function to the entire DataFrame.
import pandas as pd
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, 76, 95, 79],
}
df = pd.DataFrame(data)
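mean_df = df.mean(numeric_only=True)
print(mean_df)
We don’t specify a column name in the example above. Passing numeric_only=True tells mean() to skip non-numeric columns such as name. Older pandas versions dropped such columns automatically, but since pandas 2.0 calling df.mean() without this argument on a DataFrame that contains string columns raises a TypeError.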
Output:
percentage    76.0
grade         84.5
dtype: float64
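If only some numeric columns are of interest, one option is to select them explicitly before calling mean(), so the result does not depend on how non-numeric columns are handled. This is a minimal sketch assuming the same data as above; the name selected_means is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, 76, 95, 79],
    }
)

# Select the numeric columns by name, then average each one.
selected_means = df[["percentage", "grade"]].mean()

print(selected_means)
```

The result is a Series indexed by column name, so a single value can be read with selected_means["grade"].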
df.describe() Method
This method produces a summary of descriptive statistics for the dataset, including the mean. Let’s take a look at how to use it.
import pandas as pd
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, 76, 95, 79],
}
df = pd.DataFrame(data)
print(df.describe())
Output:
       percentage      grade
count    4.000000   4.000000
mean    76.000000  84.500000
std     22.524061   8.660254
min     50.000000  76.000000
25%     61.250000  78.250000
50%     77.500000  83.500000
75%     92.250000  89.750000
max     99.000000  95.000000
The result of the df.describe() method is a DataFrame, so you can get the average of percentage and grade by referring to the column name and the row name.
df.describe()["grade"]["mean"]
df.describe()["percentage"]["mean"]
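Chained indexing on the describe() result works, but a single .loc lookup by row and column label is a common alternative. The sketch below assumes the same data as above; stats, grade_mean and percentage_mean are illustrative names. Calling describe() once and reusing the result also avoids recomputing the statistics for every lookup.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Oliver", "Harry", "George", "Noah"],
        "percentage": [90, 99, 50, 65],
        "grade": [88, 76, 95, 79],
    }
)

# Compute the summary table once and reuse it.
stats = df.describe()

# .loc takes the row label first ("mean"), then the column label.
grade_mean = stats.loc["mean", "grade"]
percentage_mean = stats.loc["mean", "percentage"]

print(grade_mean)       # 84.5
print(percentage_mean)  # 76.0
```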
df.describe() can also work on a specific column. Let’s apply it to the grade column.
import pandas as pd
data = {
"name": ["Oliver", "Harry", "George", "Noah"],
"percentage": [90, 99, 50, 65],
"grade": [88, 76, 95, 79],
}
df = pd.DataFrame(data)
print(df["grade"].describe())
Output:
count     4.000000
mean     84.500000
std       8.660254
min      76.000000
25%      78.250000
50%      83.500000
75%      89.750000
max      95.000000
Name: grade, dtype: float64
The result is a Series when a column is specified. We can get the average value by referring to mean directly.
df["grade"].describe()["mean"]