How to Get the Aggregate of Pandas Group-By and Sum
-
Cumulative Sum With
groupby
-
pivot()
to Rearrange the Data in a Nice Table -
Apply
function
togroupby
in Pandas -
agg()
to Get Aggregate Sum of the Column
We will demonstrate how to get the aggregate in Pandas by using groupby
and sum
. We will also look at the pivot
functionality to arrange the data in a nice table and define our custom function and run it on the DataFrame
. We will also get the aggregate sum by using agg()
.
Cumulative Sum With groupby
We can get the cumulative sum by using groupby
method. Consider the following Dataframe with Date
, Fruit
and Sale
columns:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
If we want to calculate the cumulative sum of Sale per Fruit and for every date, we can do:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(df.groupby(by=["Fruit", "Date"]).sum().groupby(level=[0]).cumsum())
Output:
Fruit Date Sale
Apple 08/09/2018 34
10/09/2018 46
Banana 08/09/2018 22
10/09/2018 49
pivot()
to Rearrange the Data in a Nice Table
pivot()
method could set row and column attribute of the table. Let’s change the above code and apply pivot()
method to rearrange the data in a nice table:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(
df.groupby(["Fruit", "Date"], as_index=False).sum().pivot("Fruit", "Date").fillna(0)
)
Output:
Sale
Date 08/09/2018 10/09/2018
Fruit
Apple 34 12
Banana 22 27
Apply function
to groupby
in Pandas
We will create a simple method to get count of values in series
or 1d array
and use groupby
to get aggregate count of each value:
from pandas import *
d = {"series": Series(["1", "2", "1", "1", "4", "4", "5"])}
df = DataFrame(d)
def get_count(values):
return len(values)
grouped_count = df.groupby("series").series.agg(get_count)
print(grouped_count)
After running the code, we will get the following output, which provides each value’s occurrence in the series.
Output:
series
1 3
2 1
4 2
5 1
Name: series, dtype: int64
agg()
to Get Aggregate Sum of the Column
We can use agg()
to apply sum operation on column. Example Codes:
import pandas as pd
df = pd.DataFrame(
{
"Date": ["08/09/2018", "10/09/2018", "08/09/2018", "10/09/2018"],
"Fruit": [
"Apple",
"Apple",
"Banana",
"Banana",
],
"Sale": [34, 12, 22, 27],
}
)
print(df.groupby(["Fruit"])["Sale"].agg("sum"))
Output:
Fruit
Apple 46
Banana 49
Name: Sale, dtype: int64