GroupBy Apply in Pandas
- Pandas GroupBy-Apply Behaviour
- Using the groupby() Function in Pandas
- Join groupby() and apply() Function in Pandas
This tutorial explores the GroupBy-Apply concept in Pandas. Pandas is a Python package for data analysis and manipulation.
It is well suited to tabular data with heterogeneous columns, such as a SQL table or a spreadsheet. The data can be ordered or unordered, and time-series data is also supported.
Pandas GroupBy-Apply Behaviour
Let us see how to group data and then apply a function that aggregates or calculates values for each group. GroupBy brings together the rows that share the same value in a column, so that each group can be processed on its own. Let us see this method in action.
We'll create a dummy data frame to work with. Here we create a data frame dframe with a single column, mylabel, and a few rows.
import pandas as pd

# One column of single-letter labels; pandas adds the default integer index 0..10
our_data = {"mylabel": pd.Series(["P", "R", "E", "E", "T", "S", "A", "P", "R", "E", "T"])}
dframe = pd.DataFrame(our_data)
print(dframe)  # print output
Output:
   mylabel
0        P
1        R
2        E
3        E
4        T
5        S
6        A
7        P
8        R
9        E
10       T
Our data frame has a single column, mylabel, that holds the letters, and each row has its own integer index.
In the following sections, we will group these labels and apply aggregation functions to them.
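Before aggregating anything, it can help to look at the groups themselves. The following is a minimal sketch, assuming the same letters as above; the names grouped and index_by_letter are introduced here for illustration only.

```python
import pandas as pd

dframe = pd.DataFrame({"mylabel": list("PREETSAPRET")})

# groupby() describes how to split the rows; no aggregation is performed yet
grouped = dframe.groupby("mylabel")

# Iterating yields one (key, sub-frame) pair per distinct letter
index_by_letter = {letter: rows.index.tolist() for letter, rows in grouped}
for letter, positions in index_by_letter.items():
    print(letter, positions)
```

With this data, the letter E maps to rows 2, 3 and 9, which matches the data frame printed earlier.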
Using the groupby() Function in Pandas
The following code groups the rows by letter and counts how many times each letter occurs.
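import pandas as pd

our_data = {"mylabel": pd.Series(["P", "R", "E", "E", "T", "S", "A", "P", "R", "E", "T"])}
dframe = pd.DataFrame(our_data)

def perc(value, total):
    # Share of the total; not used yet, applied in the next section
    return value / float(total)

def gcou(values):
    # Number of rows in one group
    return len(values)

# Group the rows by letter, then count the rows in each group
grpd_count = dframe.groupby("mylabel").mylabel.agg(gcou)
print(grpd_count)  # prints output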
Output:
mylabel
A    1
E    3
P    2
R    2
S    1
T    2
Name: mylabel, dtype: int64
The result, grpd_count, is a Series indexed by letter rather than a data frame. It holds the count of every letter, and we can now apply further calculations to it.
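The custom gcou function above simply counts rows, so the same result should also be obtainable with built-in pandas methods. This is a sketch of two alternatives, not a replacement for the tutorial's approach; the names counts_size and counts_vc are hypothetical.

```python
import pandas as pd

dframe = pd.DataFrame({"mylabel": list("PREETSAPRET")})

# size() returns the number of rows in each group
counts_size = dframe.groupby("mylabel").size()

# value_counts() counts distinct values directly; sort_index() orders them by letter
counts_vc = dframe["mylabel"].value_counts().sort_index()

print(counts_size)
print(counts_vc)
```

Both calls avoid writing a helper function, which may be preferable when a plain count is all that is needed.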
Join groupby() and apply() Function in Pandas
Let us take grpd_count and divide the count of each letter by the total number of rows. This is a common way to express the weight of an entity as a value between 0 and 1, and the weights of all letters add up to 1.
Values closer to one carry more weight, whereas values closer to zero carry less, meaning that letter occurs less often than the others.
Code Sample:
import pandas as pd

our_data = {"mylabel": pd.Series(["P", "R", "E", "E", "T", "S", "A", "P", "R", "E", "T"])}
dframe = pd.DataFrame(our_data)

def perc(value, total):
    # Share of the total, as a float between 0 and 1
    return value / float(total)

def gcou(values):
    # Number of rows in one group
    return len(values)

# Count the rows per letter, then divide every count by the total row count
grpd_count = dframe.groupby("mylabel").mylabel.agg(gcou)
mydata = grpd_count.apply(perc, total=dframe.mylabel.count())
print(mydata)  # prints output
Output:
mylabel
A    0.090909
E    0.272727
P    0.181818
R    0.181818
S    0.090909
T    0.181818
Name: mylabel, dtype: float64
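We have successfully performed an operation after grouping data in Pandas. Note that apply() is called here on the Series returned by agg(), so perc runs once for each letter's count.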
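The same shares can also be computed by calling apply() directly on the grouped column, or with value_counts(normalize=True). The sketch below assumes the same letters as above; the names total, share_apply and share_vc are hypothetical and chosen only for this example.

```python
import pandas as pd

dframe = pd.DataFrame({"mylabel": list("PREETSAPRET")})
total = len(dframe)

# apply() on the grouped column: the lambda receives one group's values at a time
share_apply = dframe.groupby("mylabel")["mylabel"].apply(lambda s: len(s) / total)

# value_counts(normalize=True) returns relative frequencies directly
share_vc = dframe["mylabel"].value_counts(normalize=True).sort_index()

print(share_apply)
print(share_vc)
```

Both results should agree with mydata above, for example 3 / 11 for the letter E.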
Therefore, with the GroupBy technique in Pandas, we can split data into groups based on one or more columns and then apply a function or aggregation to each group.
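Grouping by more than one column works the same way. The sketch below is only an illustration: the batch column and its values are hypothetical and not part of the original data, as are the names pair_counts and batch_share.

```python
import pandas as pd

dframe = pd.DataFrame({
    "mylabel": list("PREETSAPRET"),
    # Hypothetical second column, added only to demonstrate multi-column grouping
    "batch": ["x", "x", "x", "y", "y", "y", "x", "y", "x", "y", "x"],
})

# Passing a list of columns creates one group per (batch, mylabel) combination
pair_counts = dframe.groupby(["batch", "mylabel"]).size()

# Divide each count by the total of its batch to get a share within that batch
batch_share = pair_counts / pair_counts.groupby(level="batch").transform("sum")

print(pair_counts)
print(batch_share)
```

Here the shares add up to 1 within each batch rather than across the whole data frame.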