How to Use rolling().apply() on a Pandas DataFrame and Series
- Use rolling().apply() on a Pandas DataFrame
- rolling.apply With Lambda
- Use rolling().apply() on a Pandas Series
The Pandas library has many useful functions, and rolling() is one of them: it performs calculations over a sliding window of the specified data. The companion apply() method applies a custom function to each of those rolling windows across the complete data.
We can use rolling().apply() with Pandas series and data frames. This tutorial explains the rolling() and apply() methods and demonstrates how to use rolling().apply() on a Pandas dataframe and series.
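To make the idea of a rolling window concrete before the full walkthrough, here is a minimal sketch. The sample values, the window size of 2, and the names s, rolling_sum, and manual are assumptions chosen only for illustration. It computes a rolling sum with rolling().apply() and then repeats the same calculation by hand so the individual windows are visible.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([1, 2, 3, 4, 5])

# With a window size of 2, each window holds the current value and the one
# before it; apply() calls the function once per complete window.
rolling_sum = s.rolling(2).apply(lambda w: w.sum())

# The same result computed by hand to make the windows explicit.
# The first position has no complete window, so it is NaN.
manual = [float("nan")] + [
    float(s.iloc[i - 1] + s.iloc[i]) for i in range(1, len(s))
]

print(rolling_sum.tolist())
print(manual)
```

The first entry is NaN in both cases because a window of size 2 cannot be filled from a single value.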
Use rolling().apply() on a Pandas DataFrame
Let’s go step by step through using rolling().apply() on a dataframe.
- Import libraries.

import pandas as pd
import numpy as np

First, we import the necessary libraries: pandas for working with data frames and numpy for the numpy.median() function.

- Create a dataframe.

points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)
print(points_df)

Output:

   Team_A  Team_B  Team_C  Team_D
0      12      13      14      15
1      23      24      25      26
2      34      35      36      37
3      45      46      47      48
4      32      33      34      35
5      45      46      47      48
6      32      33      34      35
7      21      22      23      24
8      33      34      35      36

Next, we create a dataframe named points_df, which contains different points for Team_A, Team_B, Team_C, and Team_D. We can see that the default index has no header (heading). Let’s create a heading for that in the following step.
- Set the heading as index for the default column index.

points_df.index.names = ["index"]
print(points_df)

Output:

       Team_A  Team_B  Team_C  Team_D
index
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36

As we can see, the heading index is not aligned with Team_A, Team_B, Team_C, and Team_D. Let’s do it in the following step.

- Align all headings for the points_df dataframe.

points_df.columns.name = points_df.index.name
points_df.index.name = None
print(points_df)

Output:

index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36
- Create the calculate_median() function.

def calculate_median(n):
    return np.median(n)

This function takes a series (in effect, an array of numeric values) and returns that series’s median.
- Use rolling().apply() on the points_df dataframe.

points_df = points_df.rolling(2).apply(calculate_median)
print(points_df)

Output:

index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0

Here, rolling() provides rolling window computations. The rolling window idea is used in signal processing and time-series datasets. With rolling(2), each window covers the current row and the row before it, so the first row has no complete window and yields NaN.

We cover rolling(), its syntax, the rolling window feature, and its working process, with demonstrations of various rolling functions, in a separate article.

We use the apply() function to apply a custom function (calculate_median() in our case) to each window of the specified data.

- Here is the complete source code.

import pandas as pd
import numpy as np

points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)

# Move the "index" heading onto the same row as the column names
points_df.index.names = ["index"]
points_df.columns.name = points_df.index.name
points_df.index.name = None

print("Before rolling().apply():\n\n")
print(points_df)


def calculate_median(n):
    # n holds the values of one rolling window
    return np.median(n)


points_df = points_df.rolling(2).apply(calculate_median)

print("\n\nAfter rolling().apply():\n\n")
print(points_df)

Output:

Before rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36


After rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0
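As a possible variation on the steps above, the sketch below shows three options that the walkthrough does not use: passing raw=True so each window reaches the function as a NumPy array, using the built-in rolling median instead of a custom function, and setting min_periods=1 so the first row is computed from a one-element window. The two-column dataframe and the names via_apply, via_builtin, and no_nan are assumptions made to keep the example short.

```python
import numpy as np
import pandas as pd

# Same Team_A and Team_B values as above, trimmed to two columns for brevity
points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
    }
)

# raw=True hands each window to the function as a NumPy array, not a Series
via_apply = points_df.rolling(2).apply(np.median, raw=True)

# Built-in rolling median: no custom function needed for this statistic
via_builtin = points_df.rolling(2).median()

# min_periods=1 lets the first row use a one-element window instead of NaN
no_nan = points_df.rolling(2, min_periods=1).apply(np.median, raw=True)

# Compare the two approaches, treating NaN positions as equal
print(np.allclose(via_apply, via_builtin, equal_nan=True))
print(no_nan.head(3))
```

For a statistic that Pandas already provides, the built-in method is usually the simpler choice; rolling().apply() is most useful when the calculation has no built-in equivalent.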
rolling.apply With Lambda
Consider the following code, which computes a rolling mean and a rolling standardized score over a 5-row window:
import pandas as pd
import numpy as np


def test(df):
    # Mean of the values in the current window
    return np.mean(df)


# 2000 daily rows of small random values in two columns
tmp = pd.DataFrame(
    np.random.randn(2000, 2) / 10000,
    index=pd.date_range("2001-01-01", periods=2000),
    columns=["A", "B"],
)

print("Test 1: ")
print(tmp.rolling(window=5, center=False).apply(lambda x: test(x)))

print("SC_Fit: ")
print(
    tmp.rolling(window=5, center=False).apply(
        # x.iloc[-1] is the last (current) value in the window
        lambda x: (x.iloc[-1] - x.mean()) / x.std(ddof=1)
    )
)
Output (the values differ on every run because the input data is random):
Test 1:
A B
2001-01-01 NaN NaN
2001-01-02 NaN NaN
2001-01-03 NaN NaN
2001-01-04 NaN NaN
2001-01-05 -0.000039 0.000053
... ... ...
2006-06-19 0.000022 -0.000021
2006-06-20 0.000005 -0.000027
2006-06-21 0.000024 -0.000060
2006-06-22 0.000023 -0.000038
2006-06-23 0.000014 -0.000017
[2000 rows x 2 columns]
SC_Fit:
A B
2001-01-01 NaN NaN
2001-01-02 NaN NaN
2001-01-03 NaN NaN
2001-01-04 NaN NaN
2001-01-05 -0.201991 0.349646
... ... ...
2006-06-19 1.035835 -0.688231
2006-06-20 -0.595888 1.057016
2006-06-21 -0.640150 -1.399535
2006-06-22 -0.535689 1.244345
2006-06-23 0.510958 0.614429
[2000 rows x 2 columns]
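Since x in the lambda function represents the values of the current rolling window (a Series by default, or an ndarray when raw=True is passed), the function can be written as follows. The last element of the window is the current rolling data point. Older code writes it as x[-1], but x.iloc[-1] is the safer form on a Series, because newer pandas versions deprecate positional integer keys inside [].
lambda x: (x.iloc[-1] - x.mean()) / x.std(ddof=1)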
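To check what this lambda computes, the following sketch runs it on a small hand-made series and compares the result with a vectorized version built from the rolling mean and rolling standard deviation. The sample values, the window size of 3, and the names z_apply and z_vectorized are assumptions for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd

# Small hand-made series (illustrative values, not the random data above)
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
window = 3  # assumed window size for this sketch

# Custom function: standardized score of the newest value within its window
z_apply = s.rolling(window).apply(
    lambda x: (x.iloc[-1] - x.mean()) / x.std(ddof=1)
)

# Vectorized equivalent built from the rolling mean and standard deviation
roll = s.rolling(window)
z_vectorized = (s - roll.mean()) / roll.std(ddof=1)

# Both approaches should agree, with NaN in the incomplete leading windows
print(np.allclose(z_apply, z_vectorized, equal_nan=True))
```

The vectorized form avoids calling a Python function once per window, which tends to matter on long series such as the 2000-row dataframe above.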
Use rolling().apply() on a Pandas Series
Similarly, we can use rolling().apply() on a Pandas series. The following code is the same as what we wrote for Pandas data frames, with one difference: we are using a series here.
The complete source code is given below. Pandas series themselves are covered in detail in a separate article.
Example Code:
import pandas as pd
import numpy as np

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)

print("Before rolling().apply():\n\n")
print(points_series)


def calculate_median(n):
    # n holds the values of one rolling window
    return np.median(n)


points_series = points_series.rolling(2).apply(calculate_median)

print("\n\nAfter rolling().apply():\n\n")
print(points_series)
Output:
Before rolling().apply():


Team_A    12
Team_B    23
Team_C    34
Team_D    45
dtype: int64


After rolling().apply():


Team_A     NaN
Team_B    17.5
Team_C    28.5
Team_D    39.5
dtype: float64
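When the function needs an extra parameter, apply() can forward keyword arguments to it on every window. The sketch below assumes we want the 75th percentile of each two-element window rather than the median; the choice of np.percentile, the value q=75, and the name upper_quartile are illustrative and not part of the example above.

```python
import numpy as np
import pandas as pd

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)

# kwargs are forwarded to the function for every window, so each call is
# effectively np.percentile(window, q=75); raw=True passes NumPy arrays.
upper_quartile = points_series.rolling(2).apply(
    np.percentile, raw=True, kwargs={"q": 75}
)

print(upper_quartile)
```

The result keeps the original team labels as its index, and the first entry is NaN because its window is incomplete.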