How to Use rolling().apply() on Pandas DataFrame and Series

Mehvish Ashiq Feb 02, 2024 Pandas Pandas Rolling

Use rolling().apply() on a Pandas DataFrame
rolling.apply With Lambda
Use rolling().apply() on a Pandas Series

The Pandas library has many useful functions; rolling() is one of them, and it performs calculations over a sliding window of the specified data. The apply() method then runs a custom function on each of those rolling windows.

We can use rolling().apply() with Pandas series and data frames. This tutorial introduces the rolling() and apply() methods and demonstrates how to use rolling().apply() on a Pandas dataframe and series.

Use `rolling().apply()` on a Pandas DataFrame

Let’s dive in step-by-step to learn the use of rolling().apply() on a dataframe.

Import libraries.
```
import pandas as pd
import numpy as np
```
First, we import the necessary libraries: pandas for working with data frames and numpy for its numpy.median() function.

Create a dataframe.

```
points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)
print(points_df)
```

Output:

```
   Team_A  Team_B  Team_C  Team_D
0      12      13      14      15
1      23      24      25      26
2      34      35      36      37
3      45      46      47      48
4      32      33      34      35
5      45      46      47      48
6      32      33      34      35
7      21      22      23      24
8      33      34      35      36
```

Here, we create a dataframe named points_df, which contains the points for Team_A, Team_B, Team_C, and Team_D. We can see that the default index has no header (heading).

Let’s create a heading for that in the following step.

Set `index` as the heading of the default index column.

```
points_df.index.names = ["index"]
print(points_df)
```

Output:

```
       Team_A  Team_B  Team_C  Team_D
index
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36
```

As we can see, the heading index is not aligned with Team_A, Team_B, Team_C, and Team_D. Let’s do it in the following step.

Align all headings for the `points_df` dataframe.

```
points_df.columns.name = points_df.index.name
points_df.index.name = None
print(points_df)
```

Output:

```
index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36
```
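The two heading steps can also be written as a single call. The following is a minimal sketch, assuming a shortened copy of the data (the name `small_df` is ours and is not part of the original example); it uses `DataFrame.rename_axis()` to label the column axis directly.

```python
import pandas as pd

# A shortened, hypothetical copy of points_df, used only for illustration.
small_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34],
        "Team_B": [13, 24, 35],
    }
)

# Name the column axis "index" in one step; the row index stays unnamed.
small_df = small_df.rename_axis("index", axis="columns")
print(small_df)
```

Either form should print the same aligned header; which one to use is mostly a matter of taste.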

Create the calculate_median() function.
```
def calculate_median(n):
    return np.median(n)
```
This function takes a series (that is, an array of numeric values) and returns its median.

Use rolling().apply() on the points_df dataframe.
```
points_df = points_df.rolling(2).apply(calculate_median)
print(points_df)
```
Output:
```
index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0
```
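Here, rolling() provides rolling window computations. With rolling(2), each result is computed from the current row and the row before it, which is why the first row, having no predecessor, is NaN. The rolling window idea is widely used in signal processing and time-series analysis.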
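To make the window mechanics concrete, the sketch below recomputes a rolling median by hand with plain slicing and compares it with `rolling(2).apply()`. The variable names and the four sample values are illustrative assumptions, and the loop mirrors only the default behaviour, where a full window is required before a result is produced.

```python
import numpy as np
import pandas as pd

values = [12, 23, 34, 45]  # the first four Team_A points
window = 2

manual = []
for i in range(len(values)):
    if i + 1 < window:
        # Not enough rows yet to fill the window.
        manual.append(np.nan)
    else:
        # The window ends at row i and covers the previous `window` rows.
        chunk = values[i + 1 - window : i + 1]
        manual.append(float(np.median(chunk)))

rolled = pd.Series(values).rolling(window).apply(np.median)

print(manual)
print(rolled.tolist())
```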

We have already written an article about rolling(), its syntax, the rolling window feature, and its working process by demonstrating various rolling functions. You can read that here.

We use the apply() function to run a custom function (calculate_median() in our case) on each rolling window.
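What the custom function receives depends on the `raw` argument of `apply()`. The small sketch below, which assumes a short illustrative series and hypothetical helper names, records the type of each window: by default (`raw=False`) the function gets a Series, while `raw=True` hands it a NumPy array, which is usually the faster choice for NumPy-only functions such as `np.median()`.

```python
import numpy as np
import pandas as pd

default_types = []
raw_types = []


def median_default(window):
    # Record what apply() passes in when raw is left at its default.
    default_types.append(type(window).__name__)
    return np.median(window)


def median_raw(window):
    # Record what apply() passes in when raw=True.
    raw_types.append(type(window).__name__)
    return np.median(window)


s = pd.Series([12, 23, 34, 45])
default_result = s.rolling(2).apply(median_default)
raw_result = s.rolling(2).apply(median_raw, raw=True)

print(set(default_types), set(raw_types))
print(default_result.equals(raw_result))
```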

Here is the complete source code.

```
import pandas as pd
import numpy as np

points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)

points_df.index.names = ["index"]
points_df.columns.name = points_df.index.name
points_df.index.name = None

print("Before rolling().apply():\n\n")
print(points_df)


def calculate_median(n):
    return np.median(n)


points_df = points_df.rolling(2).apply(calculate_median)
print("\n\nAfter rolling().apply():\n\n")
print(points_df)
```

Output:

```
Before rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36


After rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0
```
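For a common statistic such as the median, Pandas also ships a built-in rolling method. As a hedged aside (not part of the original walkthrough), the sketch below checks on a small assumed sample that `rolling(2).median()` matches `rolling(2).apply(calculate_median)`; `apply()` remains the tool for logic that has no built-in equivalent.

```python
import numpy as np
import pandas as pd


def calculate_median(n):
    return np.median(n)


# A shortened, hypothetical sample in the same shape as points_df.
sample_df = pd.DataFrame(
    {"Team_A": [12, 23, 34, 45], "Team_B": [13, 24, 35, 46]}
)

via_apply = sample_df.rolling(2).apply(calculate_median)
via_builtin = sample_df.rolling(2).median()

# Raises an AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_apply, via_builtin)
print(via_builtin)
```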

`rolling.apply` With Lambda

Consider the following code:

```
import pandas as pd
import numpy as np


def test(df):
    return np.mean(df)


tmp = pd.DataFrame(
    np.random.randn(2000, 2) / 10000,
    index=pd.date_range("2001-01-01", periods=2000),
    columns=["A", "B"],
)

print("Test 1: ")
print(tmp.rolling(window=5, center=False).apply(lambda x: test(x)))

print("SC_Fit: ")
# raw=True passes each window as a NumPy array, so x[-1] is a plain
# positional lookup of the latest value in the window.
print(
    tmp.rolling(window=5, center=False).apply(
        lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True
    )
)
```

Output (the values differ on every run because the input data is random):

```
Test 1:
                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.000039  0.000053
...              ...       ...
2006-06-19  0.000022 -0.000021
2006-06-20  0.000005 -0.000027
2006-06-21  0.000024 -0.000060
2006-06-22  0.000023 -0.000038
2006-06-23  0.000014 -0.000017

[2000 rows x 2 columns]
SC_Fit:
                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.201991  0.349646
...              ...       ...
2006-06-19  1.035835 -0.688231
2006-06-20 -0.595888  1.057016
2006-06-21 -0.640150 -1.399535
2006-06-22 -0.535689  1.244345
2006-06-23  0.510958  0.614429

[2000 rows x 2 columns]
```

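Here, x in the lambda function is one rolling window of a single column: a NumPy array when raw=True is passed to apply(), or a Series by default. On an array, x[-1] is the last value in the window, that is, the current data point (for a Series, x.iloc[-1] is the equivalent positional lookup). The function therefore standardizes the current value against the mean and sample standard deviation of its window: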

lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
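The same rolling z-score can be cross-checked against the built-in rolling mean and standard deviation. The sketch below is an illustration under stated assumptions: it uses a small seeded random frame (the names `demo`, `via_apply`, and `via_builtin` are ours) and passes `raw=True` so that `x[-1]` indexes a NumPy array.

```python
import numpy as np
import pandas as pd

# Small seeded frame so the check is reproducible (an assumption for the demo).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.standard_normal((50, 2)),
    index=pd.date_range("2001-01-01", periods=50),
    columns=["A", "B"],
)

roll = demo.rolling(window=5)

# Custom function: current value standardized against its own window.
via_apply = roll.apply(lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True)

# Same quantity from the built-in rolling mean and sample standard deviation.
via_builtin = (demo - roll.mean()) / roll.std(ddof=1)

print(np.allclose(via_apply.to_numpy(), via_builtin.to_numpy(), equal_nan=True))
```

When a built-in combination like this exists, it generally avoids calling a Python function once per window; the lambda form is still handy for logic that cannot be expressed that way.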

Use `rolling().apply()` on a Pandas Series

Similarly, we can use rolling().apply() on a Pandas series. The following code is the same as the code we wrote for Pandas data frames, with one difference: we are using a series here.

The complete source code is given below, but you can read about the series in detail here.

Example Code:

```
import pandas as pd
import numpy as np

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)


print("Before rolling().apply():\n\n")
print(points_series)


def calculate_median(n):
    return np.median(n)


points_series = points_series.rolling(2).apply(calculate_median)
print("\n\nAfter rolling().apply():\n\n")
print(points_series)
```

Output:

```
Before rolling().apply():


Team_A    12
Team_B    23
Team_C    34
Team_D    45
dtype: int64


After rolling().apply():


Team_A     NaN
Team_B    17.5
Team_C    28.5
Team_D    39.5
dtype: float64
```
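The leading NaN appears because the first window holds only one value. If a result for partial windows is wanted, `rolling()` accepts a `min_periods` argument; the sketch below rebuilds the same series and is one possible variation, not part of the original example.

```python
import numpy as np
import pandas as pd

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)

# min_periods=1 lets the function run even when the window has a single value,
# so Team_A gets the median of just its own point instead of NaN.
partial = points_series.rolling(2, min_periods=1).apply(np.median)
print(partial)
```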

Author: Mehvish Ashiq

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.
