Use rolling().apply() on a Pandas DataFrame and Series

Mehvish Ashiq, June 21, 2023

Use rolling().apply() on a Pandas DataFrame
rolling.apply With Lambda
Use rolling().apply() on a Pandas Series

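The Pandas library offers many useful features, and rolling() is one of them: it lets you perform rolling-window calculations on a given data set. The resulting rolling object has an apply() method that applies a function of your choice to every window over the data.

You can use rolling().apply() with both Pandas series and data frames. This tutorial explains the rolling() and apply() methods and shows how to use rolling().apply() on a Pandas data frame and a series.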
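
Before the step-by-step walkthrough, the idea can be sketched in a few lines. The `scores` series below is hypothetical sample data chosen only for illustration; it is not part of the tutorial's data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([10, 20, 30, 40])

# rolling(2) builds a window over each pair of consecutive values;
# apply() calls the given function once per complete window.
rolling_mean = scores.rolling(2).apply(np.mean)

print(rolling_mean)
```

The first entry of the result is NaN because no complete two-value window exists yet at that position.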

Use `rolling().apply()` on a Pandas DataFrame

Let's walk through using rolling().apply() on a data frame step by step.

Import the libraries.
```
import pandas as pd
import numpy as np
```
First, we import the required libraries: pandas for working with data frames, and numpy for working with arrays, which we need for the numpy.median() function.

Create the data frame.
```
points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)
print(points_df)
```
Output:
```
   Team_A  Team_B  Team_C  Team_D
0      12      13      14      15
1      23      24      25      26
2      34      35      36      37
3      45      46      47      48
4      32      33      34      35
5      45      46      47      48
6      32      33      34      35
7      21      22      23      24
8      33      34      35      36
```

Here, we create a data frame named points_df that holds different points for Team_A, Team_B, Team_C, and Team_D. Notice that the default index has no header (title).

Let's give it a title in the next step.

Set the title of the default index to `index`.

```
points_df.index.names = ["index"]
print(points_df)
```
Output:
```
       Team_A  Team_B  Team_C  Team_D
index
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36
```

As you can see, the title index is printed on its own row and is not aligned with Team_A, Team_B, Team_C, and Team_D. Let's fix that in the next step.

Align all the titles of the `points_df` data frame.

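```
# Move the name from the row index to the column axis so that it prints
# on the same row as the column titles.
points_df.columns.name = points_df.index.name
points_df.index.name = None
print(points_df)
```
Output:
```
index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36
```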
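
The two labeling steps above can also be collapsed into one call. The following is a minimal sketch that assumes a shortened version of the data frame; `labeled_df` is a name introduced here for illustration only.

```python
import pandas as pd

# Shortened copy of the tutorial's data, used only for illustration.
points_df = pd.DataFrame({"Team_A": [12, 23], "Team_B": [13, 24]})

# rename_axis returns a new frame; columns="index" names the column axis,
# so the row index keeps its (empty) name.
labeled_df = points_df.rename_axis(columns="index")
print(labeled_df)
```

Because `rename_axis` returns a new object by default, the original `points_df` is left unchanged.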

Create the calculate_median() function.
```
def calculate_median(n):
    return np.median(n)
```
This function takes a series (you can think of it as an array of numeric values) and returns the median of that series.
Use rolling().apply() on the points_df data frame.
```
points_df = points_df.rolling(2).apply(calculate_median)
print(points_df)
```
Output:
```
index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0
```
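Here, rolling() provides the rolling window calculations: with a window size of 2, every result is computed from the current row and the row before it, which is why the first row is NaN. The rolling window idea is widely used in signal processing and with time series data sets.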
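
To make the windows visible, the sketch below iterates over a rolling object directly. It assumes pandas 1.1 or later, where rolling objects can be iterated, and uses only the first four Team_A values; `team_a`, `windows`, and `partial` are names introduced here for illustration.

```python
import pandas as pd

team_a = pd.Series([12, 23, 34, 45])

# Iterating over the rolling object yields one window per row; the first
# window is shorter because only one value is available at that point.
windows = [window.tolist() for window in team_a.rolling(2)]
print(windows)

# By default a window with fewer than 2 values produces NaN.
# min_periods=1 lets the shorter first window produce a result too.
partial = team_a.rolling(2, min_periods=1).median()
print(partial.tolist())
```

Whether a partial window should yield a value or NaN depends on the analysis; the tutorial's examples keep the default, so the first row stays NaN.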

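We have already written an article on rolling() that covers its syntax, the rolling window feature, and how it works with various rolling functions. You can read it here.

We use the apply() function to apply a custom function (calculate_median() in our case) to the specified data.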
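
As a sanity check on the custom function, the sketch below compares it with the built-in rolling median on a shortened copy of the data. The `raw=True` argument and the names `via_apply` and `via_builtin` are additions for illustration and are not used in the tutorial's own code.

```python
import numpy as np
import pandas as pd

points_df = pd.DataFrame(
    {"Team_A": [12, 23, 34, 45, 32], "Team_B": [13, 24, 35, 46, 33]}
)

# raw=True hands each window to the function as a NumPy array instead of a
# Series, which avoids building a Series object per window.
via_apply = points_df.rolling(2).apply(np.median, raw=True)

# For common statistics the built-in method gives the same result.
via_builtin = points_df.rolling(2).median()

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_apply, via_builtin)
print(via_apply)
```

When a built-in rolling method exists for the statistic you need, it is usually the simpler choice; apply() is most useful for functions that pandas does not provide.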

Here is the complete source code.
```
import pandas as pd
import numpy as np

points_df = pd.DataFrame(
    {
        "Team_A": [12, 23, 34, 45, 32, 45, 32, 21, 33],
        "Team_B": [13, 24, 35, 46, 33, 46, 33, 22, 34],
        "Team_C": [14, 25, 36, 47, 34, 47, 34, 23, 35],
        "Team_D": [15, 26, 37, 48, 35, 48, 35, 24, 36],
    }
)

# Label the column axis "index" so the title prints next to the team names.
points_df.index.names = ["index"]
points_df.columns.name = points_df.index.name
points_df.index.name = None

print("Before rolling().apply():\n\n")
print(points_df)


def calculate_median(n):
    # n holds the values of one rolling window.
    return np.median(n)


# Median of every window of 2 consecutive rows, computed per column.
points_df = points_df.rolling(2).apply(calculate_median)
print("\n\nAfter rolling().apply():\n\n")
print(points_df)
```
Output:
```
Before rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0          12      13      14      15
1          23      24      25      26
2          34      35      36      37
3          45      46      47      48
4          32      33      34      35
5          45      46      47      48
6          32      33      34      35
7          21      22      23      24
8          33      34      35      36


After rolling().apply():


index  Team_A  Team_B  Team_C  Team_D
0         NaN     NaN     NaN     NaN
1        17.5    18.5    19.5    20.5
2        28.5    29.5    30.5    31.5
3        39.5    40.5    41.5    42.5
4        38.5    39.5    40.5    41.5
5        38.5    39.5    40.5    41.5
6        38.5    39.5    40.5    41.5
7        26.5    27.5    28.5    29.5
8        27.0    28.0    29.0    30.0
```

`rolling.apply` With Lambda

Consider the following code.

```
import pandas as pd
import numpy as np


def test(df):
    # Mean of the values in one rolling window.
    return np.mean(df)


tmp = pd.DataFrame(
    np.random.randn(2000, 2) / 10000,
    index=pd.date_range("2001-01-01", periods=2000),
    columns=["A", "B"],
)

print("Test 1: ")
print(tmp.rolling(window=5, center=False).apply(lambda x: test(x)))

print("SC_Fit: ")
# raw=True passes each window as a NumPy array, so x[-1] is the newest value.
print(
    tmp.rolling(window=5, center=False).apply(
        lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True
    )
)
```

Output (the exact values differ from run to run because the input data is random):
```
Test 1:
                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.000039  0.000053
...              ...       ...
2006-06-19  0.000022 -0.000021
2006-06-20  0.000005 -0.000027
2006-06-21  0.000024 -0.000060
2006-06-22  0.000023 -0.000038
2006-06-23  0.000014 -0.000017

[2000 rows x 2 columns]
SC_Fit:
                   A         B
2001-01-01       NaN       NaN
2001-01-02       NaN       NaN
2001-01-03       NaN       NaN
2001-01-04       NaN       NaN
2001-01-05 -0.201991  0.349646
...              ...       ...
2006-06-19  1.035835 -0.688231
2006-06-20 -0.595888  1.057016
2006-06-21 -0.640150 -1.399535
2006-06-22 -0.535689  1.244345
2006-06-23  0.510958  0.614429

[2000 rows x 2 columns]
```

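In the lambda function, x is the current rolling window, passed as a series by default or as an ndarray when raw=True is given. The function can therefore be written as follows, where x[-1] is the last, most recent data point in the window. When x is a series with a non-integer index, x.iloc[-1] is the explicit positional form.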

lambda x: (x[-1] - x.mean()) / x.std(ddof=1)
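
The lambda above computes how far the newest value lies from the window mean, in units of the window's sample standard deviation. The sketch below checks it against the built-in rolling mean and standard deviation; the seeded generator and the names `values`, `z_apply`, and `z_builtin` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible; the data itself is arbitrary.
rng = np.random.default_rng(0)
values = pd.Series(rng.standard_normal(50))

# Custom function applied window by window (x is a NumPy array with raw=True).
z_apply = values.rolling(5).apply(
    lambda x: (x[-1] - x.mean()) / x.std(ddof=1), raw=True
)

# The same quantity from built-in rolling statistics; rolling std uses ddof=1.
roll = values.rolling(5)
z_builtin = (values - roll.mean()) / roll.std()

# Compare the two results, treating NaN positions as equal.
print(np.allclose(z_apply, z_builtin, equal_nan=True))
```

The vectorized form is typically faster on large inputs, while the lambda form is easier to adapt when the per-window calculation has no built-in equivalent.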

Use `rolling().apply()` on a Pandas Series

Similarly, we can use rolling().apply() on a Pandas series. The following code is the same as what we wrote for the Pandas data frame, with one difference: here we use a series.

The complete source code is given below, and you can read more about series here.

Example code:
```
import pandas as pd
import numpy as np

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)


print("Before rolling().apply():\n\n")
print(points_series)


def calculate_median(n):
    # n holds the values of one rolling window.
    return np.median(n)


# Median of every window of 2 consecutive values.
points_series = points_series.rolling(2).apply(calculate_median)
print("\n\nAfter rolling().apply():\n\n")
print(points_series)
```
Output:
```
Before rolling().apply():


Team_A    12
Team_B    23
Team_C    34
Team_D    45
dtype: int64


After rolling().apply():


Team_A     NaN
Team_B    17.5
Team_C    28.5
Team_D    39.5
dtype: float64
```
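
The same pattern works for statistics that have no dedicated rolling method. As a small sketch, the hypothetical helper `value_range()` below computes the spread (maximum minus minimum) of each three-value window of the same series; the helper, the window size, and `raw=True` are choices made for this illustration.

```python
import pandas as pd

points_series = pd.Series(
    [12, 23, 34, 45], index=["Team_A", "Team_B", "Team_C", "Team_D"]
)


def value_range(window):
    # window is a NumPy array because raw=True is passed below.
    return window.max() - window.min()


# Spread of every window of 3 consecutive values.
spread = points_series.rolling(3).apply(value_range, raw=True)
print(spread)
```

With a window of 3, the first two entries are NaN because a complete window is only available from the third value onward.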

Author: Mehvish Ashiq

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.
