How to GroupBy Month in Pandas
This tutorial shows how to group the rows of a Pandas data frame by month, using a date column. Let’s start by importing the required library.
Group Data Frame By Month in Pandas
Import pertinent libraries:
import pandas as pd
We need a data frame with a date column that we can group by month. For this example, three dates are enough.
The code below creates the sample data frame.
df = pd.DataFrame(
    {
        "Date": [
            pd.Timestamp("2000-11-02"),
            pd.Timestamp("2000-01-02"),
            pd.Timestamp("2000-01-09"),
        ],
        "ID": [1, 2, 3],
        "Price": [140, 120, 230],
    }
)
Let us look at our sample data frame containing dates.
print(df)
        Date  ID  Price
0 2000-11-02   1    140
1 2000-01-02   2    120
2 2000-01-09   3    230
Now that we have the data frame, let us group its rows by month. We will use the groupby() method, which operates on the entire data frame.
Use the groupby() Function in Pandas
A pd.Grouper object lets us specify a groupby instruction for an object. The instruction selects a column through the Grouper's key argument or, if the level and/or axis parameters are given, a level of the target object's index. When a freq is also supplied and the selected values are dates, they are grouped at that frequency.
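To make the difference between key and level concrete, here is a minimal sketch. It rebuilds the tutorial's sample data so that it runs on its own. Passing a pd.offsets.MonthEnd() object as freq is an assumption made for this sketch so that it does not depend on a particular alias string, and the names month_end, by_key, indexed, and by_level are illustrative only.

```python
import pandas as pd

# Same sample data as the tutorial's data frame.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2000-11-02", "2000-01-02", "2000-01-09"]),
        "ID": [1, 2, 3],
        "Price": [140, 120, 230],
    }
)

# A month-end offset object; passed as freq, it bins dates by calendar month.
month_end = pd.offsets.MonthEnd()

# key= selects a column of the data frame to group on.
by_key = df.groupby(pd.Grouper(key="Date", freq=month_end))["Price"].sum()

# level= selects a level of the index instead, so the dates must be the index.
indexed = df.set_index("Date")
by_level = indexed.groupby(pd.Grouper(level="Date", freq=month_end))["Price"].sum()

print(by_key.head(3))
print(by_level.head(3))
```

Both calls should produce the same monthly totals; the only difference is whether the dates live in a column or in the index.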
Using the code below, let us perform the groupby operation on our sample data frame.
# Note: pandas 2.2 and later deprecate the "M" alias in favor of "ME" (month end).
df1 = df.groupby(pd.Grouper(key="Date", freq="M")).sum()
Now that we have grouped our data frame, let us look at the result.
print(df1)
            ID  Price
Date
2000-01-31   5    350
2000-02-29   0      0
2000-03-31   0      0
2000-04-30   0      0
2000-05-31   0      0
2000-06-30   0      0
2000-07-31   0      0
2000-08-31   0      0
2000-09-30   0      0
2000-10-31   0      0
2000-11-30   1    140
In the example above, the data frame is grouped on the Date column. Because we specified freq = 'M', which stands for month end, the rows are grouped by calendar month, each group is labeled with the last day of its month, and the sums of the ID and Price columns are shown.
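As a hedged illustration of how the frequency controls the group labels, the sketch below swaps in the "MS" (month start) alias, so each monthly group is labeled with the first day of the month instead of the last. The sample data is repeated so the snippet is self-contained, and the name by_month_start is illustrative only.

```python
import pandas as pd

# Same sample data as the tutorial's data frame.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2000-11-02", "2000-01-02", "2000-01-09"]),
        "ID": [1, 2, 3],
        "Price": [140, 120, 230],
    }
)

# "MS" groups by calendar month but labels each group with the month's first day.
by_month_start = df.groupby(pd.Grouper(key="Date", freq="MS"))[["ID", "Price"]].sum()

print(by_month_start.head(3))
```

The totals per month are unchanged; only the index labels differ, which can be convenient when the result is later joined against data keyed by month start.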
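Our data only contains dates in January and November, but a frequency-based Grouper creates a group for every month between the earliest and latest dates. The months with no rows therefore still appear in the output, and because the sum of an empty group is 0, they show 0 in every column.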
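If the empty months are not wanted, one possible alternative (not the approach used in this tutorial) is to group on the month period of each date, which only forms groups for months that actually occur in the data. The sketch below assumes the same sample data; the name observed is illustrative, and the ID and Price columns are selected explicitly so the Date column itself is not summed.

```python
import pandas as pd

# Same sample data as the tutorial's data frame.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2000-11-02", "2000-01-02", "2000-01-09"]),
        "ID": [1, 2, 3],
        "Price": [140, 120, 230],
    }
)

# Convert each date to its monthly period and group on that;
# only months present in the data produce a group.
observed = df.groupby(df["Date"].dt.to_period("M"))[["ID", "Price"]].sum()

print(observed)
```

The result is indexed by monthly periods rather than month-end timestamps, so it has one row per observed month and no zero-filled rows.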
We have therefore grouped our data frame by month in Pandas using the approach above.