Pandas groupby() and diff()
The Pandas library is a comprehensive tool for handling text data as well as numbers. In many data analysis and machine learning pre-processing tasks, you will want to either exclude text input or extract information from it.
Pandas provides built-in methods to add, remove, and change columns in your DataFrames. This article briefly discusses how to group data with groupby() and find the differences between consecutive values within each group with diff().
Data Grouping in Python
Data analysis frequently calls for grouping records by one or more columns. Examples of such scenarios include:
- Counting the number of employees in each business department.
- Figuring out the average salaries of men and women in each department.
- Figuring out the average salaries of employees of various ages.
Pandas offers a groupby() function that makes most grouping tasks easy. However, groupby() alone does not cover every task; some, such as computing differences between rows within a group, require combining it with other methods.
groupby() is one of the most important Pandas functions. It groups and summarizes records using the split-apply-combine strategy: the data is split into groups, a function is applied to each group, and the results are combined.
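As a minimal sketch of the grouping scenarios listed earlier, the snippet below counts employees per department and averages salaries per department and gender. The DataFrame employees and its columns (department, gender, salary) are hypothetical and were invented only for illustration.

```python
import pandas as pd

# Hypothetical employee records, invented for illustration only.
employees = pd.DataFrame(
    {
        "department": ["Sales", "Sales", "IT", "IT", "IT"],
        "gender": ["F", "M", "F", "M", "M"],
        "salary": [4000, 4200, 5000, 5200, 5400],
    }
)

# Split by department, then count the rows in each group.
headcount = employees.groupby("department").size()

# Split by department and gender, apply mean() to salary,
# and combine the results into one Series with a MultiIndex.
avg_salary = employees.groupby(["department", "gender"])["salary"].mean()

print(headcount)
print(avg_salary)
```

Passing a list of column names to groupby() creates one group per combination of values, which is how the "men and women in each department" case can be handled.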
Use groupby() With diff() in Pandas
The example below creates a DataFrame with the ID_Number, Stu_Names, and Marks of different students. After that, we add a new column called Marks_diff that contains the difference in marks between consecutive rows within each ID_Number group.
We use fillna(0) here because diff() returns NaN for the first row of each group, which has no previous row in the same group to subtract from. fillna(0) replaces those missing values with zero.
As shown in the output, the difference between the marks of Harry and Petter is 6.0, and the difference between the marks of Daniel and Ron is 10.0.
Example code:
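import pandas as pd

# Build a small DataFrame of student marks.
d1 = pd.DataFrame(
    {
        "ID_Number": ["ID1", "ID1", "ID2", "ID2"],
        "Stu_Names": ["Harry", "Petter", "Daniel", "Ron"],
        "Marks": [72, 78, 80, 90],
    }
)
print(d1)

# Sort so that rows belonging to the same group are adjacent.
d1 = d1.sort_values(by=["ID_Number"])

# Difference from the previous row within each ID_Number group;
# the first row of each group yields NaN, which fillna(0) turns into 0.
d1["Marks_diff"] = d1.groupby(["ID_Number"])["Marks"].diff().fillna(0)
print(d1)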
Output:
  ID_Number Stu_Names  Marks
0       ID1     Harry     72
1       ID1    Petter     78
2       ID2    Daniel     80
3       ID2       Ron     90
  ID_Number Stu_Names  Marks  Marks_diff
0       ID1     Harry     72         0.0
1       ID1    Petter     78         6.0
2       ID2    Daniel     80         0.0
3       ID2       Ron     90        10.0
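To see what fillna(0) and the grouping each contribute, the sketch below recomputes the differences on the same data without fillna(0), and also computes an ungrouped diff() for comparison. The variable names grouped_diff and plain_diff are introduced here only for illustration.

```python
import pandas as pd

d1 = pd.DataFrame(
    {
        "ID_Number": ["ID1", "ID1", "ID2", "ID2"],
        "Stu_Names": ["Harry", "Petter", "Daniel", "Ron"],
        "Marks": [72, 78, 80, 90],
    }
)

# Grouped diff: the first row of each group has no predecessor, so it is NaN.
grouped_diff = d1.groupby("ID_Number")["Marks"].diff()

# Ungrouped diff: differences run across group boundaries (80 - 78 = 2).
plain_diff = d1["Marks"].diff()

print(grouped_diff.tolist())
print(plain_diff.tolist())
```

The grouped result has missing values at rows 0 and 2, the first row of each group. The ungrouped result instead reports 2.0 at row 2, which is the gap between Petter and Daniel, two students who belong to different groups.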
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.