How to Group by and Sort in Pandas
- Group by and Sort DataFrame in Pandas
- Use the groupby Function to Group by and Sort DataFrame in Pandas
This tutorial explains how to group the rows of a data frame and sort the grouped data in Pandas.
Group by and Sort DataFrame in Pandas
Pandas is an open-source Python library for data analysis and manipulation, widely used by companies and organizations that analyze data with Python.
This tutorial shows how and why to group and sort data from a data frame in Pandas. Analysts often need to summarize data by category and rank the results to gather insights that support planning.
Pandas provides the groupby function for this purpose. Consider, for example, a product-based company.
This company might need to group its records by product and sort the products by their sales. Grouping and sorting together make such comparisons straightforward.
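The product scenario above can be sketched as follows. This is only an illustration: the sales frame, its product and units_sold columns, and all of the values are hypothetical and do not come from the data used in the rest of this tutorial.

```python
import pandas as pd

# Hypothetical sales records; column names and values are invented for illustration.
sales = pd.DataFrame(
    {
        "product": ["pen", "book", "pen", "lamp", "book", "pen"],
        "units_sold": [3, 10, 4, 2, 5, 1],
    }
)

# Total units per product, then sort the totals from highest to lowest.
totals = sales.groupby("product")["units_sold"].sum().sort_values(ascending=False)
print(totals)
```

Here the grouping step produces one total per product, and the sorting step ranks those totals.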
Before we begin, we create a dummy data frame named df to work with.
We add a few columns and some data to this df data frame using the following code.
import pandas as pd

# Sample data: a name column plus two numeric count columns.
df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)
The above code creates a data frame along with a few entries. To view the entries in the data, we use the following code.
print(df)
The above code gives the following output.
   name  count_1  count_2
0   Foo        5      100
1   Foo       10      150
2  Baar       12      100
3   Foo       15       25
4  Baar       20      250
5   Foo       25      300
6  Baar       30      400
7  Baar       35      500
As we can see, we have three columns and 8 rows indexed from 0 to 7. If we look into our data frame df, we see that the values in the name column repeat.
Now that our data frame is set up, let us group its data and then sort the values within those groups.
Use the groupby Function to Group by and Sort DataFrame in Pandas
With the data in place, let us group it. We can group the rows so that entries sharing the same value in the name column are brought together, which makes per-name analysis easier.
We can do this in Pandas using the groupby function, which splits the data frame into one group for each distinct value in the specified column or columns.
We can then perform further operations, such as aggregating or sorting, on the grouped data. The grouping operation is written as follows.
df.groupby(["name"])
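As we can see, we call the groupby function on our data frame df with the name column passed as an argument, here as a one-element list. On its own, this call returns a DataFrameGroupBy object and computes nothing until we apply a further operation to it.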
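To see what the grouping produces, we can apply an aggregation to the grouped object. The following is a minimal sketch that rebuilds the same df so it runs on its own; size() and sum() are used here only as example follow-up operations, and the variable names grouped, sizes, and sums are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

# Build the grouped object once and reuse it.
grouped = df.groupby("name")

# Number of rows in each group.
sizes = grouped.size()
print(sizes)

# Sum of each count column within each group.
sums = grouped[["count_1", "count_2"]].sum()
print(sums)
```

Each result has one row per distinct name, which confirms that the rows for Baar and Foo were collected into separate groups.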
Now let us combine groupby with sorting so that we get not only the groups but also the values within each group in order.
Specifically, we want the three largest count_1 values in each group, sorted from largest to smallest.
We can perform this operation using the following code, which selects the count_1 column from the grouped data and calls nlargest(3) on it.
print(df.groupby(["name"])["count_1"].nlargest(3))
The code fetches the following results.
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
Name: count_1, dtype: int64
As we can see, each group now holds only its three largest values from the count_1 column, sorted in descending order. The second index level shows the row label each value had in df.
Thus, for the name Baar, we have three entries with counts 35, 30, and 20, and likewise three entries for Foo with counts 25, 15, and 10.
Pandas also prints the name and data type of the resulting Series. In our case, the last line of the output shows the selected column, count_1, with the data type int64.
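The result can also be inspected and flattened in code. The sketch below rebuilds the same df so it runs on its own; the variable names top3 and flat are chosen for illustration, and dropping the second index level is just one possible way to get a plain two-column table.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

top3 = df.groupby("name")["count_1"].nlargest(3)

# The Series keeps the label of the column it was computed from.
print(top3.name)

# Drop the original row labels (index level 1), then turn the
# remaining name index into an ordinary column.
flat = top3.reset_index(level=1, drop=True).reset_index()
print(flat)
```

The flattened frame has one name column and one count_1 column, which can be easier to pass on to further processing than the two-level index.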
Thus, using the groupby function together with the nlargest() function, we have grouped the rows by name, sorted the values within each group, and fetched the top records from our data frame.
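Note that nlargest() on a single column returns only that column. If the other columns of the top rows are needed as well, one alternative, shown here as a sketch and not as the method used above, is to sort the whole data frame first with sort_values and then take the first three rows of each group with head. The variable names ordered and top3_rows are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
        "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
        "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
    }
)

# Sort by name (ascending), then by count_1 (descending) within each name.
ordered = df.sort_values(["name", "count_1"], ascending=[True, False])

# Keep the first three rows of each group; all columns are preserved.
top3_rows = ordered.groupby("name").head(3)
print(top3_rows)
```

This keeps count_2 alongside count_1 for the selected rows, at the cost of sorting the full data frame before grouping.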