How to Groupby Index Columns in Pandas
-
Use the
groupby()Function to Group by Index Columns in Python Pandas -
Use the
groupby()Function on a CSV File Data in Python Pandas
This tutorial introduces how groupby in Python Pandas categorizes data and applies a function to the categories. Use the groupby() function to group multiple index columns in Pandas with examples.
Use the groupby() Function to Group by Index Columns in Python Pandas
In this post, Pandas DataFrame data.groupby() function divides data into groups based on specific criteria. Pandas objects can be divided into any number of groups along any axis.
A label-to-group-name mapping is the abstract definition of grouping. A groupby operation splits an object, applies a function, and combines the results.
This is useful for grouping large amounts of data and performing operations. Pandas groupby default behavior converts the groupby columns into indexes and removes them from the DataFrame’s list of columns.
Syntax:
DataFrame.groupby(
by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True
)
Parameters:
by |
a mapping, a function, a string, or an iterable |
axis |
int, default value 0 |
level |
For the Multi-Index axis, group by a specific level or levels (hierarchical). |
as_index |
As the index for aggregated output, return an object with group labels. This is only applicable to DataFrame input. When index=False, the output is grouped in “SQL-style”. |
sort |
The keys in a group should be sorted. Turn this off to improve performance. It should be noted that the order of observations within each group is unaffected by this. Groupby maintains the order of rows within each group. |
group_keys |
Add group keys to index when calling apply to identify pieces squeeze: If possible, reduce the dimensionality of the return type; otherwise, return a consistent type. |
Take a DataFrame with two columns: date and item sell. Groupby both date and item sell and get the user’s item-by count.
First, we need to import necessary libraries, pandas and numpy, create three columns, ct, date, and item_sell and pass a set of values to the columns.
import pandas as pd
import numpy as np
data = pd.DataFrame()
data["date"] = ["a", "a", "a", "b"]
data["item_sell"] = ["z", "z", "a", "a"]
data["ct"] = 1
print(data)
Output:
date item_sell ct
0 a z 1
1 a z 1
2 a a 1
3 b a 1
Group by using date and item_sell columns.
import pandas as pd
import numpy as np
data = pd.DataFrame()
data["date"] = ["a", "a", "a", "b"]
data["item_sell"] = ["z", "z", "a", "a"]
data["ct"] = 1
output = pd.pivot_table(data, values="ct", index=["date", "item_sell"], aggfunc=np.sum)
print(output)
Output:
ct
date item_sell
a a 1
z 2
b a 1
The groupby() by parameter can now refer to either column names or index level names.
import pandas as pd
import numpy as np
arrays = [
["rar", "raz", "bal", "bac", "foa", "foa", "qus", "qus"],
["six", "seven", "six", "seven", "six", "seven", "six", "seven"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
data = pd.DataFrame({"C": [1, 1, 1, 1, 2, 2, 3, 3], "D": np.arange(8)}, index=index)
print(data)
Output:
C D
first second
rar six 1 0
raz seven 1 1
bal six 1 2
bac seven 1 3
foa six 2 4
seven 2 5
qus six 3 6
seven 3 7
Group by second and C and then calculate the sum using the sum function.
import pandas as pd
import numpy as np
arrays = [
["rar", "raz", "bal", "bac", "foa", "foa", "qus", "qus"],
["six", "seven", "six", "seven", "six", "seven", "six", "seven"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
data = pd.DataFrame({"C": [1, 1, 1, 1, 2, 2, 3, 3], "D": np.arange(8)}, index=index)
output = data.groupby(["second", "C"]).sum()
print(output)
Output:

Use the groupby() Function on a CSV File Data in Python Pandas
Now use a groupby() function on a CSV file. To download the CSV file used in the code, click [here](Students Performance in Exams | Kaggle).
The CSV file used is about student performance. To group the data based on gender, use the groupby() function.
The read_csv function of the Python Pandas library is used to read the file from your drive. Store the file in a data variable.
import pandas as pd
data = pd.read_csv("/content/drive/MyDrive/CSV/StudentsPerformance.csv")
print(data)
Output:

Apply groupby() function.
import pandas as pd
data = pd.read_csv("StudentsPerformance.csv")
std = data.groupby("gender")
print(std.first())

Let us print the value in any of the groups. For this, use the team’s name.
The function get_group is used to find the entries in any group. Find the value contained in the female group.
import pandas as pd
data = pd.read_csv("StudentsPerformance.csv")
std = data.groupby("gender")
print(std.get_group("female"))
Output:

Use the groupby() function to create more than one category group. To split, use more than one column.
import pandas as pd
data = pd.read_csv("StudentsPerformance.csv")
std_per = data.groupby(["gender", "lunch"])
print(std_per.first())
Output:

Groupby() is a versatile function with numerous variants. It makes splitting the DataFrame based on certain criteria very simple and efficient.