Pandas DataFrame.groupby() Function
pandas.DataFrame.groupby() splits the DataFrame into groups based on the given criteria. We can easily manipulate large datasets using the groupby() method.
Syntax of pandas.DataFrame.groupby():
DataFrame.groupby(
    by=None,
    axis=0,
    level=None,
    as_index=True,
    sort=True,
    group_keys=True,
    squeeze: bool = False,
    observed: bool = False,
)
Parameters
by | Mapping, function, string, label, or iterable used to determine the groups
axis | Group along rows (axis=0) or columns (axis=1)
level | Integer or level name. Groups by a particular level or levels when the index is a MultiIndex
as_index | Boolean. If True, returns an object with group labels as the index
sort | Boolean. If True, sorts the group keys
group_keys | Boolean. When calling apply, adds group keys to the index to identify pieces
squeeze | Boolean. Reduces the dimensionality of the return type when possible
observed | Boolean. Applies only if any of the groupers are Categorical. If True, shows only observed values for categorical groupers
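The effect of the sort and observed parameters is easier to see on a small case. The following is a minimal sketch, not taken from the original examples: the DataFrames df and cat_df and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "Yes" appears before "No" in the column.
df = pd.DataFrame(
    {"In_Stock": ["Yes", "No", "No", "Yes"], "Price": [34, 24, 14, 44]}
)

# sort=True (the default) orders the group keys; sort=False keeps
# the order in which each key first appears in the data.
sorted_keys = list(df.groupby("In_Stock", sort=True)["Price"].sum().index)
unsorted_keys = list(df.groupby("In_Stock", sort=False)["Price"].sum().index)
print(sorted_keys)    # ['No', 'Yes']
print(unsorted_keys)  # ['Yes', 'No']

# observed only matters for Categorical groupers. Category "b" is
# declared but never occurs in the data.
cat_df = pd.DataFrame(
    {
        "Key": pd.Categorical(["a", "a"], categories=["a", "b"]),
        "Value": [1, 2],
    }
)
all_categories = cat_df.groupby("Key", observed=False)["Value"].sum()
observed_only = cat_df.groupby("Key", observed=True)["Value"].sum()
print(list(all_categories.index))  # ['a', 'b']
print(list(observed_only.index))   # ['a']
```

With observed=False, the unused category still gets a row in the result (its sum is 0); with observed=True, it is dropped.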
Return
It returns a DataFrameGroupBy object containing the grouped information.
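One way to inspect what the returned object holds is to iterate over it. The sketch below assumes a small illustrative DataFrame (not one of the article's examples) and collects the number of rows in each group.

```python
import pandas as pd

# Illustrative data, made up for this sketch.
df = pd.DataFrame(
    {
        "Name": ["Orange", "Mango", "Apple"],
        "In_Stock": ["Yes", "No", "Yes"],
    }
)

grouped = df.groupby("In_Stock")

# Iterating over the grouped object yields (group key, sub-DataFrame) pairs.
group_sizes = {}
for key, group in grouped:
    group_sizes[key] = len(group)

print(group_sizes)  # {'No': 1, 'Yes': 2}
```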
Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Values of Single Column
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")
print(grouped_df)
print(type(grouped_df))
Output:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f73cc992d30>
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
It splits the DataFrame into groups based on the values in the In_Stock column and returns a DataFrameGroupBy object.
To get details about the DataFrameGroupBy object returned by groupby(), we can use the first() method of the DataFrameGroupBy object to get the first element of each group.
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")
print(grouped_df.first())
Output:
            Name  Price
In_Stock
No         Mango     24
Yes       Orange     34
It prints the DataFrame formed by the first elements of both groups split from df.
We can also print an entire group using the get_group() method.
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")
print(grouped_df.get_group("Yes"))
Output:
     Name  Price  In_Stock
0  Orange     34       Yes
3   Apple     44       Yes
5    Kiwi     84       Yes
It prints all the elements in df whose value in the In_Stock column is Yes. We first split the elements into separate groups by their In_Stock value using the groupby() method, and then access a particular group using the get_group() method.
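Beyond first() and get_group(), a grouped object is typically used to compute a summary per group. The following sketch reuses the same fruit data and assumes we want the mean price and the number of rows in each In_Stock group; mean() and size() are standard GroupBy methods, while the variable names are only illustrative.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock")

# Select one column from the grouped object, then aggregate it per group.
mean_price = grouped_df["Price"].mean()
# size() counts the rows that fall into each group.
group_sizes = grouped_df.size()

print(mean_price)
print(group_sizes)
```

Here the No group averages 34.0 and the Yes group averages 54.0, with three rows in each group.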
Example Codes: Group Two DataFrames With pandas.DataFrame.groupby() Based on Multiple Conditions
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
grouped_df = df.groupby(["In_Stock", "Supplier"])
print(grouped_df.first())
Output:
                        Name  Price
In_Stock Supplier
No       ABC           Mango     24
         XYZ       Pineapple     64
Yes      ABC          Orange     34
         XYZ           Apple     44
It splits df into groups based on the values in the In_Stock and Supplier columns and returns a DataFrameGroupBy object.
We use the first() method to get the first element of each group. It returns a DataFrame formed by combining the first elements of the following four groups:
- Group with In_Stock value No and Supplier value ABC.
- Group with In_Stock value No and Supplier value XYZ.
- Group with In_Stock value Yes and Supplier value ABC.
- Group with In_Stock value Yes and Supplier value XYZ.
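The four groups listed above can be confirmed directly from the grouped object. This is a small sketch using the ngroups and groups attributes of the GroupBy object; the variable name keys is only illustrative.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
grouped_df = df.groupby(["In_Stock", "Supplier"])

# ngroups is the number of distinct (In_Stock, Supplier) combinations.
print(grouped_df.ngroups)  # 4

# groups maps each key tuple to the row labels that belong to it.
keys = sorted(grouped_df.groups)
print(keys)
```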
The DataFrame returned by the methods of a GroupBy object has a MultiIndex when we pass multiple labels to the groupby() function.
print(grouped_df.first().index)
Output:
MultiIndex(levels=[['No', 'Yes'], ['ABC', 'XYZ']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['In_Stock', 'Supplier'])
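A MultiIndex result can be queried with a tuple of labels, or flattened back into ordinary columns. The sketch below assumes the same fruit data as above; first_rows and flat are illustrative names.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes", "ABC"),
    ("Mango", 24, "No", "ABC"),
    ("banana", 14, "No", "ABC"),
    ("Apple", 44, "Yes", "XYZ"),
    ("Pineapple", 64, "No", "XYZ"),
    ("Kiwi", 84, "Yes", "XYZ"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock", "Supplier"])
first_rows = df.groupby(["In_Stock", "Supplier"]).first()

# Look up one group by passing a tuple with one label per index level.
name = first_rows.loc[("No", "XYZ"), "Name"]
print(name)  # Pineapple

# reset_index() turns the index levels back into regular columns.
flat = first_rows.reset_index()
print(list(flat.columns))  # ['In_Stock', 'Supplier', 'Name', 'Price']
```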
Example Codes: Set as_index=False in pandas.DataFrame.groupby()
The as_index parameter in the DataFrame.groupby() method is True by default. The group label is the index of the returned DataFrame when applying GroupBy methods like first().
import pandas as pd
fruit_list = [
("Orange", 34, "Yes"),
("Mango", 24, "No"),
("banana", 14, "No"),
("Apple", 44, "Yes"),
("Pineapple", 64, "No"),
("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])
grouped_df = df.groupby("In_Stock", as_index=True)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
print("---------")
grouped_df = df.groupby("In_Stock", as_index=False)
first_group = grouped_df.first()
print(first_group)
print(first_group.index)
Output:
            Name  Price
In_Stock
No         Mango     24
Yes       Orange     34
Index(['No', 'Yes'], dtype='object', name='In_Stock')
---------
  In_Stock    Name  Price
0       No   Mango     24
1      Yes  Orange     34
Int64Index([0, 1], dtype='int64')
As you can see, the index of the generated DataFrame consists of the group labels because as_index=True by default.
When we set as_index=False, the group labels become a regular column and the index becomes an automatically generated integer index.
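For an aggregation like first(), setting as_index=False should give the same rows and columns as grouping with the default and then calling reset_index(). The sketch below checks this on the fruit data; the names with_flag and with_reset are illustrative, and the comparison is done on the row records rather than on the index type.

```python
import pandas as pd

fruit_list = [
    ("Orange", 34, "Yes"),
    ("Mango", 24, "No"),
    ("banana", 14, "No"),
    ("Apple", 44, "Yes"),
    ("Pineapple", 64, "No"),
    ("Kiwi", 84, "Yes"),
]
df = pd.DataFrame(fruit_list, columns=["Name", "Price", "In_Stock"])

# Option 1: keep the group label as a regular column from the start.
with_flag = df.groupby("In_Stock", as_index=False).first()

# Option 2: group with the default as_index=True, then move the
# index back into a column.
with_reset = df.groupby("In_Stock").first().reset_index()

print(list(with_flag.columns))
print(with_flag.to_dict("records") == with_reset.to_dict("records"))
```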
Suraj Joshi is a backend software engineer at Matrice.ai.