Pandas cut Function
- pandas.cut() Function Syntax
- Example: Distribute Column Values of a DataFrame Into Bins Using the pandas.cut() Method
- Example: Distribute Values Into Bins and Assign a Label to Each Bin Using the pandas.cut() Method
- Example: Set retbins=True in pandas.cut() Method to Return the Bin Values
The pandas.cut() function distributes the given data into ranges, also called bins.
We will use the DataFrame below throughout this article.
import pandas as pd
df = pd.DataFrame(
{
"Name": ["Anish", "Birat", "Chirag", "Kabin", "Sachin"],
"Age": [23, 34, 38, 45, 27],
"Score": [316, 322, 332, 330, 325],
}
)
print(df)
Output:
Name Age Score
0 Anish 23 316
1 Birat 34 322
2 Chirag 38 332
3 Kabin 45 330
4 Sachin 27 325
pandas.cut() Function Syntax
pandas.cut(
x,
bins,
right=True,
labels=None,
retbins=False,
precision=3,
include_lowest=False,
duplicates="raise",
ordered=True,
)
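Parameters
x | Array-like. The 1-dimensional input to be binned. |
bins | Integer, sequence of scalars, or IntervalIndex. The criteria to bin the data: an integer gives that many equal-width bins over the range of x, a sequence gives the bin edges, and an IntervalIndex gives the exact bins. |
right | Boolean. If True (the default), each bin includes its rightmost edge, so bins=[20, 30, 40] gives the bins (20, 30] and (30, 40]. |
labels | Array or False. Labels for the bins, one label per bin. If False, only integer bin indicators are returned. |
retbins | Boolean. If True, also return the bins. |
precision | Integer. Precision at which to store and display the bin labels. |
include_lowest | Boolean. If True, the first bin also includes its leftmost edge. |
duplicates | "raise" (the default) or "drop". If the bin edges are not unique, raise a ValueError or drop the duplicate edges. |
ordered | Boolean. If True (the default), the resulting labels are ordered. |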
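To make the right and duplicates parameters more concrete, here is a minimal sketch. The ages are hypothetical values chosen so that some of them sit exactly on a bin edge; the calls use only parameters from the syntax above.

```python
import pandas as pd

# Hypothetical ages, chosen so that some sit exactly on the bin edges.
ages = pd.Series([20, 23, 30, 34, 40])
edges = [20, 30, 40, 50]

# right=True (the default): bins are closed on the right, e.g. (20, 30].
closed_right = pd.cut(ages, bins=edges)

# right=False: bins are closed on the left instead, e.g. [20, 30).
closed_left = pd.cut(ages, bins=edges, right=False)

print(pd.DataFrame({"Age": ages, "right=True": closed_right, "right=False": closed_left}))

# duplicates="drop" removes the repeated edge 30 instead of raising a ValueError.
deduped = pd.cut(ages, bins=[20, 30, 30, 50], duplicates="drop")
print(deduped.cat.categories)
```

With right=True, an age of exactly 20 is not assigned to any bin (NaN) because 20 is not inside (20, 30]. With right=False, the same age lands in [20, 30), while 30 and 40 move up to [30, 40) and [40, 50).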
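Return
It returns an array-like object that holds the bin of each element in x: a Categorical by default, a Series if x is a Series, or integer bin indicators if labels=False. If retbins=True, it returns a tuple of that result and the bins.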
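The exact return type depends on the input and on the labels and retbins arguments. The sketch below reuses the ages from the DataFrame above as a plain list and checks a few common cases; the commented outputs are what recent pandas versions print.

```python
import pandas as pd

ages_list = [23, 34, 38, 45, 27]
edges = [20, 30, 40, 50]

# A plain list in -> a pandas Categorical out.
as_categorical = pd.cut(ages_list, bins=edges)

# A Series in -> a Series with a "category" dtype out (the index is preserved).
as_series = pd.cut(pd.Series(ages_list), bins=edges)

# labels=False -> integer bin indicators (0 = first bin) instead of intervals.
as_codes = pd.cut(ages_list, bins=edges, labels=False)

# retbins=True -> a (result, bins) tuple.
result, bins = pd.cut(ages_list, bins=edges, retbins=True)

print(type(as_categorical).__name__)  # Categorical
print(as_series.dtype)  # category
print(as_codes)  # [0 1 1 2 0]
print(bins)  # [20 30 40 50]
```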
Example: Distribute Column Values of a DataFrame Into Bins Using the pandas.cut() Method
import pandas as pd
df = pd.DataFrame(
{
"Name": ["Anish", "Birat", "Chirag", "Kabin", "Sachin"],
"Age": [23, 34, 38, 45, 27],
"Score": [316, 322, 332, 330, 325],
}
)
print("Initial DataFrame:")
print(df, "\n")
# The edges 20, 30, 40 and 50 define three bins: (20, 30], (30, 40] and (40, 50].
df["Age-Range"] = pd.cut(x=df["Age"], bins=[20, 30, 40, 50])
print("DataFrame with Age-Range:")
print(df)
Output:
Initial DataFrame:
Name Age Score
0 Anish 23 316
1 Birat 34 322
2 Chirag 38 332
3 Kabin 45 330
4 Sachin 27 325
DataFrame with Age-Range:
Name Age Score Age-Range
0 Anish 23 316 (20, 30]
1 Birat 34 322 (30, 40]
2 Chirag 38 332 (30, 40]
3 Kabin 45 330 (40, 50]
4 Sachin 27 325 (20, 30]
It separates the values of the Age column in the DataFrame df into the age ranges defined by the bin edges passed in the bins argument of pandas.cut(), stores the result in a new Age-Range column, and finally displays the DataFrame with the Age-Range value for each row.
Here, (20, 30] represents the values from 20 to 30, excluding 20 and including 30.
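Because the first bin is open on the left, a value equal to the lowest edge is not assigned to any bin. A short sketch with hypothetical ages that sit on the outer edges shows this, and how include_lowest=True changes it.

```python
import pandas as pd

# Hypothetical ages that sit exactly on bin edges.
ages = pd.Series([20, 30, 50])
edges = [20, 30, 40, 50]

# Default: the first bin is (20, 30], so an age of exactly 20 gets NaN.
default_bins = pd.cut(ages, bins=edges)

# include_lowest=True makes the first bin include its left edge, so 20 is binned.
inclusive_bins = pd.cut(ages, bins=edges, include_lowest=True)

print(default_bins)
print(inclusive_bins)
```

In the second result, recent pandas versions display the first label with a slightly lowered left edge, such as (19.999, 30.0], to show that 20 now belongs to it.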
Example: Distribute Values Into Bins and Assign a Label to Each Bin Using the pandas.cut() Method
By default, each bin is labeled with its range, such as (20, 30]. We can set custom bin labels with the labels parameter of the pandas.cut() function; the list must contain exactly one label per bin.
import pandas as pd
df = pd.DataFrame(
{
"Name": ["Anish", "Birat", "Chirag", "Kabin", "Sachin"],
"Age": [23, 34, 38, 45, 27],
"Score": [316, 322, 332, 330, 325],
}
)
print("Initial DataFrame:")
print(df, "\n")
# One label per bin: three bins need three labels.
bin_labels = ["21 to 30", "31 to 40", "41 to 50"]
df["Age-Range"] = pd.cut(x=df["Age"], bins=[20, 30, 40, 50], labels=bin_labels)
print("DataFrame with Age-Range:")
print(df)
Output:
Initial DataFrame:
Name Age Score
0 Anish 23 316
1 Birat 34 322
2 Chirag 38 332
3 Kabin 45 330
4 Sachin 27 325
DataFrame with Age-Range:
Name Age Score Age-Range
0 Anish 23 316 21 to 30
1 Birat 34 322 31 to 40
2 Chirag 38 332 31 to 40
3 Kabin 45 330 41 to 50
4 Sachin 27 325 21 to 30
It assigns each value of the Age column to a bin and shows that bin's custom label instead of its range.
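Since ordered=True by default, the custom labels keep the order of the bins, and pandas checks that the number of labels matches the number of bins. Here is a minimal sketch of both behaviors using the same ages and labels as above; the two-label list is a deliberately wrong, hypothetical input.

```python
import pandas as pd

ages = pd.Series([23, 34, 38, 45, 27])
edges = [20, 30, 40, 50]
bin_labels = ["21 to 30", "31 to 40", "41 to 50"]

# Three bins need exactly three labels (one fewer than the number of edges).
age_range = pd.cut(ages, bins=edges, labels=bin_labels)

# The labels are ordered like the bins, so comparisons follow the bin order.
print(age_range >= "31 to 40")

# Passing the wrong number of labels raises a ValueError.
try:
    pd.cut(ages, bins=edges, labels=["young", "old"])
except ValueError as err:
    print("ValueError:", err)
```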
Example: Set retbins=True in pandas.cut() Method to Return the Bin Values
import pandas as pd
df = pd.DataFrame(
{
"Name": ["Anish", "Birat", "Chirag", "Kabin", "Sachin"],
"Age": [23, 34, 38, 45, 27],
"Score": [316, 322, 332, 330, 325],
}
)
print("Initial DataFrame:")
print(df, "\n")
# One label per bin: three bins need three labels.
bin_labels = ["21 to 30", "31 to 40", "41 to 50"]
df["Age-Range"], bin_values = pd.cut(
x=df["Age"], bins=[20, 30, 40, 50], labels=bin_labels, retbins=True
)
print("DataFrame with Age-Range:")
print(df, "\n")
print("The bin values are:")
print(bin_values)
Output:
Initial DataFrame:
Name Age Score
0 Anish 23 316
1 Birat 34 322
2 Chirag 38 332
3 Kabin 45 330
4 Sachin 27 325
DataFrame with Age-Range:
Name Age Score Age-Range
0 Anish 23 316 21 to 30
1 Birat 34 322 31 to 40
2 Chirag 38 332 31 to 40
3 Kabin 45 330 41 to 50
4 Sachin 27 325 21 to 30
The bin values are:
[20 30 40 50]
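With retbins=True, pandas.cut() returns a tuple: the binned values, stored here in the Age-Range column, and the bin edges, stored in bin_values. The output shows the DataFrame with the Age-Range values followed by the bin values.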
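The returned bins are most useful when pandas computes the edges itself, that is, when bins is an integer. This sketch asks for three equal-width bins over the same ages and then reuses the returned edges to bin new, hypothetical ages in exactly the same way.

```python
import pandas as pd

ages = pd.Series([23, 34, 38, 45, 27])

# bins=3 asks pandas for three equal-width bins spanning the ages.
# retbins=True also returns the edges that pandas computed.
age_range, edges = pd.cut(ages, bins=3, retbins=True)
print(edges)

# Reusing the returned edges bins new (hypothetical) values consistently.
new_ages = pd.Series([25, 40])
binned = pd.cut(new_ages, bins=edges)
print(binned)
```

Here the first edge comes out slightly below the minimum age (about 22.978 instead of 23), because pandas widens the range a little so that the minimum value falls inside the first bin.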
Suraj Joshi is a backend software engineer at Matrice.ai.