How to Get Dummies in Pandas
- pandas.get_dummies() Method
- Create DataFrame With Dummy Variable Columns Using pandas.get_dummies() Method
- Set columns to Create Dummy Variables for Specified Columns Only
- Set prefix to Change the Default Name of Dummy Columns
This tutorial explains how to generate a DataFrame with dummy (indicator) variables from a DataFrame that has categorical columns.
pandas.get_dummies() Method
pandas.get_dummies(
    data,
    prefix=None,
    prefix_sep="_",
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None,
)
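The signature lists several parameters that the examples in this tutorial do not exercise. The following is a minimal sketch of three of them (dummy_na, drop_first, and prefix_sep) on a small Series; the colors data is hypothetical and chosen only for illustration, and dtype=int is passed so the indicators are integers on any pandas version.

```python
import pandas as pd

# Hypothetical toy data with one missing value (not from the tutorial).
colors = pd.Series(["red", "green", None, "red"], name="color")

# dummy_na=True adds one extra indicator column that flags missing values.
with_na = pd.get_dummies(colors, dummy_na=True, dtype=int)

# drop_first=True drops the first category level ("green" sorts first),
# leaving k-1 indicator columns for k categories.
dropped = pd.get_dummies(colors, drop_first=True, dtype=int)

# prefix and prefix_sep control how the dummy column names are built.
named = pd.get_dummies(colors, prefix="color", prefix_sep="=", dtype=int)

print(with_na)
print(dropped)
print(named)
```

With drop_first=True, a row whose indicators are all 0 stands for the dropped level, which is why this option is often used to avoid redundant columns in regression-style models.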
Create DataFrame With Dummy Variable Columns Using pandas.get_dummies() Method
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
    }
)

students_df_dummies = pd.get_dummies(students_df)

print("The original DataFrame is:")
print(students_df, "\n")
print("DataFrame with Dummies:")
print(students_df_dummies)
Output:
The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female

DataFrame with Dummies:
    Id  Name_Christine  Name_Daniel  Name_Jennifer  Name_Mike  Name_Rob  Sex_Female  Sex_Male
0  302               0            0              0          1         0           0         1
1  504               1            0              0          0         0           1         0
2  708               0            0              0          0         1           0         1
3  103               0            1              0          0         0           0         1
4  303               0            0              1          0         0           1         0
get_dummies() names each dummy column by joining the original column name and one of that column's unique values with an underscore, the default prefix_sep. The numeric Id column is not categorical, so it is carried over unchanged.
The Name column has five unique values, so it is replaced by five columns named Name_ plus each unique name. Each dummy column holds 1 where the original row had that value and 0 otherwise. The output above comes from an older pandas release; in pandas 2.0 and later the default dtype is bool, so the same code prints True and False unless dtype=int is passed.
For example, the row whose Name is Daniel in students_df has the value 1 in the Name_Daniel column of students_df_dummies, while every other row has 0 in that column.
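To reproduce the 1/0 output on any pandas version and to check the Name_Daniel behaviour described here, one option is to pass dtype=int explicitly. This is a small sketch reusing the tutorial's students_df; the name_cols helper variable is introduced only for this illustration.

```python
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
    }
)

# dtype=int requests integer 0/1 indicators instead of the version-dependent default.
students_df_dummies = pd.get_dummies(students_df, dtype=int)

# Collect the dummy columns generated from Name (illustrative helper variable).
name_cols = [c for c in students_df_dummies.columns if c.startswith("Name_")]

# Each row has exactly one Name_* indicator set.
print(students_df_dummies[name_cols].sum(axis=1).tolist())

# Name_Daniel is 1 only in the row where Name == "Daniel" (index 3).
print(students_df_dummies["Name_Daniel"].tolist())
```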
Set columns to Create Dummy Variables for Specified Columns Only
By default, get_dummies() creates dummy columns for every column whose dtype is object or category. To encode only particular columns, pass a list of their names as the columns argument.
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
    }
)

students_df_dummies = pd.get_dummies(students_df, columns=["Sex"])

print("The original DataFrame is:")
print(students_df, "\n")
print("DataFrame with Dummies:")
print(students_df_dummies)
Output:
The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female

DataFrame with Dummies:
    Id       Name  Sex_Female  Sex_Male
0  302       Mike           0         1
1  504  Christine           1         0
2  708        Rob           0         1
3  103     Daniel           0         1
4  303   Jennifer           1         0
It creates dummy variables for the Sex column only; the Name column is left unchanged.
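The columns argument can also force encoding of a column that the default would skip, such as a numeric one. The sketch below assumes an extra Year column, which is hypothetical and not part of the tutorial's data, to show the difference between the default and an explicit column list.

```python
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
        "Year": [1, 2, 1, 3, 2],  # hypothetical numeric column, added for illustration
    }
)

# Without columns, the numeric Year column is passed through unchanged.
default_dummies = pd.get_dummies(students_df)

# Listing Year in columns encodes it alongside Sex; Name is left as is.
selected_dummies = pd.get_dummies(students_df, columns=["Sex", "Year"])

print(default_dummies.columns.tolist())
print(selected_dummies.columns.tolist())
```

The columns that are not encoded come first in the result, followed by the dummy columns in the order the source columns were listed.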
Set prefix to Change the Default Name of Dummy Columns
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
    }
)

students_df_dummies = pd.get_dummies(students_df, columns=["Sex"], prefix="Column")

print("The original DataFrame is:")
print(students_df, "\n")
print("DataFrame with Dummies:")
print(students_df_dummies)
Output:
The original DataFrame is:
    Id       Name     Sex
0  302       Mike    Male
1  504  Christine  Female
2  708        Rob    Male
3  103     Daniel    Male
4  303   Jennifer  Female

DataFrame with Dummies:
    Id       Name  Column_Female  Column_Male
0  302       Mike              0            1
1  504  Christine              1            0
2  708        Rob              0            1
3  103     Daniel              0            1
4  303   Jennifer              1            0
It sets the prefix of the dummy columns generated from the Sex column to Column, so the dummy column names become Column_Female and Column_Male.
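When more than one column is encoded, prefix can also be given as a list (one entry per encoded column, in order) or as a dict mapping column names to prefixes. The following sketch reuses students_df; the prefixes Student and Gender are arbitrary names chosen for this example.

```python
import pandas as pd

students_df = pd.DataFrame(
    {
        "Id": [302, 504, 708, 103, 303],
        "Name": ["Mike", "Christine", "Rob", "Daniel", "Jennifer"],
        "Sex": ["Male", "Female", "Male", "Male", "Female"],
    }
)

# A dict maps each encoded column to its own prefix (example prefixes).
by_dict = pd.get_dummies(
    students_df,
    columns=["Name", "Sex"],
    prefix={"Name": "Student", "Sex": "Gender"},
)

# A list works too; its length must match the number of encoded columns.
by_list = pd.get_dummies(
    students_df,
    columns=["Name", "Sex"],
    prefix=["Student", "Gender"],
)

print(by_dict.columns.tolist())
print(by_list.columns.tolist() == by_dict.columns.tolist())
```

The dict form is usually easier to read because each prefix sits next to the column it applies to, so it does not depend on the order of the columns list.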
Suraj Joshi is a backend software engineer at Matrice.ai.