How to Create DataFrame Column Based on Given Condition in Pandas
-
List Comprehension to Create New
DataFrame
Columns Based on a Given Condition in Pandas -
NumPy Methods to Create New
DataFrame
Columns Based on a Given Condition in Pandas -
pandas.DataFrame.apply
to Create NewDataFrame
Columns Based on a Given Condition in Pandas -
pandas.Series.map()
to Create NewDataFrame
Columns Based on a Given Condition in Pandas
We can create the DataFrame
columns based on a given condition in Pandas using list comprehension, NumPy methods, apply()
method, and map()
method of the DataFrame object.
List Comprehension to Create New DataFrame
Columns Based on a Given Condition in Pandas
We can utilize various list Comprehension to create new DataFrame
columns based on a given condition in Pandas. List comprehension is a method to create new lists from iterables. It is quite faster and simpler than other methods.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = ["Senior" if s >= 400 else "Junior" for s in df["Salary"]]
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
It creates a new column Status
in df
whose value is Senior
if the salary is greater than or equal to 400, or Junior
otherwise.
NumPy Methods to Create New DataFrame
Columns Based on a Given Condition in Pandas
We also can use NumPy methods to create a DataFrame
column based on given conditions in Pandas. We can utilize np.where()
method and np.select()
method for this purpose.
np.where()
Method
np.where()
takes the condition as an input and returns the indices of elements that satisfy the given condition. We can use this method to create a DataFrame column based on given conditions in Pandas when we have only one condition.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df["Status"] = np.where(df["Salary"] >= 400, "Senior", "Junior")
print(df)
Output:
Name Joined date Salary Status
0 Hisila 2019-11-20 200 Junior
1 Shristi 2020-01-02 400 Senior
2 Zeppy 2020-02-05 300 Junior
3 Alina 2020-03-10 500 Senior
4 Jerry 2020-04-16 600 Senior
5 Kevin 2020-05-01 300 Junior
np.where(condition, x, y)
returns x
if the condition is met, otherwise y
.
The above code creates a new column Status
in df
whose value is Senior
if the given condition is satisfied; otherwise, the value is set to Junior
.
np.select()
Method
np.where()
takes condition-list and choice-list as an input and returns an array built from elements in choice-list, depending on conditions. We can use this method to create a DataFrame column based on given conditions in Pandas when we have two or more conditions.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
conditionlist = [
(df["Salary"] >= 500),
(df["Salary"] >= 300) & (df["Salary"] < 300),
(df["Salary"] <= 300),
]
choicelist = ["High", "Mid", "Low"]
df["Salary_Range"] = np.select(conditionlist, choicelist, default="Not Specified")
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 black
2 Zeppy 2020-02-05 300 Low
3 Alina 2020-03-10 500 High
4 Jerry 2020-04-16 600 High
5 Kevin 2020-05-01 300 Low
Here, if the 1st condition in the conditionlist
is satisfied for a row, the value of column Salary_Range
for that specific row is set to the 1st element in the choicelist
. Other conditions in the conditionlist
are similar. If none of the conditions in the conditionlist
is satisfied, the value of column Salary_Range
for that row is set to the value of the default
parameter in the np.where()
method, for example, Not Specified
.
pandas.DataFrame.apply
to Create New DataFrame
Columns Based on a Given Condition in Pandas
pandas.DataFrame.apply
returns a DataFrame
as a result of applying the given function along the given axis of the DataFrame.
Syntax:
DataFrame.apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds)
func
represents the function to be applied.
axis
represents the axis along which the function is applied. We can use axis=1
or axis = 'columns'
to apply function to each row.
We can use this method to check the condition and set values for every row of a new column.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
def set_values(row, value):
return value[row]
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].apply(set_values, args=(map_dictionary,))
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
Here, we define a function set_values()
which is applied to every row using df.apply()
. The function sets the value of each row of the Salary_Range
column depending upon the value of the Salary
column of that row. We make map_dictionary
to assign what will be the value of the Salary_Range
column for a row given its value in the Salary
column. This method provides us much more flexibility when we have a large number of options for the new column.
pandas.Series.map()
to Create New DataFrame
Columns Based on a Given Condition in Pandas
We could also use pandas.Series.map()
to create new DataFrame
columns based on a given condition in Pandas. This method is applied elementwise for Series
and maps values from one column to the other based on the input that could be a dictionary, function, or Series
.
import pandas as pd
import numpy as np
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
map_dictionary = {200: "Low", 300: "LOW", 400: "MID", 500: "HIGH", 600: "HIGH"}
df["Salary_Range"] = df["Salary"].map(map_dictionary)
print(df)
Output:
Name Joined date Salary Salary_Range
0 Hisila 2019-11-20 200 Low
1 Shristi 2020-01-02 400 MID
2 Zeppy 2020-02-05 300 LOW
3 Alina 2020-03-10 500 HIGH
4 Jerry 2020-04-16 600 HIGH
5 Kevin 2020-05-01 300 LOW
It creates a new column Salary_Range
and sets the values of each row of the column depending on the key-value pairs in map_dictionary
.
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedInRelated Article - Pandas DataFrame Column
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Get the Sum of Pandas Column
- How to Change the Order of Pandas DataFrame Columns
- How to Convert DataFrame Column to String in Pandas