How to Filter DataFrame Rows Based on the Date in Pandas
- Select Rows Between Two Dates With Boolean Mask
- pandas.DataFrame.query() to Select DataFrame Rows Between Two Dates
- pandas.DataFrame.isin() to Select DataFrame Rows Between Two Dates
- pandas.Series.between() to Select DataFrame Rows Between Two Dates
We can filter DataFrame rows based on the date in Pandas using a boolean mask with the loc indexer, or by slicing a date index. We can also use the query, isin, and between methods to select rows based on the date.
Select Rows Between Two Dates With Boolean Mask
To filter DataFrame rows by date with a boolean mask, we first create the mask using the syntax:
mask = (df["col"] > start_date) & (df["col"] <= end_date)
Here, start_date and end_date are datetime values (or date strings that Pandas can parse) marking the start and end of the range to filter. With this mask, the start date itself is excluded and the end date is included. We then select the rows that fall within the range by passing the mask to the df.loc indexer.
import pandas as pd
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
df = pd.DataFrame({"Joined date": pd.to_datetime(list_of_dates)}, index=employees)
mask = (df["Joined date"] > "2019-06-1") & (df["Joined date"] <= "2020-02-05")
filtered_df = df.loc[mask]
print(filtered_df)
Output:
        Joined date
Hisila   2019-11-20
Shristi  2020-01-02
Zeppy    2020-02-05
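The choice between > and >= decides whether a row that falls exactly on the start date is kept. The following is a minimal sketch of that difference, assuming a shortened copy of the sample data and explicit pd.Timestamp bounds; the names half_open and closed are illustrative only.

```python
import pandas as pd

# Shortened, illustrative copy of the sample data used in this article.
df = pd.DataFrame(
    {
        "Joined date": pd.to_datetime(
            ["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]
        )
    },
    index=["Hisila", "Shristi", "Zeppy", "Alina"],
)

# Explicit Timestamp bounds instead of date strings.
start_date = pd.Timestamp("2019-11-20")
end_date = pd.Timestamp("2020-02-05")

# Strict ">" drops rows equal to start_date; "<=" keeps rows equal to end_date.
half_open = df.loc[(df["Joined date"] > start_date) & (df["Joined date"] <= end_date)]

# ">=" on the lower bound keeps the start date as well.
closed = df.loc[(df["Joined date"] >= start_date) & (df["Joined date"] <= end_date)]

print(list(half_open.index))  # ['Shristi', 'Zeppy']
print(list(closed.index))     # ['Hisila', 'Shristi', 'Zeppy']
```

Each comparison must stay inside its own parentheses, because & binds more tightly than > and <= in Python.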
We can simplify the above process by setting the date column as the index and then slicing it with df.loc[start_date:end_date]. Unlike the mask above, a label slice on a date index includes both the start and the end date.
import pandas as pd
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined date": pd.to_datetime(list_of_dates), "Salary": salary}
)
df = df.set_index(["Joined date"])
filtered_df = df.loc["2019-06-1":"2020-02-05"]
print(filtered_df)
Output:
                Name  Salary
Joined date
2019-11-20    Hisila     200
2020-01-02   Shristi     400
2020-02-05     Zeppy     300
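Slicing a date index is most reliable when the index is sorted, and a DatetimeIndex also accepts partial date strings such as a year or a year and month. The sketch below assumes a small, deliberately unsorted copy of the sample data; the variable names are illustrative.

```python
import pandas as pd

# Illustrative data, deliberately out of chronological order.
df = pd.DataFrame(
    {
        "Name": ["Zeppy", "Hisila", "Alina", "Shristi"],
        "Joined date": pd.to_datetime(
            ["2020-02-05", "2019-11-20", "2020-03-10", "2020-01-02"]
        ),
    }
)

# Sort the index first so that label-based slicing is well defined.
by_date = df.set_index("Joined date").sort_index()

# Both endpoints of a label slice are inclusive.
in_range = by_date.loc["2019-06-01":"2020-02-05"]

# Partial-string indexing: a year or year-month string selects the whole period.
in_2020 = by_date.loc["2020"]
in_feb_2020 = by_date.loc["2020-02"]

print(in_range["Name"].tolist())     # ['Hisila', 'Shristi', 'Zeppy']
print(in_2020["Name"].tolist())      # ['Shristi', 'Zeppy', 'Alina']
print(in_feb_2020["Name"].tolist())  # ['Zeppy']
```

Without sort_index(), slicing with bounds that do not appear in the index may raise a KeyError in recent Pandas versions.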
pandas.DataFrame.query() to Select DataFrame Rows Between Two Dates
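We can also filter DataFrame rows by date using the pandas.DataFrame.query() method, which returns the rows for which the given query expression evaluates to True. Note that the date column is named Joined_date here: a column name containing a space would have to be wrapped in backticks inside the query string.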
import pandas as pd
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)
filtered_df = df.query("Joined_date >= '2019-06-1' and Joined_date <='2020-02-05'")
print(filtered_df)
Output:
      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300
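When the date bounds come from variables rather than literals, the query string can refer to them with the @ prefix instead of embedding them as text. This is a minimal sketch, assuming a shortened copy of the sample data and two illustrative variables named start and end.

```python
import pandas as pd

# Shortened, illustrative copy of the sample data.
df = pd.DataFrame(
    {
        "Name": ["Hisila", "Shristi", "Zeppy", "Alina"],
        "Joined_date": pd.to_datetime(
            ["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]
        ),
    }
)

start = pd.Timestamp("2019-06-01")
end = pd.Timestamp("2020-02-05")

# "@name" inside the query string refers to a Python variable in the calling scope.
filtered = df.query("Joined_date >= @start and Joined_date <= @end")

print(filtered["Name"].tolist())  # ['Hisila', 'Shristi', 'Zeppy']
```

This avoids building the query string by hand with string formatting.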
pandas.DataFrame.isin() to Select DataFrame Rows Between Two Dates
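The pandas.DataFrame.isin() method returns a DataFrame of booleans that indicate whether each element is contained in a given collection of values. In the example below we call its Series counterpart, pandas.Series.isin(), on the date column and pass it every date in the range, which gives a boolean mask that we can use to filter DataFrame rows by date.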
import pandas as pd
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)
filtered_df = df[df["Joined_date"].isin(pd.date_range("2019-06-1", "2020-02-05"))]
print(filtered_df)
Output:
      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300
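pandas.date_range() returns a fixed-frequency DatetimeIndex. Its first parameter is the starting date, and the second parameter is the ending date; by default, it contains one timestamp per calendar day, at midnight. Because isin() tests for exact matches, this approach works when the column holds dates without a time-of-day component.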
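Since isin() only matches exact values, timestamps that carry a time of day are not found in a daily date range. One possible workaround, sketched here with a hypothetical events table and ts column, is to normalize the timestamps to midnight before the membership test.

```python
import pandas as pd

# Hypothetical event log whose timestamps include a time of day.
events = pd.DataFrame(
    {
        "event": ["login", "upload", "logout"],
        "ts": pd.to_datetime(
            ["2020-01-02 09:30", "2020-02-05 00:00", "2020-03-10 18:45"]
        ),
    }
)

# One midnight timestamp per day from 2020-01-01 to 2020-02-05.
days = pd.date_range("2020-01-01", "2020-02-05")

# Exact matching: 09:30 on 2020-01-02 is not in `days`, so "login" is missed.
exact = events[events["ts"].isin(days)]

# Normalizing resets each timestamp to midnight before the membership test.
by_day = events[events["ts"].dt.normalize().isin(days)]

print(exact["event"].tolist())   # ['upload']
print(by_day["event"].tolist())  # ['login', 'upload']
```

For data like this, a comparison-based mask or between() is usually simpler, because it does not depend on exact matches.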
pandas.Series.between() to Select DataFrame Rows Between Two Dates
We can also use pandas.Series.between() to filter a DataFrame based on date. The method returns a boolean Series indicating whether each element lies within the specified range; by default, both boundaries are included. We then pass this boolean Series to the loc indexer to extract the matching rows.
import pandas as pd
list_of_dates = [
"2019-11-20",
"2020-01-02",
"2020-02-05",
"2020-03-10",
"2020-04-16",
"2020-05-01",
]
employees = ["Hisila", "Shristi", "Zeppy", "Alina", "Jerry", "Kevin"]
salary = [200, 400, 300, 500, 600, 300]
df = pd.DataFrame(
{"Name": employees, "Joined_date": pd.to_datetime(list_of_dates), "Salary": salary}
)
filtered_df = df.loc[df["Joined_date"].between("2019-06-1", "2020-02-05")]
print(filtered_df)
Output:
      Name Joined_date  Salary
0   Hisila  2019-11-20     200
1  Shristi  2020-01-02     400
2    Zeppy  2020-02-05     300
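The boundaries of between() can be opened or closed with its inclusive parameter. The sketch below assumes Pandas 1.3 or later, where inclusive accepts the strings "both", "left", "right", and "neither" (older versions expect a boolean); the Series and variable names are illustrative.

```python
import pandas as pd

# Illustrative Series of joining dates, indexed by employee name.
joined = pd.Series(
    pd.to_datetime(["2019-11-20", "2020-01-02", "2020-02-05", "2020-03-10"]),
    index=["Hisila", "Shristi", "Zeppy", "Alina"],
)

start, end = "2019-11-20", "2020-02-05"

# Default behaviour: both boundaries are included.
both = joined[joined.between(start, end)]

# Assumes pandas >= 1.3: keep the start date but drop the end date.
left = joined[joined.between(start, end, inclusive="left")]

# Assumes pandas >= 1.3: drop both boundary dates.
neither = joined[joined.between(start, end, inclusive="neither")]

print(list(both.index))     # ['Hisila', 'Shristi', 'Zeppy']
print(list(left.index))     # ['Hisila', 'Shristi']
print(list(neither.index))  # ['Shristi']
```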
Suraj Joshi is a backend software engineer at Matrice.ai.