How to Get Index of Rows Whose Column Matches Specific Value in Pandas
- Get Indices of Rows Containing Integers/Floats in Pandas
- Get Indices of Rows Containing Strings in Pandas
This article demonstrates how to to get the index of rows that matches certain criteria in Pandas.
The necessity to find the indices of the rows is important in feature engineering. These skills can be useful to remove the outliers or abnormal values in a Dataframe. The indices, also known as the row labels, can be found in Pandas using several functions. In the following examples, we will be working on the dataframe created using the following snippet.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df)
Output:
A B C D
0 13 16 1 4
1 4 8 10 19
2 5 7 13 2
3 7 8 15 18
4 6 14 9 10
5 17 6 16 16
6 1 19 4 18
7 15 8 1 2
8 10 1 11 4
9 12 19 3 1
10 1 5 6 7
11 9 18 16 5
12 10 11 2 2
13 8 10 4 7
14 12 15 19 1
15 15 4 13 11
16 12 5 7 5
17 16 4 13 5
18 9 15 16 4
19 16 14 17 18
Get Indices of Rows Containing Integers/Floats in Pandas
The pandas.DataFrame.loc
function can access rows and columns by its labels/names. It is straight forward in returning the rows matching the given boolean condition passed as a label. Notice the square brackets next to df.loc
in the snippet.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.loc[df["B"] == 19])
The rows corresponding to the boolean condition is returned as an output in the format of a Dataframe.
Output:
A B C D
6 1 19 4 18
9 12 19 3 1
Multiple conditions can be chained and applied together to the function, as shown below. This helps in isolating the rows based on specific conditions.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.loc[(df["B"] == 19) | (df["C"] == 19)])
Output:
A B C D
6 1 19 4 18
9 12 19 3 1
14 12 15 19 1
Get Index of Rows With pandas.DataFrame.index()
If you would like to find just the matched indices of the dataframe that satisfies the boolean condition passed as an argument, pandas.DataFrame.index()
is the easiest way to achieve it.
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.index[df["B"] == 19].tolist())
In the above snippet, the rows of column A
matching the boolean condition == 1
is returned as output as shown below.
Output:
[6, 9]
The reason why we put tolist()
behind the index()
method is to convert the Index
to the list; otherwise, the result is of Int64Index
data type.
Int64Index([6, 9], dtype='int64'
Retrieving just the indices can be done based on multiple conditions too. The snippet can be written as follows:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 20, size=(20, 4)), columns=list("ABCD"))
print(df.index[(df["B"] == 19) | (df["C"] == 19)].tolist())
Output:
[6, 9, 14]
Get Indices of Rows Containing Strings in Pandas
The string values can be matched based on two methods. Both the methods shown in the previous section will work, except for the condition change.
In the following examples, we will use the following snippet.
import pandas as pd
df = pd.DataFrame(
{
"Name": ["blue", "delta", "echo", "charlie", "alpha"],
"Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
}
)
print(df)
Output:
Name Type
0 blue Raptors
1 delta Raptors
2 echo Raptors
3 charlie Raptors
4 alpha Tyrannosaurus rex
Get Index of Rows With the Exact String Match
The equality condition used in the previous section can be used in finding the exact string match in the Dataframe. We will look for the two strings.
import pandas as pd
df = pd.DataFrame(
{
"Name": ["blue", "delta", "echo", "charlie", "alpha"],
"Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
}
)
print(df.index[(df["Name"] == "blue")].tolist())
print("\n")
print(df.loc[df["Name"] == "blue"])
print("\n")
print(df.loc[(df["Name"] == "charlie") & (df["Type"] == "Raptors")])
Output:
[0]
Name Type
0 blue Raptors
Name Type
3 charlie Raptors
As seen above, both the index and the rows matching the condition can be received.
Get Index of Rows With the Partial String Match
The string values can be partially matched by chaining the dataframe to the str.contains
function. In the following example, we will be looking for the string ha
in charlie and alpha.
import pandas as pd
df = pd.DataFrame(
{
"Name": ["blue", "delta", "echo", "charlie", "alpha"],
"Type": ["Raptors", "Raptors", "Raptors", "Raptors", "Tyrannosaurus rex"],
}
)
print(df.index[df["Name"].str.contains("ha")].tolist())
print("\n")
print(df.loc[df["Name"].str.contains("ha")])
print("\n")
print(df.loc[(df["Name"].str.contains("ha")) & (df["Type"].str.contains("Rex"))])
Output:
[3, 4]
Name Type
3 charlie Raptors
4 alpha Tyrannosaurus rex
Name Type
4 alpha Tyrannosaurus rex
This function can be very useful in performing a partial string matching across multiple columns of the dataframe.
Related Article - Pandas DataFrame
- How to Get Pandas DataFrame Column Headers as a List
- How to Delete Pandas DataFrame Column
- How to Convert Pandas Column to Datetime
- How to Convert a Float to an Integer in Pandas DataFrame
- How to Sort Pandas DataFrame by One Column's Values
- How to Get the Aggregate of Pandas Group-By and Sum
Related Article - Pandas DataFrame Row
- How to Get the Row Count of a Pandas DataFrame
- How to Randomly Shuffle DataFrame Rows in Pandas
- How to Filter Dataframe Rows Based on Column Values in Pandas
- How to Iterate Through Rows of a DataFrame in Pandas
- How to Get Index of All Rows Whose Particular Column Satisfies Given Condition in Pandas
- How to Find Duplicate Rows in a DataFrame Using Pandas