How to Get Substring in Pandas
- Get Substring From Pandas DataFrame Column Values
- Extract the First N Characters From a String
- Extract the Last N Characters From a String
- Extract Any Substring From the Middle of a String
Pandas is an open-source data analysis library for Python. It provides many built-in methods for working with numerical and text data.
In this guide, we will get a substring (part of a string) from the values of a pandas DataFrame column using a few different approaches. This is helpful when we want to extract a meaningful piece of a longer string.
Get Substring From Pandas DataFrame Column Values
We will use string slicing to achieve this task. The str.slice() method returns a portion of each string in the column without modifying the original values.
Syntax:
# Python 3.x
df.column_name.str.slice(start_index, end_index)
We can also do string slicing using the str accessor with square brackets ([]).
# Python 3.x
df.column_name.str[start_index:end_index]
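As a quick check that the two forms agree, here is a minimal sketch on a small standalone Series. The sample values and the names `s`, `via_method`, and `via_brackets` are illustrative assumptions, not part of the examples in this guide.

```python
import pandas as pd

# Illustrative data; any column of strings behaves the same way.
s = pd.Series(["Intel Core i7", "AMD Ryzen 5"])

# Method form: characters at positions 0, 1, and 2.
via_method = s.str.slice(0, 3)

# Square bracket form with the same start and end indexes.
via_brackets = s.str[0:3]

print(via_method.tolist())              # ['Int', 'AMD']
print(via_brackets.equals(via_method))  # True
print(s.tolist())                       # the original strings are unchanged
```

Both forms return a new Series, so the result has to be assigned to a variable or a column to be kept.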
Extract the First N Characters From a String
In the following example, we have a pandas DataFrame whose Processor column holds complete processor names. To get the substring Intel (the first five characters), we specify 0 and 5 as the start and end indexes, respectively.
With the square bracket form, we can also omit the start index and give only the end index, because str[:5] means the same as str[0:5].
Example Code:
# Python 3.x
import pandas as pd

data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)
# Keep the first five characters (positions 0 through 4) of each value
df["Brand Name"] = df["Processor"].str.slice(0, 5)
print(df)
Output:
The second DataFrame has a new Brand Name column that contains Intel in every row.
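The shorthand with only an end index can be sketched as follows. This is a minimal example under a few assumptions: the variable `n` and the extra short value "AMD" are added for illustration, to show that slicing a string shorter than `n` simply returns the whole string rather than raising an error.

```python
import pandas as pd

# "AMD" is an illustrative value shorter than n characters.
df = pd.DataFrame({"Processor": ["Intel Core i7", "Intel Core i3", "AMD"]})

n = 5  # number of leading characters to keep

# Square bracket form with the start index omitted.
df["Brand Name"] = df["Processor"].str[:n]

# Equivalent method form, passing only the stop position.
same = df["Processor"].str.slice(stop=n)

print(df["Brand Name"].tolist())  # ['Intel', 'Intel', 'AMD']
print(same.tolist())              # ['Intel', 'Intel', 'AMD']
```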
Extract the Last N Characters From a String
If we want to extract the brand modifier (the last two characters) from the string, we use negative indexing in the slice. We pass -2 as the start index (the index of the second-to-last character) and leave the end index empty, so the slice runs to the end of the string and returns the last two characters.
Example Code:
# Python 3.x
import pandas as pd

data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)
# Start two characters from the end and run to the end of each value
df["Brand Modifier"] = df["Processor"].str.slice(-2)
print(df)
Output:
The second DataFrame has a new Brand Modifier column with the values i7, i3, i5, and i9.
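The same result can be written with square brackets. The sketch below also includes a missing value, which is an assumption added for illustration; the str accessor passes missing values through as NaN instead of raising an error.

```python
import pandas as pd

# The None entry is illustrative and stands in for a missing processor name.
df = pd.DataFrame({"Processor": ["Intel Core i7", "Intel Core i9", None]})

# Negative start index, end index omitted: the last two characters.
df["Brand Modifier"] = df["Processor"].str[-2:]

print(df)
```

The first two rows get i7 and i9, and the row with the missing name stays missing in the new column.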
Extract Any Substring From the Middle of a String
To get a substring from the middle of a string, we have to specify both the start and end index in the slice. Here, to get the word Core, we specify 6 and 10 as the start and end indexes, respectively.
The slice includes the character at the start index and stops just before the end index, so it returns the characters at positions 6 through 9.
Example Code:
# Python 3.x
import pandas as pd

data = {"Processor": ["Intel Core i7", "Intel Core i3", "Intel Core i5", "Intel Core i9"]}
df = pd.DataFrame(data)
print(df)
# Positions 6 through 9 hold "Core"; the end index 10 is not included
df["Series"] = df["Processor"].str[6:10]
print(df)
Output:
The second DataFrame has a new Series column that contains Core in every row.
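To see how the start and end indexes map onto characters, the following sketch compares plain Python slicing with the pandas version on one value. The names `text`, `start`, `end`, and `middle` are illustrative assumptions.

```python
import pandas as pd

text = "Intel Core i7"
start, end = 6, 10

# Plain Python slicing follows the same rule: start included, end excluded.
print(text[start:end])    # Core
print(repr(text[end]))    # ' ' (the space at position 10 is left out)

# The pandas str accessor applies that slice to every value in the Series.
s = pd.Series([text])
middle = s.str.slice(start, end)

print(middle.tolist())            # ['Core']
print(middle.str.len().tolist())  # [4], which is end - start
```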
I am Fariba Laiq from Pakistan. An android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement and convey my knowledge to others.