How to Extract Substring From a String in Python
- Extract Substring Using String Slicing in Python
- Extract Substring Using the slice() Constructor in Python
- Extract Substring Using Regular Expression in Python
A string is a sequence of characters. We work with strings all the time, whether in software development or competitive programming. Sometimes a program needs to access part of a string. These parts are known as substrings: a substring is a contiguous run of characters within a string.
In Python, we can extract substrings using string slicing or regular expressions (regex).
Extract Substring Using String Slicing in Python
There are a few ways to do string slicing in Python. Indexing is the most basic and the most commonly used method. Refer to the following code.
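myString = "Mississippi"
print(myString[:])      # whole string
print(myString[4:])     # index 4 to the end
print(myString[:8])     # index 0 to index 7
print(myString[2:7])    # index 2 to index 6
print(myString[4:-1])   # index 4 up to, but not including, the last character
print(myString[-6:-1])  # sixth from the end up to, but not including, the last character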
Output:
Mississippi
issippi
Mississi
ssiss
issipp
ssipp
In the above code, we add [] brackets after the variable storing the string. We use this notation for indexing and slicing. Inside these brackets, we add integer values that represent indexes.
The format for the brackets is [start : stop : step], with the values separated by colons (:).
By default (for a positive step), the value of start is 0, the first index; the value of stop is the length of the string, one past the last index; and the value of step is 1. start is the index where the substring begins, stop is the index where it ends (the character at stop itself is not included), and step is the amount by which the index is incremented after each character is taken.
The substring returned runs from index start to index stop - 1, because the stop index is exclusive. So, if we wish to retrieve Miss from Mississippi, we should use [0 : 4].
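As a minimal sketch of how start, stop, and step interact, the snippet below slices the same string in three ways. The variable names are chosen here for illustration and do not come from the code above.

```python
text = "Mississippi"

# stop is exclusive: indexes 0, 1, 2, 3 are returned, index 4 is not
first_four = text[0:4]

# a step of 2 takes every second character, starting at index 0
every_second = text[::2]

# a negative step walks the string backwards
reversed_text = text[::-1]

print(first_four)     # Miss
print(every_second)   # Msispi
print(reversed_text)  # ippississiM
```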
The brackets can't be empty. To use a default value, leave that value out but keep the colons needed to show which parameter each remaining value refers to. Spaces around the colons are optional. Refer to the following list for a better understanding.
[:] -> Returns the whole string.
[4 : ] -> Returns a substring starting from index 4 till the last index.
[ : 8] -> Returns a substring starting from index 0 till index 7.
[2 : 7] -> Returns a substring starting from index 2 till index 6.
[4 : -1] -> Returns a substring starting from index 4 till the second last index. -1 can be used to refer to the last index in Python.
[-6 : -1] -> Returns a substring starting from the sixth index from the end till the second last index.
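The rules in the list above can be checked directly. The sketch below rewrites the negative and omitted indexes as explicit positive ones; the names text, n, and clipped are illustrative only.

```python
text = "Mississippi"
n = len(text)  # 11

# An omitted stop behaves like len(text)
assert text[4:] == text[4:n]

# A negative index i refers to position len(text) + i
assert text[4:-1] == text[4:n - 1]
assert text[-6:-1] == text[n - 6:n - 1]

# Unlike single-index access, a slice that runs past the end is clipped
# to the string length instead of raising IndexError
clipped = text[8:100]
print(clipped)  # ppi
```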
Extract Substring Using the slice() Constructor in Python
Instead of writing the indexes inside the brackets, we can use the slice() constructor to create a slice object, which can slice a string or any other sequence such as a list or tuple.
The slice(start, stop, step) constructor accepts three parameters, namely, start, stop, and step. They mean exactly the same as explained above.
Using a slice object differs slightly from the brackets notation: the slice object is placed inside the brackets after the string variable, like this: myString[<'slice' object>].
If a single integer value, say x, is passed to the slice() constructor and the resulting object is used for slicing, a substring starting from index 0 till index x - 1 will be retrieved. Refer to the following code.
myString = "Mississippi"
slice1 = slice(3)
slice2 = slice(4)
slice3 = slice(0, 8)
slice4 = slice(2, 7)
slice5 = slice(4, -1)
slice6 = slice(-6, -1)
print(myString[slice1])
print(myString[slice2])
print(myString[slice3])
print(myString[slice4])
print(myString[slice5])
print(myString[slice6])
Output:
Mis
Miss
Mississi
ssiss
issipp
ssipp
The indexes follow the same rules as defined for brackets notation: slice(3) and slice(4) behave like [:3] and [:4], and the remaining slice objects match the bracket slices with the same values.
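One practical use of a slice object is giving a slice a name and reusing it across sequences. The following is a rough sketch; the constant names are hypothetical, and passing None for a parameter simply selects its default.

```python
# Hypothetical names, chosen for illustration
FIRST_FOUR = slice(4)                # same as [:4]
MIDDLE = slice(2, 7)                 # same as [2:7]
EVERY_SECOND = slice(None, None, 2)  # same as [::2]; None means "use the default"

word = "Mississippi"
letters = list(word)

print(word[FIRST_FOUR])     # Miss
print(word[MIDDLE])         # ssiss
print(letters[FIRST_FOUR])  # ['M', 'i', 's', 's']
print(word[EVERY_SECOND])   # Msispi

# indices() resolves a slice against a given sequence length
resolved = slice(4, -1).indices(len(word))
print(resolved)  # (4, 10, 1)
```

The same slice object works on the string and on the list because both are sequences.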
Extract Substring Using Regular Expression in Python
For regular expressions, we'll use Python's built-in re module.
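import re

string = "123AAAMississippiZZZ123"
try:
    # (.+?) captures the text between "AAA" and "ZZZ" as group 1
    found = re.search(r"AAA(.+?)ZZZ", string).group(1)
    print(found)
except AttributeError:
    # re.search() returned None because the pattern was not found
    pass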
Output:
Mississippi
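In the above code, the search() function scans the passed string for the first location where the pattern provided as an argument matches. It returns a Match object, or None if there is no match, which is why the code catches AttributeError. group(1) returns the text captured by the first parenthesized group, here the characters between AAA and ZZZ. A Match object also provides details about the match, such as span(), which gives the starting and ending indexes of the matched substring.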
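An alternative to catching AttributeError is to test the result of search() for None before calling group(). The sketch below wraps that check in a hypothetical helper, extract_between, which is not part of the re module; it assumes the two markers are literal text and escapes them with re.escape().

```python
import re


def extract_between(text, left, right):
    """Return the text between left and right, or None if not found.

    Hypothetical helper for illustration. left and right are treated as
    literal markers, so they are escaped before building the pattern.
    """
    pattern = re.escape(left) + r"(.+?)" + re.escape(right)
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(1)


sample = "123AAAMississippiZZZ123"
print(extract_between(sample, "AAA", "ZZZ"))  # Mississippi
print(extract_between(sample, "BBB", "ZZZ"))  # None

match = re.search(r"AAA(.+?)ZZZ", sample)
print(match.span())   # (3, 20): indexes of the whole match
print(match.span(1))  # (6, 17): indexes of the captured group
```

The span values follow the slicing convention described earlier: sample[6:17] is the captured substring.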
print(dir(re.search('AAA(.+?)ZZZ', string))) will output the attributes of the Match object. Note that some attributes might be missing: dir() calls the object's __dir__() method to obtain the list of attributes, and that method can be overridden.
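Since the dir() output also includes many double-underscore names, it can help to filter it. A small sketch, assuming only the public names are of interest (the variable public_names is illustrative):

```python
import re

match = re.search(r"AAA(.+?)ZZZ", "123AAAMississippiZZZ123")

# Keep only the public names (those not starting with an underscore)
public_names = [name for name in dir(match) if not name.startswith("_")]
print(public_names)

# A few of the commonly used ones
print(match.group(0))                # AAAMississippiZZZ
print(match.start(1), match.end(1))  # 6 17
```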