Regex Wildcards Using the Re Module in Python
Wildcards are symbols in regular expressions that stand for one or more characters. They are mostly used to simplify search criteria.
This article explains in detail how to use re.sub() with a wildcard in Python to match character strings with regex.
Use the re.sub() Function for Regex Operations Using Wildcards in Python
The re module in Python provides operations on regular expressions (regex). A regular expression is a special sequence of characters that describes a pattern used to find a string or a set of strings.
Matching a text against a pattern tells you whether the pattern is present or absent in it.
A pattern can also be divided into one or more sub-patterns, called groups.
The module's main purpose is to search a string for text that matches a regular expression.
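To illustrate the two ideas above, the short sketch below checks whether a pattern is present with re.search(), which returns a match object or None, and uses parentheses to split a pattern into groups. The sample text and patterns are made up for illustration only.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Order 66 shipped on 2023-05-14"

# re.search() returns a Match object if the pattern is found, otherwise None.
found = re.search(r"shipped", text)
missing = re.search(r"cancelled", text)
print(found is not None)  # True
print(missing is None)    # True

# Parentheses divide the pattern into sub-patterns (groups) that can be read back.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
print(date.groups())  # ('2023', '05', '14')
```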
Before looking at how to use re.sub() with wildcards in Python, let's see how the re.sub() function works on ordinary strings.
Replace Matches in a String Using the re.sub() Function in Python
The re.sub() function replaces one or more matches of a pattern in the given string with a replacement string.
re.sub(pattern, repl, string, count=0, flags=0)
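It returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with the replacement repl.
If there is no match, the string is returned unchanged. If repl is a string, any backslash escapes in it are processed, so \n becomes a newline and \1 refers to the text matched by group 1.
The repl argument can also be a function: it is called once for each match with the match object, and its return value is used as the replacement.
The optional count argument limits the number of replacements (the default, 0, replaces every match), and flags such as re.IGNORECASE change how the pattern is matched.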
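The arguments described above can be seen side by side in the sketch below. It shows count, a function as repl, group references in a string repl, a flag, and the no-match case. The sample strings and the helper name double are made up for illustration.

```python
import re

prices = "apple 3, pear 12, plum 7"

# count=1 replaces only the leftmost match.
first_only = re.sub(r"[0-9]+", "N", prices, count=1)
print(first_only)  # apple N, pear 12, plum 7

# A function repl receives each Match object; its return value is the replacement.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"[0-9]+", double, prices)
print(doubled)  # apple 6, pear 24, plum 14

# Backslash escapes in a string repl are processed: \2 and \1 refer to groups 2 and 1.
swapped = re.sub(r"(\w+) (\d+)", r"\2 \1", prices)
print(swapped)  # 3 apple, 12 pear, 7 plum

# flags=re.IGNORECASE makes the match case-insensitive.
fruit = re.sub(r"apple", "fruit", "Apple and apple", flags=re.IGNORECASE)
print(fruit)  # fruit and fruit

# With no match, the original string is returned unchanged.
unchanged = re.sub(r"[0-9]+", "N", "no digits here")
print(unchanged)  # no digits here
```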
Let’s understand the code example below.
import re
rex = "[0-9]+"
string_reg = "ID - 54321, Pay - 586.32"
repl = "NN"
print("Original string")
print(string_reg)
result = re.sub(rex, repl, string_reg)
print("After replacement")
print(result)
What the code does:
- The first line of code imports the re module.
- The pattern to search for is stored in the variable rex. The pattern [0-9]+ combines the character class [0-9], which matches any single digit from 0 to 9, with the + quantifier, so it matches a run of one or more consecutive digits of any length.
- The string on which the substitution is performed is stored in the variable string_reg.
- The replacement string is stored in the variable repl.
- The re.sub() call looks up the pattern rex inside string_reg and replaces every match with repl. The returned string is stored in the variable result.
result = re.sub(rex, repl, string_reg)
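Output: every run of digits is replaced with 'NN', while the letters, spaces, and punctuation are left untouched. Because the decimal point in 586.32 is not a digit, it splits that number into two separate matches, which is why it becomes NN.NN.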
Original string
ID - 54321, Pay - 586.32
After replacement
ID - NN, Pay - NN.NN
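To see why 586.32 turns into NN.NN, you can list the matches that re.sub() replaces with re.findall(). The sketch below also shows one possible variation, assumed here for illustration, that adds a literal dot to the character class so the whole decimal number is treated as a single match.

```python
import re

string_reg = "ID - 54321, Pay - 586.32"

# findall() lists the same non-overlapping matches that re.sub() replaces.
digits = re.findall(r"[0-9]+", string_reg)
print(digits)  # ['54321', '586', '32']

# Inside a character class the dot is literal, so 586.32 becomes one match.
with_dot = re.sub(r"[0-9.]+", "NN", string_reg)
print(with_dot)  # ID - NN, Pay - NN
```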
Understand How to Use Wildcards With the re.sub() Function
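This article mainly focuses on four wildcards: . (dot), *, ?, and +. Strictly speaking, *, ?, and + are quantifiers that control how many times the preceding item repeats, while the dot matches a single character. Knowing what each of them does is key to using re.sub() with wildcards in Python.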
- . (The Dot) - Use re.sub() with the . wildcard in Python to match any single character except a newline.
The re module is imported in the program below, and a string containing three groups separated by " ... " is stored in the variable string_reg. The string_reg variable is then overwritten with the result returned from the re.sub() function. Because the dot matches exactly one character, the pattern ad. matches 'ad' followed by any one character, such as the 'add' in each group.
In the output, it can be seen that every time the program finds the pattern ad., it replaces it with 'REMOVED '.
import re
string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
string_reg = re.sub(r"ad.", "REMOVED ", string_reg)
print(string_reg)
Output:
a23kREMOVED hh234 ... REMOVED 2asdf675 ... xxxREMOVED 2axxx
- The asterisk (*) - Use re.sub() with this wildcard in Python to match 0 or more repetitions of the preceding RE, taking as many repetitions as possible. For example, ad* matches 'a', 'ad', or 'a' followed by any number of 'd' characters.
It can be seen in the output that every 'a', together with any 'd' characters that follow it, is replaced with the keyword 'PATCH'.
import re
string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
string_reg = re.sub(r"ad*", "PATCH", string_reg)
print(string_reg)
Output:
PATCH23kPATCHhh234 ... PATCH2PATCHsdf675 ... xxxPATCH2PATCHxxx
- The + - Use re.sub() with this wildcard in Python to match 1 or more repetitions of the preceding RE. ad+ will not match a lone 'a'; instead, it matches 'a' followed by any non-zero number of 'd' characters.
The function searches for 'ad', 'add', 'addd', and so on, where the trailing 'd' repeats one or more times, and replaces each match with 'POP'.
import re
string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
string_reg = re.sub(r"ad+", "POP", string_reg)
print(string_reg)
Output:
a23kPOPhh234 ... POP2asdf675 ... xxxPOP2axxx
- The ? - Use re.sub() with this wildcard in Python to match 0 or 1 repetitions of the preceding RE. The pattern ad? matches either 'a' or 'ad'.
The program finds the instances of 'a' or 'ad' and replaces them with the replacement string 'POP'. In 'add', only the first 'd' is consumed, so one 'd' remains in the output.
import re
string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"
string_reg = re.sub(r"ad?", "POP", string_reg)
print(string_reg)
Output:
POP23kPOPdhh234 ... POPd2POPsdf675 ... xxxPOPd2POPxxx
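To compare the four wildcards directly, the sketch below uses re.findall() on the same sample string to list exactly what each pattern matches, which is what re.sub() then replaces. It also checks the claim that the dot does not match a newline by default; the re.DOTALL flag used for the contrast is a standard re flag, and the two-line sample "a\nb" is made up for illustration.

```python
import re

string_reg = "a23kaddhh234 ... add2asdf675 ... xxxadd2axxx"

# findall() shows which substrings each wildcard pattern matches.
matches = {p: re.findall(p, string_reg) for p in (r"ad.", r"ad*", r"ad+", r"ad?")}
for pattern, found in matches.items():
    print(pattern, found)
# ad. ['add', 'add', 'add']
# ad* ['a', 'add', 'add', 'a', 'add', 'a']
# ad+ ['add', 'add', 'add']
# ad? ['a', 'ad', 'ad', 'a', 'ad', 'a']

# The dot does not match a newline by default; re.DOTALL lifts that restriction.
text = "a\nb"
default_dot = re.sub(r"a.b", "X", text)                  # no match, text unchanged
dotall_dot = re.sub(r"a.b", "X", text, flags=re.DOTALL)  # the whole text is replaced
print(repr(default_dot), repr(dotall_dot))
```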
Use Two or More Regex Wildcards Together in Python
Sometimes a single quantifier is not enough to get the desired result with re.sub(). Combining quantifiers makes it possible to express more complex patterns.
Let's look at some of them.
- The *?, +?, ?? - In the previous examples, we learned about the *, +, and ? quantifiers. All of them are greedy, meaning they match as much text as possible. For example, if the RE <.*> is matched against <a> b <c>, it matches the whole string rather than just <a>, which is often not the desired behavior.
Adding ? after the quantifier solves the issue. It makes the quantifier match in a minimal, or non-greedy, manner, so that as few characters as possible are matched. With the RE <.*?>, the first match is just <a>.
import re
string_reg = "as56ad5 ... dhgasd55df ... xxxadd2axxx"
string_reg = re.sub(r"ad*?", "SUGAR", string_reg)
print(string_reg)
Output: the pattern ad*? matches just 'a', because the lazy * prefers zero repetitions of 'd'.
SUGARs56SUGARd5 ... dhgSUGARsd55df ... xxxSUGARdd2SUGARxxx
For ad+?: it matches just 'ad', because the lazy + stops after a single 'd'.
as56SUGAR5 ... dhgasd55df ... xxxSUGARd2axxx
For ad??: it also matches just 'a', because the lazy ? prefers zero repetitions.
SUGARs56SUGARd5 ... dhgSUGARsd55df ... xxxSUGARdd2SUGARxxx
- The *+, ++, ?+ (also known as possessive quantifiers) - Like '*', '+', and '?', the quantifiers with an extra '+' match as many times as possible. Unlike the greedy quantifiers, however, they do not backtrack when the rest of the expression fails to match: the characters they have consumed are never given back.
For instance, a*a will match "aaaa". At first, a* matches all four a's, but when the final "a" of the pattern is reached there is nothing left to match, so the expression backtracks and a* matches only three a's, leaving the final "a" of the pattern to match the fourth "a".
But when the expression a*+a is used to match "aaaa", a*+ matches all four a's and cannot backtrack, so the final "a" finds no characters left to match and the match fails.
x*+, x++, and x?+ are equivalent to the atomic groups (?>x*), (?>x+), and (?>x?), respectively. Let's look at a program to understand the concept better.
import regex
string_reg = "as56ad5 ... dhgasd55df ... xxxadd2axxx"
string_reg = regex.sub(r"ad*+", "SUGAR", string_reg)
print(string_reg)
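Note: The re module supports possessive quantifiers only in Python 3.11 and later. On older versions, use the third-party regex module (installed with pip install regex), as the program above does.
Output: finds instances of either 'a' or 'ad', 'add', 'addd', and so on.
SUGARs56SUGAR5 ... dhgSUGARsd55df ... xxxSUGAR2SUGARxxx
For ad++: finds instances of 'ad', 'add', 'addd', and so on.
as56SUGAR5 ... dhgasd55df ... xxxSUGAR2axxx
For comparison, the lazy ad+? from the previous section does not behave the same as ad++: it stops after a single 'd', so 'add' becomes 'SUGARd'.
as56SUGAR5 ... dhgasd55df ... xxxSUGARd2axxx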
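The sketch below reproduces the two examples from the text, <.*> against <a> b <c> and a*a against "aaaa", using only the standard re module. Possessive quantifiers and atomic groups are only available in re from Python 3.11, so that part is guarded by a version check; on older Python, the third-party regex module shown above is the alternative. The last step, applying the possessive ad?+ to the same sample string, is an added illustration rather than part of the original program.

```python
import re
import sys

html = "<a> b <c>"

# Greedy .* grabs as much as possible, so one match spans the whole string.
greedy = re.findall(r"<.*>", html)  # ['<a> b <c>']
# Lazy .*? stops at the first ">", so each tag is matched separately.
lazy = re.findall(r"<.*?>", html)   # ['<a>', '<c>']
print(greedy, lazy)

# Greedy a* gives back one "a" so that the final "a" in the pattern can match.
backtracking = re.fullmatch(r"a*a", "aaaa")
print(backtracking is not None)  # True

if sys.version_info >= (3, 11):
    # Possessive quantifiers and atomic groups need Python 3.11+ in the re module.
    # a*+ keeps all four a's and never gives one back, so the final "a" fails.
    possessive = re.fullmatch(r"a*+a", "aaaa")
    atomic = re.fullmatch(r"(?>a*)a", "aaaa")
    print(possessive, atomic)  # None None

    # The third possessive form, ?+, matches 'a' or 'ad' without backtracking.
    sample = "as56ad5 ... dhgasd55df ... xxxadd2axxx"
    optional_possessive = re.sub(r"ad?+", "SUGAR", sample)
    print(optional_possessive)
    # SUGARs56SUGAR5 ... dhgSUGARsd55df ... xxxSUGARd2SUGARxxx
```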
Perform Operations on Strings Using the Regex Pattern and re.sub() Function by Adding a Wildcard in Python
We have learned how to use re.sub() with wildcards in Python. Now we will combine these concepts to search a string for a pattern and replace the whole word instead of just the pattern itself.
The problem statement gives us a string and a pattern. The pattern needs to be searched for inside the given string.
Once it is found, the re.sub() function replaces the whole word.
Example: Replace the Whole Word When the Pattern Is Found in the Beginning
- Import the re module.
- Create a variable string_reg and store any string value. Here, a multi-line string is stored, so the re.sub() function applies its effect to all four groups inside the string.
string_reg = """\
... 23khadddddh234 > REMOVED23khh234
... add2asdf675 > REMOVED2asdf675"""
- The function needs to find a pattern inside the string and replace the whole word when it is found. The pattern to find is 'add', so a combination of quantifiers is used to achieve the desired result.
The combination should match 'add' together with the rest of the word that follows it, such as 'adddddh234' or 'add2asdf675'. The words after > (REMOVED23khh234 and REMOVED2asdf675) contain no 'add', so they stay untouched.
One way to do this is the pattern add.+? followed by a space: .+? lazily matches one or more characters up to the first space. That space is consumed by the match as well, which is why the replacement string ends with a space.
string_reg = re.sub(r"add.+? ", "REMOVED ", string_reg)
Code:
import re
string_reg = """\
... 23khadddddh234 > REMOVED23khh234
... add2asdf675 > REMOVED2asdf675"""
string_reg = re.sub(r"add.+? ", "REMOVED ", string_reg)
print(string_reg)
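Output: The program searches for 'add' followed by the rest of the word and, when found, replaces it with repl, 'REMOVED '. When 'add' is at the beginning of a word, as in add2asdf675, the whole word is replaced; in 23khadddddh234, only the part from 'add' onward is replaced.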
... 23khREMOVED > REMOVED23khh234
... REMOVED > REMOVED2asdf675
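If you want to replace the entire word wherever 'add' occurs inside it, not only when it starts the word, one possible variation is \w*add\w*, which also consumes the word characters before 'add'. This is an assumption for illustration, not the article's method; the sketch below runs both patterns on the same string so the difference is visible.

```python
import re

string_reg = """\
... 23khadddddh234 > REMOVED23khh234
... add2asdf675 > REMOVED2asdf675"""

# The article's pattern: 'add' plus the fewest characters up to the next space.
from_add = re.sub(r"add.+? ", "REMOVED ", string_reg)

# Hypothetical variation: \w*add\w* also takes the word characters before 'add',
# so the whole word is replaced wherever 'add' occurs inside it.
whole_word = re.sub(r"\w*add\w*", "REMOVED", string_reg)

print(from_add)
print(whole_word)
```

In both cases the words after > are left alone, because they do not contain 'add'.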
Conclusion
This article described how to use re.sub() with wildcards in Python. The first section focuses on using the Python function re.sub() with a simple regex.
Then the use of wildcards with re.sub() is explained in detail, including how to combine them into lazy and possessive quantifiers.
After going through the article, the reader should be able to use re.sub() with wildcards in Python and write programs that search strings for regex patterns.