How to Use Regular Expressions in Excel VBA
Visual Basic for Applications is an event-driven programming language used for Microsoft applications. It allows users to automate tasks and rewrite several functionalities according to their requirements.
VBA allows the user to do multiple tasks with the help of a few code statements stored in the form of a macro. The macro allows the user to reuse the code over and over again.
Regular Expressions in Microsoft Excel
Microsoft Excel is a powerful application that allows data manipulation on a huge scale. Taking input from a database, manipulating the data, and summarizing the results is quite easy with the help of Microsoft Excel.
Regular Expressions (Regex) are used to identify and match a pattern within a string. Using regular expressions in Microsoft Excel, you can manipulate data to a large extent.
For instance, you may split the data input based on gender. In addition, you may also check if users have entered the correct email or not.
In short, it is quite easy to check formatted data and perform operations accordingly.
Allow the Use of Regular Expressions in Excel
It is essential to add a VBA reference to "Microsoft VBScript Regular Expressions 5.5"
to use regular expressions in Microsoft Excel. It can be done with the following steps:
-
Select the
Developer
tab in Excel -
Choose the
Visual Basic
icon from theCode
ribbon section -
Select
Tools
from the top menu in theMicrosoft Visual Basic for Applications
window -
Select
References
-
Check
Microsoft VBScript Regular Expressions 5.5
These steps allow you to use regular expressions in the VBA code.
Write a Regular Expression
Some of the basic definitions have been mentioned below that help you to formulate a regular expression.
-
Mention a Range
The
-
symbol is used to define a range. For example, thea-z
matches a string with lowercase letters. -
Match One Object
The square brackets
[]
are used to match exactly one of the objects mentioned inside the brackets. For instance, the[wxyz]
matches a single letter: w, x, y, or z. -
Match One or Zero Occurrence of a Pattern
A question mark
?
matches the pattern defined in the square brackets zero or one time. The[0-9]?
matches with an empty string or any one digit. -
Match Zero or More Occurrences of a Pattern
An asterisk
*
matches the pattern defined in the square brackets zero or more times. The[0-9]*
matches with an empty string or any string of digits. -
Match One or More Occurrences of a Pattern
A plus
+
sign matches the pattern defined in the square brackets one or more times. The[0-9]+
matches with at least one or more digits. -
Multiplicity of a Pattern
To allow a pattern to be repeated several times, the curly brackets
{}
are used. For example:6.1. The
[x]{2}
matches two consecutive lowercasex
’s. It matches withxx
.6.2. The
[x]{1, 3}
matches the pattern defined in the square brackets for a minimum of1
to a maximum of3
times. It matchesx
,xx
, andxxx
. -
The
OR
OperatorThe
OR
operator|
allows you to match between several options. For instance, thex|y|z
matches with exactly one of the options. -
The
NOT
OperatorThe
NOT
operator^
does not match the pattern defined in square brackets. For example, the[^a-z]
does not match with any lowercase letter. -
Group Different Matches
The parenthesis
()
is used to group the different matching patterns. For instance, the(^[0-9]{3})([a-z])
matches the pattern, which has three digits at the start, followed by a lowercase letter. -
Anchoring Patterns
The
^
operator can be used to match the starting of a string. For example, the expression^a
ensures that the string starts with a lowercase'a'
.The
$
operator is used to match the ending of a string. For example, the expressiona$
ensures that the string ends with a lowercase'a'
.
Precedence in Regular Expressions
Different operators have different precedence, which has been mentioned in the following table:
Order | Name | Representation |
---|---|---|
1 | Parentheses | ( ) |
2 | Multipliers | ? + * {m,n} {m, n}? |
3 | Sequence & Anchors | abc ^ $ |
4 | Alternation | | |
Character Abbreviations
Several predefined character abbreviations are used in regular expressions. Some of the predefined character abbreviations have been mentioned in the table below.
Abbreviation | Definition |
---|---|
\d |
To represent a single digit |
\D |
To represent a non-digit character |
\w |
To represent a word character |
\W |
To represent a non-word character |
\s |
To represent a space character |
\S |
To represent a non-space character |
\n |
To represent a new line |
Examples of Regular Expressions in Excel VBA
In the following examples, check the cell value for an uppercase character and replace it with an empty string. In short, all the uppercase characters are matched and deleted from the input string.
Run the Code as a Macro
The following code takes A1
as the cell reference and deletes all uppercase letters from the string placed on A1
. The result is displayed in a Msgbox
.
For example, the input string aAbBcC
will change to abc
in the message box.
Private Sub test()
Dim pattern As String: pattern = "[A-Z]"
Dim replace As String: replace = ""
Dim exp As New RegExp
Dim cellVal As String
Dim rangeref As Range
Set rangeref = ActiveSheet.Range("A1")
If pattern <> "" Then
cellVal = rangeref.Value
With exp
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If exp.test(cellVal) Then
MsgBox (exp.replace(cellVal, replace))
Else
MsgBox ("Not matched")
End If
End If
End Sub
Loop Through a Range
The same code used in the first example can be altered by adding a For
loop to it to run the macro over a specific range. The loop will run for each cell within the defined range and show the output of each cell in a message box.
Private Sub test()
Dim pattern As String: pattern = "[A-Z]"
Dim replace As String: replace = ""
Dim exp As New RegExp
Dim cellVal As String
Dim rangeref As Range
Set rangeref = ActiveSheet.Range("A1:A5")
For Each cell In rangeref
If pattern <> "" Then
cellVal = cell.Value
With exp
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If exp.test(cellVal) Then
MsgBox (exp.replace(cellVal, replace))
Else
MsgBox ("Not matched")
End If
End If
Next
End Sub
Create a Function
The following code can be used to create an in-cell function that deletes occurrences of uppercase letters in the string. The code is similar to the one used in the first example.
A few changes have been made to convert it into a function.
Function cellTest(rangeref As Range) As String
Dim pattern As String: pattern = "[A-Z]"
Dim replace As String: replace = ""
Dim exp As New RegExp
Dim cellVal As String
If pattern <> "" Then
cellVal = rangeref.Value
With exp
.Global = True
.MultiLine = True
.IgnoreCase = False
.pattern = pattern
End With
If exp.test(cellVal) Then
cellTest = exp.replace(cellVal, replace)
Else
cellTest = "Not matched"
End If
End If
End Function
If you use cellTest(A1)
where A1
contains DdEeleTte
, the function will return delete
.
Conclusion
There is no limit on the number of regular expressions that can be generated. Once you can identify a general format you require, create a regular expression for it and test all the strings based on that.
This can help you divide your data based on different features.
Hello, I am Bilal, a research enthusiast who tends to break and make code from scratch. I dwell deep into the latest issues faced by the developer community and provide answers and different solutions. Apart from that, I am just another normal developer with a laptop, a mug of coffee, some biscuits and a thick spectacle!
GitHub