How to Work With Regex in Scala
- Regex Class in Scala
- Find Matches in Text in Scala
- Use Regex to Replace a Text in Scala
- Use Regex to Extract Values in Scala
- Conclusion
A regular expression defines a pattern that is matched against input data. Regular expressions are highly useful for pattern matching, text processing, and parsing.
In this article, we’ll learn how to work with regular expressions (regex) in Scala.
Regex Class in Scala
`Regex` is a class in the Scala standard library, imported as `scala.util.matching.Regex`. It is built on the Java package `java.util.regex` and is used extensively for pattern matching and text parsing. `Regex` objects can be created in two ways.
The first way is to instantiate the `Regex` class explicitly.
import scala.util.matching.Regex
val x = new Regex("you")
The second way is to call the `r` method on a string.
val x = "you".r
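As a small sketch of the two constructions side by side, the following compares the patterns they produce. The object name `RegexCreation` and the sample input are illustrative only. It also shows a triple-quoted string, which keeps backslashes literal so they do not need to be doubled.

```scala
import scala.util.matching.Regex

object RegexCreation {
  // Explicit construction through the class constructor.
  val explicit: Regex = new Regex("you")

  // The same pattern built with the `r` method on String.
  val viaR: Regex = "you".r

  // Triple-quoted strings keep backslashes literal, so \d needs no doubling.
  val digits: Regex = """\d+""".r

  def main(args: Array[String]): Unit = {
    println(explicit.regex == viaR.regex)  // true
    println(digits.findFirstIn("room 42")) // Some(42)
  }
}
```

Both forms wrap the same underlying pattern, so the choice between them is mostly a matter of style.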
Let’s look at different use cases for regular expressions with `Regex`.
Find Matches in Text in Scala
Finding matches in text is one of the most common use cases of `Regex`.
Example Code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    val x = new Regex("Tony")
    val text = "Iron man is also known as Tony Stark. Tony is an Avenger"
    // findFirstIn returns the first match wrapped in an Option
    println(x.findFirstIn(text))
  }
}
Output:
Some(Tony)
In the above code, we used the `findFirstIn` method to find the first match of the regular expression. The method returns an `Option[String]`: `Some` with the matched text, or `None` if nothing matches.
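Because the result is an `Option`, the caller usually has to handle both outcomes. A minimal sketch of one way to do that is shown here; the `describe` helper and the object name are hypothetical, not part of the `Regex` API.

```scala
import scala.util.matching.Regex

object FirstMatchHandling {
  val name: Regex = new Regex("Tony")

  // Turns the Option[String] from findFirstIn into a readable message.
  def describe(text: String): String =
    name.findFirstIn(text) match {
      case Some(found) => s"Found: $found"
      case None        => "No match"
    }

  def main(args: Array[String]): Unit = {
    println(describe("Tony is an Avenger"))  // Found: Tony
    println(describe("Steve is an Avenger")) // No match
  }
}
```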
Example Code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    // Two digits, a hyphen, then three digits (a postal code)
    val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
    val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"
    println(reg.findAllIn(text).mkString(","))
  }
}
Output:
01-011,30-059
In the above example, we used the `findAllIn` method to find all the matches; it returns a `MatchIterator`. We then used the `mkString` method to join the matches into a single string separated by commas.
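The iterator can only be traversed once, so it is often convenient to collect the matches into a list. The sketch below does that, and also uses `findAllMatchIn`, which yields `Match` objects so the capture groups of every match are available. The object and value names are illustrative.

```scala
import scala.util.matching.Regex

object AllMatches {
  val postal: Regex = new Regex("([0-9]{2})-([0-9]{3})")
  val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"

  // findAllIn yields the matched strings; toList materializes the iterator.
  val codes: List[String] = postal.findAllIn(text).toList

  // findAllMatchIn yields Match objects, which expose the capture groups.
  val prefixes: List[String] = postal.findAllMatchIn(text).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    println(codes)    // List(01-011, 30-059)
    println(prefixes) // List(01, 30)
  }
}
```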
We also have the `findFirstMatchIn` method. It works like `findFirstIn` but returns an `Option[Match]`, which gives access to the capture groups of the match.
Example Code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
    val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"
    val result = reg.findFirstMatchIn(text)
    // Take the second capture group of the first match, if there is one
    println(result.map(_.group(2)))
  }
}
Output:
Some(011)
Here, `group(2)` returns the second capture group of the first match, that is, the three digits after the hyphen.
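A `Match` carries more than one group. As a rough sketch, the hypothetical `parts` helper below returns the whole matched text together with both capture groups, using the `matched` and `group` members of `Match`.

```scala
import scala.util.matching.Regex

object MatchDetails {
  val postal: Regex = new Regex("([0-9]{2})-([0-9]{3})")

  // Returns (whole match, first group, second group) for the first match, if any.
  def parts(text: String): Option[(String, String, String)] =
    postal.findFirstMatchIn(text).map(m => (m.matched, m.group(1), m.group(2)))

  def main(args: Array[String]): Unit = {
    println(parts("He lives in Warsaw 01-011")) // Some((01-011,01,011))
    println(parts("no postal code here"))       // None
  }
}
```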
Use Regex to Replace a Text in Scala
Replacing text is another common use case of `Regex`. During text parsing, we may need to replace some part of the text with something else.
Example Code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
    val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"
    // Only the first postal code is replaced
    println(reg.replaceFirstIn(text, "1234"))
  }
}
Output:
He lives in Warsaw 1234 and she lives in Cracow 30-059
In the above code, we used the `replaceFirstIn` method to replace the first match found in the text with the string `"1234"`.
Example Code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    val reg = new Regex("([0-9]{2})\\-([0-9]{3})")
    val text = "He lives in Warsaw 01-011 and she lives in Cracow 30-059"
    // Every postal code is replaced
    println(reg.replaceAllIn(text, "1234"))
  }
}
Output:
He lives in Warsaw 1234 and she lives in Cracow 1234
In the above code, we used the `replaceAllIn` method, which replaces all the matches found in the text with `"1234"`.
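The replacement does not have to be a fixed string. The sketch below, with illustrative names and input, shows two variants: a replacement string that refers to capture groups with `$1` and `$2`, and the overload of `replaceAllIn` that takes a function from `Match` to `String`.

```scala
import scala.util.matching.Regex

object ReplaceVariants {
  val postal: Regex = new Regex("([0-9]{2})-([0-9]{3})")
  val text = "Warsaw 01-011, Cracow 30-059"

  // $1 and $2 in the replacement string refer to the captured groups.
  val swapped: String = postal.replaceAllIn(text, "$2-$1")

  // A function of Match computes each replacement individually.
  val masked: String = postal.replaceAllIn(text, m => m.group(1) + "-***")

  def main(args: Array[String]): Unit = {
    println(swapped) // Warsaw 011-01, Cracow 059-30
    println(masked)  // Warsaw 01-***, Cracow 30-***
  }
}
```

Since `$` and `\` have a special meaning in replacement strings, a replacement that should contain them literally can be passed through `Regex.quoteReplacement` first.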
Use Regex to Extract Values in Scala
Once a regular expression matches, we can use the `Regex` object as an extractor in pattern matching to pull out the values of its capture groups.
Example code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    // The dot before the milliseconds is escaped so that it matches a literal "."
    val timestamp = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r
    "12:20:01.411" match {
      case timestamp(hour, minutes, _, _) => println(s"It is $minutes minutes after $hour")
      case _ => println("Not a timestamp")
    }
  }
}
Output:
It is 20 minutes after 12
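The extracted groups are plain strings, so they can be converted and combined as needed. As a hedged sketch, the hypothetical `toMillis` helper below binds all four groups and converts a timestamp to milliseconds since midnight, returning `None` for input that does not match.

```scala
import scala.util.matching.Regex

object TimestampParts {
  val timestamp: Regex = """(\d{2}):(\d{2}):(\d{2})\.(\d{3})""".r

  // Converts "HH:mm:ss.SSS" to milliseconds since midnight, if the input matches.
  def toMillis(input: String): Option[Long] =
    input match {
      case timestamp(h, m, s, ms) =>
        Some(((h.toLong * 60 + m.toLong) * 60 + s.toLong) * 1000 + ms.toLong)
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(toMillis("12:20:01.411")) // Some(44401411)
    println(toMillis("12:20"))        // None
  }
}
```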
In Scala, a `Regex` used in pattern matching is anchored by default: it behaves as if the pattern were wrapped in `^` and `$`, like `^pattern$`, so it must match the entire input. We can lift this restriction with the `unanchored` method, which returns an `UnanchoredRegex`.
With its help, we can have additional text in our string and still find what we need.
Example code:
import scala.util.matching.Regex
object myClass {
  def main(args: Array[String]): Unit = {
    val timestamp = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r
    // The unanchored version may match anywhere inside the input
    val temp = timestamp.unanchored
    "It is 12:20:01.411 in New York" match {
      case temp(hour, minutes, _, _) => println(s"It is $minutes minutes after $hour")
      case _ => println("No timestamp found")
    }
  }
}
Output:
It is 20 minutes after 12
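To make the difference concrete, the following sketch runs the same input through the anchored and the unanchored version of one pattern. The `hourOf` helper and the object name are hypothetical and exist only for this comparison.

```scala
import scala.util.matching.Regex

object AnchoringComparison {
  val timestamp: Regex = "([0-9]{2}):([0-9]{2}):([0-9]{2})\\.([0-9]{3})".r

  // Extracts the hour with the given regex used as an extractor.
  def hourOf(regex: Regex, input: String): Option[String] =
    input match {
      case regex(hour, _, _, _) => Some(hour)
      case _                    => None
    }

  def main(args: Array[String]): Unit = {
    val sentence = "It is 12:20:01.411 in New York"
    println(hourOf(timestamp, sentence))            // None
    println(hourOf(timestamp.unanchored, sentence)) // Some(12)
  }
}
```

The anchored pattern fails because the surrounding words are not part of the timestamp, while the unanchored one finds the timestamp inside the sentence.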
Java inherits most of its regular expression features from the Perl programming language, and Scala inherits its regular expression syntax from Java.
Let’s look at some commonly used regular expression constructs that Scala takes from Java.
Subexpression | Matches |
---|---|
^ | Matches the beginning of the line. |
$ | Matches the end of the line. |
[...] | Matches any single character listed in the brackets. |
[^...] | Matches any single character not listed in the brackets. |
\\w | Matches a word character (letter, digit, or underscore). |
\\d | Matches a digit. |
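As a quick illustration of the constructs in the table, the sketch below applies each of them to one sample string. The object name, value names, and the sample text are made up for this example.

```scala
import scala.util.matching.Regex

object SyntaxSamples {
  val text = "cat 42 dog_7"

  // ^ anchors the match at the beginning, $ at the end.
  val startsWithCat: Boolean = "^cat".r.findFirstIn(text).isDefined
  val endsWithDigit: Boolean = "[0-9]$".r.findFirstIn(text).isDefined

  // [...] matches one listed character; [^...] matches one unlisted character.
  val vowels: List[String] = "[aeiou]".r.findAllIn(text).toList
  val notLowerOrSpace: List[String] = "[^a-z ]+".r.findAllIn(text).toList

  // \\d matches digits; \\w matches letters, digits, and underscores.
  val digitRuns: List[String] = "\\d+".r.findAllIn(text).toList
  val words: List[String] = "\\w+".r.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(startsWithCat)   // true
    println(endsWithDigit)   // true
    println(vowels)          // List(a, o)
    println(notLowerOrSpace) // List(42, _7)
    println(digitRuns)       // List(42, 7)
    println(words)           // List(cat, 42, dog_7)
  }
}
```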
Conclusion
In this article, we have learned about the `Regex` class in the Scala standard library. We have also seen the methods it provides for the different use cases of regular expressions: finding matches, replacing text, and extracting values.