How to Split String Based on Multiple Delimiters in Python
Python's string split() method makes it easy to split a string into a list based on a delimiter. In some cases, though, you might need the split to occur on not just one but multiple delimiter values. This quick 101 article introduces two convenient ways to achieve this in Python.
Split String With Two Delimiters in Python
Consider the following string.
text = "python is, an easy;language; to, learn."
For our example, we need to split it either by a semicolon followed by a space ("; ") or by a comma followed by a space (", "). Any lone semicolons or commas (";" or ",") with no trailing space should be left alone.
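To see why a single delimiter is not enough, here is a small sketch that splits the sample string with one str.split() call on ", ". The "; " separator survives untouched inside the second item, which is exactly what the techniques below address.

```python
text = "python is, an easy;language; to, learn."

# str.split() accepts only one separator string per call
by_comma = text.split(", ")
print(by_comma)  # ['python is', 'an easy;language; to', 'learn.']
```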
Regular Expressions
Although regular expressions are often frowned upon because they can be relatively expensive for string parsing, their use is justified in a situation like this.
Use a Basic Expression
Python's built-in re module has a split() function we can use for this case.
Let's use a basic "a or b" regular expression (a|b) to separate our multiple delimiters.
import re
text = "python is, an easy;language; to, learn."
print(re.split("; |, ", text))
Output:
['python is', 'an easy;language', 'to', 'learn.']
As the Wikipedia page on regular expressions notes, there are several standards for regular expression syntax, such as IEEE POSIX. Python's re module follows a Perl-style syntax rather than POSIX exactly, but both offer several other ways of writing a regular expression that matches our use case.
Instead of using bar separators (|) to define our delimiters, we can achieve the same result with the character set ([]) syntax of regular expressions. A character set matches any single character listed within the square brackets.
Therefore, when specifying our pattern, we can simply put a semicolon and a comma inside square brackets, followed by a space ("[;,] "). The regular expression then matches any part of the string made up of exactly one semicolon or comma followed by a space.
import re
text = "python is, an easy;language; to, learn."
print(re.split("[;,] ", text))
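The snippet above does not show its output, so as a quick check, here is a sketch that runs the character set pattern next to the alternation pattern from earlier. Both patterns match a semicolon or comma followed by a space, so they should return the same list for this string.

```python
import re

text = "python is, an easy;language; to, learn."

# Alternation: "; " or ", "
with_alternation = re.split("; |, ", text)
# Character set: exactly one ";" or "," followed by a space
with_char_set = re.split("[;,] ", text)

print(with_char_set)  # ['python is', 'an easy;language', 'to', 'learn.']
print(with_alternation == with_char_set)  # True
```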
Make It a Function
The basic expression above is limited to a hardcoded set of separators. This can cause trouble when the delimiters change later, and it limits reuse in other parts of the code. It is better practice to make the code more generic and reusable, so let's wrap that logic in a Python function.
import re
text = "python is, an easy;language; to, learn."
separators = "; ", ", "
def custom_split(sepr_list, str_to_split):
    # create the regular expression dynamically, escaping each separator
    regular_exp = "|".join(map(re.escape, sepr_list))
    return re.split(regular_exp, str_to_split)
print(custom_split(separators, text))
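One detail the function does not handle: the alternation tries separators in the order given, so if one separator is a prefix of another (say "-" and "--"), the shorter one can match first and leave empty strings behind. The sketch below uses a hypothetical custom_split_longest_first helper that sorts the separators longest-first before joining them; it is one possible way to handle overlapping separators, not part of the original approach.

```python
import re

def custom_split_longest_first(sepr_list, str_to_split):
    # Sort longest-first so that "--" is tried before "-" in the alternation
    ordered = sorted(sepr_list, key=len, reverse=True)
    regular_exp = "|".join(map(re.escape, ordered))
    return re.split(regular_exp, str_to_split)

# The plain alternation "-|--" matches "-" twice, leaving an empty string
print(re.split("-|--", "a--b"))  # ['a', '', 'b']
print(custom_split_longest_first(["-", "--"], "a--b"))  # ['a', 'b']
```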
Use String Functions
If you want to avoid regular expressions, or do not want to import a module just for the sake of splitting a string, you can use the replace() and split() methods of Python's built-in str type, in a somewhat hacky way, to achieve the same result.
text = "python is, an easy;language; to, learn."
# transform [semicolon-space] parts of the string into [comma-space]
text_one_delimiter = text.replace("; ", ", ")
print(text_one_delimiter.split(", "))
Here, we first replace every occurrence of a semicolon followed by a space ("; ") with our other delimiter, a comma followed by a space (", "). This way, the string only has to be split on one delimiter, ", " in this case.
Now we can safely split the modified string with Python's built-in str.split() method to get the same result.
Note that this time we did not need to import any module to achieve the outcome.
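Printing the intermediate string makes the two steps easier to follow. After the replacement, only ", " separators remain, while the lone semicolon in easy;language is untouched because it has no trailing space.

```python
text = "python is, an easy;language; to, learn."

# Step 1: rewrite every "; " as ", " so only one delimiter remains
text_one_delimiter = text.replace("; ", ", ")
print(text_one_delimiter)  # python is, an easy;language, to, learn.

# Step 2: split on the single remaining delimiter
parts = text_one_delimiter.split(", ")
print(parts)  # ['python is', 'an easy;language', 'to', 'learn.']
```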
Split String With Multiple Delimiters in Python
Consider the following text.
text = "python is# an% easy;language- to, learn."
For this example, we need to split it wherever any of the characters #, %, ;, - or , is followed by a space.
Regular Expressions
In this case, we can easily add the additional separators when defining our regular expression.
import re
text = "python is# an% easy;language- to, learn."
print(re.split("; |, |# |% |- ", text))
Output:
['python is', 'an', 'easy;language', 'to', 'learn.']
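The character set syntax from the two-delimiter example extends to this case as well. One caution: inside square brackets a hyphen normally denotes a range (as in a-z), so in this sketch it is placed last, where Python's re treats it as a literal character.

```python
import re

text = "python is# an% easy;language- to, learn."

# "-" is last inside the brackets, so it is a literal hyphen, not a range
parts = re.split("[#%;,-] ", text)
print(parts)  # ['python is', 'an', 'easy;language', 'to', 'learn.']
```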
Make It a Function
Here too, we can reuse the function from the two-delimiter example, simply adding all the extra separators to the separators variable.
import re
text = "python is# an% easy;language- to, learn."
separators = "; ", ", ", "# ", "% ", "- "
def custom_split(sepr_list, str_to_split):
    # create the regular expression dynamically, escaping each separator
    regular_exp = "|".join(map(re.escape, sepr_list))
    return re.split(regular_exp, str_to_split)
print(custom_split(separators, text))
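Mapping re.escape over the separators is what keeps the function safe when a separator contains regex metacharacters. The sketch below repeats the function, shows its output for our text, and then calls it with made-up separators "| " and "+ ", which would otherwise be read as regex syntax rather than literal text.

```python
import re

def custom_split(sepr_list, str_to_split):
    # Escape each separator so characters like "|" or "+" match literally
    regular_exp = "|".join(map(re.escape, sepr_list))
    return re.split(regular_exp, str_to_split)

text = "python is# an% easy;language- to, learn."
separators = "; ", ", ", "# ", "% ", "- "
print(custom_split(separators, text))  # ['python is', 'an', 'easy;language', 'to', 'learn.']

# Hypothetical separators containing regex metacharacters
print(custom_split(["| ", "+ "], "a| b+ c"))  # ['a', 'b', 'c']
```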
Use String Functions
As we did with two delimiters, we can use the replace() and split() methods to handle this case as well.
text = "python is# an% easy;language- to, learn."
# transform each [symbol-space] separator into [comma-space]
text_one_delimiter = (
    text.replace("# ", ", ").replace("% ", ", ").replace("; ", ", ").replace("- ", ", ")
)
print(text_one_delimiter.split(", "))
Output:
['python is', 'an', 'easy;language', 'to', 'learn.']
Note that this method is not recommended for larger numbers of delimiters, such as in this example. Each replace() call scans the whole string and returns a new one, so the work grows with every delimiter added, whereas re.split() handles all the delimiters with a single pattern in one call.
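If you still prefer to avoid re, the chained replace() calls can be generalized with a loop. The sketch below uses a hypothetical split_without_re helper that rewrites every separator into the first one and then splits once. It still makes one replace() pass per separator, so the performance note above applies to it too.

```python
def split_without_re(sepr_list, str_to_split):
    # Use the first separator as the single "target" delimiter
    target = sepr_list[0]
    for sep in sepr_list[1:]:
        # Each replace() is a separate pass over the whole string
        str_to_split = str_to_split.replace(sep, target)
    return str_to_split.split(target)

text = "python is# an% easy;language- to, learn."
separators = ", ", "# ", "% ", "; ", "- "
print(split_without_re(separators, text))  # ['python is', 'an', 'easy;language', 'to', 'learn.']
```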