Raw String and Unicode String in Python
Raw String in Python
Raw string literals in Python are ordinary string literals prefixed with r or R before the opening quote. In a raw string, a backslash (\) is treated as a literal character rather than as the start of an escape sequence.
For example,
print(r"\n")
print(r"\t")
Output:
\n
\t
In a normal string, every literal backslash must be doubled so that it is not mistaken for the beginning of an escape sequence such as a newline (\n) or a tab (\t). Raw strings remove that need, which is why they are common in regular expressions and in Windows file paths.
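The two use cases mentioned above can be sketched as follows. This is a minimal illustration, not a recommended path-handling approach: the Windows path and the sample sentence are made up for the example.

```python
import re

# Hypothetical Windows path, used only for illustration.
escaped_path = "C:\\temp\\new_folder\\notes.txt"  # normal string, backslashes doubled
raw_path = r"C:\temp\new_folder\notes.txt"        # raw string, backslashes written once
print(escaped_path == raw_path)  # True

# \d+ matches one or more digits; the raw string hands the backslash to re unchanged.
digits = re.findall(r"\d+", "order 66 shipped in 3 days")
print(digits)  # ['66', '3']
```

Both spellings of the path produce the same string; the raw form is simply easier to read and harder to mistype.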
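r'\' will raise a syntax error: even in a raw string, a backslash placed before a quote stops that quote from ending the literal, so a raw string cannot end with a single backslash (more precisely, with an odd number of backslashes). Without the r prefix, the backslash is treated as an escape character.
Example: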
text = "Hello\nWorld"
print(text)
Output:
Hello
World
Without the raw string prefix r, the backslash is treated as an escape character, so when the string above is printed, \n produces a line break. Hence the two words in text are printed on separate lines, as shown in the output.
Using the same text example, add the r prefix before the string.
Example:
text = r"Hello\nWorld"
print(text)
Output:
Hello\nWorld
From the output, the raw string prefix makes Python treat the backslash as a literal, so the text is printed with the backslash included. The output matches the source text exactly because \n is not interpreted as an escape sequence.
For instance, '\\n' and r'\n' have the same value: a backslash followed by the letter n.
print("\\n")
print(r"\n")
Output:
\n
\n
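The equivalence above, and the earlier point that r'\' is a syntax error, can be checked with a short sketch. The folder path is hypothetical, and the two workarounds shown are common options rather than the only ones.

```python
# Both literals hold the same two characters: a backslash and the letter n.
escaped = "\\n"
raw = r"\n"
print(escaped == raw)  # True
print(len(raw))        # 2

# A raw string cannot end with a single backslash, so r"C:\logs\" is a syntax error.
# Two possible workarounds (hypothetical path, for illustration only):
folder_a = "C:\\logs\\"        # normal string with doubled backslashes
folder_b = r"C:\logs" + "\\"   # raw string plus a separately escaped backslash
print(folder_a == folder_b)  # True
print(folder_b)              # C:\logs\
```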
Python Unicode String
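Unicode is a standard that assigns a code point to characters from virtually every writing system, so a Unicode string can hold text in any language. In Python 2, the default str type is a byte string (often holding ASCII text), and Unicode text is stored in a separate unicode type. In Python 3, str is itself a Unicode string, and byte strings are represented by the separate bytes type.
In Python 2, to create a Unicode string, put a u before the text like this - u'string' - or call the unicode() function like this - unicode('string'). Python 3 (3.3 and later) still accepts the u prefix for compatibility, but it has no effect, and the unicode() function no longer exists.
In Python 2, u'text' is a Unicode string while 'text' is a byte string. A Unicode object generally takes more memory than the equivalent ASCII byte string.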
For example,
test = u"一二三"
print(test)
Output:
一二三
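For readers on Python 3, the relationship between Unicode strings and byte strings can be sketched as below. This assumes Python 3.3 or later and UTF-8 as the chosen encoding; the variable names are illustrative only.

```python
text = "一二三"                # in Python 3, a plain literal is already a Unicode str
print(type(text).__name__)    # str
print(len(text))              # 3 (characters)

data = text.encode("utf-8")   # encode the text into a byte string
print(type(data).__name__)    # bytes
print(len(data))              # 9 (each of these characters takes 3 bytes in UTF-8)

print(data.decode("utf-8") == text)  # True: decoding restores the original text
print(u"一二三" == text)              # True: the u prefix changes nothing in Python 3
```

The character count and the byte count differ because len() counts code points for str and bytes for bytes.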