How to Convert String to Unicode in Python
This tutorial will discuss converting regular strings into Unicode strings in Python.
Convert Strings to Unicode in Python 2
In Python 2, regular strings are byte strings, and we can use the built-in unicode() function to decode a byte string into a Unicode string. The second argument to unicode() names the encoding of the bytes. The following code snippet converts a regular string into a Unicode string in Python 2.
regular = "regular string"
unicode_string = unicode(regular, "utf-8")
print(type(regular))
print(type(unicode_string))
Output:
<type 'str'>
<type 'unicode'>
We converted the regular byte string into a Unicode string with the unicode() function in Python 2.
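As a complementary sketch, the same conversion can be written with the decode() method of a byte string. The b prefix on the literal is an addition that is not in the snippet above; it is assumed here so that the code runs unchanged on both Python 2.6+ and Python 3.

```python
# The b prefix marks a byte string on both Python 2.6+ and Python 3.
regular = b"regular string"

# decode() interprets the bytes as UTF-8 and returns a Unicode string.
unicode_string = regular.decode("utf-8")

print(type(regular))
print(type(unicode_string))
```

On Python 2, the two types are str and unicode, matching the output above. On Python 3, they are bytes and str.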
Convert Strings to Unicode Format in Python 3
In Python 3, strings are Unicode strings by default, so there is nothing to convert: the unicode() function no longer exists, and the u prefix is accepted (in Python 3.3 and later) but has no effect. Hence, the following code gives different results on Python 2 and Python 3.
regular = "regular string"
unicode_string = u"Unicode string"
print(type(regular))
print(type(unicode_string))
Python 2 Output:
<type 'str'>
<type 'unicode'>
Python 3 Output:
<class 'str'>
<class 'str'>
In the code above, we initialize a regular string and a u-prefixed string in both Python 2 and Python 3. In Python 2, the u-prefixed string belongs to the class unicode because regular strings and Unicode strings are different types, whereas in Python 3 it belongs to the class str because all strings are Unicode strings.
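The closest Python 3 counterpart to the Python 2 conversion is decoding a bytes object into a str. Below is a minimal sketch of that idea. The helper name to_text and the sample text are hypothetical, chosen only for illustration, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode str, decoding it first if it is bytes."""
    # to_text is a hypothetical helper, not part of the standard library.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Encode a str containing a non-ASCII character to get sample bytes.
raw = "caf\u00e9".encode("utf-8")
text = to_text(raw)

print(type(raw))   # <class 'bytes'>
print(type(text))  # <class 'str'>
```

Values that are already str pass through unchanged, so the helper can be called on input whose type is not known in advance.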
Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with Python programming language. He loves solving complex problems and sharing his results on the internet.