How to Convert Bytes to a String in Python 2 and Python 3

Jinku Hu, October 10, 2023

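Convert Bytes to a String in Python 2.x
Convert Bytes to a String in Python 3.x
Performance Comparison and Conclusion of the Different Methods to Convert Bytes to a String in Python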

This tutorial shows how to convert bytes to a string in Python 2.x and Python 3.x.

Convert Bytes to a String in Python 2.x

In Python 2.7, bytes is the same type as str, so a bytes value is already a string.

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
'cd'
>>> type(A)
<type 'str'>

Convert Bytes to a String in Python 3.x

bytes is a new data type introduced in Python 3. Unlike in Python 2, it is distinct from str.

Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> A = b'cd'
>>> A
b'cd'
>>> type(A)
<class 'bytes'>
>>>

Each element of a bytes object is an int.

>>> A = b'cd'
>>> A[0]
99
>>> type(A[0])
<class 'int'>
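
Because indexing yields an int while slicing yields bytes, the two are easy to confuse. The following is a minimal sketch of the difference; the variable names are only illustrative.

```python
data = b"cd"

first_item = data[0]      # indexing returns an int (the byte value)
first_slice = data[0:1]   # slicing returns a bytes object of length 1
code_points = list(data)  # iterating over bytes yields ints

# bytes() accepts an iterable of ints in range(256), so the round trip is lossless.
rebuilt = bytes(code_points)

print(first_item, first_slice, code_points, rebuilt)
```

If a one-character string is needed from a single position, slicing first and decoding afterwards avoids dealing with the int value directly.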

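Convert Bytes to a String in Python 3.x With `decode`

The decode method of bytes converts bytes to a string using the given encoding. If encoding is not specified, it defaults to utf-8. Relying on that default is not always safe, because the bytes may have been produced with an encoding other than utf-8.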

>>> b'\x50\x51'.decode()
'PQ'
>>> b'\x50\x51'.decode('utf-8')
'PQ'
>>> b'\x50\x51'.decode(encoding = 'utf-8')
'PQ'

As shown above, all three calls return the same result because they all decode with utf-8.
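
The encoding passed to decode should match the one used to produce the bytes. The sketch below illustrates this with a sample word chosen for this example (it is not from the transcripts above), encoded once as utf-8 and once as latin-1.

```python
text = "caf\u00e9"  # sample word with one non-ASCII character (assumed for illustration)

utf8_bytes = text.encode("utf-8")      # the last character takes two bytes
latin1_bytes = text.encode("latin-1")  # the last character takes one byte

# Decoding with the matching encoding recovers the original text.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")

# str(bytes_obj, encoding) is an equivalent spelling of bytes_obj.decode(encoding).
via_str = str(utf8_bytes, "utf-8")

print(len(utf8_bytes), len(latin1_bytes), from_utf8 == from_latin1 == via_str)
```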

If the bytes are not valid utf-8 but you decode them as utf-8, Python raises a UnicodeDecodeError.

>>> b'\x50\x51\xffed'.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    b'\x50\x51\xffed'.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte
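
In a script, the exception can be caught and inspected rather than left to end the program. The sketch below reads the `object`, `start`, `end`, and `reason` attributes of UnicodeDecodeError to report which byte failed; the message format is an assumption made for this example.

```python
data = b"\x50\x51\xffed"

try:
    text = data.decode("utf-8")
    report = "decoded without errors"
except UnicodeDecodeError as exc:
    text = None
    # exc.object holds the input bytes; start and end delimit the failing span.
    bad_bytes = exc.object[exc.start:exc.end]
    report = "cannot decode {!r} at position {}: {}".format(
        bad_bytes, exc.start, exc.reason
    )

print(report)
```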

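There are two ways to deal with this kind of encoding problem: pass a different errors argument to decode, or decode with an encoding that accepts every byte value.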

`backslashreplace`, `ignore`, or `replace` as the `errors` Argument

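Besides encoding, decode takes a second argument, errors, which defines what happens when a decoding error occurs. The default value of errors is strict, which means an exception is raised if an error occurs during decoding.

errors also accepts other options such as ignore and replace, or any other name registered with codecs.register_error, such as backslashreplace.

ignore skips the bytes that cannot be decoded and builds the output string from the rest.

replace substitutes the Unicode replacement character (U+FFFD) for each undecodable byte. backslashreplace substitutes a backslash escape sequence that shows the value of the original byte.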

>>> b'\x50\x51\xffed'.decode('utf-8', 'backslashreplace')
'PQ\\xffed'
>>> b'\x50\x51\xffed'.decode('utf-8', 'ignore')
'PQed'
>>> b'\x50\x51\xffed'.decode('utf-8', 'replace')
'PQ�ed'
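
Since errors accepts any name registered with codecs.register_error, a custom handler can be supplied as well. The following is a sketch only: the handler name `hexmarker` and the function `hex_marker` are hypothetical and chosen for this example. A decoding handler receives the exception and returns the replacement string together with the position where decoding should resume.

```python
import codecs


def hex_marker(exc):
    """Hypothetical handler: show each undecodable byte as <hh>."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only handles decoding errors
    bad_bytes = exc.object[exc.start:exc.end]
    replacement = "".join("<{:02x}>".format(b) for b in bad_bytes)
    return replacement, exc.end  # resume right after the failing span


# "hexmarker" is an arbitrary name made up for this example.
codecs.register_error("hexmarker", hex_marker)

result = b"\x50\x51\xffed".decode("utf-8", "hexmarker")
print(result)
```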

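If the encoding of the bytes is unknown, you can decode them with the MS-DOS cp437 encoding. It assigns a character to every byte value, so decoding never fails, although the result is not necessarily the text that was originally intended.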

>>> b'\x50\x51\xffed'.decode('cp437')
'PQ\xa0ed'
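
The trade-off with cp437 can be sketched as follows: every byte value decodes and the bytes can be recovered by encoding again, but bytes that were really written in another encoding come out as unrelated characters. The sample character is an assumption made for this example.

```python
all_bytes = bytes(range(256))

# Every byte value has a character in cp437, so this does not raise.
decoded = all_bytes.decode("cp437")

# Encoding the result with cp437 restores the original bytes.
restored = decoded.encode("cp437")

# A successful decode does not mean the text is right: the two UTF-8 bytes
# of one accented letter are read as two separate cp437 characters.
utf8_bytes = "\u00e9".encode("utf-8")
mojibake = utf8_bytes.decode("cp437")

print(len(decoded), restored == all_bytes, len(mojibake))
```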

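Convert Bytes to a String in Python 3.x With `chr`

chr(i, /) returns a Unicode string of one character with ordinal i. It converts a single element of a bytes object to a string, not the whole object.

We can apply chr to each element with a list comprehension or map, then join the resulting characters to get the full converted string.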

>>> A =  b'\x50\x51\x52\x53'
>>> "".join([chr(_) for _ in A])
'PQRS'
>>> "".join(map(chr, A))
'PQRS'
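
One consequence worth noting: chr maps each byte value to the code point with the same number, which matches decoding with latin-1 rather than utf-8. The two approaches agree for ASCII bytes such as the ones above but differ for byte values of 0x80 and higher. A short sketch with an assumed sample input:

```python
data = bytes([0x50, 0x51, 0xE9])  # two ASCII bytes plus one byte above 0x7f

joined = "".join(map(chr, data))
as_latin1 = data.decode("latin-1")

# The same bytes are not valid UTF-8, so a utf-8 decode cannot give this result.
as_utf8 = data.decode("utf-8", "replace")

print(joined == as_latin1, joined == as_utf8)
```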

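Performance Comparison and Conclusion of the Different Methods to Convert Bytes to a String in Python

We use timeit to compare the performance of the two methods covered in this tutorial, decode and chr.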

>>> import timeit
>>> timeit.timeit('b"\x50\x51\x52\x53".decode()', number=1000000)
0.1356779
>>> timeit.timeit('"".join(map(chr, b"\x50\x51\x52\x53"))', number=1000000)
0.8295201999999975
>>> timeit.timeit('"".join([chr(_) for _ in b"\x50\x51\x52\x53"])', number=1000000)
0.9530071000000362

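The timings above show that decode() is much faster than chr(). The chr() approach is relatively inefficient because it has to rebuild the string from individual one-character strings.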

We recommend decode in applications where performance matters.
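
To repeat the comparison on your own machine, the measurement can be wrapped in a small script. This is a sketch under stated assumptions: the 4000-byte payload and the repetition count are arbitrary values picked for this example, and the timings will vary between machines and Python versions.

```python
import timeit

payload = b"\x50\x51\x52\x53" * 1000  # assumed test input, 4000 bytes

candidates = {
    "decode": lambda: payload.decode(),
    "map + chr": lambda: "".join(map(chr, payload)),
    "list comprehension + chr": lambda: "".join([chr(b) for b in payload]),
}

# All candidates must produce the same string for the comparison to be fair.
results = {name: func() for name, func in candidates.items()}
assert len(set(results.values())) == 1

for name, func in candidates.items():
    # timeit.timeit accepts a callable as the statement to time.
    seconds = timeit.timeit(func, number=200)
    print("{:<26} {:.4f} s".format(name, seconds))
```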


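Author: Jinku Hu

Founder of DelftStack.com. Jinku has worked in the robotics and automotive industries for over 8 years. He sharpened his programming skills through automated testing, remote testing, and creating reports from endurance tests. He has an electrical/electronics engineering background and has extended his interests to embedded electronics, embedded programming, and front-end and back-end programming.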
