UTF-8 Encoding (CHCP 65001) in PowerShell
- Unicode in PowerShell
- Change System Locale to Use UTF-8 Encoding in Windows PowerShell
- Set Encoding in $PSDefaultParameterValues Variable to Use UTF-8 Encoding in Windows PowerShell
- Use the chcp Command to Switch to UTF-8 Encoding in Windows PowerShell
- Benefits of Using UTF-8 Encoding in PowerShell
- Conclusion
UTF-8 encoding, which corresponds to code page 65001 (chcp 65001) in the Windows console, is essential for working with multilingual text and special characters in PowerShell. This article explains why UTF-8 matters and walks through three ways to enable it: changing the system locale, setting $PSDefaultParameterValues, and running chcp.
Unicode in PowerShell
Unicode is a worldwide character encoding standard. It defines how characters in text files, web pages, and other documents are represented.
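Computer systems use Unicode to manipulate characters and strings, and PowerShell strings are Unicode internally. Unicode was developed to support characters from all languages of the world.
The default encoding for files depends on the PowerShell version and the cmdlet. In Windows PowerShell 5.1, some cmdlets, such as Set-Content, default to the system's ANSI code page (Windows-1252 on US English systems), while others, such as Out-File, default to UTF-16LE. PowerShell 6 and later default to UTF-8 without a BOM.
UTF-8 and UTF-16 are the most common Unicode encodings. Windows PowerShell always writes a BOM for every Unicode encoding except UTF7.
The BOM (byte order mark) is a Unicode signature included in the first few bytes of a file or text stream that indicates the Unicode encoding.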
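To make the BOM concrete, the following sketch lists the signature bytes for three common Unicode encodings. It relies only on the .NET GetPreamble() method; the variable names $encodings and $bomTable are illustrative and not part of any PowerShell API.

```powershell
# GetPreamble() returns the BOM bytes an encoding would write at the start of a file.
$encodings = [ordered]@{
    'UTF-8'    = [System.Text.Encoding]::UTF8
    'UTF-16LE' = [System.Text.Encoding]::Unicode
    'UTF-16BE' = [System.Text.Encoding]::BigEndianUnicode
}

$bomTable = foreach ($name in $encodings.Keys) {
    $bytes = $encodings[$name].GetPreamble()
    [pscustomobject]@{
        Encoding = $name
        # Format each byte as two hex digits separated by spaces.
        Bom      = ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
    }
}

$bomTable
```

The UTF-8 signature is EF BB BF, UTF-16LE is FF FE, and UTF-16BE is FE FF.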
Understanding UTF-8 Encoding
UTF-8 is a character encoding standard that uses variable-width encoding to represent text: each Unicode code point takes between one and four bytes. It can encode every character in Unicode, and it is the most widely used character encoding on the internet.
In the context of PowerShell, UTF-8 encoding ensures that characters from different languages, symbols, and special characters are displayed and processed correctly in the console window.
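The variable-width behavior can be seen by counting the bytes UTF-8 needs for characters from different ranges. This is a small sketch; the sample characters and the names $samples and $byteCounts are chosen here for illustration.

```powershell
# Build the sample characters from code points so the script file itself stays ASCII.
$samples = [ordered]@{
    'U+0041 Latin A'         = [string][char]0x0041
    'U+00E9 e with acute'    = [string][char]0x00E9
    'U+D55C Hangul syllable' = [string][char]0xD55C
    'U+1F600 emoji'          = [char]::ConvertFromUtf32(0x1F600)
}

$byteCounts = foreach ($label in $samples.Keys) {
    [pscustomobject]@{
        Character = $label
        # Number of bytes UTF-8 uses to encode this one character.
        Utf8Bytes = [System.Text.Encoding]::UTF8.GetByteCount($samples[$label])
    }
}

$byteCounts
```

The four samples need 1, 2, 3, and 4 bytes respectively, which is why plain ASCII text has the same size in UTF-8 while Korean text or emoji take more space.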
Change System Locale to Use UTF-8 Encoding in Windows PowerShell
Windows has an option to change the system locale (the current language for non-Unicode programs) so that it uses UTF-8. This feature is still in beta.
Go to Region Settings from the Control Panel, or open intl.cpl from the Run dialog (Windows+R).
Open the Administrative tab and click Change system locale. Then check the option labeled Beta: Use Unicode UTF-8 for worldwide language support.
After that, press OK and restart the computer to apply the settings.
After restarting the computer, you can check the $OutputEncoding variable to view the current encoding.
$OutputEncoding
As the output shows, the current encoding is Unicode (UTF-8).
Output:
BodyName : utf-8
EncodingName : Unicode (UTF-8)
HeaderName : utf-8
WebName : utf-8
WindowsCodePage : 1200
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
IsSingleByte : False
EncoderFallback : System.Text.EncoderReplacementFallback
DecoderFallback : System.Text.DecoderReplacementFallback
IsReadOnly : True
CodePage : 65001
Now, you can view the characters of other languages in PowerShell.
Get-Content test.txt
Output:
만나서 반가워요
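Besides $OutputEncoding, a session has a few other encoding settings worth checking after the restart. The sketch below gathers three of them into one object; the name $encodingReport and its property names are made up for this example, and the values you see will depend on your PowerShell version and system settings.

```powershell
$encodingReport = [pscustomobject]@{
    # Encoding PowerShell uses when it pipes text to external programs.
    OutputEncoding = $OutputEncoding.WebName
    # Encoding the console uses for program output.
    ConsoleOutput  = [Console]::OutputEncoding.WebName
    # In Windows PowerShell this reflects the system's ANSI code page.
    SystemDefault  = [System.Text.Encoding]::Default.WebName
}

$encodingReport
```

If all three report utf-8, text should pass between files, the console, and external programs without conversion between encodings.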
Set Encoding in $PSDefaultParameterValues Variable to Use UTF-8 Encoding in Windows PowerShell
$PSDefaultParameterValues is a built-in preference variable in PowerShell that lets you set default values for cmdlet parameters, so you do not have to provide them explicitly every time you use the cmdlet.
You can run the following command to make UTF-8 the default for every cmdlet that has an -Encoding parameter.
$PSDefaultParameterValues = @{'*:Encoding' = 'utf8' }
This setting applies only to the current PowerShell session. It is reset to the default after you exit the PowerShell window.
Get-Content test.txt
Output:
만나서 반가워요
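Assigning a new hashtable to $PSDefaultParameterValues discards any defaults that were already stored in it. A hedged alternative is to add only the one entry, as sketched below; the temporary file name utf8-demo.txt is hypothetical, and the round trip is there only to show that the default takes effect for both writing and reading.

```powershell
# Add the default without discarding entries that are already in the table.
$PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Build a short Korean string from code points so the script file stays ASCII.
$text = -join ([char]0xB9CC, [char]0xB098, [char]0xC11C)

# Write and read a temporary file without passing -Encoding explicitly.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo.txt'
Set-Content -Path $path -Value $text
$roundTrip = Get-Content -Path $path

# True when the text survived the round trip unchanged.
$roundTrip -eq $text

Remove-Item -Path $path
```

To keep the setting across sessions, the first line can be placed in your PowerShell profile script (the file that $PROFILE points to).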
Several cmdlets in PowerShell have the -Encoding parameter to specify the encoding for different character sets. Some of them are Add-Content, Set-Content, Get-Content, Export-Csv, and Out-File.
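In PowerShell 6 and later, the -Encoding parameter supports values such as ascii, bigendianunicode, oem, unicode, utf7, utf8, utf8BOM, utf8NoBOM, and utf32. Windows PowerShell 5.1 does not recognize utf8BOM or utf8NoBOM; there, utf8 always writes a BOM.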
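As a minimal sketch of passing -Encoding explicitly, the example below writes the same text twice with different encodings and reads it back by naming the encoding rather than relying on a default. The file name encoding-demo.txt is hypothetical, and the exact file sizes depend on the PowerShell version and platform, so they are compared rather than stated.

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.txt'

# Write the text as UTF-8 and record the file size in bytes.
'hello' | Out-File -FilePath $path -Encoding utf8
$utf8Size = (Get-Item -Path $path).Length

# Overwrite the file as UTF-16LE ("unicode"), which uses two bytes per character here.
'hello' | Out-File -FilePath $path -Encoding unicode
$utf16Size = (Get-Item -Path $path).Length

# Read the file back, naming the encoding it was written with.
$content = Get-Content -Path $path -Encoding unicode

Remove-Item -Path $path

"UTF-8: $utf8Size bytes, UTF-16LE: $utf16Size bytes, content: $content"
```

For text that is mostly ASCII, the UTF-16LE file is larger, which is one reason UTF-8 is usually preferred for files that are exchanged with other tools.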
Use the chcp Command to Switch to UTF-8 Encoding in Windows PowerShell
To switch to UTF-8 encoding in PowerShell, use the chcp command followed by 65001:
chcp 65001
This command switches the console's active code page to UTF-8, which governs character input and output in the console window.
Here’s what this command does in detail:
- chcp: An external Windows utility (chcp.com) that you can run from the Command Prompt and from PowerShell. It stands for "Change Code Page". The code page determines how characters are encoded and displayed in the console window.
- 65001: The code page number for UTF-8 encoding. UTF-8 is a variable-width character encoding capable of encoding every code point in Unicode, and it is especially prevalent on the internet.
- Changing to UTF-8 (65001): When you run chcp 65001, you’re telling the console to use UTF-8 encoding for character input and output. This can be helpful when working with text data that includes characters from different languages and symbols. For instance, if you’re dealing with files or data that contain non-English characters, setting the code page to UTF-8 ensures that these characters are displayed correctly in the console.
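Because chcp is an external program, a PowerShell session that is already running may not always pick up the change. A commonly used alternative is to set the encodings from inside PowerShell, as in this sketch. It assumes the session is attached to a console whose encoding can be changed; the variable name $utf8 is illustrative.

```powershell
# Assumption: this runs in a session attached to a console.
$utf8 = [System.Text.UTF8Encoding]::new()

# Encoding PowerShell uses when it sends text to external programs.
$OutputEncoding = $utf8

# Encodings the console uses for keyboard input and for program output.
[Console]::InputEncoding  = $utf8
[Console]::OutputEncoding = $utf8

# Report the code page that is now active for console output.
[Console]::OutputEncoding.CodePage
```

Like chcp, these assignments last only for the current session.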
Resetting to Default Code Page
Remember that changing the code page might affect how some console applications behave, so it’s generally good practice to switch back to the original code page when you’re done using UTF-8. The original is usually 437 on US English systems, but it differs by region, so run chcp without arguments before switching to see the active value.
To reset the code page, for example to 437, you can use the command:
chcp 437
This switches the console back to code page 437, the default OEM code page on US English systems. If your system started with a different code page, use that number instead.
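Rather than hard-coding 437, a script can remember whatever encoding was active and restore it afterwards. The sketch below wraps that pattern in a helper; Invoke-WithUtf8Console is a hypothetical function name defined here, not a built-in cmdlet, and the example assumes the session is attached to a console.

```powershell
# Hypothetical helper: run a script block with UTF-8 console output, then restore.
function Invoke-WithUtf8Console {
    param([Parameter(Mandatory)][scriptblock]$ScriptBlock)

    # Remember the encoding that is active right now.
    $previous = [Console]::OutputEncoding
    try {
        [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        & $ScriptBlock
    }
    finally {
        # Put back whatever was active before, whether or not it was 437.
        [Console]::OutputEncoding = $previous
    }
}

$before = [Console]::OutputEncoding.CodePage
$during = Invoke-WithUtf8Console { [Console]::OutputEncoding.CodePage }
$after  = [Console]::OutputEncoding.CodePage

"before: $before, during: $during, after: $after"
```

The finally block runs even if the script block throws, so the console is not left in UTF-8 mode by accident.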
Benefits of Using UTF-8 Encoding in PowerShell
- Multilingual Support: UTF-8 allows PowerShell to handle text in multiple languages, ensuring correct display and processing of characters from different scripts.
- Special Characters: It’s crucial when dealing with special characters like emojis or mathematical symbols that legacy single-byte code pages cannot represent.
- File Handling: When working with text files that include characters from various languages, using UTF-8 ensures accurate file operations.
- Data Processing: If you’re working with data that contains non-English characters, setting the code page to UTF-8 ensures correct handling and processing.
Potential Considerations
- Console Applications: Changing the code page might affect how some console applications behave. Always reset to the default code page (chcp 437 for English) when done using UTF-8.
- Compatibility: Ensure that the programs or scripts you’re running in PowerShell can handle UTF-8 encoding. Older applications may not fully support it.
Practical Use Cases
- Reading Files: When reading text files with non-English characters, using UTF-8 ensures accurate representation.
- Web Scraping: If you’re extracting text from websites that may contain characters from various languages, UTF-8 ensures correct interpretation.
- Script Outputs: If your scripts generate outputs with non-English characters, using UTF-8 ensures they are displayed correctly.
- Interactive PowerShell Sessions: For interactive sessions where you need to input or output text with special characters, UTF-8 encoding is invaluable.
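The file-reading and web-scraping cases both come down to decoding bytes with the encoding they were written in. The following sketch shows what happens when UTF-8 bytes are decoded with a single-byte encoding instead; ISO-8859-1 is used here only as a stand-in for a legacy code page, and the variable names are illustrative.

```powershell
# The word "cafe" with an acute accent on the e, built from a code point
# so the script file itself stays ASCII.
$original = 'caf' + [char]0x00E9

# Encode as UTF-8: the accented letter becomes the two bytes C3 A9.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($original)

# Decoding with the matching encoding recovers the text.
$correct = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding the same bytes as ISO-8859-1 turns each byte into its own character.
$latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$garbled = $latin1.GetString($bytes)

$correct -eq $original   # True
$garbled.Length          # 5 characters instead of 4
```

The garbled string ends in two unrelated characters where the accented letter should be, which is the typical symptom of reading UTF-8 data with the wrong encoding.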
Conclusion
UTF-8 encoding (CHCP 65001) in PowerShell is a powerful tool for handling multilingual and special characters in the console. It allows for accurate representation and processing of text from various languages and scripts. Understanding when and how to use UTF-8 encoding ensures a seamless experience when working with diverse sets of characters.
Remember to consider the compatibility of programs or scripts with UTF-8 and always revert to the default code page when necessary. By harnessing the power of UTF-8 encoding, you’ll be equipped to handle a wide range of text data with confidence and accuracy in PowerShell.