How to Decode URL in Python
- Use the urllib.parse.unquote() Function to Decode a URL in Python
- Use the urllib.parse.unquote_plus() Function to Decode a URL in Python
- Use the requests Module to Decode a URL in Python
- Encode and Decode Unicode Encoded URL String Using UTF-8 in Python
- Use the unquote() and unescape() Functions to Decode URL in Python
- Conclusion
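URL encoding (percent-encoding) lets reserved and non-ASCII characters travel safely inside a URL when using APIs or transmitting data online. It is a transport format rather than a security measure, and there are times when we need to decode these encoded URLs back into plain text.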
In this article, we’ll explore various methods for URL decoding in Python, which can be especially helpful when working with web forms.
Use the urllib.parse.unquote() Function to Decode a URL in Python
The urllib.parse.unquote() function converts a percent-encoded string to plain text. It replaces %xx escape sequences with the characters they represent, decoding the resulting bytes as UTF-8 by default. Since Python 3.9, it accepts both str and bytes objects.
To use this function, import the urllib.parse module. The urllib package is part of the standard library and bundles several modules that make it easy to work with URLs in Python.
Example Code:
import urllib.parse
url = "delftstack.com/code=%20HOW%20TO%20Articles"
x = urllib.parse.unquote(url)
print(x)
First, we import the urllib.parse module, which provides utilities for working with URLs. Then, we define a variable url and assign it a URL string that contains some percent-encoded characters.
A percent-encoded character is written as '%' followed by two hexadecimal digits that give the value of one byte. For ASCII characters, that value is the character's ASCII code, so %20 stands for a space.
We use the urllib.parse.unquote() function to decode the url variable. This function takes a percent-encoded URL as input and replaces the encoded sequences with their actual character values.
The result of decoding the URL is stored in the variable x. Finally, we print the decoded URL, which shows the original string without percent-encoded characters.
Output:
delftstack.com/code= HOW TO Articles
In the output, the %20 sequences have been replaced with spaces. The other characters remain unchanged as they were not URL-encoded.
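As a small illustration of the encoding and errors parameters that urllib.parse.unquote() accepts, the sketch below decodes percent-encoded bytes under two encodings. The sample strings are made up for this example and are not taken from the URL above.

```python
import urllib.parse

# %C3%A9 is the UTF-8 encoding of "é"; UTF-8 is the default encoding.
utf8_text = urllib.parse.unquote("caf%C3%A9")

# %E9 is "é" in Latin-1, so the encoding has to be named explicitly.
latin1_text = urllib.parse.unquote("caf%E9", encoding="latin-1")

# Under the default UTF-8, the lone byte 0xE9 is invalid. Because errors
# defaults to "replace", it becomes the replacement character U+FFFD.
replaced_text = urllib.parse.unquote("caf%E9")

print(utf8_text)
print(latin1_text)
print(replaced_text == "caf\ufffd")
```

If the encoding of the source data is unknown, UTF-8 is usually the first thing to try, since it is what modern browsers and most APIs emit.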
Use the urllib.parse.unquote_plus() Function to Decode a URL in Python
Values submitted through HTML forms often contain + signs in place of spaces. Unlike urllib.parse.unquote(), which leaves + signs untouched, the urllib.parse.unquote_plus() function is designed to handle them.
In addition to decoding %xx sequences, it replaces + signs with spaces. However, this function only works with str objects.
Example Code:
import urllib.parse
url = "delftstack.com/code=HOW%20TO+Articles"
x = urllib.parse.unquote_plus(url)
print(x)
In the code, we import the urllib.parse module. Then, we define a variable url and assign it a URL string. In this URL, %20 represents a space, and the + sign also represents a space, as it does in form-encoded data. A literal plus sign would have to be written as %2B.
Next, we use the urllib.parse.unquote_plus() function to decode the url variable. This function takes a percent-encoded URL as input, replaces the encoded sequences with their actual character values, and also replaces each '+' character with a space.
The result of decoding the URL is stored in the variable x. Finally, we print the decoded URL.
Output:
delftstack.com/code=HOW TO Articles
Aside from replacing the %20 sequence with a space, the + sign in the original URL has also been replaced with a space.
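To make the difference between the two functions concrete, here is a minimal sketch that decodes the same string with both and then parses a whole query string with urllib.parse.parse_qs(). The sample value and the parameter names code and lang are invented for illustration.

```python
import urllib.parse

raw = "HOW+TO%2BArticles"

# unquote() keeps the literal "+" and decodes %2B to "+".
plain = urllib.parse.unquote(raw)

# unquote_plus() turns "+" into a space first, then decodes %2B to "+".
form = urllib.parse.unquote_plus(raw)

# parse_qs() splits a query string and decodes each value the same way
# unquote_plus() does.
params = urllib.parse.parse_qs("code=HOW+TO%2BArticles&lang=py")

print(plain)
print(form)
print(params)
```

When the input is a complete query string rather than a single value, parse_qs() is often the safer choice, because it splits on & and = before decoding, so an encoded %26 or %3D inside a value cannot be mistaken for a separator.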
Use the requests Module to Decode a URL in Python
requests is a popular third-party library for sending HTTP requests from Python. It is not part of the standard library and must be installed separately, for example with pip install requests. It can also be used for URL decoding, which is convenient when a project already depends on it.
The requests.utils.unquote() function behaves like urllib.parse.unquote(); in current releases, it is that same function re-exported. It decodes percent-encoded sequences and leaves + signs unchanged.
Example Code:
import requests
url = "delftstack.com/code=%20HOW%20TO%20Articles"
decoded_url = requests.utils.unquote(url)
print(decoded_url)
First, we import the requests library, which is used for making HTTP requests. Then, we define a URL string with some percent-encoded characters.
Next, we call the requests.utils.unquote() function to decode the URL. This function replaces percent-encoded sequences (e.g., '%20') with their actual values.
Lastly, the decoded URL is stored in the variable decoded_url and printed.
Output:
delftstack.com/code= HOW TO Articles
The output displays the decoded form of the string stored in url. The %20 encodings are replaced with spaces, making the URL more human-readable.
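The sketch below compares requests.utils.unquote() with urllib.parse.unquote() on a string that contains both %20 and a + sign. Because requests is a third-party package, the example falls back to the standard library function when it is not installed; that fallback is an assumption made only so the sketch runs anywhere.

```python
import urllib.parse

try:
    # requests is a third-party package and may not be installed.
    from requests.utils import unquote as requests_unquote
except ImportError:
    # Fallback for this sketch only, so it still runs without requests.
    requests_unquote = urllib.parse.unquote

url = "delftstack.com/code=HOW%20TO+Articles"

decoded_requests = requests_unquote(url)
decoded_stdlib = urllib.parse.unquote(url)

print(decoded_requests)
# Both functions decode %20 and leave the "+" sign in place.
print(decoded_requests == decoded_stdlib)
```

If decoding is the only thing needed, the standard library function avoids an extra dependency.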
Encode and Decode Unicode Encoded URL String Using UTF-8 in Python
The first example percent-encodes a string that contains a non-ASCII character, using UTF-8, and then decodes it back to the original text.
Decode Unicode Encoded Plain String in Python
Here, the input is a string that contains a non-ASCII character written as a Unicode escape. It is first encoded to UTF-8 bytes, and those bytes are then percent-encoded.
- Import the urllib.parse module. Note that importing parse along with urllib is necessary.
- Save the string inside the variable u and encode it. Syntax: urllib.parse.quote(variable_name.encode('utf8')). The result is saved inside a new variable, url, so that it can be used as input while decoding.
- Print the variable url to view the encoded result.
The steps below demonstrate taking the encoded string and decoding it using unquote.
- A variable f is initialized to store the decoded result.
- The call urllib.parse.unquote(url) decodes the string stored inside the variable url, and the result is saved into the variable f.
- The variable f is printed to view the decoded string URL.
Example Code:
import urllib.parse
u = "Tan\u0131m"
url = urllib.parse.quote(u.encode("utf8"))
print(url)
f = urllib.parse.unquote(url)
print(f)
Output:
Tan%C4%B1m
Tanım
The first line prints the URL-encoded version of "Tanım", which is "Tan%C4%B1m". The second line prints the decoded version of the URL-encoded string, which returns the original string "Tanım" with the non-ASCII character correctly represented.
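The explicit .encode("utf8") step is optional: urllib.parse.quote() also accepts a str and encodes it as UTF-8 by default. The following minimal sketch shows that both routes give the same result. The variable names are chosen for this illustration only.

```python
import urllib.parse

text = "Tan\u0131m"

# quote() encodes a str as UTF-8 by default before percent-encoding it.
encoded_from_str = urllib.parse.quote(text)

# Passing pre-encoded UTF-8 bytes produces the same percent-encoded string.
encoded_from_bytes = urllib.parse.quote(text.encode("utf8"))

# unquote() decodes the bytes as UTF-8 by default, restoring the text.
decoded = urllib.parse.unquote(encoded_from_str)

print(encoded_from_str)
print(encoded_from_str == encoded_from_bytes)
print(decoded == text)
```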
Decode Unicode Encoded URL String in Python
In some scenarios, URLs are encoded using the non-standard %uXXXX format, in which a character is written as %u followed by four hexadecimal digits. Decoding such URLs is harder, because urllib.parse has no function that understands this format.
A user might have to write their own decoder for these URLs. As an experiment, we can apply the above method to a %u-encoded URL.
When the above method is applied, the string is first encoded to UTF-8 bytes and percent-escaped by quote(), and unquote() then reverses that step. As the output below shows, this round trip returns the original string and leaves the %u sequences undecoded.
Example Code:
import urllib.parse
u = (
"%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05"
"D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4"
)
url = urllib.parse.quote(u.encode("utf8"))
f = urllib.parse.unquote(url)
print(f)
In the above example, we import the urllib.parse module, which provides functions for working with URLs. Then, we define a %u-encoded string and store it in the variable u.
Next, we encode the string to UTF-8 and pass it to urllib.parse.quote(), which percent-encodes the special characters, including each % sign.
We then use urllib.parse.unquote() to decode the result. Lastly, we print the decoded string.
Output:
%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4
In the output, urllib.parse.unquote(url) returns exactly the string that was passed to quote(). quote() escapes each % as %25, and unquote() turns %25 back into %. Because unquote() only recognizes % followed by two hexadecimal digits, the non-standard %u sequences are left as they are, and the output retains the same encoded format.
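One way to actually decode %uXXXX sequences is a small hand-written helper. The sketch below is an assumption-laden illustration, not a standard library feature: the function name decode_percent_u is hypothetical, and it handles only four-digit sequences in the Basic Multilingual Plane, not surrogate pairs.

```python
import re
import urllib.parse

# Matches the non-standard %uXXXX form: "%u" followed by four hex digits.
PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text):
    """Decode %uXXXX sequences, then ordinary %xx sequences (hypothetical helper)."""
    # Replace each %uXXXX with the character that has that code point.
    replaced = PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Let unquote() handle the remaining standard escapes such as %20.
    return urllib.parse.unquote(replaced)


sample = "%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD"
decoded_sample = decode_percent_u(sample)
print(decoded_sample)
```

The %u substitution runs before unquote() so that the standard function only ever sees sequences it understands.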
Use the unquote() and unescape() Functions to Decode URL in Python
The following code demonstrates how to decode a URL using Python's libraries, specifically urllib and html. We'll use the unquote() function from the urllib.parse module to decode the URL and the unescape() function from the html module to handle any HTML escaping. unquote() can also be imported from urllib.request, but only because that module happens to import it internally, so urllib.parse is the documented place to get it.
Example Code:
from urllib.parse import unquote
from html import unescape
f = (
"https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&"
"confirmationToken=7uAf%2fxJoxRTFAZdxslCn2uwVR9vV7cYrlHs%2fl9sU%2frix9f9C"
"nVx8uUT%2bu8y1%2fWCs99INKDnfA2ayhGP1ZD0z%2bodXjK9xL5I4gjKR2xp7p8Sckvb04mddf"
"%2fiG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7%2fgxqIwktteSI9RA3Ux9VIiNxx%2fZLe9dZSHxRq9AA"
)
print(unescape(unquote(f)))
We import the unquote function, which is defined in urllib.parse, to decode URL-encoded characters, and the unescape function from the html module to decode HTML-encoded entities. Then, we define a URL string and store it in the variable f.
In the line print(unescape(unquote(f))), the unquote() function first decodes the URL-encoded characters in the string f. The unescape() function then decodes any HTML-encoded entities in the result.
Lastly, the decoded URL is printed.
Output:
https://v.w.xy/p1/p22?userId=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx&confirmationToken=7uAf/xJoxRTFAZdxslCn2uwVR9vV7cYrlHs/l9sU/rix9f9CnVx8uUT+u8y1/WCs99INKDnfA2ayhGP1ZD0z+odXjK9xL5I4gjKR2xp7p8Sckvb04mddf/iG75QYiRevgqdMnvd9N5VZp2ksBc83lDg7/gxqIwktteSI9RA3Ux9VIiNxx/ZLe9dZSHxRq9AA
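In the output, all the URL-encoded characters have been converted to their original characters: each %2f became / and each %2b became +. This particular URL contains no HTML entities, so unescape() leaves it unchanged; it would make a difference for a URL copied from HTML source, where & is written as &amp;.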
This code is useful when working with URLs that may contain both URL-encoded and HTML-escaped elements, ensuring a clean and usable URL for further processing.
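As a hedged illustration of a case where unescape() does change something, the sketch below starts from a link as it might appear in HTML source, with & written as &amp;. The URL, the parameter names, and the token value are invented for this example. It unescapes the HTML layer first and then uses urlsplit() and parse_qs() to decode the individual query values.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# A link as it might appear inside an HTML attribute (made-up example).
href = "https://example.com/confirm?userId=42&amp;token=ab%2Fcd%2Bef"

# Step 1: undo the HTML escaping, turning "&amp;" back into "&".
url = unescape(href)

# Step 2: split off the query string and decode each parameter value.
params = parse_qs(urlsplit(url).query)

print(url)
print(params)
```

Decoding the query values after splitting keeps an encoded / or & inside a value from being confused with the URL's own structure.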
Conclusion
Decoding URLs is a crucial skill in web development and data processing. Python offers various methods, each tailored to different scenarios.
Remember to select the method that best suits your unique needs. Whether you’re navigating HTTP requests, managing form data, or handling Unicode-encoded URLs, Python’s flexibility ensures you can decode URLs effectively for your projects.
Vaibhhav is an IT professional who has a strong-hold in Python programming and various projects under his belt. He has an eagerness to discover new things and is a quick learner.