How to Extract Domain From URL in Python
This article uses practical examples to explain how Python’s urlparse() function parses a URL and how to extract the domain name from the result. We’ll also look at the other components a URL is split into and how to use them.
Use urlparse() to Extract Domain From the URL
The urlparse() function is part of the parse submodule of Python’s urllib package. It is useful when you need to split a URL into its components and use them for various purposes. Let us look at an example:
from urllib.parse import urlparse
component = urlparse("http://www.google.com/doodles/mothers-day-2021-april-07")
print(component)
In this code snippet, we first import the urlparse function from the urllib.parse module. Then we pass a URL to urlparse. The return value is a ParseResult object, a named tuple with the six elements listed below:
scheme - Specifies the protocol used to access the online resource, for instance, HTTP/HTTPS.
netloc - net means network and loc means location, so this is the URL’s network location.
path - The specific path a web browser uses to access the provided resource.
params - The parameters of the last path element.
query - The query string that follows the path component and carries a stream of data the resource can use.
fragment - Identifies a part of the resource.
When we display this object using the print function, it prints the value of each component. The output of the code above will be as follows:
ParseResult(scheme='http', netloc='www.google.com', path='/doodles/mothers-day-2021-april-07', params='', query='', fragment='')
You can see from the output that all the URL components are separated and stored as individual elements in the object. We can get the value of any component by using its name like this:
from urllib.parse import urlparse
domain_name = urlparse("http://www.google.com/doodles/mothers-day-2021-april-07").netloc
print(domain_name)
Using the netloc component, we can get the domain name of the URL as follows:
www.google.com
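Note that netloc is the whole network location, so it also contains credentials and a port number when the URL has them. In addition, urlparse() only fills netloc when the network location is introduced by //, so a URL written without a scheme ends up entirely in path. The following is a minimal sketch of one way to handle both cases. The helper name extract_host and the example.com URL are made up for illustration, and the fallback assumes the input is a web address whose scheme was simply left out.

```python
from urllib.parse import urlparse


def extract_host(url):
    """Return the lowercased host name of a URL, or None if none is found.

    extract_host is a hypothetical helper, not part of urllib.
    """
    parts = urlparse(url)
    if not parts.netloc:
        # Without a leading "//", urlparse() puts everything into path.
        # Assumption: the input is a web address whose scheme was omitted.
        parts = urlparse("//" + url)
    # hostname drops any "user:password@" prefix and ":port" suffix.
    return parts.hostname


url = "https://user:secret@www.example.com:8443/path"
print(urlparse(url).netloc)                    # user:secret@www.example.com:8443
print(extract_host(url))                       # www.example.com
print(extract_host("www.google.com/doodles"))  # www.google.com
```

The hostname attribute is usually the better choice when only the bare domain is needed, while netloc is appropriate when the port or credentials matter.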
This way, we can get our URL parsed and use its different components for various purposes in our programming.
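As a further illustration of using the different components, the sketch below parses a URL in which all six are non-empty. The example.com URL is hypothetical and chosen only for that reason. Because the result is a named tuple, each element can be read by name, by index, or by unpacking, and the query string can be split further with parse_qs() from the same module.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL chosen so that every component is non-empty.
url = "https://www.example.com/docs/page;version=2?lang=en&sort=asc#install"
parts = urlparse(url)

# Access by attribute name.
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /docs/page
print(parts.params)    # version=2
print(parts.query)     # lang=en&sort=asc
print(parts.fragment)  # install

# Access by index: netloc is the second element of the tuple.
print(parts[1])        # www.example.com

# Unpack all six elements at once.
scheme, netloc, path, params, query, fragment = parts

# Turn the query string into a dictionary of lists.
print(parse_qs(parts.query))  # {'lang': ['en'], 'sort': ['asc']}
```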