How to Extract Domain From URL in Python
- Method 1: Using urllib.parse
- Method 2: Using Regular Expressions
- Method 3: Using tldextract
- Conclusion
- FAQ

Extracting the domain from a URL is a common task in web development and data analysis. Whether you’re building a web scraper, analyzing user data, or simply working with URLs, knowing how to extract the domain efficiently can save you time and effort.
In this tutorial, we will explore different methods to parse and extract the domain from a URL using Python. We’ll cover built-in libraries, regular expressions, and third-party packages, giving you a well-rounded understanding of how to tackle this task. By the end of this guide, you’ll be equipped with the knowledge to extract domains from URLs effortlessly. Let’s dive in!
Method 1: Using urllib.parse
One of the simplest and most effective ways to extract the domain from a URL in Python is by using the built-in urllib.parse module. This module provides a straightforward way to break down a URL into its components, including the scheme, netloc, path, and more.
Here’s how you can use urllib.parse to extract the domain:
from urllib.parse import urlparse
url = 'https://www.example.com/path/to/resource'
parsed_url = urlparse(url)
domain = parsed_url.netloc
print(domain)
Output:
www.example.com
In this example, we first import the urlparse function from the urllib.parse module. We then define a sample URL and pass it to urlparse, which breaks the URL into its components. The netloc attribute of the parsed URL object contains the domain (including any "www." prefix and port number), which we then print. This method is reliable for any well-formed URL that includes a scheme, making it a go-to choice for many developers.
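Beyond netloc, the parsed result exposes the other URL components as attributes. The sketch below (using a made-up example URL) also shows one caveat worth knowing: if the input string has no scheme, urlparse treats the whole thing as a path and netloc comes back empty.

```python
from urllib.parse import urlparse

# urlparse exposes every URL component as an attribute
parsed = urlparse('https://www.example.com:8080/path/to/resource?q=1#top')
print(parsed.scheme)    # https
print(parsed.netloc)    # www.example.com:8080
print(parsed.hostname)  # www.example.com (port stripped, lowercased)
print(parsed.path)      # /path/to/resource

# Caveat: without a scheme, everything lands in .path and .netloc is empty
print(repr(urlparse('www.example.com/path').netloc))  # ''
```

If your inputs may lack a scheme, prepending "https://" before parsing (or using the hostname attribute, which also drops the port) is a common workaround.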
Method 2: Using Regular Expressions
Regular expressions (regex) offer a powerful way to extract specific patterns from strings, including domains from URLs. While this method may seem complex at first, it can be very effective for custom URL formats or when you need more control over the extraction process.
Here’s a simple example of how to use regex to extract the domain:
import re
url = 'https://www.example.com/path/to/resource'
pattern = r'^(?:http[s]?://)?(?:www\.)?([^/]+)'
match = re.match(pattern, url)
if match:
    domain = match.group(1)
    print(domain)
Output:
example.com
In this code snippet, we first import the re module for regular expressions. We define a URL and a regex pattern that matches the domain part of the URL. The pattern accounts for an optional “http://” or “https://” prefix and an optional “www.” subdomain. The re.match function checks the URL against the pattern, and if a match is found, we extract the domain with match.group(1). This method is versatile, allowing you to customize the pattern to fit specific needs, though note that it only strips a leading “www.” and has no knowledge of multi-part suffixes like “.co.uk”.
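To see how the pattern behaves across URL shapes, here is a short sketch running it against a few invented example URLs, including one without a scheme and one with a non-www subdomain:

```python
import re

# Optional scheme, optional leading "www.", then everything up to the first "/"
pattern = r'^(?:http[s]??://)?(?:www\.)?([^/]+)'

urls = [
    'https://www.example.com/path',   # scheme + www
    'http://example.com',             # scheme, no www
    'example.org/page',               # no scheme at all
    'https://blog.example.net/post',  # non-www subdomain is kept
]
for url in urls:
    match = re.match(pattern, url)
    if match:
        print(match.group(1))
# Prints: example.com, example.com, example.org, blog.example.net
```

Unlike urllib.parse, this handles scheme-less inputs, but as the last case shows, any subdomain other than “www” stays in the result.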
Method 3: Using tldextract
For more advanced domain extraction, especially when dealing with country code top-level domains (ccTLDs) or complex domain structures, the tldextract library is an excellent choice. This third-party package uses the Public Suffix List to accurately separate the subdomain, domain, and suffix, making it ideal for comprehensive URL analysis.
To use tldextract, you first need to install it:
pip install tldextract
Once installed, you can use it as follows:
import tldextract
url = 'https://subdomain.example.co.uk/path/to/resource'
extracted = tldextract.extract(url)
domain = f"{extracted.domain}.{extracted.suffix}"
print(domain)
Output:
example.co.uk
In this example, we import tldextract and define a URL that includes a subdomain and a ccTLD. The extract function breaks the URL into its components, allowing us to easily construct the full domain by combining the domain and suffix. This method is particularly useful for developers who need to handle varied domain structures and ensure accurate extraction.
Conclusion
Extracting the domain from a URL in Python can be accomplished in several ways, each with its advantages. Whether you prefer the built-in urllib.parse module, the power of regular expressions, or a third-party package like tldextract, understanding these methods will enhance your programming toolkit. By mastering these techniques, you can handle URLs efficiently in your projects, making your code cleaner and more effective. Happy coding!
FAQ
- How do I install tldextract?
You can install tldextract using pip with the command pip install tldextract.
- Can I extract domains from URLs with different structures?
Yes, methods like regular expressions or tldextract let you handle various URL formats effectively.
- Is urllib.parse sufficient for all URL parsing tasks?
While urllib.parse is powerful for basic URL parsing, tldextract is better for complex domains and ccTLDs.
- What are the advantages of using regex for domain extraction?
Regex provides flexibility and allows for custom patterns, making it suitable for specialized URL formats.
- Can I extract subdomains along with the main domain?
Yes, both urllib.parse and tldextract can be used to extract subdomains if needed.
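As a quick illustration of the last point, a subdomain can be pulled out with urllib.parse alone by splitting the hostname on dots; this naive sketch assumes a single-label suffix like “.com” and would misfire on ccTLDs such as “.co.uk”, where tldextract is the safer choice:

```python
from urllib.parse import urlparse

# Naive subdomain split: everything before the last two dot-separated labels.
# Assumes a one-part TLD; use tldextract for suffixes like "co.uk".
hostname = urlparse('https://blog.example.com/post').hostname
parts = hostname.split('.')
subdomain = '.'.join(parts[:-2])
print(subdomain)  # blog
```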