How to Get Data From a URL in Python
A URL, or Uniform Resource Locator, is a web address that points to a resource on the internet. This resource can be a plain text file, a zip file, an exe file, a video, an image, or a webpage.
In the case of a webpage, the HTML (Hypertext Markup Language) content is fetched. This article shows how to get this HTML data from a URL using Python.
Get Data From a URL Using the requests Module in Python
The third-party requests module (installable with pip install requests) makes it easy to send HTTP (Hypertext Transfer Protocol) requests. This module can be used to fetch HTML content or any other content from a valid URL.
The requests module has a get() function that we can use to fetch data from a URL. It accepts a url as an argument and returns a requests.Response object.
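To make the returned object more concrete, here is a minimal sketch that reads a few commonly used attributes of a requests.Response: status_code, ok, headers, and text. The helper names describe_response and fetch_and_describe are hypothetical and are not part of the requests module.

```python
import requests


def describe_response(response):
    """Summarize a few attributes of a requests.Response object."""
    return {
        "status": response.status_code,  # numeric HTTP status, e.g. 200
        "ok": response.ok,  # True when the status code is below 400
        "content_type": response.headers.get("Content-Type"),
        "length": len(response.text),  # size of the decoded body in characters
    }


def fetch_and_describe(url, timeout=10):
    """Hypothetical helper: fetch `url` with GET and summarize the response."""
    return describe_response(requests.get(url, timeout=timeout))
```

For non-text resources such as images or zip files, the raw bytes are available through response.content rather than response.text.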
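This requests.Response object contains details about the server’s response to the sent HTTP request. If the request cannot be completed, get() raises an exception derived from requests.exceptions.RequestException: for example, MissingSchema when the URL lacks http:// or https://, and ConnectionError when the host cannot be reached.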
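Different kinds of faulty URLs fail in different ways, and the exception classes in requests.exceptions let us tell them apart. The following is a minimal sketch; classify_fetch_error is a hypothetical helper, and the labels it returns are made up for illustration.

```python
import requests


def classify_fetch_error(url, timeout=5):
    """Return a short label describing how a GET request to `url` fails, or "ok"."""
    try:
        requests.get(url, timeout=timeout)
    except requests.exceptions.MissingSchema:
        # The URL has no scheme such as http:// or https://
        return "missing scheme"
    except requests.exceptions.InvalidURL:
        # The URL is malformed in some other way
        return "invalid URL"
    except requests.exceptions.Timeout:
        # The server did not respond within `timeout` seconds
        return "timed out"
    except requests.exceptions.ConnectionError:
        # DNS lookup failed, connection refused, and similar network problems
        return "connection failed"
    except requests.exceptions.RequestException:
        # Base class: any other error raised by requests
        return "other request error"
    return "ok"
```

The more specific classes are listed before RequestException because Python runs the first except clause that matches.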
If you are unsure about the URL’s validity, it is highly recommended to enclose the get() call inside a try and except block. This is shown in the upcoming example.
To learn more about the requests.Response object, refer to the official requests documentation.
Now, let us see how to use this function to fetch HTML content or any other data from a valid URL. Refer to the following code.
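import requests

try:
    url = "https://www.lipsum.com/feed/html"
    r = requests.get(url)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    # Base class for errors raised by requests, e.g. a malformed URL or a failed connection
    print(
        "Invalid URL or some error occurred while making the GET request to the specified URL"
    )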
Output:
HTML:
...
Note that ... represents the HTML content fetched from the URL. It is omitted from the output above because it is too long.
If the URL is faulty, the above code will run the code inside the except block. The following code shows how it works.
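import requests

try:
    url = "https://www.thisisafaultyurl.com/faulty/url/"
    r = requests.get(url)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    # Base class for errors raised by requests, e.g. a malformed URL or a failed connection
    print(
        "Invalid URL or some error occurred while making the GET request to the specified URL"
    )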
Output:
Invalid URL or some error occurred while making the GET request to the specified URL
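One caveat: get() does not raise an exception when the server is reachable but answers with an error status such as 404 or 500. Calling raise_for_status() on the response turns those statuses into an HTTPError, which the same except block can then handle. The following is a minimal sketch; fetch_html is a hypothetical helper, and the 10-second timeout is an arbitrary choice.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
        # Raise requests.exceptions.HTTPError for 4xx and 5xx status codes
        r.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return r.text
```

Returning None on failure lets the caller check the result with a simple if statement instead of repeating the try and except block.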
Some endpoints do not serve their content in response to GET requests and expect a POST request instead, for example when data must be submitted along with the request. In such cases, we can use the post() function from the requests module.
As the name suggests, this function sends POST requests to a valid URL. Its two main arguments are url and data.
The url is the target URL, and data accepts a dictionary of key-value pairs that is sent, form-encoded, as the request body. These pairs could include an API (Application Programming Interface) key, a CSRF (Cross-Site Request Forgery) token, etc. HTTP header details are passed separately through the headers argument.
The Python code for such a case would be as follows.
import requests

try:
    url = "https://www.thisisaurl.com/that/accepts/post/requests/"
    payload = {
        "api-key": "my-api-key",
        # more key-value pairs
    }
    r = requests.post(url, data=payload)
    print("HTML:\n", r.text)
except requests.exceptions.RequestException:
    print(
        "Invalid URL or some error occurred while making the POST request to the specified URL"
    )
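In requests, the data argument becomes the request body, while HTTP headers are supplied through the separate headers argument. The following minimal sketch builds a POST request without sending it, so we can inspect where each value ends up. The endpoint, the query field, and the X-Api-Key header name are hypothetical and used only for illustration.

```python
import requests

# Hypothetical endpoint, form field, and header name, used only for illustration
url = "https://api.example.com/search"
form_fields = {"query": "lorem"}
extra_headers = {"X-Api-Key": "my-api-key"}

# Build the request without sending it, so its parts can be inspected offline
prepared = requests.Request(
    "POST", url, data=form_fields, headers=extra_headers
).prepare()

print(prepared.body)  # the form-encoded body built from `data`
print(prepared.headers["Content-Type"])  # set automatically for form data
print(prepared.headers["X-Api-Key"])  # the custom header from `headers`

# Sending it would look like this (needs a real endpoint):
# r = requests.post(url, data=form_fields, headers=extra_headers, timeout=10)
```

Whether a key or token belongs in the body or in a header depends on the service being called, so check its documentation before choosing.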