How to Create a Custom Search Engine Using the Google API in Python
- Create a Search Engine Using Google CSE Platform
- Implement the Custom Search API in Python
- Conclusion
This article explains how to create a Custom Search Engine (CSE) and query it with the Google Custom Search API in Python. A CSE is a search engine that developers can embed in their own applications, such as websites and mobile apps.
Many applications use a Google Custom Search Engine instead of scraping Google results. This article explains how to set up a CSE and query it through its API in Python.
Scraping Google Search directly is discouraged, because Google starts blocking automated queries after only a few requests.
Create a Search Engine Using Google CSE Platform
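Getting search results through the Google Search API in Python is a three-step process: create a custom search engine, get an API key, and call the API from a script. Unlike web scraping, which pulls results directly from Google Search, this method sends queries to the custom search engine and fetches the results from it.
This returns results comparable to scraping through an official interface, without being blocked after a few requests. The API does enforce its own usage quota: at the time of writing, 100 queries per day are free, and more require billing.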
To create a search engine, open the Programmable Search Engine page. Give the search engine a name and add a sample URL in the What to search? field.
This sample URL can be changed later, which is exactly what we will do.
Confirm the reCAPTCHA and click Create to create the custom search engine. By default it searches only the sample site, so it needs to be adjusted to search the entire web.
On the next page, click Customize.
Under Basic, you will find essential details such as the search engine ID, which is needed to send search requests. Copy the search engine ID and keep it somewhere safe.
Scroll down to Search Features and turn on the Search the entire web option.
In the Sites to search section, select the checkbox next to the sample URL and delete it. The search engine now searches the entire web.
Once the custom search engine is created, it can be queried with the Google Search API in Python.
First, we need an API key for the search engine.
Get a Google API Key
Google APIs are part of Google Cloud and let third-party applications use Google services. To get a Custom Search API key, you first need a Google Cloud project; the key is then used in the Python script to authenticate requests.
There are two ways to fetch an API key for the custom search engine:
- Create a project in Google Cloud and get a Google Custom Search API key.
- Get a JSON API key.
Both steps require a Google Cloud project.
Create a Project in Google Cloud and Get a Google Custom Search API Key
Head over to the Credentials page in the Google Cloud console. Then, click New Project.
Give the project a name, leave the Organization field as it is, and click Create.
After creating the project, create an API key for it. In the left-hand panel, select Credentials, and then click the Create Credentials button at the top.
Under Create Credentials, select API key.
This creates an API key for the project. Click Show key to copy it.
A key created this way may not work straight away, because the Custom Search API is not yet enabled for the new project. The first request made with it from the Python script fails with an HTTP 403 error whose message includes a link for enabling the API.
After enabling the Custom Search API (through that link, or under APIs & Services > Library in the Google Cloud console), the key can be used with the custom search engine.
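To make that first failure easier to act on, a script can pull the enable link out of the error response. The sketch below assumes the body follows Google's usual JSON error layout, {"error": {"code": 403, "message": "..."}}, and that the message embeds the link as a plain https:// URL; the sample message is hand-written for illustration, not a captured response, and find_enable_link is a hypothetical helper name.

```python
import re


def find_enable_link(error_body):
    """Return the 'enable this API' URL from a parsed 403 error body, or None.

    Assumes Google's JSON error layout: {"error": {"code": ..., "message": ...}}.
    """
    error = error_body.get("error", {})
    if error.get("code") != 403:
        return None
    # Take the first https:// URL in the human-readable message (stops at whitespace)
    match = re.search(r"https://\S+", error.get("message", ""))
    return match.group(0) if match else None


# Hand-written sample shaped like a "service disabled" error (illustrative only)
sample = {
    "error": {
        "code": 403,
        "message": (
            "Custom Search API has not been used in project 123 before or it is disabled. "
            "Enable it by visiting https://console.developers.google.com/apis/api/"
            "customsearch.googleapis.com/overview?project=123 then retry."
        ),
    }
}
print(find_enable_link(sample))
```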
Get a JSON API Key
This method is simpler because the key does not need to be activated separately. If a Google Cloud project already exists, the API key can be fetched directly.
Go to the guide page of the programmable search engine website.
Click the Get a key button to open a pop-up that asks you to choose a project.
Select the project and click Next to create an API key for it.
Click Show key to get the API key.
This JSON API key can be used right away, whereas an API key created manually through the Credentials tab in Google Cloud only works after the Custom Search API has been enabled for the project.
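Whichever method produced the key, it is safer not to paste it into the script itself. A minimal sketch, assuming the key and the search engine ID are exported as environment variables named GOOGLE_API_KEY and GOOGLE_CSE_ID (hypothetical names chosen for this example):

```python
import os


def load_credentials():
    """Read the API key and search engine ID from environment variables.

    GOOGLE_API_KEY and GOOGLE_CSE_ID are hypothetical names used in this sketch.
    """
    api_key = os.environ.get("GOOGLE_API_KEY")
    cse_id = os.environ.get("GOOGLE_CSE_ID")
    # Fail early with a clear message instead of sending a request that will be rejected
    if not api_key or not cse_id:
        raise RuntimeError("Set GOOGLE_API_KEY and GOOGLE_CSE_ID before running the script")
    return api_key, cse_id
```

The returned pair can then replace the hard-coded strings used in the examples that follow.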
Implement the Custom Search API in Python
Once the CSE ID and the API key are ready, the Custom Search API can be called from Python scripts. The two programs below show how.
Example 1:
To call the Google Search API from Python, we can use the Google API Python Client library, which builds and sends the API requests for us.
To install it, open a terminal (Command Prompt on Windows) or the terminal of any Python IDE and run:
pip install google-api-python-client
This will install the Python package into the system.
Next, create a Python script that sends search queries to the custom search engine and returns the results.
Code: custom_search_engine.py
from googleapiclient.discovery import build

my_api_key = "The API_KEY you acquired"
my_cse_id = "The search-engine-ID you created"

def google_search(search_term, api_key, cse_id, **kwargs):
    # Build a client for version 1 of the Custom Search JSON API
    service = build("customsearch", "v1", developerKey=api_key)
    # Send the query; extra keyword arguments (e.g. num) become request parameters
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # "items" is missing from the response when the query has no results
    return res.get("items", [])

results = google_search('"How to code in Python"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)
Let’s break down the code. The first line imports the build function from the googleapiclient.discovery module of the Google API Python Client package.
Two variables, my_api_key and my_cse_id, store the API key and the custom search engine ID, respectively.
The function google_search takes four parameters: search_term, which holds the search query; api_key, for the API key; cse_id, for the custom search engine’s ID; and **kwargs, which collects any additional keyword arguments so they can be passed on as request parameters.
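The **kwargs forwarding can be seen in isolation with a toy function that makes no network call. In this sketch, build_request_params is a hypothetical name; it merges the fixed parameters with whatever extra keyword arguments the caller supplies, which is the same pattern google_search uses when it hands **kwargs to list().

```python
def build_request_params(search_term, cse_id, **kwargs):
    # Fixed parameters first; any extra keyword arguments are merged in unchanged
    return {"q": search_term, "cx": cse_id, **kwargs}


params = build_request_params("python tutorials", "my-cse-id", num=5, start=11)
print(params)
# {'q': 'python tutorials', 'cx': 'my-cse-id', 'num': 5, 'start': 11}
```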
The line below uses the build function to create a client for version 1 of the customsearch API, authenticated with the API key, and stores it in the variable service.
service = build("customsearch", "v1", developerKey=api_key)
The next line calls service.cse().list() to build a search request for the custom search engine, sends it with execute(), and stores the response in the variable res.
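The call list(q=search_term, cx=cse_id, **kwargs) builds a request for the search term on the engine identified by cse_id, while **kwargs passes on any extra parameters, such as num, which limits the number of results returned. execute() then sends the request and returns the response as a Python dictionary.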
res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
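According to the Custom Search JSON API reference, num accepts integers from 1 to 10, so a single call returns at most 10 results. A small guard such as the hypothetical checked_num below catches an out-of-range value before a request is spent on an error response; it is a sketch, not part of the client library.

```python
def checked_num(num):
    """Validate the num request parameter (the API accepts 1 to 10)."""
    if not isinstance(num, int) or not 1 <= num <= 10:
        raise ValueError(f"num must be an integer from 1 to 10, got {num!r}")
    return num

# Usage: google_search(query, my_api_key, my_cse_id, num=checked_num(10))
```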
Lastly, the function returns the list stored under the "items" key of res; each entry in that list is a dictionary describing one search result.
Finally, the variable results stores the search results. The function google_search is called with the search query as the first argument, followed by the API key, the CSE ID, and num=10, the number of results to return. A for loop then prints each result in the returned list.
results = google_search('"How to code in Python"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)
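Printing each raw result dictionary produces long output. As a sketch, the loop can keep only the fields it needs; title and link are keys present in Custom Search result items, while summarize is a hypothetical helper and the sample items are hand-written, not real API output.

```python
def summarize(results):
    """Return (title, link) pairs from the list returned by google_search()."""
    return [(item.get("title", ""), item.get("link", "")) for item in results]


# Hand-written items in the shape of API results (not real output)
sample_results = [
    {"title": "Python Tutorial", "link": "https://example.com/python", "snippet": "..."},
    {"title": "Learn Python", "link": "https://example.org/learn"},
]
for title, link in summarize(sample_results):
    print(f"{title}: {link}")
```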
Example 2:
In this example, we will write a Python script that sends search requests without the Google API Python Client. It uses the API key and the CSE ID to call the Custom Search API’s REST endpoint directly with requests, a popular HTTP library that can be installed with pip install requests.
Code:
import requests

API_KEY = "Your API Key"
SEARCH_ENGINE_ID = "Your CSE ID"

# the search query you want
query = "Starboy"
# using the first page
page = 1
# construct the URL
# doc: https://developers.google.com/custom-search/v1/using_rest
# calculating start, (page=2) => (start=11), (page=3) => (start=21)
start = (page - 1) * 10 + 1
url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={query}&start={start}"

# make the API request
data = requests.get(url).json()
# get the result ("items" is absent when nothing matches)
search_items = data.get("items", [])
# iterate over the results (at most 10 per page)
for i, search_item in enumerate(search_items, start=1):
    # the long description comes from the page's Open Graph metadata, if present
    try:
        long_description = search_item["pagemap"]["metatags"][0]["og:description"]
    except (KeyError, IndexError):
        long_description = "N/A"
    # get the title of the page
    title = search_item.get("title")
    # get the page snippet
    snippet = search_item.get("snippet")
    # alternatively, you also can get the HTML snippet (bolded keywords)
    html_snippet = search_item.get("htmlSnippet")
    # extract page url
    link = search_item.get("link")
    # print results
    print("=" * 10, f"Result #{i+start-1}", "=" * 10)
    print("Title:", title)
    print("Description:", snippet)
    print("Long description:", long_description)
    print("URL:", link, "\n")
Let’s understand what the above code does.
The first line imports requests, a third-party HTTP library for Python. Then two variables, API_KEY and SEARCH_ENGINE_ID, are initialized with the previously created credentials.
import requests
API_KEY = "Your API Key"
SEARCH_ENGINE_ID = "Your CSE ID"
The variable query stores the search term that the application will look for. The variable page selects which page of search results to fetch, and the variable start holds the position of the first result on that page, counting from 1.
Each page holds 10 search results. With page = 1, start is 1, so the first 10 results are returned; with page = 2, start is 11, so the results begin at the 11th.
The variable url stores the service URL used to get results from the custom search engine. It includes the API key, the search engine ID, the search query, and the start position of the results to return.
query = "Starboy"
page = 1
start = (page - 1) * 10 + 1
url = f"https://www.googleapis.com/customsearch/v1?key={API_KEY}&cx={SEARCH_ENGINE_ID}&q={query}&start={start}"
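The f-string above inserts query into the URL without escaping it, so a query containing spaces, &, or # can produce a malformed request. A safer sketch builds the same URL with the standard library's urllib.parse.urlencode and wraps the page-to-start arithmetic in a function. start_for_page and build_search_url are hypothetical helper names. According to the API documentation, no more than 100 results are returned for any query, so pages past the tenth are not available.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/customsearch/v1"


def start_for_page(page, per_page=10):
    """Convert a 1-based page number into the API's 1-based start index."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    return (page - 1) * per_page + 1


def build_search_url(api_key, cse_id, query, page=1):
    # urlencode escapes spaces, '&', '#', and quotes in every parameter value
    params = {"key": api_key, "cx": cse_id, "q": query, "start": start_for_page(page)}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("KEY", "CX", "rock & roll", page=2))
# https://www.googleapis.com/customsearch/v1?key=KEY&cx=CX&q=rock+%26+roll&start=11
```

With requests, passing the same dictionary as requests.get(BASE_URL, params=params) performs this encoding automatically.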
The program sends a GET request to the stored URL with requests.get(), parses the JSON response with .json(), and saves the resulting dictionary in the variable data.
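Besides the list of items, the parsed data dictionary carries metadata about the query. As a sketch, the estimated total number of matches can be read from searchInformation.totalResults; the code assumes that value arrives as a numeric string, and total_results and the sample fragment are hypothetical, not real API output.

```python
def total_results(data):
    """Return the estimated hit count from a parsed Custom Search response.

    Assumes the count sits under searchInformation.totalResults as a numeric string.
    """
    info = data.get("searchInformation", {})
    return int(info.get("totalResults", "0"))


# Hand-written response fragment (not real API output)
sample_data = {"searchInformation": {"totalResults": "1230"}, "items": []}
print(total_results(sample_data))  # 1230
```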
The variable search_items holds the list of search results taken from data. A for loop goes through it with enumerate(), which numbers the results starting from 1.
The first value extracted for each result is the long description, taken from the page’s og:description metadata tag. Because not every page provides this tag, the lookup is wrapped in a try/except block: if a description is found, it is stored in the variable long_description; otherwise, N/A is stored.
data = requests.get(url).json()
search_items = data.get("items", [])
for i, search_item in enumerate(search_items, start=1):
    try:
        long_description = search_item["pagemap"]["metatags"][0]["og:description"]
    except (KeyError, IndexError):
        long_description = "N/A"
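In that lookup chain, a missing key raises KeyError, while an empty metatags list raises IndexError, so both cases need handling. Moving the lookup into a small helper keeps the loop body short; og_description is a hypothetical name, and the sample items are hand-written.

```python
def og_description(item, default="N/A"):
    """Return the Open Graph description of a search item, or default.

    Missing keys, an empty metatags list, or a None value all fall back to default.
    """
    try:
        return item["pagemap"]["metatags"][0]["og:description"]
    except (KeyError, IndexError, TypeError):
        return default


print(og_description({"pagemap": {"metatags": [{"og:description": "Example description"}]}}))
print(og_description({"title": "No pagemap here"}))
```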
In the code below, each attribute of a search result is stored in a variable of the same name. This is repeated for every result on the page (up to 10).
title = search_item.get("title")
snippet = search_item.get("snippet")
html_snippet = search_item.get("htmlSnippet")
link = search_item.get("link")
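Instead of separate loose variables, the same fields can be grouped into a small dataclass so that each result can be stored, sorted, or passed around as one object. SearchResult and from_item are hypothetical names; the keys match those read in the loop above.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """The fields this script reads from one search item (hypothetical container)."""
    title: str
    snippet: str
    link: str

    @classmethod
    def from_item(cls, item):
        # .get() with a default avoids KeyError when a field is missing
        return cls(
            title=item.get("title", ""),
            snippet=item.get("snippet", ""),
            link=item.get("link", ""),
        )


result = SearchResult.from_item({"title": "Starboy", "link": "https://example.com"})
print(result)
```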
Finally, the results are printed. The first line prints the result’s overall number, followed by its title, description, long description, and URL.
print("=" * 10, f"Result #{i+start-1}", "=" * 10)
print("Title:", title)
print("Description:", snippet)
print("Long description:", long_description)
print("URL:", link, "\n")
The results are printed using the Google Search API in Python without needing the Google API Python Client.
Conclusion
This article explained how to create a custom search engine, fetch an API key for it, and write Python scripts that send search queries to it using the Google Custom Search API.