Python Address Parser
This article shows how to parse addresses in Python using the pyparsing library. We first parse a single address string by hand and then use pyparsing's functions to extract address fields from a CSV file.
We'll start with a simple example and then move on to a more complex one.
Parse Address Using Python Library PyParsing
The pyparsing module lets you build a parser for text data directly in Python code by combining small matching expressions. An address is a short sequence of predictable parts (a number, a street name, a state code), which makes it a good fit for this approach.
In this article, we first parse a single address with pyparsing. After that, we look at a more extensive example that extracts address fields from a file.
Simple Address Parsing Using PyParsing
Let's start with a basic example of parsing an address with the Python library pyparsing. As a first example, we will parse the following address.
567 Main Street
Follow these steps to parse this address:
- Import the pyparsing library
First, we import the pyparsing library with all its classes and functions by using the wildcard *.
from pyparsing import *
- Create a variable
Now we create a variable and assign it the address we want to parse.
address = "567 Main Street"
- Break down the address
Now we describe the parts of the address using nums (digits) and alphas (letters): one number followed by two words.
addressParser = Word(nums) + Word(alphas) + Word(alphas)
- Parse the address
Now we create a variable and call parseString from the pyparsing library on our parser.
addressParts = addressParser.parseString(address)
- Print the result
Finally, we print the variable to see the result.
print(addressParts)
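Let's write the entire code and run it to see the result. This version parses a slightly longer address that also ends with a state code, so the parser has a fourth Word(alphas).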
from pyparsing import *
address = "123 Main Street FL"
addressParser = Word(nums) + Word(alphas) + Word(alphas) + Word(alphas)
addressParts = addressParser.parseString(address)
print(addressParts)
Output:
['123', 'Main', 'Street', 'FL']
This code splits the address into four tokens, in order: the street number (123), the street name (Main), the street type (Street), and the state (FL).
The grammar matches by position only, so it expects exactly one number followed by three alphabetic words.
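Because the grammar is positional, input that does not fit it raises an exception rather than returning partial results. The following is a minimal sketch of how that could be handled; the helper name parse_address is hypothetical and chosen only for this illustration.

```python
from pyparsing import ParseException, Word, alphas, nums

# Same grammar as above: one number followed by three alphabetic words.
address_parser = Word(nums) + Word(alphas) + Word(alphas) + Word(alphas)


def parse_address(text):
    """Return the parsed parts as a list, or None if the text does not fit the grammar."""
    try:
        return address_parser.parseString(text).asList()
    except ParseException:
        # Raised when the input does not match, e.g. the street number is missing.
        return None


print(parse_address("123 Main Street FL"))
print(parse_address("Main Street FL"))
```

The first call returns the four tokens, while the second returns None because the input does not start with a number.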
Four Useful Functions of PyParsing
We can use one of four available functions to do the actual parsing.
- parseString: parses the text from the beginning of the string. By default, it stops once the grammar is satisfied and does not complain about extra content at the end.
- scanString: searches the whole input string for matches, somewhat like re.finditer(), and yields each match together with its start and end positions.
- searchString: similar to scanString, except that instead of yielding matches one at a time with their positions, it returns a collection of the matched tokens.
- transformString: similar to scanString, but allows you to substitute the matched tokens with others of your choosing.
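The four functions can be compared side by side on one input. This is an illustrative sketch only: the sample text and the masking parse action are assumptions made for the example, not part of the address walkthrough.

```python
from pyparsing import Word, nums

number = Word(nums)
text = "567 Main Street, Suite 12"  # sample input assumed for this sketch

# parseString matches from the start of the string; trailing text is ignored by default.
first = number.parseString(text).asList()

# scanString is a generator that yields (tokens, start, end) for every match.
spans = [(tokens[0], start, end) for tokens, start, end in number.scanString(text)]

# searchString collects the tokens of every match into one result.
found = number.searchString(text).asList()

# transformString replaces each match with the value returned by the parse action.
masked_number = Word(nums).setParseAction(lambda tokens: "#" * len(tokens[0]))
masked = masked_number.transformString(text)

print(first)
print(spans)
print(found)
print(masked)
```

Here parseString only sees the leading 567, while the other three functions also reach the 12 later in the string.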
Parse Address From CSV File Using PyParsing in Python
Addresses are frequently stored in CSV files. Because their structure varies a great deal from one file to another, they can be hard to parse.
The pyparsing module simplifies extracting addresses from such files by letting us describe the expected structure as a grammar. To begin, we define a few simple parsing rules.
After that, we apply these rules to a CSV file that contains an address.
Assume our address CSV file looks something like this:
city=LAUDERDALE, state=FL, Zipcode: 33316
We need to parse strings in key=value format. A key=value string has three parts: the key, the equals sign, and the value.
There is no need to include the equals sign in the parsed output. A token can be matched but kept out of the output by wrapping its expression in Suppress().
Token names can be assigned with the setResultsName() method or by calling the expression with the name as an argument, as in Word(alphanums)("key"). Naming tokens makes it more straightforward to retrieve specific values from the result, so it is a good habit.
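To make the effect of Suppress() and results names concrete, here is a small sketch that parses a single pair. The input string state=FL is taken from the sample line, and the variable names are chosen for this example only.

```python
from pyparsing import Suppress, Word, alphanums

# Without Suppress, the "=" literal shows up as a token of its own.
with_equals = Word(alphanums) + "=" + Word(alphanums)
print(with_equals.parseString("state=FL").asList())

# With Suppress, the "=" is matched but dropped from the output.
key = Word(alphanums).setResultsName("key")  # explicit method call
value = Word(alphanums)("value")             # call-syntax shorthand, same effect
pair = key + Suppress("=") + value

result = pair.parseString("state=FL")
print(result.asList())
print(result.key, result["value"])
```

Named tokens can be read either as attributes (result.key) or by subscript (result["value"]).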
Let's try the code and see how pyparsing works with CSV files.
We start by importing the pyparsing library with all its classes and functions.
from pyparsing import *
Secondly, we create an expression for the key part of the input and name it "key". We use alphanums because address data can contain both letters and digits.
key = Word(alphanums)("key")
We do not want the = sign to appear in the parsed output, so we wrap it in Suppress.
equals = Suppress("=")
Now, we create an expression for the value part and name it "value". Again, we use alphanums because address data can contain both letters and digits.
value = Word(alphanums)("value")
Now, we combine the three expressions into one that matches a complete key=value pair.
keyValueExpression = key + equals + value
Now we open the CSV file with a with statement and use the file's read() method to load its entire contents into a string.
with open("/content/address.csv") as address_file:
    address_text = address_file.read()
After this, we use a for loop with the scanString function of pyparsing to find each key=value pair in the text, one match at a time. scanString yields a (tokens, start, end) tuple for every match, so the tokens are the first element.
for adrs in keyValueExpression.scanString(address_text):
    result = adrs[0]
Lastly, still inside the loop, we use the print function to see each result.
    print("{0} is {1}".format(result.key, result.value))
That completes the walkthrough. Here is the entire script, followed by the output we get when we provide a CSV file containing the address.
# import the pyparsing library
from pyparsing import *

# the key: letters and digits, stored under the name "key"
key = Word(alphanums)("key")
# match "=" but leave it out of the output
equals = Suppress("=")
value = Word(alphanums)("value")
keyValueExpression = key + equals + value
# open the CSV file and read its whole contents into a string
with open("/content/address.csv") as address_file:
    address_text = address_file.read()
# scan the text for every key=value match
for adrs in keyValueExpression.scanString(address_text):
    # each match is a (tokens, start, end) tuple
    result = adrs[0]
    # print the output
    print("{0} is {1}".format(result.key, result.value))
Output:
city is LAUDERDALE
state is FL
The output shows the key=value pairs found in our file. The address.csv file contained only one address.
The Zipcode: 33316 field does not appear in the output because it uses a colon instead of an equals sign, so it does not match our key=value expression.
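If the zip code should be captured as well, one option is to accept either separator. The sketch below is an assumption about how the grammar could be extended, not part of the script above; it parses the sample line from an in-memory string instead of a file so that it runs on its own.

```python
from pyparsing import Literal, Suppress, Word, alphanums

# The sample line from the CSV file, kept in memory for this sketch.
sample = "city=LAUDERDALE, state=FL, Zipcode: 33316"

key = Word(alphanums)("key")
# Accept "=" or ":" as the separator and keep it out of the output.
separator = Suppress(Literal("=") | Literal(":"))
value = Word(alphanums)("value")
pair = key + separator + value

# Collect every match into a dictionary keyed by field name.
fields = {tokens.key: tokens.value for tokens, _start, _end in pair.scanString(sample)}
print(fields)
```

With this change, the Zipcode field is picked up alongside city and state.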
pyparsing offers a more structured alternative to regular expressions when parsing text into tokens and retrieving or replacing individual tokens.
For example, nested fields are no problem for pyparsing, but they are difficult to handle with regular expressions. In this respect, pyparsing is closer to traditional parser tools such as lex and yacc.
In other words, regular expressions can be used to search for tags and extract data from HTML, but they cannot verify that an HTML file is well formed, because that requires keeping track of arbitrarily nested tags. A grammar written with pyparsing can express that kind of nesting.
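As a small illustration of the point about nested fields, pyparsing ships a nestedExpr helper for delimiter-nested input. The parenthesized sample string here is made up for the example and is not an address format used elsewhere in this article.

```python
from pyparsing import nestedExpr

# nestedExpr builds a parser for content nested inside matching delimiters.
nested = nestedExpr("(", ")")

# Hypothetical nested input, invented for this sketch.
result = nested.parseString("(unit (floor 2) (room 14))").asList()
print(result)
```

Each level of parentheses in the input becomes a nested list in the result.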
We hope you find this article helpful in understanding the address parser used in Python.
My name is Abid Ullah, and I am a software engineer. I love writing articles on programming, and my favorite topics are Python, PHP, JavaScript, and Linux. I tend to provide solutions to people in programming problems through my articles. I believe that I can bring a lot to you with my skills, experience, and qualification in technical writing.