How to Parse a Log File in Python
A log file contains information about the events that happen while a software system or application is running. These events include errors, user requests, bugs, and so on. Developers can scan these details to find potential problems with the system, implement newer and better solutions, and improve the overall design. Log files can also reveal a lot about the system’s security, which helps developers harden the system or application.
Generally, entries inside a log file follow a format or pattern. For example, a software system might print three things for each entry: a timestamp, a log message, and a message type. Such formats can hold any amount of information, structured as well-formatted text for readability and easier management.
Log files can be analyzed in any programming language, but this article focuses on parsing them with Python. The underlying approach is the same in every language, so the Python code can easily be translated to another language to perform the same task.
Parse a Log File in Python
As mentioned above, entries inside a log file have a specific format. This means we can leverage this format to parse the information written inside a log file line by line. Let us try and understand this using an example.
Consider the following log format used by a web application. It has four significant details: the date and time, or timestamp (in yyyy-mm-dd hh:mm:ss format), the URL accessed, the type of log message (success, error, etc.), and the log message itself.
DateTime | URL | Log-Type | Log
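To make the format concrete, here is a minimal sketch that splits one sample entry on | and converts its timestamp into a Python datetime. It assumes the yyyy-mm-dd hh:mm:ss layout described above. The sample line and variable names are illustrative only. Passing maxsplit=3 is an added assumption: a | that appears inside the message text stays part of the message instead of producing an extra field.

```python
from datetime import datetime

# One illustrative entry in the "DateTime | URL | Log-Type | Log" format
line = "2021-10-26 10:27:01 | https://website.com/page | ERROR | Message"

# Split on "|" at most 3 times, then trim the padding spaces around each field
timestamp, url, log_type, message = [part.strip() for part in line.split("|", maxsplit=3)]

# Convert the yyyy-mm-dd hh:mm:ss text into a datetime object
when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")

print(when.isoformat(), url, log_type, message)
```

With a real datetime object, entries can be sorted, compared, or grouped by time rather than by string.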
Now, consider a file log.txt that contains logs in the format mentioned above. The log.txt file would look something like this.
2021-10-26 10:26:44 | https://website.com/home | SUCCESS | Message
2021-10-26 10:26:54 | https://website.com/about | SUCCESS | Message
2021-10-26 10:27:01 | https://website.com/page | ERROR | Message
2021-10-26 10:27:03 | https://website.com/user/me | SUCCESS | Message
2021-10-26 10:27:04 | https://website.com/settings/ | ERROR | Message
...
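To try the parser locally, you need a log.txt file on disk. The short script below is a hypothetical helper, not part of the original example. It writes the five sample entries shown above to log.txt in the current directory, one entry per line, and assumes UTF-8 encoding.

```python
# Hypothetical helper: create a small log.txt containing the sample entries
sample_lines = [
    "2021-10-26 10:26:44 | https://website.com/home | SUCCESS | Message",
    "2021-10-26 10:26:54 | https://website.com/about | SUCCESS | Message",
    "2021-10-26 10:27:01 | https://website.com/page | ERROR | Message",
    "2021-10-26 10:27:03 | https://website.com/user/me | SUCCESS | Message",
    "2021-10-26 10:27:04 | https://website.com/settings/ | ERROR | Message",
]

# Write one entry per line, ending the file with a newline
with open("log.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sample_lines) + "\n")
```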
The following Python code reads this log file and stores each entry in a dictionary. The variable order holds the dictionary keys in the same order as the fields in a single log entry. Since the log format separates fields with |, we can split each line on that character and then store the pieces however we like.
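import json

file_name = "log.txt"
# Dictionary keys, in the same order as the fields in each log entry
order = ["date", "url", "type", "message"]
data = []

# "with" closes the file automatically when the block ends
with open(file_name, "r", encoding="utf-8") as file:
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on "|" at most 3 times so a "|" inside the message is kept
        details = [x.strip() for x in line.split("|", maxsplit=3)]
        # Pair each key with its field, e.g. {"date": ..., "url": ...}
        structure = dict(zip(order, details))
        data.append(structure)

for entry in data:
    print(json.dumps(entry, indent=4))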
Output
{
    "date": "2021-10-26 10:26:44",
    "url": "https://website.com/home",
    "type": "SUCCESS",
    "message": "Message"
}
{
    "date": "2021-10-26 10:26:54",
    "url": "https://website.com/about",
    "type": "SUCCESS",
    "message": "Message"
}
{
    "date": "2021-10-26 10:27:01",
    "url": "https://website.com/page",
    "type": "ERROR",
    "message": "Message"
}
{
    "date": "2021-10-26 10:27:03",
    "url": "https://website.com/user/me",
    "type": "SUCCESS",
    "message": "Message"
}
{
    "date": "2021-10-26 10:27:04",
    "url": "https://website.com/settings/",
    "type": "ERROR",
    "message": "Message"
}
Once the information is read, we can perform any further operation on it. We can store it in a database for future analysis, or import NumPy and Matplotlib and plot graphs to understand the information visually. We can filter the logs with ERROR tags and scan through the errors users faced, or watch for suspicious activity or security breaches such as spamming or unauthorized access. The possibilities are endless, and they depend on what the developers or data scientists are trying to learn from the data.
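As a small illustration of the filtering idea, the sketch below keeps only the ERROR entries and counts how many entries there are of each log type. It uses collections.Counter from the standard library. The data list is hard-coded sample data, shaped like the dictionaries the parser above produces, so the snippet runs on its own. In practice you would use the data list built from log.txt.

```python
from collections import Counter

# Sample parsed entries, shaped like the dictionaries built by the parser
data = [
    {"date": "2021-10-26 10:26:44", "url": "https://website.com/home", "type": "SUCCESS", "message": "Message"},
    {"date": "2021-10-26 10:26:54", "url": "https://website.com/about", "type": "SUCCESS", "message": "Message"},
    {"date": "2021-10-26 10:27:01", "url": "https://website.com/page", "type": "ERROR", "message": "Message"},
    {"date": "2021-10-26 10:27:03", "url": "https://website.com/user/me", "type": "SUCCESS", "message": "Message"},
    {"date": "2021-10-26 10:27:04", "url": "https://website.com/settings/", "type": "ERROR", "message": "Message"},
]

# Keep only the entries tagged ERROR
errors = [entry for entry in data if entry["type"] == "ERROR"]

# Count how many entries there are of each log type
type_counts = Counter(entry["type"] for entry in data)

for entry in errors:
    print(entry["date"], entry["url"])
print(type_counts)
```

The same pattern extends to other questions, for example grouping errors by URL to see which pages fail most often.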