How to Pretty Print XML Output in Python
When we read text, HTML, or XML files, their content is often poorly structured and inconsistently indented, which makes the output hard to follow. Beautifying the output solves this problem: it fixes inconsistent indentation, removes stray spaces, and so on.
In this article, we will learn how to make the output of an XML file prettier. To keep everyone on the same page, the Python code below works with a sample XML file named books.xml. To follow along with the same file, copy the XML content into a new file with that name.
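If you do not have the sample file at hand, a small stand-in is enough to try the techniques. The sketch below writes a shortened, deliberately badly indented catalog with a single record, taken from the first entry of the output shown later. The SAMPLE_XML constant and the write_sample helper are hypothetical names used only for illustration, and the real books.xml contains more records.

```python
from pathlib import Path

# Hypothetical, shortened stand-in for books.xml: one record with
# deliberately uneven indentation so the effect of beautifying is visible.
SAMPLE_XML = """<?xml version="1.0"?>
<catalog>
<book id="bk101">
      <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
         <genre>Computer</genre>
<price>44.95</price>
   <publish_date>2000-10-01</publish_date>
</book>
</catalog>
"""


def write_sample(path: str = "books.xml") -> Path:
    """Write the sample catalog to `path` and return the resulting Path."""
    target = Path(path)
    target.write_text(SAMPLE_XML, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_sample()}")
```

Running the script once creates books.xml in the current working directory, which is where the later examples look for it.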
Make XML Output Pretty in Python Using BeautifulSoup Library
BeautifulSoup is a Python library for parsing HTML and XML files. It is commonly used for scraping data from websites and documents. We can also use this library to beautify an XML document’s output.
If you don’t have this library installed on your machine, use either of the following pip commands.
pip install beautifulsoup4
pip3 install beautifulsoup4
BeautifulSoup relies on external parsers, so we install two more modules: html5lib and lxml. The lxml library is a collection of utilities for working with XML files, and it backs the "xml" parser used in the code below. The html5lib library parses HTML (Hypertext Markup Language) documents and is useful if you also work with HTML. Use the following pip commands to install these libraries on your machine.
pip install html5lib
pip install lxml
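A quick way to confirm the installation is to ask Python whether each module can be found. The following is a minimal sketch using only the standard library. The REQUIRED mapping and the missing_packages helper are hypothetical names. Note that the beautifulsoup4 package is imported under the name bs4.

```python
from importlib.util import find_spec

# Map pip package names to the module names they install.
# beautifulsoup4 is imported as "bs4"; the other two share their pip name.
REQUIRED = {"beautifulsoup4": "bs4", "lxml": "lxml", "html5lib": "html5lib"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required packages are installed")
```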
Refer to the following Python code to understand the usage.
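from bs4 import BeautifulSoup

filename = "./books.xml"

# Use a context manager so the file is closed once parsing is done
with open(filename) as f:
    bs = BeautifulSoup(f, "xml")

# prettify() returns the parsed document as an indented string
xml_content = bs.prettify()
print(xml_content)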
Output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
 <book id="bk101">
  <author>
   Gambardella, Matthew
  </author>
  <title>
   XML Developer's Guide
  </title>
  <genre>
   Computer
  </genre>
  <price>
   44.95
  </price>
  <publish_date>
   2000-10-01
  </publish_date>
  <description>
   An in-depth look at creating applications
   with XML.
  </description>
 </book>
 ...
 <book id="bk111">
  <author>
   O'Brien, Tim
  </author>
  <title>
   MSXML3: A Comprehensive Guide
  </title>
  <genre>
   Computer
  </genre>
  <price>
   36.95
  </price>
  <publish_date>
   2000-12-01
  </publish_date>
  <description>
   The Microsoft MSXML3 parser is covered in
   detail, with attention to XML DOM interfaces, XSLT processing,
   SAX and more.
  </description>
 </book>
 <book id="bk112">
  <author>
   Galos, Mike
  </author>
  <title>
   Visual Studio 7: A Comprehensive Guide
  </title>
  <genre>
   Computer
  </genre>
  <price>
   49.95
  </price>
  <publish_date>
   2001-04-16
  </publish_date>
  <description>
   Microsoft Visual Studio 7 is explored in depth,
   looking at how Visual Basic, Visual C++, C#, and ASP+ are
   integrated into a comprehensive development
   environment.
  </description>
 </book>
</catalog>
The above code assumes that the books.xml file is in the current working directory. Alternatively, we can set the filename variable to the file’s full path. The prettify() method returns the parsed document as an indented string. Note that the output has been trimmed for brevity.
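The same call also works on XML held in a string, which is handy for quick experiments without a file. The following is a minimal sketch, assuming beautifulsoup4 and lxml are installed. The RAW_XML sample and the prettify_xml helper are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample standing in for the content of books.xml
RAW_XML = (
    '<catalog><book id="bk101">'
    "<author>Gambardella, Matthew</author>"
    "<price>44.95</price>"
    "</book></catalog>"
)


def prettify_xml(xml_text: str) -> str:
    """Parse xml_text with the lxml-backed "xml" parser and return it indented."""
    soup = BeautifulSoup(xml_text, "xml")
    return soup.prettify()


if __name__ == "__main__":
    print(prettify_xml(RAW_XML))
```

Because prettify() returns an ordinary string, the result can be printed, as here, or written back to a file.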
Make XML Output Pretty in Python Using lxml Library
We can also use the lxml library on its own to beautify an XML file’s output. Refer to the following Python code.
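from lxml import etree

filename = "./books.xml"

# Parse the file into an element tree
tree = etree.parse(filename)

# pretty_print=True requests indented output; encoding=str returns text, not bytes
content = etree.tostring(tree, pretty_print=True, encoding=str)
print(content)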
Output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book id="bk101">
<author>
Gambardella, Matthew
</author>
<title>
XML Developer's Guide
</title>
<genre>
Computer
</genre>
<price>
44.95
</price>
<publish_date>
2000-10-01
</publish_date>
<description>
An in-depth look at creating applications
with XML.
</description>
</book>
...
<book id="bk111">
<author>
O'Brien, Tim
</author>
<title>
MSXML3: A Comprehensive Guide
</title>
<genre>
Computer
</genre>
<price>
36.95
</price>
<publish_date>
2000-12-01
</publish_date>
<description>
The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.
</description>
</book>
<book id="bk112">
<author>
Galos, Mike
</author>
<title>
Visual Studio 7: A Comprehensive Guide
</title>
<genre>
Computer
</genre>
<price>
49.95
</price>
<publish_date>
2001-04-16
</publish_date>
<description>
Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.
</description>
</book>
</catalog>
Take note of the pretty_print=True argument. This flag tells lxml to indent the serialized output.
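One caveat is worth a sketch: lxml generally leaves existing whitespace between elements untouched, so pretty_print=True may change little when the input already carries its own inconsistent indentation. A common workaround is to parse with a parser created with remove_blank_text=True, which discards whitespace-only text between elements so the serializer can indent from scratch. The RAW_XML sample and the pretty_lxml helper below are hypothetical names, and the example assumes lxml is installed.

```python
from lxml import etree

# Hypothetical inline sample with deliberately uneven indentation
RAW_XML = b"""<catalog>
<book id="bk101">
      <author>Gambardella, Matthew</author>
  <price>44.95</price>
</book>
</catalog>"""


def pretty_lxml(xml_bytes: bytes) -> str:
    """Re-indent xml_bytes, ignoring whatever indentation it already has."""
    # Drop whitespace-only text nodes so pretty_print can lay out the tree itself
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_bytes, parser)
    # encoding="unicode" returns a str instead of bytes
    return etree.tostring(root, pretty_print=True, encoding="unicode")


if __name__ == "__main__":
    print(pretty_lxml(RAW_XML))
```

When reading from a file instead, the same parser can be passed as the second argument to etree.parse().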