How to Parse XML in Bash
XML is a popular markup language widely used to structure and transfer data, and most developers have to work with it sooner or later.
This article will show how we can parse XML through Bash.
We are going to talk about two command-line tools here. The first is xmllint, and the second is XMLStarlet.
You need to install them before working with them.
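Before running any of the commands in this article, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch of such a check; require_tool is a hypothetical helper name introduced only for this illustration, and the check relies on nothing beyond the shell built-in command -v.

```shell
#!/usr/bin/env bash
# Sketch: report which of the two tools from this article are installed.
# require_tool is a hypothetical helper name, not part of either tool.
require_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
        return 0
    else
        echo "$tool: not found, install it first" >&2
        return 1
    fi
}

# Count how many of the two tools are missing.
missing=0
for tool in xmllint xmlstarlet; do
    require_tool "$tool" || missing=$((missing + 1))
done
echo "Missing tools: $missing"
```

If the script reports a missing tool, install it as described in the matching section before continuing.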
Use xmllint to Parse XML in Bash
xmllint is one of the most common tools for parsing XML files. It is not always installed by default, so you may have to install it before using it.
On Debian or Ubuntu, you can install it by executing the commands below.
sudo apt-get update -qq
sudo apt-get install -y libxml2-utils
These commands use apt-get to install the libxml2-utils package, which provides xmllint.
If you have an XML file named MyXML.xml, you can parse it and print its contents with the command below.
xmllint MyXML.xml
After executing the above command, you will get output like the following.
<?xml version="1.0"?>
<specification>
<type>Laptop</type>
<model>Macbook</model>
<screenSizeInch>14</screenSizeInch>
</specification>
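Printing the whole document is rarely the end goal; usually a script needs one value out of it. The sketch below shows one way to do that with xmllint's --xpath option. It assumes a libxml2 build recent enough to include --xpath. get_value is a hypothetical helper name, and the sample file is recreated in a temporary location so the example is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: read single values out of an XML file with xmllint --xpath.
# get_value is a hypothetical helper name, not part of xmllint.
get_value() {
    local xpath="$1" file="$2"
    # string(...) makes xmllint print the text content instead of the element markup
    xmllint --xpath "string($xpath)" "$file"
}

# Recreate the sample document in a temporary file (removed when the script exits).
xml_file="$(mktemp)"
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<specification>
  <type>Laptop</type>
  <model>Macbook</model>
  <screenSizeInch>14</screenSizeInch>
</specification>
EOF

if command -v xmllint >/dev/null 2>&1; then
    model="$(get_value /specification/model "$xml_file")"
    size="$(get_value /specification/screenSizeInch "$xml_file")"
    echo "Model: $model, screen size: $size inch"
else
    echo "xmllint is not installed" >&2
fi
```

Wrapping the expression in string() is a design choice: without it, xmllint prints the matching element including its tags, which then needs further cleanup in the shell.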
xmllint accepts a number of options (flags). The available options are listed below.
--auto - Generates a document for testing.
--catalogs - Uses the catalogs from SGML_CATALOG_FILES. Otherwise, /etc/xml/catalog is used by default.
--chkregister - Turns on node registration.
--compress - Turns on gzip compression of the output.
--copy - Tests the internal copy implementation.
--c14n - Uses W3C XML Canonicalization (C14N) to serialize the result of parsing to stdout. It keeps comments in the result.
--dtdvalid URL - Uses the DTD specified by URL for validation.
--dtdvalidfpi FPI - Uses the DTD specified by the Public Identifier FPI for validation. Note that this requires a catalog that exports that Public Identifier.
--debug - Parses a file and outputs an annotated tree of the in-memory version of the document.
--debugent - Debugs the entities defined in the document.
--dropdtd - Removes the DTD from the output.
--dtdattr - Fetches the external DTD and populates the tree with inherited attributes.
--encode ENCODING - Outputs the result in the given encoding.
--format - Reformats and reindents the output.
--help - Prints a summary of the usage of xmllint.
--html - Uses the HTML parser.
--htmlout - Outputs the result as an HTML file, adding the HTML tags needed to display the result tree in a browser.
--insert - Tests for valid insertions.
--loaddtd - Fetches the external DTD.
--load-trace - Displays all the documents loaded during processing to stderr.
--maxmem NNBYTES - Tests the parser memory support. NNBYTES is the maximum number of bytes that the library can allocate.
--memory - Parses from memory.
--noblanks - Drops ignorable blank spaces.
--nocatalogs - Does not use any catalogs.
--nocdata - Substitutes CDATA sections with equivalent text nodes.
--noent - Substitutes entity values for entity references.
--nonet - Does not use the internet to fetch DTDs or entities.
--noout - Suppresses the output. By default, xmllint outputs the result tree.
--nowarning - Does not emit warnings from the parser and/or validator.
--nowrap - Does not output the HTML doc wrapper.
--noxincludenode - Does XInclude processing but does not generate the XInclude start and end nodes.
--nsclean - Removes redundant namespace declarations.
--output FILE - Defines the file path where xmllint saves the result of parsing.
--path "PATH(S)" - Uses the (colon-separated or space-separated) list of filesystem paths specified by PATHS for loading DTDs or entities. Space-separated lists must be enclosed in quotation marks.
--pattern PATTERNVALUE - Exercises the pattern recognition engine, which can be used with the reader interface. It is used for debugging.
--postvalid - Validates after parsing is completed.
--push - Enables the push mode of the parser.
--recover - Outputs any parsable portions of an invalid document.
--relaxng SCHEMA - Uses the RelaxNG file named SCHEMA for validation.
--repeat - Repeats 100 times, for timing or profiling.
--schema SCHEMA - Uses the W3C XML Schema file named SCHEMA for validation.
--shell - Runs a navigating shell.
--stream - Uses the streaming API.
--testIO - Tests the user input/output support.
--timing - Outputs information about the time xmllint takes to perform the various steps.
--valid - Checks the document’s validity against its DTD.
--version - Displays the version of the library.
--walker - Tests the walker module.
--xinclude - Does XInclude processing.
--xmlout - Used mainly in conjunction with --html. It saves the document with the XML serializer, which is mainly useful for converting HTML to XHTML.
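Several of these options are most useful in combination. As a rough sketch, the script below uses --noout to check whether a file is well-formed (relying on xmllint's non-zero exit status on parse errors) and then uses --format together with --output to write a re-indented copy. The file names and the is_well_formed helper are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Sketch: combine --noout (well-formedness check) with --format and --output.
# All file names below are made up for this example.
work_dir="$(mktemp -d)"
trap 'rm -rf "$work_dir"' EXIT

printf '%s' '<specification><type>Laptop</type><model>Macbook</model></specification>' \
    > "$work_dir/compact.xml"
# The second file is deliberately broken: <type> is never closed.
printf '%s' '<specification><type>Laptop</specification>' \
    > "$work_dir/broken.xml"

# is_well_formed is a hypothetical helper: xmllint exits non-zero on parse errors,
# and --noout keeps it from printing the document itself.
is_well_formed() {
    xmllint --noout "$1" 2>/dev/null
}

if command -v xmllint >/dev/null 2>&1; then
    for f in "$work_dir/compact.xml" "$work_dir/broken.xml"; do
        if is_well_formed "$f"; then
            echo "$(basename "$f"): well-formed"
        else
            echo "$(basename "$f"): not well-formed"
        fi
    done
    # --format re-indents the document; --output writes it to a file instead of stdout.
    xmllint --format --output "$work_dir/pretty.xml" "$work_dir/compact.xml"
    cat "$work_dir/pretty.xml"
else
    echo "xmllint is not installed" >&2
fi
```

Checking the exit status rather than parsing the error text keeps the script independent of the exact wording of xmllint's messages.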
Use XMLStarlet to Parse XML in Bash
Another popular tool for parsing XML documents is XMLStarlet. Its primary command is xmlstarlet.
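On Fedora and other dnf-based distributions, you can install it with the command below, which needs root privileges (hence sudo). On Debian or Ubuntu, the package has the same name and can be installed with apt-get instead.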
sudo dnf install xmlstarlet
XMLStarlet provides subcommands that make validating, transforming, and querying XML files easier. The simplest way to display an XML file is the format subcommand, which parses the file and prints it re-indented.
xmlstarlet format MyXML.xml
After executing the above command, you will see the contents of the XML file as output like the following.
<?xml version="1.0"?>
<specification>
  <type>Laptop</type>
  <model>Macbook</model>
  <screenSizeInch>14</screenSizeInch>
</specification>
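Beyond printing a file, XMLStarlet can query individual values, which is what most Bash scripts need. The sketch below uses the sel subcommand with an XPath expression. The sample file is recreated in a temporary location, and the variable names are chosen only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: query values from an XML file with "xmlstarlet sel".
xml_file="$(mktemp)"
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<specification>
  <type>Laptop</type>
  <model>Macbook</model>
  <screenSizeInch>14</screenSizeInch>
</specification>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # -t starts a template; -v prints the value of an XPath expression.
    model="$(xmlstarlet sel -t -v '/specification/model' "$xml_file")"

    # -m loops over every child element; -o prints a literal string; -n adds a newline.
    fields="$(xmlstarlet sel -t -m '/specification/*' \
        -v 'name()' -o '=' -v '.' -n "$xml_file")"

    echo "Model: $model"
    echo "$fields"
else
    echo "xmlstarlet is not installed" >&2
fi
```

The second query turns each child element into a name=value line, a shape that is easy to process further with standard shell tools such as grep or a while read loop.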
All the commands used in this article are written for Bash and are intended for a Linux shell environment.
Aminul is an expert technical writer and full-stack developer. He has hands-on working experience with numerous developer platforms and SaaS startups. He is highly skilled in numerous programming languages and frameworks. He can write professional technical articles such as reviews, programming tutorials, documentation, SOPs, user manuals, and whitepapers.