Writing XML files in Python

Read and Write XML Files with Python

XML is a commonly used data exchange format in many applications and systems, although JSON has become more popular. Compared with JSON, XML supports schema (XSD) validation and can be easily transformed into other formats using XSLT. XHTML, a strict XML-based version of HTML, is also used on many websites.

This article provides some examples of using Python to read and write XML files.

Example XML file

Create a sample XML file named test.xml with the following content:




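<?xml version="1.0"?>
<data>
    <record id="1">
        <rid>1</rid>
        <name>Record 1</name>
    </record>
    <record id="2">
        <rid>2</rid>
        <name>Record 2</name>
    </record>
    <record id="3">
        <rid>3</rid>
        <name>Record 3</name>
    </record>
</data>

The root element name (data) is illustrative; the code below only depends on the record, rid and name elements and the id attribute.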

Use XML DOM model

As in many other programming languages, the XML DOM is commonly used to parse and manipulate XML files.

More than a decade ago, when I started coding in C#, XmlDocument was a commonly used class for manipulating XML data. Python also has DOM implementations, such as the minidom module (xml.dom.minidom), which is a minimal implementation of the Document Object Model interface.

Read XML document

The following code snippet reads the attributes and child elements from the document. It first creates a DOM object and then finds all the record elements under the document root element. Each record element is parsed into a dictionary. When parsing a record element, several object types are used: Attr, Text and Element. All of them inherit from the Node class.

from xml.dom.minidom import parse
import os
import pandas as pd

# Directory that contains this script (and test.xml)
dir_path = os.path.dirname(os.path.realpath(__file__))

data_records = []
with parse(f'{dir_path}/test.xml') as xml_doc:
    root = xml_doc.documentElement
    records = root.getElementsByTagName('record')

    for r in records:
        data = {}
        # Attribute value comes from an Attr node
        id_node = r.getAttributeNode('id')
        data['id'] = id_node.value
        # Child element text comes from the Text node under each Element
        el_rid = r.getElementsByTagName('rid')[0]
        data['rid'] = el_rid.firstChild.data
        el_name = r.getElementsByTagName('name')[0]
        data['name'] = el_name.firstChild.data
        data_records.append(data)

print(data_records)
df = pd.DataFrame(data_records)
print(df)
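The output looks like the following:

[{'id': '1', 'rid': '1', 'name': 'Record 1'}, {'id': '2', 'rid': '2', 'name': 'Record 2'}, {'id': '3', 'rid': '3', 'name': 'Record 3'}]
  id rid      name
0  1   1  Record 1
1  2   2  Record 2
2  3   3  Record 3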
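To make the node types concrete, here is a minimal sketch that parses an in-memory string rather than test.xml and inspects the objects involved. The XML string, its root name and the variable names are made up for illustration; only the record, rid and name elements and the id attribute follow the example above.

```python
from xml.dom.minidom import Node, parseString

# Hypothetical one-record document; the root name "data" is an assumption.
xml_text = '<data><record id="1"><rid>1</rid><name>Record 1</name></record></data>'

doc = parseString(xml_text)
record = doc.documentElement.getElementsByTagName('record')[0]

id_node = record.getAttributeNode('id')            # an Attr node
name_el = record.getElementsByTagName('name')[0]   # an Element node
name_text = name_el.firstChild                     # a Text node

for node in (id_node, name_el, name_text):
    # Print the class name and whether the object is a Node subclass instance.
    print(type(node).__name__, isinstance(node, Node))

print(id_node.value, name_text.data)
```

Attr exposes its content through value, while Text exposes it through data, which is why the reading code uses both.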

Write XML document

We can also use the DOM object to write in-memory XML data to a file.


The following code snippet adds a new attribute to each record element from the previous example and then saves the new XML document to a new file.

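# Reuses parse and dir_path from the previous snippet
with open(f'{dir_path}/test_new.xml', 'w') as new_xml_handle:
    with parse(f'{dir_path}/test.xml') as xml_doc:
        root = xml_doc.documentElement
        records = root.getElementsByTagName('record')
        for r in records:
            # Add (or overwrite) the attribute on each record element
            r.setAttribute('testAttr', 'New Attribute')
        # Serialize the modified DOM to the open file handle
        xml_doc.writexml(new_xml_handle)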
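The record elements in the new file look like the following:

<record id="1" testAttr="New Attribute">
    <rid>1</rid>
    <name>Record 1</name>
</record>
<record id="2" testAttr="New Attribute">
    <rid>2</rid>
    <name>Record 2</name>
</record>
<record id="3" testAttr="New Attribute">
    <rid>3</rid>
    <name>Record 3</name>
</record>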

As you can see, the attribute node testAttr is added to each record element.

Use xml.etree.ElementTree

Another approach is to use xml.etree.ElementTree to read and write XML files.

Read XML file using ElementTree

import xml.etree.ElementTree as ET
import os
import pandas as pd

# Directory that contains this script (and test.xml)
dir_path = os.path.dirname(os.path.realpath(__file__))

data_records = []
tree = ET.parse(f'{dir_path}/test.xml')
root = tree.getroot()

# Element.getchildren() was removed in Python 3.9; iterate over the element directly
for r in root:
    data = {}
    data['id'] = r.attrib['id']
    el_rid = r.find('rid')
    data['rid'] = el_rid.text
    el_name = r.find('name')
    data['name'] = el_name.text
    data_records.append(data)

print(data_records)
df = pd.DataFrame(data_records)
print(df)

The above script first creates an ElementTree object and then iterates over all 'record' elements under the root element. For each 'record' element, it reads the attributes and child elements. The API is very similar to minidom's but is easier to use.

The output looks like the following:

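[{'id': '1', 'rid': '1', 'name': 'Record 1'}, {'id': '2', 'rid': '2', 'name': 'Record 2'}, {'id': '3', 'rid': '3', 'name': 'Record 3'}]
  id rid      name
0  1   1  Record 1
1  2   2  Record 2
2  3   3  Record 3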

Write XML file using ElementTree

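To write an XML file, we can just call the write method of the ElementTree object.

# Reuses tree, root and dir_path from the previous snippet.
# Element.getchildren() was removed in Python 3.9; iterate over the element directly.
for r in root:
    r.set('testAttr', 'New Attribute2')
tree.write(f'{dir_path}/test_new_2.xml', xml_declaration=True)

Again, the API is simpler than minidom's. The content of the newly generated file test_new_2.xml is like the following: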

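<record id="1" testAttr="New Attribute2">
    <rid>1</rid>
    <name>Record 1</name>
</record>
<record id="2" testAttr="New Attribute2">
    <rid>2</rid>
    <name>Record 2</name>
</record>
<record id="3" testAttr="New Attribute2">
    <rid>3</rid>
    <name>Record 3</name>
</record>

The ElementTree.write method accepts several optional arguments, for example encoding and xml_declaration.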
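As a rough illustration of those optional arguments, the sketch below writes a small tree with an explicit encoding and an XML declaration. It writes to an in-memory io.BytesIO buffer instead of a file so the result can be printed directly; the sample document and variable names are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; the root name "data" is an assumption.
root = ET.fromstring('<data><record id="1"><name>Record 1</name></record></data>')
tree = ET.ElementTree(root)

# write() accepts a file name or a file object opened for writing.
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)

output = buffer.getvalue().decode('utf-8')
print(output)
```

Passing a file path instead of the buffer works the same way and produces the same bytes on disk.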

Warning: The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities.

Summary

There are many other Python libraries for parsing and writing XML files. Many of these packages are not as fluent or complete as their Java or .NET equivalents; ElementTree is the closest one I have found so far.


Reading and Writing XML Files in Python

Extensible Markup Language, commonly known as XML, is a markup language designed to be readable by both humans and computers. It defines a set of rules for encoding documents in a specific format. This article describes methods for reading and writing XML files in Python.

Note: In general, the process of reading data from an XML file and analyzing its logical components is known as parsing. Therefore, when we refer to reading an XML file, we mean parsing the XML document.

In this article, we look at two libraries that can be used for XML parsing: BeautifulSoup (used with the lxml parser) and ElementTree.

Using BeautifulSoup with the lxml parser

To read and write XML files we will use a Python library named BeautifulSoup. To install it, run the following command in the terminal.

pip install beautifulsoup4

Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers. One of them is the lxml parser (used for parsing XML and HTML documents). lxml can be installed by running the following command in your operating system's terminal:

pip install lxml

First we will learn how to read from an XML file and parse the data stored in it. Later we will learn how to create an XML file and write data to it.

Reading Data From an XML File

Parsing an XML file takes two steps: finding the tags of interest, and extracting the data from those tags.
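The two steps can be sketched as follows. This is only an illustration under stated assumptions: the sample document, its tag names (library, book, title) and the id attribute are invented for the example, and it assumes both beautifulsoup4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; tag and attribute names are made up.
xml_text = """
<library>
    <book id="b1"><title>First Book</title></book>
    <book id="b2"><title>Second Book</title></book>
</library>
""".strip()

# "xml" selects the lxml-based XML parser, so lxml must be installed.
soup = BeautifulSoup(xml_text, "xml")

# Step 1: find the tags.
books = soup.find_all("book")
second = soup.find("book", {"id": "b2"})

# Step 2: extract data from the tags.
titles = [book.find("title").get_text() for book in books]
print(titles)
print(second["id"], second.find("title").get_text())
```

find_all returns every matching tag, while find returns only the first match, optionally filtered by attribute values.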



Writing an XML File

Writing an XML file is a simple process, because XML files are plain text and are not encoded in any special way. Modifying sections of an XML document requires parsing it first. Here we modify some sections of a previously parsed XML document.
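A minimal sketch of the parse, modify and write cycle with BeautifulSoup is shown below. The document, the status attribute and the output file name are hypothetical, and the file is written to the system temporary directory to keep the example self-contained. It again assumes beautifulsoup4 and lxml are installed.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical document; tag and attribute names are made up.
xml_text = '<library><book id="b1"><title>First Book</title></book></library>'
soup = BeautifulSoup(xml_text, "xml")

book = soup.find("book")
book["status"] = "checked-out"               # add or change an attribute
book.find("title").string = "Renamed Book"   # replace the text inside a tag

# str(soup) serializes the whole modified document back to XML text.
result = str(soup)

out_path = os.path.join(tempfile.gettempdir(), "library_modified.xml")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(result)

print(result)
```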


Using ElementTree

The ElementTree module provides a plethora of tools for manipulating XML files. Best of all, it is included in Python's standard library, so no external modules need to be installed. Because XML is an inherently hierarchical data format, it is natural to represent it as a tree. The module provides the ElementTree class, which represents a whole XML document as a single tree.

In the following examples, we look at separate methods for reading data from and writing data to XML files.
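To illustrate the tree view of a document, the following sketch walks an element tree recursively and prints each tag indented by its depth. The document and the walk helper are invented for this example; they are not part of the ElementTree API.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document.
root = ET.fromstring(
    "<library><shelf><book/><book/></shelf><desk/></library>"
)


def walk(element, depth=0):
    """Yield (depth, tag) pairs for the element and all its descendants."""
    yield depth, element.tag
    for child in element:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


for depth, tag in walk(root):
    print("  " * depth + tag)
```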

Reading XML Files

To read an XML file using ElementTree, we first import the xml.etree.ElementTree module under the name ET (a common convention). We then pass the filename of the XML file to ET.parse() to parse it, and get the root (parent tag) of the document using getroot(). Next we print the root tag of the file. We then display the attributes of the first sub-tag of the root using root[0].attrib: root[0] selects the first child of the root, and attrib returns its attributes as a dictionary. Finally, we display the text enclosed within the 1st sub-tag of the 5th sub-tag of the root.
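The steps above can be sketched as follows. The sample document is hypothetical, and it is parsed from an in-memory io.StringIO object rather than a file on disk so that the example is self-contained; ET.parse() accepts a file name in the same way.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document with five children under the root.
xml_text = """<catalog>
    <item name="first" kind="demo">one</item>
    <item name="second">two</item>
    <item name="third">three</item>
    <item name="fourth">four</item>
    <group>
        <note>nested text</note>
    </group>
</catalog>"""

tree = ET.parse(io.StringIO(xml_text))
root = tree.getroot()

print(root.tag)         # tag name of the root element
print(root[0].attrib)   # attributes of the first child, as a dict
# Indexing is zero-based: root[4] is the 5th child, root[4][0] its 1st child.
print(root[4][0].text)
```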


Writing XML Files

Now we look at some methods that can be used to write data to an XML document. In this example we create an XML file from scratch.

First, we create a root (parent) tag named chess using ET.Element('chess'). All other tags fall underneath this tag: once a root tag has been defined, sub-elements can be created under it. We then create a sub-element named Opening inside the chess tag using ET.SubElement(), followed by two more sub-elements, E4 and D4, underneath Opening. Next we add attributes to the E4 and D4 tags using set(), a method of the Element objects that SubElement() returns. We then add text inside the E4 and D4 tags by assigning to the text attribute of those elements. After that, we convert the tree from an xml.etree.ElementTree.Element object to a bytes object using ET.tostring() (despite its name, tostring() returns bytes rather than str by default). Finally, we write the data to a file named gameofsquares.xml, opened in wb mode so that binary data can be written to it.
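A minimal sketch of these steps is shown below. The tag names and the output file name come from the description above; the attribute name (type), its values and the text content are not given there and are purely illustrative, and the file is placed in the system temporary directory as an assumption of this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Root tag and nested sub-elements, as described above.
root = ET.Element("chess")
opening = ET.SubElement(root, "Opening")
e4 = ET.SubElement(opening, "E4")
d4 = ET.SubElement(opening, "D4")

# Attribute name, values and text are hypothetical.
e4.set("type", "Accepted")
d4.set("type", "Declined")
e4.text = "King's Gambit Accepted"
d4.text = "Queen's Gambit Declined"

# With the default encoding, tostring() returns a bytes object.
xml_bytes = ET.tostring(root)

out_path = os.path.join(tempfile.gettempdir(), "gameofsquares.xml")
with open(out_path, "wb") as f:   # binary mode, because we write bytes
    f.write(xml_bytes)

print(xml_bytes.decode("ascii"))
```

Calling ET.tostring(root, encoding="unicode") would return a str instead, in which case the file would be opened in text mode.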
