Python open xml as text

Reading and Writing XML Files in Python

Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write XML files in python.

Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a xml file we are referring to parsing the XML document.

In this article, we would take a look at two libraries that could be used for the purpose of xml parsing. They are:

Using BeautifulSoup alongside with lxml parser

For the purpose of reading and writing the xml file we would be using a Python library named BeautifulSoup. In order to install the library, type the following command into the terminal.

pip install beautifulsoup4

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser (used for parsing XML/HTML documents). lxml could be installed by running the following command in the command processor of your Operating system:

Читайте также:  Php check permissions file

Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it.

Reading Data From an XML File

There are two steps required to parse a xml file:-

XML File used:

Python3

Writing an XML File

Writing a xml file is a primitive process, reason for that being the fact that xml files aren’t encoded in a special way. Modifying sections of a xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned xml document.

Python3

Using Elementtree

Elementtree module provides us with a plethora of tools for manipulating XML files. The best part about it being its inclusion in the standard Python’s built-in library. Therefore, one does not have to install any external modules for the purpose. Due to the xmlformat being an inherently hierarchical data format, it is a lot easier to represent it by a tree. The module provides ElementTree provides methods to represent whole XML document as a single tree.

In the later examples, we would take a look at discrete methods to read and write data to and from XML files.

Reading XML Files

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree.parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot(). Then displayed (printed) the root tag of our xml file (non-explicit way). Then displayed the attributes of the sub-tag of our parent tag using root[0].attrib. root[0] for the first tag of parent root and attrib for getting it’s attributes. Then we displayed the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.

Python3

Writing XML Files

Now, we would take a look at some methods which could be used to write data on an xml document. In this example we would create a xml file from scratch.

To do the same, firstly, we create a root (parent) tag under the name of chess using the command ET.Element(‘chess’). All the tags would fall underneath this tag, i.e. once a root tag has been defined, other sub-elements could be created underneath it. Then we created a subtag/subelement named Opening inside the chess tag using the command ET.SubElement(). Then we created two more subtags which are underneath the tag Opening named E4 and D4. Then we added attributes to the E4 and D4 tags using set() which is a method found inside SubElement(), which is used to define attributes to a tag. Then we added text between the E4 and D4 tags using the attribute text found inside the SubElement function. In the end we converted the datatype of the contents we were creating from ‘xml.etree.ElementTree.Element’ to bytes object, using the command ET.tostring() (even though the function name is tostring() in certain implementations it converts the datatype to `bytes` rather than `str`). Finally, we flushed the data to a file named gameofsquares.xml which is a opened in `wb` mode to allow writing binary data to it. In the end, we saved the data to our file.

Источник

XML Files Handling

XML Files Handling in Python

XML files in Python allow for the manipulation and parsing of XML data. XML (Extensible Markup Language) is a widely used data interchange format.

Open XML File and Read Data with Python

To read data from an XML file with Python, you can use in-built XML parser module. The most commonly used libraries for parsing XML files are lxml and ElementTree.

Using lxml library

The lxml library is popular and efficient for parsing XML files. You can install the lxml library by using the pip command.

from lxml import etree root = etree.parse('file.xml') for elem in root.iter(): print(elem.tag, elem.text) 

Using ElementTree

ElementTree is an in-built library that allows parsing of XML files. With ElementTree, come built-in modules that allow parsing and creating of elements. To use the ElementTree library, you need to import it.

import xml.etree.ElementTree as ET tree = ET.parse('file.xml') root = tree.getroot() for elem in root: print(elem.tag, elem.text) 

By using either of these methods, you can read XML files efficiently.

How to Write XML

To write XML in Python, you can use the ElementTree XML API library. Here are two code examples to demonstrate how to create and write XML:

Example 1: Creating and Writing XML in Python

import xml.etree.cElementTree as ET ### Create XML element tree root = ET.Element("Person") name = ET.SubElement(root, "Name") name.text = "John" age = ET.SubElement(root, "Age") age.text = "30" ### Write XML element tree to file tree = ET.ElementTree(root) tree.write("person.xml") 

Example 2: Creating and Writing XML with Attributes

import xml.etree.cElementTree as ET ### Create XML element tree with attributes root = ET.Element("Person", ) name = ET.SubElement(root, "Name") name.text = "Jane" age = ET.SubElement(root, "Age") age.text = "25" ### Write XML element tree to file with custom formatting tree = ET.ElementTree(root) tree.write("person.xml", encoding="utf-8", xml_declaration=True) 

In both examples, the ElementTree() class is used to create an XML element tree . The write() method is then used to write the element tree to an XML file. By specifying encoding and xml_declaration in the second example, a custom-formatted XML file is created with an XML declaration at the top.

How to Convert XML to JSON

Converting XML to JSON is a common task that can be achieved easily.

The xmltodict module allows us to convert an XML document into a dictionary, which can then be easily converted into JSON using the built-in json module. Below is an example code snippet that demonstrates how to use this approach:

import xmltodict import json # Load XML file with open('data.xml') as xml_file: xml_data = xml_file.read() # Convert XML to Python dictionary dict_data = xmltodict.parse(xml_data) # Convert dictionary to JSON json_data = json.dumps(dict_data) # Output JSON data print(json_data) 

The xml.etree.ElementTree module allows us to parse the XML document and create an Element object, which can be traversed to get the required data. Once we have the data as a dictionary, we can use the json module to convert it to JSON. Here is an example code snippet that demonstrates how to use this approach:

import xml.etree.ElementTree as ET import json # Load XML file tree = ET.parse('data.xml') root = tree.getroot() # Traverse the Element object to get required data xml_dict = <> for child in root: xml_dict[child.tag] = child.text # Convert dictionary to JSON json_data = json.dumps(xml_dict) # Output JSON data print(json_data) 

How to Convert XML to CSV

To convert XML to CSV, you can use the xml.etree.ElementTree module and the csv module. Here are two code examples to help you get started:

Example 1: Using ElementTree and CSV modules

import csv import xml.etree.ElementTree as ET ### Open the XML file tree = ET.parse('example.xml') root = tree.getroot() ### Open the CSV file csv_file = open('example.csv', 'w') csvwriter = csv.writer(csv_file) ### Write the column headers header = [] for child in root[0]: header.append(child.tag) csvwriter.writerow(header) ### Write the data rows for item in root.findall('.//item'): row = [] for child in item: row.append(child.text) csvwriter.writerow(row) ### Close the CSV file csv_file.close() 

Example 2: Using pandas

import pandas as pd import xml.etree.ElementTree as ET ### Load the XML file into a dataframe tree = ET.parse('example.xml') root = tree.getroot() dfcols = ['name', 'email', 'phone'] df = pd.DataFrame(columns=dfcols) for node in root: name = node.find('name').text email = node.find('email').text phone = node.find('phone').text df = df.append( pd.Series([name, email, phone], index=dfcols), ignore_index=True) ### Save the dataframe to a CSV file df.to_csv('example.csv', index=False) 

In both of these examples, the xml.etree.ElementTree module is used to parse the XML file and extract the data. The csv module (in Example 1) or the pandas library (in Example 2) is used to write the data to a CSV file.

Источник

Оцените статью