Validating XML against an XSD schema in Python

XML Schema validator and data conversion library for Python

sissaschool/xmlschema

The xmlschema library is an implementation of XML Schema for Python (supports Python 3.7+).

This library arose from the need for a solid Python layer for processing XML Schema based files in the MaX (Materials design at the Exascale) European project. A significant problem is encoding and decoding the XML data files produced by different simulation software. Another important requirement is XML data validation, to keep the produced data under control. The lack of a suitable Python alternative for schema-based decoding of XML data led to the creation of this library. The library is also useful for other XML Schema based processing, beyond its original scope.

Features

This library includes the following features:

  • Full XSD 1.0 and XSD 1.1 support
  • Building of XML schema objects from XSD files
  • Validation of XML instances against XSD schemas
  • Decoding of XML data into Python data and to JSON
  • Encoding of Python data and JSON to XML
  • Data decoding and encoding ruled by converter classes
  • An XPath-based API for finding schema elements and attributes
  • Support of XSD validation modes strict/lax/skip
  • XML attacks protection using an XMLParser that forbids entities
  • Access control on resources addressed by a URL or filesystem path
  • XML data bindings based on DataElement class
  • Static code generation with Jinja2 templates

Installation

You can install the library with pip in a Python 3.7+ environment:

    pip install xmlschema

The library uses Python’s ElementTree XML library and requires the additional elementpath package. The base schemas of the XSD standards are included in the package for working offline and to speed up the building of schema instances.

Usage

Import the library and then create a schema instance, passing the path of the file containing the schema as the argument:

    >>> import xmlschema
    >>> my_schema = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')

For XSD 1.1 schemas use the class XMLSchema11, because the default class XMLSchema is an alias of the XSD 1.0 validator class XMLSchema10.

The schema can be used to validate XML documents:

    >>> my_schema.is_valid('tests/test_cases/examples/vehicles/vehicles.xml')
    True
    >>> my_schema.is_valid('tests/test_cases/examples/vehicles/vehicles-1_error.xml')
    False
    >>> my_schema.validate('tests/test_cases/examples/vehicles/vehicles-1_error.xml')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/brunato/Development/projects/xmlschema/xmlschema/validators/xsdbase.py", line 393, in validate
        raise error
    xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating <Element '...cars' at 0x7f8032768458> with XsdGroup(model='sequence').

    Reason: character data between child elements not allowed!

    Schema:
      ...

    Instance:
      ...
        NOT ALLOWED CHARACTER DATA
      ...

Using a schema you can also decode XML documents to nested dictionaries, with values that match the data types declared by the schema:

    >>> import xmlschema
    >>> from pprint import pprint
    >>> xs = xmlschema.XMLSchema('tests/test_cases/examples/collection/collection.xsd')
    >>> pprint(xs.to_dict('tests/test_cases/examples/collection/collection.xml'))
    {...,
     ...: [{...,
            'estimation': Decimal('10000.00'),
            'position': 1,
            'title': 'The Umbrellas',
            'year': '1886'},
           {...,
            'position': 2,
            'title': None,
            'year': '1925'}]}

Authors

Davide Brunato and others who have contributed code or sample cases.

License

This software is distributed under the terms of the MIT License. See the file ‘LICENSE’ in the root directory of the present distribution, or http://opensource.org/licenses/MIT.

Validating XML using lxml in Python

Often when working with XML documents, it’s required that we validate our document with a predefined schema. These schemas usually come in XSD (XML Schema Definition) files and while there are commercial and open source applications that can do these validations, it’s more flexible and a good learning experience to do it using Python.

Prerequisites

You need Python installed (I’ll be using Python 3, but the code should work in Python 2 with minimal modifications). You’ll also need the lxml package to handle schema validation. You can install it using pip:

    pip install lxml

Importing and using lxml

For XML schema validation, we need the etree module from the lxml package. Let’s also import StringIO from the io package for passing strings as files to etree , as well as sys for handling input.

    from lxml import etree
    from io import StringIO
    import sys

I prefer giving file names as command line arguments to the python file as it simplifies the handling:

    filename_xml = sys.argv[1]
    filename_xsd = sys.argv[2]
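If you want a usage message and an error for missing arguments, the standard library's argparse module is one alternative to reading `sys.argv` directly. This is only a sketch: the argument names `xml_file` and `xsd_file` and the sample file names are my own choices, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: XML file first, XSD file second."""
    parser = argparse.ArgumentParser(
        description="Validate an XML file against an XSD schema."
    )
    parser.add_argument("xml_file", help="path of the XML document to check")
    parser.add_argument("xsd_file", help="path of the XSD schema")
    # With argv=None, argparse reads the real command line (sys.argv[1:]).
    return parser.parse_args(argv)


# Passing an explicit list makes the function easy to try out.
args = parse_args(["document.xml", "schema.xsd"])
print(args.xml_file, args.xsd_file)
```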

Let’s open and read both files:

    # open and read schema file
    with open(filename_xsd, 'r') as schema_file:
        schema_to_check = schema_file.read()

    # open and read xml file
    with open(filename_xml, 'r') as xml_file:
        xml_to_check = xml_file.read()

Parsing XML and XSD files

We can parse the XML files/schemas using the etree.parse() method, and we can load the schema to memory using etree.XMLSchema() . As schemas usually arrive well-formed and correctly formatted, I skipped error checking here for the schema parsing.

    xmlschema_doc = etree.parse(StringIO(schema_to_check))
    xmlschema = etree.XMLSchema(xmlschema_doc)
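If you do want error checking when loading the schema, a sketch along these lines may help. The helper name `load_schema` and the sample schemas are hypothetical. The sketch assumes the schema is available as bytes and distinguishes a schema that is not well-formed XML from one that is well-formed but not a usable XSD.

```python
from lxml import etree


def load_schema(xsd_bytes):
    """Return an etree.XMLSchema, or None if the schema cannot be used."""
    try:
        schema_doc = etree.fromstring(xsd_bytes)
        return etree.XMLSchema(schema_doc)
    except etree.XMLSyntaxError as err:
        print("Schema is not well-formed XML:", err)
    except etree.XMLSchemaParseError as err:
        print("Schema is well-formed but not a valid XSD:", err)
    return None


GOOD_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting" type="xs:string"/>
</xs:schema>"""

# Refers to a type that does not exist in the XML Schema namespace.
BAD_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="greeting" type="xs:nosuchtype"/>
</xs:schema>"""

# Truncated document, so it is not even well-formed XML.
BROKEN_XSD = b"<xs:schema"

good_schema = load_schema(GOOD_XSD)
bad_schema = load_schema(BAD_XSD)
broken_schema = load_schema(BROKEN_XSD)
```

Note that `etree.parse()` also accepts a file path directly, so reading the file into a string first is optional.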

Next is the parsing of the actual XML document. I usually do error checking here to catch syntax errors and XML documents that are not well-formed. lxml throws an etree.XMLSyntaxError exception if it finds errors in the XML document and provides an error_log in the exception. We can write this to a file to check the incorrect lines and tags:

    # parse xml
    try:
        doc = etree.parse(StringIO(xml_to_check))
        print('XML well formed, syntax ok.')

    # check for file IO error (stop here, since doc was not created)
    except IOError:
        print('Invalid File')
        quit()

    # check for XML syntax errors
    except etree.XMLSyntaxError as err:
        print('XML Syntax Error, see error_syntax.log')
        with open('error_syntax.log', 'w') as error_log_file:
            error_log_file.write(str(err.error_log))
        quit()

    except:
        print('Unknown error, exiting.')
        quit()
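To see what the syntax error carries, the sketch below parses a deliberately broken document (invented for this example) and reads the entries of `err.error_log` one by one instead of writing the whole log to a file. It relies on the `line`, `column` and `message` attributes of the log entries.

```python
from lxml import etree

# The closing tag on the third line is misspelled on purpose.
BROKEN_XML = b"<root>\n  <item>one</item>\n  <item>two</itm>\n</root>"

caught = False
syntax_errors = []
try:
    etree.fromstring(BROKEN_XML)
except etree.XMLSyntaxError as err:
    caught = True
    # Each log entry records where the parser ran into trouble and why.
    syntax_errors = [
        (entry.line, entry.column, entry.message) for entry in err.error_log
    ]

for line, column, message in syntax_errors:
    print(f"line {line}, column {column}: {message}")
```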

Validating with Schema

In the final step we can validate our XML document against the XSD schema using the assertValid method of etree.XMLSchema . This method takes our parsed XML file (the variable doc above) and tries to validate it against the schema definitions. If validation fails, it throws an etree.DocumentInvalid exception with an error_log object, as above. We can also write this to a file to check any invalid tags or values.

    # validate against schema
    try:
        xmlschema.assertValid(doc)
        print('XML valid, schema validation ok.')

    except etree.DocumentInvalid as err:
        print('Schema validation error, see error_schema.log')
        with open('error_schema.log', 'w') as error_log_file:
            error_log_file.write(str(err.error_log))
        quit()

    except:
        print('Unknown error, exiting.')
        quit()
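As an alternative to assertValid, etree.XMLSchema also offers a `validate()` method that returns a boolean and leaves the details in the schema's `error_log`. The sketch below shows that variant; the `order` schema, the sample documents and the helper name `check` are all made up for illustration.

```python
from lxml import etree

# Hypothetical schema: quantity must be a positive integer.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))


def check(xml_bytes):
    """Return a list of error strings; an empty list means the XML is valid."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    # error_log holds the errors recorded by the most recent validation.
    return [f"line {entry.line}: {entry.message}" for entry in schema.error_log]


good = check(b"<order><quantity>3</quantity></order>")
bad = check(b"<order><quantity>-1</quantity></order>")

print(good)
for message in bad:
    print(message)
```

Returning a list rather than raising keeps the check easy to reuse when many files are validated in a loop.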

You can save this script (e.g. as ‘validation.py’) and run it with the XML file as the first argument and the XSD file as the second:

    python validation.py <xml_file> <xsd_file>

Any errors will be written to the ‘error_syntax.log’ and ‘error_schema.log’ files (in the directory you run the script from) with timestamps, line numbers and a detailed explanation of the validation errors. You can correct your XML documents and then validate them again with this script.

lxml is an extensive and flexible package for handling and processing XML and related files. See the lxml documentation for tutorials, references and more information.

Emre is a part-time MBA Big Data & Business Analytics student at UvA and a full-time business intelligence specialist.
He’s on a journey to become a better data scientist.
