Schematron: A language for validating XML
Ebook, 457 pages, 4 hours

About this ebook

Schematron is a validation language that checks XML documents against business rules. It extends the validation provided by languages such as Document Type Definitions (DTD), W3C XML Schema, and RELAX NG, giving you the ability to check your XML documents for compliance with rules that can be difficult, if not impossible, to check with the other validation languages.


Schematron: A language for validating XML is aimed at programmers and others who process XML. It explains the language in detail along with many examples. Anyone who uses Schematron or who would like to begin using it will find a wealth of information in this book.

Language: English
Publisher: XML Press
Release date: Oct 25, 2022
ISBN: 9781937434816
Author

Erik Siegel

Erik Siegel runs Xatapult, a consultancy that offers coaching, training, applications, and more to the publishing world.

    Schematron

    Table of Contents

    1. Preface

    1.1. Who is this book for?

    1.2. How to use this book

    1.3. Using and finding the code examples

    1.4. The oXygen IDE

    1.5. Contact information

    1.6. Acknowledgements

    2. Introduction to Schematron

    2.1. What is Schematron?

    2.2. Why Schematron?

    2.3. The history of Schematron in a nutshell

    2.4. The Schematron standard

    2.4.1. A critical note on the standard

    2.5. The Schematroll

    2.6. An illustrative example

    2.6.1. The example document

    2.6.2. Running the examples

    2.6.3. Checking the value of the code attribute

    2.6.4. Improving the message

    2.6.5. Using variables

    3. Schematron in context

    3.1. Validating XML documents

    3.1.1. Restricting XML freedom

    3.1.2. How to define XML formats

    3.1.3. Validating XML documents

    3.2. The main schema languages

    3.2.1. Document Type Definition (DTD)

    3.2.2. W3C XML Schema

    3.2.2.1. W3C XML Schema version 1.1

    3.2.3. RELAX NG

    3.2.4. Schematron

    4. Applying Schematron

    4.1. Creating Schematron schemas

    4.2. IDE-based Schematron validation

    4.3. Raw Schematron validation

    4.4. Validation results

    4.4.1. SVRL

    4.4.2. XVRL

    4.5. Schematron validation processors

    4.5.1. The skeleton XSLT Schematron processor

    4.5.2. The SchXslt Schematron processor

    4.5.2.1. Running SchXslt from the command line

    5. Schematron basics

    5.1. Setting up a Schematron schema

    5.2. Patterns, rules, assertions, and reports

    5.2.1. The Schematron processing algorithm

    5.2.2. Rule processing

    5.2.3. Assert and Report processing

    5.2.3.1. Using reports instead of assertions

    5.2.3.2. The message texts

    5.3. More meaningful messages:

    5.4. Declaring and using variables:

    5.4.1. Ground rules for variables

    5.4.2. Variable usage example

    5.5. Declaring namespaces

    6. Advanced Schematron

    6.1. Providing multiple messages:

    6.1.1. Multiple messages

    6.1.2. Localization of messages

    6.2. Selecting what patterns are active:

    6.3. Reusing rules: abstract rules

    6.3.1. Abstract rules in external documents

    6.3.2. Alternative to abstract rules

    6.4. Reusing patterns: Abstract patterns

    6.5. Including documents:

    7. Query language binding and using XSLT

    7.1. Introduction to query language binding

    7.2. Using XSLT in Schematron

    7.2.1. Using XSLT keys

    7.2.2. Using XSLT functions

    7.2.3. Using other XSLT features

    8. Additional features

    8.1. Messages with markup: , , and

    8.2. Flags

    8.3. Properties

    8.4. Adding structured comments: and <p>

    8.5. Validating documents referenced by XInclude

    8.6. Specifying a role: the role attribute

    8.7. Specify a different location: the subject attribute

    8.8. The element

    8.9. The icon, see, and fpi attributes

    9. Schematron examples and recipes

    9.1. Validating a Schematron schema

    9.2. Handling a default namespace

    9.3. Using Schematron for schema validation

    9.4. Checking multiple identifier references

    9.5. Validating processing instructions and comments

    9.6. Validating doubled elements in mixed content

    A. XPath technology primer

    A.1. XML as a tree

    A.1.1. Basic trees: documents, elements, and text

    A.1.2. Attributes in the tree

    A.1.3. Representing mixed content

    A.2. Basic tree navigation

    A.2.1. Using navigation expressions

    A.2.2. Basic navigation to attributes

    A.2.3. Multiple selections: sequences

    A.3. Tree navigation and the context item

    A.4. Some special operators

    A.4.1. The context item single dot . operator

    A.4.2. The parent double dot .. operator

    A.4.3. The * and @* wildcard operators

    A.4.4. The search // operator

    A.5. Predicates in tree navigation

    A.6. Expressions on simple data

    A.6.1. The XPath function library

    A.6.2. XPath data types

    A.6.2.1. Explicit data typing and data type conversions

    A.6.3. Numerical expressions

    A.6.4. String expressions

    A.6.5. Boolean expressions and comparisons

    A.6.5.1. Testing sequences

    A.6.5.2. Testing string values

    A.6.6. Working with dates, times and durations

    A.6.6.1. Additional date, time, and duration functions

    B. An introduction to namespaces

    B.1. Why namespaces?

    B.2. Namespace names

    B.3. Declaring namespaces

    B.3.1. Defining a default namespace

    B.3.2. Defining and using namespace prefixes

    B.3.3. The XML namespace

    B.4. Namespaces in Schematron

    C. Schematron reference

    C.1. XML structure overview notation

    C.2. The Schematron namespace

    C.2.1. Using other namespaces

    C.3. Root element:

    C.4. Declaring a namespace:

    C.5. Defining validation phases:

    C.5.1. Attaching a single pattern to a phase:

    C.6. Creating validation patterns:

    C.6.1. Rules in validation patterns:

    C.6.1.1. Defining assertions:

    C.6.1.2. Defining reports:

    C.6.1.3. Referencing an abstract rule:

    C.6.2. Parameters for abstract patterns:

    C.7. Defining diagnostic messages:

    C.7.1. Diagnostic message:

    C.8. Defining additional properties:

    C.8.1. Defining a property:

    C.9. Message markup: mixed contents

    C.9.1. Writing direction:

    C.9.2. Emphasis:

    C.9.3. Retrieving the name of a node:

    C.9.4. Spanning text:

    C.9.5. Value of an XPath expression:

    C.10. Standard attributes and elements

    C.10.1. The flag attribute

    C.10.2. The fpi attribute

    C.10.3. The icon attribute

    C.10.4. The role attribute

    C.10.5. The subject attribute

    C.10.6. The see attribute

    C.10.7. The xml:lang attribute

    C.10.8. The xml:space attribute

    C.10.9. The element

    C.10.10. The element

    C.10.11. The element

    C.10.12. The element

    D. SVRL reference

    D.1. SVRL in the Schematron standard

    D.2. The SVRL namespace

    D.3. Root element:

    D.4. Namespace declarations:

    D.5. Active patterns:

    D.6. Fired rules:

    D.7. Failed asserts:

    D.7.1. Diagnostic reference:

    D.7.2. Property reference:

    D.8. Successful reports:

    D.9. Mixed text:

    D.9.1. Writing direction:

    D.9.2. Emphasis:

    D.9.3. Spanning text:

    E. Schematron QuickFix

    F. Additional resources

    F.1. Schematron

    F.2. XPath

    F.3. XML Schema

    F.4. RELAX NG

    F.5. XSLT

    F.6. XQuery

    F.7. XProc

    F.8. Other information

    G. Copyright and Legal Notices

    Schematron

    A Language for Validating XML

    Erik Siegel

    Chapter 1. Preface

    Computer systems process, read, exchange, and spit out data like the contents of this book, your tax form, electronic health records, social media messages, and much more. All such computer data must be in some kind of well-defined format, so it can be understood and processed correctly.

    Computer data formats can be organized into families according to the ground rules they follow. This is analogous to natural languages. In the Western world we have a limited character set (A-Z, sometimes a few more), and we write from left to right. Other language families, for example Chinese, have a much larger character set and a different writing orientation. If you speak a Western language but don’t understand French, you can’t comprehend text in French, but you can still recognize constructs from the Western language family, such as characters, words, sentences, and paragraphs. For Chinese, that will be much more difficult.

    Computer data format families work the same way. Examples of data format families include EDI, CSV, JSON, and XML. Each has its own syntax rules, application domains, pros, cons, and fan club. And just as a French person and a Swedish person cannot easily communicate, even though their languages are in the same family, a computer system that speaks XML language A cannot understand XML language B. Both people and computers can recognize familiar language constructs, but they cannot understand each other unless they know the specific language being used.

    In the natural language world we can sometimes partly solve that communication problem with a little improvisation, such as sign language or gestures. However, for computers that’s not so easy. What goes in and what comes out must be correct and understandable. The process of establishing this correctness for computer data is called validation.

    Validation is usually done in phases:

    First the data is checked to see whether it follows the language-family ground rules.

    Then checks are done to determine whether it’s in the correct language and follows the grammar of that language.

    And, optionally, business rules are checked to determine whether the data makes sense.
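    To make the three phases concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It is not Schematron and not how production validators are built. The order element, its quantity and max attributes, and the rule that the quantity may not exceed the maximum are all hypothetical. The grammar check is also hand-written, where in practice you would use a DTD, W3C XML Schema, or RELAX NG schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML language for illustration: <order quantity="..." max="..."/>


def check_well_formed(text):
    """Phase 1: does the text follow the XML family ground rules (well-formedness)?"""
    try:
        return ET.fromstring(text), []
    except ET.ParseError as err:
        return None, [f"not well-formed: {err}"]


def check_grammar(root):
    """Phase 2: is it the expected XML language? A hand-written stand-in for a
    DTD, W3C XML Schema, or RELAX NG schema."""
    errors = []
    if root.tag != "order":
        errors.append(f"unexpected root element <{root.tag}>")
    for name in ("quantity", "max"):
        value = root.get(name)
        if value is None or not value.isdecimal():
            errors.append(f"attribute {name} must be a non-negative integer")
    return errors


def check_business_rules(root):
    """Phase 3 (optional): does the data make sense? This is Schematron's territory."""
    quantity, maximum = int(root.get("quantity")), int(root.get("max"))
    if quantity > maximum:
        return [f"quantity {quantity} exceeds the maximum of {maximum}"]
    return []


def validate(text):
    """Run the phases in order; a phase only runs if the previous one passed."""
    root, errors = check_well_formed(text)
    if errors:
        return errors
    errors = check_grammar(root)
    if errors:
        return errors
    return check_business_rules(root)


print(validate('<order quantity="12" max="10"/>'))
# prints ['quantity 12 exceeds the maximum of 10']
```

    The early returns mirror the idea that each phase is a precondition for the next. There is no point checking business rules on a document that is not even well-formed.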

    Table 1.1 compares these phases for natural and computer languages:

    Table 1.1 – Comparison of natural versus computer language validation

    The rules for computer data checks can be expressed in special formal computer languages called validation languages. Validation languages allow the computer system to validate whether the data it consumes or produces is correct and valid. Schematron, the subject of this book, is about validating XML, so let’s focus on that.

    For XML, checking the ground rules of the language family is an absolute precondition for any subsequent checks. An XML document that passes this stage is called well-formed.

    To check grammar you can choose from several XML validation languages. The most common ones are DTD, W3C XML Schema, and RELAX NG. If this stage is passed, an XML document is called valid.

    Schematron is a validation language that checks business rules. A set of checks written in the Schematron language is called a Schematron schema. There’s no official name for a document that passes this stage, but let’s call it Schematron valid.

    This process of validating XML documents is explained in much more detail in Chapter 3.

    A Schematron schema consists (mostly) of assertions. A simple example would be that, somewhere in an XML document, a given start date must always be before a given end date. You can write such an assertion in the Schematron language and then validate that it holds for your data. If it doesn’t, an error message is produced.
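    The start/end date rule can be mimicked in plain Python to show what such an assertion does: find the relevant elements, test a condition, and emit a message when the test fails. This is a sketch, not Schematron syntax. The project and phase elements and the name, start, and end attributes are invented for the example, and dates are assumed to be in ISO YYYY-MM-DD form.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical document: element names and the name/start/end attributes are assumptions.
DOC = """<project>
  <phase name="design" start="2021-01-10" end="2021-02-01"/>
  <phase name="build" start="2021-03-01" end="2021-02-15"/>
</project>"""


def assert_start_before_end(root):
    """For every element carrying both a start and an end date, assert start < end.
    Return one error message per element where the assertion fails."""
    messages = []
    for elem in root.iter():
        start, end = elem.get("start"), elem.get("end")
        if start is None or end is None:
            continue  # the rule only applies to elements that have both dates
        if not date.fromisoformat(start) < date.fromisoformat(end):
            messages.append(
                f"{elem.tag} '{elem.get('name')}': start date {start} "
                f"must be before end date {end}"
            )
    return messages


for message in assert_start_before_end(ET.fromstring(DOC)):
    print(message)
# prints: phase 'build': start date 2021-03-01 must be before end date 2021-02-15
```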

    This book explains how to create Schematron schemas and use them to check XML data for compliance with your business rules.

    1.1. Who is this book for?

    This book is for anyone who wants to learn Schematron or expand their existing knowledge of the language.

    I assume you have at least a basic knowledge of XML. That is, you know roughly what documents, elements, and attributes are, and you have seen data formatted like Example 1.1 before:

    Example 1.1 – An example XML document (schematron-book-code/data/invoices.xml)

    2020-11-11 total=10101.33>

      2020-10-04 total=10000.50 id=12345/>

      2020-10-16 total=100.83 id=56789/>

    Or like Example 1.2:

    Example 1.2 – Another example XML document (schematron-book-code/data/text.xml)

     

    greeting>Hello Schematron!

    If the examples above look like complete gibberish, this book is probably not for you. But if you know your way around XML at this level, read on.

    In writing this book, I assumed that Schematron is used by a wide variety of people with a broad range of backgrounds. Some will be XSLT or XQuery programmers. Some will previously have written schemas in other languages like W3C XML Schema or RELAX NG. Others will be less tech-savvy or newer to the XML world. This book tries to explain Schematron for both XML aficionados and less experienced people. A little programming experience helps but is not strictly necessary. If you have a basic familiarity with XML, basic Schematron should be no problem.

    For those who want to go beyond the basics, there’s also a lot to be gained. Schematron schemas can do amazing things, like combining data from many sources or even digging into databases. You can mix Schematron with programming languages like XSLT, unleashing all the power these languages provide. This book will teach you how to do this.

    If, after reading the above, you think Schematron might be interesting but are still a little unsure, please jump ahead to Section 2.6, An illustrative example. This will provide you with an example of basic Schematron usage. Hopefully this will give you a good picture.

    1.2. How to use this book

    There are several ways you can use this book. Here are a few suggestions:

    For those who are new to the XML world or have limited programming experience, this book contains two introductions to important topics:

    To create Schematron schemas, you need at least a basic knowledge of XPath. If you don’t know what this is and expressions like /data/entry or //invoice[1]/@id don’t make sense to you, please read the technology primer, Appendix A, first.
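    For a first taste of what those two expressions select, the sketch below evaluates rough equivalents with Python's xml.etree.ElementTree, which supports only a limited XPath subset. The document is hypothetical: only the names data, entry, invoice, and id come from the expressions. Note that //invoice[1] selects every invoice that is the first invoice child of its parent, not just the first invoice in the document. That would be (//invoice)[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the names data, entry, invoice, and id come from the text.
DOC = """<data>
  <entry>
    <invoice id="12345"/>
    <invoice id="56789"/>
  </entry>
  <entry>
    <invoice id="24680"/>
  </entry>
</data>"""

root = ET.fromstring(DOC)  # root is the document element, <data>

# /data/entry: the entry children of the document element, provided it is named data.
# ElementTree paths are evaluated relative to an element, so the root's name is checked by hand.
entries = root.findall("entry") if root.tag == "data" else []

# //invoice[1]/@id: the id attribute of every invoice that is the first invoice child
# of its parent. ElementTree paths cannot select attributes, so @id is read with .get().
first_invoice_ids = [invoice.get("id") for invoice in root.findall(".//invoice[1]")]

print(len(entries), first_invoice_ids)  # prints 2 ['12345', '24680']
```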

    Appendix B introduces another tricky subject in the XML world: namespaces. Even if you have nothing to do with Schematron, there’s a good chance you need to know something about namespaces when working with XML.

    If you want to know more about validating documents in general, using Schematron and other languages, read Chapter 2 and Chapter 3.

    If you want to know how to apply Schematron, that is, how to put a Schematron schema to work, read Chapter 4.

    Chapter 5 explains the basics of Schematron. This will provide you with enough information to get going and write simple, but nonetheless useful, Schematron schemas.

    Chapter 6, Chapter 7, and Chapter 8 go beyond the basics to teach you to create reusable schema components, integrate other programming languages, and use other nifty features.

    Chapter 9 contains more advanced examples and recipes.

    For those who need to see the nitty-gritty details of the language, please refer to the references in Appendix C and Appendix D.

    SQF (Schematron QuickFix), a valuable but not widely supported extension to Schematron, is explained in Appendix E.

    1.3. Using and finding the code examples

    The code examples in this book are at https://github.com/xatapult/schematron-book-code, a public GitHub repository. Feel free to download or clone the repository and use the code and data in any way you like. The examples that come from this repository contain a path between parentheses in the example’s title, like Example 1.3.

    Example 1.3 – Example document from the book code repository (schematron-book-code/data/parcels-valid.xml)

    100 delivery-date=2021-10-10>

      50 date=2021-09-29>

        A large quantity of mouth masks

     

      45 date=2021-09-12>

        Toys for children

     

    For example, if you’ve cloned the repository into /work/data, Example 1.3 can be found in /work/data/schematron-book-code/data/parcels-valid.xml.

    1.4. The oXygen IDE

    Applying Schematron, especially when you’re just beginning, is easiest using an IDE (Integrated Development Environment) that natively supports Schematron. Such an IDE helps you write schemas. You can run validations interactively and see the results in the user interface. As far as I know, the only IDE currently available (2022) that supports Schematron this way is oXygen.

    oXygen is not free software. However, it offers a trial license, so you can use it for a limited period of time to get acquainted. For more information, please visit https://www.oxygenxml.com.

    Using oXygen is by far the easiest way to follow the examples in this book and learn about Schematron. Therefore, I mention oXygen frequently. This may make it look like I own shares in the company, but I’m a paying customer myself. There simply seems to be no competition with regard to Schematron. If I’m wrong (maybe you just created the most awesome interactive Schematron validation IDE yourself…), please drop me an email and let me know.

    If you can’t or don’t want to use oXygen, you can also run a Schematron validation from the command line. See Section 4.5.2.1, Running SchXslt from the command line.

    1.5. Contact information

    This book was written by Erik Siegel (Xatapult, http://www.xatapult.com). You can reach me at [email protected].

    I would definitely like to hear from you. Whatever you have to say about this book, please drop me an email. Knowing that there are people who use what I have written keeps me motivated.

    1.6. Acknowledgements

    I would like to thank everyone who reviewed (parts of) this book and provided me with valuable feedback. In alphabetical order: Paul van Aalstede, Jennifer Flint, Tony Graham, Rick Jelliffe, Pieter Lamers, Pieter Masereeuw, Birgit Orthofer, and Andrew Sales. Also, many thanks to my editor and publisher, Richard Hamilton. His attention to detail, both technical and editorial, made this a much better book. Thanks to all of you for your time and effort!

    Chapter 2. Introduction to Schematron

    This chapter provides a high-level overview of Schematron. The last section, Section 2.6, An illustrative example, contains an example of Schematron usage to introduce you to the language itself.

    2.1. What is Schematron?

    Here’s an overview of Schematron’s main high-level characteristics:

    Schematron is a formal schema language in which you can express rules for XML documents.

    There are two types of rules:

    Assertions: when the condition for an assertion fails, an error message is issued.

    Reports: when the condition for a report holds, a report message is issued.

    In practice, you will use assertions more often than reports.

    In Schematron you define all the error and report messages in your own words.

    Schematron is expressed in XML: a Schematron schema is an XML document.

    Schematron allows you to specify the underlying language for its expressions. In practice, XPath is the only language supported.

    Schematron can, by design, incorporate constructs from other programming languages. However, most public implementations support XSLT only.
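    The difference between assertions and reports comes down to which outcome of the test produces a message. The following Python sketch mimics that behavior. It is not a Schematron implementation, the Check class and run_checks function are hypothetical, and a Python lambda stands in for the XPath test a real schema would contain. As in Schematron, the message texts are written by the schema author.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    kind: str                            # "assert" or "report"
    test: Callable[[ET.Element], bool]   # stands in for an XPath test
    message: str                         # the message text, written by you


def run_checks(context_nodes, checks):
    """An assert fires when its test is false; a report fires when its test is true."""
    output = []
    for node in context_nodes:
        for check in checks:
            outcome = check.test(node)
            fired = not outcome if check.kind == "assert" else outcome
            if fired:
                output.append(f"{check.kind}: {check.message}")
    return output


# Hypothetical document and rules, for illustration only.
root = ET.fromstring('<items><item price="10"/><item price="0"/></items>')
checks = [
    Check("assert", lambda e: float(e.get("price")) > 0, "an item must have a positive price"),
    Check("report", lambda e: float(e.get("price")) >= 10, "expensive item found"),
]
messages = run_checks(root.findall("item"), checks)
print(messages)
# prints ['report: expensive item found', 'assert: an item must have a positive price']
```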

    2.2. Why Schematron?

    There are a number of reasons why Schematron is such a useful tool in the XML toolbox. Here are the most important ones:

    Schematron is a relatively simple but powerful validation language. Basic Schematron (covered in Chapter 5) already has a wide field of application and is relatively easy to master.

    Schematron can go way beyond the validations of the classic validation languages like DTD, W3C XML Schema, and RELAX NG. It allows you to do extensive checks on XML structures and data that are not possible in other languages. Anything you can express as an XPath test can be used for validation purposes. More experienced users can take advantage of XSLT features such as keys and functions.

    In Schematron you define all the error or report messages yourself. For other validation languages you’re at the mercy of the validation processor’s implementer, and this often results in technically correct but, for users, obscure messages. In Schematron this is completely under your control. Messages can be enriched with computed text from or about the validated document by using XPath expressions.

    Since the messages are under your control, Schematron is often used to partially take over validations normally done by other validation languages. Messages can be tailored to the user’s knowledge level or context. So instead of:

    The content of element 'section' is not complete. One of '{para}' is expected

    You could tell the user:

    A section in a report must have at least one paragraph of text
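    As a rough illustration of an author-written message enriched with computed text, this Python sketch checks that every section has at least one para. When a section has none, it produces the friendly message above plus the section's position and title. The report/section/para/title structure is an assumption made for the example. In Schematron the same check would be an assertion with an XPath test rather than Python code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document structure: report, section, title, and para are assumptions.
DOC = """<report>
  <section><title>Introduction</title><para>Some text.</para></section>
  <section><title>Results</title></section>
</report>"""


def check_sections(root):
    """Assert that every section contains at least one para; return enriched messages."""
    messages = []
    for position, section in enumerate(root.iter("section"), start=1):
        if section.find("para") is None:  # the assertion's test failed
            # Computed text: pull the position and title from the validated document.
            title = section.findtext("title", default="(untitled)")
            messages.append(
                "A section in a report must have at least one paragraph of text "
                f"(section {position}, '{title}')"
            )
    return messages


messages = check_sections(ET.fromstring(DOC))
print(messages)
```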

    2.3. The history of Schematron in a nutshell
