Schematron: A language for validating XML
By Erik Siegel
About this ebook
Schematron is a validation language that checks XML documents against business rules. It extends the validation provided by languages such as Document Type Definitions (DTD), W3C XML Schema, and RELAX NG, giving you the ability to check your XML documents for compliance with rules that can be difficult, if not impossible, to check with the other validation languages.
Schematron: A language for validating XML is aimed at programmers and others who process XML. It explains the language in detail along with many examples. Anyone who uses Schematron or who would like to begin using it will find a wealth of information in this book.
Erik Siegel
Erik Siegel runs Xatapult, a consultancy that offers coaching, training, applications, and more to the publishing world.
Schematron
Table of Contents
1. Preface
1.1. Who is this book for?
1.2. How to use this book
1.3. Using and finding the code examples
1.4. The oXygen IDE
1.5. Contact information
1.6. Acknowledgements
2. Introduction to Schematron
2.1. What is Schematron?
2.2. Why Schematron?
2.3. The history of Schematron in a nutshell
2.4. The Schematron standard
2.4.1. A critical note on the standard
2.5. The Schematroll
2.6. An illustrative example
2.6.1. The example document
2.6.2. Running the examples
2.6.3. Checking the value of the code attribute
2.6.4. Improving the message
2.6.5. Using variables
3. Schematron in context
3.1. Validating XML documents
3.1.1. Restricting XML freedom
3.1.2. How to define XML formats
3.1.3. Validating XML documents
3.2. The main schema languages
3.2.1. Document Type Definition (DTD)
3.2.2. W3C XML Schema
3.2.2.1. W3C XML Schema version 1.1
3.2.3. RELAX NG
3.2.4. Schematron
4. Applying Schematron
4.1. Creating Schematron schemas
4.2. IDE-based Schematron validation
4.3. Raw Schematron validation
4.4. Validation results
4.4.1. SVRL
4.4.2. XVRL
4.5. Schematron validation processors
4.5.1. The skeleton XSLT Schematron processor
4.5.2. The SchXslt Schematron processor
4.5.2.1. Running SchXslt from the command line
5. Schematron basics
5.1. Setting up a Schematron schema
5.2. Patterns, rules, assertions, and reports
5.2.1. The Schematron processing algorithm
5.2.2. Rule processing
5.2.3. Assert and Report processing
5.2.3.1. Using reports instead of assertions
5.2.3.2. The message texts
5.3. More meaningful messages:
5.4. Declaring and using variables:
5.4.1. Ground rules for variables
5.4.2. Variable usage example
5.5. Declaring namespaces
6. Advanced Schematron
6.1. Providing multiple messages:
6.1.1. Multiple messages
6.1.2. Localization of messages
6.2. Selecting what patterns are active:
6.3. Reusing rules: abstract rules
6.3.1. Abstract rules in external documents
6.3.2. Alternative to abstract rules
6.4. Reusing patterns: Abstract patterns
6.5. Including documents:
7. Query language binding and using XSLT
7.1. Introduction to query language binding
7.2. Using XSLT in Schematron
7.2.1. Using XSLT keys
7.2.2. Using XSLT functions
7.2.3. Using other XSLT features
8. Additional features
8.1. Messages with markup:
8.2. Flags
8.3. Properties
8.4. Adding structured comments:
8.5. Validating documents referenced by XInclude
8.6. Specifying a role: the role attribute
8.7. Specify a different location: the subject attribute
8.8. The
8.9. The icon, see, and fpi attributes
9. Schematron examples and recipes
9.1. Validating a Schematron schema
9.2. Handling a default namespace
9.3. Using Schematron for schema validation
9.4. Checking multiple identifier references
9.5. Validating processing instructions and comments
9.6. Validating doubled elements in mixed content
A. XPath technology primer
A.1. XML as a tree
A.1.1. Basic trees: documents, elements, and text
A.1.2. Attributes in the tree
A.1.3. Representing mixed content
A.2. Basic tree navigation
A.2.1. Using navigation expressions
A.2.2. Basic navigation to attributes
A.2.3. Multiple selections: sequences
A.3. Tree navigation and the context item
A.4. Some special operators
A.4.1. The context item single dot . operator
A.4.2. The parent double dot .. operator
A.4.3. The * and @* wildcard operators
A.4.4. The search // operator
A.5. Predicates in tree navigation
A.6. Expressions on simple data
A.6.1. The XPath function library
A.6.2. XPath data types
A.6.2.1. Explicit data typing and data type conversions
A.6.3. Numerical expressions
A.6.4. String expressions
A.6.5. Boolean expressions and comparisons
A.6.5.1. Testing sequences
A.6.5.2. Testing string values
A.6.6. Working with dates, times and durations
A.6.6.1. Additional date, time, and duration functions
B. An introduction to namespaces
B.1. Why namespaces?
B.2. Namespace names
B.3. Declaring namespaces
B.3.1. Defining a default namespace
B.3.2. Defining and using namespace prefixes
B.3.3. The XML namespace
B.4. Namespaces in Schematron
C. Schematron reference
C.1. XML structure overview notation
C.2. The Schematron namespace
C.2.1. Using other namespaces
C.3. Root element:
C.4. Declaring a namespace:
C.5. Defining validation phases:
C.5.1. Attaching a single pattern to a phase:
C.6. Creating validation patterns:
C.6.1. Rules in validation patterns:
C.6.1.1. Defining assertions:
C.6.1.2. Defining reports:
C.6.1.3. Referencing an abstract rule:
C.6.2. Parameters for abstract patterns:
C.7. Defining diagnostic messages:
C.7.1. Diagnostic message:
C.8. Defining additional properties:
C.8.1. Defining a property:
C.9. Message markup: mixed contents
C.9.1. Writing direction:
C.9.2. Emphasis:
C.9.3. Retrieving the name of a node:
C.9.4. Spanning text:
C.9.5. Value of an XPath expression:
C.10. Standard attributes and elements
C.10.1. The flag attribute
C.10.2. The fpi attribute
C.10.3. The icon attribute
C.10.4. The role attribute
C.10.5. The subject attribute
C.10.6. The see attribute
C.10.7. The xml:lang attribute
C.10.8. The xml:space attribute
C.10.9. The
C.10.10. The
element
C.10.11. The
C.10.12. The
D. SVRL reference
D.1. SVRL in the Schematron standard
D.2. The SVRL namespace
D.3. Root element:
D.4. Namespace declarations:
D.5. Active patterns:
D.6. Fired rules:
D.7. Failed asserts:
D.7.1. Diagnostic reference:
D.7.2. Property reference:
D.8. Successful reports:
D.9. Mixed text:
D.9.1. Writing direction:
D.9.2. Emphasis:
D.9.3. Spanning text:
E. Schematron QuickFix
F. Additional resources
F.1. Schematron
F.2. XPath
F.3. XML Schema
F.4. RELAX NG
F.5. XSLT
F.6. XQuery
F.7. XProc
F.8. Other information
G. Copyright and Legal Notices
Chapter 1. Preface
Computer systems process, read, exchange, and spit out data like the contents of this book, your tax form, electronic health records, social media messages, and much more. All such computer data must be in some kind of well-defined format, so it can be understood and processed correctly.
Computer data formats can be organized into families, according to the ground rules they follow. This is analogous to natural languages. In the Western world we have a limited character set (A-Z, sometimes a few more), and we write from left to right. Other language families, for example Chinese, have a much larger character set and a different writing orientation. If you speak a Western language but don’t understand French, you can’t comprehend text in French, but you can still recognize constructs from the Western language family, such as characters, words, sentences, and paragraphs. For Chinese that will be much more difficult.
Computer data format families work the same way. Examples of data format families include EDI, CSV, JSON, and XML. Each has its own syntax rules, application domains, pros, cons, and fan club. And just as a French person and a Swedish person cannot easily communicate, even though their languages are in the same family, a computer system that talks XML language A cannot understand XML language B. People and computers both can perceive familiar language constructs, but they cannot understand what the other person/computer is saying unless they know the specific language being used.
In the natural language world we can sometimes partly solve that communication problem with a little improvisation, such as sign language or gestures. However, for computers that’s not so easy. What goes in and what comes out must be correct and understandable. The process of establishing this correctness for computer data is called validation.
Validation is usually done in phases:
First the data is checked to see whether it follows the language-family ground rules.
Then checks are done to determine whether it’s in the correct language and follows the grammar of that language.
And, optionally, business rules are checked to determine whether the data makes sense.
Table 1.1 compares these phases for natural and computer languages:
Table 1.1 – Comparison of natural versus computer language validation
The rules for computer data checks can be expressed in special formal computer languages called validation languages. Validation languages allow the computer system to validate whether the data it consumes or produces is correct and valid. Schematron, the subject of this book, is about validating XML, so let’s focus on that.
For XML, checking the ground rules of the language family is an absolute precondition for any subsequent checks. An XML document that passes this stage is called well-formed.
To check grammar you can choose from several XML validation languages. The most common ones are DTD, W3C XML Schema, and RELAX NG. If this stage is passed, an XML document is called valid.
Schematron is a validation language that checks business rules. A set of checks written in the Schematron language is called a Schematron schema. There’s no official name for a document that passes this stage, but let’s call it Schematron valid.
This process of validating XML documents is explained in much more detail in Chapter 3.
A Schematron schema consists (mostly) of assertions. A simple example would be that, somewhere in an XML document, a given start date must always be before a given end date. You can write such an assertion in the Schematron language and then validate that it’s true for your data. If it isn’t, an error message is produced.
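As a minimal sketch of such an assertion, the schema below checks a start date against an end date. The period element and its start and end attributes are assumptions made up for this illustration; they do not come from the book's code repository. The sketch also assumes a processor that supports the xslt2 query binding (XPath 2.0), which provides the xs:date() constructor and the lt comparison operator.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">

  <!-- Make the xs prefix available inside the XPath expressions -->
  <ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>

  <pattern>
    <!-- Hypothetical input: <period start="2021-09-12" end="2021-10-10"/> -->
    <rule context="period">
      <!-- The message is issued when the test evaluates to false -->
      <assert test="xs:date(@start) lt xs:date(@end)">
        The start date must be before the end date.
      </assert>
    </rule>
  </pattern>

</schema>
```

For a document containing the period element shown in the comment, the test holds and nothing is reported. Swap the two dates and the assertion fails, so the message is issued.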
This book explains how to create Schematron schemas and use them to check XML data for compliance with your business rules.
1.1. Who is this book for?
This book is for anyone who wants to learn Schematron or expand their existing knowledge of the language.
I assume you have at least a basic knowledge of XML. That is, you know roughly what documents, elements, and attributes are, and you have seen data formatted like Example 1.1 before:
Example 1.1 – An example XML document (schematron-book-code/data/invoices.xml)
10101.33
>
10000.50
id=12345
/>
100.83
id=56789
/>
Or like Example 1.2:
Example 1.2 – Another example XML document (schematron-book-code/data/text.xml)
<greeting>Hello Schematron!</greeting>
If the examples above look like complete gibberish, this book is probably not for you. But if you know your way around XML at this level, read on.
In writing this book, I assumed that Schematron is used by a wide variety of people with a broad range of backgrounds. Some will be XSLT or XQuery programmers. Some will previously have written schemas in other languages like W3C XML Schema or RELAX NG. Others will be less tech-savvy or newer to the XML world. This book tries to explain Schematron for both XML aficionados and people who are less experienced. I assume a basic familiarity with XML; a little programming experience helps but is not strictly necessary. If your experience fits this profile, basic Schematron should be no problem.
For those who want to go beyond the basics, there’s also a lot to be gained. Schematron schemas can do amazing things, like combining data from many sources or even digging into databases. You can mix Schematron with programming languages like XSLT, unleashing all the power these languages provide. This book will teach you how to do this.
If, after reading the above, you think Schematron might be interesting but are still a little unsure, please jump ahead to Section 2.6, An illustrative example. This will provide you with an example of basic Schematron usage. Hopefully this will give you a good picture.
1.2. How to use this book
There are several ways you can use this book. Here are a few suggestions:
For those who are new to the XML world or have limited programming experience, this book contains two introductions on important topics:
To create Schematron schemas, you need at least a basic knowledge of XPath. If you don’t know what this is and expressions like /data/entry or //invoice[1]/@id don’t make sense to you, please read the technology primer, Appendix A, first.
Appendix B introduces another tricky subject in the XML world: namespaces. Even if you have nothing to do with Schematron, there’s a good chance you need to know something about namespaces when working with XML.
If you want to know more about validating documents in general, using Schematron and other languages, read Chapter 2 and Chapter 3.
If you want to know how to apply Schematron, that is, how to put a Schematron schema to work, read Chapter 4.
Chapter 5 explains the basics of Schematron. This will provide you with enough information to get going and write simple, but nonetheless useful, Schematron schemas.
Chapter 6, Chapter 7, and Chapter 8 go beyond the basics to teach you to create reusable schema components, integrate other programming languages, and use other nifty features.
Chapter 9 contains more advanced examples and recipes.
For those who need to see the nitty-gritty details of the language, please refer to the references in Appendix C and Appendix D.
SQF (Schematron QuickFix), a not widely supported but nonetheless valuable extension to Schematron, is explained in Appendix E.
1.3. Using and finding the code examples
The code examples in this book are at https://github.com/xatapult/schematron-book-code, a public GitHub repository. Feel free to download or clone the repository and use the code and data in any way you like. The examples that come from this repository contain a path between parentheses in the example’s title, like Example 1.3.
Example 1.3 – Example document from the book code repository (schematron-book-code/data/parcels-valid.xml)
2021-10-10
>
2021-09-29
>
2021-09-12
>
For example, if you’ve cloned the repository in /work/data, Example 1.3 can be found in /work/data/schematron-book-code/data/parcels-valid.xml.
1.4. The oXygen IDE
Applying Schematron, especially when you’re just beginning, is easiest using an IDE (Integrated Development Environment) that natively supports Schematron. Such an IDE helps you write schemas. You can run validations interactively and see the results in the user interface. As far as I know, the only IDE currently available (2022) that supports Schematron this way is oXygen.
oXygen is not free software. However, it offers a trial license, so you can use it for a limited period of time to get acquainted. For more information, please visit https://www.oxygenxml.com.
Using oXygen is by far the easiest way to follow the examples in this book and learn about Schematron. Therefore, I mention oXygen frequently. This may look like I own shares in the company, but I’m a paying customer myself. There simply seems to be no competition with regard to Schematron. If I’m wrong (maybe you just created the most awesome interactive Schematron validation IDE yourself…), please drop me an email and let me know.
If you can’t or don’t want to use oXygen, you can also run a Schematron validation from the command line. See Section 4.5.2.1, Running SchXslt from the command line.
1.5. Contact information
This book was written by Erik Siegel (Xatapult, http://www.xatapult.com). You can reach me at [email protected].
I would definitely like to hear from you. Whatever you have to say about this book, please drop me an email. Knowing that there are people who use what I have written keeps me motivated.
1.6. Acknowledgements
I would like to thank everyone who reviewed (parts of) this book and provided me with valuable feedback. In alphabetical order: Paul van Aalstede, Jennifer Flint, Tony Graham, Rick Jelliffe, Pieter Lamers, Pieter Masereeuw, Birgit Orthofer, and Andrew Sales. Also, many thanks to my editor and publisher, Richard Hamilton. His attention to detail, both technical and editorial, made this a much better book. Thanks to all of you for your time and effort!
Chapter 2. Introduction to Schematron
This chapter provides a high-level overview of Schematron. The last section, Section 2.6, An illustrative example, contains an example of Schematron usage to introduce you to the language itself.
2.1. What is Schematron?
Here’s an overview of Schematron’s main high-level characteristics:
Schematron is a formal schema language in which you can express rules for XML documents.
There are two types of rules:
Assertions: when the condition for an assertion fails, an error message is issued.
Reports: when the condition for a report holds, a report message is issued.
In practice, you will use assertions more often than reports.
In Schematron you define all the error and report messages in your own words.
Schematron is expressed in XML: a Schematron schema is an XML document.
Schematron allows you to specify the underlying language for its expressions. In practice, XPath is the only language supported.
Schematron can, by design, incorporate constructs from other programming languages. However, most public implementations support XSLT only.
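The difference between the two rule types can be sketched in a few lines of Schematron. The invoice element and its id and amount attributes are hypothetical names chosen for this illustration, and the xslt2 query binding is assumed so that the XPath 2.0 gt operator is available.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <!-- Hypothetical input: <invoice id="12345" amount="10000.50"/> -->
    <rule context="invoice">

      <!-- Assertion: the message is issued when the test is FALSE -->
      <assert test="@id">
        An invoice must have an id attribute.
      </assert>

      <!-- Report: the message is issued when the test is TRUE -->
      <report test="number(@amount) gt 10000">
        This invoice is for more than 10000 and may need extra approval.
      </report>

    </rule>
  </pattern>
</schema>
```

Both messages are written by the schema author. The assert states what must hold, while the report flags a situation worth mentioning when it occurs.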
2.2. Why Schematron?
There are a number of reasons why Schematron is such a useful tool in the XML toolbox. Here are the most important ones:
Schematron is a relatively simple but powerful validation language. Basic Schematron (covered in Chapter 5) already has a wide field of application and is relatively easy to master.
Schematron can go way beyond the validations of the classic validation languages like DTD, W3C XML Schema, and RELAX NG. It allows you to do extensive checks on XML structures and data that are not possible in other languages. Anything you can express as an XPath test can be used for validation purposes. More experienced users can take advantage of XSLT features such as keys and functions.
In Schematron you define all the error or report messages yourself. For other validation languages you’re at the mercy of the validation processor’s implementer, and this often results in technically correct but, for users, obscure messages. In Schematron this is completely under your control. Messages can be enriched with computed text from or about the validated document by using XPath expressions.
Since the messages are under your control, Schematron is often used to partially take over validations normally done by other validation languages. Messages can be tailored to the user’s knowledge level or context. So instead of:
The content of element 'section' is not complete. One of '{para}' is expected
You could tell the user:
A section in a report must have at least one paragraph of text
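A rule producing the friendlier message might look like the sketch below. The section and para element names are taken from the messages quoted above; the id attribute on section is an assumption added here to show how a message can be enriched with text computed from the validated document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="section">
      <!-- Fails when a section has no para child element -->
      <assert test="para">
        A section in a report must have at least one paragraph of text
        (section: <value-of select="@id"/>)
      </assert>
    </rule>
  </pattern>
</schema>
```

The value-of element inserts the result of its XPath expression into the message, so the user can see which section is incomplete.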