Different types of schema languages

RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

Chapter 2: Simple Foundations Are Beautiful


While the different schema languages all operate on infoset views of documents, they have chosen different ways of defining constraints:

Constraints may be expressed as rules. In Schematron, for instance, a schema is a set of rules such as "the element named book must have an attribute named id, and this attribute's content must match this specific rule...".
Constraints may be expressed as a thorough description of each element and attribute, as in DTDs and W3C XML Schema: "it's an element named book and it has two attributes named id and available which look like this...".
Constraints may be expressed as patterns. Patterns are used to match the structures of permissible elements, attributes, and text nodes, much as the regular expressions used in programming can be used to match characters in text. We will cover this third way of defining constraints in detail in this book, as this is the method that RELAX NG uses.
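
As a rough illustration of the first, rule-based style, the Schematron rule quoted above could be sketched as follows. The sketch uses the ISO Schematron namespace, and the convention tested by the second assertion (ids starting with "b") is a hypothetical stand-in for "this specific rule", not something taken from a real vocabulary.

```xml
<?xml version="1.0"?>
<!-- Sketch of a rule-based schema; the "starts with b" convention is assumed for illustration. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="book">
      <!-- Rule 1: every book element must carry an id attribute. -->
      <assert test="@id">A book element must have an id attribute.</assert>
      <!-- Rule 2 (hypothetical): the content of id must follow a naming convention. -->
      <assert test="starts-with(@id, 'b')">The id of a book must start with "b".</assert>
    </rule>
  </pattern>
</schema>
```

Nothing here describes the book element as a whole: anything the rules do not mention is simply left unconstrained.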

The first XML schema language was the Document Type Definition (DTD), which was part of XML 1.0. DTDs provide more than just schema validation features--they include the definition of internal and external entities--but their schema features focus on describing elements. Every element and attribute used by the document type defined by the DTD must be described. Each element must have a content model, identifying which child elements or text nodes are allowed, as well as a list of permissible attributes, if any attributes are allowed. To avoid redundant declarations, DTD developers may use parameter entities, which describe larger pieces of content models and work like a kind of macro processing.
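
The following sketch shows what such declarations might look like in a small external DTD. The file name, element names, and content models are assumptions chosen for illustration. The parameter entity is referenced inside attribute-list declarations, which XML 1.0 permits only in the external subset, so this fragment is meant to live in its own file rather than in a document's internal subset.

```xml
<!-- book.dtd: hypothetical external DTD; names and content models are assumed for illustration. -->

<!-- Parameter entity holding an attribute definition shared by several elements. -->
<!ENTITY % common.attrs "id ID #REQUIRED">

<!-- Each element gets a content model and, where needed, a list of attributes. -->
<!ELEMENT library (book*)>

<!ELEMENT book (title, author+)>
<!ATTLIST book
  %common.attrs;
  available (true | false) #IMPLIED>

<!ELEMENT title (#PCDATA)>

<!ELEMENT author (#PCDATA)>
<!ATTLIST author %common.attrs;>
```

The reference `%common.attrs;` is replaced by its text before the declaration is interpreted, which is why parameter entities behave like macros.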

W3C XML Schema extends this foundation and defines several kinds of components, including elements, attributes, datatypes, groups of elements, and groups of attributes. (Datatypes are containers for various kinds of content, from text to integers to dates.) The approach is still very focused on elements and attributes, which are clearly differentiated.
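
To make these component kinds concrete, here is a minimal sketch of a W3C XML Schema that uses one of each. The component names (bookType, bookContent, bookAttributes) and the choice of a single title child are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: component names and content are assumed for illustration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- An element declaration, bound to a named type. -->
  <xs:element name="book" type="bookType"/>

  <xs:complexType name="bookType">
    <xs:group ref="bookContent"/>
    <xs:attributeGroup ref="bookAttributes"/>
  </xs:complexType>

  <!-- A group of elements. -->
  <xs:group name="bookContent">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- A group of attributes, each with a built-in datatype. -->
  <xs:attributeGroup name="bookAttributes">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="available" type="xs:boolean"/>
  </xs:attributeGroup>

</xs:schema>
```

Elements and attributes are declared with different constructs and collected in different kinds of groups, which is the clear differentiation mentioned above.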

RELAX NG, on the other hand, is based on the generic concept of patterns. Patterns are similar to XPath node sets: collections of nodes with an internal structure. For starters, a pattern could be defined as the description of a set of valid node sets.

The difference between patterns and the other approaches may seem subtle. When we define an element with a DTD or W3C XML Schema, we try to give a description of the element itself. When we define the same element with RELAX NG, we define a pattern which will be checked against elements in the instance document to see if they match, much as a regular expression is used to match text. The difference is minuscule on the surface, but the pattern approach gives us far more flexibility to write, maintain, and combine schemas.
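
A minimal sketch of such a pattern, written in the RELAX NG XML syntax, might look like this. The book element with its id and available attributes follows the running example; the title child and the decision to make available optional are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!-- A pattern that matches book elements; the content chosen here is assumed for illustration. -->
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- An attribute pattern with no content defaults to matching any text value. -->
  <attribute name="id"/>
  <!-- The same "optional" pattern can wrap attributes and elements alike. -->
  <optional>
    <attribute name="available"/>
  </optional>
  <element name="title">
    <text/>
  </element>
</element>
```

A book element in an instance document is valid if it matches this pattern. Because attributes, elements, and text are all just patterns, they can be combined with the same operators.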

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.
