RELAX NG by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
While the different schema languages all operate on infoset views of documents, they have chosen different ways of defining constraints:
Constraints may be expressed as rules. In Schematron, for instance, a schema is a set of rules like "the element named book must have an attribute named id and this attribute's content must match this specific rule...."
Constraints may be expressed as a thorough description of each element and attribute, as in DTDs and W3C XML Schema: "it's an element named book, and it has two attributes named id and available, which look like this...."
Constraints may be expressed as patterns. Patterns are used to match the structures of permissible elements, attributes, and text nodes, much as the regular expressions used in programming can be used to match characters in text. I will cover this third way of defining constraints in detail in this book because this is the method that RELAX NG uses.
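To make the first, rule-based style concrete, here is a minimal sketch in Python. It is not Schematron, which writes its rules in XML with XPath expressions. The rule list, the function name check_book, and the "b followed by digits" convention for id are all assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each pairs a message with a check on one book element.
# Anything the rules do not mention is simply left unconstrained.
RULES = [
    ("book must have an attribute named id",
     lambda book: "id" in book.attrib),
    ("id must be the letter b followed by digits",
     lambda book: book.get("id", "").startswith("b")
     and book.get("id", "")[1:].isdigit()),
]


def check_book(book):
    """Return the messages of every rule that the element violates."""
    return [message for message, holds in RULES if not holds(book)]


good = ET.fromstring('<book id="b42"/>')
bad = ET.fromstring('<book/>')

print(check_book(good))  # []
print(check_book(bad))   # both messages, since the id attribute is missing
```

The point of the sketch is that a rule-based schema is open by default: a document passes as long as no rule objects, whatever else it contains.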
The first XML schema language was the Document Type Definition (DTD), which was part of XML 1.0. DTDs provide more than just schema validation features—they include the definition of internal and external entities—but their schema features focus on describing elements. Every element and attribute used by the document type defined by the DTD must be described. Each element must have a content model, identifying which child elements or text nodes are allowed, as well as a list of permissible attributes, if any attributes are allowed. To avoid redundant declarations, DTD developers may use parameter entities, which describe larger pieces of content models and work like a kind of macro processing.
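The description-based style can be sketched in the same way. The table below is a loose imitation of DTD declarations and not a DTD parser: it reduces each content model to a set of permitted child names, ignoring order and cardinality, which real DTD content models do express. The element names and the function undeclared_items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations: every element that may appear is described by
# the child elements and the attributes it is allowed to carry.
DECLARATIONS = {
    "library": {"children": {"book"}, "attributes": set()},
    "book": {"children": {"title"}, "attributes": {"id", "available"}},
    "title": {"children": set(), "attributes": set()},
}


def undeclared_items(root):
    """List everything in the document that the declarations do not allow."""
    problems = []
    for element in root.iter():  # the root itself and all its descendants
        declaration = DECLARATIONS.get(element.tag)
        if declaration is None:
            problems.append(f"undeclared element: {element.tag}")
            continue
        for name in element.attrib:
            if name not in declaration["attributes"]:
                problems.append(f"attribute {name} not allowed on {element.tag}")
        for child in element:
            if child.tag not in declaration["children"]:
                problems.append(f"{child.tag} not allowed in {element.tag}")
    return problems


document = ET.fromstring(
    '<library><book id="b1" available="true"><title>T</title></book></library>'
)
print(undeclared_items(document))  # []
```

In contrast to the rule-based sketch, this style is closed by default: anything that has not been described is an error.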
W3C XML Schema extends this foundation and defines several kinds of components, including elements, attributes, datatypes, groups of elements, and groups of attributes. (Datatypes are containers for various kinds of content, from text to integers to dates.) The approach is still very focused on elements and attributes, which are clearly differentiated.
RELAX NG, on the other hand, is based on the generic concept of patterns. Patterns are best understood in relation to XPath node sets, which are collections of nodes with an internal structure. As a first approximation, a pattern can be defined as the description of a set of valid node sets.
The difference between patterns and the other approaches may seem subtle. A DTD or W3C XML Schema element definition tries to give a description of the element itself. When RELAX NG defines the same element, it defines a pattern that is checked against elements in the instance document to see if they match, much as a regular expression is used to match text. The difference is minuscule on the surface, but the pattern approach gives far more flexibility to write, maintain, and combine schemas.
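The regular-expression analogy can be made tangible with a toy matcher. The sketch below is an illustration only and does not reflect how any RELAX NG validator is implemented. All function names are hypothetical, text nodes and attribute values are ignored, and an element must carry exactly the listed attributes. Each pattern is a function that, given a list of sibling elements and a start position, returns the set of positions at which a match can end, in the manner of a backtracking regular-expression engine.

```python
import xml.etree.ElementTree as ET


def element(name, attributes=(), content=None):
    """Match one element by name, attribute names, and child content."""
    def match(nodes, i):
        if i >= len(nodes) or nodes[i].tag != name:
            return set()
        node = nodes[i]
        if set(node.attrib) != set(attributes):
            return set()
        children = list(node)
        if content is None:
            ok = not children  # no content pattern: no child elements allowed
        else:
            ok = len(children) in content(children, 0)  # must consume them all
        return {i + 1} if ok else set()
    return match


def group(*patterns):
    """Match the patterns one after another, like concatenation in a regex."""
    def match(nodes, i):
        positions = {i}
        for pattern in patterns:
            positions = {end for start in positions
                         for end in pattern(nodes, start)}
        return positions
    return match


def optional(pattern):
    """Match the pattern zero or one time, like ? in a regex."""
    def match(nodes, i):
        return {i} | pattern(nodes, i)
    return match


def one_or_more(pattern):
    """Match the pattern repeatedly, like + in a regex."""
    def match(nodes, i):
        reached, frontier = set(), {i}
        while frontier:
            new = {end for start in frontier
                   for end in pattern(nodes, start)} - reached
            reached |= new
            frontier = new
        return reached
    return match


def valid(pattern, root):
    """A document is valid if the pattern consumes exactly its root element."""
    return 1 in pattern([root], 0)


# Patterns are ordinary values, so they can be named, reused, and combined.
book = element("book", attributes=("id",), content=group(
    element("title"),
    one_or_more(element("author")),
    optional(element("isbn")),
))
library = element("library", content=one_or_more(book))

document = ET.fromstring(
    '<library><book id="b1"><title>T</title>'
    '<author>A</author><author>B</author></book></library>'
)
print(valid(library, document))  # True
```

Because book is just a value, it can be dropped into library or into any other pattern without being redeclared, which hints at the flexibility in combining schemas mentioned above.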
This text is released under the Free Software Foundation GFDL.