RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)



A multi-part standard

DSDL is still a work in progress. It is a multi-part specification in which each part presents a different schema language (except Part 1, which is an introduction, and Part 10, which describes the framework itself).

Part 3 of DSDL will describe the next release of the rule-based schema language known as Schematron. The current version of Schematron has been defined by Rick Jelliffe and other contributors as a language used to express sets of rules as XPath expressions (or, more accurately, as XSLT expressions, since XSLT functions such as document() are also supported in these expressions). Its home page is http://www.ascc.net/xml/schematron/.

Without going into the details of the language, we can say that a Schematron schema is composed of sets of rules named "patterns" (these patterns shouldn't be confused with RELAX NG patterns). Each pattern includes one or more rules. Each rule sets the context nodes under which tests will be performed, and each test is performed either as an assert or as a report. An assert is a test which raises an error if it is not verified, while a report is a test which raises an error if it is verified.
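
As a minimal sketch of the difference, the two tests below express the same constraint (a book must carry an id attribute), once as an assert and once as a report. The wording of the messages is illustrative only, and the sch prefix is assumed to be bound to the Schematron namespace:

 <sch:rule context="/library/book">
  <!-- assert: the error is raised when the test is false -->
  <sch:assert test="@id">A book should have an id.</sch:assert>
  <!-- report: the error is raised when the test is true -->
  <sch:report test="not(@id)">This book has no id.</sch:report>
 </sch:rule>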

A fragment of a Schematron schema for our library could be:

 <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:title>Schematron Schema for library</sch:title>
  <sch:pattern>
   <sch:rule context="/">
    <sch:assert test="library">The document element should be "library".</sch:assert>
   </sch:rule>
   <sch:rule context="/library">
    <sch:assert test="book">There should be at least a book!</sch:assert>
    <sch:assert test="not(@*)">No attribute for library, please!</sch:assert>
   </sch:rule>
   <sch:rule context="/library/book">
    <sch:report test="following-sibling::book/@id=@id">Duplicated ID for this book.</sch:report>
    <sch:assert test="@id=concat('_', isbn)">The id should be derived from the ISBN.</sch:assert>
   </sch:rule>
   <sch:rule context="/library/*">
    <sch:assert test="self::book or self::author or self::character">This element shouldn't be here...</sch:assert>
   </sch:rule>
  </sch:pattern>
 </sch:schema>

We see from this simple example that writing a full schema with Schematron would be very verbose, since it would mean writing a rule for each element. Each rule must spell out all the individual tests that check the content model and, where needed, the relative order of the child elements. We also see that Schematron does very well at expressing what are often called business rules, such as:

 <sch:assert test="@id=concat('_', isbn)">The id should be derived from the ISBN.</sch:assert>

which checks that the id attribute of a book is derived from its isbn element by adding a leading underscore.
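
To give an idea of the verbosity mentioned above, here is a sketch of the tests needed just to check that a book starts with a single isbn element immediately followed by a title element. The title element and the chosen order are assumptions made for the illustration, and the rule would have to be merged with the existing /library/book rule or placed in a pattern of its own:

 <sch:rule context="/library/book">
  <!-- cardinality: exactly one isbn child -->
  <sch:assert test="count(isbn)=1">A book should have exactly one isbn.</sch:assert>
  <!-- position: isbn is the first child element -->
  <sch:assert test="*[1][self::isbn]">isbn should be the first child of book.</sch:assert>
  <!-- relative order: the element right after isbn is title -->
  <sch:assert test="isbn/following-sibling::*[1][self::title]">isbn should be immediately followed by title.</sch:assert>
 </sch:rule>

Three tests cover only two children of a single element, and they still say nothing about what may follow the title.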

DSDL Part 3, the next version of Schematron, should keep this structure and add still more power by allowing the use not only of XPath 1.0 expressions, but also of expressions taken from other languages such as EXSLT (a standard extension library for XSLT), XPath 2.0, XSLT 2.0, and even XQuery 1.0.

Although RELAX NG provides a way to write and combine modular schemas, it is often the case that you need to validate a composite document against existing schemas which may be written in different languages. For instance, you may want to validate XHTML documents with embedded RDF statements. In this case, you need to split your documents into pieces and validate each of these pieces against its own schema.

The first contribution to Part 4 was an ISO specification known as "RELAX Namespace" by Murata Makoto. This contribution has been followed by a couple of others, namely Modular Namespaces (MNS) by James Clark and Namespace Switchboard by Rick Jelliffe. The latest contribution, Namespace Routing Language (NRL), was made by James Clark in June 2003 and builds on the previous proposals. Although it is too early to say whether NRL will become DSDL Part 4, it will most likely influence it heavily. NRL is implemented in the latest versions of Jing.

The first example given in the specification (http://www.thaiopensource.com/relaxng/nrl.html) shows how NRL can be used to validate a SOAP message containing one or more XHTML documents:

 <rules xmlns="http://www.thaiopensource.com/validate/nrl">
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.xsd"/>
  </namespace>
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>
  </namespace>
 </rules>

This example would split each SOAP message into two parts. The SOAP envelope would be validated against the W3C XML Schema soap-envelope.xsd, and the one or more XHTML documents found in the body of the SOAP message would be validated against the RELAX NG schema xhtml.rng.
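
As an illustration, a message of the following shape would be divided along its namespace boundaries. This is a hypothetical instance document written for this sketch; it is not taken from the NRL specification:

 <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <!-- elements in the SOAP namespace: validated against soap-envelope.xsd -->
  <soap:Body>
   <!-- elements in the XHTML namespace: validated against xhtml.rng -->
   <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Library</title></head>
    <body><p>A list of books.</p></body>
   </html>
  </soap:Body>
 </soap:Envelope>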

More advanced features are available, including namespace wildcards, validation modes, open schemas, and transparent namespaces. These features seem to be able to handle the most complex cases, as long as the basic assumption is met that instance documents may be split according to the namespaces of their elements and attributes.

Part 7 will allow us to specify which characters may be used in specific elements and attributes or within entire XML documents. The W3C note, "A Notation for Character Collections for the WWW" (http://www.w3.org/TR/charcol/), is used as an input for Part 7. The first contribution is "Character Repertoire Validation for XML" (CRVX) (http://dret.net/netdret/docs/wilde-crvx-www2003.html).

A simple example of CRVX is:

 <crvx xmlns="http://dret.net/xmlns/crvx10">
  <restrict structure="ename aname pitarget" charrep="\p{IsBasicLatin}"/>
  <restrict structure="ename aname" charrep="[^0-9]"/>
 </crvx>

In this proposal, the structure attribute contains identifiers for "element names" (ename), "attribute names" (aname), "processing instruction targets" (pitarget), and other XML constructs, including element and attribute contents. This example would thus require that element names, attribute names, and processing instruction targets all use characters from the BasicLatin block, and that element and attribute names not use digits.

There is some overlap between Part 7 and other schema languages such as Part 2 (RELAX NG). With RELAX NG, you can use the data pattern to check the content of attributes and simple content elements, and you'd need to take care that your names match the rules defined in both places. However, Part 7 gives you a more focused means of expressing these rules independently of other schemas. It also fills some gaps: RELAX NG cannot express such constraints on name classes or on mixed content elements.
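
As a sketch of that overlap, the RELAX NG fragment below restricts the content of an isbn element to characters from the BasicLatin block, using the pattern facet of the W3C XML Schema datatypes. The choice of the isbn element and of the token type are assumptions made for the illustration:

 <element name="isbn" xmlns="http://relaxng.org/ns/structure/1.0"
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- the pattern facet constrains the text content, not the element name -->
  <data type="token">
   <param name="pattern">\p{IsBasicLatin}*</param>
  </data>
 </element>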


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.