RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)



Annotation for applications

As mentioned in the introduction of this chapter, common uses of annotations by applications include using them as pre-processing instructions, as helpers for generating other schemas out of a RELAX NG schema, and as extensions of RELAX NG itself.

Bob DuCharme has proposed an interesting application of annotations for pre-processing: using them to derive specific schemas by restricting a generic schema. This approach is extremely simple and provides a straightforward workaround for the lack of derivation by restriction (a W3C XML Schema feature) in RELAX NG. It is also language neutral: it can be applied to other schema languages, including W3C XML Schema itself, where it is much simpler than the derivation by restriction feature built into the language.

You can find Bob DuCharme's proposal on the web at: http://www.snee.com/xml/schemaStages.html and can download the XSLT transformation implementing it at http://www.snee.com/xml/schemaStages.zip.

The idea behind his proposal is to annotate the elements which need to be removed in some variants of the schema, and then to use these annotations to generate the different variants with an XSLT transformation. Each variant is called a stage. The list of available stages is declared in an sn:stages element. For each conditional element, the list of the stages in which it must be kept is declared through an sn:stages attribute.

Since this technique relies only on annotations, the global schema is still a valid RELAX NG schema, and it validates a superset of the instance documents that are valid for each of the stages.

If we wanted to derive, from a generic schema that allows any of them, schemas requiring a book, author, library, or character element, or either a book or an author element, as the document element, we could write:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:sn="http://www.snee.com/ns/stages">
   <sn:stages>
     <sn:stage name="library"/>
     <sn:stage name="book"/>
     <sn:stage name="author"/>
     <sn:stage name="character"/>
     <sn:stage name="author-or-book"/>
   </sn:stages>
   <start>
     <choice>
       <ref name="library-element" sn:stages="library"/>
       <ref name="book-element" sn:stages="book author-or-book"/>
       <ref name="author-element" sn:stages="author author-or-book"/>
       <ref name="character-element" sn:stages="character"/>
     </choice>
   </start>
   .../...
 </grammar>

or:

 namespace sn = "http://www.snee.com/ns/stages"
      
 sn:stages [
   sn:stage [ name = "library" ]
   sn:stage [ name = "book" ]
   sn:stage [ name = "author" ]
   sn:stage [ name = "character" ]
   sn:stage [ name = "author-or-book" ]
 ]
 start =
   [ sn:stages = "library" ] library-element
   | [ sn:stages = "book author-or-book" ] book-element
   | [ sn:stages = "author author-or-book" ] author-element
   | [ sn:stages = "character" ] character-element
 .../...

This schema is a valid RELAX NG schema which would accept any of these elements as a root. Transforming the XML syntax with the XSLT transformation "getStage.xsl", provided in the zip file mentioned above, with the parameter stageName set to author-or-book removes all the elements with an sn:stages attribute that do not have author-or-book in their list of values:

 $ xsltproc --stringparam stageName author-or-book getStage.xsl doc-snee.rng
 <?xml version="1.0"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:sn="http://www.snee.com/ns/stages">
  <start>
    <choice>
      <ref name="book-element"/>
      <ref name="author-element"/>
    </choice>
  </start>
 .../...
 </grammar>

This transformation has thus performed a restriction on the schema. You can generate as many schemas this way as there are stages that have been declared in the sn:stages element.
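
The filtering step can also be sketched outside XSLT. The following Python fragment is not Bob DuCharme's getStage.xsl; it is a minimal sketch that assumes the behavior described above (conditional elements are dropped unless the requested stage appears in their sn:stages list, and the stage annotations themselves are removed from the result). The names extract_stage and _prune are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
SN_NS = "http://www.snee.com/ns/stages"
STAGES_ATTR = f"{{{SN_NS}}}stages"


def extract_stage(schema_xml: str, stage_name: str) -> ET.Element:
    """Return the schema restricted to a single stage (hypothetical helper)."""
    root = ET.fromstring(schema_xml)
    _prune(root, stage_name)
    return root


def _prune(parent: ET.Element, stage_name: str) -> None:
    for child in list(parent):
        if child.tag == f"{{{SN_NS}}}stages":
            parent.remove(child)  # stage declarations are not needed in a variant
            continue
        stages = child.attrib.pop(STAGES_ATTR, None)
        if stages is not None and stage_name not in stages.split():
            parent.remove(child)  # conditional element not kept in this stage
            continue
        _prune(child, stage_name)


SCHEMA = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sn="http://www.snee.com/ns/stages">
  <sn:stages>
    <sn:stage name="library"/>
    <sn:stage name="book"/>
    <sn:stage name="author"/>
    <sn:stage name="character"/>
    <sn:stage name="author-or-book"/>
  </sn:stages>
  <start>
    <choice>
      <ref name="library-element" sn:stages="library"/>
      <ref name="book-element" sn:stages="book author-or-book"/>
      <ref name="author-element" sn:stages="author author-or-book"/>
      <ref name="character-element" sn:stages="character"/>
    </choice>
  </start>
</grammar>
"""

restricted = extract_stage(SCHEMA, "author-or-book")
refs = [ref.get("name") for ref in restricted.iter(f"{{{RNG_NS}}}ref")]
print(refs)  # ['book-element', 'author-element']
```

Calling extract_stage once per declared stage would produce one restricted schema per stage.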

We will see in Appendix B: Using RELAX NG As a Pivot Format, that RELAX NG is a good fit as a pivot format, that is, a reference format in which schemas are kept and from which they are transformed into other languages. One limit of the pivot approach is that features which are part of the target languages but not part of RELAX NG would seem to be out of reach. That would be true if we did not have annotations. The two most common examples of such annotations are used for generating DTDs and W3C XML Schema.

The third and final facet of the DTD Compatibility specification deals with default values for attributes. They can be declared using an a:defaultValue attribute:

 <?xml version="1.0" encoding="utf-8"?>
 <element xmlns="http://relaxng.org/ns/structure/1.0"
          xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
          name="library">
   <oneOrMore>
     <element name="book">
       <attribute name="id"/>
       <optional>
         <attribute name="available" a:defaultValue="true">
           <choice>
             <value>true</value>
             <value>false</value>
           </choice>
         </attribute>
       </optional>
       .../...
     </element>
   </oneOrMore>
    .../...
 </element>

or:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
        
 element library {
   element book {
     attribute id { text },
     [ a:defaultValue = "true" ]
     attribute available { "true" | "false" }?,
     element isbn { text },
     element title {
       attribute xml:lang { text },
       text
     },
     .../...
   }+
 }

To use this feature, the attribute must be declared as optional. The annotation therefore has no impact on validation by a RELAX NG processor. However, converters such as Trang use it to generate a default value in a DTD:

 <!ATTLIST book
  id CDATA #REQUIRED
  available (true|false) 'true'>
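
To make the role of the annotation concrete, here is a minimal Python sketch of how a tool might collect a:defaultValue annotations from the XML syntax. It is not how Trang works internally; the function name collect_defaults is hypothetical, and the sketch assumes that attribute patterns are nested directly inside their element patterns (it does not follow ref and define).

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
DEFAULT_VALUE = "{http://relaxng.org/ns/compatibility/annotations/1.0}defaultValue"


def collect_defaults(schema_xml: str) -> dict:
    """Map (element name, attribute name) to the annotated default value."""
    defaults = {}

    def walk(node, owner):
        if node.tag == RNG + "element":
            owner = node.get("name")  # attributes below belong to this element
        elif node.tag == RNG + "attribute" and DEFAULT_VALUE in node.attrib:
            defaults[(owner, node.get("name"))] = node.get(DEFAULT_VALUE)
        for child in node:
            walk(child, owner)

    walk(ET.fromstring(schema_xml), None)
    return defaults


SCHEMA = """\
<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         name="library">
  <oneOrMore>
    <element name="book">
      <attribute name="id"/>
      <optional>
        <attribute name="available" a:defaultValue="true">
          <choice>
            <value>true</value>
            <value>false</value>
          </choice>
        </attribute>
      </optional>
    </element>
  </oneOrMore>
</element>
"""

defaults = collect_defaults(SCHEMA)
print(defaults)  # {('book', 'available'): 'true'}
```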

There is no official specification of how to generate W3C XML Schema from RELAX NG; what follows in this short section is derived from the documentation of Trang.

Note

Information on how to use Trang is available on the web at http://www.thaiopensource.com/relaxng/trang-manual.html.

The first thing to note is that Trang supports the a:defaultValue attribute. The schema presented above would be translated as:

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="author"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="character"/>
      </xs:sequence>
      <xs:attribute name="id" use="required"/>
      <xs:attribute name="available" default="true">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="true"/>
            <xs:enumeration value="false"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Note the default attribute in the declaration of the available attribute.

In addition to this annotation, James Clark has created a specific namespace, http://www.thaiopensource.com/ns/relaxng/xsd, to control the translation to W3C XML Schema. This translation is far from obvious, and a RELAX NG schema can often be translated using different features of W3C XML Schema. James Clark has made many choices in his implementation based on best practices, but some options remain context dependent: in those situations, the choice may be left to the user.

In the current version (as of 30 January 2003), only one annotation attribute is available for making such choices: the tx:enableAbstractElements attribute, which may be included in grammar, div, or include. This attribute takes the value true or false and controls whether abstract elements may be used in substitution groups. Substitution groups are a fairly advanced feature of W3C XML Schema and we won't present them here, but you can find more information on this feature in my tutorial on XML.com: http://xml.com/pub/a/2000/11/29/schemas/part1.html or in my book "XML Schema: The W3C's Object-Oriented Descriptions for XML" (O'Reilly).

The Trang manual indicates that more annotations might be added in the future.

The Schema Adjunct Framework (SAF) is a generic framework for storing processing information related to schemas. It can work either standalone or as "schema adornments" embedded in schemas as annotations. Although it has been developed to work with W3C XML Schema, there is no reason that it couldn't be used to adorn RELAX NG schemas.

You can find more information about SAF on the web: http://www.tibco.com/solutions/products/extensibility/resources/saf.jsp

The momentum behind SAF seems to have decreased a lot since the end of 2001, but it's definitely worth examining if you need to add processing information to a schema. A simple example of a SAF adornment in RELAX NG could look like:

  <define name="author-element">
    <sql:select>select <sql:elem>name</sql:elem>, <sql:elem>birthdate</sql:elem>,
      <sql:elem>deathdate</sql:elem>
      from tbl_author</sql:select>
    <element name="author">
      <attribute name="id"/>
      <ref name="name-element"/>
      <ref name="born-element"/>
      <optional>
        <ref name="died-element"/>
      </optional>
    </element>
  </define>

or:

 [
   sql:select [
     "select "
     sql:elem [ "name" ]
     ", "
     sql:elem [ "birthdate" ]
     ", "
     sql:elem [ "deathdate" ]
     " from tbl_author"
   ]
 ]
 author-element =
   element author {
     attribute id { text },
     name-element,
     born-element,
     died-element?
   }

These both add SQL-based processing information to the schema.

Annotations can also be used as extensions to influence the behavior of the RELAX NG processors which support them. This is controversial, but can also be very useful. The two applications I am aware of in this category are the embedding of Schematron rules and my own XVIF project, which lets users define validation and transformation pipes that act as RELAX NG patterns.

Schematron is a rather atypical XML schema language. Instead of being grammar based like RELAX NG and focusing on describing documents, Schematron is rule based and consists of lists of rules to check against documents. Giving the exhaustive list of all the rules needed to validate a document is a very verbose and error-prone task, but on the other hand, the ability to write your own rules gives a flexibility and a power which can't be matched by a grammar-based schema language. The two types of languages appear to be more complementary than competing. Using both together allows you to get the best from each of them.

Note

You will find more information about Schematron on its web site: http://www.ascc.net/xml/resource/schematron/schematron.html.

Schematron can get into places no other schema language can. For example, Schematron is a good fit if we want to check that the id attribute of our book element is composed of the ISBN number prefixed by the letter b. In this case we would write:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:s="http://www.ascc.net/xml/schematron">
   <define name="book-element">
     <element name="book">
       <s:rule context="book">
         <s:assert test="@id = concat('b', isbn)"> The id needs to be the isbn number
prefixed by &quot;b&quot; </s:assert>
       </s:rule>
       <attribute name="id"/>
       <attribute name="available"/>
       <ref name="isbn-element"/>
       <ref name="title-element"/>
       <zeroOrMore>
         <ref name="author-element"/>
       </zeroOrMore>
       <zeroOrMore>
         <ref name="character-element"/>
       </zeroOrMore>
     </element>
   </define>
   .../...
 </grammar>

or:

 namespace s = "http://www.ascc.net/xml/schematron"
 book-element =
   [
     s:rule [
       context = "book"
       s:assert [
         test = "@id = concat('b', isbn)"
         ' The id needs to be the isbn number prefixed by "b" '
       ]
     ]
   ]
   element book {
     attribute id { text },
     attribute available { text },
     isbn-element,
     title-element,
     author-element*,
     character-element*
   }
 .../...

The Schematron annotation is composed of a rule element, which sets the context, and embedded assert elements, which define assertions. Instead of assert, report elements can be used: they are the opposite of assertions and report errors when their tests are true. These checks are applied to all the elements matching the XPath expression provided in the context attribute of the rule element; the test attributes of the assert and report elements are also XPath expressions.
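
What the assertion checks can be illustrated with a few lines of Python. This is not a Schematron implementation: the standard library's ElementTree supports only a subset of XPath (no concat()), so the test is hand-coded, and the function name check_book_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

MESSAGE = 'The id needs to be the isbn number prefixed by "b"'


def check_book_ids(document_xml: str) -> list:
    """Return one message per book element that fails the assertion."""
    failures = []
    root = ET.fromstring(document_xml)
    for book in root.iter("book"):                 # rule context="book"
        isbn = book.findtext("isbn", default="")
        if book.get("id") != "b" + isbn:           # test="@id = concat('b', isbn)"
            failures.append(f"book id={book.get('id')!r}: {MESSAGE}")
    return failures


DOCUMENT = """\
<library>
  <book id="b0836217462"><isbn>0836217462</isbn></book>
  <book id="ggjh0836217462"><isbn>0836217462</isbn></book>
</library>
"""

failures = check_book_ids(DOCUMENT)
for failure in failures:
    print(failure)
```

A report element would use the opposite condition: its message is emitted when the test is true.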

At this point, we must note that there is an appreciable difference between implementations on the scope in which the rules must be applied leading to potential issues of interoperability between implementations.

On one side, the Schematron specification states that when Schematron rules are embedded in another language, they must be collected and bundled into a Schematron schema independently of where they have been found in the original schema. In other words, the rule which we have defined above should be applied to all the book elements in the instance documents. This is the approach taken by the Topologi multi validator (see http://www.topologi.com/products/validator/index.html).

On the other hand, when a Schematron rule is embedded in a RELAX NG element pattern, as is the case here, it is tempting to evaluate the rule in the context of the pattern. In that case, the rule will only apply to the book elements which are included in the context node. If the rule fails, the element pattern will fail and other alternatives will be checked. This is the approach taken by Sun's Multi Schema Validator (see http://wwws.sun.com/software/xml/developers/multischema/).

The difference can be seen in an example such as:

  <define name="book-element">
    <choice>
      <element name="book">
        <s:rule context="book">
          <s:assert test="@id = concat('b', isbn)"> The id needs to be the isbn number
prefixed by "b" </s:assert>
        </s:rule>
        <attribute name="id"/>
        <attribute name="available"/>
        <ref name="isbn-element"/>
        <ref name="title-element"/>
        <zeroOrMore>
          <ref name="author-element"/>
        </zeroOrMore>
        <zeroOrMore>
          <ref name="character-element"/>
        </zeroOrMore>
      </element>
      <element name="book">
        <attribute name="id">
          <value>ggjh0836217462</value>
        </attribute>
        <attribute name="available"/>
        <ref name="isbn-element"/>
        <ref name="title-element"/>
        <zeroOrMore>
          <ref name="author-element"/>
        </zeroOrMore>
        <zeroOrMore>
          <ref name="character-element"/>
        </zeroOrMore>
      </element>
    </choice>
  </define>

In this case, the approach taken by the Schematron specification would consider an instance document with a book id equal to "ggjh0836217462" as invalid. The evaluation of the Schematron rules is completely decoupled from the validation by the RELAX NG schema. The approach taken by MSV would consider the same document as valid since it meets one of the alternative definitions for the book element.
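
The two behaviors can be contrasted with a small model. The following Python sketch reduces a book to its id and isbn and ignores the rest of the content model; it does not reproduce either validator, and all the names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Book:
    id: str
    isbn: str


def rule_holds(book: Book) -> bool:
    # Schematron assertion: @id = concat('b', isbn)
    return book.id == "b" + book.isbn


def matches_first_alternative(book: Book) -> bool:
    # First element pattern: the id attribute accepts any text.
    return True


def matches_second_alternative(book: Book) -> bool:
    # Second element pattern: the id attribute must have this exact value.
    return book.id == "ggjh0836217462"


def valid_decoupled(book: Book) -> bool:
    """Rules are collected and applied to every book, independently of RELAX NG."""
    grammar_ok = matches_first_alternative(book) or matches_second_alternative(book)
    return grammar_ok and rule_holds(book)


def valid_pattern_scoped(book: Book) -> bool:
    """The rule belongs to the first alternative; if it fails, the second is tried."""
    first = matches_first_alternative(book) and rule_holds(book)
    return first or matches_second_alternative(book)


book = Book(id="ggjh0836217462", isbn="0836217462")
print(valid_decoupled(book))       # False
print(valid_pattern_scoped(book))  # True
```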

The interoperability issue mentioned above is a good illustration of the difficulty of mixing elements from different languages which have been specified independently. The XML Validation Interoperability Framework (XVIF) is a proposal for a framework which would take care of this kind of issue. You will find more information about XVIF at its home page: http://downloads.xmlschemata.org/python/xvif/.

The principle of XVIF is to define micro pipes, much like Unix pipes, of transformations and validations which can be embedded in different transformation and validation languages. When the host language is RELAX NG, these micro pipes behave as RELAX NG patterns.

There are many use cases for such micro pipes and one of them is to include transformations to fit text nodes into existing datatypes. For instance, we have been lucky enough to have dates which are using the ISO 8601 format in our documents, but we could just as well have had French date formats. In this case, a set of regular expressions could have been defined to do the transformation between these dates and the ISO 8601 format. XVIF gives a way to integrate these regular expressions in a RELAX NG schema:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <define name="born-element">
     <element name="born">
       <if:pipe>
         <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" 
                      apply="m/[0-9]+ .+ [0-9]+/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) aout ([0-9]+)/\2-08-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"/>
         <if:validate type="http://relaxng.org/ns/structure/1.0">
           <if:apply>
             <data type="date">
               <param name="minInclusive">1900-01-01</param>
               <param name="maxInclusive">2099-12-31</param>
             </data>
           </if:apply>
         </if:validate>
       </if:pipe>
       <text if:ignore="1"/>
     </element>
   </define>
   .../...
 </grammar>

or, in the compact syntax:

 namespace if = "http://namespaces.xmlschemata.org/xvif/iframe"
 namespace rng = "http://relaxng.org/ns/structure/1.0"
        
 datatypes d = "http://relaxng.org/ns/compatibility/datatypes/1.0"
        
 born-element =
   [
     if:pipe [
       if:validate [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "m/[0-9]+ .+ [0-9]+/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/^[ \t\n]*([0-9] .*)$/0\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) mars ([0-9]+)/\2-03-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) avril ([0-9]+)/\2-04-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) mai ([0-9]+)/\2-05-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) juin ([0-9]+)/\2-06-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) aout ([0-9]+)/\2-08-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"
       ]
       if:validate [
         type = "http://relaxng.org/ns/structure/1.0"
         if:apply [
           rng:data [
             type = "date"
             rng:param [ name = "minInclusive" "1900-01-01" ]
             rng:param [ name = "maxInclusive" "2099-12-31" ]
           ]
         ]
       ]
     ]
   ]
   element born { [ if:ignore = "1" ] text }

In this example, we have defined a pipe (if:pipe) that starts with a validation (if:validate) checking the overall shape of the text against a regular expression and continues with thirteen transformations (if:transform) using regular expressions: the first pads a single-digit day with a leading zero, and each of the other twelve converts one of the twelve months. The pipe ends with a final validation (if:validate), which itself uses RELAX NG to check that the result is an ISO 8601 date between 1900 and 2099. The text pattern has an if:ignore attribute showing XVIF-compliant processors that it's a fallback pattern for the other RELAX NG processors.
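
The effect of the pipe on a text node can be sketched in plain Python. This is not the XVIF implementation: the Perl-style expressions of the schema are transcribed for Python's re module, the final RELAX NG validation is approximated with datetime.date, and the function name french_date_to_iso is hypothetical.

```python
import re
from datetime import date
from typing import Optional

MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin", "juillet",
          "aout", "septembre", "octobre", "novembre", "decembre"]


def french_date_to_iso(text: str) -> Optional[str]:
    """Return the ISO 8601 form of a French date, or None if the pipe fails."""
    # First if:validate: m/[0-9]+ .+ [0-9]+/
    if not re.search(r"[0-9]+ .+ [0-9]+", text):
        return None
    # First if:transform: pad a single-digit day with a leading zero.
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)
    # Twelve if:transform steps: one substitution per month name.
    for number, month in enumerate(MONTHS, start=1):
        text = re.sub(rf"([0-9]+) {month} ([0-9]+)", rf"\2-{number:02d}-\1", text)
    # Final if:validate: a date between minInclusive and maxInclusive.
    try:
        value = date.fromisoformat(text.strip())
    except ValueError:
        return None
    if not date(1900, 1, 1) <= value <= date(2099, 12, 31):
        return None
    return text.strip()


print(french_date_to_iso("7 mars 1950"))       # 1950-03-07
print(french_date_to_iso("17 decembre 2001"))  # 2001-12-17
print(french_date_to_iso("7 mars 1850"))       # None
```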


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.