Annotation for applications


Annotations to generate DTDs

This is the third and final facet of the DTD Compatibility specification and it deals with default values for attributes. They can be declared using an a:defaultValue attribute:

 <?xml version="1.0" encoding="utf-8"?>
 <element xmlns="http://relaxng.org/ns/structure/1.0" xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
name="library">
   <oneOrMore>
     <element name="book">
       <attribute name="id"/>
       <optional>
         <attribute name="available" a:defaultValue="true">
           <choice>
             <value>true</value>
             <value>false</value>
           </choice>
         </attribute>
       </optional>
       .../...
     </element>
   </oneOrMore>
   .../...
 </element>

or:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
        
 element library {
   element book {
     attribute id { text },
     [ a:defaultValue = "true" ]
     attribute available { "true" | "false" }?,
     element isbn { text },
     element title {
       attribute xml:lang { text },
       text
     },
     .../...
   }+
 }

The attribute needs to be declared as optional to use this feature. That means that there is no impact on the validation by a RELAX NG processor. However, converters such as Trang will use this annotation to generate a default value in a DTD:

 <!ATTLIST book
  id CDATA #REQUIRED
  available (true|false) 'true'>
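
To make the role of this annotation more concrete, here is a minimal Python sketch of how a tool could read a:defaultValue annotations from the XML syntax of the schema. It is not how Trang is implemented; the function name collect_defaults and the trimmed SCHEMA string are assumptions made for this illustration, and only the standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
COMPAT = "http://relaxng.org/ns/compatibility/annotations/1.0"

# Trimmed copy of the schema shown above (illustration only).
SCHEMA = """<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         name="library">
  <oneOrMore>
    <element name="book">
      <attribute name="id"/>
      <optional>
        <attribute name="available" a:defaultValue="true">
          <choice>
            <value>true</value>
            <value>false</value>
          </choice>
        </attribute>
      </optional>
    </element>
  </oneOrMore>
</element>"""


def collect_defaults(schema_text):
    """Map attribute names to their a:defaultValue annotation, if any."""
    root = ET.fromstring(schema_text)
    defaults = {}
    for attr in root.iter("{%s}attribute" % RNG):
        value = attr.get("{%s}defaultValue" % COMPAT)
        if value is not None:
            defaults[attr.get("name")] = value
    return defaults


print(collect_defaults(SCHEMA))  # {'available': 'true'}
```

The id attribute carries no annotation and is therefore absent from the result, which mirrors the #REQUIRED declaration in the DTD above.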

Annotations to generate W3C XML Schema

There is no official specification of how to generate W3C XML Schema from RELAX NG; what follows in this short section is derived from the documentation of Trang.

	Note
	Information on how to use Trang is available on the web at http://www.thaiopensource.com/relaxng/trang-manual.html.

The first thing to note is that Trang supports the a:defaultValue attribute. The schema presented above would be translated as:

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="isbn"/>
        <xs:element ref="title"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="author"/>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="character"/>
      </xs:sequence>
      <xs:attribute name="id" use="required"/>
      <xs:attribute name="available" default="true">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="true"/>
            <xs:enumeration value="false"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Note the default attribute in the declaration of the available attribute.

In addition to this annotation, James Clark has created a specific namespace, http://www.thaiopensource.com/ns/relaxng/xsd, to manage the translation to W3C XML Schema. This translation is far from obvious, and a RELAX NG schema can often be translated using different features of W3C XML Schema. James Clark has made a lot of choices in his implementation based on best practices, but some options remain context dependent; in those situations the user may be given a choice.

In the current version (as of 30 January 2003), there is only one annotation attribute available to make such choices, the tx:enableAbstractElements attribute, which may be included in grammar, div or include. This attribute can take the values true or false and controls whether abstract elements may be used in substitution groups. Substitution groups are a fairly advanced feature of W3C XML Schema and we won't present them here, but you can find more information on this feature in my tutorial on XML.com: http://xml.com/pub/a/2000/11/29/schemas/part1.html or in my book "XML Schema: The W3C Object-Oriented Descriptions for XML" (O'Reilly).

The Trang manual indicates that more annotations might be added in the future.

Schema Adjunct Framework

The Schema Adjunct Framework (SAF) is a generic framework for storing processing information related to schemas. It can work either standalone or as "schema adornments", that is, as annotations embedded in schemas. Although it has been developed to work with W3C XML Schema, there is no reason that it couldn't be used to adorn RELAX NG schemas.

You can find more information about SAF on the web: http://www.tibco.com/solutions/products/extensibility/resources/saf.jsp

The momentum behind SAF seems to have decreased a lot since the end of 2001, but it's definitely worth examining if you need to add processing information to a schema. A simple example of a SAF adornment in RELAX NG could look like:

  <define name="author-element">
    <sql:select>select <sql:elem>name</sql:elem>, <sql:elem>birthdate</sql:elem>,
<sql:elem>deathdate</sql:elem>
      from tbl_author</sql:select>
    <element name="author">
      <attribute name="id"/>
      <ref name="name-element"/>
      <ref name="born-element"/>
      <optional>
        <ref name="died-element"/>
      </optional>
    </element>
  </define>

or:

 [
   sql:select [
     "select "
     sql:elem [ "name" ]
     ", "
     sql:elem [ "birthdate" ]
     ", "
     sql:elem [ "deathdate" ]
     " from tbl_author"
   ]
 ]
 author-element =
   element author {
     attribute id { text },
     name-element,
     born-element,
     died-element?
   }

Both forms add SQL-based processing information to the schema.
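
As a rough illustration of how an application could consume such an adornment, the following Python sketch extracts the query text and the column names from the XML form. The namespace URI bound to the sql prefix is not given in the example above, so http://example.org/ns/sql is a placeholder; the function name read_adornment and the shortened schema are likewise assumptions for this sketch, not part of SAF.

```python
import xml.etree.ElementTree as ET

SQL = "http://example.org/ns/sql"  # hypothetical namespace URI for the sql prefix

# Shortened copy of the adorned definition (illustration only).
SCHEMA = """<define xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:sql="http://example.org/ns/sql" name="author-element">
  <sql:select>select <sql:elem>name</sql:elem>, <sql:elem>birthdate</sql:elem>,
<sql:elem>deathdate</sql:elem>
    from tbl_author</sql:select>
  <element name="author">
    <attribute name="id"/>
  </element>
</define>"""


def read_adornment(schema_text):
    """Return the SQL text and the column names found in sql:select."""
    root = ET.fromstring(schema_text)
    select = root.find("{%s}select" % SQL)
    # Concatenate all text nodes, then collapse runs of whitespace.
    query = " ".join("".join(select.itertext()).split())
    columns = [elem.text for elem in select.iter("{%s}elem" % SQL)]
    return query, columns


query, columns = read_adornment(SCHEMA)
print(query)    # select name, birthdate, deathdate from tbl_author
print(columns)  # ['name', 'birthdate', 'deathdate']
```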

Annotations for extension

Annotations can also be used as extensions to influence the behavior of the RELAX NG processors which support them. This is controversial, but can also be very useful. The two applications which I am aware of in this category are one for embedding Schematron rules and my own XVIF project which allows a user to define validation and transformation pipes which act as RELAX NG patterns.

Embedded Schematron rules

Schematron is a rather atypical XML schema language. Instead of being grammar-based like RELAX NG and focusing on describing documents, Schematron is rule-based and consists of lists of rules to check against documents. Giving the exhaustive list of all the rules needed to validate a document is a very verbose and error-prone task, but on the other hand, the ability to write your own rules gives a flexibility and a power which can't be matched by a grammar-based schema language. The two types of languages appear to be more complementary than competing, and using both together allows you to get the best from each of them.

	Note
	You will find more information about Schematron on its web site: http://www.ascc.net/xml/resource/schematron/schematron.html.

Schematron can get into places no other schema language can. For example, Schematron is a good fit if we want to check that the id attribute of our book element is composed of the ISBN number prefixed by the letter b. In this case we would write:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:s="http://www.ascc.net/xml/schematron">
   <define name="book-element">
     <element name="book">
       <s:rule context="book">
         <s:assert test="@id = concat('b', isbn)"> The id needs to be the isbn number
prefixed by &quot;b&quot; </s:assert>
       </s:rule>
       <attribute name="id"/>
       <attribute name="available"/>
       <ref name="isbn-element"/>
       <ref name="title-element"/>
       <zeroOrMore>
         <ref name="author-element"/>
       </zeroOrMore>
       <zeroOrMore>
         <ref name="character-element"/>
       </zeroOrMore>
     </element>
   </define>
   .../...
 </grammar>

or:

 namespace s = "http://www.ascc.net/xml/schematron"

 book-element =
   [
     s:rule [
       context = "book"
       s:assert [
         test = "@id = concat('b', isbn)"
         ' The id needs to be the isbn number prefixed by "b" '
       ]
     ]
   ]
   element book {
     attribute id { text },
     attribute available { text },
     isbn-element,
     title-element,
     author-element*,
     character-element*
   }
 .../...

The Schematron annotation is composed of a rule element, which sets the context, and embedded assert elements, which define assertions. Instead of assert, report elements can be used: they are the opposite of assertions and report errors when their test is true. These checks are applied to all the elements matching the XPath expression provided in the context attribute of the rule element; the test attributes of the assert and report elements are also XPath expressions.
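
The behavior of this rule can be approximated by a short Python sketch. It does not use a Schematron processor: the XPath test @id = concat('b', isbn) is rewritten by hand with xml.etree.ElementTree, and the sample document, the function name check_books and the message suffix are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the second book breaks the rule.
DOC = """<library>
  <book id="b0836217462"><isbn>0836217462</isbn></book>
  <book id="x0836217462"><isbn>0836217462</isbn></book>
</library>"""

MESSAGE = 'The id needs to be the isbn number prefixed by "b"'


def check_books(xml_text):
    """Return one message per book element whose assertion is false."""
    failures = []
    root = ET.fromstring(xml_text)
    for book in root.iter("book"):  # plays the role of context="book"
        isbn = book.findtext("isbn", default="")
        test = book.get("id") == "b" + isbn  # @id = concat('b', isbn)
        if not test:  # an assert fires when its test is false
            failures.append("%s (id=%s)" % (MESSAGE, book.get("id")))
    return failures


for line in check_books(DOC):
    print(line)
```

A report element would use the same loop with the condition inverted, adding its message when the test is true.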

At this point, we must note that implementations differ appreciably on the scope in which the rules must be applied, which leads to potential interoperability issues between them.

On one side, the Schematron specification states that when Schematron rules are embedded in another language, they must be collected and bundled into a Schematron schema independently of where they were found in the original schema. In other words, the rule defined above should be applied to all the book elements in the instance documents. This is the approach taken by the Topologi multi validator (see http://www.topologi.com/products/validator/index.html).

On the other hand, when a Schematron rule is embedded in a RELAX NG element pattern, as is the case here, it is tempting to evaluate the rule in the context of the pattern. In that case, the rule will only apply to the book elements which are validated against this pattern. If the rule fails, the element pattern will fail and other alternatives will be checked. This is the approach taken by Sun's Multi Schema Validator (see http://wwws.sun.com/software/xml/developers/multischema/).

The difference can be seen in an example such as:

  <define name="book-element">
    <choice>
      <element name="book">
        <s:rule context="book">
          <s:assert test="@id = concat('b', isbn)"> The id needs to be the isbn number
prefixed by "b" </s:assert>
        </s:rule>
        <attribute name="id"/>
        <attribute name="available"/>
        <ref name="isbn-element"/>
        <ref name="title-element"/>
        <zeroOrMore>
          <ref name="author-element"/>
        </zeroOrMore>
        <zeroOrMore>
          <ref name="character-element"/>
        </zeroOrMore>
      </element>
      <element name="book">
        <attribute name="id">
          <value>ggjh0836217462</value>
        </attribute>
        <attribute name="available"/>
        <ref name="isbn-element"/>
        <ref name="title-element"/>
        <zeroOrMore>
          <ref name="author-element"/>
        </zeroOrMore>
        <zeroOrMore>
          <ref name="character-element"/>
        </zeroOrMore>
      </element>
    </choice>
  </define>

In this case, the approach taken by the Schematron specification would consider an instance document with a book id equal to "ggjh0836217462" as invalid. The evaluation of the Schematron rules is completely decoupled from the validation by the RELAX NG schema. The approach taken by MSV would consider the same document as valid since it meets one of the alternative definitions for the book element.
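
The two behaviors can be contrasted in a few lines of Python. This is only a model of the reasoning above, not of either product: a book is reduced to a dictionary holding its id and isbn, the structural part of each alternative is assumed to match, and all function names are invented for the sketch.

```python
def rule_holds(book):
    """The embedded Schematron assertion: @id = concat('b', isbn)."""
    return book["id"] == "b" + book["isbn"]


def first_alternative(book):
    """First element pattern: any id is structurally acceptable."""
    return True


def second_alternative(book):
    """Second element pattern: the id must be the fixed value."""
    return book["id"] == "ggjh0836217462"


def valid_decoupled(book):
    """Schematron specification: rules are checked apart from RELAX NG."""
    relaxng_ok = first_alternative(book) or second_alternative(book)
    return relaxng_ok and rule_holds(book)


def valid_pattern_scoped(book):
    """MSV: the rule belongs to the first alternative only."""
    return (first_alternative(book) and rule_holds(book)) or second_alternative(book)


book = {"id": "ggjh0836217462", "isbn": "0836217462"}
print(valid_decoupled(book))       # False
print(valid_pattern_scoped(book))  # True
```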

XVIF

The interoperability issue mentioned above is a good illustration of the difficulty of mixing elements from different languages which have been specified independently. The XML Validation Interoperability Framework (XVIF) is a proposal for a framework which would take care of this kind of issue. You will find more information about XVIF at its home page: http://downloads.xmlschemata.org/python/xvif/.

The principle of XVIF is to define micro pipes, much like Unix pipes, of transformations and validations which can be embedded in different transformation and validation languages. When the host language is RELAX NG, these micro pipes behave as RELAX NG patterns.

There are many use cases for such micro pipes and one of them is to include transformations to fit text nodes into existing datatypes. For instance, we have been lucky enough to have dates which are using the ISO 8601 format in our documents, but we could just as well have had French date formats. In this case, a set of regular expressions could have been defined to do the transformation between these dates and the ISO 8601 format. XVIF gives a way to integrate these regular expressions in a RELAX NG schema:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <define name="born-element">
     <element name="born">
       <if:pipe>
         <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp" 
                      apply="m/[0-9]+ .+ [0-9]+/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) aout ([0-9]+)/\2-08-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
         <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp" 
                       apply="s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"/>
         <if:validate type="http://relaxng.org/ns/structure/1.0">
           <if:apply>
             <data type="date">
               <param name="minInclusive">1900-01-01</param>
               <param name="maxInclusive">2099-12-31</param>
             </data>
           </if:apply>
         </if:validate>
       </if:pipe>
       <text if:ignore="1"/>
     </element>
   </define>
   .../...
 </grammar>

or, in the compact syntax:

 namespace if = "http://namespaces.xmlschemata.org/xvif/iframe"
 namespace rng = "http://relaxng.org/ns/structure/1.0"
        
 datatypes d = "http://relaxng.org/ns/compatibility/datatypes/1.0"
        
 born-element =
   [
     if:pipe [
       if:validate [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "m/[0-9]+ .+ [0-9]+/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/^[ \t\n]*([0-9] .*)$/0\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) mars ([0-9]+)/\2-03-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) avril ([0-9]+)/\2-04-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) mai ([0-9]+)/\2-05-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) juin ([0-9]+)/\2-06-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) aout ([0-9]+)/\2-08-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"
       ]
       if:transform [
         type = "http://namespaces.xmlschemata.org/xvif/regexp"
         apply = "s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"
       ]
       if:validate [
         type = "http://relaxng.org/ns/structure/1.0"
         if:apply [
           rng:data [
             type = "date"
             rng:param [ name = "minInclusive" "1900-01-01" ]
             rng:param [ name = "maxInclusive" "2099-12-31" ]
           ]
         ]
       ]
     ]
   ]
   element born { [ if:ignore = "1" ] text }

In this example, we have defined a pipe (if:pipe) which starts with a validation (if:validate) checking the general shape of the text against a regular expression, followed by a transformation (if:transform) which adds a leading zero to single-digit days. Twelve more transformations then each convert one of the twelve months, and a final validation (if:validate), which itself uses RELAX NG, checks that the result is an ISO 8601 date between 1900 and 2099. The text pattern has an if:ignore attribute telling XVIF-compliant processors that it is a fallback pattern for the other RELAX NG processors.
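
The stages of this pipe can be mimicked with Python's standard re and datetime modules. This is a sketch of the idea rather than the XVIF implementation: the function name french_date_pipe is invented, a substitution that does not match is assumed to leave the text unchanged, and surrounding whitespace is stripped before the final check.

```python
import re
from datetime import date

MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin", "juillet",
          "aout", "septembre", "octobre", "novembre", "decembre"]


def french_date_pipe(text):
    """Return the ISO 8601 form of a French date, or None if a stage fails."""
    # First if:validate: m/[0-9]+ .+ [0-9]+/
    if not re.search(r"[0-9]+ .+ [0-9]+", text):
        return None
    # First if:transform: pad a single-digit day with a leading zero.
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)
    # Twelve if:transform steps, one per month name.
    for number, name in enumerate(MONTHS, start=1):
        text = re.sub(r"([0-9]+) %s ([0-9]+)" % name,
                      r"\2-%02d-\1" % number, text)
    # Final if:validate: a date between 1900-01-01 and 2099-12-31.
    try:
        parsed = date.fromisoformat(text.strip())
    except ValueError:
        return None
    if date(1900, 1, 1) <= parsed <= date(2099, 12, 31):
        return parsed.isoformat()
    return None


print(french_date_pipe("7 mars 1922"))       # 1922-03-07
print(french_date_pipe("26 novembre 1922"))  # 1922-11-26
print(french_date_pipe("1 janvier 1850"))    # None
```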

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.
