Other options

RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

What if we really needed a feature that RELAX NG lacks in order to create our building blocks? What if, for instance, we needed to define a name class or a datatype parameter once and only once, and reuse it in multiple locations of a schema?

If this were an absolute requirement, which is not often the case, we would have to use non-RELAX NG tools or features. RELAX NG has an advantage over DTDs and W3C XML Schema here: it has two syntaxes, which leaves us the option of working either with XML mechanisms on the XML syntax or with plain-text tools on the compact syntax.

There is no limit to the tools we may want to use to produce our result, but let's set up a possible use case and some examples of implementations.

A possible use case

Let's say we want to restrict the set of characters allowed in our documents and implement this rule in our RELAX NG schemas. The pattern we might use to express this restriction is the one we saw as an example in "Chapter 9: W3C XML Schema Regular Expressions". It's not very complex, but not very simple either:

 pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"

Of course, we would like to be able to update it easily if we had to. We wouldn't want to copy it into each datatype definition, and we would like to be able to use this pattern in different contexts, with different datatypes, and possibly combined with other parameters.

XML tools

XML parsed entities (internal or external, declared in the internal DTD subset or in an external DTD) may be used in this case. Using internal entities declared in the internal DTD subset, we could for instance write:

 <?xml version = '1.0' encoding = 'utf-8' ?>
 <!DOCTYPE element [
 <!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
 ]>
 <element xmlns="http://relaxng.org/ns/structure/1.0" name="library"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <oneOrMore>
   <element name="book">
    <attribute name="id">
     <data type="NMTOKEN">&validChars;</data>
    </attribute>
    <attribute name="available">
     <data type="boolean"/>
    </attribute>
    <element name="isbn">
     <data type="NMTOKEN">&validChars;</data>
    </element>
    <element name="title">
     <attribute name="xml:lang">
      <data type="language"/>
     </attribute>
     <data type="token">&validChars;</data>
    </element>
    <zeroOrMore>
     <element name="author">
      <attribute name="id">
       <data type="NMTOKEN">&validChars;</data>
      </attribute>
      <element name="name">
       <data type="token">&validChars;</data>
      </element>
      <element name="born">
       <data type="date"/>
      </element>
      <optional>
       <element name="died">
        <data type="date"/>
       </element>
      </optional>
     </element>
    </zeroOrMore>
    <zeroOrMore>
     <element name="character">
      <attribute name="id">
       <data type="NMTOKEN">&validChars;</data>
      </attribute>
      <element name="name">
       <data type="token">&validChars;</data>
      </element>
      <element name="born">
       <data type="date"/>
      </element>
      <element name="qualification">
       <data type="token">&validChars;</data>
      </element>
     </element>
    </zeroOrMore>
   </element>
  </oneOrMore>
 </element>

The trick here is to define an entity for the parameter:

 <!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">

and then to reference this entity wherever we need it, for instance:

       <data type="token">&validChars;</data>
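
To see that the entity really is resolved by the XML parser before any RELAX NG processing takes place, here is a minimal sketch in Python using the standard library's ElementTree module. The cut-down schema and the helper name expanded_params are assumptions made for this illustration; they are not part of the library schema above.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
PATTERN = r"[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"

# Hypothetical cut-down schema: one entity declaration, two references.
SCHEMA = r"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE element [
<!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
]>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="book"
 datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <attribute name="id">
  <data type="NMTOKEN">&validChars;</data>
 </attribute>
 <element name="title">
  <data type="token">&validChars;</data>
 </element>
</element>"""


def expanded_params(schema_text):
    """Parse the schema and list (datatype, param name, param value) triples.

    The parser replaces each &validChars; reference with the entity's
    replacement text, so the param elements show up as ordinary children
    of the data elements.
    """
    root = ET.fromstring(schema_text.encode("utf-8"))
    return [
        (data.get("type"), param.get("name"), param.text)
        for data in root.iter(f"{{{RNG_NS}}}data")
        for param in data.findall(f"{{{RNG_NS}}}param")
    ]


params = expanded_params(SCHEMA)
for datatype, name, value in params:
    print(datatype, name, value)
```

Each data pattern ends up with its own copy of the param element, which is what a RELAX NG processor reading the schema through a standard XML parser would see.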

What about the compact syntax? The compact syntax doesn't support entities, but if I convert this schema into the compact syntax (using Trang), I get:

 element library {
   element book {
     attribute id {
       xsd:NMTOKEN {
         pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
       }
     },
     attribute available { xsd:boolean },
     element isbn {
       xsd:NMTOKEN {
         pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
       }
     },
     element title {
       attribute xml:lang { xsd:language },
       xsd:token {
         pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
       }
     },
     element author {
       attribute id {
         xsd:NMTOKEN {
           pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
         }
       },
       element name {
         xsd:token {
           pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
         }
       },
       element born { xsd:date },
       element died { xsd:date }?
     }*,
     element character {
       attribute id {
         xsd:NMTOKEN {
           pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
         }
       },
       element name {
         xsd:token {
           pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
         }
       },
       element born { xsd:date },
       element qualification {
         xsd:token {
           pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
         }
       }
     }*
   }+
 }

This means that as long as I keep the XML version as the reference for this schema, I can easily generate the compact syntax, but I can't go the other way round (compact to XML) without losing my entity definition. Using an XML mechanism has broken the round-tripping between the two syntaxes.

Other XML tools (such as XInclude, or writing the schema as an XSLT transformation) could be used with pretty much the same effect. Depending on the case, these solutions will either be handled by the parser that reads the RELAX NG schema (as is the case with our internal entity) or require a first phase during which your schema is compiled into a plain RELAX NG schema.

As an example, let's use XSLT. For simple tasks, XSLT has a simplified syntax in which the xsl:stylesheet and xsl:template elements may be omitted (exactly as the RELAX NG grammar and start elements may be omitted in a simple RELAX NG schema). That means that if we just want to use XSLT for its simplest features (here, only to expand the values of variables), we can write our schema as:

 <?xml version = '1.0' encoding = 'utf-8' ?>
 <element xmlns="http://relaxng.org/ns/structure/1.0" name="library"
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xsl:version="1.0">
   <xsl:variable name="validChars">
     <param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
   </xsl:variable>
   <oneOrMore>
    <element name="book">
     <attribute name="id">
      <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
     </attribute>
     <attribute name="available">
      <data type="boolean"/>
     </attribute>
     <element name="isbn">
      <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
     </element>
     <element name="title">
      <attribute name="xml:lang">
       <data type="language"/>
      </attribute>
      <data type="token"><xsl:copy-of select="$validChars"/></data>
     </element>
     <zeroOrMore>
      <element name="author">
       <attribute name="id">
        <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
       </attribute>
       <element name="name">
        <data type="token"><xsl:copy-of select="$validChars"/></data>
       </element>
       <element name="born">
        <data type="date"/>
       </element>
       <optional>
        <element name="died">
         <data type="date"/>
        </element>
       </optional>
      </element>
     </zeroOrMore>
     <zeroOrMore>
      <element name="character">
       <attribute name="id">
        <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
       </attribute>
       <element name="name">
        <data type="token"><xsl:copy-of select="$validChars"/></data>
       </element>
       <element name="born">
        <data type="date"/>
       </element>
       <element name="qualification">
        <data type="token"><xsl:copy-of select="$validChars"/></data>
       </element>
      </element>
     </zeroOrMore>
    </element>
   </oneOrMore>
 </element>

Applied to any XML document, this transformation will produce a RELAX NG schema where the XSLT instruction:

 <xsl:copy-of select="$validChars"/>

will have been replaced by the content of the variable $validChars, i.e.:

 <param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
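
Python's standard library has no XSLT processor, so the following sketch does not run the stylesheet. It only mimics, with ElementTree, the single substitution described above: each xsl:copy-of is replaced by a copy of the content of the variable it names. The cut-down stylesheet and the helper name expand_variables are assumptions made for this illustration, and variable scoping and every other XSLT feature are ignored. For real use, a genuine XSLT 1.0 processor is the right tool.

```python
import copy
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XSL_NS = "http://www.w3.org/1999/XSL/Transform"
PATTERN = r"[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"

# Hypothetical cut-down version of the simplified stylesheet.
STYLESHEET = r"""<element xmlns="http://relaxng.org/ns/structure/1.0" name="book"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="1.0">
  <xsl:variable name="validChars">
    <param name="pattern">[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
  </xsl:variable>
  <attribute name="id">
    <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
  </attribute>
  <element name="title">
    <data type="token"><xsl:copy-of select="$validChars"/></data>
  </element>
</element>"""


def expand_variables(element, variables=None):
    """Emulate only xsl:variable and xsl:copy-of select="$name" (not real XSLT)."""
    if variables is None:
        variables = {}
    new_children = []
    for child in list(element):
        if child.tag == f"{{{XSL_NS}}}variable":
            # Remember the variable's content and drop the declaration.
            variables[child.get("name")] = list(child)
        elif child.tag == f"{{{XSL_NS}}}copy-of":
            # select="$validChars" -> look up "validChars".
            name = child.get("select")[1:]
            new_children.extend(copy.deepcopy(node) for node in variables[name])
        else:
            expand_variables(child, variables)
            new_children.append(child)
    element[:] = new_children
    return element


schema = expand_variables(ET.fromstring(STYLESHEET))
schema.attrib.pop(f"{{{XSL_NS}}}version", None)  # not part of the result schema
ET.register_namespace("", RNG_NS)
print(ET.tostring(schema, encoding="unicode"))
```

The printed document contains no element from the XSLT namespace, and each data pattern carries its own param child, as in the entity-based version.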

Text tools

Text tools are somewhat more limited. We can only use tools which, like the XSLT example just shown, require a first phase to produce a schema. One of the first tools that will come to mind for people familiar with C programming is the C preprocessor (CPP). A text replacement is defined in CPP with #define and referenced simply by using the name of the definition. Something equivalent to our two previous examples could thus be:

 #define VALIDCHARS  pattern = '[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*'
 element library {
   element book {
     attribute id {
       xsd:NMTOKEN {
         VALIDCHARS
       }
     },
     attribute available { xsd:boolean },
     element isbn {
       xsd:NMTOKEN {
         VALIDCHARS
       }
     },
     element title {
       attribute xml:lang { xsd:language },
       xsd:token {
         VALIDCHARS
       }
     },
     element author {
       attribute id {
         xsd:NMTOKEN {
           VALIDCHARS
         }
       },
       element name {
         xsd:token {
           VALIDCHARS
         }
       },
       element born { xsd:date },
       element died { xsd:date }?
     }*,
     element character {
       attribute id {
         xsd:NMTOKEN {
           VALIDCHARS
         }
       },
       element name {
         xsd:token {
           VALIDCHARS
         }
       },
       element born { xsd:date },
       element qualification {
         xsd:token {
           VALIDCHARS
         }
       }
     }*
   }+
 }

When run through CPP, this gives a valid RELAX NG schema (compact syntax) in which each occurrence of VALIDCHARS has been replaced by the parameter.
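
For readers without a C toolchain at hand, the effect of this first phase can be approximated with a few lines of Python. This is only a sketch of object-like #define substitution, not a reimplementation of CPP: the function name expand_defines and the cut-down schema are assumptions made for this illustration, and the sketch ignores everything else a real preprocessor does (other directives, function-like macros, skipping string literals, line markers).

```python
import re

# Hypothetical cut-down compact-syntax source with one macro definition.
SOURCE = r"""#define VALIDCHARS  pattern = '[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*'
element book {
  attribute id { xsd:NMTOKEN { VALIDCHARS } },
  element title { xsd:token { VALIDCHARS } }
}"""

DEFINE = re.compile(r"\s*#define\s+(\w+)\s+(.*)$")


def expand_defines(source):
    """Expand object-like '#define NAME replacement' lines, CPP-style."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = DEFINE.match(line)
        if match:
            # Record the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, replacement in macros.items():
            # A function is used as the replacement so that backslashes
            # in the pattern (\p{...}) are inserted literally.
            line = re.sub(
                rf"\b{re.escape(name)}\b",
                lambda _match, text=replacement: text,
                line,
            )
        output.append(line)
    return "\n".join(output)


expanded = expand_defines(SOURCE)
print(expanded)
```

One caveat to keep in mind with a real preprocessor: comments in the compact syntax also start with #, so a schema containing comments may be misread as containing preprocessor directives. With GNU cpp, the -P option suppresses the line markers that would otherwise be added to the output.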

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.