by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
What if you really need a feature that's missing in RELAX NG to create building blocks? What if, for instance, you need to reuse a name class or a datatype parameter defined once and only once in multiple locations of a schema?
If this were an absolute requirement, which isn't often the case, you would have to use non-RELAX NG tools or features. RELAX NG has an advantage over DTDs and W3C XML Schema here: it has two syntaxes, so you can work either with XML mechanisms on the XML syntax or with plain-text tools on the compact syntax.
Any tool can be used to produce the result, but let's set up a possible use case and look at some example implementations.
Let's say you want to restrict the set of characters allowed in your documents and that you want to implement this rule in your RELAX NG schemas. The pattern you might have in mind to perform this restriction could be the one used as an example in Chapter 9. It's not very complex, but not very simple either:
pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*"
Of course, you want to be able to update it easily if you have to. You don't want to copy it into each datatype definition, and you might want to use this pattern in different contexts, over different datatypes, and possibly combine it with other parameters.
XML parsed entities (internal or external and in the internal DTD or in an external DTD) may be used in the above case. Using internal entities in an internal DTD, you can, for instance, write:
<?xml version = '1.0' encoding = 'utf-8' ?>
<!DOCTYPE element [
<!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
]>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="library"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 <oneOrMore>
  <element name="book">
   <attribute name="id">
    <data type="NMTOKEN">&validChars;</data>
   </attribute>
   <attribute name="available">
    <data type="boolean"/>
   </attribute>
   <element name="isbn">
    <data type="NMTOKEN">&validChars;</data>
   </element>
   <element name="title">
    <attribute name="xml:lang">
     <data type="language"/>
    </attribute>
    <data type="token">&validChars;</data>
   </element>
   <zeroOrMore>
    <element name="author">
     <attribute name="id">
      <data type="NMTOKEN">&validChars;</data>
     </attribute>
     <element name="name">
      <data type="token">&validChars;</data>
     </element>
     <element name="born">
      <data type="date"/>
     </element>
     <optional>
      <element name="died">
       <data type="date"/>
      </element>
     </optional>
    </element>
   </zeroOrMore>
   <zeroOrMore>
    <element name="character">
     <attribute name="id">
      <data type="NMTOKEN">&validChars;</data>
     </attribute>
     <element name="name">
      <data type="token">&validChars;</data>
     </element>
     <element name="born">
      <data type="date"/>
     </element>
     <element name="qualification">
      <data type="token">&validChars;</data>
     </element>
    </element>
   </zeroOrMore>
  </element>
 </oneOrMore>
</element>
The trick here is to define an entity for the parameter:
<!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
You then reference this entity wherever you need it; for instance:
<data type="token">&validChars;</data>
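The effect of the entity can be checked with any entity-expanding XML parser. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree (which expands internal entities declared in the internal DTD subset). The cut-down schema and the helper name expanded_params are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Cut-down version of the library schema: one entity, two references.
# (Hypothetical excerpt, used only to keep the sketch short.)
SCHEMA = r"""<!DOCTYPE element [
<!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
]>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="book"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="id">
    <data type="NMTOKEN">&validChars;</data>
  </attribute>
  <element name="title">
    <data type="token">&validChars;</data>
  </element>
</element>
"""


def expanded_params(schema_text):
    """Parse the schema and list (data type, param name, param value)
    for every param element found after entity expansion."""
    root = ET.fromstring(schema_text)
    found = []
    for data in root.iter("{%s}data" % RNG_NS):
        for param in data.findall("{%s}param" % RNG_NS):
            found.append((data.get("type"), param.get("name"), param.text))
    return found


if __name__ == "__main__":
    for triple in expanded_params(SCHEMA):
        print(triple)
```

Each data element ends up with its own copy of the param element, which is what a RELAX NG processor reading the XML syntax would see as well.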
What about the compact syntax? The compact syntax doesn't support entities, but if you convert this schema into the compact syntax (using Trang), you get:
element library {
  element book {
    attribute id {
      xsd:NMTOKEN { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
    },
    attribute available { xsd:boolean },
    element isbn {
      xsd:NMTOKEN { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
    },
    element title {
      attribute xml:lang { xsd:language },
      xsd:token { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
    },
    element author {
      attribute id {
        xsd:NMTOKEN { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
      },
      element name {
        xsd:token { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
      },
      element born { xsd:date },
      element died { xsd:date }?
    }*,
    element character {
      attribute id {
        xsd:NMTOKEN { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
      },
      element name {
        xsd:token { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
      },
      element born { xsd:date },
      element qualification {
        xsd:token { pattern = "[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*" }
      }
    }*
  }+
}
This means that as long as you keep the XML version as your reference for this schema, you can easily get the compact syntax but can't go the other way round (compact to XML) without losing the entity definition. The fact that this example uses an XML mechanism has broken the round-tripping between the two syntaxes.
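The loss is easy to observe with any tool that parses and then reserializes the XML schema. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for such a tool (it is not Trang, and the one-element schema and the function name reserialize are assumptions made for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element schema that uses the validChars entity.
SOURCE = r"""<!DOCTYPE element [
<!ENTITY validChars "<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>">
]>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="isbn"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="NMTOKEN">&validChars;</data>
</element>
"""


def reserialize(schema_text):
    """Parse the schema (expanding entities) and write it back out as text."""
    return ET.tostring(ET.fromstring(schema_text), encoding="unicode")


if __name__ == "__main__":
    output = reserialize(SOURCE)
    # The pattern survives, but only as expanded text: the entity
    # declaration and the reference to it are both gone.
    print("&validChars;" in output)
    print("IsBasicLatin" in output)
```

Once the parser has expanded the reference, nothing downstream knows that the pattern was ever defined in a single place.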
Other XML tools (such as XInclude, or writing the schema as an XSLT transformation) can be used with pretty much the same effect. Depending on the case, these solutions are either supported directly by the parser that parses the RELAX NG schema (this is the case with our internal entity) or require a first phase during which your schema is compiled into a fully compatible RELAX NG schema.
For an example, let's use XSLT. When you need to do simple stuff, XSLT has a simplified syntax in which the xsl:stylesheet and xsl:template elements may be omitted (exactly like the RELAX NG grammar and start elements may be omitted in a simple RELAX NG schema). This means that if we just want to use XSLT for its simplest features (here only to expand the values of variables), we can write our schema as:
<?xml version = '1.0' encoding = 'utf-8' ?>
<element xmlns="http://relaxng.org/ns/structure/1.0" name="library"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="1.0">
 <xsl:variable name="validChars">
  <param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
 </xsl:variable>
 <oneOrMore>
  <element name="book">
   <attribute name="id">
    <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
   </attribute>
   <attribute name="available">
    <data type="boolean"/>
   </attribute>
   <element name="isbn">
    <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
   </element>
   <element name="title">
    <attribute name="xml:lang">
     <data type="language"/>
    </attribute>
    <data type="token"><xsl:copy-of select="$validChars"/></data>
   </element>
   <zeroOrMore>
    <element name="author">
     <attribute name="id">
      <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
     </attribute>
     <element name="name">
      <data type="token"><xsl:copy-of select="$validChars"/></data>
     </element>
     <element name="born">
      <data type="date"/>
     </element>
     <optional>
      <element name="died">
       <data type="date"/>
      </element>
     </optional>
    </element>
   </zeroOrMore>
   <zeroOrMore>
    <element name="character">
     <attribute name="id">
      <data type="NMTOKEN"><xsl:copy-of select="$validChars"/></data>
     </attribute>
     <element name="name">
      <data type="token"><xsl:copy-of select="$validChars"/></data>
     </element>
     <element name="born">
      <data type="date"/>
     </element>
     <element name="qualification">
      <data type="token"><xsl:copy-of select="$validChars"/></data>
     </element>
    </element>
   </zeroOrMore>
  </element>
 </oneOrMore>
</element>
Applied to any XML document, this transformation produces a RELAX NG schema in which the XSLT instruction:
<xsl:copy-of select="$validChars"/>
is replaced by the content of the variable $validChars:
<param name='pattern'>[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*</param>
Text tools are somewhat more limited. You can use only tools that, like the XSLT example just shown, require a first phase to produce a schema. One of the first tools that comes to mind for people familiar with C programming is the C preprocessor (CPP). With CPP, a text replacement is defined with #define and referenced simply by the name of the definition. Something equivalent to our two previous examples could thus be:
#define VALIDCHARS pattern = '[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*'
element library {
  element book {
    attribute id { xsd:NMTOKEN { VALIDCHARS } },
    attribute available { xsd:boolean },
    element isbn { xsd:NMTOKEN { VALIDCHARS } },
    element title {
      attribute xml:lang { xsd:language },
      xsd:token { VALIDCHARS }
    },
    element author {
      attribute id { xsd:NMTOKEN { VALIDCHARS } },
      element name { xsd:token { VALIDCHARS } },
      element born { xsd:date },
      element died { xsd:date }?
    }*,
    element character {
      attribute id { xsd:NMTOKEN { VALIDCHARS } },
      element name { xsd:token { VALIDCHARS } },
      element born { xsd:date },
      element qualification { xsd:token { VALIDCHARS } }
    }*
  }+
}
When compiled through CPP, this gives a fully valid RELAX NG schema (compact syntax) in which the occurrences of VALIDCHARS have been replaced by the parameter.
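If no C preprocessor is at hand, the same first phase can be approximated by a few lines of script. The sketch below is a deliberately naive stand-in written in Python, not CPP itself: it handles only simple "#define NAME replacement" lines and, unlike CPP, would also substitute inside string literals. The function name expand_defines and the two-element sample schema are assumptions made for illustration.

```python
import re

# Matches a simple object-like definition: "#define NAME replacement text".
DEFINE = re.compile(r"\s*#define\s+(\w+)\s+(.*)$")


def expand_defines(text):
    """Drop #define lines and replace later occurrences of each defined
    name with its replacement text (a tiny subset of what CPP does)."""
    macros = {}
    output = []
    for line in text.splitlines():
        match = DEFINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, replacement in macros.items():
            # A function is used as the replacement so that backslashes
            # such as \p are inserted literally rather than parsed by re.
            line = re.sub(r"\b%s\b" % re.escape(name),
                          lambda _m, text=replacement: text, line)
        output.append(line)
    return "\n".join(output)


# Hypothetical excerpt of the compact-syntax source shown above.
SOURCE = "\n".join([
    r"#define VALIDCHARS pattern = '[\p{IsBasicLatin}\p{IsLatin-1Supplement}]*'",
    "element isbn { xsd:NMTOKEN { VALIDCHARS } }",
    "element name { xsd:token { VALIDCHARS } }",
])

if __name__ == "__main__":
    print(expand_defines(SOURCE))
```

As with CPP, the expanded output is what gets handed to the RELAX NG processor; the file containing the definitions remains the reference version of the schema.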
This text is released under the Free Software Foundation GFDL.