by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
As mentioned in the introduction to this chapter, common uses of annotations by applications include preprocessing instructions, helpers for generating other schemas out of a RELAX NG schema, and extensions of RELAX NG itself.
Bob DuCharme has proposed an interesting application of annotations for preprocessing: using annotations to derive specific schemas by restriction of a generic schema. The benefits of this approach are that it is extremely simple and that it provides a straightforward workaround for the lack of derivation by restriction (a W3C XML Schema feature) in RELAX NG. It is language-neutral and can be applied to other schema languages, including W3C XML Schema itself, where it is much simpler than the derivation by restriction feature built into the language.
You can find Bob DuCharme's proposal on the web at: http://www.snee.com/xml/schemaStages.html and download the XSLT transformation implementing it from http://www.snee.com/xml/schemaStages.zip.
The idea behind his proposal is to add annotations to the elements that need to be removed in some variants of the schema. You then use these annotations to generate the different variants using an XSLT transformation. Each variant is called a stage. The list of the available stages is declared in an sn:stages element. For each conditional element, the list of the stages in which it needs to be kept is declared through an sn:stages attribute.
Because this technique uses annotations, the global schema can still be a valid schema that validates a superset of the instance documents that are valid per each stage.
If you wanted to derive, from a generic schema that allows any of them, schemas requiring a library, book, author, or character element (or either a book or an author element) as the document element, you could write:
    <?xml version="1.0" encoding="utf-8"?>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:sn="http://www.snee.com/ns/stages">
      <sn:stages>
        <sn:stage name="library"/>
        <sn:stage name="book"/>
        <sn:stage name="author"/>
        <sn:stage name="character"/>
        <sn:stage name="author-or-book"/>
      </sn:stages>
      <start>
        <choice>
          <ref name="library-element" sn:stages="library"/>
          <ref name="book-element" sn:stages="book author-or-book"/>
          <ref name="author-element" sn:stages="author author-or-book"/>
          <ref name="character-element" sn:stages="character"/>
        </choice>
      </start>
      ...
    </grammar>
or:
    namespace sn = "http://www.snee.com/ns/stages"
    sn:stages [
      sn:stage [ name = "library" ]
      sn:stage [ name = "book" ]
      sn:stage [ name = "author" ]
      sn:stage [ name = "character" ]
      sn:stage [ name = "author-or-book" ]
    ]
    start =
      [ sn:stages = "library" ] library-element
      | [ sn:stages = "book author-or-book" ] book-element
      | [ sn:stages = "author author-or-book" ] author-element
      | [ sn:stages = "character" ] character-element
    ...
This schema is a valid RELAX NG schema that accepts any of these elements as a root. Running the XML syntax through the XSLT transformation getStage.xsl (provided in the ZIP file mentioned previously) with the parameter stageName set to author-or-book removes all elements that have an sn:stages attribute without author-or-book in its list of values:
    $ xsltproc --stringparam stageName author-or-book getStage.xsl doc-snee.rng
    <?xml version="1.0"?>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:sn="http://www.snee.com/ns/stages">
      <start>
        <choice>
          <ref name="book-element"/>
          <ref name="author-element"/>
        </choice>
      </start>
      ...
    </grammar>
This transformation has thus performed a restriction on the schema. You can generate as many schemas this way as there are stages declared in the sn:stages element.
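As a rough illustration of what this restriction amounts to, the following Python sketch prunes a staged schema with the standard library's xml.etree.ElementTree. It is not getStage.xsl and makes no claim to reproduce every detail of that stylesheet; the function name extract_stage and the abbreviated sample schema are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
SN_NS = "http://www.snee.com/ns/stages"
# Qualified name shared by the sn:stages element and the sn:stages attribute.
STAGES = f"{{{SN_NS}}}stages"


def extract_stage(root, stage_name):
    """Prune `root` in place so that only the content kept in `stage_name` remains.

    Hypothetical helper: an approximation of the idea, not a port of getStage.xsl.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == STAGES:
                # Drop the declaration of the available stages.
                parent.remove(child)
                continue
            stages = child.attrib.pop(STAGES, None)
            if stages is not None and stage_name not in stages.split():
                # Conditional element that is not kept in this stage.
                parent.remove(child)
    return root


# Abbreviated version of the generic schema (definitions omitted).
GENERIC = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sn="http://www.snee.com/ns/stages">
  <sn:stages>
    <sn:stage name="library"/>
    <sn:stage name="book"/>
    <sn:stage name="author"/>
    <sn:stage name="character"/>
    <sn:stage name="author-or-book"/>
  </sn:stages>
  <start>
    <choice>
      <ref name="library-element" sn:stages="library"/>
      <ref name="book-element" sn:stages="book author-or-book"/>
      <ref name="author-element" sn:stages="author author-or-book"/>
      <ref name="character-element" sn:stages="character"/>
    </choice>
  </start>
</grammar>
"""

staged = extract_stage(ET.fromstring(GENERIC), "author-or-book")
kept = [ref.get("name") for ref in staged.iter(f"{{{RNG_NS}}}ref")]
print(kept)  # prints ['book-element', 'author-element']
```

Elements without an sn:stages attribute are left untouched, which is what lets the unannotated parts of the generic schema survive in every stage.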
RELAX NG works well as a pivot format, that is, a reference format in which schemas are kept and from which they are transformed into other languages. One limit of the pivot approach is that features that are part of the target languages but not part of RELAX NG seem to be out of reach. That would be true if it weren't for annotations. The two most common examples of such annotations are those used to generate DTDs and W3C XML Schema.
Default values for attributes are the third and final facet of the DTD Compatibility specification. They can be declared using an a:defaultValue attribute:
    <?xml version="1.0" encoding="utf-8"?>
    <element xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
             name="library">
      <oneOrMore>
        <element name="book">
          <attribute name="id"/>
          <optional>
            <attribute name="available" a:defaultValue="true">
              <choice>
                <value>true</value>
                <value>false</value>
              </choice>
            </attribute>
          </optional>
          ...
        </element>
      </oneOrMore>
      ...
    </element>
or:
    namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
    element library {
      element book {
        attribute id { text },
        [ a:defaultValue = "true" ]
        attribute available { "true" | "false" }?,
        element isbn { text },
        element title { attribute xml:lang { text }, text },
        ...
      }+
    }
The attribute needs to be declared as optional to use this feature. Hence there is no impact on the validation by a RELAX NG processor. However, converters such as Trang use this annotation to generate a default value in a DTD:
    <!ATTLIST book
      id CDATA #REQUIRED
      available (true|false) 'true'>
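To make concrete how an application can pick up this annotation, here is a minimal Python sketch that walks a schema in the XML syntax and collects the a:defaultValue annotations it finds. It only illustrates reading the annotation and says nothing about how Trang works internally; the function name collect_defaults is an assumption, and the sketch does not follow ref patterns or handle name classes.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
DEFAULT = "{http://relaxng.org/ns/compatibility/annotations/1.0}defaultValue"


def collect_defaults(schema_root):
    """Map (element name, attribute name) to the annotated default value.

    Hypothetical helper: only inline element/attribute patterns with a
    name attribute are considered.
    """
    defaults = {}

    def walk(node, element_name):
        if node.tag == RNG + "element":
            element_name = node.get("name")
        elif node.tag == RNG + "attribute" and DEFAULT in node.attrib:
            defaults[(element_name, node.get("name"))] = node.attrib[DEFAULT]
        for child in node:
            walk(child, element_name)

    walk(schema_root, None)
    return defaults


# Abbreviated version of the schema shown above.
SCHEMA = """
<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         name="library">
  <oneOrMore>
    <element name="book">
      <attribute name="id"/>
      <optional>
        <attribute name="available" a:defaultValue="true">
          <choice>
            <value>true</value>
            <value>false</value>
          </choice>
        </attribute>
      </optional>
    </element>
  </oneOrMore>
</element>
"""

defaults = collect_defaults(ET.fromstring(SCHEMA))
print(defaults)  # prints {('book', 'available'): 'true'}
```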
There is no official specification about how to generate W3C XML Schema from RELAX NG, so what I will say in this small section is derived from Trang's documentation.
Tip: If you want to know how to use Trang, check its web page at http://www.thaiopensource.com/relaxng/trang-manual.html.
The first thing to note is that Trang supports the a:defaultValue attribute. The schema presented earlier can be translated as:
    <xs:element name="book">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="isbn"/>
          <xs:element ref="title"/>
          <xs:element minOccurs="0" maxOccurs="unbounded" ref="author"/>
          <xs:element minOccurs="0" maxOccurs="unbounded" ref="character"/>
        </xs:sequence>
        <xs:attribute name="id" use="required"/>
        <xs:attribute name="available" default="true">
          <xs:simpleType>
            <xs:restriction base="xs:token">
              <xs:enumeration value="true"/>
              <xs:enumeration value="false"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:complexType>
    </xs:element>
Note the default attribute in the declaration of the available attribute.
In addition to this annotation, James Clark has created a specific namespace, http://www.thaiopensource.com/ns/relaxng/xsd, to manage the translation to W3C XML Schema. This translation is far from obvious, and a RELAX NG schema can often be translated using different features of W3C XML Schema. James Clark has made a lot of choices in his implementation based on best practices, but there are still some context-dependent options; in those situations, the users can be given a choice.
In the current version (as of June 19, 2003), there is only one annotation attribute available to perform such choices: the tx:enableAbstractElements attribute, which can be included in grammar, div, or include. This attribute can take the values true or false and controls whether abstract elements can be used in substitution groups. Substitution groups are a fairly advanced feature of W3C XML Schema, and I won't present the concept here, but you can find more information on this feature in my XML.com tutorial at http://xml.com/pub/a/2000/11/29/schemas/part1.html or in my book, XML Schema.
The Trang manual indicates that more annotations might be added in the future.
The Schema Adjunct Framework (SAF) is a generic framework that stores processing information in relation to schemas. It can work either standalone or as schema adornments, that is, annotations embedded in the schemas themselves. Although it has been developed to work with W3C XML Schema, there is no reason that it couldn't be used to adorn RELAX NG schemas.
Tip: You can find more information about SAF on the Web: http://www.tibco.com/solutions/products/extensibility/resources/saf.jsp.
The momentum behind SAF seems to have decreased a lot since the end of 2001, but it's definitely worth examining if you need to add processing information to a schema. A simple example of a SAF adornment in RELAX NG looks like:
    <define name="author-element">
      <sql:select>select <sql:elem>name</sql:elem>, <sql:elem>birthdate</sql:elem>,<sql:elem>deathdate</sql:elem> from tbl_author</sql:select>
      <element name="author">
        <attribute name="id"/>
        <ref name="name-element"/>
        <ref name="born-element"/>
        <optional>
          <ref name="died-element"/>
        </optional>
      </element>
    </define>
or:
    [
      sql:select [
        "select "
        sql:elem [ "name" ]
        ", "
        sql:elem [ "birthdate" ]
        ", "
        sql:elem [ "deathdate" ]
        " from tbl_author"
      ]
    ]
    author-element =
      element author {
        attribute id { text },
        name-element,
        born-element,
        died-element?
      }
These examples both add SQL-based processing information to the schema.
Annotations can also be used as extensions to influence the behavior of the RELAX NG processors that support them. This application is controversial but can also be very useful. The two applications of which I am aware in this category are one for embedding Schematron rules, and my own XVIF project, which allows a user to define validation and transformation pipes that act as RELAX NG patterns.
Schematron is a rather atypical XML schema language. Instead of being grammar-based like RELAX NG and focusing on describing documents, Schematron is rule-based and consists of lists of rules to check against documents. Giving the exhaustive list of all the rules needed to validate a document is a very verbose and error-prone task, but on the other hand, the ability to write your own rules gives a flexibility and a power that can't be matched by a grammar-based schema language. The two types of languages appear to be complementary rather than competing. Using both together allows you to get the best from each of them.
Tip: You can find more information about Schematron on its web site: http://www.ascc.net/xml/resource/schematron/schematron.html.
Schematron can get into places no other schema language can. For example, Schematron is a good fit when checking whether the id attribute of a book element is composed of the ISBN number prefixed by the letter b. In this case, you would write:
    <?xml version="1.0" encoding="utf-8"?>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:s="http://www.ascc.net/xml/schematron">
      <define name="book-element">
        <element name="book">
          <s:rule context="book">
            <s:assert test="@id = concat('b', isbn)">
              The id needs to be the isbn number prefixed by "b"
            </s:assert>
          </s:rule>
          <attribute name="id"/>
          <attribute name="available"/>
          <ref name="isbn-element"/>
          <ref name="title-element"/>
          <zeroOrMore>
            <ref name="author-element"/>
          </zeroOrMore>
          <zeroOrMore>
            <ref name="character-element"/>
          </zeroOrMore>
        </element>
      </define>
      ...
    </grammar>
or:
    namespace s = "http://www.ascc.net/xml/schematron"
    book-element =
      [
        s:rule [
          context = "book"
          s:assert [
            test = "@id = concat('b', isbn)"
            ' The id needs to be the isbn number prefixed by "b" '
          ]
        ]
      ]
      element book {
        attribute id { text },
        attribute available { text },
        isbn-element,
        title-element,
        author-element*,
        character-element*
      }
    ...
The Schematron annotation comprises a rule element, which sets the context, and embedded assert elements, which define assertions. Instead of assert, report elements can be used; they are the opposite of assertions and report an error when their test is true. These checks are applied to all the elements matching the XPath expression provided in the context attribute of the rule element; the test attributes of the assert and report elements are also XPath expressions.
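To see what the assertion above amounts to on an instance document, the following Python sketch hand-codes the same check with xml.etree.ElementTree. It is not a Schematron implementation: the XPath test @id = concat('b', isbn) is rewritten by hand as a string comparison, and the function name failed_assertions and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

MESSAGE = 'The id needs to be the isbn number prefixed by "b"'


def failed_assertions(root):
    """Return one message per book element for which the assertion is false.

    Hypothetical helper: a hand-coded stand-in for the rule
    context="book" / test="@id = concat('b', isbn)".
    """
    messages = []
    for book in root.iter("book"):  # every element matching context="book"
        isbn = book.findtext("isbn", default="")  # text of the first isbn child
        if book.get("id") != "b" + isbn:
            messages.append("book id={!r}: {}".format(book.get("id"), MESSAGE))
    return messages


# Sample instance document, invented for this example.
DOCUMENT = """
<library>
  <book id="b0836217462"><isbn>0836217462</isbn></book>
  <book id="ggjh0836217462"><isbn>0836217462</isbn></book>
</library>
"""

errors = failed_assertions(ET.fromstring(DOCUMENT))
for error in errors:
    print(error)
```

Only the second book is reported. A report element would invert the condition and emit its message when the test is true.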
At this point, I must mention that there is an appreciable difference between implementations on the scope in which the rules must be applied, which can lead to potential issues of interoperability between implementations.
On one hand, the Schematron specification states that when Schematron rules are embedded in another language, they must be collected and bundled into a Schematron schema independently of where they have been found in the original schema. In other words, the rule that was defined earlier should be applied to all the book elements in the instance documents. This is the approach taken by the Topologi multivalidator (see http://www.topologi.com/products/validator/index.html).
On the other hand, when a Schematron rule is embedded in a RELAX NG element pattern, as is the case here, it is tempting to evaluate the rule in the context of the pattern. In that case, the rule applies only to the book elements matched by that pattern. If the rule fails, the element pattern fails, and other alternatives are checked. This is the approach taken by Sun's Multi Schema Validator (see http://wwws.sun.com/software/xml/developers/multischema/).
The difference can be seen in an example such as:
    <define name="book-element">
      <choice>
        <element name="book">
          <s:rule context="book">
            <s:assert test="@id = concat('b', isbn)">
              The id needs to be the isbn number prefixed by "b"
            </s:assert>
          </s:rule>
          <attribute name="id"/>
          <attribute name="available"/>
          <ref name="isbn-element"/>
          <ref name="title-element"/>
          <zeroOrMore>
            <ref name="author-element"/>
          </zeroOrMore>
          <zeroOrMore>
            <ref name="character-element"/>
          </zeroOrMore>
        </element>
        <element name="book">
          <attribute name="id">
            <value>ggjh0836217462</value>
          </attribute>
          <attribute name="available"/>
          <ref name="isbn-element"/>
          <ref name="title-element"/>
          <zeroOrMore>
            <ref name="author-element"/>
          </zeroOrMore>
          <zeroOrMore>
            <ref name="character-element"/>
          </zeroOrMore>
        </element>
      </choice>
    </define>
In this case, the approach taken by the Schematron specification would consider an instance document with a book ID equal to ggjh0836217462 to be invalid. The evaluation of the Schematron rules is completely decoupled from the validation by the RELAX NG schema. The approach taken by MSV considers the same document as valid, because it meets one of the alternative definitions for the book element.
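The two readings can be contrasted with a toy model in Python. This sketch reduces a book to its id and ISBN and assumes that everything else in both alternatives matches; it reflects neither validator's actual code, and all function names are hypothetical.

```python
def rule_holds(book):
    """The Schematron assertion: @id = concat('b', isbn)."""
    return book["id"] == "b" + book["isbn"]


def first_alternative(book, rule_is_part_of_pattern):
    """First element pattern; the rest of its content is assumed to match."""
    return rule_holds(book) if rule_is_part_of_pattern else True


def second_alternative(book):
    """Second element pattern: the id attribute has a fixed value."""
    return book["id"] == "ggjh0836217462"


def valid_when_decoupled(book):
    """Rules are collected and checked independently of the grammar."""
    grammar_ok = first_alternative(book, False) or second_alternative(book)
    return grammar_ok and rule_holds(book)


def valid_when_scoped(book):
    """The rule is evaluated as part of the pattern that contains it."""
    return first_alternative(book, True) or second_alternative(book)


book = {"id": "ggjh0836217462", "isbn": "0836217462"}
print(valid_when_decoupled(book))  # prints False
print(valid_when_scoped(book))     # prints True
```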
The interoperability issue mentioned previously is a good illustration of the difficulty of mixing elements from different languages that have been specified independently. The XML Validation Interoperability Framework (XVIF) is a proposal for a framework which would take care of this kind of issue.
Tip: You will find more information about XVIF at its home page: http://downloads.xmlschemata.org/python/xvif/.
The principle of XVIF is to define micro pipes, much like Unix pipes, of transformations and validations that can be embedded in different transformation and validation languages. When the host language is RELAX NG, these micro pipes behave as RELAX NG patterns.
There are many use cases for such micro pipes; one of them is to include transformations to fit text nodes into existing datatypes. For instance, we've been using dates that use the ISO 8601 format in our documents, but we can also use French date formats. In this case, a set of regular expressions can be defined to do the transformation between these dates and the ISO 8601 format. XVIF gives a way to integrate these regular expressions in a RELAX NG schema:
    <?xml version="1.0" encoding="utf-8"?>
    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:if="http://namespaces.xmlschemata.org/xvif/iframe"
             datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <define name="born-element">
        <element name="born">
          <if:pipe>
            <if:validate type="http://namespaces.xmlschemata.org/xvif/regexp"
                         apply="m/[0-9]+ .+ [0-9]+/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/^[ \t\n]*([0-9] .*)$/0\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) mars ([0-9]+)/\2-03-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) avril ([0-9]+)/\2-04-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) mai ([0-9]+)/\2-05-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) juin ([0-9]+)/\2-06-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) aout ([0-9]+)/\2-08-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"/>
            <if:transform type="http://namespaces.xmlschemata.org/xvif/regexp"
                          apply="s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"/>
            <if:validate type="http://relaxng.org/ns/structure/1.0">
              <if:apply>
                <data type="date">
                  <param name="minInclusive">1900-01-01</param>
                  <param name="maxInclusive">2099-12-31</param>
                </data>
              </if:apply>
            </if:validate>
          </if:pipe>
          <text if:ignore="1"/>
        </element>
      </define>
      ...
    </grammar>
or, in the compact syntax:
    namespace if = "http://namespaces.xmlschemata.org/xvif/iframe"
    namespace rng = "http://relaxng.org/ns/structure/1.0"
    datatypes d = "http://relaxng.org/ns/compatibility/datatypes/1.0"
    born-element =
      [
        if:pipe [
          if:validate [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "m/[0-9]+ .+ [0-9]+/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/^[ \t\n]*([0-9] .*)$/0\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) janvier ([0-9]+)/\2-01-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) fevrier ([0-9]+)/\2-02-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) mars ([0-9]+)/\2-03-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) avril ([0-9]+)/\2-04-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) mai ([0-9]+)/\2-05-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) juin ([0-9]+)/\2-06-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) juillet ([0-9]+)/\2-07-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) aout ([0-9]+)/\2-08-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) septembre ([0-9]+)/\2-09-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) octobre ([0-9]+)/\2-10-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) novembre ([0-9]+)/\2-11-\1/"
          ]
          if:transform [
            type = "http://namespaces.xmlschemata.org/xvif/regexp"
            apply = "s/([0-9]+) decembre ([0-9]+)/\2-12-\1/"
          ]
          if:validate [
            type = "http://relaxng.org/ns/structure/1.0"
            if:apply [
              rng:data [
                type = "date"
                rng:param [ name = "minInclusive" "1900-01-01" ]
                rng:param [ name = "maxInclusive" "2099-12-31" ]
              ]
            ]
          ]
        ]
      ]
      element born { [ if:ignore = "1" ] text }
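In this example, I define a pipe (if:pipe) of 15 steps. A first validation (if:validate) uses a regular expression to check the overall shape of the text. It is followed by 13 transformations (if:transform) that also use regular expressions: the first pads a single-digit day with a leading zero, and each of the other twelve converts one of the twelve months. A final validation (if:validate) then uses RELAX NG itself to check that the result is an ISO 8601 date between 1900 and 2099. The text pattern has an if:ignore attribute, which tells XVIF-compliant processors that it is a fallback pattern for other RELAX NG processors.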
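The effect of such a pipe on a text node can be sketched in a few lines of Python with the standard re and datetime modules. This is only an approximation written for illustration, not the XVIF implementation: the function name french_date_pipe is an assumption, the final RELAX NG validation is replaced by a plain date check, and time zones allowed by the date datatype are ignored.

```python
import re
from datetime import date

MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin", "juillet",
          "aout", "septembre", "octobre", "novembre", "decembre"]


def french_date_pipe(text):
    """Return the ISO 8601 form of `text`, or None if a step of the pipe fails.

    Hypothetical helper that mirrors the steps of the if:pipe shown above.
    """
    # First if:validate: the text must look like "digits, something, digits".
    if not re.search(r"[0-9]+ .+ [0-9]+", text):
        return None
    # First if:transform: pad a single-digit day with a leading zero.
    text = re.sub(r"^[ \t\n]*([0-9] .*)$", r"0\1", text)
    # Twelve if:transform steps: one substitution per month.
    for number, month in enumerate(MONTHS, start=1):
        text = re.sub(rf"([0-9]+) {month} ([0-9]+)", rf"\2-{number:02d}-\1", text)
    # Final if:validate: a date between 1900-01-01 and 2099-12-31 inclusive.
    text = text.strip()
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return None
    try:
        value = date.fromisoformat(text)
    except ValueError:
        return None
    if not date(1900, 1, 1) <= value <= date(2099, 12, 31):
        return None
    return text


print(french_date_pipe("19 juin 2003"))      # prints 2003-06-19
print(french_date_pipe("5 mars 1950"))       # prints 1950-03-05
print(french_date_pipe("31 decembre 1899"))  # prints None
```

As in the schema, the month names are matched without accents, so a value such as "3 août 1975" is not converted and fails the final check.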
This text is released under the Free Software Foundation GFDL.