RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
You are welcome to use our annotation system to give your feedback.
External references offer a simple but powerful mechanism for including a pattern contained in an external document at any location in a schema. This feature works through raw inclusion of the referenced document: the externalRef pattern is replaced by the content of that document. The document may be a complete RELAX NG schema, but this is not required; it only needs to contain a valid pattern.
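As an illustration of a referenced document that is a valid pattern without being usable as a standalone schema, here is a minimal sketch built on a hypothetical file, attribute-id.rng, whose root element is an attribute pattern. As we read the RELAX NG specification, a schema whose start pattern is an attribute is not allowed, so this file only makes sense as the target of an externalRef.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- attribute-id.rng (hypothetical file name): the root is an attribute
     pattern, so this document is meant to be included through externalRef
     rather than used on its own to validate documents -->
<attribute name="id" xmlns="http://relaxng.org/ns/structure/1.0"
           datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="ID"/>
</attribute>
```

Inside an element pattern, `<externalRef href="attribute-id.rng"/>` would then stand in for the inline definition of the id attribute.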
We may want to reuse existing schemas as a whole, without modifying any of their definitions. Imagine, for instance, that we have defined two grammars in two schemas to describe our author and character elements. We created a first RELAX NG schema, author.rng, to describe our authors:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="author" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="id">
    <data type="ID"/>
  </attribute>
  <element name="name">
    <data type="token" datatypeLibrary=""/>
  </element>
  <optional>
    <element name="born">
      <data type="date"/>
    </element>
  </optional>
  <optional>
    <element name="died">
      <data type="date"/>
    </element>
  </optional>
</element>
```
or, in the compact syntax, author.rnc:
```
element author {
  attribute id { xsd:ID },
  element name { token },
  element born { xsd:date }?,
  element died { xsd:date }?
}
```
Then we created a second schema, character.rng, to describe our characters:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="character" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="id">
    <data type="ID"/>
  </attribute>
  <element name="name">
    <data type="token" datatypeLibrary=""/>
  </element>
  <optional>
    <element name="born">
      <data type="date"/>
    </element>
  </optional>
  <element name="qualification">
    <data type="token" datatypeLibrary=""/>
  </element>
</element>
```
or, in the compact syntax, character.rnc:
```
element character {
  attribute id { xsd:ID },
  element name { token },
  element born { xsd:date }?,
  element qualification { token }
}
```
If we want to combine these components into a schema describing our library, we will use externalRef patterns:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <oneOrMore>
    <element name="book">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <attribute name="available">
        <data type="boolean"/>
      </attribute>
      <element name="isbn">
        <data type="token" datatypeLibrary=""/>
      </element>
      <element name="title">
        <attribute name="xml:lang">
          <data type="language"/>
        </attribute>
        <data type="token" datatypeLibrary=""/>
      </element>
      <oneOrMore>
        <externalRef href="author.rng"/>
      </oneOrMore>
      <zeroOrMore>
        <externalRef href="character.rng"/>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```
In the compact syntax, externalRef patterns are represented using the keyword external:
```
element library {
  element book {
    attribute id { xsd:ID },
    attribute available { xsd:boolean },
    element isbn { token },
    element title { attribute xml:lang { xsd:language }, token },
    external "author.rnc"+,
    external "character.rnc"*
  }+
}
```
The externalRef pattern performs direct inclusion: when a RELAX NG processor reads a schema, it simply replaces each externalRef with the contents of the referenced document.
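To make this concrete, here is a sketch of what the first externalRef of the library schema conceptually turns into when author.rng is the Russian doll version shown earlier: the externalRef element is replaced by the author element that is the root of author.rng, attributes included. This illustrates the principle only; it is not the literal output of any particular processor.

```xml
<!-- Conceptual result of expanding
     <oneOrMore><externalRef href="author.rng"/></oneOrMore> -->
<oneOrMore>
  <element name="author"
           datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <attribute name="id">
      <data type="ID"/>
    </attribute>
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
    <optional>
      <element name="born">
        <data type="date"/>
      </element>
    </optional>
    <optional>
      <element name="died">
        <data type="date"/>
      </element>
    </optional>
  </element>
</oneOrMore>
```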
In the previous example, we used externalRef with "Russian doll" schemas, but this also works with flat schemas. For instance, we might change our author schema, author.rng, to read:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="element-author"/>
  </start>
  <define name="element-author">
    <element name="author">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <ref name="element-name"/>
      <optional>
        <ref name="element-born"/>
      </optional>
      <optional>
        <ref name="element-died"/>
      </optional>
    </element>
  </define>
  <define name="element-name">
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
  <define name="element-born">
    <element name="born">
      <data type="date"/>
    </element>
  </define>
  <define name="element-died">
    <element name="died">
      <data type="date"/>
    </element>
  </define>
</grammar>
```
or, in the compact syntax, author.rnc, to:
```
start = element-author
element-author =
  element author {
    attribute id { xsd:ID },
    element-name,
    element-born?,
    element-died?
  }
element-name = element name { token }
element-born = element born { xsd:date }
element-died = element died { xsd:date }
```
And our character schema, character.rng, to:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="element-character"/>
  </start>
  <define name="element-character">
    <element name="character">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <ref name="element-name"/>
      <optional>
        <ref name="element-born"/>
      </optional>
      <ref name="element-qualification"/>
    </element>
  </define>
  <define name="element-name">
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
  <define name="element-born">
    <element name="born">
      <data type="date"/>
    </element>
  </define>
  <define name="element-qualification">
    <element name="qualification">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
</grammar>
```
or, in the compact syntax, character.rnc:
```
start = element-character
element-character =
  element character {
    attribute id { xsd:ID },
    element-name,
    element-born?,
    element-qualification
  }
element-name = element name { token }
element-born = element born { xsd:date }
element-qualification = element qualification { token }
```
The library schema shown earlier, with its externalRef patterns (external in the compact syntax), will have no difficulty using these flat schemas in place of the Russian doll versions. This can be very convenient if you've defined
This seems straightforward and logical, but why does it work? Why is there no collision between the named patterns element-name and element-born, which are defined in both "author.rng" and "character.rng"? And why don't the start patterns defined in "author.rng" and "character.rng" apply to the schema for our library?
This works because of a RELAX NG feature called "embedded grammars". As I have already mentioned, externalRef patterns perform strict inclusion of the referenced schema. In our last example, this means that our resulting schema is:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <oneOrMore>
    <element name="book">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <attribute name="available">
        <data type="boolean"/>
      </attribute>
      <element name="isbn">
        <data type="token" datatypeLibrary=""/>
      </element>
      <element name="title">
        <attribute name="xml:lang">
          <data type="language"/>
        </attribute>
        <data type="token" datatypeLibrary=""/>
      </element>
      <oneOrMore>
        <grammar>
          <start>
            <ref name="element-author"/>
          </start>
          <define name="element-author">
            <element name="author">
              <attribute name="id">
                <data type="ID"/>
              </attribute>
              <ref name="element-name"/>
              <optional>
                <ref name="element-born"/>
              </optional>
              <optional>
                <ref name="element-died"/>
              </optional>
            </element>
          </define>
          <define name="element-name">
            <element name="name">
              <data type="token" datatypeLibrary=""/>
            </element>
          </define>
          <define name="element-born">
            <element name="born">
              <data type="date"/>
            </element>
          </define>
          <define name="element-died">
            <element name="died">
              <data type="date"/>
            </element>
          </define>
        </grammar>
      </oneOrMore>
      <zeroOrMore>
        <grammar>
          <start>
            <ref name="element-character"/>
          </start>
          <define name="element-character">
            <element name="character">
              <attribute name="id">
                <data type="ID"/>
              </attribute>
              <ref name="element-name"/>
              <optional>
                <ref name="element-born"/>
              </optional>
              <ref name="element-qualification"/>
            </element>
          </define>
          <define name="element-name">
            <element name="name">
              <data type="token" datatypeLibrary=""/>
            </element>
          </define>
          <define name="element-born">
            <element name="born">
              <data type="date"/>
            </element>
          </define>
          <define name="element-qualification">
            <element name="qualification">
              <data type="token" datatypeLibrary=""/>
            </element>
          </define>
        </grammar>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```
or, in the compact syntax:
```
element library {
  element book {
    attribute id { xsd:ID },
    attribute available { xsd:boolean },
    element isbn { token },
    element title { attribute xml:lang { xsd:language }, token },
    grammar {
      start = element-author
      element-author =
        element author {
          attribute id { xsd:ID },
          element-name,
          element-born?,
          element-died?
        }
      element-name = element name { token }
      element-born = element born { xsd:date }
      element-died = element died { xsd:date }
    }+,
    grammar {
      start = element-character
      element-character =
        element character {
          attribute id { xsd:ID },
          element-name,
          element-born?,
          element-qualification
        }
      element-name = element name { token }
      element-born = element born { xsd:date }
      element-qualification = element qualification { token }
    }*
  }+
}
```
Here we are thus embedding grammars within our schema, and they behave as patterns. In fact, there's even more to it: for RELAX NG, grammars are patterns! These patterns have a twofold meaning:
As far as validation is concerned, embedded grammars are equivalent to their start patterns: the grammar describing the character element, for instance, will match instance nodes corresponding to its start pattern, i.e. instance nodes matching the pattern element-character, which is what we were expecting.
Grammars also set the scope of their definitions: start and named patterns defined in a grammar are visible only in this grammar. Their scope (i.e. the location where they can be referred to) is strictly limited to the grammar in which they are defined.
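As an illustration of the validation side, a library document like the following should be accepted by the library schema, since each embedded grammar matches exactly what its start pattern matches: one or more author elements followed by zero or more character elements. All identifiers and values in this sample are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample instance for the library schema -->
<library>
  <book id="b0000000001" available="true">
    <isbn>0000000001</isbn>
    <title xml:lang="en">A Sample Title</title>
    <!-- matched by the start pattern of the author grammar -->
    <author id="a1">
      <name>Sample Author</name>
      <born>1950-01-01</born>
    </author>
    <!-- matched by the start pattern of the character grammar -->
    <character id="c1">
      <name>Sample Character</name>
      <qualification>curious</qualification>
    </character>
  </book>
</library>
```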
Applied to our example, the strict scoping of start and named patterns means that:
The element-born pattern of the grammar describing the character element cannot be seen from its parent grammar, i.e. the grammar describing the library and book elements. Nor can it be seen from its sibling grammar, i.e. the grammar describing the author element. The same applies to start patterns.
Unlike the usual scoping rules of programming languages, where definitions remain visible in nested blocks, the scope of start and named patterns does not extend into embedded grammars: start and named patterns defined in the grammar describing the library and book elements would not be visible in the embedded grammars.
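The following self-contained sketch shows this isolation at work with hypothetical element names (doc, title and subtitle): both the outer grammar and the embedded grammar define a named pattern called title, and each ref resolves to the definition of its own grammar, without any collision.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="doc">
      <!-- resolves to the outer definition: a title element -->
      <ref name="title"/>
      <grammar>
        <start>
          <!-- resolves to the inner definition: a subtitle element -->
          <ref name="title"/>
        </start>
        <define name="title">
          <element name="subtitle">
            <text/>
          </element>
        </define>
      </grammar>
    </element>
  </start>
  <define name="title">
    <element name="title">
      <text/>
    </element>
  </define>
</grammar>
```

With this schema, a document such as `<doc><title>RELAX NG</title><subtitle>Embedded grammars</subtitle></doc>` would be valid.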
This strict isolation of start and named patterns in their grammars is usually convenient when we create references to external grammars. It means that external grammars can be written independently without risk of collision or incompatibility. You can safely take any RELAX NG schema, drop it into a new schema and treat it as a single pattern without any risk of collision.
On the other hand, that approach doesn't let you modify what you are including (we will see how to do so in the next section) nor even let you leverage a set of common named patterns. In our example, since we already had two definitions of element-name and element-born, it was a good thing that they were both isolated in their grammars. If we were designing the same building blocks from scratch, however, we would probably want to have only one definition of these two elements which could be shared by the author and character elements. In fact if we were following the principle "if it's written more than once make it common" we would also want to share the definition of the id attribute.
We will see another way to do this, but it is also possible to make an explicit reference to a pattern defined in the parent grammar, i.e. the grammar embedding the current one. In this case, we need to add the definitions that we want to share to the top-level schema, even if we do not use all of them in this schema:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="library">
      <oneOrMore>
        <element name="book">
          <ref name="attribute-id"/>
          <attribute name="available">
            <data type="boolean"/>
          </attribute>
          <element name="isbn">
            <data type="token" datatypeLibrary=""/>
          </element>
          <element name="title">
            <attribute name="xml:lang">
              <data type="language"/>
            </attribute>
            <data type="token" datatypeLibrary=""/>
          </element>
          <oneOrMore>
            <externalRef href="author.rng"/>
          </oneOrMore>
          <zeroOrMore>
            <externalRef href="character.rng"/>
          </zeroOrMore>
        </element>
      </oneOrMore>
    </element>
  </start>
  <define name="element-name">
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
  <define name="element-born">
    <element name="born">
      <data type="date"/>
    </element>
  </define>
  <define name="attribute-id">
    <attribute name="id">
      <data type="ID"/>
    </attribute>
  </define>
</grammar>
```
or, in the compact syntax:
```
start =
  element library {
    element book {
      attribute-id,
      attribute available { xsd:boolean },
      element isbn { token },
      element title { attribute xml:lang { xsd:language }, token },
      external "author.rnc"+,
      external "character.rnc"*
    }+
  }
element-name = element name { token }
element-born = element born { xsd:date }
attribute-id = attribute id { xsd:ID }
```
Now, to refer from the embedded grammars to the named patterns element-name and element-born, we will use a pattern called parentRef (attribute-id could be shared in exactly the same way, although the schemas below keep their local id attribute). This makes author.rng look like:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="element-author"/>
  </start>
  <define name="element-author">
    <element name="author">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <parentRef name="element-name"/>
      <optional>
        <parentRef name="element-born"/>
      </optional>
      <optional>
        <ref name="element-died"/>
      </optional>
    </element>
  </define>
  <define name="element-died">
    <element name="died">
      <data type="date"/>
    </element>
  </define>
</grammar>
```
and the character.rng schema now looks like:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="element-character"/>
  </start>
  <define name="element-character">
    <element name="character">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <parentRef name="element-name"/>
      <optional>
        <parentRef name="element-born"/>
      </optional>
      <ref name="element-qualification"/>
    </element>
  </define>
  <define name="element-qualification">
    <element name="qualification">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
</grammar>
```
The parentRef pattern is translated to a parent keyword in the compact syntax. The author.rnc schema looks like:
```
start = element-author
element-author =
  element author {
    attribute id { xsd:ID },
    parent element-name,
    parent element-born?,
    element-died?
  }
element-died = element died { xsd:date }
```
and the character.rnc schema looks like:
```
start = element-character
element-character =
  element character {
    attribute id { xsd:ID },
    parent element-name,
    parent element-born?,
    element-qualification
  }
element-qualification = element qualification { token }
```
We are using these features in the context of multiple schema documents, but the semantics of the externalRef pattern remain the same: this schema is equivalent to a single monolithic schema in which the externalRef patterns have been expanded into two embedded grammars:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="library">
      <oneOrMore>
        <element name="book">
          <ref name="attribute-id"/>
          <attribute name="available">
            <data type="boolean"/>
          </attribute>
          <element name="isbn">
            <data type="token" datatypeLibrary=""/>
          </element>
          <element name="title">
            <attribute name="xml:lang">
              <data type="language"/>
            </attribute>
            <data type="token" datatypeLibrary=""/>
          </element>
          <oneOrMore>
            <grammar>
              <start>
                <ref name="element-author"/>
              </start>
              <define name="element-author">
                <element name="author">
                  <attribute name="id">
                    <data type="ID"/>
                  </attribute>
                  <parentRef name="element-name"/>
                  <optional>
                    <parentRef name="element-born"/>
                  </optional>
                  <optional>
                    <ref name="element-died"/>
                  </optional>
                </element>
              </define>
              <define name="element-died">
                <element name="died">
                  <data type="date"/>
                </element>
              </define>
            </grammar>
          </oneOrMore>
          <zeroOrMore>
            <grammar>
              <start>
                <ref name="element-character"/>
              </start>
              <define name="element-character">
                <element name="character">
                  <attribute name="id">
                    <data type="ID"/>
                  </attribute>
                  <parentRef name="element-name"/>
                  <optional>
                    <parentRef name="element-born"/>
                  </optional>
                  <ref name="element-qualification"/>
                </element>
              </define>
              <define name="element-qualification">
                <element name="qualification">
                  <data type="token" datatypeLibrary=""/>
                </element>
              </define>
            </grammar>
          </zeroOrMore>
        </element>
      </oneOrMore>
    </element>
  </start>
  <define name="element-name">
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
  <define name="element-born">
    <element name="born">
      <data type="date"/>
    </element>
  </define>
  <define name="attribute-id">
    <attribute name="id">
      <data type="ID"/>
    </attribute>
  </define>
</grammar>
```
or, in the compact syntax:
```
start =
  element library {
    element book {
      attribute-id,
      attribute available { xsd:boolean },
      element isbn { token },
      element title { attribute xml:lang { xsd:language }, token },
      grammar {
        start = element-author
        element-author =
          element author {
            attribute id { xsd:ID },
            parent element-name,
            parent element-born?,
            element-died?
          }
        element-died = element died { xsd:date }
      }+,
      grammar {
        start = element-character
        element-character =
          element character {
            attribute id { xsd:ID },
            parent element-name,
            parent element-born?,
            element-qualification
          }
        element-qualification = element qualification { token }
      }*
    }+
  }
element-name = element name { token }
element-born = element born { xsd:date }
attribute-id = attribute id { xsd:ID }
```
You can see how start and named patterns have been defined in each of the three grammars composing this schema:
element-died is defined in the grammar defining the author element and can only be used in this grammar.
Similarly, element-qualification is defined in the grammar defining the character element and can only be used there.
element-name, element-born and attribute-id are defined in the top level grammar. They can be used in this grammar through normal references (i.e. ref patterns) and can also be used in its child grammars, i.e. the grammars which are directly embedded into this one, using a parentRef pattern.
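Because parentRef is resolved against whichever grammar directly embeds the current one, the parentRef version of author.rng can be reused under any parent grammar that supplies the definitions it needs. The following is a minimal sketch, assuming a hypothetical wrapper file named author-wrapper.rng, that would let it validate documents whose root element is author.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- author-wrapper.rng (hypothetical): the grammar of author.rng becomes a
     child grammar of this one, so its parentRef patterns resolve here -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <externalRef href="author.rng"/>
  </start>
  <!-- definitions targeted by <parentRef name="element-name"/> -->
  <define name="element-name">
    <element name="name">
      <data type="token" datatypeLibrary=""/>
    </element>
  </define>
  <!-- definitions targeted by <parentRef name="element-born"/> -->
  <define name="element-born">
    <element name="born">
      <data type="date"/>
    </element>
  </define>
</grammar>
```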
There are two more things to note about the parentRef pattern:
If grammars are nested more than two levels deep, you may run into trouble, since parentRef can only refer to the immediate parent grammar, not to other ancestor grammars. The RELAX NG working group has considered this issue but hasn't found any real-world use case for generalizing parentRef patterns to deeper nesting. If you find one, they will probably welcome a mail on the subject! In practice, if we needed to do so, a workaround would be to define named patterns in the intermediate grammars that act as "proxies".
Now that we have added parentRef patterns to our two schemas, "author.rng" and "character.rng" can no longer be used as standalone schemas for validating documents with author or character root elements. To be complete and operational, they must be embedded into grammars that provide definitions for the named patterns they reference.
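The "proxy" workaround for deeper nesting can be sketched as follows. The element names outer and middle and the named pattern name-proxy are hypothetical: the intermediate grammar defines name-proxy as a parentRef to the top-level element-name, and the innermost grammar reaches that proxy with its own parentRef.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="outer">
      <!-- intermediate grammar (second level) -->
      <grammar>
        <start>
          <element name="middle">
            <!-- innermost grammar (third level) -->
            <grammar>
              <start>
                <!-- can only see its parent, so it goes through the proxy -->
                <parentRef name="name-proxy"/>
              </start>
            </grammar>
          </element>
        </start>
        <!-- proxy: forwards to the definition of the top-level grammar -->
        <define name="name-proxy">
          <parentRef name="element-name"/>
        </define>
      </grammar>
    </element>
  </start>
  <define name="element-name">
    <element name="name">
      <text/>
    </element>
  </define>
</grammar>
```

With this schema, a document such as `<outer><middle><name>x</name></middle></outer>` would be valid.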
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.