RELAX NG by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


Using External References

External references offer a simple but powerful mechanism for including a pattern contained in an external document at any location in a schema. The feature works through raw inclusion: the externalRef pattern is replaced by the content of the referenced document. That document may be a complete RELAX NG schema, but this isn't required; it only has to contain a valid pattern.
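
As a minimal sketch of that last point, the referenced document can be a bare pattern with no element or grammar around it. The file name date-or-year.rng and the choice between a full date and a year are assumptions made up for this illustration; they aren't part of the library example:

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- date-or-year.rng (hypothetical): a pattern, not a whole document description -->
 <choice xmlns="http://relaxng.org/ns/structure/1.0"
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <data type="date"/>
   <data type="gYear"/>
 </choice>

A schema could then pull this pattern in wherever a pattern is allowed, for instance as the content of an element:

 <?xml version="1.0" encoding="UTF-8"?>
 <element name="born" xmlns="http://relaxng.org/ns/structure/1.0">
   <!-- Replaced by the choice pattern read from date-or-year.rng -->
   <externalRef href="date-or-year.rng"/>
 </element>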

With Russian Doll Schemas

You may want to reuse existing schemas as a whole, without modifying any of their definitions. Imagine, for instance, that we have defined two grammars in two schemas to describe our author and character elements. First, create a RELAX NG schema, author.rng, to describe our authors:

 <?xml version="1.0" encoding="UTF-8"?>
 <element name="author" xmlns="http://relaxng.org/ns/structure/1.0" 
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <attribute name="id">
     <data type="ID"/>
   </attribute>
   <element name="name">
     <data type="token" datatypeLibrary=""/>
   </element>
   <optional>
     <element name="born">
       <data type="date"/>
     </element>
   </optional>
   <optional>
     <element name="died">
       <data type="date"/>
     </element>
   </optional>
 </element>

or, in the compact syntax, author.rnc:

 element author {
   attribute id { xsd:ID },
   element name { token },
   element born { xsd:date }?,
   element died { xsd:date }?
 }

Then create a second schema, character.rng, to describe our characters:

 <?xml version="1.0" encoding="UTF-8"?>
 <element name="character" xmlns="http://relaxng.org/ns/structure/1.0" 
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <attribute name="id">
     <data type="ID"/>
   </attribute>
   <element name="name">
     <data type="token" datatypeLibrary=""/>
   </element>
   <optional>
     <element name="born">
       <data type="date"/>
     </element>
   </optional>
   <element name="qualification">
     <data type="token" datatypeLibrary=""/>
   </element>
 </element>

or, in the compact syntax, character.rnc:

 element character { 
   attribute id { xsd:ID },
   element name { token },
   element born { xsd:date }?,
   element qualification { token }
 }

To combine these components into a schema describing our library, use externalRef patterns:

 <?xml version="1.0" encoding="UTF-8"?>
 <element name="library" xmlns="http://relaxng.org/ns/structure/1.0" 
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <oneOrMore>
     <element name="book">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <attribute name="available">
         <data type="boolean"/>
       </attribute>
       <element name="isbn">
         <data type="token" datatypeLibrary=""/>
       </element>
       <element name="title">
         <attribute name="xml:lang">
           <data type="language"/>
         </attribute>
         <data type="token" datatypeLibrary=""/>
       </element>
       <oneOrMore>
         <externalRef href="author.rng"/>
       </oneOrMore>
       <zeroOrMore>
         <externalRef href="character.rng"/>
       </zeroOrMore>
     </element>
   </oneOrMore>
 </element>

In the compact syntax, externalRef patterns are represented using the keyword external:

 element library {
   element book {
     attribute id { xsd:ID },
     attribute available { xsd:boolean },
     element isbn { token },
     element title {
       attribute xml:lang { xsd:language },
       token
     },
     external "author.rnc" +,
     external "character.rnc" *
   }+
 }

The externalRef pattern performs direct inclusion: when a RELAX NG processor reads a schema, it replaces each externalRef with the contents of the referenced document.
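
To make the combined schema concrete, here is a sketch of an instance document that it should accept. The identifiers and text values are illustrative assumptions; only the element and attribute names come from the schemas above:

 <?xml version="1.0" encoding="UTF-8"?>
 <library>
   <book id="b0836217462" available="true">
     <isbn>0836217462</isbn>
     <title xml:lang="en">Being a Dog Is a Full-Time Job</title>
     <!-- Matched by the pattern read from author.rng (one or more) -->
     <author id="CMS">
       <name>Charles M Schulz</name>
       <born>1922-11-26</born>
       <died>2000-02-12</died>
     </author>
     <!-- Matched by the pattern read from character.rng (zero or more) -->
     <character id="PP">
       <name>Peppermint Patty</name>
       <born>1966-08-22</born>
       <qualification>bold, brash and tomboyish</qualification>
     </character>
   </book>
 </library>

Each book holds its author elements first and its character elements after them, in the order in which the two externalRef patterns appear in the schema.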

With Flat Schemas

The previous example used externalRef to include the content of Russian doll schemas, but this also works with flat schemas. For instance, we might change our author schema, author.rng, to read:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   
   <start>
     <ref name="element-author"/>
   </start>
   
   <define name="element-author">
     <element name="author">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <ref name="element-name"/>
       <optional>
         <ref name="element-born"/>
       </optional>
       <optional>
         <ref name="element-died"/>
       </optional>
     </element>
   </define>
   
   <define name="element-name">
     <element name="name">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
   
   <define name="element-born">
     <element name="born">
       <data type="date"/>
     </element>
   </define>
   
   <define name="element-died">
     <element name="died">
       <data type="date"/>
     </element>
   </define>
   
 </grammar>

or the compact syntax, author.rnc, to:

 start = element-author
 element-author =
   element author {
     attribute id { xsd:ID },
     element-name,
     element-born?,
     element-died?
   }
 element-name = element name { token }
 element-born = element born { xsd:date }
 element-died = element died { xsd:date }

And our character schema, character.rng, to:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
 
   <start>
     <ref name="element-character"/>
   </start>
   
   <define name="element-character">
     <element name="character">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <ref name="element-name"/>
       <optional>
         <ref name="element-born"/>
       </optional>
       <ref name="element-qualification"/>
     </element>
   </define>
   
   <define name="element-name">
     <element name="name">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
   
   <define name="element-born">
     <element name="born">
       <data type="date"/>
     </element>
   </define>
   
   <define name="element-qualification">
     <element name="qualification">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>

 </grammar>

or, in the compact syntax, character.rnc:

 start = element-character
 element-character =
   element character {
     attribute id { xsd:ID },
     element-name,
     element-born?,
     element-qualification
   }
 element-name = element name { token }
 element-born = element born { xsd:date }
 element-qualification = element qualification { token }

The schema using externalRef and external in the previous section will have no difficulty using these flat schemas in place of the Russian doll versions.

Embedding Grammars

This seems straightforward and logical, but why does this approach work? How come there is no collision between the named patterns element-name and element-born defined in both author.rng and character.rng? Why is it that the start patterns defined in author.rng and character.rng don't apply to the schema for our library?

This works because of a RELAX NG feature called embedded grammars. As I have already mentioned, externalRef patterns perform direct inclusion of the referenced schema. For our last example, this means that the resulting schema is:

 <?xml version="1.0" encoding="UTF-8"?>
 <element name="library" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <oneOrMore>
     <element name="book">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <attribute name="available">
         <data type="boolean"/>
       </attribute>
       <element name="isbn">
         <data type="token" datatypeLibrary=""/>
       </element>
       <element name="title">
         <attribute name="xml:lang">
           <data type="language"/>
         </attribute>
         <data type="token" datatypeLibrary=""/>
       </element>
       <oneOrMore>
         <grammar>
           <start>
             <ref name="element-author"/>
           </start>
           <define name="element-author">
             <element name="author">
               <attribute name="id">
                 <data type="ID"/>
               </attribute>
               <ref name="element-name"/>
               <optional>
                 <ref name="element-born"/>
               </optional>
               <optional>
                 <ref name="element-died"/>
               </optional>
             </element>
           </define>
           <define name="element-name">
             <element name="name">
               <data type="token" datatypeLibrary=""/>
             </element>
           </define>
           <define name="element-born">
             <element name="born">
               <data type="date"/>
             </element>
           </define>
           <define name="element-died">
             <element name="died">
               <data type="date"/>
             </element>
           </define>
         </grammar>
       </oneOrMore>
       <zeroOrMore>
         <grammar>
           <start>
             <ref name="element-character"/>
           </start>
           <define name="element-character">
             <element name="character">
               <attribute name="id">
                 <data type="ID"/>
               </attribute>
               <ref name="element-name"/>
               <optional>
                 <ref name="element-born"/>
               </optional>
               <ref name="element-qualification"/>
             </element>
           </define>
           <define name="element-name">
             <element name="name">
               <data type="token" datatypeLibrary=""/>
             </element>
           </define>
           <define name="element-born">
             <element name="born">
               <data type="date"/>
             </element>
           </define>
           <define name="element-qualification">
             <element name="qualification">
               <data type="token" datatypeLibrary=""/>
             </element>
           </define>
         </grammar>
       </zeroOrMore>
     </element>
   </oneOrMore>
 </element>

or, in the compact syntax:

 element library {
   element book {
     attribute id { xsd:ID },
     attribute available { xsd:boolean },
     element isbn { token },
     element title {
       attribute xml:lang { xsd:language },
       token
     },
     grammar {
       start = element-author
       element-author =
         element author {
           attribute id { xsd:ID },
           element-name,
           element-born?,
           element-died?
         }
       element-name = element name { token }
       element-born = element born { xsd:date }
       element-died = element died { xsd:date }
     }+,
     grammar {
       start = element-character
       element-character =
         element character {
           attribute id { xsd:ID },
           element-name,
           element-born?,
           element-qualification
         }
       element-name = element name { token }
       element-born = element born { xsd:date }
       element-qualification = element qualification { token }
     }*
   }+
 }

Here we are embedding grammars within our schema, and they behave as patterns. In fact, it goes further than that: for RELAX NG, grammars are patterns! The meaning of these patterns is twofold:

  • As far as validation is concerned, embedded grammars are equivalent to their start patterns: the grammar describing the character element, for instance, matches instance nodes corresponding to its start pattern—i.e., instance nodes matching the pattern element-character, which is what was expected.

  • Grammars also set the scope of their definitions: start and named patterns defined in a grammar are visible only in this grammar. Their scope (the location where they can be referred to) is strictly limited to the grammar in which they are defined.

Applied to our example, the strict scoping of start and named patterns means that:

  • The element-born pattern of the grammar describing the character element can't be seen from its parent grammar (i.e., the grammar describing the library and book elements). Nor can it be seen from its sibling grammar (i.e., the grammar describing the author element). The same applies to start patterns.

  • Unlike common usage among programming languages, the scopes of start and named patterns don't include embedded grammars. start and named patterns defined in the grammar describing the library and book elements aren't visible in the embedded grammars.
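
These scoping rules can be sketched with a deliberately broken schema. The hypothetical top-level grammar below references the flat author.rng shown earlier and then tries to reuse one of its named patterns. A RELAX NG processor should reject it, because element-name isn't defined in the grammar that contains the ref:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <start>
     <element name="book">
       <!-- Expands to an embedded grammar with its own, private definitions -->
       <externalRef href="author.rng"/>
       <!-- Error: element-name is defined only inside the embedded grammar,
            so it is out of scope in this one -->
       <ref name="element-name"/>
     </element>
   </start>
 </grammar>

Adding a define for element-name to this grammar would make the reference legal, but it would be a new, independent definition rather than the one in author.rng.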

Referencing Patterns in Parent Grammars

This strict isolation of start and named patterns in their grammars is usually convenient when you create references to external grammars. It means that external grammars can be written independently without risk of collision or incompatibility. You can safely take any RELAX NG schema, drop it into a new schema, and see it as a single pattern without any risk of collision.

On the other hand, that approach doesn't let you modify what you include (you will see how to do so in the next section), nor does it let you leverage a set of common named patterns. In our example, since there are already two definitions of element-name and element-born, it's a good thing that they are both isolated in their grammars. If you were designing the same building blocks from scratch, however, you'd probably want to have only one definition of these two elements that could be shared by the author and character elements. In fact, if you followed the principle "if it's written more than once, make it common," you'd also want to share the definition of the id attribute.

Parent references let you make an explicit reference to a pattern from the parent grammar (i.e., the grammar embedding the current one). In this case, you need to add the definitions that you want to share to the top-level schema, even if that schema doesn't use all of them itself:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

   <start>
     <element name="library">
       <oneOrMore>
         <element name="book">
           <ref name="attribute-id"/>
           <attribute name="available">
             <data type="boolean"/>
           </attribute>
           <element name="isbn">
             <data type="token" datatypeLibrary=""/>
           </element>
           <element name="title">
             <attribute name="xml:lang">
               <data type="language"/>
             </attribute>
             <data type="token" datatypeLibrary=""/>
           </element>
           <oneOrMore>
             <externalRef href="author.rng"/>
           </oneOrMore>
           <zeroOrMore>
             <externalRef href="character.rng"/>
           </zeroOrMore>
         </element>
       </oneOrMore>
     </element>
   </start>

   <define name="element-name">
     <element name="name">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
   
   <define name="element-born">
     <element name="born">
       <data type="date"/>
     </element>
   </define>
   
   <define name="attribute-id">
     <attribute name="id">
       <data type="ID"/>
     </attribute>
   </define>
   
 </grammar>

or:

 start =
   element library {
     element book {
       attribute-id,
       attribute available { xsd:boolean },
       element isbn { token },
       element title {
         attribute xml:lang { xsd:language },
         token
       },
       external "author.rnc"+,
       external "character.rnc"*
     }+
   }
 element-name = element name { token }
 element-born = element born { xsd:date }
 attribute-id = attribute id { xsd:ID }

Now, to refer from the embedded grammars to named patterns defined in the parent grammar, use a pattern called parentRef. With parentRef references to element-name and element-born, author.rng looks like:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <ref name="element-author"/>
   </start>
   <define name="element-author">
     <element name="author">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <parentRef name="element-name"/>
       <optional>
         <parentRef name="element-born"/>
       </optional>
       <optional>
         <ref name="element-died"/>
       </optional>
     </element>
   </define>
   <define name="element-died">
     <element name="died">
       <data type="date"/>
     </element>
   </define>
 </grammar>

and the character.rng schema now looks like:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <ref name="element-character"/>
   </start>
   <define name="element-character">
     <element name="character">
       <attribute name="id">
         <data type="ID"/>
       </attribute>
       <parentRef name="element-name"/>
       <optional>
         <parentRef name="element-born"/>
       </optional>
       <ref name="element-qualification"/>
     </element>
   </define>
   <define name="element-qualification">
     <element name="qualification">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
 </grammar>
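
The two listings above still declare the id attribute inline. As a sketch of how the shared attribute-id pattern could be used as well, assuming the top-level grammar shown earlier (which defines attribute-id), author.rng might instead read:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <ref name="element-author"/>
   </start>
   <define name="element-author">
     <element name="author">
       <!-- Shared definition taken from the embedding grammar -->
       <parentRef name="attribute-id"/>
       <parentRef name="element-name"/>
       <optional>
         <parentRef name="element-born"/>
       </optional>
       <optional>
         <ref name="element-died"/>
       </optional>
     </element>
   </define>
   <define name="element-died">
     <element name="died">
       <data type="date"/>
     </element>
   </define>
 </grammar>

The same substitution would apply to the id attribute in character.rng.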

The parentRef pattern is translated to a parent keyword in the compact syntax. The author.rnc schema looks like:

 start = element-author
 element-author =
   element author {
     attribute id { xsd:ID },
     parent element-name,
     parent element-born?,
     element-died?
   }
 element-died = element died { xsd:date }

while the character.rnc schema looks like:

 start = element-character
 element-character =
   element character {
     attribute id { xsd:ID },
     parent element-name,
     parent element-born?,
     element-qualification
   }
 element-qualification = element qualification { token }

You are using these features in the context of multiple schema documents, but the semantics of the externalRef pattern itself remain the same. This schema is equivalent to a single monolithic schema in which the externalRef patterns have been expanded into two embedded grammars:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0" 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <element name="library">
       <oneOrMore>
         <element name="book">
           <ref name="attribute-id"/>
           <attribute name="available">
             <data type="boolean"/>
           </attribute>
           <element name="isbn">
             <data type="token" datatypeLibrary=""/>
           </element>
           <element name="title">
             <attribute name="xml:lang">
               <data type="language"/>
             </attribute>
             <data type="token" datatypeLibrary=""/>
           </element>
           <oneOrMore>
             <grammar>
               <start>
                 <ref name="element-author"/>
               </start>
               <define name="element-author">
                 <element name="author">
                   <attribute name="id">
                     <data type="ID"/>
                   </attribute>
                   <parentRef name="element-name"/>
                   <optional>
                     <parentRef name="element-born"/>
                   </optional>
                   <optional>
                     <ref name="element-died"/>
                   </optional>
                 </element>
               </define>
               <define name="element-died">
                 <element name="died">
                   <data type="date"/>
                 </element>
               </define>
             </grammar>
           </oneOrMore>
           <zeroOrMore>
             <grammar>
               <start>
                 <ref name="element-character"/>
               </start>
               <define name="element-character">
                 <element name="character">
                   <attribute name="id">
                     <data type="ID"/>
                   </attribute>
                   <parentRef name="element-name"/>
                   <optional>
                     <parentRef name="element-born"/>
                   </optional>
                   <ref name="element-qualification"/>
                 </element>
               </define>
               <define name="element-qualification">
                 <element name="qualification">
                   <data type="token" datatypeLibrary=""/>
                 </element>
               </define>
             </grammar>
           </zeroOrMore>
         </element>
       </oneOrMore>
     </element>
   </start>
   <define name="element-name">
     <element name="name">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
   <define name="element-born">
     <element name="born">
       <data type="date"/>
     </element>
   </define>
   <define name="attribute-id">
     <attribute name="id">
       <data type="ID"/>
     </attribute>
   </define>
 </grammar>

or, in the compact syntax:

 start =
   element library {
     element book {
       attribute-id,
       attribute available { xsd:boolean },
       element isbn { token },
       element title {
         attribute xml:lang { xsd:language },
         token
       },
       grammar {
         start = element-author
         element-author =
           element author {
             attribute id { xsd:ID },
             parent element-name,
             parent element-born?,
             element-died?
           }
         element-died = element died { xsd:date }
       }+,
       grammar {
         start = element-character
         element-character =
           element character {
             attribute id { xsd:ID },
             parent element-name,
             parent element-born?,
             element-qualification
           }
         element-qualification = element qualification { token }
       }*
     }+
   }
 element-name = element name { token }
 element-born = element born { xsd:date }
 attribute-id = attribute id { xsd:ID }

You can see how start and named patterns have been defined in each of the three grammars composing this schema:

  • element-died is defined in the grammar defining the author element and can be used only in this grammar.

  • Similarly, element-qualification is defined in the grammar defining the character element and can be used only there.

  • element-name, element-born, and attribute-id are defined in the top-level grammar. They can be used in this grammar through normal references (i.e., ref patterns) and can also be used in its child grammars (i.e., the grammars that are directly embedded into this one) through parentRef patterns.

There are two more things to note about the parentRef pattern:

  • If the depth of nesting of grammars is higher than two, you may run into trouble because you can make a reference only to your immediate parent grammar, not to the other ancestor grammars. The RELAX NG working group has considered this issue but hasn't found any real-world use case for generalizing parentRef patterns to greater depths of nesting. If you find one, they will probably welcome an email on the subject! In practice, if you need to reach a more distant ancestor, you can work around the limitation by defining named patterns in the intermediary grammars that act as proxies.

  • Now that we've added the parentRef patterns to our two schemas, author.rng and character.rng can no longer be used as standalone schemas for validating documents with author or character root elements. To be complete and operational, they must be embedded in a grammar that provides the definitions of the named patterns they reference.
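
Both points can be sketched briefly. For the first, a proxy definition in an intermediate grammar passes a top-level pattern one level further down. The three-level nesting below is a made-up illustration, not part of the library schema:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <start>
     <element name="library">
       <!-- Intermediate grammar -->
       <grammar>
         <start>
           <element name="book">
             <!-- Innermost grammar: parentRef reaches only the intermediate grammar -->
             <grammar>
               <start>
                 <element name="author">
                   <parentRef name="element-name"/>
                 </element>
               </start>
             </grammar>
           </element>
         </start>
         <!-- Proxy: re-exposes the top-level definition one level down -->
         <define name="element-name">
           <parentRef name="element-name"/>
         </define>
       </grammar>
     </element>
   </start>
   <define name="element-name">
     <element name="name">
       <data type="token"/>
     </element>
   </define>
 </grammar>

For the second, a small wrapper grammar (a hypothetical author-standalone.rng) could supply the definitions that the modified author.rng expects, so that documents with an author root element can be validated again:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <externalRef href="author.rng"/>
   </start>
   <!-- Definitions that author.rng reaches through parentRef -->
   <define name="element-name">
     <element name="name">
       <data type="token" datatypeLibrary=""/>
     </element>
   </define>
   <define name="element-born">
     <element name="born">
       <data type="date"/>
     </element>
   </define>
 </grammar>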


This text is released under the Free Software Foundation GFDL.