Extensible schemas

RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

Chapter 12: Writing Extensible Schemas

Sometimes building an extensible schema is a matter of capturing existing practice in RELAX NG; other times the schema is developed before the practice exists, and the schema developer has the opportunity to make a lot of choices. We often have to do our best to write an extensible schema for an existing XML vocabulary whose content is already specified (as if we were asked to cook the best "blanquette de veau"). This contrasts with the case where we have the freedom to change the format itself and decide when to use elements or attributes, whether order matters, and many other variables (as if we were asked to cook the best dish containing veal, with the choice of the dish left to us).

Working from a Fixed Result

With a fixed result, the only way to manage extensibility lies in how the named patterns are defined, much as the way programmers define classes in an object-oriented environment has a strong impact on the extensibility of their code. In this section, we will examine the major approaches to defining named patterns and start elements with extensibility in mind.

Providing a grammar and a start element

Let's have a look back at our first schema, the "Russian doll" schema:

 <?xml version="1.0" encoding="utf-8"?>
 <element xmlns="http://relaxng.org/ns/structure/1.0" name="library">
  <oneOrMore>
   <element name="book">
    <attribute name="id"/>
    <attribute name="available"/>
    <element name="isbn">
     <text/>
    </element>
    <element name="title">
     <attribute name="xml:lang"/>
     <text/>
    </element>
    <zeroOrMore>
     <element name="author">
      <attribute name="id"/>
      <element name="name">
       <text/>
      </element>
      <element name="born">
       <text/>
      </element>
      <optional>
       <element name="died">
        <text/>
       </element>
      </optional>
     </element>
    </zeroOrMore>
    <zeroOrMore>
     <element name="character">
      <attribute name="id"/>
      <element name="name">
       <text/>
      </element>
      <element name="born">
       <text/>
      </element>
      <element name="qualification">
       <text/>
      </element>
     </element>
    </zeroOrMore>
   </element>
  </oneOrMore>
 </element>

or, in the compact syntax:

 element library {
  element book {
   attribute id {text},
   attribute available {text},
   element isbn {text},
   element title {attribute xml:lang {text}, text},
   element author {
    attribute id {text},
    element name {text},
    element born {text},
    element died {text}?}*,
   element character {
    attribute id {text},
    element name {text},
    element born {text},
    element qualification {text}}*
  } +
 }

What happens if we want to derive a schema that adds a new id attribute to the library element? The answer is simple: we have to take our schema, copy it, and edit the copy. There is no room for extensibility at all, since we cannot include a schema that doesn't have a grammar element as its root.
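
To make the goal concrete, here is a small instance document that the derived schema should accept. The id attribute on library is the only addition. This document is a hypothetical sketch, and all its values are placeholders:

 <?xml version="1.0" encoding="utf-8"?>
 <library id="lib1">
  <book id="b1" available="true">
   <isbn>0000000000</isbn>
   <title xml:lang="en">A hypothetical title</title>
   <author id="a1">
    <name>A hypothetical author</name>
    <born>1950-01-01</born>
   </author>
  </book>
 </library>

The Russian doll schema rejects this document because it declares no attribute on library. We want a derived schema that accepts it without having to copy the original.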

The first thing to consider when we want a RELAX NG schema to be extensible is that its root element should always be a grammar element. Here, the change is minor and produces russian-doll.rng:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <start>
     <element name="library">
       <oneOrMore>
         <element name="book">
           <attribute name="id"/>
           <attribute name="available"/>
           <element name="isbn">
             <text/>
           </element>
           <element name="title">
             <attribute name="xml:lang"/>
             <text/>
           </element>
           <zeroOrMore>
             <element name="author">
               <attribute name="id"/>
               <element name="name">
                 <text/>
               </element>
               <element name="born">
                 <text/>
               </element>
               <optional>
                 <element name="died">
                   <text/>
                 </element>
               </optional>
             </element>
           </zeroOrMore>
           <zeroOrMore>
             <element name="character">
               <attribute name="id"/>
               <element name="name">
                 <text/>
               </element>
               <element name="born">
                 <text/>
               </element>
               <element name="qualification">
                 <text/>
               </element>
             </element>
           </zeroOrMore>
         </element>
       </oneOrMore>
     </element>
   </start>
 </grammar>

In the compact syntax, the grammar is implicit, but you still need an explicit start pattern if you want to be able to redefine anything. The result, russian-doll.rnc, looks like this:

 start =
    element library
    {
       element book
       {
          attribute id { text },
          attribute available { text },
          element isbn { text },
          element title { attribute xml:lang { text }, text },
          element author
          {
             attribute id { text },
             element name { text },
             element born { text },
             element died { text }?
          }*,
          element character
          {
             attribute id { text },
             element name { text },
             element born { text },
             element qualification { text }
          }*
       }+
    }

Once these minor changes have been made, the schema can at least be included into another schema and modified there.

Maximize granularity

Although the previous schemas can now be included and redefined, this redefinition isn't effective: the granularity is so coarse that we can't redefine the library element alone, only the whole start pattern. The best we can do is one of the following, which aren't much of an improvement:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <include href="russian-doll.rng">
     <start>
       <element name="library">
         <attribute name="id"/>
         <oneOrMore>
           <element name="book">
             <attribute name="id"/>
             <attribute name="available"/>
             <element name="isbn">
               <text/>
             </element>
             <element name="title">
               <attribute name="xml:lang"/>
               <text/>
             </element>
             <zeroOrMore>
               <element name="author">
                 <attribute name="id"/>
                 <element name="name">
                   <text/>
                 </element>
                 <element name="born">
                   <text/>
                 </element>
                 <optional>
                   <element name="died">
                     <text/>
                   </element>
                 </optional>
               </element>
             </zeroOrMore>
             <zeroOrMore>
               <element name="character">
                 <attribute name="id"/>
                 <element name="name">
                   <text/>
                 </element>
                 <element name="born">
                   <text/>
                 </element>
                 <element name="qualification">
                   <text/>
                 </element>
               </element>
             </zeroOrMore>
           </element>
         </oneOrMore>
       </element>
     </start>
   </include>
 </grammar>

or:

 include "russian-doll.rnc"
 {
 start =
       element library
       {
          attribute id { text },
          element book
          {
             attribute id { text },
             attribute available { text },
             element isbn { text },
             element title { attribute xml:lang { text }, text },
             element author
             {
                attribute id { text },
                element name { text },
                element born { text },
                element died { text }?
             }*,
             element character
             {
                attribute id { text },
                element name { text },
                element born { text },
                element qualification { text }
             }*
          }+
       }
 }

In other words, we still need to redefine the whole schema. We've gained nothing in modularity, since changes made to the original schema would not propagate to the derived one. To fix this, we need finer-grained definitions, which means defining a named pattern for each element (as in the schema style imposed by DTDs). This approach leads to a schema similar to the flat schema seen in Chapter 5: Flattening our first schema, called flat.rng:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <start>
     <ref name="library-element"/>
   </start>
   <define name="library-element">
     <element name="library">
       <oneOrMore>
         <ref name="book-element"/>
       </oneOrMore>
     </element>
   </define>
   <define name="author-element">
     <element name="author">
       <attribute name="id"/>
       <ref name="name-element"/>
       <ref name="born-element"/>
       <optional>
         <ref name="died-element"/>
       </optional>
     </element>
   </define>
   <define name="book-element">
     <element name="book">
       <attribute name="id"/>
       <attribute name="available"/>
       <ref name="isbn-element"/>
       <ref name="title-element"/>
       <zeroOrMore>
         <ref name="author-element"/>
       </zeroOrMore>
       <zeroOrMore>
         <ref name="character-element"/>
       </zeroOrMore>
     </element>
   </define>
   <define name="born-element">
     <element name="born">
       <text/>
     </element>
   </define>
   <define name="character-element">
     <element name="character">
       <attribute name="id"/>
       <ref name="name-element"/>
       <ref name="born-element"/>
       <ref name="qualification-element"/>
     </element>
   </define>
   <define name="died-element">
     <element name="died">
       <text/>
     </element>
   </define>
   <define name="isbn-element">
     <element name="isbn">
       <text/>
     </element>
   </define>
   <define name="name-element">
     <element name="name">
       <text/>
     </element>
   </define>
   <define name="qualification-element">
     <element name="qualification">
       <text/>
     </element>
   </define>
   <define name="title-element">
     <element name="title">
       <attribute name="xml:lang"/>
       <text/>
     </element>
   </define>
 </grammar>

or, in the compact syntax, flat.rnc:

 start = library-element
        
 library-element = element library { book-element+ }
 author-element =
    element author
    {
       attribute id { text },
       name-element,
       born-element,
       died-element?
    }
        
 book-element =
    element book
    {
       attribute id { text },
       attribute available { text },
       isbn-element,
       title-element,
       author-element*,
       character-element*
    }
        
 born-element = element born { text }

 character-element =
    element character
    {
       attribute id { text },
       name-element,
       born-element,
       qualification-element
    }
        
 died-element = element died { text }
        
 isbn-element = element isbn { text }
        
 name-element = element name { text }
        
 qualification-element = element qualification { text }
        
 title-element = element title { attribute xml:lang { text }, text }

These new schemas are more verbose, but they're also much more extensible. To add our id attribute, we now only need to redefine the library-element named pattern:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <include href="flat.rng">
     <define name="library-element">
       <element name="library">
         <attribute name="id"/>
         <oneOrMore>
           <ref name="book-element"/>
         </oneOrMore>
       </element>
     </define>
   </include>
 </grammar>

or:

 include "flat.rnc"
 {
    library-element = element library { attribute id { text }, book-element+ }
 }

All changes made to the flat schemas, except those made to the library element, would now propagate to the derived schemas.

Defining named patterns for content rather than for elements

Although this result is much more extensible, we still have to redefine the complete content of the library element to add our id attribute. We've reduced the redefinition problem we had with our Russian doll schema, but we haven't eliminated it. If we change our main vocabulary by adding a new attribute or element to the library element in flat.rng, our derived schema won't take the change into account automatically; we'll need to edit it.
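
As an illustration, suppose that a later version of flat.rnc adds an optional version attribute to the library element. This change is hypothetical. The derived schema shown earlier still carries its own copy of library-element, so it would reject documents that use the new attribute until someone edits it:

 # flat.rnc: hypothetical later version of library-element
 library-element =
    element library { attribute version { text }?, book-element+ }

 # derived schema, unchanged: its redefinition replaces the one above,
 # so the version attribute is not allowed in the derived vocabulary
 include "flat.rnc"
 {
    library-element = element library { attribute id { text }, book-element+ }
 }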

The modification isn't transferred automatically because the extensibility of a named pattern doesn't cross element boundaries. Since our library-element named pattern includes the boundary of the library element, the content of this element can't be extended without redefining the whole pattern, as shown in Figure 1.

Figure 1. A flat schema which is difficult to extend.

To avoid this difficulty, we could have split our named patterns according to the content of the elements rather than by the elements themselves. We would then have been able to add new content within the library element, as shown in Figure 2.

Figure 2. A split schema which is easier to extend.
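
The following sketch applies this split to the library element alone. The element boundary stays in library-element, while the content moves to a separate library-content pattern. This is a hypothetical intermediate step between flat.rng and a fully split schema; these definitions would replace library-element inside the grammar of flat.rng:

 <!-- the element boundary, which derived schemas rarely need to touch -->
 <define name="library-element">
   <element name="library">
     <ref name="library-content"/>
   </element>
 </define>
 <!-- the content, which derived schemas can combine with new patterns -->
 <define name="library-content">
   <oneOrMore>
     <ref name="book-element"/>
   </oneOrMore>
 </define>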

Generalizing this approach to the definitions of all the elements leads to a schema that looks like flat-content.rng:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <start>
     <element name="library">
       <ref name="library-content"/>
     </element>
   </start>
   <define name="library-content">
     <oneOrMore>
       <element name="book">
         <ref name="book-content"/>
       </element>
     </oneOrMore>
   </define>
   <define name="book-content">
     <attribute name="id"/>
     <attribute name="available"/>
     <element name="isbn">
       <ref name="isbn-content"/>
     </element>
     <element name="title">
       <ref name="title-content"/>
     </element>
     <zeroOrMore>
       <element name="author">
         <ref name="author-content"/>
       </element>
     </zeroOrMore>
     <zeroOrMore>
       <element name="character">
         <ref name="character-content"/>
       </element>
     </zeroOrMore>
   </define>
   <define name="author-content">
     <attribute name="id"/>
     <element name="name">
       <ref name="name-content"/>
     </element>
     <element name="born">
       <ref name="born-content"/>
     </element>
     <optional>
       <element name="died">
         <ref name="died-content"/>
       </element>
     </optional>
   </define>
   <define name="born-content">
       <text/>
   </define>
   <define name="character-content">
     <attribute name="id"/>
     <element name="name">
       <ref name="name-content"/>
     </element>
     <element name="born">
       <ref name="born-content"/>
     </element>
     <element name="qualification">
       <ref name="qualification-content"/>
     </element>
   </define>
   <define name="died-content">
     <text/>
   </define>
   <define name="isbn-content">
     <text/>
   </define>
   <define name="name-content">
     <text/>
   </define>
   <define name="qualification-content">
     <text/>
   </define>
   <define name="title-content">
     <attribute name="xml:lang"/>
     <text/>
   </define>
 </grammar>

or, in the compact syntax, flat-content.rnc:

 start = element library { library-content }

 library-content = element book { book-content }+

 book-content =
    attribute id { text },
    attribute available { text },
    element isbn { isbn-content },
    element title { title-content },
    element author { author-content }*,
    element character { character-content }*

 author-content =
    attribute id { text },
    element name { name-content },
    element born { born-content },
    element died { died-content }?

 born-content = text

 character-content =
    attribute id { text },
    element name { name-content },
    element born { born-content },
    element qualification { qualification-content }

 died-content = text

 isbn-content = text

 name-content = text

 qualification-content = text

 title-content = attribute xml:lang { text }, text

We can now take full advantage of the library-content named pattern: instead of redefining it, we can neatly combine it with the id attribute:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0">
   <include href="flat-content.rng"/>
   <define name="library-content" combine="interleave">
     <attribute name="id"/>
   </define>
 </grammar>

or:

 include "flat-content.rnc"

 library-content &= attribute id { text }

Because the content we're adding is an attribute, whose position doesn't matter, the extension can be done by combining the pattern by interleave. This method of combination is frequently useful when attributes or elements need to be added. For elements, however, it only works when their relative order isn't significant for the schema, since the new content can then appear anywhere. Otherwise, we would still need to redefine the pattern or to combine it by choice.
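
To illustrate this caveat, suppose we want to add an optional edition element to books; this element is hypothetical. Combining book-content from flat-content.rnc by interleave is legal. However, the new element is then accepted anywhere among the existing children, for instance before isbn or between two author elements, rather than at a fixed position:

 include "flat-content.rnc"

 # hypothetical extension: interleave doesn't constrain the position
 # of edition relative to isbn, title, author and character
 book-content &= element edition { text }?

If edition had to follow title, we would have to redefine book-content instead.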
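
When we're free to decide that the relative order of the children of an element isn't significant, we can state it in the base schema itself with an interleave pattern. The book-content pattern would then become:
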
  <define name="book-content">
    <interleave>
      <attribute name="id"/>
      <attribute name="available"/>
      <element name="isbn">
        <ref name="isbn-content"/>
      </element>
      <element name="title">
        <ref name="title-content"/>
      </element>
      <zeroOrMore>
        <element name="author">
          <ref name="author-content"/>
        </element>
      </zeroOrMore>
      <zeroOrMore>
        <element name="character">
          <ref name="character-content"/>
        </element>
      </zeroOrMore>
    </interleave>
  </define>

or, in the compact syntax:

 book-content =
    attribute id { text }
  & attribute available { text }
  & element isbn { isbn-content }
  & element title { title-content }
  & element author { author-content }*
  & element character { character-content }*

This definition would allow instance documents in which author and character elements are mixed with the other elements, such as the one shown in Figure 3:

Figure 3. An instance document with interleaved content.
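
The following hypothetical document sketches the kind of instance this interleaved definition accepts. The author and character elements are scattered among isbn and title, and the two authors aren't even adjacent. All values are placeholders:

 <?xml version="1.0" encoding="utf-8"?>
 <library>
  <book id="b1" available="true">
   <author id="a1">
    <name>First author</name>
    <born>1950-01-01</born>
   </author>
   <title xml:lang="en">A title</title>
   <character id="c1">
    <name>A character</name>
    <born>1960-01-01</born>
    <qualification>A qualification</qualification>
   </character>
   <isbn>0000000000</isbn>
   <author id="a2">
    <name>Second author</name>
    <born>1955-01-01</born>
   </author>
  </book>
 </library>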

W3C XML Schema 1.0 cannot express this, since its xs:all compositor only accepts elements that occur at most once. To define a schema that can be translated more easily into W3C XML Schema, we can add containers that isolate the author and character elements from the elements that cannot be repeated. The content of the book-content pattern would thus become:

  <define name="book-content">
    <interleave>
      <attribute name="id"/>
      <attribute name="available"/>
      <element name="isbn">
        <ref name="isbn-content"/>
      </element>
      <element name="title">
        <ref name="title-content"/>
      </element>
      <element name="authors">
        <zeroOrMore>
          <element name="author">
            <ref name="author-content"/>
          </element>
        </zeroOrMore>
      </element>
      <element name="characters">
        <zeroOrMore>
          <element name="character">
            <ref name="character-content"/>
          </element>
        </zeroOrMore>
      </element>
    </interleave>
  </define>

or:

 book-content =
    attribute id { text }
  & attribute available { text }
  & element isbn { isbn-content }
  & element title { title-content }
  & element authors { element author { author-content }* }
  & element characters { element character { character-content }* }

and it would validate documents such as the one shown in Figure 4:

Figure 4. A document with interleaved containers.
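
A hypothetical document of this kind is sketched below. The four children of book come in an arbitrary order, but every author and character element sits inside its container. Note that this definition requires both authors and characters to be present, even when they are empty, because neither is wrapped in an optional pattern:

 <?xml version="1.0" encoding="utf-8"?>
 <library>
  <book id="b1" available="true">
   <characters>
    <character id="c1">
     <name>A character</name>
     <born>1960-01-01</born>
     <qualification>A qualification</qualification>
    </character>
   </characters>
   <title xml:lang="en">A title</title>
   <authors>
    <author id="a1">
     <name>First author</name>
     <born>1950-01-01</born>
    </author>
    <author id="a2">
     <name>Second author</name>
     <born>1955-01-01</born>
    </author>
   </authors>
   <isbn>0000000000</isbn>
  </book>
 </library>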

The relative order of the isbn, title, authors and characters elements is still not significant, but the author and character elements are now grouped under containers and cannot be interleaved with the other elements. That's enough to make this schema much friendlier to schema languages with less expressive power than RELAX NG.

Note that even though these containers aren't necessary for RELAX NG, many XML experts consider them good practice, since they make it easier to access the author and character elements. The downside is that they add a level to the hierarchy, and XPath expressions that identify the contained elements become more verbose: instead of writing "/library/book/character" to access the character elements, we have to write "/library/book/characters/character". This can get long.

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.