by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


Simplification

Since its conception, RELAX NG has tried to balance simplicity of use, simplicity of implementation, and simplicity of its data model. What's simple to implement is often simple to use. However, many features are very effective for users but add complexity for implementers and clutter the data model. This is the case, for instance, for all the features designed to create building blocks (named patterns, includes, and embedded grammars). They are very helpful to users, but whether you use named patterns or a Russian-doll style has no impact on the validation itself. This is also the case for shortcuts such as the mixed pattern, which is just a more concise way of writing an interleave pattern with an embedded text pattern.

The quest for simplicity has had a huge influence on the design of RELAX NG. Here is James Clark on the subject:

Simplicity of specification often goes hand in hand with simplicity of use. But I find that these are often in conflict with simplicity of implementation. An example would be ambiguity restrictions as in W3C XML Schema: these make implementation simpler (well, at least for people who don't want to learn a new algorithm) but make specification and use more complex. In general, RELAX NG aims for implementation to be practical and safe (i.e., implementations shouldn't use huge amounts of time/memory for particular schemas/instances), but apart from that favors simplicity of specification/use over simplicity of implementation.

To keep the description of the restrictions and of the validation algorithm simple while still offering valuable features to users, RELAX NG describes validation as a two-step process:

  1. The schema is read and simplified. The simplification removes all the additional complexity of the syntactic sugar and reduces the schema to its simplest form.

  2. Instance documents are validated against the simplified schema. Because all the syntactic sugar has been removed from the simplified schema, it doesn't need to be taken into account in the description of the validation, permitting the use of much simpler algorithms.

The simplification is described for each RELAX NG element in the RELAX NG specification, so I won't dive into its details here and will cover just the main points. Simplification removes all syntactic sugar, consolidates all external schemas, uses a subset of all the available RELAX NG elements, and transforms the resulting structure into a flat schema. In that flat schema, each element is embedded in a named pattern, and every named pattern contains the definition of a single element.

The RELAX NG specification is very clear that this simplification is done by the RELAX NG processors to the data model after reading the complete schema. The result of this simplification doesn't ever have to be serialized as XML. However, showing intermediary results as XML helps to show what the simplification process does.

Tip

Intermediary results are indented for readability. In reality, whitespace is removed in one of the first steps of the simplification.

The XML syntax is closer than the compact syntax to the data model used to describe the simplification, so the details of the simplification are shown next in XML snippets. For each sequence of steps, I've also given the compact syntax for the whole schema to give a better overall view of the impact on the structure of the schema, although some effects of simplification are lost in the compact syntax.

Annotation Removal, Whitespace and Attribute Normalization, and Inheritance

The first step of simplification performs various normalizations without changing the structure of the schema:

  • Annotations (i.e., attributes and elements from foreign namespaces) are removed.

  • Text nodes containing only whitespace are removed, except when found in value and param elements. Whitespace is normalized in name, type, and combine attributes and in name elements.

  • The characters that aren't allowed in the datatypeLibrary attributes are escaped. The attributes are transferred through inheritance to each data and value pattern.

  • If not specified, the type attribute of a value pattern defaults to the token datatype from the built-in datatype library.
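
As a rough illustration, the first two normalizations in this list can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal sketch under stated assumptions, not the way any particular RELAX NG processor works: the first_pass function and the SNIPPET schema fragment are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def first_pass(elem):
    """Strip annotations and cosmetic whitespace from elem, in place."""
    # ElementTree spells a namespaced attribute name as "{uri}local",
    # so any key starting with "{" is treated as an annotation attribute.
    for key in [k for k in elem.attrib if k.startswith("{")]:
        del elem.attrib[key]
    for key in ("name", "type", "combine"):
        if key in elem.attrib:
            elem.set(key, elem.attrib[key].strip())
    if elem.tag == RNG + "name":
        elem.text = (elem.text or "").strip()
    elif elem.tag not in (RNG + "value", RNG + "param"):
        # Whitespace-only text is dropped everywhere except in value and param.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
    for child in list(elem):
        if not child.tag.startswith(RNG):
            elem.remove(child)  # annotation element from a foreign namespace
            continue
        if child.tail is not None and not child.tail.strip():
            child.tail = None
        first_pass(child)
    return elem


SNIPPET = """
<element xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sn="http://www.snee.com/ns/stages"
         name=" library " sn:stages="library">
  <sn:stage name="library"/>
  <choice>
    <value> </value>
    <ref name=" book-element "/>
  </choice>
</element>
"""
root = first_pass(ET.fromstring(SNIPPET))
print(ET.tostring(root, encoding="unicode"))
```

The whitespace inside the value element survives because it is significant there, while the whitespace around the names and between the elements disappears.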

After this set of steps, the structure of the schema is still unchanged, but all cosmetic features, which have no impact on the meaning of the schema, have been removed. For instance, the following schema snippet:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0"
          xmlns:hr="http://eric.van-der-vlist.com/ns/person"
          ns="http://eric.van-der-vlist.com/ns/library"
          xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
          xmlns:sn="http://www.snee.com/ns/stages"
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <a:documentation>RELAX NG schema for our library</a:documentation>
    <sn:stages>
      <sn:stage name="library"/>
      <sn:stage name="book"/>
      <sn:stage name="author"/>
      <sn:stage name="character"/>
      <sn:stage name="author-or-book"/>
    </sn:stages>
   <start>
     <choice>
       <element name=" library "  sn:stages="library">
         <oneOrMore>
           <ref name="book-element"/>
         </oneOrMore>
       </element>
       <ref name="book-element" sn:stages="book author-or-book"/>
       <ref name="author-element" sn:stages="author author-or-book"/>
       <ref name="character-element" sn:stages="character"/>
     </choice>
   </start>
   <define name=" author-element ">
     <element name="hr:author" datatypeLibrary="">
       <attribute name="id" 
                  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
         <data type="NMTOKEN">
           <param name="maxLength"> 16 </param>
         </data>
       </attribute>
       <ref name=" name-element"/>
       <ref name="born-element"/>
       <optional>
         <ref name="died-element"/>
       </optional>
     </element>
   </define>
   <define name="available-content">
     <choice>
       <value>true</value>
       <value type="token"> false </value>
       <value> </value>
     </choice>
   </define>
 </grammar>

will be transformed during this sequence of steps into the following (note that I am still showing whitespace for readability, even though it would have been removed):

 <?xml version="1.0"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0"
          xmlns:hr="http://eric.van-der-vlist.com/ns/person"
          xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
          xmlns:sn="http://www.snee.com/ns/stages"
          ns="http://eric.van-der-vlist.com/ns/library">
   <start>
     <choice>
       <element name="library">
         <oneOrMore>
           <ref name="book-element"/>
         </oneOrMore>
       </element>
       <ref name="book-element"/>
       <ref name="author-element"/>
       <ref name="character-element"/>
     </choice>
   </start>
   <define name="author-element">
     <element name="hr:author">
       <attribute name="id">
         <data 
           datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
              type="NMTOKEN">
           <param name="maxLength"> 16 </param>
         </data>
       </attribute>
       <ref name="name-element"/>
       <ref name="born-element"/>
       <optional>
         <ref name="died-element"/>
       </optional>
     </element>
   </define>
   <define name="available-content">
     <choice>
       <value type="token" datatypeLibrary="">true</value>
       <value datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
              type="token"> false </value>
       <value type="token" datatypeLibrary=""> </value>
     </choice>
   </define>
 </grammar>
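
The inheritance of datatypeLibrary and the defaulting of the type attribute can be sketched the same way. This is only an illustration, assuming the schema has been parsed with Python's xml.etree.ElementTree; the push_datatype_library function is a hypothetical name, and the escaping of disallowed characters is left out.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def push_datatype_library(elem, inherited=""):
    """Move datatypeLibrary down to data and value elements, in place."""
    library = elem.attrib.pop("datatypeLibrary", inherited)
    if elem.tag == RNG + "value" and "type" not in elem.attrib:
        # An untyped value is a token from the built-in library,
        # whose URI is the empty string.
        elem.set("type", "token")
        library = ""
    if elem.tag in (RNG + "data", RNG + "value"):
        elem.set("datatypeLibrary", library)
    for child in elem:
        push_datatype_library(child, library)
    return elem


SNIPPET = """
<define xmlns="http://relaxng.org/ns/structure/1.0" name="available-content"
        datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <value>true</value>
    <value type="token"> false </value>
  </choice>
</define>
"""
define = push_datatype_library(ET.fromstring(SNIPPET))
for value in define.iter(RNG + "value"):
    print(value.attrib)
```

As in the listing above, the untyped value ends up in the built-in library, while the explicitly typed one inherits the W3C XML Schema library declared on its ancestor.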

In the compact syntax, the schema after the first sequence of steps looks like this:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
 namespace hr = "http://eric.van-der-vlist.com/ns/person"
 namespace local = ""
 default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
 namespace sn = "http://www.snee.com/ns/stages"
      
 start =
   element library { book-element+ }
   | book-element
   | author-element
   | character-element
 include "foreign.rnc" {
   foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
   foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
 }
 author-element =
   element hr:author {
     attribute id {
       xsd:NMTOKEN { maxLength = " 16 " }
     },
     name-element,
     born-element,
     died-element?
   }
 include "book-content.rnc"
 book-content &= foreign-nodes
 book-element = element book { book-content }
 born-element = element hr:born { xsd:date }
 character-element = external "character-element.rnc"
 died-element = element hr:died { xsd:date }
 isbn-element = element isbn { foreign-attributes, token }
 name-element = element hr:name { xsd:token }
 qualification-element = element qualification { text }
 title-element = element title { foreign-attributes, text }
 available-content = "true" | xsd:token " false " | " "

Retrieval of External Schemas

The second sequence of steps reads and processes externalRef and include patterns:

  • externalRef patterns are replaced by the content of the resource referenced by their href attributes. All the simplification steps up to this one must be recursively applied during this replacement to make sure all schemas are merged at the same level of simplification.

  • The schemas referenced by include patterns are read and all the simplification steps up to this point are recursively applied to these schemas. Their definitions are overridden by those found in the include pattern itself when overrides are used. The content of their grammar is added in a new div pattern to the current schema. The div pattern is needed temporarily to carry namespace information to the next sequence of steps.

After the second sequence of steps, you get a standalone schema without any reference to external documents.

The following snippet:

<define name="character-element">
  <externalRef href="character-element.rng" 
               ns="http://eric.van-der-vlist.com/ns/library"/>
</define>

is transformed into:

<define name="character-element">
  <grammar ns="http://eric.van-der-vlist.com/ns/library">
    <start>
      <element name="character">
        <attribute name="id"/>
        <parentRef name="name-element"/>
        <parentRef name="born-element"/>
        <parentRef name="qualification-element"/>
      </element>
    </start>
  </grammar>
</define>

And the snippet:

<include href="foreign.rng">
  <define name="foreign-elements">
    <zeroOrMore>
      <element>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
        <ref name="anything"/>
      </element>
    </zeroOrMore>
  </define>
  <define name="foreign-attributes">
    <zeroOrMore>
      <attribute>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
      </attribute>
    </zeroOrMore>
  </define>
</include>

becomes:

<div>
  <define name="foreign-elements">
    <zeroOrMore>
      <element>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
        <ref name="anything"/>
      </element>
    </zeroOrMore>
  </define>
  <define name="foreign-attributes">
    <zeroOrMore>
      <attribute>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
      </attribute>
    </zeroOrMore>
  </define>
  <define name="anything">
    <zeroOrMore>
      <choice>
        <element>
          <anyName/>
          <ref name="anything"/>
        </element>
        <attribute>
          <anyName/>
        </attribute>
        <text/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="foreign-nodes">
    <zeroOrMore>
      <choice>
        <ref name="foreign-attributes"/>
        <ref name="foreign-elements"/>
      </choice>
    </zeroOrMore>
  </define>
</div>
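
The replacement of an include pattern by a div can be sketched in Python as follows. This is a simplified illustration, not a complete implementation: resolve_include and its load callback are hypothetical names, the included grammar is assumed to be already parsed and simplified, and definitions nested in div elements of the included grammar aren't handled. The overriding definitions are placed first to match the listing above; the order of definitions in a grammar isn't significant.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def resolve_include(include, load):
    """Build the div that replaces an include element.

    load is a hypothetical callback that maps an href to the parsed
    (and already simplified) grammar element of the included schema.
    """
    grammar = load(include.get("href"))
    overridden = {d.get("name") for d in include.findall(RNG + "define")}
    replaces_start = include.find(RNG + "start") is not None
    # The div keeps every attribute of the include except href.
    attrs = {k: v for k, v in include.attrib.items() if k != "href"}
    div = ET.Element(RNG + "div", attrs)
    div.extend(list(include))  # overriding definitions
    for child in grammar:
        if child.tag == RNG + "define" and child.get("name") in overridden:
            continue
        if child.tag == RNG + "start" and replaces_start:
            continue
        div.append(child)
    return div


FOREIGN = ET.fromstring("""
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="foreign-elements">
    <zeroOrMore><element><anyName/><ref name="anything"/></element></zeroOrMore>
  </define>
  <define name="anything"><text/></define>
</grammar>
""")
INCLUDE = ET.fromstring("""
<include xmlns="http://relaxng.org/ns/structure/1.0" href="foreign.rng">
  <define name="foreign-elements">
    <zeroOrMore>
      <element>
        <anyName><except><nsName ns=""/></except></anyName>
        <ref name="anything"/>
      </element>
    </zeroOrMore>
  </define>
</include>
""")
div = resolve_include(INCLUDE, {"foreign.rng": FOREIGN}.get)
print([define.get("name") for define in div])
```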

In the compact syntax, the schema after the second sequence of steps looks like this:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
 namespace hr = "http://eric.van-der-vlist.com/ns/person"
 namespace local = ""
 default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
 namespace sn = "http://www.snee.com/ns/stages"

 start =
   element library { book-element+ }
   | book-element
   | author-element
   | character-element
 div {
   foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
   foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
   anything =
     (element * { anything }
      | attribute * { text }
      | text)*
   foreign-nodes = (foreign-attributes | foreign-elements)*
 }
 author-element =
   element hr:author {
     attribute id {
       xsd:NMTOKEN { maxLength = " 16 " }
     },
     name-element,
     born-element,
     died-element?
   }
 div {
   book-content =
     attribute id { text },
     attribute available { available-content },
     isbn-element,
     title-element,
     author-element*,
     character-element*
 }
 book-content &= foreign-nodes
 book-element = element book { book-content }
 born-element = element hr:born { xsd:date }
 character-element =
   grammar {
     start =
       element character {
         attribute id { text },
         parent name-element,
         parent born-element,
         parent qualification-element
       }
   }
 died-element = element hr:died { xsd:date }
 isbn-element = element isbn { foreign-attributes, token }
 name-element = element hr:name { xsd:token }
 qualification-element = element qualification { text }
 title-element = element title { foreign-attributes, text }
 available-content = "true" | xsd:token " false " | " "

Name Class Normalization

This third sequence of steps performs the normalization of name classes:

  • The name attribute of the element and attribute patterns is replaced by the name element, a name class that matches only a single name.

  • ns attributes are transferred through inheritance to the elements that need them; name, nsName, and value patterns need this attribute to support QName datatypes reliably. (Note that the ns attribute behaves like the default namespace in XML and isn't passed to attributes, which, by default, have no namespace URI.)

  • The QName (qualified name) used in name elements is replaced by their local part. The ns attribute of these elements is replaced by the namespace URI defined for their prefix.

After this third sequence of steps, name classes are almost normalized (the except and choice name classes are normalized in the fourth sequence of steps).

During this sequence of steps, the snippet:

<element name="hr:author">
  <attribute name="id">
    <data 
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         type="NMTOKEN">
      <param name="maxLength"> 16 </param>
    </data>
  </attribute>
  <ref name="name-element"/>
  <ref name="born-element"/>
  <optional>
    <ref name="died-element"/>
  </optional>
</element>

is transformed into:

<element>
  <name ns="http://eric.van-der-vlist.com/ns/person">author</name>
  <attribute>
    <name ns="">id</name>
    <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
     type="NMTOKEN">
      <param name="maxLength"> 16 </param>
    </data>
  </attribute>
  <ref name="name-element"/>
  <ref name="born-element"/>
  <optional>
    <ref name="died-element"/>
  </optional>
</element>
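
The replacement of name attributes by name elements can be sketched in Python. This is a minimal sketch under assumptions: expand_names is a hypothetical function, and because xml.etree.ElementTree doesn't keep the prefix declarations of the parsed schema, the bindings in scope are passed in as a hypothetical prefixes dictionary.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def expand_names(elem, prefixes, inherited_ns=""):
    """Replace name attributes with name elements, in place."""
    ns = elem.get("ns", inherited_ns)
    if elem.tag in (RNG + "element", RNG + "attribute") and "name" in elem.attrib:
        qname = elem.attrib.pop("name").strip()
        name = ET.Element(RNG + "name")
        prefix, colon, local = qname.rpartition(":")
        if colon:
            # A prefixed name takes the namespace URI bound to its prefix.
            name.set("ns", prefixes[prefix])
        elif elem.tag == RNG + "attribute":
            # Attributes don't inherit ns; only their own ns attribute counts.
            name.set("ns", elem.get("ns", ""))
        else:
            name.set("ns", ns)
        name.text = local
        elem.insert(0, name)
    for child in elem:
        if child.tag != RNG + "name":
            expand_names(child, prefixes, ns)
    return elem


SNIPPET = """
<element xmlns="http://relaxng.org/ns/structure/1.0" name="hr:author">
  <attribute name="id"/>
  <ref name="name-element"/>
</element>
"""
PREFIXES = {"hr": "http://eric.van-der-vlist.com/ns/person"}
author = expand_names(ET.fromstring(SNIPPET), PREFIXES,
                      inherited_ns="http://eric.van-der-vlist.com/ns/library")
print(ET.tostring(author, encoding="unicode"))
```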

Note that none of these modifications is visible in the compact syntax. The compact syntax already requires that all namespace declarations be made in the declaration section of the schema, and it makes no distinction between name elements and name attributes.

Pattern Normalization

In the fourth sequence of steps, patterns are normalized:

  • div elements are replaced by their children.

  • define, oneOrMore, zeroOrMore, optional, list, and mixed patterns are transformed to have exactly one child pattern. If they have more than one pattern, these patterns are wrapped into a group pattern.

  • element patterns follow a similar rule and are transformed to have exactly one name class and a single child pattern.

  • except patterns and name classes are also transformed to have exactly one child pattern, but since they have different semantics, their child elements are wrapped in a choice element.

  • If an attribute pattern has no child pattern, a text pattern is added.

  • The group and interleave patterns and the choice pattern and name class are recursively transformed to have exactly two subelements. One that has only one child is replaced by this child. In one that has more than two children, the first two child elements are combined into a new element, repeatedly, until exactly two child elements remain.

  • mixed patterns are transformed into interleave patterns between their unique child pattern and a text pattern.

  • optional patterns are transformed into choice patterns between their unique child pattern and an empty pattern.

  • zeroOrMore patterns are transformed into choice patterns between a oneOrMore pattern including their unique child pattern and an empty pattern.

After the fourth set of steps, the number of different types of patterns has been reduced to a set of "primitive" patterns. All the patterns that are left have a fixed number of child elements.

Here's our example snippet:

<define name="foreign-elements">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName ns=""/>
          <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
          <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
        </except>
      </anyName>
      <ref name="anything"/>
    </element>
  </zeroOrMore>
</define>

which is transformed into:

<define name="foreign-elements">
  <choice>
    <oneOrMore>
      <element>
        <anyName>
          <except>
            <choice>
              <choice>
                <nsName ns=""/>
                <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
              </choice>
              <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
            </choice>
          </except>
        </anyName>
        <ref name="anything"/>
      </element>
    </oneOrMore>
    <empty/>
  </choice>
</define>
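
Most of the rules in this sequence of steps fit in a short recursive function. The following Python sketch is only an illustration, assuming the name class normalization has already run (so the first child of an element is its name class) and that div elements have already been replaced by their children; normalize, fold, and rng are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def rng(local, *children):
    elem = ET.Element(RNG + local)
    elem.extend(children)
    return elem

def fold(local, children):
    """Nest children pairwise, left to right; a single child is returned as is."""
    result = children[0]
    for following in children[1:]:
        result = rng(local, result, following)
    return result

def normalize(elem):
    """Return a normalized copy of elem."""
    local = elem.tag[len(RNG):]
    kids = [normalize(kid) for kid in elem]
    if local in ("group", "interleave", "choice"):
        return fold(local, kids)
    if local in ("define", "oneOrMore", "zeroOrMore", "optional", "list", "mixed"):
        kids = [fold("group", kids)]
    elif local == "except":
        kids = [fold("choice", kids)]
    elif local == "element":
        kids = [kids[0], fold("group", kids[1:])]  # name class, then content
    elif local == "attribute" and len(kids) == 1:
        kids.append(rng("text"))
    if local == "mixed":
        return rng("interleave", kids[0], rng("text"))
    if local == "optional":
        return rng("choice", kids[0], rng("empty"))
    if local == "zeroOrMore":
        return rng("choice", rng("oneOrMore", kids[0]), rng("empty"))
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text if not kids else None
    copy.extend(kids)
    return copy


SNIPPET = """
<define xmlns="http://relaxng.org/ns/structure/1.0" name="foreign-elements">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName ns=""/>
          <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
          <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
        </except>
      </anyName>
      <ref name="anything"/>
    </element>
  </zeroOrMore>
</define>
"""
out = normalize(ET.fromstring(SNIPPET))
print(ET.tostring(out, encoding="unicode"))
```

Applied to the foreign-elements definition, the sketch produces the same nesting as the listing above: a choice between a oneOrMore and an empty pattern, with the three nsName name classes folded into two nested choice elements.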

During the fourth set of steps, our schema becomes:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
 namespace hr = "http://eric.van-der-vlist.com/ns/person"
 namespace local = ""
 default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
 namespace sn = "http://www.snee.com/ns/stages"
      
 start =
   ((element library { book-element+ }
     | book-element)
    | author-element)
   | character-element
 foreign-elements =
   element * - ((local:* | ns1:*) | hr:*) { anything }+
   | empty
 foreign-attributes =
   attribute * - ((local:* | ns1:*) | hr:*) { text }+
   | empty
 anything =
   ((element * { anything }
     | attribute * { text })
    | text)+
   | empty
 foreign-nodes = (foreign-attributes | foreign-elements)+ | empty
 author-element =
   element hr:author {
     ((attribute id {
         xsd:NMTOKEN { maxLength = " 16 " }
       },
       name-element),
      born-element),
     (died-element | empty)
   }
 book-content =
   ((((attribute id { text },
       attribute available { available-content }),
      isbn-element),
     title-element),
    (author-element+ | empty)),
   (character-element+ | empty)
 book-content &= foreign-nodes
 book-element = element book { book-content }
 born-element = element hr:born { xsd:date }
 character-element =
   grammar {
     start =
       element character {
         ((attribute id { text },
           parent name-element),
          parent born-element),
         parent qualification-element
       }
   }
 died-element = element hr:died { xsd:date }
 isbn-element = element isbn { foreign-attributes, token }
 name-element = element hr:name { xsd:token }
 qualification-element = element qualification { text }
 title-element = element title { foreign-attributes, text }
 available-content = ("true" | xsd:token " false ") | " "

It is much more verbose but has a simpler structure.

First Set of Constraints

The first set of constraints is applied at this point, after the fourth sequence of steps. They are mainly checks that the schema conforms to XML common sense, but it's easier and safer to perform them now, on the complete schema:

  • It's not possible to define name classes (or except name classes) that contain no name at all by including anyName in an except name class or nsName in an except name class included in another nsName.

  • It's not possible to define attributes that have the name xmlns or whose namespace URI is http://www.w3.org/2000/xmlns (corresponding to the "xmlns" prefix).

  • Datatype libraries are used correctly; each type exists in its datatype library and its param elements are appropriate to that library.
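
The first of these constraints is simple to check mechanically. The following Python sketch is one possible way to do it, assuming the schema has been parsed with xml.etree.ElementTree; empty_name_class_errors and the BANNED table are hypothetical names, and the other two constraints aren't covered.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Name classes that must not appear anywhere below an except,
# keyed by the name class that owns the except.
BANNED = {
    RNG + "anyName": {RNG + "anyName"},
    RNG + "nsName": {RNG + "anyName", RNG + "nsName"},
}

def empty_name_class_errors(schema):
    """Return (owner, offender) pairs for each forbidden name class."""
    errors = []
    for owner in schema.iter():
        banned = BANNED.get(owner.tag)
        if not banned:
            continue
        for exc in owner.findall(RNG + "except"):
            for inner in exc.iter():
                if inner.tag in banned:
                    errors.append((owner.tag[len(RNG):], inner.tag[len(RNG):]))
    return errors


GOOD = ET.fromstring("""
<anyName xmlns="http://relaxng.org/ns/structure/1.0">
  <except><nsName ns="http://eric.van-der-vlist.com/ns/library"/></except>
</anyName>
""")
BAD = ET.fromstring("""
<nsName xmlns="http://relaxng.org/ns/structure/1.0" ns="">
  <except><nsName ns=""/></except>
</nsName>
""")
print(empty_name_class_errors(GOOD), empty_name_class_errors(BAD))
```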

Grammar Merge

define and start elements are combined in each grammar; all grammars are then merged into one top-level grammar:

  1. In each grammar, multiple start elements and multiple define elements with the same name are combined as defined by their combine attribute.

  2. The names of the named patterns are then changed so as to be unique across the whole schema; the references to these named patterns are changed accordingly.

  3. A top-level grammar and its start element are created, if not already present. All the named patterns become children in this top-level grammar, parentRef elements are replaced by ref elements, and all other grammar and start elements are replaced by their child elements.

During this fifth sequence of steps, the simplified schema:

  <define name="born-element">
    <element>
      <name ns="http://eric.van-der-vlist.com/ns/person">born</name>
      <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" 
            type="date"/>
    </element>
  </define>
  <define name="character-element">
    <grammar>
      <start>
        <element>
          <name ns="http://eric.van-der-vlist.com/ns/library">character</name>
          <group>
            <group>
              <group>
                <attribute>
                  <name ns="">id</name>
                  <text/>
                </attribute>
                <parentRef name="name-element"/>
              </group>
              <parentRef name="born-element"/>
            </group>
            <parentRef name="qualification-element"/>
          </group>
        </element>
      </start>
    </grammar>
  </define>

becomes:

  <define name="born-element-id2613943">
    <element>
      <name ns="http://eric.van-der-vlist.com/ns/person">born</name>
      <data datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" 
            type="date"/>
    </element>
  </define>
  <define name="character-element-id2613924">
    <element>
      <name ns="http://eric.van-der-vlist.com/ns/library">character</name>
      <group>
        <group>
          <group>
            <attribute>
              <name ns="">id</name>
              <text/>
            </attribute>
            <ref name="name-element-id2613832"/>
          </group>
          <ref name="born-element-id2613943"/>
        </group>
        <ref name="qualification-element-id2613840"/>
      </group>
    </element>
  </define>

No specific algorithm to create unique names for a named pattern is described in the specification, so these names will vary between implementations.
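
The combination of definitions that share a name (the first step in the list above) can be sketched in Python. This is a minimal illustration, assuming pattern normalization has already given each define exactly one child; combine_defines is a hypothetical name, start elements and renaming are left out, and the sketch only checks that exactly one combine method is used for each name.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def combine_defines(grammar):
    """Merge same-named define children of grammar, in place."""
    groups = {}
    for define in grammar.findall(RNG + "define"):
        groups.setdefault(define.get("name"), []).append(define)
    for name, defines in groups.items():
        if len(defines) == 1:
            continue
        methods = {d.get("combine") for d in defines} - {None}
        if len(methods) != 1:
            raise ValueError("no single combine method for " + name)
        method = methods.pop()  # "choice" or "interleave"
        merged = defines[0][0]
        for define in defines[1:]:
            wrapper = ET.Element(RNG + method)
            wrapper.extend([merged, define[0]])
            merged = wrapper
            grammar.remove(define)
        first = defines[0]
        first.attrib.pop("combine", None)
        first.remove(first[0])
        first.append(merged)
    return grammar


GRAMMAR = ET.fromstring("""
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="book-content"><ref name="isbn-element"/></define>
  <define name="book-content" combine="interleave"><ref name="foreign-nodes"/></define>
  <define name="book-element"><ref name="book-content"/></define>
</grammar>
""")
combine_defines(GRAMMAR)
print(ET.tostring(GRAMMAR, encoding="unicode"))
```

This mirrors what happens to book-content in the library schema, where the definition coming from book-content.rnc is interleaved with foreign-nodes.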

To demonstrate the drastic change that occurs during simplification, I now present a schema that is a consolidation of features seen throughout this book, to cover most of the elements affected by the simplification. It is composed of four documents.

The first, library.rnc (or library.rng in the XML syntax), defines the library in general, but not authors or characters:

 namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
 namespace hr = "http://eric.van-der-vlist.com/ns/person"
 namespace local = ""
 default namespace ns1 = "http://eric.van-der-vlist.com/ns/library"
 namespace sn = "http://www.snee.com/ns/stages"

 a:documentation [ "RELAX NG schema for our library" ]
 sn:stages [
   sn:stage [ name = "library" ]
   sn:stage [ name = "book" ]
   sn:stage [ name = "author" ]
   sn:stage [ name = "character" ]
   sn:stage [ name = "author-or-book" ]
 ]
 start =
   [ sn:stages = "library" ] element library { book-element+ }
   | [ sn:stages = "book author-or-book" ] book-element
   | [ sn:stages = "author author-or-book" ] author-element
   | [ sn:stages = "character" ] character-element
 include "foreign.rnc" {
   foreign-elements = element * - (local:* | ns1:* | hr:*) { anything }*
   foreign-attributes = attribute * - (local:* | ns1:* | hr:*) { text }*
 }
 author-element =
   element hr:author {
     attribute id {
       xsd:NMTOKEN { maxLength = " 16 " }
     },
     name-element,
     born-element,
     died-element?
   }
 include "book-content.rnc"
 book-content &= foreign-nodes
 book-element = element book { book-content }
 born-element = element hr:born { xsd:date }
 character-element = external "character-element.rnc"
 died-element = element hr:died { xsd:date }
 isbn-element = element isbn { foreign-attributes, token }
 name-element = element hr:name { xsd:token }
 qualification-element = element qualification { text }
 title-element = element title { foreign-attributes, text }
 available-content = "true" | xsd:token " false " | " "

The second, book-content.rnc (or book-content.rng in the XML syntax), contains a pattern defining the contents of books:

 book-content =
   attribute id { text },
   attribute available { available-content },
   isbn-element,
   title-element,
   author-element*,
   character-element*

The third, character-element.rnc (or character-element.rng in the XML syntax), defines character elements:

 start =
   element character {
     attribute id { text },
     parent name-element,
     parent born-element,
     parent qualification-element
   }

The last component, foreign.rnc (or foreign.rng), provides a model for openness in the schema:

 anything =
   (element * { anything }
    | attribute * { text }
    | text)*
 foreign-elements = element * { anything }*
 foreign-attributes = attribute * { text }*
 foreign-nodes = (foreign-attributes | foreign-elements)*

Here's the complete schema for the library after the grammar-merging steps are completed:

 namespace local = ""
 namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
 default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"

 start =
   ((element library { book-element-id2613963+ }
     | book-element-id2613963)
    | author-element-id2614058)
   | character-element-id2613924
 foreign-elements-id2614183 =
   element * - ((local:* | ns2:*) | ns1:*) { anything-id2614112 }+
   | empty
 foreign-attributes-id2614152 =
   attribute * - ((local:* | ns2:*) | ns1:*) { text }+
   | empty
 anything-id2614112 =
   ((element * { anything-id2614112 }
     | attribute * { text })
    | text)+
   | empty
 foreign-nodes-id2614043 =
   (foreign-attributes-id2614152 | foreign-elements-id2614183)+ | empty
 author-element-id2614058 =
   element ns1:author {
     ((attribute id {
         xsd:NMTOKEN { maxLength = " 16 " }
       },
       name-element-id2613832),
      born-element-id2613943),
     (died-element-id2613856 | empty)
   }
 book-content-id2614016 =
   (((((attribute id { text },
        attribute available { available-content-id2613805 }),
       isbn-element-id2613872),
      title-element-id2613819),
     (author-element-id2614058+ | empty)),
    (character-element-id2613924+ | empty))
   & foreign-nodes-id2614043
 book-element-id2613963 = element book { book-content-id2614016 }
 born-element-id2613943 = element ns1:born { xsd:date }
 character-element-id2613924 =
   element character {
     ((attribute id { text },
       name-element-id2613832),
      born-element-id2613943),
     qualification-element-id2613840
   }
 died-element-id2613856 = element ns1:died { xsd:date }
 isbn-element-id2613872 =
   element isbn { foreign-attributes-id2614152, token }
 name-element-id2613832 = element ns1:name { xsd:token }
 qualification-element-id2613840 = element qualification { text }
 title-element-id2613819 =
   element title { foreign-attributes-id2614152, text }
 available-content-id2613805 = ("true" | xsd:token " false ") | " "

Schema Flattening

The previous steps have preserved the basic style of the schema (Russian-doll or named patterns). The goal of the sixth step, schema flattening, is to normalize the use of named patterns so that the schema becomes similar in structure to a DTD. Each element will be cleanly embedded in its own named pattern, and named patterns will contain no more than a single element:

  • For each element that isn't the unique child of a define element, a named pattern is created to embed its definition.

  • Each named pattern that doesn't embed a single element pattern is suppressed. References to this named pattern are replaced by its definition.

During this step, the snippet:

  <start>
    <choice>
      <choice>
        <choice>
          <element>
            <name ns="http://eric.van-der-vlist.com/ns/library">library</name>
            <oneOrMore>
              <ref name="book-element-id2613963"/>
            </oneOrMore>
          </element>
          <ref name="book-element-id2613963"/>
        </choice>
        <ref name="author-element-id2614058"/>
      </choice>
      <ref name="character-element-id2613924"/>
    </choice>
  </start>

is replaced by:

  <start>
    <choice>
      <choice>
        <choice>
          <ref name="__library-elt-id2615152"/>
          <ref name="book-element-id2613963"/>
        </choice>
        <ref name="author-element-id2614058"/>
      </choice>
      <ref name="character-element-id2613924"/>
    </choice>
  </start>
  <define name="__library-elt-id2615152">
    <element>
      <name ns="http://eric.van-der-vlist.com/ns/library">library</name>
      <oneOrMore>
        <ref name="book-element-id2613963"/>
      </oneOrMore>
    </element>
  </define>
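
The first flattening rule, which gives each stray element its own named pattern, can be sketched in Python. This is only an illustration: hoist_elements is a hypothetical function, the generated names are an assumption of this sketch (it doesn't check that they are unused), and the second rule, which inlines named patterns that don't embed an element, is left out.

```python
import itertools
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def hoist_elements(grammar):
    """Give every element pattern its own define, in place."""
    counter = itertools.count(1)
    created = []

    def visit(parent):
        for index, child in enumerate(list(parent)):
            visit(child)
            if child.tag == RNG + "element" and parent.tag != RNG + "define":
                # Hypothetical naming scheme, chosen only for this sketch.
                name = "__elt-%d" % next(counter)
                define = ET.Element(RNG + "define", {"name": name})
                define.append(child)
                created.append(define)
                parent[index] = ET.Element(RNG + "ref", {"name": name})

    visit(grammar)
    grammar.extend(created)
    return grammar


GRAMMAR = ET.fromstring("""
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <choice>
      <element>
        <name ns="http://eric.van-der-vlist.com/ns/library">library</name>
        <oneOrMore><ref name="book-element"/></oneOrMore>
      </element>
      <ref name="book-element"/>
    </choice>
  </start>
  <define name="book-element">
    <element>
      <name ns="http://eric.van-der-vlist.com/ns/library">book</name>
      <text/>
    </element>
  </define>
</grammar>
""")
hoist_elements(GRAMMAR)
print([define.get("name") for define in GRAMMAR.findall(RNG + "define")])
```

The library element, which sat directly inside start, moves into a new definition and leaves a ref behind, while the book element stays where it is because it is already the child of a define.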

If I take the results of merging the four-part schema from the previous section and apply this step, I get:

 namespace local = ""
 namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
 default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"
      
 start =
   ((__library-elt-id2615152 | book-element-id2613963)
    | author-element-id2614058)
   | character-element-id2613924
 author-element-id2614058 =
   element ns1:author {
     ((attribute id {
         xsd:NMTOKEN { maxLength = " 16 " }
       },
       name-element-id2613832),
      born-element-id2613943),
     (died-element-id2613856 | empty)
   }
 book-element-id2613963 =
   element book {
     (((((attribute id { text },
          attribute available { ("true" | xsd:token " false ") | " " }),
         isbn-element-id2613872),
        title-element-id2613819),
       (author-element-id2614058+ | empty)),
      (character-element-id2613924+ | empty))
     & (((attribute * - ((local:* | ns2:*) | ns1:*) { text }+
          | empty)
         | (__-elt-id2615098+ | empty))+
        | empty)
   }
 born-element-id2613943 = element ns1:born { xsd:date }
 character-element-id2613924 =
   element character {
     ((attribute id { text },
       name-element-id2613832),
      born-element-id2613943),
     qualification-element-id2613840
   }
 died-element-id2613856 = element ns1:died { xsd:date }
 isbn-element-id2613872 =
   element isbn {
     (attribute * - ((local:* | ns2:*) | ns1:*) { text }+
      | empty),
     token
   }
 name-element-id2613832 = element ns1:name { xsd:token }
 qualification-element-id2613840 = element qualification { text }
 title-element-id2613819 =
   element title {
     (attribute * - ((local:* | ns2:*) | ns1:*) { text }+
      | empty),
     text
   }
 __-elt-id2615020 =
   element * {
     ((__-elt-id2615020
       | attribute * { text })
      | text)+
     | empty
   }
 __library-elt-id2615152 = element library { book-element-id2613963+ }
 __-elt-id2615098 =
   element * - ((local:* | ns2:*) | ns1:*) {
     ((__-elt-id2615020
       | attribute * { text })
      | text)+
     | empty
   }

Final Cleanup

The simplification process is almost done and just needs a bit of final cleanup:

  • Recursively propagate notAllowed patterns upward wherever they make their parent pattern notAllowed too. Remove notAllowed alternatives from choices. (Note that this simplification doesn't cross element boundaries, so element foo { notAllowed } isn't transformed into notAllowed.)

  • Remove empty elements that have no effect.

  • Move useful empty elements so that they are the first child in choice elements.

After this cleanup, our schema becomes:

 namespace local = ""
 namespace ns1 = "http://eric.van-der-vlist.com/ns/person"
 default namespace ns2 = "http://eric.van-der-vlist.com/ns/library"

 start =
   ((__library-elt-id2615152 | book-element-id2613963)
    | author-element-id2614058)
   | character-element-id2613924
 author-element-id2614058 =
   element ns1:author {
     ((attribute id {
         xsd:NMTOKEN { maxLength = " 16 " }
       },
       name-element-id2613832),
      born-element-id2613943),
     (empty | died-element-id2613856)
   }
 book-element-id2613963 =
   element book {
     (((((attribute id { text },
          attribute available { ("true" | xsd:token " false ") | " " }),
         isbn-element-id2613872),
        title-element-id2613819),
       (empty | author-element-id2614058+)),
      (empty | character-element-id2613924+))
     & (empty
        | ((empty
            | attribute * - ((local:* | ns2:*) | ns1:*) { text }+)
           | (empty | __-elt-id2615098+))+)
   }
 born-element-id2613943 = element ns1:born { xsd:date }
 character-element-id2613924 =
   element character {
     ((attribute id { text },
       name-element-id2613832),
      born-element-id2613943),
     qualification-element-id2613840
   }
 died-element-id2613856 = element ns1:died { xsd:date }
 isbn-element-id2613872 =
   element isbn {
     (empty
      | attribute * - ((local:* | ns2:*) | ns1:*) { text }+),
     token
   }
 name-element-id2613832 = element ns1:name { xsd:token }
 qualification-element-id2613840 = element qualification { text }
 title-element-id2613819 =
   element title {
     (empty
      | attribute * - ((local:* | ns2:*) | ns1:*) { text }+),
     text
   }
 __-elt-id2615020 =
   element * {
     empty
     | ((__-elt-id2615020
         | attribute * { text })
        | text)+
   }
 __library-elt-id2615152 = element library { book-element-id2613963+ }
 __-elt-id2615098 =
   element * - ((local:* | ns2:*) | ns1:*) {
     empty
     | ((__-elt-id2615020
         | attribute * { text })
        | text)+
   }
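
The cleanup rules listed at the start of this section can be sketched as one recursive Python function. This is a minimal sketch, assuming a schema that has already been through pattern normalization (so group, interleave, and choice have exactly two children); cleanup, rng, and is_a are hypothetical helper names, and the removal of except elements that contain notAllowed is left out.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

def rng(local, *children):
    elem = ET.Element(RNG + local)
    elem.extend(children)
    return elem

def is_a(elem, local):
    return elem.tag == RNG + local

def cleanup(elem):
    """Return a cleaned-up copy of a normalized (binary) pattern."""
    local = elem.tag[len(RNG):]
    kids = [cleanup(kid) for kid in elem]
    blocked = any(is_a(kid, "notAllowed") for kid in kids)
    if local in ("group", "interleave"):
        first, second = kids
        if blocked:
            return rng("notAllowed")
        if is_a(first, "empty"):
            return second
        if is_a(second, "empty"):
            return first
    elif local == "choice":
        first, second = kids
        if is_a(first, "notAllowed"):
            return second
        if is_a(second, "notAllowed"):
            return first
        if is_a(first, "empty") and is_a(second, "empty"):
            return first
        if is_a(second, "empty"):
            kids = [second, first]  # a useful empty goes first
    elif local == "oneOrMore":
        if blocked:
            return rng("notAllowed")
        if is_a(kids[0], "empty"):
            return kids[0]
    elif local in ("attribute", "list") and blocked:
        return rng("notAllowed")
    # element is deliberately absent: notAllowed never crosses its boundary.
    copy = ET.Element(elem.tag, elem.attrib)
    copy.text = elem.text if not kids else None
    copy.extend(kids)
    return copy


died = rng("ref")
died.set("name", "died-element")
optional = rng("choice", died, rng("empty"))
dead = rng("group", rng("notAllowed"), rng("text"))
foo = rng("name")
foo.set("ns", "")
foo.text = "foo"
kept = rng("element", foo, rng("notAllowed"))
for pattern in (optional, dead, kept):
    print(ET.tostring(cleanup(pattern), encoding="unicode"))
```

The first pattern shows the empty alternative moving to the front of its choice, as in the author-element definition above; the last one shows that an element containing notAllowed is kept as is.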

This text is released under the Free Software Foundation GFDL.