Accepting foreign namespaces

RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

Chapter 11: Namespaces

The schemas we have just seen validate instance documents regardless of the prefixes being used. They meet the first goal of namespaces: disambiguating elements in multi-namespace documents. However, they fail to validate the instance document to which we added the dc:publisher element. We could easily update our schema to add this element explicitly to the content model of our book element, but that wouldn't make it an open schema that accepts elements from any other namespace.
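
For reference, an instance document of the kind discussed here might look like the following sketch. The library root element, the attribute values, the element contents, and the binding of the dc prefix to the Dublin Core namespace are assumptions made for illustration. Only the placement of dc:publisher after the title element comes from the text:

    <library xmlns="http://eric.van-der-vlist.com/ns/library"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <book id="b1" available="true">
        <isbn>0596004214</isbn>
        <title>RELAX NG</title>
        <!-- foreign element: the schemas seen so far reject it -->
        <dc:publisher>O'Reilly &amp; Associates</dc:publisher>
        <!-- author and character elements, if any, would follow here -->
      </book>
    </library>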

Instead of some "magic feature" which could have been quite rigid, RELAX NG introduced a flexible and clever feature which lets you define your own level of "openness". The idea is to let you define your own wildcard, and, once you have it, you can include it wherever you want in your content model.

Constructing a wildcard

Before we start, let's define what we are trying to achieve. We want a named pattern allowing any element or attribute which does not belong to our lib or hr namespaces. We probably also want to exclude elements and attributes with no namespace: attributes because our own attributes have no namespace and we might want to differentiate them, and elements because allowing elements without a namespace in a document using namespaces violates the general intent of disambiguating content. The content model of the elements which we'll accept may be anything.

We will start by defining the inner content of the wildcard, that is, what we want our "anything" to be. "Anything" in terms of patterns is any number of elements (themselves containing "anything"), attributes, and text, in any order. This is a good candidate for a recursive named pattern:

  <define name="anything">
    <zeroOrMore>
      <choice>
        <element>
          <anyName/>
          <ref name="anything"/>
        </element>
        <attribute>
          <anyName/>
        </attribute>
        <text/>
      </choice>
    </zeroOrMore>
  </define>

or:

 anything = ( element * { anything } | attribute * { text } | text )*

The only things new here are the anyName element (in the XML syntax) and the * operator (in the compact syntax) that replace the name of an element or attribute. This is our first example of a name class (a class of names). We'll see that there are many ways to restrict this name class. Now that we have a named pattern to express what "anything" is, we can use it to define what "foreign" elements mean:

  <define name="foreign-elements">
    <zeroOrMore>
      <element>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
        <ref name="anything"/>
      </element>
    </zeroOrMore>
  </define>

or:

 default namespace lib = "http://eric.van-der-vlist.com/ns/library"
 namespace local = ""
 namespace hr = "http://eric.van-der-vlist.com/ns/person"
      
 .../...
      
 foreign-elements = element * - (local:* | lib:* | hr:*) { anything }*

To achieve our purpose, we have introduced two new elements embedded in the anyName name class. except (- in the compact syntax) has the same meaning as it does with enumerations. nsName (xxx:* in the compact syntax) means "any name from the namespace specified". In the XML syntax, nsName uses an ns attribute, while the compact syntax uses prefixes. This use of prefixes in the compact syntax means that declarations must be added to define prefixes not only for the "lib" (which is also the default namespace) and "hr" namespaces, but also for "no namespace" (here we have used the prefix "local").

Note that name classes are not considered patterns; they are a separate set of elements with a specific purpose. One consequence is that name class definitions cannot be placed within named patterns to be reused. As a result, we have to repeat the same name class for both elements and attributes.

The same can be done to define foreign attributes:

  <define name="foreign-attributes">
    <zeroOrMore>
      <attribute>
        <anyName>
          <except>
            <nsName ns=""/>
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
      </attribute>
    </zeroOrMore>
  </define>

or:

 foreign-attributes = attribute * - (local:* | lib:* | hr:*) { text }*

For convenience, we can also define foreign nodes by combining foreign elements and attributes:

  <define name="foreign-nodes">
    <zeroOrMore>
      <choice>
        <ref name="foreign-attributes"/>
        <ref name="foreign-elements"/>
      </choice>
    </zeroOrMore>
  </define>

or:

 foreign-nodes = ( foreign-attributes | foreign-elements )*

Using wildcards

Now that we have defined what the foreign-nodes wildcard is, we can use the concept to give more extensibility to our schema. To enable foreign-nodes where we have added the dc:publisher element--between the title and author elements--we could write (switching to a "flatter" style to make it more readable):

    <element name="book">
      <attribute name="id"/>
      <attribute name="available"/>
      <ref name="isbn-element"/>
      <ref name="title-element"/>
      <ref name="foreign-nodes"/>
      <zeroOrMore>
        <ref name="author-element"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="character-element"/>
      </zeroOrMore>
    </element>

or:

 book-element =
    element book
    {
       attribute id { text },
       attribute available { text },
       isbn-element,
       title-element,
       foreign-nodes,
       author-element*,
       character-element*
    }

This would do the trick for the instance document shown above, but wouldn't validate a document where foreign nodes were added in any other place, for instance between the isbn and title elements. We could insert a reference to the foreign-nodes pattern between all the elements, but this would be very verbose. If we think about it, what we really want to do is interleave these foreign nodes between the content defined for the book element. This is a good opportunity to use the interleave pattern:

    <element name="book">
      <interleave>
        <group>
          <attribute name="id"/>
          <attribute name="available"/>
          <ref name="isbn-element"/>
          <ref name="title-element"/>
          <zeroOrMore>
            <ref name="author-element"/>
          </zeroOrMore>
          <zeroOrMore>
            <ref name="character-element"/>
          </zeroOrMore>
        </group>
        <ref name="foreign-nodes"/>
      </interleave>
    </element>

or:

    element book
    {
       (
          attribute id { text },
          attribute available { text },
          isbn-element,
          title-element,
          author-element*,
          character-element*
       )
     & foreign-nodes
    }

Where should we allow foreign nodes?

We may be tempted to allow foreign nodes everywhere in our document. However, while the extensibility gained this way is often acceptable in elements such as book which already have child elements, it's often considered bad practice to do the same in elements which contain only text or data. An example would be the isbn element, where this would transform a text content model into a mixed content model. The reason this is considered bad practice is the weak support for mixed content models which we mentioned in Chapter 6: More patterns, where we discussed the limitations of the mixed pattern. A consequence of allowing foreign elements in isbn elements would be that the content of this element could no longer be considered data. Neither datatypes nor restrictions could be applied.

Beyond this limitation of RELAX NG, applications would have to concatenate text nodes spread over the foreign elements. This can become verbose with tools such as XPath and XSLT.

One compromise on this issue is to allow only foreign attributes in text content models. That's not a problem for us since our foreign-attributes pattern is ready for this purpose:

    <element name="isbn">
      <ref name="foreign-attributes"/>
      <text/>
    </element>

or:

 element isbn { foreign-attributes, text }

This way, the isbn element is extensible, but only with attributes from foreign namespaces.
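
As an illustration, and assuming the default namespace declaration shown earlier, the first fragment below would be accepted by this definition while the second would still be rejected. The ext prefix and its namespace URI are hypothetical and chosen only for this sketch:

    <!-- accepted: a foreign attribute on isbn -->
    <isbn xmlns="http://eric.van-der-vlist.com/ns/library"
          xmlns:ext="http://example.org/ns/ext"
          ext:checked="true">0596004214</isbn>

    <!-- rejected: a foreign element inside isbn -->
    <isbn xmlns="http://eric.van-der-vlist.com/ns/library"
          xmlns:ext="http://example.org/ns/ext">0596004214<ext:note/></isbn>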

Traps to avoid

Although most of the time using wildcards is straightforward, there are some situations where wildcards may lead to unexpected schema errors, especially with attributes, where their usage is subject to restrictions.

The first of the traps is related to the fact that the definition of attributes cannot be duplicated in a schema. The following definition would be invalid:

 element title { attribute xml:space { text }, attribute xml:space { text }, text } # this is invalid

This seems pretty sensible since duplicate attributes are forbidden in the instance document. Unfortunately, the attribute "xml:space" is also allowed by our "foreign-attributes" named pattern. We will get an error as well if we unthinkingly extend the definition of our title element and write:

 element title { foreign-attributes, attribute xml:space { text }, text } # this is also invalid

To fix this error, we will need either to remove the xml:space attribute from the name class of our foreign attributes or to remove the explicit mention of xml:space in our definition and just write:

 element title { foreign-attributes, text }

Of course, this doesn't remove the possibility of including an xml:space attribute in the title element, since this attribute is a "foreign attribute" as defined in our named pattern.
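
The first option, removing xml:space from the name class of foreign-attributes, could be sketched as follows. This is only one possible formulation, reusing the local, lib, and hr prefixes declared earlier. The enumerated values for xml:space are an assumption based on the XML recommendation rather than something our schema defines:

    foreign-attributes =
       attribute * - (local:* | lib:* | hr:* | xml:space) { text }*

    element title
    {
       foreign-attributes,
       attribute xml:space { "default" | "preserve" }?,
       text
    }

With this variant, xml:space is no longer matched by the wildcard, so the explicit definition does not conflict with it and can carry its own constraints.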

The second trap operates at a higher level, but along the same lines. It is specific to the DTD compatibility ID feature. In "Chapter 8: Datatype libraries", when we saw this datatype, we used it to define the book element:

   <element name="book">
    <attribute name="id">
     <data datatypeLibrary="http://relaxng.org/ns/compatibility/datatypes/1.0" type="ID"/>
    </attribute>
    .../...
   </element>

or:

  element book {
   attribute id {dtd:ID},
   .../...
  }

Once again, we will generate an error if we add our foreign nodes. This feature emulates the DTD in all its aspects, including the rule that if an element book is defined with an id attribute of type ID, all the other definitions of an id attribute hosted by a book element must have the same type "ID". The problem here is that, hidden in the definition of anything, there can be a book element with an id attribute of type text. This is an error.

There is a way to work around this problem. If we want to use the DTD type "ID", we have to remove the problematic possibility from the named pattern "anything". A fast solution would be to exclude our own namespaces from the name classes in "anything". A better solution will be introduced using features shown in the next chapter.
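
The fast solution could be sketched as follows in the compact syntax, assuming the lib and hr prefixes declared earlier. Excluding our own namespaces from the element name class means that "anything" can no longer match a book element, so no competing definition of its id attribute should remain:

    anything =
       (
          element * - (lib:* | hr:*) { anything }
        | attribute * { text }
        | text
       )*

The price of this variant is that foreign elements can no longer contain elements from our own vocabulary.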

Adding foreign nodes through combination

In adding our foreign nodes, we have transformed:

    <element name="book">
      <attribute name="id"/>
      <attribute name="available"/>
      <ref name="isbn-element"/>
      <ref name="title-element"/>
      <zeroOrMore>
        <ref name="author-element"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="character-element"/>
      </zeroOrMore>
    </element>

or:

    element book
    {
      attribute id { text },
      attribute available { text },
      isbn-element,
      title-element,
      author-element*,
      character-element*
    }

into:

    <element name="book">
      <interleave>
        <group>
          <attribute name="id"/>
          <attribute name="available"/>
          <ref name="isbn-element"/>
          <ref name="title-element"/>
          <zeroOrMore>
            <ref name="author-element"/>
          </zeroOrMore>
          <zeroOrMore>
            <ref name="character-element"/>
          </zeroOrMore>
        </group>
        <ref name="foreign-nodes"/>
      </interleave>
    </element>

or:

    element book
    {
       (
          attribute id { text },
          attribute available { text },
          isbn-element,
          title-element,
          author-element*,
          character-element*
       )
     & foreign-nodes
    }

This operation could instead have been done as a pattern combination using interleave if the content of the element book was described as a named pattern:

  <define name="book-content">
    <attribute name="id"/>
    <attribute name="available"/>
    <ref name="isbn-element"/>
    <ref name="title-element"/>
    <zeroOrMore>
      <ref name="author-element"/>
    </zeroOrMore>
    <zeroOrMore>
      <ref name="character-element"/>
    </zeroOrMore>
  </define>

or:

 book-content =
    attribute id { text },
    attribute available { text },
    isbn-element,
    title-element,
    author-element*,
    character-element*

This pattern can then easily be extended as:

  <define name="book-content" combine="interleave">
    <ref name="foreign-nodes"/>
  </define>

or:

 book-content &= foreign-nodes

and used to define the book element:

    <element name="book">
      <ref name="book-content"/>
    </element>

or:

 element book { book-content }

This combination can be done in a single document but the mechanism can also be used to extend a vocabulary by merging a grammar containing only these combinations. The exact same approach also works for appending foreign attributes to the elements which have text-based content models.
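
As a sketch of the merging approach, an extension grammar could include the original schema and contain nothing but the wildcard definitions and the combinations. The file names library.rnc and library-open.rnc are hypothetical, as is the isbn-content pattern, which assumes that the content of the isbn element has also been described as a named pattern:

    # library-open.rnc: hypothetical extension of library.rnc
    namespace local = ""
    namespace lib = "http://eric.van-der-vlist.com/ns/library"
    namespace hr = "http://eric.van-der-vlist.com/ns/person"

    include "library.rnc"

    # wildcard definitions, as introduced earlier in this section
    anything = ( element * { anything } | attribute * { text } | text )*
    foreign-elements = element * - (local:* | lib:* | hr:*) { anything }*
    foreign-attributes = attribute * - (local:* | lib:* | hr:*) { text }*
    foreign-nodes = ( foreign-attributes | foreign-elements )*

    # combinations opening the original content models
    book-content &= foreign-nodes
    isbn-content &= foreign-attributes

The original schema stays untouched; validating against the extension grammar instead of the original one is what opens the vocabulary.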

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.
