by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
The previous couple of schemas can validate instance documents independently of the prefixes being used. They meet the first goal of namespaces: disambiguating elements in multinamespace documents. However, they will fail to validate the instance document where we've added the dc:publisher element. You can easily update the schema to explicitly add this element to the content model of our book element, but that won't make it an open schema that accepts the addition of elements from any other namespace.
Instead of some magic feature that could have been quite rigid, RELAX NG introduced a flexible and clever feature that lets you define your own level of "openness." The idea is to let you define your own wildcard, and, once you have it, you can include it wherever you want in your content model.
Before we start, I'll define what we are trying to achieve! We want a named pattern allowing any element or attribute that doesn't belong to our lib or hr namespaces. We probably want to exclude attributes and elements with no namespaces; attributes, because our own attributes have no namespace, and we might want to differentiate them; and elements, because allowing elements without namespaces in a document using namespaces violates the general intent of disambiguating content. The content model of the elements we'll accept can be anything.
Let's start by defining the inner content of the wildcard and define what we want our "anything" to be. "Anything" in terms of patterns is any number of elements (themselves containing "anything"), attributes, and text, in any order. This is a good candidate for a recursively named pattern:
<define name="anything">
  <zeroOrMore>
    <choice>
      <element>
        <anyName/>
        <ref name="anything"/>
      </element>
      <attribute>
        <anyName/>
      </attribute>
      <text/>
    </choice>
  </zeroOrMore>
</define>
or:
anything = ( element * { anything } | attribute * { text } | text )*
The only things new here are the anyName element (in the XML syntax) and the * operator (in the compact syntax), which replace the name of an element or attribute. This is your first example of a name class (a class of names). You'll see that there are many ways to restrict this name class. Now that we have a named pattern to express what "anything" is, we can use it to define what "foreign" elements mean:
<define name="foreign-elements">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName ns=""/>
          <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
          <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
        </except>
      </anyName>
      <ref name="anything"/>
    </element>
  </zeroOrMore>
</define>
or:
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace local = ""
namespace hr = "http://eric.van-der-vlist.com/ns/person"
...
foreign-elements = element * - (local:* | lib:* | hr:*) { anything }*
To achieve our purpose, we've introduced two new elements embedded in the anyName name class:
except (- in compact syntax) has the same meaning it does with enumerations.
nsName (xxx:* in compact syntax) means "any name from the specified namespace."
When using the XML syntax, nsName uses an ns attribute, while prefixes are employed when using the compact syntax. This usage of prefixes in the compact syntax implies that declarations are added to define prefixes not only for the lib (which is also the default namespace) and hr namespaces, but also for "no namespace" (here I have used the prefix local).
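To make the name class more concrete, here are a few instance fragments, given as an illustrative sketch rather than as excerpts from the example document. Each fragment is meant to be read on its own, with no other namespace declarations in scope. The sketch assumes that the dc prefix is bound to the Dublin Core namespace http://purl.org/dc/elements/1.1/ (the binding isn't shown in this section); the x prefix, its namespace URI, the element names, and the text values are hypothetical.

```xml
<!-- Matches foreign-elements: the name is in a namespace that is not excluded. -->
<dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">Example Press</dc:publisher>

<!-- Also matches: only the name of the outer element is restricted;
     its content is "anything", so a child without a namespace is fine. -->
<x:note xmlns:x="http://example.org/ns/extra">Some <b>mixed</b> content</x:note>

<!-- Does not match: the name has no namespace (excluded by nsName ns=""). -->
<note>Some text</note>

<!-- Does not match: the name is in the excluded library namespace. -->
<title xmlns="http://eric.van-der-vlist.com/ns/library">Some title</title>
```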
Note that name classes aren't considered patterns; instead, they are a specific set of elements with a specific purpose. A consequence of this statement is that name class definitions can't be placed within named patterns to be reused. Also, we have to repeat the same name class for both elements and attributes.
The same can be done to define foreign attributes:
<define name="foreign-attributes">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
          <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
          <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
</define>
or:
foreign-attributes = attribute * - (local:* | lib:* | hr:*) { text }*
For convenience, we can also define foreign nodes by combining foreign elements and attributes:
<define name="foreign-nodes">
  <zeroOrMore>
    <choice>
      <ref name="foreign-attributes"/>
      <ref name="foreign-elements"/>
    </choice>
  </zeroOrMore>
</define>
or:
foreign-nodes = ( foreign-attributes | foreign-elements )*
Now that we have defined the foreign-nodes wildcard, we can use it to make our schema more extensible. To allow foreign nodes where we've added the dc:publisher element (between the title and author elements), we can write the following, switching to a "flatter" style to make it more readable:
<element name="book">
  <attribute name="id"/>
  <attribute name="available"/>
  <ref name="isbn-element"/>
  <ref name="title-element"/>
  <ref name="foreign-nodes"/>
  <zeroOrMore>
    <ref name="author-element"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="character-element"/>
  </zeroOrMore>
</element>
or:
book-element = element book {
  attribute id { text },
  attribute available { text },
  isbn-element,
  title-element,
  foreign-nodes,
  author-element*,
  character-element*
}
This does the trick for the instance document shown earlier, but it wouldn't validate a document where foreign nodes were added in any other place—for instance, between the isbn and title elements. We could insert a reference to the foreign-nodes pattern between all the elements, but that method would be very verbose. If you think about it, what we really want to do is interleave these foreign nodes between the content defined for the book element. This is a good opportunity to use the interleave pattern:
<element name="book">
  <interleave>
    <group>
      <attribute name="id"/>
      <attribute name="available"/>
      <ref name="isbn-element"/>
      <ref name="title-element"/>
      <zeroOrMore>
        <ref name="author-element"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="character-element"/>
      </zeroOrMore>
    </group>
    <ref name="foreign-nodes"/>
  </interleave>
</element>
or:
element book {
  (
    attribute id { text },
    attribute available { text },
    isbn-element,
    title-element,
    author-element*,
    character-element*
  )
  & foreign-nodes
}
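As a rough illustration of what the interleaved version accepts, here is a sketch of an instance with foreign nodes scattered among the book's own content. It assumes that the isbn and title elements hold plain text (their definitions aren't repeated in this section) and omits the optional author and character elements. The dc prefix is assumed to be bound to the Dublin Core namespace; the x prefix, its namespace URI, and all the values are hypothetical placeholders.

```xml
<book xmlns="http://eric.van-der-vlist.com/ns/library"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:x="http://example.org/ns/extra"
      id="b1" available="true"
      x:shelf="A3">
  <!-- Foreign element before isbn -->
  <x:note>Some note</x:note>
  <isbn>1234567890</isbn>
  <!-- Foreign element between isbn and title -->
  <dc:publisher>Example Press</dc:publisher>
  <title>Some title</title>
  <!-- Foreign element after title -->
  <dc:date>2003</dc:date>
</book>
```

The book's own children still have to appear in the order given by the group; only the foreign nodes are free to appear anywhere among them.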
We may be tempted to allow foreign nodes everywhere in our document. However, while the extensibility gained is often acceptable in elements such as book that already have child elements, it's often considered a bad practice to do the same in elements that contain only text or data. An example would be the isbn element, where this practice would transform a text-content model into a mixed-content model. The reason this trick is considered bad practice comes from the weak support for mixed content models, as mentioned in Chapter 6, where I discussed the limitations of the mixed pattern. A consequence of allowing foreign elements in isbn elements would be that the content of this element could no longer be considered data. Neither datatypes nor restrictions could be applied.
Beyond this limitation of RELAX NG, applications would have to concatenate text nodes spread over the foreign elements. This concatenation can produce verbosity with tools such as XPath and XSLT.
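For instance, if foreign elements were allowed inside isbn, an instance such as the following sketch would become possible. The x prefix, its namespace URI, and the digits are hypothetical.

```xml
<isbn xmlns="http://eric.van-der-vlist.com/ns/library"
      xmlns:x="http://example.org/ns/extra">12345<x:marker/>67890</isbn>
```

Here the value is split into two text nodes, "12345" and "67890", separated by the x:marker element, and an application has to reassemble them before it can treat the content as a single ISBN.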
One compromise on this issue is to allow only foreign attributes in text-content models. That's not a problem here because our foreign-attributes pattern is ready for this purpose:
<element name="isbn">
  <ref name="foreign-attributes"/>
  <text/>
</element>
or:
element isbn { foreign-attributes, text }
This way, the isbn element is extensible but only with attributes from foreign namespaces.
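The following fragments sketch what this definition of isbn would and wouldn't accept. Each fragment is meant to be read on its own; the x prefix, its namespace URI, the attribute names, and the values are hypothetical.

```xml
<!-- Accepted: x:source is an attribute from a foreign namespace. -->
<isbn xmlns="http://eric.van-der-vlist.com/ns/library"
      xmlns:x="http://example.org/ns/extra"
      x:source="catalog">1234567890</isbn>

<!-- Rejected: an attribute without a namespace is not a foreign attribute. -->
<isbn xmlns="http://eric.van-der-vlist.com/ns/library"
      source="catalog">1234567890</isbn>

<!-- Rejected: child elements, foreign or not, are not part of this content model. -->
<isbn xmlns="http://eric.van-der-vlist.com/ns/library"
      xmlns:x="http://example.org/ns/extra">12345<x:marker/>67890</isbn>
```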
Although most of the time wildcard use is straightforward, there are some situations in which wildcards may lead to unexpected schema errors—especially with attributes, whose usage is subject to restrictions.
The first of the traps is related to the limitation that the definition of attributes can't be duplicated in a schema. The following definition is invalid:
element title { attribute xml:space { text }, attribute xml:space { text }, text } # this is invalid
This seems pretty sensible, since duplicate attributes are forbidden in the instance document. Unfortunately, the attribute xml:space is also allowed by our foreign-attributes named pattern. We will get the same error if we unthinkingly extend the definition of our title element and write:
element title { foreign-attributes, attribute xml:space { text }, text } # also invalid
To fix this error, we need to remove either the xml:space attribute from the name class of our foreign attributes or the explicit mention of xml:space in our definition. In the latter case, we just write:
element title { foreign-attributes, text }
Of course, this doesn't remove the possibility of including an xml:space attribute in the title element because this attribute is a foreign attribute as defined in our named pattern.
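The other fix, removing xml:space from the name class, might look like the following sketch in the XML syntax. The pattern name foreign-attributes-except-space is hypothetical, and the xml:space attribute is left with plain text content to mirror the definition above.

```xml
<define name="foreign-attributes-except-space">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
          <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
          <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          <!-- Also exclude xml:space so that it can be declared explicitly. -->
          <name ns="http://www.w3.org/XML/1998/namespace">space</name>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
</define>

<element name="title">
  <ref name="foreign-attributes-except-space"/>
  <!-- No longer overlaps with the wildcard above. -->
  <attribute name="xml:space"/>
  <text/>
</element>
```

With this variant, the explicit declaration stays in control of xml:space, while every other foreign attribute is still accepted.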
The second trap operates at a higher level but along the same lines. It's specific to the DTD compatibility ID feature. In Chapter 8, when you saw this datatype, it was used to define the book element:
<element name="book">
  <attribute name="id">
    <data datatypeLibrary="http://relaxng.org/ns/compatibility/datatypes/1.0" type="ID"/>
  </attribute>
  ...
</element>
or:
element book { attribute id { dtd:ID }, ... }
Once again, an error will be generated if we add our foreign nodes. This feature emulates the DTD in all its aspects, including the requirement that if an element book is defined with an id attribute of type ID, all other definitions of an id attribute hosted by a book element must have the same type ID. The problem here is that, hidden in the definition of anything, there can be a book element having an id attribute of type text. This situation results in an error.
There is a way to work around this problem. If we want to use the DTD type ID, we have to remove the problematic possibility from the named pattern anything. A fast solution would be to exclude our own namespaces from the name classes in anything. A better solution will be introduced using features shown in Section 12.3 of the next chapter.
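The fast solution could be sketched as follows. This is only one possible reading of it: it assumes that excluding our namespaces from the element name class is enough, since the clash comes from a book element hidden inside anything.

```xml
<define name="anything">
  <zeroOrMore>
    <choice>
      <element>
        <anyName>
          <except>
            <!-- Our own elements can no longer appear inside "anything". -->
            <nsName ns="http://eric.van-der-vlist.com/ns/library"/>
            <nsName ns="http://eric.van-der-vlist.com/ns/person"/>
          </except>
        </anyName>
        <ref name="anything"/>
      </element>
      <attribute>
        <anyName/>
      </attribute>
      <text/>
    </choice>
  </zeroOrMore>
</define>
```

The price of this shortcut is that elements from the lib and hr namespaces are no longer accepted inside foreign elements.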
In adding our foreign nodes, we have transformed:
<element name="book">
  <attribute name="id"/>
  <attribute name="available"/>
  <ref name="isbn-element"/>
  <ref name="title-element"/>
  <zeroOrMore>
    <ref name="author-element"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="character-element"/>
  </zeroOrMore>
</element>
or:
element book {
  attribute id { text },
  attribute available { text },
  isbn-element,
  title-element,
  author-element*,
  character-element*
}
into:
<element name="book">
  <interleave>
    <group>
      <attribute name="id"/>
      <attribute name="available"/>
      <ref name="isbn-element"/>
      <ref name="title-element"/>
      <zeroOrMore>
        <ref name="author-element"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="character-element"/>
      </zeroOrMore>
    </group>
    <ref name="foreign-nodes"/>
  </interleave>
</element>
or:
element book {
  (
    attribute id { text },
    attribute available { text },
    isbn-element,
    title-element,
    author-element*,
    character-element*
  )
  & foreign-nodes
}
This operation can instead be accomplished as a pattern combination using interleave if the content of the element book is described as a named pattern:
<define name="book-content">
  <attribute name="id"/>
  <attribute name="available"/>
  <ref name="isbn-element"/>
  <ref name="title-element"/>
  <zeroOrMore>
    <ref name="author-element"/>
  </zeroOrMore>
  <zeroOrMore>
    <ref name="character-element"/>
  </zeroOrMore>
</define>
or:
book-content =
  attribute id { text },
  attribute available { text },
  isbn-element,
  title-element,
  author-element*,
  character-element*
This pattern can then easily be extended as:
<define name="book-content" combine="interleave">
  <ref name="foreign-nodes"/>
</define>
or:
book-content &= foreign-nodes
and used to define the book element:
<element name="book">
  <ref name="book-content"/>
</element>
or:
element book { book-content }
This combination can be done in a single document, but the mechanism can also extend a vocabulary by merging a grammar containing only these combinations. The exact same approach also works for appending foreign attributes to the elements that have text-based content models.
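As a sketch of the merging approach, an extension grammar could include the base schema and contain nothing but the combinations. The file name library.rng and the isbn-content pattern are hypothetical; the sketch assumes that the base schema defines the start pattern, book-content, isbn-content (as plain text), foreign-nodes, and foreign-attributes, each without a combine attribute.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Hypothetical file name for the base schema. -->
  <include href="library.rng"/>

  <!-- Open the book element to foreign elements and attributes. -->
  <define name="book-content" combine="interleave">
    <ref name="foreign-nodes"/>
  </define>

  <!-- Open a text-only element to foreign attributes only. -->
  <define name="isbn-content" combine="interleave">
    <ref name="foreign-attributes"/>
  </define>
</grammar>
```

Because the definitions are placed after the include element rather than inside it, they are combined with the definitions from the base schema instead of replacing them.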
This text is released under the Free Software Foundation GFDL.