RELAX NG, by Eric van der Vlist, is published by O'Reilly & Associates (ISBN: 0596004214).
With the exception of the constraints expressed by the RELAX NG schema for RELAX NG and those that are part of the simplification itself, RELAX NG defines all restrictions on schema structures as they apply to the simplified form of the schema. Most of them are obvious and easy to understand.
RELAX NG's constraints on attributes match those defined by the XML 1.0 recommendation:
Attributes can't contain other attributes; attribute patterns can't have another attribute pattern among their descendants.
Attributes can't contain elements; attribute patterns can't have a ref pattern among their descendants.
Attributes can't be duplicated; an attribute pattern can't appear within a group or interleave pattern that is itself inside a oneOrMore pattern. Furthermore, if attribute patterns are combined in a group or interleave pattern, their name classes must not overlap: no name may belong to both name classes.
Attributes that have an infinite name class (anyName or nsName) must be enclosed in a oneOrMore pattern. In other words, you can't specify exactly one, or any fixed number of, occurrences of these attributes. Their content can only be text (in other words, data patterns are forbidden here).
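The following schema is a small sketch of these rules. All element and attribute names in it are hypothetical. The schema itself should be valid, and the commented lines show variants that a RELAX NG validator is expected to reject.

```rnc
# Hypothetical example: all names are made up for illustration.
start = element note {
  # Attribute content is text or data only.
  attribute lang { xsd:language },
  # Infinite name class: the trailing * (zeroOrMore) simplifies to a
  # oneOrMore in a choice with empty, and the content is plain text.
  # lang and id are excluded so that no name classes overlap.
  attribute * - (lang | id) { text }*,
  attribute id { text }?,
  text
}

# Each of the following would be rejected:
#   attribute a { attribute b { text } }          # attribute inside attribute
#   attribute a { element b { text } }            # element inside attribute
#   (attribute a { text }, element b { text })+   # attribute in a group under oneOrMore
#   attribute * { text }, attribute a { text }    # overlapping name classes
#   attribute * { text }                          # infinite name class without oneOrMore
```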
Let's explore some schemas that may look valid at first glance but violate these restrictions.
This schema states that any content model can be accepted in the bar attribute:
anything = (element * { anything } | attribute * { text } | text)*
start = element foo { attribute bar { anything }, text }
Unfortunately, it's translated into:
start = __foo-elt-id2602800
__-elt-id2602788 =
  element * { empty | ((__-elt-id2602788 | attribute * { text }) | text)+ }
__foo-elt-id2602800 =
  element foo {
    attribute bar { empty | ((__-elt-id2602788 | attribute * { text }) | text)+ },
    text
  }
Here, the content of the bar attribute includes both a reference to a named pattern (which, in the simplified syntax, means an element) and an attribute. Both are forbidden inside an attribute.
You must ensure that the anything pattern used for the content of the attribute is compatible with the content of attributes as defined by the XML specification. For instance:
anything = (text)
start = element foo { attribute bar { anything }, text }
is simplified into:
start = __foo-elt-id2602296
__foo-elt-id2602296 = element foo { attribute bar { text }, text }
This schema expresses the original intent and is valid.
Let's say I want to extend the definition of the title element so that it has the same attributes and content model as the XHTML 2.0 span element. If I look into the RELAX NG module implementing the span element, I can see that its definition is:
span = element span { span.attlist, Inline.model }
I want to include this in the definition of the title element, which already includes an xml:lang attribute:
namespace x = "http://www.w3.org/2002/06/xhtml2"
start = book
include "xhtml-attribs-2.rnc" inherit = x
include "xhtml-inltext-2.rnc" inherit = x
include "xhtml-datatypes-2.rnc" inherit = x
book =
  element book {
    attribute id { text },
    attribute available { text },
    element isbn { text },
    element title { attribute xml:lang { xsd:language }, span.attlist, Inline.model }
  }
Unfortunately, this snippet is invalid because the xml:lang attribute is already included in the span.attlist pattern. The two attribute patterns are grouped during simplification, which gives the following definition of the title element, where xml:lang appears twice:
__title-elt-id2641936 = element title {
  (attribute xml:lang { xsd:language },
   (((((((((empty | attribute id { xsd:ID }),
   (empty | attribute class { xsd:NMTOKENS })),
   (empty | attribute title { text })),
   (empty | attribute xml:lang { xsd:language })),
   (empty | attribute dir { (("ltr" | "rtl") | "lro") | "rlo" })),
   ((empty | attribute edit { (("inserted" | "deleted") | "changed") | "moved" }),
   (empty | attribute datetime { xsd:dateTime }))),
   ((((((((empty | attribute href { xsd:anyURI }),
   (empty | attribute cite { xsd:anyURI })),
   (empty | attribute target { xsd:NMTOKEN })),
   (empty | attribute rel { xsd:NMTOKENS })),
   (empty | attribute rev { xsd:NMTOKENS })),
   (empty | attribute accesskey { xsd:string { length = "1" } })),
   (empty | attribute navindex { xsd:nonNegativeInteger { pattern = "[0-9]+" minInclusive = "0" maxInclusive = "32767" } })),
   (empty | attribute base { xsd:anyURI }))),
   ((empty | attribute src { xsd:anyURI }),
   (empty | attribute type { text }))),
   ((((empty | attribute usemap { xsd:anyURI }),
   (empty | attribute ismap { "ismap" })),
   (empty | attribute shape { (("rect" | "circle") | "poly") | "default" })),
   (empty | attribute coords { text })))),
  (empty | (empty | (text | (((((((((((((abbr-id2635861 | cite-id2635889) | code-id2635918)
   | dfn-id2635947) | em-id2635975) | kbd-id2636004) | l-id2636032) | quote-id2636061)
   | samp-id2636090) | span-id2636118) | strong-id2636147) | sub-id2636176)
   | sup-id2636204) | var-id2636233)))+)
}
To fix this, I need to remove the xml:lang attribute from my definition of the title element, which gives:
namespace x = "http://www.w3.org/2002/06/xhtml2"
start = book
include "xhtml-attribs-2.rnc" inherit = x
include "xhtml-inltext-2.rnc" inherit = x
include "xhtml-datatypes-2.rnc" inherit = x
book =
  element book {
    attribute id { text },
    attribute available { text },
    element isbn { text },
    element title { span.attlist, Inline.model }
  }
Let's say that I have the following schema, called book.rnc:
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace local = ""
start = book
book = element book { book.content }
book.content =
  attribute id { text },
  attribute available { text },
  foreign-attributes,
  element isbn { text },
  element title { attribute xml:lang { xsd:language }, text }
foreign-attributes = attribute * - (local:* | lib:*) { text }*
Although I have accepted foreign attributes, I'd like to be more precise in the definition of some attributes from the Dublin Core element set. I can extend the schema like this:
namespace dc = "http://purl.org/dc/elements/1.1/"
include "book.rnc"
book.content &= attribute dc:rights { text }?
Unfortunately, this is invalid, because it gets simplified to:
book-id2604347 =
  element book {
    ((((attribute id { text }, attribute available { text }),
       (empty | attribute * - (lib:* | local:*) { text }+)),
      __isbn-elt-id2604556),
     __title-elt-id2604551)
    & attribute ns1:rights { text }
  }
The dc:rights attribute belongs to the name class * - (lib:* | local:*), so the two attribute patterns combined by interleave have overlapping name classes. To fix this, I need to redefine the foreign-attributes named pattern to exclude the name dc:rights, or perhaps even the whole Dublin Core namespace:
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace dc = "http://purl.org/dc/elements/1.1/"
namespace local = ""
include "book.rnc" {
  foreign-attributes = attribute * - (local:* | lib:* | dc:*) { text }*
}
book.content &= attribute dc:rights { text }?
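If only dc:rights needs a precise definition, a narrower variant of the same redefinition excludes just that one name. The other Dublin Core attributes are then still accepted as foreign attributes. This sketch assumes the same book.rnc as above.

```rnc
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace dc = "http://purl.org/dc/elements/1.1/"
namespace local = ""

include "book.rnc" {
  # Exclude only dc:rights, so the wildcard and the explicit
  # dc:rights attribute no longer have overlapping name classes.
  foreign-attributes = attribute * - (local:* | lib:* | dc:rights) { text }*
}

book.content &= attribute dc:rights { text }?
```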
Lists work on text nodes by splitting them into tokens, which are then handled as text nodes in their own right. It's therefore not possible to find elements or attributes in a list. Nesting lists, or mixing text patterns with the tokens of a list, would be confusing and is forbidden anyway. List patterns can't have any of these descendants: list, ref (because after simplification, elements are accessed through references to named patterns), attribute, or text. The interleave pattern is also forbidden as a descendant of list patterns because it complicates implementations.
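Here is a sketch of what a list may and may not contain. The element names are hypothetical. The schema itself should be valid, and the commented patterns would be rejected.

```rnc
# Hypothetical examples; element names are made up.
start = element measures {
  # Valid: one or more decimals, e.g. <sizes>1.5 2 3.25</sizes>
  element sizes { list { xsd:decimal+ } },
  # Valid: group, choice, data, and value are allowed inside a list,
  # e.g. <length>cm 12</length>
  element length { list { ("cm" | "in"), xsd:decimal } }
}

# Each of these would be rejected:
#   list { list { xsd:token+ } }       # list inside list
#   list { text }                      # text inside list
#   list { attribute a { text } }      # attribute inside list
#   list { element b { text } }        # element (a ref after simplification)
```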
I'd like to define a price element that allows a decimal number followed by a token, such as:
<price>1 Euro</price>
or a token followed by a numeric:
<price>USD 1</price>
I might be tempted to write:
element price { list { xsd:decimal & xsd:token } }
However, this is invalid because interleave is forbidden in a list. To work around this limitation, I need to give all the possible combinations. It's easy with this small example, though it can rapidly grow out of control as more types are added. In this case, it just requires a bit of duplication:
element price { list { (xsd:decimal, xsd:token) | (xsd:token, xsd:decimal) } }
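To see how quickly this grows, suppose (hypothetically) that the price also carried a date, as in <price>1 Euro 2003-01-01</price>. Three items in any order have 3! = 6 possible orders, and each one must be spelled out. In general, n items need n! alternatives.

```rnc
# Hypothetical: a decimal, a currency token and a date, in any order.
# list { xsd:decimal & xsd:token & xsd:date } is forbidden,
# so every permutation is written explicitly.
element price {
  list {
      (xsd:decimal, xsd:token, xsd:date)
    | (xsd:decimal, xsd:date, xsd:token)
    | (xsd:token, xsd:decimal, xsd:date)
    | (xsd:token, xsd:date, xsd:decimal)
    | (xsd:date, xsd:decimal, xsd:token)
    | (xsd:date, xsd:token, xsd:decimal)
  }
}
```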
Except patterns (except elements used in a data pattern) apply to individual pieces of data. An except element with a data parent can contain only data, value, and choice elements.
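As an illustration, an except clause on a data pattern is written with - in the compact syntax. In this sketch it excludes a few values from a datatype. The element name and the excluded values are hypothetical.

```rnc
# Hypothetical: a token that may not be one of two placeholder values.
element isbn {
  xsd:token - ("unknown" | "n/a")
}

# Allowed inside the except clause: data, value, and choice, e.g.
#   xsd:token - (xsd:integer | "n/a")
# Rejected: anything else, e.g.
#   xsd:token - list { xsd:integer+ }
#   xsd:token - text
```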
After simplification, the start pattern describes the list of possible root elements. It can therefore contain only ref elements, either alone or combined through choice elements.
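In the following sketch, the named patterns book and library are hypothetical. A start pattern offering a choice of root elements is valid. The commented alternatives would leave something other than a choice of references in start after simplification.

```rnc
# Hypothetical: documents may have either a book or a library root element.
start = book | library

book = element book { element title { text } }
library = element library { book* }

# Rejected, because start may only contain element references combined by choice:
#   start = book, library            # group
#   start = text                     # text
#   start = attribute id { text }    # attribute
```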
RELAX NG defines three different content models for an element:
Empty, when the element has only attributes
Simple, when, apart from its attributes, the element's content has been described using data, value, or list patterns
Complex, in all other cases
This set is identical to the definitions provided by W3C XML Schema, and similar to, but somewhat different from, the usual meaning of these terms in plain XML. Consider an element expressed as <foo>bar</foo>. RELAX NG sees it as complex content if its content has been described using a text pattern, and as simple content if it has been described using a data, value, or list pattern. Containing only a text node is not enough for an element to have simple content; the element must also have been described with a data orientation. When that isn't the case and the text pattern has been used, the element is considered document-oriented: a special case of mixed content in which no elements are included.
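The two hypothetical definitions below both accept the instance <foo>bar</foo>. The comments note which content model RELAX NG assigns in each case. Offering both as alternatives for start is done here only to keep the sketch in one schema.

```rnc
# Complex content: described with a text pattern
# (a special case of mixed content with no child elements).
foo.text = element foo { text }

# Simple content: described with a data pattern.
foo.data = element foo { xsd:token }

start = foo.text | foo.data
```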
The restriction on the content model is expressed by saying that empty content can be grouped with any other content models but that simple and complex content models can't be grouped together (through group or interleave patterns). Simple and complex content models can appear under the definition of the same element only as alternatives. In other words, for each alternative, you need to choose between being data- or text-oriented, and you can't mix both mindsets.
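The following sketch uses a hypothetical quantity element. Attributes (empty content) are grouped with the rest of the content, and simple and complex content appear only as alternatives. The commented definitions group them instead and would be rejected.

```rnc
# Hypothetical quantity element.
start = element quantity {
  # Empty content (attributes) can be grouped with anything.
  attribute unit { xsd:token },
  # Simple and complex content may only be alternatives.
  (xsd:decimal
   | element range { element min { xsd:decimal }, element max { xsd:decimal } })
}

# Rejected: simple and complex content grouped or interleaved.
#   element quantity { xsd:decimal, element note { text } }
#   element quantity { xsd:decimal & element note { text } }
```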
I mentioned the practical consequence of this restriction on mixed content models in Chapter 7: it's not possible to use data patterns to specify constraints on the text nodes occurring in elements with mixed content.
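A minimal sketch with a hypothetical para element makes the consequence concrete. Mixed content must use text, and replacing text with a data pattern to constrain the character data is rejected.

```rnc
# Hypothetical paragraph element with mixed content.
# Valid: mixed { p } is shorthand for text interleaved with p.
start = element para { mixed { element em { text }* } }

# Rejected: a data pattern (simple content) interleaved with
# elements (complex content).
#   element para { xsd:token & element em { text }* }
```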
The last two limitations apply to interleave. The goal of these limitations is to facilitate the implementation of the interleave feature, which other schema languages lack largely because it is seen as difficult to implement. These two limitations are intended to reduce the number of combinations RELAX NG processors need to explore to support interleave:
The name classes of elements combined through interleave must not overlap. You have already seen a similar restriction on attributes, which are effectively always combined through interleave since their order doesn't matter.
There must be at most one text pattern in each set of patterns combined by interleave.
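The following sketch uses hypothetical element names. It shows an interleave that respects both limitations, followed in comments by one violation of each and a rewrite that avoids the name class overlap.

```rnc
# Hypothetical examples of the two interleave limitations.
start = element book {
  # Valid: the name classes title and author do not overlap.
  element title { text } & element author { text }+
}

# Rejected: "title" belongs to both name classes.
#   element title { text } & element * { text }
# Rejected: two text patterns in the same interleave.
#   (text, element b { text }) & text
# Accepted alternative: exclude the overlapping name explicitly.
#   element title { text } & element * - title { text }*
```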
These limitations don't affect the expressive power of RELAX NG (the set of content models that can be written with RELAX NG). Even if you run into a limitation from time to time, schemas can always be rewritten to work around them. Sometimes, though, they can be a nuisance when combining existing patterns with mixed content models.
The limitations are needed to support the different algorithms currently used to implement RELAX NG. James Clark thinks that they can be removed in future versions of RELAX NG: "Better algorithms may be developed that will allow this restriction to be removed in future versions."
You may have the following schema, book.rnc, to describe your books:
start = book
book = element book { book.content }
book.content =
  attribute id { text },
  attribute available { text },
  element isbn { text },
  title
title = element title { title.attributes, title.content }
title.attributes = attribute xml:lang { xsd:language }
title.content = text
To add the XHTML Inline.model to title.content, you might be tempted to write:
include "book.rnc"
include "xhtml-attribs-2.rnc"
include "xhtml-inltext-2.rnc"
include "xhtml-datatypes-2.rnc"
title.content &= Inline.model
Unfortunately, Inline.model already contains a text pattern and gets simplified to:
title-id2635741 =
  element title {
    attribute lang { xsd:language },
    (text
     & (empty | (empty | (text | (((((((((((((abbr-id2636549 | cite-id2636578) | code-id2636607)
        | dfn-id2636636) | em-id2636664) | kbd-id2636693) | l-id2636721) | quote-id2636750)
        | samp-id2636778) | span-id2636807) | strong-id2636836) | sub-id2636865)
        | sup-id2636893) | var-id2636922)))+))
  }
Here, two text patterns are combined in the same interleave: the one from title.content and the one inside Inline.model. To fix this problem, I need to replace the combination with a redefinition of title.content:
include "book.rnc" {
  title.content = Inline.model
}
include "xhtml-attribs-2.rnc"
include "xhtml-inltext-2.rnc"
include "xhtml-datatypes-2.rnc"
There is no loss in expressive power (I am able to describe what I wanted to describe), but there is a loss in modularity. Changes made to title.content in book.rnc would now have to be manually added to the derived schema.
This text is released under the Free Software Foundation GFDL.