RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
You are welcome to use our annotation system to give your feedback.
With the exception of constraints expressed by the RELAX NG schema for RELAX NG and those which are part of the simplification itself, RELAX NG defines all the restrictions on schema structures as they apply to the simplified version. Most of them are obvious and easy to understand.
RELAX NG's constraints match the constraints on attributes defined by the XML 1.0 recommendation:
Attributes can't contain other attributes: attribute patterns can't have another attribute pattern in their descendants.
Attributes can't contain elements: attribute patterns can't have a ref pattern in their descendants.
Attributes can't be duplicated: an attribute may not be found in a oneOrMore pattern with a combination by group or interleave. Furthermore, if attribute patterns are combined in a group or interleave pattern, their name classes must not overlap: they cannot have any name which belongs to both name classes.
Attributes which have an infinite name class (anyName or nsName) must be enclosed in a oneOrMore pattern. In other words, we can't specify that we want to allow only one or a certain number of occurrences of these attributes. They can only have text as their model (in other words, data patterns are forbidden here).
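The rules on infinite name classes and on overlapping name classes can be illustrated with a minimal sketch. The element name foo and the attribute names id and ref are placeholders chosen for this illustration and do not come from the schemas discussed in this chapter. The rejected variants are commented out so that the fragment remains a valid schema:

```rnc
# Accepted: the wildcard attribute is wrapped in zeroOrMore, which
# simplification rewrites as a choice between empty and oneOrMore.
element foo { attribute * { text }* }

# Rejected: an attribute with an infinite name class outside oneOrMore.
# element foo { attribute * { text } }

# Rejected: the name "id" belongs to both name classes of the group.
# element foo { attribute id { text }, attribute (id | ref) { text } }
```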
Let's explore schemas which may look valid at a quick glance but are going to collide with these restrictions.
This schema states that any content model can be accepted in the bar attribute:
anything = (element * { anything } | attribute * { text } | text)*
start = element foo { attribute bar { anything }, text }
Unfortunately, it's translated into:
start = __foo-elt-id2602800
__-elt-id2602788 =
  element * {
    empty | ((__-elt-id2602788 | attribute * { text }) | text)+
  }
__foo-elt-id2602800 =
  element foo {
    attribute bar {
      empty | ((__-elt-id2602788 | attribute * { text }) | text)+
    },
    text
  }
The simplified content of the bar attribute now contains both a reference to a named pattern (which stands for an element in the simplified syntax) and an attribute pattern. Both are forbidden inside an attribute.
We must ensure that the anything defined for the content of the attribute is compatible with the content of attributes as defined by the XML specification. For instance:
anything = (text)
start = element foo { attribute bar { anything }, text }
which will be simplified into:
start = __foo-elt-id2602296
__foo-elt-id2602296 = element foo { attribute bar { text }, text }
This schema expresses the original intent and it is valid.
Let's say we want to extend the definition of our title element to have the same attributes and content model as the XHTML 2.0 span element. If we look into the RELAX NG module implementing the span element, we can see that its definition is:
span = element span { span.attlist, Inline.model }
We want to include this in the definition of the title element, which already includes an xml:lang attribute:
namespace x = "http://www.w3.org/2002/06/xhtml2"

start = book

include "xhtml-attribs-2.rnc" inherit = x
include "xhtml-inltext-2.rnc" inherit = x
include "xhtml-datatypes-2.rnc" inherit = x

book =
  element book {
    attribute id { text },
    attribute available { text },
    element isbn { text },
    element title {
      attribute xml:lang { xsd:language },
      span.attlist,
      Inline.model
    }
  }
Unfortunately, this is invalid because the xml:lang attribute is already included somewhere in the span.attlist pattern. During simplification the two declarations are combined, so the definition of the title element becomes:
__title-elt-id2641936 =
  element title {
    (attribute xml:lang { xsd:language },
     (((((((((empty | attribute id { xsd:ID }),
     (empty | attribute class { xsd:NMTOKENS })),
     (empty | attribute title { text })),
     (empty | attribute xml:lang { xsd:language })),
     (empty | attribute dir { (("ltr" | "rtl") | "lro") | "rlo" })),
     ((empty | attribute edit { (("inserted" | "deleted") | "changed") | "moved" }),
     (empty | attribute datetime { xsd:dateTime }))),
     ((((((((empty | attribute href { xsd:anyURI }),
     (empty | attribute cite { xsd:anyURI })),
     (empty | attribute target { xsd:NMTOKEN })),
     (empty | attribute rel { xsd:NMTOKENS })),
     (empty | attribute rev { xsd:NMTOKENS })),
     (empty | attribute accesskey { xsd:string { length = "1" } })),
     (empty | attribute navindex { xsd:nonNegativeInteger { pattern = "[0-9]+" minInclusive = "0" maxInclusive = "32767" } })),
     (empty | attribute base { xsd:anyURI }))),
     ((empty | attribute src { xsd:anyURI }),
     (empty | attribute type { text }))),
     ((((empty | attribute usemap { xsd:anyURI }),
     (empty | attribute ismap { "ismap" })),
     (empty | attribute shape { (("rect" | "circle") | "poly") | "default" })),
     (empty | attribute coords { text })))),
    (empty | (empty | (text |
     (((((((((((((abbr-id2635861 | cite-id2635889) | code-id2635918)
     | dfn-id2635947) | em-id2635975) | kbd-id2636004) | l-id2636032)
     | quote-id2636061) | samp-id2636090) | span-id2636118)
     | strong-id2636147) | sub-id2636176) | sup-id2636204)
     | var-id2636233)))+)
  }
To fix this, we need to remove the xml:lang from our original definition, creating:
namespace x = "http://www.w3.org/2002/06/xhtml2"

start = book

include "xhtml-attribs-2.rnc" inherit = x
include "xhtml-inltext-2.rnc" inherit = x
include "xhtml-datatypes-2.rnc" inherit = x

book =
  element book {
    attribute id { text },
    attribute available { text },
    element isbn { text },
    element title { span.attlist, Inline.model }
  }
Let's say we have the following schema, called book.rnc:
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace local = ""

start = book

book =
  element book {
    attribute id { text },
    attribute available { text },
    foreign-attributes,
    element isbn { text },
    element title { attribute xml:lang { xsd:language }, text }
  }

foreign-attributes = attribute * - (local:* | lib:*) { text }*
Although we have accepted foreign attributes, we should be more precise about the definition of some Dublin Core elements. We can extend our schema like this:
namespace dc = "http://purl.org/dc/elements/1.1/"

include "book.rnc"

book.content &= attribute dc:rights { text }?
Unfortunately, this is invalid because it gets simplified as:
book-id2604347 =
  element book {
    ((((attribute id { text }, attribute available { text }),
    (empty | attribute * - (lib:* | local:*) { text }+)),
    __isbn-elt-id2604556),
    __title-elt-id2604551)
    & attribute ns1:rights { text }
  }
The name dc:rights belongs to the name class "* - (lib:* | local:*)", so the two attribute patterns overlap. To fix this, we need to redefine the named pattern foreign-attributes so that it excludes the name dc:rights, or perhaps even the whole Dublin Core namespace:
default namespace lib = "http://eric.van-der-vlist.com/ns/library"
namespace dc = "http://purl.org/dc/elements/1.1/"
namespace local = ""

include "book.rnc" {
  foreign-attributes = attribute * - (local:* | lib:* | dc:*) { text }*
}

book.content &= attribute dc:rights { text }?
Lists work on text nodes by splitting them into tokens which are then handled as text nodes. It's therefore not possible to find elements or attributes in a list. Embedding lists within lists, or mixing text patterns with lists, would be confusing, and both are forbidden anyway:
List patterns cannot have any of these descendants: list, ref (because after simplification, access to elements is done using references to named patterns), attribute, or text. The interleave pattern is also forbidden as a descendant of list patterns because it would complicate implementations.
Let's say we'd like to define a price element as allowing a numeric followed by a token, such as:
<price>1 Euro</price>
or a token followed by a numeric:
<price>USD 1</price>
We might be tempted to write:
element price { list { xsd:decimal & xsd:token } }
But this would be invalid because interleave is forbidden in a list. To work around this limitation, we need to give all the possible combinations. It's easy with this small example, though it can rapidly grow out of control as more types are added. In this case, it just requires a bit of duplication:
element price {
  list { (xsd:decimal, xsd:token) | (xsd:token, xsd:decimal) }
}
Except patterns (except elements used in a data pattern) apply to individual pieces of data. An except element with a data parent can only contain data, value and choice elements.
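A minimal sketch of this restriction follows. The element names prices, currency and offset are hypothetical and serve only to illustrate what an except clause may contain; the rejected definition is commented out:

```rnc
start = element prices { currency, offset }

# Accepted: the except clause holds a choice between value patterns.
currency = element currency { xsd:token - ("USD" | "EUR") }

# Accepted: a data pattern is also allowed inside the except clause.
offset = element offset { xsd:integer - (xsd:integer { minInclusive = "0" }) }

# Rejected: a text pattern may not appear inside the except clause.
# currency = element currency { xsd:token - (text) }
```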
After simplification, the start pattern describes the list of possible root elements. You can thus find only combinations of choices between ref elements.
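As an illustration of the rule on the start pattern, here is a small sketch using hypothetical book and library definitions (not the book.rnc schema used elsewhere in this chapter). The rejected start patterns are commented out:

```rnc
# Accepted: start is a choice between references to elements.
start = book | library
book = element book { text }
library = element library { book+ }

# Rejected: start may not contain a group, a oneOrMore, or text.
# start = book, library
# start = book+
# start = text
```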
RELAX NG defines three different content models for an element:
Empty when the element has only attributes.
Simple when the element has only attributes and has been described using data, value or list patterns.
Complex in all other cases.
This is identical to the definitions given by W3C XML Schema, and similar to, though somewhat different from, the meaning of these terms in "plain" XML. Consider an element expressed as <foo>bar</foo>. RELAX NG sees it as complex content if its content has been described using a text pattern, and as simple content if its content has been described using data, value or list patterns. It's not enough for an element to contain only a text node for it to be called simple content; the element must also have been described with a data orientation. When the text pattern has been used instead, the element is considered document-oriented: a special case of mixed content in which no elements are included.
The restriction on the content model is expressed by saying that empty content can be grouped with any other content models but that simple and complex content models can't be grouped together (through group or interleave patterns). Simple and complex content models can only appear under the definition of the same element as alternatives. In other words, for each alternative, you need to choose if you are data- or text-oriented but can't mix both mindsets.
We have already mentioned the practical consequence of this restriction on mixed content models in Chapter 7: Constraining Text Values: it is not possible to use data patterns to specify constraints on the text nodes occurring in elements with mixed content.
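The grouping rule can be sketched as follows. The element names catalog, price, note, p and currency are placeholders for this illustration only, and the rejected definitions are commented out:

```rnc
start = element catalog { price, note }

# Accepted: attributes have an empty content model, so they can be
# grouped with simple content.
price = element price { attribute currency { text }, xsd:decimal }

# Accepted: simple and complex content offered as alternatives.
note = element note { xsd:date | element p { text }+ }

# Rejected: simple content (data) grouped with complex content (an element).
# price = element price { xsd:decimal, element currency { text } }

# Rejected: simple content (data) interleaved with a text pattern.
# price = element price { xsd:decimal & text }
```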
The last two limitations apply to interleave. The goal of these limitations is to facilitate the implementation of the interleave feature, which other schema languages lack, largely because it is seen as difficult to implement. These two limitations are intended to reduce the number of combinations that RELAX NG processors need to explore to support interleave:
Elements combined through interleave must not have overlapping name classes. We have already seen a similar restriction with attributes, which are always combined through interleave.
There must be at most one text pattern in each set of patterns combined by interleave.
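A minimal sketch of these two limitations, using hypothetical p, em and strong elements, might look like this (the rejected definitions are commented out):

```rnc
# Accepted: the element name classes are disjoint and the interleave
# contains a single text pattern; the text patterns inside em and
# strong belong to other element definitions.
start = element p { text & element em { text }* & element strong { text }* }

# Rejected: the name "em" belongs to both element name classes.
# start = element p { element em { text }* & element (em | strong) { text }* }

# Rejected: a text pattern occurs on both sides of the interleave.
# start = element p { text & (text, element em { text }) }
```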
These limitations don't affect the expressive power of RELAX NG (the set of content models which can be written with RELAX NG). Even if we run into a limitation from time to time, schemas can always be rewritten to work around them. Sometimes, though, they can be a nuisance when combining existing patterns with mixed content models.
The limitations are needed to support the different algorithms currently used to implement RELAX NG. James Clark thinks that they could be removed in future versions of RELAX NG: "Better algorithms may be developed that will allow this restriction to be removed in future versions."
We may have the following schema, book.rnc, to describe our books:
start = book
book = element book { book.content }
book.content =
  attribute id { text },
  attribute available { text },
  element isbn { text },
  title
title = element title { title.attributes, title.content }
title.attributes = attribute xml:lang { xsd:language }
title.content = text
To add the XHTML Inline.model to title.content we might be tempted to write:
include "book.rnc"
include "xhtml-attribs-2.rnc"
include "xhtml-inltext-2.rnc"
include "xhtml-datatypes-2.rnc"

title.content &= Inline.model
Unfortunately, Inline.model already contains a text pattern and gets simplified to:
title-id2635741 =
  element title {
    attribute lang { xsd:language },
    (text & (empty | (empty | (text |
     (((((((((((((abbr-id2636549 | cite-id2636578) | code-id2636607)
     | dfn-id2636636) | em-id2636664) | kbd-id2636693) | l-id2636721)
     | quote-id2636750) | samp-id2636778) | span-id2636807)
     | strong-id2636836) | sub-id2636865) | sup-id2636893)
     | var-id2636922)))+))
  }
We now have two text patterns within the same interleave. To fix this problem, we need to replace our combination with a redefinition of title.content:
include "book.rnc" {
  title.content = Inline.model
}
include "xhtml-attribs-2.rnc"
include "xhtml-inltext-2.rnc"
include "xhtml-datatypes-2.rnc"
There is no loss in expressive power (we are able to describe what we wanted to describe), but there is a loss in modularity. Changes made to title.content in "book.rnc" would now have to be manually added to our derived schema.
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.