RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

You are welcome to use our annotation system to give your feedback.


Restrictions

Apart from the constraints expressed by the RELAX NG schema for RELAX NG and those that are part of the simplification itself, RELAX NG defines all its restrictions on schema structures in terms of the simplified syntax. Most of them are obvious and easy to understand.

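RELAX NG's constraints match the constraints on attributes defined by the XML 1.0 recommendation: an attribute can contain neither elements nor other attributes, and the same attribute can't appear twice on an element.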

Let's explore some schemas that may look valid at first glance but collide with these restrictions.

This schema states that any content model can be accepted in the bar attribute:

 anything =
   (element * { anything }
    | attribute * { text }
    | text)*
 start =
   element foo {
     attribute bar { anything },
     text
   }

Unfortunately, it is simplified into:

 start = __foo-elt-id2602800
 __-elt-id2602788 =
   element * {
     empty
     | ((__-elt-id2602788
         | attribute * { text })
        | text)+
   }
 __foo-elt-id2602800 =
   element foo {
     attribute bar {
       empty
       | ((__-elt-id2602788
           | attribute * { text })
          | text)+
     },
     text
   }

The content of the bar attribute now allows a reference to a named pattern (which, in the simplified syntax, always leads to an element) and an attribute. Both are forbidden inside an attribute.
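A processor can detect these two errors by walking the simplified pattern. The following Python sketch is only a toy model, not the way any particular RELAX NG implementation works. It assumes a hypothetical representation of simplified patterns as nested tuples, such as ("attribute", "bar", ("text",)), and the helper name is invented for this sketch. It reports every ref or attribute pattern found below an attribute, which correspond to the prohibited paths attribute//ref and attribute//attribute.

```python
# Toy model: simplified RELAX NG patterns as nested tuples (a hypothetical
# representation used only for this sketch). In the simplified syntax,
# every element is reached through a ("ref", name) pattern.

def forbidden_in_attribute(pattern, inside_attribute=False):
    """Return the ref and attribute patterns that occur below an attribute."""
    kind = pattern[0]
    found = []
    if inside_attribute and kind in ("ref", "attribute"):
        found.append(pattern)
    if kind == "attribute":              # ("attribute", name, content)
        found += forbidden_in_attribute(pattern[2], True)
    elif kind in ("group", "interleave", "choice"):
        for child in pattern[1:]:
            found += forbidden_in_attribute(child, inside_attribute)
    elif kind == "oneOrMore":            # ("oneOrMore", child)
        found += forbidden_in_attribute(pattern[1], inside_attribute)
    # text, empty, data and value have no children; refs are not followed.
    return found


# The bar attribute of the first schema, after simplification:
# empty | ((ref | attribute * { text }) | text)+
bad_bar = ("attribute", "bar",
           ("choice", ("empty",),
            ("oneOrMore",
             ("choice",
              ("choice", ("ref", "__-elt-id2602788"),
                         ("attribute", "*", ("text",))),
              ("text",)))))

# The bar attribute of the corrected schema: attribute bar { text }
good_bar = ("attribute", "bar", ("text",))

for p in forbidden_in_attribute(bad_bar):
    print("forbidden inside an attribute:", p[0])
# prints:
# forbidden inside an attribute: ref
# forbidden inside an attribute: attribute
print(forbidden_in_attribute(good_bar))  # prints []
```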

We must ensure that the anything pattern used for the content of the attribute is compatible with the content of attributes as defined by the XML specification, which means that it may only describe text. For instance:

 anything =
   (text)
 start =
   element foo {
     attribute bar { anything },
     text
   }

which will be simplified into:

 start = __foo-elt-id2602296
 __foo-elt-id2602296 =
   element foo {
     attribute bar { text },
     text
   }

This schema expresses the original intent and it is valid.

Let's say we want to extend the definition of our title element to have the same attributes and content model as the XHTML 2.0 span element. If we look into the RELAX NG module implementing the span element, we can see that its definition is:

  span = element span { span.attlist, Inline.model }

We want to include this in the definition of the title element, which already includes an xml:lang attribute:

 namespace x = "http://www.w3.org/2002/06/xhtml2"
 
 start = book
 include "xhtml-attribs-2.rnc" inherit = x
 include "xhtml-inltext-2.rnc" inherit = x
 include "xhtml-datatypes-2.rnc" inherit = x
 book =
   element book {
     attribute id { text },
     attribute available { text },
     element isbn { text },
     element title {
       attribute xml:lang { xsd:language },
       span.attlist,
       Inline.model
     }
   }

Unfortunately, this is invalid because an xml:lang attribute is already defined in the span.attlist pattern. Both definitions are combined during simplification, and the definition of the title element becomes:

 __title-elt-id2641936 =
  element title {
    (attribute xml:lang { xsd:language },
     (((((((((empty
              | attribute id { xsd:ID }),
             (empty
              | attribute class { xsd:NMTOKENS })),
            (empty
             | attribute title { text })),
           (empty
            | attribute xml:lang { xsd:language })),
          (empty
           | attribute dir {
               (("ltr" | "rtl") | "lro")
               | "rlo"
             })),
         ((empty
           | attribute edit {
               (("inserted" | "deleted") | "changed")
               | "moved"
             }),
          (empty
           | attribute datetime { xsd:dateTime }))),
        ((((((((empty
                | attribute href { xsd:anyURI }),
               (empty
                | attribute cite { xsd:anyURI })),
              (empty
               | attribute target { xsd:NMTOKEN })),
             (empty
              | attribute rel { xsd:NMTOKENS })),
            (empty
             | attribute rev { xsd:NMTOKENS })),
           (empty
            | attribute accesskey {
                xsd:string { length = "1" }
              })),
          (empty
           | attribute navindex {
               xsd:nonNegativeInteger {
                 pattern = "0-9+"
                 minInclusive = "0"
                 maxInclusive = "32767"
               }
             })),
         (empty
          | attribute base { xsd:anyURI }))),
       ((empty
         | attribute src { xsd:anyURI }),
        (empty
         | attribute type { text }))),
      ((((empty
          | attribute usemap { xsd:anyURI }),
         (empty
          | attribute ismap { "ismap" })),
        (empty
         | attribute shape {
             (("rect" | "circle") | "poly")
             | "default"
           })),
       (empty
        | attribute coords { text })))),
    (empty
     | (empty
        | (text
           | (((((((((((((abbr-id2635861 | cite-id2635889)
                         | code-id2635918)
                        | dfn-id2635947)
                       | em-id2635975)
                      | kbd-id2636004)
                     | l-id2636032)
                    | quote-id2636061)
                   | samp-id2636090)
                  | span-id2636118)
                 | strong-id2636147)
                | sub-id2636176)
               | sup-id2636204)
              | var-id2636233)))+)
  }

To fix this, we need to remove the xml:lang attribute from our own definition of the title element, which gives:

 namespace x = "http://www.w3.org/2002/06/xhtml2"
 
 start = book
 include "xhtml-attribs-2.rnc" inherit = x
 include "xhtml-inltext-2.rnc" inherit = x
 include "xhtml-datatypes-2.rnc" inherit = x
 book =
   element book {
     attribute id { text },
     attribute available { text },
     element isbn { text },
     element title {
       span.attlist,
       Inline.model
     }
   }
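Detecting a duplicate like this one is mechanical. In each group or interleave, the attribute names on one side must not overlap those on the other side, whereas a choice may offer the same attribute in several branches. The Python sketch below models this with the same kind of hypothetical tuple representation and invented helper names. It assumes that every attribute has a single plain name; name classes such as * are left out. The span.attlist shown here is shortened to three of its attributes.

```python
# Toy model: simplified patterns as nested tuples (hypothetical
# representation); each attribute is ("attribute", "plain-name", content).

def attribute_names(pattern):
    """Collect the names of all attribute patterns occurring in pattern."""
    kind = pattern[0]
    if kind == "attribute":
        return {pattern[1]}
    if kind in ("group", "interleave", "choice"):
        return attribute_names(pattern[1]) | attribute_names(pattern[2])
    if kind == "oneOrMore":
        return attribute_names(pattern[1])
    return set()  # text, empty, data, value and ref carry no attributes here


def duplicate_attributes(pattern):
    """Return attribute names found on both sides of a group or interleave."""
    kind = pattern[0]
    duplicates = set()
    if kind in ("group", "interleave"):
        duplicates |= attribute_names(pattern[1]) & attribute_names(pattern[2])
    if kind in ("group", "interleave", "choice"):
        duplicates |= duplicate_attributes(pattern[1])
        duplicates |= duplicate_attributes(pattern[2])
    elif kind == "oneOrMore":
        duplicates |= duplicate_attributes(pattern[1])
    return duplicates


def optional_attribute(name):
    """empty | attribute name { text }, as produced by the simplification."""
    return ("choice", ("empty",), ("attribute", name, ("text",)))


# A shortened span.attlist (only three of its attributes).
span_attlist = ("group",
                ("group", optional_attribute("id"), optional_attribute("class")),
                optional_attribute("xml:lang"))

# First attempt: attribute xml:lang { xsd:language }, span.attlist
title_attributes = ("group",
                    ("attribute", "xml:lang", ("data", "language")),
                    span_attlist)

print(duplicate_attributes(title_attributes))  # prints {'xml:lang'}
print(duplicate_attributes(span_attlist))      # prints set() (the fixed version)
```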

Let's say we have the following schema, called book.rnc:

 default namespace lib = "http://eric.van-der-vlist.com/ns/library"
 namespace local = ""
 
 start = book
        
 book =
   element book {
     attribute id { text },
     attribute available { text },
     foreign-attributes,
     element isbn { text },
     element title {
       attribute xml:lang { xsd:language },
       text
     }
   }
        
   foreign-attributes = attribute * - (local:* | lib:* ) { text }*

Although we accept any foreign attribute, we may want to be more precise about the definition of some Dublin Core elements. We can extend our schema like this:

 namespace dc="http://purl.org/dc/elements/1.1/"
        
 include "book.rnc"
 
 book.content &= attribute dc:rights { text } ?

Unfortunately, this is invalid because it gets simplified as:

 book-id2604347 =
   element book {
     ((((attribute id { text },
         attribute available { text }),
        (empty
         | attribute * - (lib:* | local:*) { text }+)),
       __isbn-elt-id2604556),
      __title-elt-id2604551)
     & attribute ns1:rights { text }
   }
   

The attribute dc:rights belongs to the name class "* - (lib:* | local:*)" used by foreign-attributes, so the two attribute patterns overlap. To fix this, we need to redefine the named pattern foreign-attributes to exclude the name dc:rights, or perhaps even the whole Dublin Core namespace:

 default namespace lib = "http://eric.van-der-vlist.com/ns/library"
 namespace dc="http://purl.org/dc/elements/1.1/"
 namespace local = ""
 
 include "book.rnc" {
 	foreign-attributes = attribute * - (local:* | lib:* | dc:* ) { text }*
 }
 
 book.content &= attribute dc:rights { text } ?
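When one of the two attribute patterns has a single name, as dc:rights does, the overlap test reduces to asking whether the other name class matches that name. The Python sketch below is a toy model with its own assumptions. Name classes are hypothetical nested tuples, and qualified names are (namespace URI, local name) pairs, with the empty string standing for "no namespace". The helper name contains is invented for this sketch. It shows why the original foreign-attributes pattern clashes with dc:rights and why the redefinition does not.

```python
# Toy model of RELAX NG name classes as nested tuples (hypothetical):
#   ("name", ns, local)        a single qualified name
#   ("anyName", except_nc)     * or * - (...); except_nc may be None
#   ("nsName", ns, except_nc)  prefix:* or prefix:* - (...)
#   ("choice", nc1, nc2)       nc1 | nc2

def contains(nc, name):
    """Return True if the name class nc matches name = (namespace, local)."""
    ns, local = name
    kind = nc[0]
    if kind == "name":
        return nc[1] == ns and nc[2] == local
    if kind == "anyName":
        except_nc = nc[1]
        return except_nc is None or not contains(except_nc, name)
    if kind == "nsName":
        except_nc = nc[2]
        return nc[1] == ns and (except_nc is None or not contains(except_nc, name))
    if kind == "choice":
        return contains(nc[1], name) or contains(nc[2], name)
    raise ValueError(f"unknown name class: {kind!r}")


LIB = "http://eric.van-der-vlist.com/ns/library"
DC = "http://purl.org/dc/elements/1.1/"
LOCAL = ""  # unprefixed attributes are in no namespace

# * - (local:* | lib:*), from book.rnc
foreign = ("anyName", ("choice", ("nsName", LOCAL, None), ("nsName", LIB, None)))

# * - (local:* | lib:* | dc:*), from the redefinition
foreign_fixed = ("anyName",
                 ("choice",
                  ("choice", ("nsName", LOCAL, None), ("nsName", LIB, None)),
                  ("nsName", DC, None)))

dc_rights = (DC, "rights")
print(contains(foreign, dc_rights))        # True: the two attributes overlap
print(contains(foreign_fixed, dc_rights))  # False: no overlap any more
```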

Lists work on text nodes by splitting them into whitespace-separated tokens, which are then handled as text nodes. It's therefore not possible to find elements or attributes in a list. Mixing text patterns and embedded lists would be confusing, and both are forbidden anyway. As a result, a list can't contain element, attribute, text, list, or interleave patterns.
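The prohibited paths for lists can be checked with the same kind of walk used for attributes. The Python sketch below is a toy model that assumes a hypothetical tuple representation of simplified patterns, and its helper name is invented. It reports the list, ref (that is, element), attribute, text, and interleave patterns found inside a list. Data, value, group, choice, and oneOrMore remain allowed.

```python
# Toy model: simplified patterns as nested tuples (hypothetical representation).
FORBIDDEN_IN_LIST = {"list", "ref", "attribute", "text", "interleave"}


def forbidden_in_list(pattern, inside_list=False):
    """Return the kinds of the forbidden patterns occurring inside a list."""
    kind = pattern[0]
    found = []
    if inside_list and kind in FORBIDDEN_IN_LIST:
        found.append(kind)
    inside = inside_list or kind == "list"
    if kind in ("list", "oneOrMore"):          # ("list", child)
        found += forbidden_in_list(pattern[1], inside)
    elif kind in ("group", "interleave", "choice"):
        found += forbidden_in_list(pattern[1], inside)
        found += forbidden_in_list(pattern[2], inside)
    elif kind == "attribute":                  # ("attribute", name, content)
        found += forbidden_in_list(pattern[2], inside)
    return found


# attribute coords { list { xsd:integer+ } }: valid
coords = ("attribute", "coords", ("list", ("oneOrMore", ("data", "integer"))))

# list { xsd:integer, (text | list { xsd:token })+ }: invalid
mixed = ("list",
         ("group", ("data", "integer"),
          ("oneOrMore", ("choice", ("text",), ("list", ("data", "token"))))))

print(forbidden_in_list(coords))  # prints []
print(forbidden_in_list(mixed))   # prints ['text', 'list']
```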

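RELAX NG defines three different content models for an element:

- empty content, described only with empty and attribute patterns;
- simple content, described with data, value, or list patterns;
- complex content, described with element and text patterns.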

These definitions are identical to those given by W3C XML Schema, and similar to but somewhat different from the meaning of these terms in "plain" XML. Consider an element expressed as <foo>bar</foo>. RELAX NG sees it as complex content if its content has been described using a text pattern, and as simple content if its content has been described using a data or value pattern. It's not enough for an element to contain only a text node to have simple content: the element must also have been described with a data orientation. When a text pattern has been used instead, the element is considered document-oriented, a special case of mixed content where no elements are included.

The restriction on the content model is expressed by saying that empty content can be grouped with any other content model, but simple and complex content models can't be grouped together (through group or interleave patterns). Simple and complex content models can appear in the definition of the same element only as alternatives (through choice patterns). In other words, for each alternative, you need to choose whether you are data- or text-oriented; you can't mix both mindsets.
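The rule can be expressed as a small computation over the simplified pattern, modeled on the content-type rules of the RELAX NG specification. The Python sketch below is a toy model. Patterns are hypothetical nested tuples, the helper names are invented, and only the patterns discussed here are covered. The three content types are ordered empty < complex < simple, so that a choice takes the larger of its two types. Repeating a pattern with oneOrMore is treated as grouping it with itself.

```python
# Toy model of the three content types, ordered empty < complex < simple.
EMPTY, COMPLEX, SIMPLE = 0, 1, 2
NAMES = {EMPTY: "empty", COMPLEX: "complex", SIMPLE: "simple"}


class ContentTypeError(Exception):
    """Raised when simple and complex content are grouped together."""


def groupable(ct1, ct2):
    """Empty content groups with anything; complex groups with complex."""
    return ct1 == EMPTY or ct2 == EMPTY or (ct1 == COMPLEX and ct2 == COMPLEX)


def content_type(pattern):
    """Content type of a simplified pattern given as nested tuples."""
    kind = pattern[0]
    if kind in ("value", "data", "list"):
        return SIMPLE
    if kind in ("empty", "attribute"):
        return EMPTY
    if kind in ("text", "ref"):          # a ref always leads to an element
        return COMPLEX
    if kind == "choice":                 # alternatives may differ
        return max(content_type(pattern[1]), content_type(pattern[2]))
    if kind in ("group", "interleave"):
        ct1, ct2 = content_type(pattern[1]), content_type(pattern[2])
        if not groupable(ct1, ct2):
            raise ContentTypeError(
                f"cannot {kind} {NAMES[ct1]} with {NAMES[ct2]} content")
        return max(ct1, ct2)
    if kind == "oneOrMore":              # repeating is grouping with itself
        ct = content_type(pattern[1])
        if not groupable(ct, ct):
            raise ContentTypeError(f"cannot repeat {NAMES[ct]} content")
        return ct
    raise ValueError(f"unknown pattern: {kind!r}")


# The <foo>bar</foo> cases discussed above:
print(NAMES[content_type(("text",))])          # complex: described with text
print(NAMES[content_type(("data", "token"))])  # simple: described with data

# attribute id { text }, xsd:integer is fine: empty groups with anything.
with_attribute = ("group", ("attribute", "id", ("text",)), ("data", "integer"))
print(NAMES[content_type(with_attribute)])     # simple

# xsd:integer & element b { ... } mixes data and elements: forbidden.
try:
    content_type(("interleave", ("data", "integer"), ("ref", "b")))
except ContentTypeError as error:
    print("error:", error)  # error: cannot interleave simple with complex content
```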

We have already mentioned the practical consequence of this restriction on mixed content models in Chapter 7: Constraining Text Values. It is not possible to use data patterns to constrain the text nodes that occur in elements with mixed content.

The last two limitations apply to interleave. Their goal is to facilitate the implementation of the interleave feature, which other schema languages lack, largely because it is seen as difficult to implement. They reduce the number of combinations that RELAX NG processors need to explore to support interleave:

- the elements that can appear in the different branches of an interleave must not have overlapping names;
- a text pattern must not appear in more than one branch of an interleave.

These limitations don't affect the expressive power of RELAX NG (the set of content models which can be written with RELAX NG). Even if we run into a limitation from time to time, schemas can always be rewritten to work around them. Sometimes, though, they can be a nuisance when combining existing patterns with mixed content models.

The limitations are needed to support the different algorithms currently used to implement RELAX NG. James Clark thinks that they could be removed in future versions of RELAX NG: "Better algorithms may be developed that will allow this restriction to be removed in future versions."
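Both limitations can be checked on each interleave of the simplified pattern. The Python sketch below is again a toy model with a hypothetical tuple representation and invented helper names. Because refs point to element definitions, it takes an assumed dictionary mapping each ref to the single name of the element it defines. It does not look inside attribute patterns, which is a simplification of this sketch. The shortened Inline.model used here keeps only two of its inline elements.

```python
# Toy model: simplified patterns as nested tuples (hypothetical), plus a
# dictionary mapping each ref name to the element name it defines.

def element_names(pattern, defines):
    """Names of the elements referenced (through refs) inside pattern."""
    kind = pattern[0]
    if kind == "ref":
        return {defines[pattern[1]]}
    if kind in ("group", "interleave", "choice"):
        return element_names(pattern[1], defines) | element_names(pattern[2], defines)
    if kind == "oneOrMore":
        return element_names(pattern[1], defines)
    return set()


def contains_text(pattern):
    """True if a text pattern occurs in pattern (attributes are skipped here)."""
    kind = pattern[0]
    if kind == "text":
        return True
    if kind in ("group", "interleave", "choice"):
        return contains_text(pattern[1]) or contains_text(pattern[2])
    if kind == "oneOrMore":
        return contains_text(pattern[1])
    return False


def interleave_errors(pattern, defines):
    """Report violations of the two limitations on interleave."""
    kind = pattern[0]
    errors = []
    if kind == "interleave":
        left, right = pattern[1], pattern[2]
        shared = element_names(left, defines) & element_names(right, defines)
        if shared:
            errors.append(f"elements {sorted(shared)} in both branches")
        if contains_text(left) and contains_text(right):
            errors.append("text in both branches")
    if kind in ("group", "interleave", "choice"):
        errors += interleave_errors(pattern[1], defines)
        errors += interleave_errors(pattern[2], defines)
    elif kind == "oneOrMore":
        errors += interleave_errors(pattern[1], defines)
    return errors


defines = {"em": "em", "span": "span"}

# A shortened Inline.model: (text | em | span)+
inline_model = ("oneOrMore",
                ("choice", ("text",), ("choice", ("ref", "em"), ("ref", "span"))))

# title.content &= Inline.model, where title.content was text
combined = ("interleave", ("text",), inline_model)

print(interleave_errors(combined, defines))      # ['text in both branches']
print(interleave_errors(inline_model, defines))  # []
print(interleave_errors(("interleave", ("ref", "em"), ("ref", "em")), defines))
# ["elements ['em'] in both branches"]
```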

We may have the following schema, book.rnc, to describe our books:

 start = book
 book = element book { book.content }
 book.content =
   attribute id { text },
   attribute available { text },
   element isbn { text },
   title
 title = element title { title.attributes, title.content }
 title.attributes = attribute xml:lang { xsd:language }
 title.content = text

To add the XHTML Inline.model to title.content we might be tempted to write:

 
 include "book.rnc"
 include "xhtml-attribs-2.rnc" 
 include "xhtml-inltext-2.rnc" 
 include "xhtml-datatypes-2.rnc" 
 
 title.content &= Inline.model

Unfortunately, Inline.model already contains a text pattern, and the title element gets simplified to:

 title-id2635741 =
  element title {
    attribute lang { xsd:language },
    (text
     & (empty
        | (empty
           | (text
              | (((((((((((((abbr-id2636549 | cite-id2636578)
                            | code-id2636607)
                           | dfn-id2636636)
                          | em-id2636664)
                         | kbd-id2636693)
                        | l-id2636721)
                       | quote-id2636750)
                      | samp-id2636778)
                     | span-id2636807)
                    | strong-id2636836)
                   | sub-id2636865)
                  | sup-id2636893)
                 | var-id2636922)))+))
  }

Text patterns now appear in both branches of an interleave, which is forbidden. To fix this problem, we need to replace our combination with a redefinition of title.content:

 include "book.rnc" {
   title.content = Inline.model
 }
 include "xhtml-attribs-2.rnc" 
 include "xhtml-inltext-2.rnc" 
 include "xhtml-datatypes-2.rnc"  

There is no loss in expressive power (we are able to describe what we wanted to describe), but there is a loss in modularity. Changes made to title.content in "book.rnc" would now have to be manually added to our derived schema.


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.