by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


A Restriction Related to interleave

You'll see the restrictions of RELAX NG in Chapter 15, but I need to mention the principal restriction related to the interleave compositor, as it might affect you at some point if you combine mixed-content models.

Let's extend our title element to allow not only links (a elements) but also bold characters marked by a b element:

<title xml:lang="en">Being a
 <a href="http://dmoz.org/Recreation/Pets/Dogs/">Dog</a>
    Is a <b>Full-Time</b>
 <a href="http://dmoz.org/Business/Employment/Job_Search/">Job</a>
</title>

Because text can appear before the a elements, between a and b, and after the b element, you might be tempted to write the following schema:

<element name="title">
  <interleave>
    <attribute name="xml:lang"/>
    <text/>
    <zeroOrMore>
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </zeroOrMore>
    <text/>
    <zeroOrMore>
      <element name="b">
        <text/>
      </element>
    </zeroOrMore>
    <text/>
  </interleave>
</element>

or:

element title {
  attribute xml:lang {text}
  & text
  & element a {attribute href {text}, text} *
  & text
  & element b {text} *
  & text
}

Running the Jing validator against this schema raises the following error:

Error at URL "file:/home/vdv/xmlschemata-cvs/books/relaxng/examples/RngMorePatterns/
interleave-restriction2.rnc",
line number 1, column number 2: both operands of "interleave" contain "text"

This error occurs because each interleave pattern may contain only one text pattern. Because a single text pattern matches zero or more text nodes, one is enough to allow text anywhere among the interleaved elements, so the remedy is simple. The schema must be rewritten as:

<element name="title">
  <interleave>
    <attribute name="xml:lang"/>
    <text/>
    <zeroOrMore>
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </zeroOrMore>
    <zeroOrMore>
      <element name="b">
        <text/>
      </element>
    </zeroOrMore>
  </interleave>
</element>

or:

element title {
  attribute xml:lang {text}
  & text
  & element a {attribute href {text}, text} *
  & element b {text} *
}

This new schema is perfectly valid and does what we tried to do with our invalid schema.
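
The same content model can also be expressed with the mixed pattern, which is shorthand for an interleave that contains a single text pattern. The following is a sketch of an equivalent formulation in the compact syntax, not a replacement for the schema above:

```rnc
element title {
  attribute xml:lang {text}
  & mixed {
      # mixed supplies the single text pattern, so none is written here
      element a {attribute href {text}, text} *
      & element b {text} *
    }
}
```

Because mixed already brings its own text pattern, adding another text beside the a and b elements inside it would reintroduce the error.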

In this example, diagnosing the problem was very simple, but in practice, the situation is often more complex. The conflicting text patterns can belong to different subpatterns of interleave or mixed patterns. When using pattern libraries (as shown in Chapter 10), the conflicting text patterns often belong to different RELAX NG grammars, making the problem even harder to pinpoint. To make matters worse, the error messages from RELAX NG processors are often quite cryptic: in this case they tell you that there are conflicting text patterns in an interleave pattern without saying where they come from. Unfortunately, for now at least, you'll have to figure this out by yourself.
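
To illustrate how such a conflict can hide, here is a minimal sketch in the compact syntax. The named patterns links and bold are hypothetical and stand in for definitions that might come from two separate libraries:

```rnc
# Hypothetical definitions; each one is valid when taken alone
links = text & element a {attribute href {text}, text} *
bold = text & element b {text} *

start =
  element title {
    # No text pattern is visible on this line, but both references
    # expand to one, so this interleave breaks the restriction
    attribute xml:lang {text} & links & bold
  }
```

One way to repair a schema like this is to remove text from both definitions and to add a single text pattern (or a mixed pattern) at the point where they are combined.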

Tip

The restriction of only one text pattern in each interleave pattern exists to optimize RELAX NG implementations that use the derivative method described by James Clark. When processing mixed-content models, instead of processing each text node, these implementations can simply record the fact that the content is mixed and ignore each text node. To do so, the implementation needs to be able to determine quickly whether a content model is mixed. That's where the restriction makes a difference in terms of programming complexity and execution speed.


This text is released under the Free Software Foundation GFDL.