RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
You are welcome to use our annotation system to give your feedback.
Chapter 15: Simplification And Restrictions covers the restrictions of RELAX NG in detail, but the principal restriction on the interleave compositor needs to be mentioned here, because you are likely to run into it when you combine mixed content models.
Let's say we want to extend our title element to allow not only links (a elements) but also bold characters marked by a b element:
<title xml:lang="en">Being a <a href="http://dmoz.org/Recreation/Pets/Dogs/">Dog</a> Is a <b>Full-Time</b> <a href="http://dmoz.org/Business/Employment/Job_Search/">Job</a> </title>
Because text can appear before the a elements, between a and b and after the b element, we might be tempted to write the following schema:
<element name="title">
  <interleave>
    <attribute name="xml:lang"/>
    <text/>
    <zeroOrMore>
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </zeroOrMore>
    <text/>
    <zeroOrMore>
      <element name="b">
        <text/>
      </element>
    </zeroOrMore>
    <text/>
  </interleave>
</element>
or:
element title {
  attribute xml:lang {text}
  & text
  & element a {attribute href {text}, text} *
  & text
  & element b {text} *
  & text
}
Running the Jing validator against this schema will raise the following error:
Error at URL "file:/home/vdv/xmlschemata-cvs/books/relaxng/examples/RngMorePatterns/interleave-restriction2.rnc", line number 1, column number 2: both operands of "interleave" contain "text"
This is because there can be only one text pattern in each interleave pattern. Because a text pattern matches zero or more text nodes, a single text pattern is enough to allow text anywhere between the interleaved elements, so the remedy here is simple: rewrite the schema with just one text pattern:
<element name="title">
  <interleave>
    <attribute name="xml:lang"/>
    <text/>
    <zeroOrMore>
      <element name="a">
        <attribute name="href"/>
        <text/>
      </element>
    </zeroOrMore>
    <zeroOrMore>
      <element name="b">
        <text/>
      </element>
    </zeroOrMore>
  </interleave>
</element>
Or:
element title {
  attribute xml:lang {text}
  & text
  & element a {attribute href {text}, text} *
  & element b {text} *
}
This new schema is perfectly valid and it does what we were trying to do with our invalid schema.
In this simple example, diagnosing the problem was easy, but in practice the situation is often more complex. The conflicting text patterns can belong to different subpatterns of interleave or mixed patterns. When using pattern libraries (as shown in Chapter 10: Creating Building Blocks), the conflicting text patterns will often belong to different RELAX NG grammars, making it even more difficult to pinpoint where the problem is. To make matters worse, the error messages from RELAX NG processors are often quite cryptic, telling you that you have conflicting text patterns in interleave patterns without saying where they come from. Unfortunately, for now at least, you'll have to figure this out by yourself.
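When a processor does not say where the conflict comes from, a small script can help narrow the search. The following Python sketch is only an illustration and is not part of Jing or of any RELAX NG toolkit. It assumes a schema written in the XML syntax and held in a single file with the RELAX NG namespace declared (the sample adds that xmlns attribute itself). The names find_text_conflicts and allows_text are hypothetical. It follows ref patterns to define patterns by name only, and it does not handle include, externalRef, nested grammar scoping, or definitions combined by interleave, so a clean result is not proof that the schema is valid.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"


def local(node):
    """Local name of a RELAX NG element, or None for a foreign element."""
    return node.tag[len(RNG):] if node.tag.startswith(RNG) else None


def allows_text(node, defines, seen=frozenset()):
    """True if a text pattern is reachable without entering an element or attribute."""
    name = local(node)
    if name in ("text", "mixed"):
        return True
    if name in (None, "element", "attribute"):
        return False
    if name == "ref":
        target = node.get("name")
        if target in seen:  # guard against reference loops
            return False
        return any(allows_text(child, defines, seen | {target})
                   for define in defines.get(target, [])
                   for child in define)
    return any(allows_text(child, defines, seen) for child in node)


def find_text_conflicts(schema_xml):
    """Return (path, count) for each interleave or mixed with more than one text branch."""
    root = ET.fromstring(schema_xml)
    defines = {}
    for define in root.iter(RNG + "define"):
        defines.setdefault(define.get("name"), []).append(define)

    conflicts = []

    def walk(node, path):
        name = local(node)
        if name is None:
            return  # skip annotations from other namespaces
        label = name if node.get("name") is None else name + " " + node.get("name")
        path = path + [label]
        branches = sum(1 for child in node if allows_text(child, defines))
        if name == "mixed":
            # mixed adds one implicit text branch; its children form a single group
            branches = 1 + (1 if branches else 0)
        if name in ("interleave", "mixed") and branches > 1:
            conflicts.append(("/".join(path), branches))
        for child in node:
            walk(child, path)

    walk(root, [])
    return conflicts


# The invalid title schema from this section, with the namespace declared.
BAD = """<element name="title" xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <attribute name="xml:lang"/>
    <text/>
    <zeroOrMore><element name="a"><attribute name="href"/><text/></element></zeroOrMore>
    <text/>
    <zeroOrMore><element name="b"><text/></element></zeroOrMore>
    <text/>
  </interleave>
</element>"""

print(find_text_conflicts(BAD))  # [('element title/interleave', 3)]
```

The text patterns inside the attribute and inside the a and b elements are deliberately not counted: only text patterns that sit directly in the branches of the interleave conflict with each other, which is why the corrected schema, with its single top-level text pattern, is reported as clean.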
The reason behind the restriction of only one text pattern in each interleave pattern is to optimize RELAX NG implementations using the derivative method described by James Clark. When processing mixed content models, instead of processing each text node, these implementations can simply remember that the content is mixed and ignore each text node. To do so, the implementation needs to be able to determine quickly whether a content model is mixed or not. That's where the restriction makes a difference in terms of programming complexity and execution speed.
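The shortcut can be pictured with a deliberately simplified Python sketch. It is not taken from any actual implementation: the function name strip_text_nodes, the is_mixed flag, and the (kind, value) representation of child nodes are all assumptions made for illustration, and content models based on data or value patterns are ignored. The idea is only that, once a content model is known to be mixed, text nodes can be dropped up front and just the element sequence needs to be matched against the pattern.

```python
def strip_text_nodes(children, is_mixed):
    """Return the element names left to match after dropping ignorable text.

    children is a list of (kind, value) pairs in document order, where kind
    is "text" or "element" (a hypothetical, simplified node model).
    """
    kept = []
    for kind, value in children:
        if kind == "text":
            if is_mixed or not value.strip():
                continue  # mixed content, or whitespace-only text: nothing to match
            raise ValueError("text not allowed here: %r" % value)
        kept.append(value)
    return kept


# Children of the title element used as the example in this section.
title_children = [
    ("text", "Being a "), ("element", "a"),
    ("text", " Is a "), ("element", "b"),
    ("text", " "), ("element", "a"), ("text", " "),
]

print(strip_text_nodes(title_children, is_mixed=True))  # ['a', 'b', 'a']
```

With is_mixed set to False the same input raises a ValueError at the first non-whitespace text node, which mirrors the other half of the decision: the validator has to know up front which of the two cases it is in.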
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.