by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


Data Versus Text

In Chapter 6, I provided a detailed description of the text pattern and its behavior within interleave patterns. There's another pattern that also describes and attaches datatypes to text nodes. Even though this pattern will become more useful with the introduction of the datatype libraries in Chapter 8, it's worth examining its core features right now to be sure you've touched on most of the definitions related to nodes.

The data pattern accepts a type attribute (as does the value pattern) and checks that the value is valid for this type. Since our two built-in types (string and token) accept any value, the data pattern with built-in types is almost equivalent to a text pattern. However, where the text pattern means "zero or more text nodes," the data pattern means "one text node." The data pattern has been designed to represent data. It's forbidden in mixed-content models because the authors of the RELAX NG specification considered mixing data and elements poor practice.
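
As a minimal sketch, assuming the built-in token type and a hypothetical price element, a data pattern might be written like this:

 <element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- built-in type, so no datatype library needs to be declared -->
  <data type="token"/>
 </element>

This definition would accept an instance such as <price>20</price>. The data pattern is the whole content of price here; as noted above, it can't be combined with child elements.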

This restriction applies to all patterns that match a single text node (data, value, and list): they can never be combined with patterns matching sibling elements (elements that can have the same parent element in the same instance document). In practice, this means you can't use a data pattern to describe content models such as:

 <price><currency>USD</currency>20</price>

or:

 <price>20<currency>Euro</currency></price>
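
To make the restriction concrete, here is a sketch of how a schema for the first of these instances might be attempted, assuming the built-in token type for both values. A conforming RELAX NG processor should reject the schema itself, before any instance document is validated:

 <!-- Not a valid schema: a data pattern is grouped with a sibling element -->
 <element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="currency">
   <data type="token"/>
  </element>
  <data type="token"/>
 </element>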

These content models were considered poor practice by the authors of the RELAX NG specification. They advise reformulating them as:

 <price>
  <amount>20</amount>
  <currency>USD</currency>
 </price>

or:

 <price currency="USD">20</price>
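
Both reformulations can be described with data patterns. The following sketches assume the built-in token type throughout; a real schema would more likely use types from a datatype library. For the version with child elements:

 <element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- each data pattern is the only content of its own element -->
  <element name="amount">
   <data type="token"/>
  </element>
  <element name="currency">
   <data type="token"/>
  </element>
 </element>

For the version with an attribute:

 <element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- an attribute may accompany a data pattern; a sibling element may not -->
  <attribute name="currency">
   <data type="token"/>
  </attribute>
  <data type="token"/>
 </element>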

This is the second time RELAX NG has given priority to good practices over the ability to describe all the combinations possible according to the XML recommendation. (The first was the absence of an "unordered noninterleaved" pattern, discussed in Chapter 6.) This case actually increases the complexity of the implementations of RELAX NG processors, which must check that data patterns aren't included within mixed-content models. Supporting data in mixed-content models would have been possible using the general algorithms without any additional complexity. The only benefit for RELAX NG processors is that they can skip whitespace occurring between two elements, but this benefit seems minimal compared to the possibilities that are lost to this restriction.

This restriction appears to come from a strict distinction between data- and document-oriented applications of XML. Mixed content has been considered an aspect of document-oriented applications, which shouldn't need datatypes, while datatypes are limited to data-oriented applications, which shouldn't need mixed content.


This text is released under the Free Software Foundation GFDL.