RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

You are welcome to use our annotation system to give your feedback.


Data versus text

In the previous chapter (Chapter 6: More patterns) we spent a lot of time providing a detailed description of the text pattern and its behavior within interleave patterns. There is another pattern used to describe text nodes and attach datatypes to them. Even though this pattern will become more useful with the introduction of the datatype libraries in the next chapter, we can describe its core features right now to be sure we've touched on most of the subject area regarding nodes.

The pattern we're previewing is the data pattern. The data pattern accepts a type attribute (as we have seen for the value pattern) and checks that the value is valid per this type. Since our two built-in types accept any value, the data pattern with built-in types is almost equivalent to a text pattern. However, unlike the text pattern, which means "zero or more text nodes", the data pattern means "one text node". The data pattern has been designed to represent data. It is forbidden in mixed content models since the authors of the RELAX NG specification have considered mixing data and elements poor practice.
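
As a minimal sketch of the syntax (the amount element name is only an illustration and is not taken from the text above; token is assumed here as one of the two built-in types), a data pattern sits where a text pattern would otherwise appear:

```xml
<element name="amount" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- token is a built-in type, so no datatypeLibrary attribute is declared -->
  <data type="token"/>
</element>
```

With a built-in type, this pattern accepts any value, much as a text pattern would in the same position. The difference only becomes visible once sibling elements are involved.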

This restriction applies to all the patterns that match a single text node (data, value and list): they can never be associated with patterns matching sibling elements (elements that could have the same parent element in the same instance document). In practice, this means that we won't be able to use a data pattern to describe content models such as:

 <price><currency>USD</currency>20</price>

or

 <price>20<currency>Euro</currency></price>
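
For illustration, a schema for the first of these instances might be attempted as in the sketch below (it assumes the built-in token type). A conforming RELAX NG processor is expected to reject the schema itself, before validating any instance, because a data pattern is grouped with a pattern matching a sibling element:

```xml
<element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="currency">
    <!-- allowed: data is the only content of currency -->
    <data type="token"/>
  </element>
  <!-- not allowed: data grouped with the sibling currency element -->
  <data type="token"/>
</element>
```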

Content models like these have been considered poor practice by the authors of the RELAX NG specification. They advise reformulating them as:

 <price>
  <amount>20</amount>
  <currency>USD</currency>
 </price>

or

 <price currency="USD">20</price>
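
Both reformulations can be described with data patterns, because in each case the data is the only text content of its parent and has no sibling elements. The following is a sketch of possible schemas, assuming the built-in token type for every value. For the first form, with child elements:

```xml
<element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="amount">
    <data type="token"/>
  </element>
  <element name="currency">
    <data type="token"/>
  </element>
</element>
```

For the second form, the currency moves into an attribute. An attribute is not a sibling element, so it can be combined with a data pattern in the element content:

```xml
<element name="price" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="currency">
    <data type="token"/>
  </attribute>
  <!-- data is the whole element content of price -->
  <data type="token"/>
</element>
```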

This is the second time that we've seen RELAX NG giving priority to good practices over the ability to describe all the combinations possible according to the XML recommendation. (The first was when we saw in the previous chapter that there is no "unordered non-interleaved" pattern.) This restriction actually increases the complexity of RELAX NG implementations: processors must check that data patterns are not included within mixed content models, whereas supporting data in mixed content models would have been possible using the general algorithms without any additional complexity. The only benefit for RELAX NG processors is that they can skip whitespace occurring between two elements, but this benefit seems really minimal compared to the possibilities which are lost by this restriction.

This restriction appears to come from a strict distinction between data-oriented and document-oriented applications of XML. Mixed content has been considered to belong to document-oriented applications, which shouldn't need datatypes, while datatypes are limited to data-oriented applications, which shouldn't need mixed content.


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.