RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)



Using string datatypes in attribute values

RELAX NG's string datatype performs no whitespace normalization, which may lead to some surprises when it is applied to attribute values. XML parsers must replace the literal line feeds, carriage returns, and tabs they find in an attribute value with spaces, so the value the schema processor sees is not always the value that was typed.

Attribute white space normalization may be confusing in several ways. In our previous schema, the pattern that requires the value "on hold" also matches an attribute in which the space between "on" and "hold" has been replaced by a line feed, as in:

 <book id="b0836217462" available="on
 hold">
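The effect can be observed directly with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not part of the original example) and a self-closed version of the book element so that the fragment is well-formed on its own:

```python
import xml.etree.ElementTree as ET

# Assumption: the element is self-closed here so the fragment parses alone.
# The attribute value contains a literal line feed between "on" and "hold".
document = '<book id="b0836217462" available="on\nhold"/>'

book = ET.fromstring(document)
available = book.get("available")

# The parser has already replaced the line feed with a space,
# so the application only ever sees the normalized value.
print(repr(available))
matches_on_hold = (available == "on hold")
```

Because the value reaches the application with a plain space, a string pattern written as "on hold" matches it.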

Attribute white space normalization is required behavior in XML 1.0. Every XML parser must normalize an attribute's value before reporting it to the application, producing "on hold" in this case, and no schema language can change this. The same issues make it difficult to write schemas whose values contain whitespace other than plain spaces. Translating the following schema from the RELAX NG XML syntax to the compact syntax, for instance, requires features we have not yet seen:

 <attribute name="available">
  <choice>
   <value type="string">available</value>
   <value type="string">checked out</value>
   <value type="string">on
 hold</value>
  </choice>
 </attribute>

The compact syntax doesn't permit new lines within ordinary quoted literals. To translate this schema, we need to introduce a couple of new features that permit the inclusion of line feeds in values.

The first way to include them is borrowed from Python. If instead of using single (') or double (") quotes, you use three single (''') or three double (""") quotes, you can include nearly everything in your values including new lines.

 attribute available {string "available"|string "checked out"|string
 """on
 hold"""}

or:

 attribute available {string "available"|string "checked out"|string
 '''on
 hold'''}
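Since this notation is borrowed from Python, the value such a literal denotes can be checked there. This is only an illustrative sketch in Python, not RELAX NG, and the variable names are made up for the example:

```python
# A triple-quoted literal may span lines; the line break becomes
# a real line feed character inside the string.
value = """on
hold"""

# Splitting on the line feed shows the two halves of the value.
parts = value.split("\n")
print(repr(value))
```

The string holds "on", a single line feed, and "hold", which is the value the triple-quoted compact syntax literal is meant to express.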

The second way to allow new lines is to escape the new line character using the syntax \x{A} (where A is the Unicode code point of the line feed character in hexadecimal):

 attribute available {string "available"|string "on hold"|string
 "on\x{A}hold"}

This pattern specifies that the attribute could contain a value with a line feed, something which can only happen in XML if the line feed in the attribute is written explicitly as a numeric character reference, such as in:

 <book id="b0836217462" available="who&#x0A;knows?">
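That a character reference survives normalization can be checked in the same way. Again a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a self-closed element so the fragment is well-formed:

```python
import xml.etree.ElementTree as ET

# Assumption: the element is self-closed here so the fragment parses alone.
# The line feed is written as the character reference &#x0A;.
document = '<book id="b0836217462" available="who&#x0A;knows?"/>'

available = ET.fromstring(document).get("available")

# A referenced line feed is kept as a line feed, not turned into a space.
has_line_feed = "\n" in available
print(repr(available))
```

Only a value delivered this way can match a pattern such as string "on\x{A}hold", since a literal line feed typed into the attribute would already have become a space.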

These are unlikely cases, but now you know what to do if you encounter them.


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.