by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


Using String Datatypes in Attribute Values

RELAX NG's string datatype performs no whitespace normalization, which may lead to some surprises. In attribute values, however, XML parsers must replace each linefeed, carriage return, and tab with a space before any application sees the value, and this can cause surprises of its own.
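RELAX NG has two built-in datatypes, string and token, and they differ in how a schema value is compared with an instance value: string compares exactly, while token collapses whitespace first. The Python sketch below models both rules; the function names are hypothetical, and this is an illustration rather than any validator's actual code.

```python
import re

# XML defines exactly four whitespace characters: space, tab, LF, CR.
XML_WHITESPACE = re.compile(r"[ \t\n\r]+")


def matches_string(schema_value: str, instance_value: str) -> bool:
    # string: character-for-character comparison, no normalization.
    return schema_value == instance_value


def normalize_token(value: str) -> str:
    # token: collapse each run of whitespace to one space, then
    # strip leading and trailing spaces.
    return XML_WHITESPACE.sub(" ", value).strip(" ")


def matches_token(schema_value: str, instance_value: str) -> bool:
    return normalize_token(schema_value) == normalize_token(instance_value)


print(matches_string("on hold", "on\nhold"))  # False
print(matches_token("on hold", "on\nhold"))   # True
print(matches_string("on hold", "on  hold"))  # False
print(matches_token("on hold", " on  hold"))  # True
```

With string, a value containing a linefeed never equals on hold; in attributes it only appears to, because the parser has already replaced the linefeed with a space.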

Attribute whitespace normalization can be confusing in several ways. A schema that requires the value on hold, as our previous schema did, also accepts an attribute in which the space between on and hold has been replaced by a linefeed, as in:

 <book id="b0836217462" available="on
hold">
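You can watch this normalization happen with any conforming parser. This sketch uses Python's standard xml.etree.ElementTree module (which relies on the expat parser) on a self-closing variant of the element above, with a single linefeed between on and hold; the tab case is an extra illustration not taken from the text.

```python
import xml.etree.ElementTree as ET

# A single literal linefeed between "on" and "hold" in the attribute.
book = ET.fromstring('<book id="b0836217462" available="on\nhold"/>')

# Attribute-value normalization has already replaced the linefeed
# with a space, so the application never sees it.
print(repr(book.get("available")))  # 'on hold'

# A literal tab is replaced by a space in the same way.
tabbed = ET.fromstring('<book available="on\thold"/>')
print(repr(tabbed.get("available")))  # 'on hold'
```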

Attribute whitespace normalization is standard behavior in XML 1.0. Every XML parser must normalize an attribute's value before reporting it to other applications, producing on hold in this case, and no schema language can change this.

Whitespace also makes schemas themselves harder to write when their values must contain linefeeds. Translating the following schema from the RELAX NG XML syntax into the compact syntax requires some new features:

 <attribute name="available">
  <choice>
   <value type="string">available</value>
   <value type="string">checked out</value>
   <value type="string">on
hold</value>
  </choice>
 </attribute>

The compact syntax doesn't allow newlines inside single- or double-quoted literals. To translate this schema, we need a couple of new features that let values include linefeeds.

The first, borrowed from Python, is the triple-quoted literal: if you delimit a value with three single (''') or three double (""") quotes instead of one, it can contain nearly anything, including newlines:

attribute available {string "available"|string "checked out"|string
"""on
hold"""}

or:

attribute available {string "available"|string "checked out"|string
'''on
hold'''}

The second way to include a newline is to escape it with the syntax \x{A}, where A is the hexadecimal code point of the linefeed character (U+000A):

attribute available {string "available"|string "checked out"|string
"on\x{A}hold"}
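To make the escape concrete, the following Python sketch decodes \x{...} sequences as described above. The decode_escapes helper is hypothetical, and it handles only the \x{hex} form shown here rather than the full compact-syntax grammar.

```python
import re

# Simplified assumption: only the \x{hex} form used in the text is
# recognized; this is not a full compact-syntax tokenizer.
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def decode_escapes(literal: str) -> str:
    """Replace each \\x{hex} escape with the character it names."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), literal)


# Raw string, so the backslash reaches decode_escapes unchanged.
value = decode_escapes(r"on\x{A}hold")
print(repr(value))         # 'on\nhold'

# The string datatype compares exactly, so this value does not match
# an attribute whose linefeed the parser has already turned into a space.
print(value == "on hold")  # False
```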

This pattern allows the attribute to contain a value with a linefeed, which can happen in XML only if the linefeed in the attribute is written as a numeric character reference, such as:

 <book id="b0836217462" available="who&#x0A;knows?">
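Character references are not subject to attribute-value normalization, which is why this is the only way a linefeed reaches the application. A quick check with Python's xml.etree.ElementTree, also trying an on&#x0A;hold value of the kind the escaped pattern above accepts:

```python
import xml.etree.ElementTree as ET

# The example from the text: a linefeed written as &#x0A;.
book = ET.fromstring('<book id="b0836217462" available="who&#x0A;knows?"/>')
value = book.get("available")
print(repr(value))  # 'who\nknows?'

# Written the same way, "on hold" with a linefeed is exactly the value
# the escaped literal "on\x{A}hold" in the schema describes.
held = ET.fromstring('<book available="on&#x0A;hold"/>').get("available")
print(held == "on\nhold")  # True
```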

These are unlikely cases, but now you know what to do if you encounter them.


This text is released under the Free Software Foundation GFDL.