RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
You are welcome to use our annotation system to give your feedback.
RELAX NG includes a native datatype library, but it is weak and imperfect. It consists of only two datatypes (token and string), which differ only in the whitespace processing applied before validation. The whole RELAX NG datatype system can be seen as a mechanism for adding validating transformations to text nodes. These transformations convert text nodes into a canonical format, in which all the different lexical representations of the same value are reduced to a single normalized form. The two native datatypes do not detect format errors (their formats are broad enough to allow any value), but they still transform text nodes into their canonical forms, which can make a difference for enumerations. Other datatype libraries, covered in the next chapter, can detect format errors.
Enumerations are the first place where we will see datatypes at work. A datatype is applied to an enumeration value by adding a type attribute to the value pattern. Up to now, we haven't specified any datatype when we've written value elements, so they have had the default type, token, from the built-in library. Text values of this datatype receive full whitespace normalization, similar to that performed by the XPath normalize-space() function: every sequence of one or more whitespace characters (#x20 (space), #x9 (tab), #xA (linefeed), and #xD (carriage return)) is replaced by a single space, and any leading and trailing space is then trimmed.
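The normalization just described can be sketched in a few lines of Python. This is only an illustration of the rule, not code taken from any RELAX NG processor, and the function name `normalize_token` is a hypothetical one chosen for this example.

```python
import re

# The four whitespace characters named in the text:
# #x20 (space), #x9 (tab), #xA (linefeed), #xD (carriage return).
_XML_WHITESPACE_RUN = re.compile(r"[\x20\x09\x0A\x0D]+")


def normalize_token(text: str) -> str:
    """Collapse each run of whitespace to one space, then trim both ends.

    Hypothetical helper: it mimics the normalization described for the
    token datatype, assuming the input is the already-parsed text value.
    """
    collapsed = _XML_WHITESPACE_RUN.sub(" ", text)
    return collapsed.strip(" ")


print(repr(normalize_token("  on \t hold\n")))
```

Run as is, the sketch prints 'on hold': the inner run of space, tab, and space collapses to a single space, and the leading and trailing whitespace disappears.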
If we return to previous examples, writing:
    <attribute name="available">
      <choice>
        <value>available</value>
        <value>checked out</value>
        <value>on hold</value>
      </choice>
    </attribute>
or:
    attribute available { "available" | "checked out" | "on hold" }
relied on the default datatype (token) and was equivalent to the following:
    <attribute name="available">
      <choice>
        <value type="token">available</value>
        <value type="token">checked out</value>
        <value type="token">on hold</value>
      </choice>
    </attribute>
or:
    attribute available { token "available" | token "checked out" | token "on hold" }
When the token datatype is used, whitespace normalization is applied both to the value defined in the schema and to the value found in the instance document. The comparison is made on the normalized results, which explains why "on hold" matched " on hold " with spaces or tabs added before, between, and after the words.
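To make the two-sided normalization concrete, here is a small Python sketch of the comparison. It is an assumption-laden illustration rather than a validator: `token_value_matches` and `_normalize` are hypothetical names, and the schema value is deliberately written with line breaks and indentation to show that the schema side is normalized too.

```python
import re

_WS_RUN = re.compile(r"[\x20\x09\x0A\x0D]+")


def _normalize(text: str) -> str:
    # Same rule as for the token datatype: collapse runs, then trim.
    return _WS_RUN.sub(" ", text).strip(" ")


def token_value_matches(schema_value: str, instance_value: str) -> bool:
    """Compare the two values after normalizing BOTH of them (hypothetical helper)."""
    return _normalize(schema_value) == _normalize(instance_value)


# A schema value as it might look if the value element were pretty-printed.
schema_value = "\n    on\n    hold\n  "

for candidate in ["on hold", " on \t hold ", "onhold", "On hold"]:
    print(repr(candidate), token_value_matches(schema_value, candidate))
```

The first two candidates match and the last two do not: normalization only touches whitespace, so it never joins words together or changes case.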
Note: The name of the token datatype, borrowed from W3C XML Schema, is highly confusing. In IT jargon, a token is a piece of a string between two delimiters, what we would call a "word" in plain English. The token datatype doesn't denote a word. Otherwise "on" and "hold" would be valid tokens, but "on hold" wouldn't. The token datatype is more a "token-ized" datatype in the sense that it's a string made ready to be easily cut into tokens when non-significant whitespace has been removed. This confusion is dangerous since it leads many people into using the string datatype when what they really need is token. (We will see later in this chapter that using the string datatype should be reserved for very specific cases.)
To suppress this normalization, we can specify the second built-in datatype, string, which doesn't perform any transformation on the values before comparing them to the specified value:
    <attribute name="available">
      <choice>
        <value type="string">available</value>
        <value type="string">checked out</value>
        <value type="string">on hold</value>
      </choice>
    </attribute>
or:
    attribute available { string "available" | string "checked out" | string "on hold" }
With this new definition, the value of our attribute must exactly match one of the values specified in the schema: "available", "checked out", or "on hold". No extra whitespace is permitted.
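For contrast, the exact matching performed with the string datatype can be sketched as a plain equality test. As before, this is a hedged Python illustration, not a real validator: `ALLOWED` and `string_value_matches` are hypothetical names, and the input is assumed to be the attribute value as delivered by the XML parser.

```python
# The three enumeration values from the schema above.
ALLOWED = ("available", "checked out", "on hold")


def string_value_matches(instance_value: str, allowed=ALLOWED) -> bool:
    """Return True only if the value equals one allowed value character for character.

    Hypothetical helper: no normalization is applied on either side.
    """
    return instance_value in allowed


for candidate in ["on hold", " on hold", "on  hold", "on hold "]:
    print(repr(candidate), string_value_matches(candidate))
```

Only the first candidate is accepted. A single leading, trailing, or doubled space is enough to make the comparison fail, which is why the string datatype is best kept for cases where whitespace really is significant.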
Note: The native token and string datatypes have the same basic definition as the W3C XML Schema token and string datatypes. The difference is that the additional restrictions that can be applied to the W3C XML Schema datatypes by using param elements are not available with RELAX NG's native datatypes. More details are provided in the next chapter, Chapter 8: Datatype Libraries.
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.