RELAX NG, by Eric van der Vlist, is published by O'Reilly & Associates (ISBN: 0596004214).
In their simplest form, pattern facets may be used as enumerations applied to the lexical space rather than to the value space.
If, for instance, you have a byte value that can take only the values 1, 5, or 15, the usual way to define such a datatype is to use RELAX NG's choice pattern:
<choice>
  <value type="byte">1</value>
  <value type="byte">5</value>
  <value type="byte">15</value>
</choice>
or:
element foo { xsd:byte "1" | xsd:byte "5" | xsd:byte "15" }
This is the normal way to define this datatype when both its lexical space and its value space match those of xsd:byte. It is flexible: it accepts instance documents with values such as 1, 5, and 15, but also 01 or 0000005, because these are simply other lexical representations of the same values.
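To make the value-space comparison concrete, here is a minimal Python sketch, not a RELAX NG implementation. It assumes the xsd:byte lexical form is an optional sign followed by decimal digits, collapses surrounding XML whitespace, and then compares the parsed integer (not the string) against the allowed values, which is why 01 and 0000005 are accepted. The function and constant names are hypothetical.

```python
import re

# Assumed xsd:byte lexical form: optional sign, then one or more digits.
BYTE_LEXICAL = re.compile(r"[+-]?[0-9]+")
ALLOWED_VALUES = {1, 5, 15}  # the values listed in the choice pattern


def matches_byte_choice(text: str) -> bool:
    """Return True if text is an xsd:byte whose *value* is 1, 5, or 15."""
    lexical = text.strip(" \t\r\n")  # xsd:byte collapses surrounding whitespace
    if not BYTE_LEXICAL.fullmatch(lexical):
        return False
    value = int(lexical)  # int() accepts leading zeros and a sign
    if not -128 <= value <= 127:  # xsd:byte range
        return False
    return value in ALLOWED_VALUES  # compare values, not strings


for candidate in ["1", "5", "15", "01", "0000005", "+15", "2", "abc"]:
    print(candidate, matches_byte_choice(candidate))
```

Because the comparison happens after parsing, every lexical variant of 1, 5, or 15 passes.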
As far as validation alone is concerned, if you want to remove the variations with leading zeros, you can use another datatype such as token instead of xsd:byte in your choice pattern:
<choice>
  <value type="token">1</value>
  <value type="token">5</value>
  <value type="token">15</value>
</choice>
or:
xsd:token "1" | xsd:token "5" | xsd:token "15"
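The token version can be sketched the same way, under the assumption that token values are compared as strings after XML Schema "collapse" whitespace normalization (runs of whitespace reduced to one space, leading and trailing spaces removed). Surrounding whitespace is still tolerated, but 05 is now a different string from 5. The helper names are hypothetical and not from any RELAX NG library.

```python
import re

ALLOWED_TOKENS = {"1", "5", "15"}  # the values listed in the choice pattern


def collapse(text: str) -> str:
    """Apply XML Schema 'collapse' whitespace normalization."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def matches_token_choice(text: str) -> bool:
    """Return True if the collapsed string equals one of the allowed tokens."""
    return collapse(text) in ALLOWED_TOKENS


for candidate in ["5", " 15\n", "05", "0000005"]:
    print(repr(candidate), matches_token_choice(candidate))
```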
However, you might have good reasons to use xsd:byte. For example, if you're interested in type annotation and use a RELAX NG processor that supports it, that processor can report the datatype as xsd:byte rather than xsd:token.
One of the peculiarities of the pattern facet is that it is the only facet that constrains the lexical space. If you have an application that doesn't accept leading zeros, you can use a pattern facet instead of an enumeration to define your datatype:
<data type="byte">
  <param name="pattern">1|5|15</param>
</data>
or:
xsd:byte {pattern = "1|5|15"}
Here, I am still using the xsd:byte datatype with its associated semantics, but its lexical space is now constrained to accept only 1, 5, and 15, leaving out any variation that has the same value but a different lexical representation.
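A rough Python emulation of xsd:byte {pattern = "1|5|15"} shows the difference: the value is still checked as a byte, but the pattern is matched against the whitespace-normalized lexical form. XML Schema patterns are implicitly anchored to the whole string, which this sketch approximates with re.fullmatch; the names are hypothetical.

```python
import re

PATTERN = re.compile(r"1|5|15")            # the pattern facet from the schema
BYTE_LEXICAL = re.compile(r"[+-]?[0-9]+")  # assumed xsd:byte lexical form


def matches_byte_pattern(text: str) -> bool:
    """Emulate xsd:byte {pattern = "1|5|15"} on a single lexical value."""
    lexical = text.strip(" \t\r\n")  # whitespace is collapsed before the pattern check
    if not BYTE_LEXICAL.fullmatch(lexical):
        return False
    if not -128 <= int(lexical) <= 127:  # still a genuine xsd:byte
        return False
    # The pattern constrains the lexical form, so "05" fails even though its value is 5.
    return PATTERN.fullmatch(lexical) is not None


for candidate in ["1", "5", "15", "05", "0000005", "+5"]:
    print(candidate, matches_byte_pattern(candidate))
```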
This example has been carefully chosen to avoid using any of the pattern metacharacters, which are: . \ ? * + { } ( ) [ and ]. You'll see the meaning of these characters later in this chapter; for the moment, you just need to know that each of them must be escaped with a leading backslash to be used as a literal. For instance, to define a similar datatype for a decimal whose lexical space is limited to 1 and 1.5, write:
<data type="decimal">
  <param name="pattern">1|1\.5</param>
</data>
or:
xsd:decimal {pattern = "1|1\.5"}
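To see why the backslash matters, the following Python sketch compares the escaped pattern with an unescaped one. It includes a hypothetical escape_literal helper that backslash-escapes the metacharacters listed above; Python's re module treats these escapes the same way for this example, but it is only an approximation of an XML Schema regex engine. Without the escape, the dot matches any character, so 105 (a perfectly valid decimal) would slip through.

```python
import re

# Metacharacters listed in the text: . \ ? * + { } ( ) [ ]
METACHARACTERS = set(".\\?*+{}()[]")


def escape_literal(literal: str) -> str:
    """Hypothetical helper: prefix each metacharacter with a backslash."""
    return "".join("\\" + ch if ch in METACHARACTERS else ch for ch in literal)


escaped = "1|" + escape_literal("1.5")  # the pattern 1|1\.5
unescaped = "1|1.5"                     # here "." matches any character

for candidate in ["1", "1.5", "105", "1x5"]:
    print(candidate,
          re.fullmatch(escaped, candidate) is not None,
          re.fullmatch(unescaped, candidate) is not None)
```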
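A common source of errors is escaping normal characters, which shouldn't be escaped: as you'll see later, a leading backslash changes their meaning. For instance, \P is not the character P: it introduces a Unicode category escape, so \P{P} matches every character that is not Unicode punctuation, while its counterpart \p{P} matches all the Unicode punctuation characters.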
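In XML Schema regular expressions, the category escape \p{P} matches Unicode punctuation and \P{P} matches its complement. Python's standard re module has no \p{...} escapes, so this sketch approximates them with unicodedata.category: a character is punctuation when its general category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, or Po). The function names are hypothetical.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    # Approximates \p{P}: general category Pc, Pd, Ps, Pe, Pi, Pf, or Po.
    return unicodedata.category(ch).startswith("P")


def is_not_punctuation(ch: str) -> bool:
    # Approximates \P{P}: any character outside the punctuation categories.
    return not is_punctuation(ch)


for ch in ["P", "!", "(", "-", "$"]:
    print(ch, unicodedata.category(ch), is_punctuation(ch))
```

Note that the letter P itself (category Lu) is not punctuation, which is exactly the confusion an unintended backslash creates.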
This text is released under the Free Software Foundation GFDL.