RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)



The Simplest Possible Pattern Facets

In their simplest form, pattern facets may be used as enumerations applied to the lexical space rather than to the value space.

If, for instance, we have a byte value that can only take the values "1", "5", or "15", the classical way to define such a datatype is to use RELAX NG's choice pattern:

      <choice>
        <value type="byte">1</value>
        <value type="byte">5</value>
        <value type="byte">15</value>
      </choice>

or:

  element foo {
    xsd:byte "1"
    | xsd:byte "5"
    | xsd:byte "15"
  }

This is the normal way of defining this datatype when it should follow both the lexical space and the value space of xsd:byte. Because the values are compared in the value space, it gives the flexibility to accept instance documents with values such as "1", "5", and "15", but also "01" or "0000005".
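The difference between the two spaces can be sketched in a few lines of Python. This is only an illustration of comparing by value, not a description of how a RELAX NG processor works: the function name is hypothetical, the regular expression is our own simplified rendering of the xsd:byte lexical space (an optional sign followed by digits), and whitespace handling is left out.

```python
import re

# Assumed, simplified lexical space of xsd:byte: optional sign, ASCII digits.
BYTE_LEXICAL = re.compile(r"[+-]?[0-9]+")
ALLOWED_VALUES = {1, 5, 15}


def matches_by_value(text):
    """Return True if text is a byte literal whose value is 1, 5, or 15."""
    if not BYTE_LEXICAL.fullmatch(text):
        return False  # not in the lexical space at all
    value = int(text)  # map the lexical form to the value space
    # xsd:byte is limited to the range -128..127.
    return -128 <= value <= 127 and value in ALLOWED_VALUES
```

With this sketch, "5", "05", and "0000005" are all accepted because they map to the same value, while "2" and "1.0" are rejected.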

If validation is our only concern and we want to reject the variations with leading zeros, we could simply use another datatype, such as xsd:token, instead of xsd:byte in our choice pattern:

      <choice>
        <value type="token">1</value>
        <value type="token">5</value>
        <value type="token">15</value>
      </choice>

or:

  xsd:token "1"
  | xsd:token "5"
  | xsd:token "15"

However, we might have good reasons to use xsd:byte. For example, we might use it if we are interested in type annotation and want to use a RELAX NG processor supporting type annotation. That processor could usefully report the datatype as xsd:byte and not xsd:token.

One of the peculiarities of the pattern facet is that it is the only facet that constrains the lexical space. If we have an application that doesn't like leading zeros, we can use pattern facets instead of enumerations to define our datatype:

      <data type="byte">
        <param name="pattern">1|5|15</param>
      </data>

or:

 xsd:byte {pattern = "1|5|15"}

Here, we are still using the xsd:byte datatype with its associated semantics, but its lexical space is now constrained to accept only "1", "5", and "15", leaving out any variation that has the same value but a different lexical representation.
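As a rough illustration of this combination, here is a minimal Python sketch. It assumes that Python's re.fullmatch is an adequate stand-in for the way a pattern facet is matched for this very simple expression; the function name is hypothetical, whitespace handling is ignored, and a real RELAX NG processor delegates this work to its datatype library.

```python
import re

# The pattern facet from the example, applied to the lexical form.
PATTERN = re.compile(r"1|5|15")


def parse_restricted_byte(text):
    """Return the byte value if text satisfies the pattern, else None."""
    if not PATTERN.fullmatch(text):
        return None  # rejected in the lexical space, e.g. "015"
    value = int(text)  # the datatype is still a byte with a numeric value
    return value if -128 <= value <= 127 else None
```

Here "15" yields the number 15, whereas "015" is rejected even though it denotes the same value.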

Tip

A pattern facet must match the entire string. This is an important difference from Perl regular expressions, on which W3C XML Schema pattern facets are based. A Perl expression such as /15/ matches any string containing "15", while the W3C XML Schema pattern facet 15 matches only the string equal to "15". The Perl expression equivalent to this pattern facet is thus /^15$/.
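The same contrast can be reproduced with Python's re module, used here only as a stand-in: re.search behaves like the unanchored Perl match, and re.fullmatch approximates the implicit anchoring of a pattern facet. The two helper names are hypothetical, and Python's expression syntax is not identical to that of W3C XML Schema.

```python
import re


def perl_style_match(expression, text):
    """Unanchored: succeeds if the expression matches anywhere in text."""
    return re.search(expression, text) is not None


def facet_style_match(expression, text):
    """Anchored: succeeds only if the expression matches all of text."""
    return re.fullmatch(expression, text) is not None


perl_style_match("15", "0150")   # True: "0150" contains "15"
facet_style_match("15", "0150")  # False: the whole string is not "15"
facet_style_match("15", "15")    # True
```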

This example has been carefully chosen to avoid using any of the meta characters used within pattern facets, other than the "|" that separates the alternatives. These meta characters are: ".", "\", "?", "*", "+", "{", "}", "(", ")", "[", and "]". We will see the meaning of these characters later in this chapter; for the moment, we just need to know that each of them (and "|" as well) needs to be "escaped" by a leading "\" to be used as a literal. For instance, to define a similar datatype for a decimal whose lexical space is limited to "1" and "1.5", we write:

      <data type="decimal">
        <param name="pattern">1|1\.5</param>
      </data>

or:

 xsd:decimal {pattern = "1|1\.5"}

A common source of errors is escaping "normal" characters, which must not be escaped: we will see later that a leading "\" changes their meaning. For instance, "\p" does not match the character "p": it introduces a category escape, so that "\p{P}" matches any Unicode punctuation character.
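Both effects of the backslash can be tried out with Python's re module, again only as an approximation of pattern facets. The helper name is hypothetical, and because Python's re does not support the "\p{...}" category escapes, the sketch uses "\d" (any digit) to show an escaped "normal" character.

```python
import re


def fully_matches(expression, text):
    """Return True if the expression matches the whole of text."""
    return re.fullmatch(expression, text) is not None


# Escaping a meta character turns it into a literal.
fully_matches(r"1|1\.5", "1.5")  # True: "\." is a literal dot
fully_matches(r"1|1\.5", "1x5")  # False
fully_matches(r"1|1.5", "1x5")   # True: an unescaped "." matches any character

# Escaping a "normal" character changes its meaning.
fully_matches(r"d", "d")         # True: plain "d" is the letter itself
fully_matches(r"\d", "d")        # False: "\d" means "a digit"
fully_matches(r"\d", "7")        # True
```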


All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.