Co-occurrence constraints

RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)

Chapter 7: Constraining Text Values

Another, and considerably more frequent, use of value patterns is to define co-occurrence constraints, where the value of one node (often an attribute) changes the content model of another node (often an element). In our library, the author and character elements are very similar. We could group them under a single person element and use a type attribute to differentiate between the kinds of "person" being described. To make the example clearer and the difference between the two more visually obvious, I'm going to add some additional elements describing Peppermint Patty, creating an instance document that contains:

   <person id="CMS" type="author">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
    <dead>2000-02-12</dead>
   </person>

and

   <person id="PP" type="character">
    <name>Peppermint Patty</name>
    <born>1966-08-22</born>
    <qualification>bold, brash and tomboyish</qualification>
    <shoecolor>green</shoecolor>
    <hairstyle>thatched roof</hairstyle>
    <favoriteathlete>that black and white kid with the big nose</favoriteathlete>
    <likelycareer>olympic coach or unemployed gym teacher</likelycareer>
   </person>

You can see that both of these use the person element, yet because of the value of the type attribute, a different set of child elements is allowed. Support for this approach is a key area where RELAX NG offers more functionality than other schema languages. In this kind of schema, validation tools need to recognize that the content model may vary depending on the value of the type attribute. RELAX NG supports this feature using value patterns. (For brevity, the schemas that follow keep only the qualification element for characters.) If we want to require that all the authors precede the characters, we can just update the definitions of the elements describing authors and characters and keep them in sequence in the definition of the book element:

   <element name="book">
    <attribute name="id"/>
    <attribute name="available"/>
    <element name="isbn">
     <text/>
    </element>
    <element name="title">
     <attribute name="xml:lang"/>
     <text/>
    </element>
    <zeroOrMore>
     <element name="person">
      <attribute name="type">
       <value>author</value>
      </attribute>
      <attribute name="id"/>
      <element name="name">
       <text/>
      </element>
      <element name="born">
       <text/>
      </element>
      <optional>
       <element name="dead">
        <text/>
       </element>
      </optional>
     </element>
    </zeroOrMore>
    <zeroOrMore>
     <element name="person">
      <attribute name="type">
       <value>character</value>
      </attribute>
      <attribute name="id"/>
      <element name="name">
       <text/>
      </element>
      <element name="born">
       <text/>
      </element>
      <element name="qualification">
       <text/>
      </element>
     </element>
    </zeroOrMore>
   </element>

Or, using the compact syntax:

  element book {
   attribute id {text},
   attribute available {text},
   element isbn {text},
   element title {attribute xml:lang {text}, text},
   element person {
    attribute id {text},
    attribute type {"author"},
    element name {text},
    element born {text},
    element dead {text}?}*,
   element person {
    attribute id {text},
    attribute type {"character"},
    element name {text},
    element born {text},
    element qualification {text}}*
  }

The value patterns inside the type attributes of the two person declarations make the first declaration apply only to authors and the second apply only to characters.
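
To make the effect concrete, here is a hypothetical instance fragment (not taken from the library example) that the schema above should reject. The element content "cartoonist" is invented for illustration. The first person pattern fails because qualification is not allowed after born, and the second fails because the type attribute does not have the value "character":

```xml
<!-- Hypothetical invalid instance: matches neither person pattern -->
<person id="CMS" type="author">
 <name>Charles M Schulz</name>
 <born>1922-11-26</born>
 <qualification>cartoonist</qualification>
</person>
```

Changing the type attribute to "character" would make this fragment match the second pattern instead.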

	Warning
	While this is a powerful feature, it is unfortunately not one that will survive conversion to DTDs or W3C XML Schema. RELAX NG has fewer restrictions on the XML structures it can describe than either of those, as we'll see in Chapter 16: Determinism and Datatype Assignment. In general, RELAX NG's co-occurrence constraints cannot be expressed with W3C XML Schema because this type of schema is not "deterministic". Some co-occurrence constraints can be expressed in W3C XML Schema using either `xsi:type` when possible, or `xs:key` as a tricky hack. Neither works in the general case, and neither is easy to implement in a schema translator. For more information about this hack, see my book XML Schema.

However, if we choose to use only one person element, it is probably to build on the commonalities between authors and characters. We may prefer to allow the definitions of characters and authors to be mixed in any order. We can express this part of the schema as zero or more person elements having two possible definitions:

   <element name="book">
    <attribute name="id"/>
    <attribute name="available"/>
    <element name="isbn">
     <text/>
    </element>
    <element name="title">
     <attribute name="xml:lang"/>
     <text/>
    </element>
    <zeroOrMore>
     <element name="person">
      <choice>
       <group>
        <attribute name="type">
         <value>author</value>
        </attribute>
        <attribute name="id"/>
        <element name="name">
         <text/>
        </element>
        <element name="born">
         <text/>
        </element>
        <optional>
         <element name="dead">
          <text/>
         </element>
        </optional>
       </group>
       <group>
        <attribute name="type">
         <value>character</value>
        </attribute>
        <attribute name="id"/>
        <element name="name">
         <text/>
        </element>
        <element name="born">
         <text/>
        </element>
        <element name="qualification">
         <text/>
        </element>
       </group>
      </choice>
     </element>
    </zeroOrMore>
   </element>

Or, in the compact syntax (showing only the person element):

   element person {
    (
     attribute id {text},
     attribute type {"author"},
     element name {text},
     element born {text},
     element dead {text}?
    ) | (
     attribute id {text},
     attribute type {"character"},
     element name {text},
     element born {text},
     element qualification {text}
    )}
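
As a sketch of what this change allows, the following instance places a character before an author. The isbn and title values are only illustrative. It should be valid against the schema with the choice inside a single person element, while the earlier schema, which keeps all authors ahead of all characters, should reject it:

```xml
<!-- Illustrative instance: a character precedes an author -->
<book id="b0836217462" available="true">
 <isbn>0836217462</isbn>
 <title xml:lang="en">Being a Dog Is a Full-Time Job</title>
 <person id="PP" type="character">
  <name>Peppermint Patty</name>
  <born>1966-08-22</born>
  <qualification>bold, brash and tomboyish</qualification>
 </person>
 <person id="CMS" type="author">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
 </person>
</book>
```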

Now that we have put the two definitions of the content of the person element next to each other, we see that one attribute and the first two sub-elements are common and can be factored out to take advantage of this similarity. The definition of the person element can thus be simplified to:

     <element name="person">
      <attribute name="id"/>
      <element name="name">
       <text/>
      </element>
      <element name="born">
       <text/>
      </element>
      <choice>
       <group>
        <attribute name="type">
         <value>author</value>
        </attribute>
        <optional>
         <element name="dead">
          <text/>
         </element>
        </optional>
       </group>
       <group>
        <attribute name="type">
         <value>character</value>
        </attribute>
        <element name="qualification">
         <text/>
        </element>
       </group>
      </choice>
     </element>

or:

   element person {
    attribute id {text},
    element name {text},
    element born {text},
    ((
     attribute type {"author"},
     element dead {text}?
    ) | (
     attribute type {"character"},
     element qualification {text}
    ))
   }

Note that in the compact syntax, we have had to use double parentheses to express our choice. This is because the operators used at any level must be homogeneous: you can't mix ",", "|", and "&" within the same level, because it would be ambiguous. The other thing to note is that, because we have been able to group the elements with the attribute that distinguishes the content models, we have been able to factor out the id attribute and the name and born elements while keeping the type attribute and its two possible values in the choice. This has been possible not only because the example was carefully prepared, but also because attribute patterns implicitly have interleave semantics, which lets us place an attribute either inside or outside the choice. Finally, we should note that this refactoring is just a syntactical variation. Even when such a simplification is impossible, the co-occurrence constraint can still be expressed, although more verbosely.
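
As a rough illustration of that flexibility in attribute placement, the following variant of the refactored person definition writes each type attribute pattern after the child element patterns in its group. This arrangement is an assumption made for the example rather than something from the original schema, but it should accept the same documents, since the position of an attribute pattern within a group does not constrain where the attribute appears in the instance:

```xml
<!-- Sketch: same constraint, with the attribute patterns written last in each group -->
<element name="person">
 <attribute name="id"/>
 <element name="name">
  <text/>
 </element>
 <element name="born">
  <text/>
 </element>
 <choice>
  <group>
   <optional>
    <element name="dead">
     <text/>
    </element>
   </optional>
   <attribute name="type">
    <value>author</value>
   </attribute>
  </group>
  <group>
   <element name="qualification">
    <text/>
   </element>
   <attribute name="type">
    <value>character</value>
   </attribute>
  </group>
 </choice>
</element>
```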

All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.