Co-Occurrence Constraints

Another, and considerably more frequent, use of value patterns is to define co-occurrence constraints, in which the value of one node (often an attribute) changes the content model of another node (often an element). In our library, the author and character elements are very similar. You can group them under a single person element and use a type attribute to distinguish the kind of "person" being described. To make the difference between the two more visually obvious, I'm going to add some elements describing Peppermint Patty, creating an instance document that contains:

<person id="CMS" type="author">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <dead>2000-02-12</dead>
</person>

and:

<person id="PP" type="character">
 <name>Peppermint Patty</name>
 <born>1966-08-22</born>
 <qualification>bold, brash and tomboyish</qualification>
 <shoecolor>green</shoecolor>
 <hairstyle>thatched roof</hairstyle>
 <favoriteathlete>that black and white kid with the big nose</favoriteathlete> 
 <likelycareer>olympic coach or unemployed gym teacher</likelycareer>
</person>

Both examples use the person element, yet the value of the type attribute determines which set of child elements is allowed. Support for this approach is a key area in which RELAX NG offers more functionality than other schema languages. With this kind of schema, validation tools need to recognize that the content model varies with the value of the type attribute. RELAX NG supports this feature using value patterns. If you want to require that all the authors precede the characters, just update the definitions of the elements describing authors and characters and keep them in sequence in the definition of the book element:

<element name="book">
   <attribute name="id"/>
   <attribute name="available"/>
   <element name="isbn">
    <text/>
   </element>
   <element name="title">
    <attribute name="xml:lang"/>
    <text/>
   </element>
   <zeroOrMore>
    <element name="person">
     <attribute name="type">
      <value>author</value>
     </attribute>
     <attribute name="id"/>
     <element name="name">
      <text/>
     </element>
     <element name="born">
      <text/>
     </element>
     <optional>
     <element name="dead">
       <text/>
     </element>
     </optional>
    </element>
   </zeroOrMore>
   <zeroOrMore>
    <element name="person">
     <attribute name="type">
      <value>character</value>
     </attribute>
     <attribute name="id"/>
     <element name="name">
      <text/>
     </element>
     <element name="born">
      <text/>
     </element>
     <element name="qualification">
      <text/>
     </element>
     <optional>
      <element name="shoecolor">
       <text/>
      </element>
     </optional>
     <optional>
      <element name="hairstyle">
       <text/>
      </element>
     </optional>
     <optional>
      <element name="favoriteathlete">
       <text/>
      </element>
     </optional>
     <optional>
      <element name="likelycareer">
       <text/>
      </element>
     </optional>
    </element>
   </zeroOrMore>
  </element>

or, using the compact syntax:

element book {
  attribute id { text },
  attribute available { text },
  element isbn { text },
  element title {
   attribute xml:lang { text },
   text
  },
  element person {
   attribute type { "author" },
   attribute id { text },
   element name { text },
   element born { text },
   element dead { text }?
  }*,
  element person {
   attribute type { "character" },
   attribute id { text },
   element name { text },
   element born { text },
   element qualification { text },
   element shoecolor { text }?,
   element hairstyle { text }?,
   element favoriteathlete { text }?,
   element likelycareer { text }?
  }*
 }

The value patterns in the type attributes of the two person declarations make the first declaration apply only to authors, and the second only to characters.

	Warning
	While co-occurrence constraints provide powerful capabilities, they unfortunately don't survive conversion to DTDs or W3C XML Schema. RELAX NG has fewer restrictions on the XML structures it can describe than either of those, as you'll see in Chapter 16. In the general case, RELAX NG's co-occurrence constraints can't be expressed with W3C XML Schema, because this type of schema isn't "deterministic" in the sense that W3C XML Schema requires. Some co-occurrence constraints can be expressed in W3C XML Schema using either `xsi:type`, when possible, or `xs:key` as a tricky hack, but these methods don't cover the general case and aren't easy to implement in a schema translator. For more information about this hack, see my book XML Schema (O'Reilly).

The flexibility RELAX NG provides for defining co-occurrence constraints makes it a good tool for checking how styles are used in XHTML, OpenOffice, or Microsoft Office documents. For example, it's easy to constrain the XHTML class attributes so that a class "bar" is used only when embedded in a class "foo". This feature is useful for checking style best practices in text documents.
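
As a rough illustration of the "foo"/"bar" rule, here is a minimal sketch in the compact syntax. It is not a real XHTML schema: the vocabulary is reduced to body, p, and span, the pattern names (fooPara, plainPara, barSpan, plainSpan) are made up for this example, "embedded" is taken to mean "direct child," and each class attribute is assumed to hold a single class name rather than a list.

```rnc
# Hypothetical, heavily simplified vocabulary: body, p, and span only.
start = element body { (fooPara | plainPara)* }

# A paragraph of class "foo": the only place where "bar" spans may appear.
fooPara = element p {
  attribute class { "foo" },
  mixed { (barSpan | plainSpan)* }
}

# A paragraph without a class attribute: "bar" spans are not allowed here.
plainPara = element p {
  mixed { plainSpan* }
}

# A span of class "bar", referenced only from fooPara.
barSpan = element span {
  attribute class { "bar" },
  text
}

# A span without a class attribute, allowed in any paragraph.
plainSpan = element span { text }
```

Because barSpan is referenced only from fooPara, a span of class "bar" inside a paragraph that isn't of class "foo" matches no pattern and should be reported as invalid.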

However, the reason for choosing a single person element is to build on what authors and characters have in common, so I might prefer to allow authors and characters to be mixed. I can express this part of the schema as zero or more person elements having two possible definitions, such as:

<element name="book">
   <attribute name="id"/>
   <attribute name="available"/>
   <element name="isbn">
    <text/>
   </element>
   <element name="title">
    <attribute name="xml:lang"/>
    <text/>
   </element>
   <zeroOrMore>
    <element name="person">
     <choice>
      <group>
       <attribute name="type">
        <value>author</value>
       </attribute>
       <attribute name="id"/>
       <element name="name">
        <text/>
       </element>
       <element name="born">
        <text/>
       </element>
       <optional>
        <element name="dead">
         <text/>
        </element>
       </optional>
      </group>
      <group>
       <attribute name="type">
        <value>character</value>
       </attribute>
       <attribute name="id"/>
       <element name="name">
        <text/>
       </element>
       <element name="born">
        <text/>
       </element>
       <element name="qualification">
        <text/>
       </element>
       <optional>
        <element name="shoecolor">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="hairstyle">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="favoriteathlete">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="likelycareer">
         <text/>
        </element>
       </optional>
      </group>
     </choice>
    </element>
   </zeroOrMore>
</element>

or, in the compact syntax:

element book {
  attribute id { text },
  attribute available { text },
  element isbn { text },
  element title {
   attribute xml:lang { text },
   text
  },
  element person {
   (attribute type { "author" },
    attribute id { text },
    element name { text },
    element born { text },
    element dead { text }?)
   | (attribute type { "character" },
     attribute id { text },
     element name { text },
     element born { text },
     element qualification { text },
     element shoecolor { text }?,
     element hairstyle { text }?,
     element favoriteathlete { text }?,
     element likelycareer { text }?)
  }*
}

Now that you have seen the two definitions of the person element's content next to each other, you can see that the id attribute and the first two subelements are common to both and can be factored out. The definition of the person element can thus be combined and simplified to:

<element name="book">
   <attribute name="id"/>
   <attribute name="available"/>
   <element name="isbn">
    <text/>
   </element>
   <element name="title">
    <attribute name="xml:lang"/>
    <text/>
   </element>
   <zeroOrMore>
    <element name="person">
     <attribute name="id"/>
     <element name="name">
      <text/>
     </element>
     <element name="born">
      <text/>
     </element>
     <choice>
      <group>
       <attribute name="type">
        <value>author</value>
       </attribute>
       <optional>
        <element name="dead">
         <text/>
        </element>
       </optional>
      </group>
      <group>
       <attribute name="type">
        <value>character</value>
       </attribute>
       <element name="qualification">
        <text/>
       </element>
       <optional>
        <element name="shoecolor">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="hairstyle">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="favoriteathlete">
         <text/>
        </element>
       </optional>
       <optional>
        <element name="likelycareer">
         <text/>
        </element>
       </optional>
      </group>
     </choice>
    </element>
   </zeroOrMore>
</element>

or, in the compact syntax:

element book {
  attribute id { text },
  attribute available { text },
  element isbn { text },
  element title {
   attribute xml:lang { text },
   text
  },
  element person {
   attribute id { text },
   element name { text },
   element born { text },
   ((attribute type { "author" },
    element dead { text }?)
    | (attribute type { "character" },
     element qualification { text },
     element shoecolor { text }?,
     element hairstyle { text }?,
     element favoriteathlete { text }?,
     element likelycareer { text }?))
  }*
}

Note that in the compact syntax, I had to use double parentheses to express my choice, because the operators used at any level must be homogeneous. You can't mix commas, pipes, and ampersands within the same level, because such a mix would be ambiguous. Also, because I grouped the elements with the attribute that distinguishes the two content models, I can factor out the id attribute and the name and born elements, keeping the type attribute and its two possible values in the choice. This is possible not only because the example has been carefully prepared, but also because attribute patterns are implicitly given interleave semantics, which lets you place an attribute either inside or outside the choice. Finally, note that this refactoring is just a syntactic variation. Even when such a simplification is impossible, the co-occurrence constraint can still be expressed, although more verbosely.
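
The two points about parentheses and attribute placement can be seen in isolation in the following small sketch. It is only an illustration: the alias element is hypothetical and isn't part of the library vocabulary.

```rnc
# Rejected by a compact-syntax parser: "," and "|" at the same level.
#   element person {
#     attribute id { text }, element name { text } | element alias { text }
#   }

# Accepted: the inner parentheses make each level use a single operator.
# The id attribute is deliberately written after the choice; attributes are
# unordered, so this matches the same documents as listing it first.
element person {
  (element name { text } | element alias { text }),
  attribute id { text }
}
```

The same freedom of placement is what allows the type attribute to stay inside each branch of the choice while the id attribute is moved out in front of it.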
