by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


First Patterns

In plain English, the document shown in Example 3-1 can be described as having:

The good news, and what makes RELAX NG so easy to learn, is that in its simplest form RELAX NG is pretty much a way to formalize the previous statements with simple matching rules. The terms used in the plain English description have matching terms in the RELAX NG schema document that look a lot like XML:

You saw in Chapter 2 that almost every XML structure is a natural pattern for RELAX NG. Further, each RELAX NG element is a pattern; therefore, each RELAX NG pattern matches a structure from the XML document. Let's now spend some time examining each basic pattern.

The text Pattern

This pattern is the simplest: it matches a text node. More precisely, it matches zero or more text nodes. As you'll see in Chapter 6, the text pattern may also be used in the definition of mixed content models, that is, elements that may have both child elements and text nodes. For now, though, think of text as matching a text node.

Because attribute values contain text, the text pattern can also match any attribute value. (The W3C XML Infoset doesn't consider attribute values to be nodes, but RELAX NG does.)

The RELAX NG XML expression for text patterns is just:

 <text/>

The attribute Pattern

Not surprisingly, the attribute pattern matches attributes from an XML instance document. The name of the attribute is defined in the name attribute of the attribute pattern. The content of an attribute is defined as a child element of the attribute pattern.

To define the id attribute, you can write:

 <attribute name="id">
  <text/>
 </attribute>

In this brief example, you can see how the definitions given earlier apply here. The attribute's name, id, is defined within the name attribute. The content, text, is in a child element.

This example reads as: "an attribute named id with a text value." Since every attribute has a value, the text pattern is assumed when the attribute pattern has no child element, so writing out <text/> is not required. Thus, the previous definition is strictly equivalent to this shorter one:

  <attribute name="id"/>

The last thing to know about the attribute pattern is that while an attribute's name is usually defined by the name attribute of the attribute pattern, it is also possible to define a set of possible names for an attribute. This feature is explained in detail in Chapter 12.

The element Pattern

Just as the attribute pattern matches attributes, the element pattern matches elements. To define the name element, write:

 <element name="name">
  <text/>
 </element>

As with the attribute pattern, it is possible to replace the name attribute of the element pattern with a set of names. This practice is explained in detail in Chapter 12.

Unlike attributes, not all elements accept text nodes. For that reason, the text pattern isn't implicitly assumed for elements. In fact, there is no implicit content for elements. The content of each element must be explicitly described, even if the description shows that the element is always empty.
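
As an illustration of that last point, RELAX NG provides an empty pattern for elements that must have no content. The following is a minimal sketch; the separator element is hypothetical and isn't part of the example document:

 <element name="separator">
  <empty/>
 </element>

This definition accepts <separator/> but rejects a separator element that contains non-whitespace text or child elements.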

Because a text pattern matches zero or more text nodes, the previous definition of the name element also matches empty elements such as:

 <name/>

as well as elements such as:

 <name>Charles M Schulz</name>

You'll see in Chapter 7 how to restrict text nodes further, for instance to avoid empty elements if necessary. In Chapter 8, you'll learn how to use the datatypes from W3C XML Schema to add more specific restrictions, such as requiring a date or a number.

Attributes can be added within elements. To define the title element, write:

 <element name="title">
  <attribute name="xml:lang"/>
  <text/>
 </element>

You can see that an xml:lang attribute from the XML namespace has been defined. I will describe namespace support in Chapter 11, but here you can begin to see how straightforward it is. The attribute is described simply by using xml:lang as its name. The xml prefix is predeclared to refer to the XML namespace, http://www.w3.org/XML/1998/namespace, so that namespace URI doesn't need to be declared in the schema. For other namespaces, however, you need to declare the namespace using the mechanisms described in Chapter 11.

Note that RELAX NG is clever enough to know that attributes are always located in the start tag of XML elements and that the order in which they are written isn't considered significant. This means that the attribute pattern can be located anywhere in the definition of elements. It doesn't make a difference if you write:

 <element name="title">
  <attribute name="xml:lang"/>
  <text/>
 </element>

as before or if you switch the order of the attribute and text patterns like this:

 <element name="title">
  <text/>
  <attribute name="xml:lang"/>
 </element>
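
Either definition accepts the same instance documents. As a quick illustration (the title text and language code below are made up for the example), both orderings match this element:

 <title xml:lang="en">Being a Dog Is a Full-Time Job</title>

Note that neither ordering makes the attribute optional. Because the attribute pattern is not wrapped in anything that would relax it, a title element without an xml:lang attribute would be rejected by both definitions.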

In addition to text nodes and attributes, elements can also include child elements. You can define the author element this way:

 <element name="author">
  <attribute name="id"/>
  <element name="name">
   <text/>
  </element>
  <element name="born">
   <text/>
  </element>
  <element name="died">
   <text/>
  </element>
 </element>

That's not exactly the right definition, since we want the born and died elements to be optional. To make this happen, I need to introduce a new pattern: the optional pattern.

The optional Pattern

The optional pattern makes its content just that, optional; the element doesn't have to be there. To specify that the born and died elements are optional, write:

 <optional>
  <element name="born">
   <text/>
  </element>
 </optional>
 <optional>
  <element name="died">
   <text/>
  </element>
 </optional>

Note that the markup and meaning are different from:

 <optional>
  <element name="born">
   <text/>
  </element>
  <element name="died">
   <text/>
  </element>
 </optional>

And also different from:

 <optional>
  <element name="born">
   <text/>
  </element>
  <optional>
   <element name="died">
    <text/>
   </element>
  </optional>
 </optional>

In the first case, each element is embedded in its own optional pattern. The two elements are thus independently optional. I can include one, both, or none of them in valid instance documents.

In the second case, both elements are embedded in the same optional pattern. Thus I can include both elements or neither of them in instance documents.

In the third case, the outer optional pattern contains the born element followed by an optional died element. An instance document can contain neither element, the born element alone, or the born element followed by the died element. The died element can't be there without the born element because of the way the patterns are nested.
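
To make the differences concrete, here is a sketch of four possible author elements, each with a comment saying which of the three variants above would accept it. It assumes that the optional patterns take the place of the born and died definitions inside the author element shown earlier; the id values and dates are only illustrative.

 <!-- Neither element: accepted by all three variants -->
 <author id="a1">
  <name>Charles M Schulz</name>
 </author>

 <!-- born alone: accepted by the first and third variants -->
 <author id="a2">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
 </author>

 <!-- born and died: accepted by all three variants -->
 <author id="a3">
  <name>Charles M Schulz</name>
  <born>1922-11-26</born>
  <died>2000-02-12</died>
 </author>

 <!-- died alone: accepted by the first variant only -->
 <author id="a4">
  <name>Charles M Schulz</name>
  <died>2000-02-12</died>
 </author>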

None of these combinations is "right" or "wrong"; they are just different pattern combinations that allow different element combinations in the instance document. What's nice about RELAX NG is that almost any combination of patterns is allowed. There are a few restrictions, but you don't need to think about them until they're covered in Chapter 15.

The oneOrMore Pattern

The oneOrMore pattern specifies, as you might have guessed, that its content may appear one or more times. For example, to specify that a book must have one or more authors, write:

 <oneOrMore>
  <element name="author">
   <attribute name="id"/>
   <element name="name">
    <text/>
   </element>
   <element name="born">
    <text/>
   </element>
   <optional>
    <element name="died">
     <text/>
    </element>
   </optional>
  </element>
 </oneOrMore>

The zeroOrMore Pattern

The last pattern needed in our example is zeroOrMore. As you'll have figured out, it specifies that its content may appear zero or more times. The following definition allows any number of character elements, including none:

 <zeroOrMore>
  <element name="character">
   <attribute name="id"/>
   <element name="name">
    <text/>
   </element>
   <optional>
    <element name="born">
     <text/>
    </element>
   </optional>
   <element name="qualification">
    <text/>
   </element>
  </element>
 </zeroOrMore>
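
Putting the patterns from this chapter together, a complete schema for a simplified book might look like the following sketch. The enclosing book element, the xmlns declaration on the root (the RELAX NG namespace, http://relaxng.org/ns/structure/1.0), and the compact one-line formatting are assumptions made for illustration; the document in Example 3-1 may contain more than is modeled here.

 <element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- A title with a required xml:lang attribute -->
  <element name="title">
   <attribute name="xml:lang"/>
   <text/>
  </element>
  <!-- At least one author; died is optional -->
  <oneOrMore>
   <element name="author">
    <attribute name="id"/>
    <element name="name"><text/></element>
    <element name="born"><text/></element>
    <optional>
     <element name="died"><text/></element>
    </optional>
   </element>
  </oneOrMore>
  <!-- Any number of characters, including none; born is optional -->
  <zeroOrMore>
   <element name="character">
    <attribute name="id"/>
    <element name="name"><text/></element>
    <optional>
     <element name="born"><text/></element>
    </optional>
    <element name="qualification"><text/></element>
   </element>
  </zeroOrMore>
 </element>

An instance document along the following lines should be accepted by this sketch. All values other than the author's name are illustrative:

 <book>
  <title xml:lang="en">Being a Dog Is a Full-Time Job</title>
  <author id="a1">
   <name>Charles M Schulz</name>
   <born>1922-11-26</born>
   <died>2000-02-12</died>
  </author>
  <character id="c1">
   <name>Snoopy</name>
   <qualification>extroverted beagle</qualification>
  </character>
 </book>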

This text is released under the Free Software Foundation GFDL.