RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
You are welcome to use our annotation system to give your feedback.
In plain English, we could describe the document shown in Example 3-1 as having:
one library element composed of one or more
book elements having
id and available attributes and
an isbn element composed of text
a title element with an xml:lang attribute and a text node
one or more author elements with
an id attribute
a name element
an optional born element
an optional died element
zero or more character elements with
an id attribute
a name element
an optional born element
a qualification element.
The good news - and what makes RELAX NG so easy to learn - is that in its simplest form, RELAX NG is pretty much a way to formalize the previous statements with simple matching rules. Terms described in the plain English document above have matching terms in the RELAX NG Schema document that look a lot like XML.
A "library element" matches <element name="library">...</element>
An "id attribute" matches <attribute name="id"/>
"One or more" matches <oneOrMore>...</oneOrMore>
"Zero or more" matches <zeroOrMore>...</zeroOrMore>
"Text" matches <text/>
"Optional" matches <optional>...</optional>
We've seen in "Chapter 2: Simple Is Beautiful" that almost every XML structure is naturally a pattern for RELAX NG. Further, each of these RELAX NG elements is a pattern. Therefore, each RELAX NG pattern matches a structure from the XML document. Let's now spend some time examining each of these basic patterns.
The text pattern is the simplest; it matches a text node. More precisely, it matches zero or more text nodes. As we'll see in "Chapter 6: More Patterns", the text pattern may also be used to define mixed content models, that is, elements that may have both child elements and text nodes. For now, though, we can think of text as simply matching a text node.
As attribute values contain text, the text pattern can also match any attribute value. (The W3C XML Infoset doesn't consider attribute values to be nodes, but RELAX NG does.)
The RELAX NG XML expression for text patterns is just:
<text/>
Not surprisingly, the attribute pattern matches attributes from an XML instance document. The name of the attribute is defined in the name attribute of the attribute pattern. The content of an attribute is defined as a child element of the attribute pattern.
To define the id attribute, we could write:
<attribute name="id"> <text/> </attribute>
With this brief example you can see how the definitions given above apply. The attribute's name, id, is defined within the name attribute. The content, text, is in a child element.
This would read as: "an attribute named id with a text value". Since any attribute can have a value, the text pattern is assumed, so writing out <text/> is not required. Thus, this definition is strictly equivalent to this shorter one:
<attribute name="id"/>
The last thing we need to mention about the attribute pattern is that while the name of the attribute is normally defined by the name attribute of the attribute pattern, it is also possible to define sets of possible names for an attribute. This feature will be explained in detail in Chapter 12: Writing Extensible Schemas.
Just as the attribute pattern matches attributes, the element pattern matches elements. To define the name element, we will write:
<element name="name"> <text/> </element>
As with the attribute pattern, it is possible to replace the name attribute of the element pattern with a set of names. This will be explained in detail in Chapter 12: Writing Extensible Schemas.
Unlike attributes, not all elements accept text nodes. For that reason, the text pattern isn't implicitly assumed for elements. In fact there is no implicit content for elements. The content of each element must be explicitly described, even if the description shows that the element is always empty.
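For instance, an element that must always be empty can be described with the empty pattern. This is a minimal sketch: the element name br is hypothetical and does not appear in our library example, and the namespace declaration is included only so that the fragment stands on its own as a schema.

```xml
<!-- "br" is a hypothetical element name used only for illustration. -->
<element name="br" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The content is stated explicitly, even though it is empty. -->
  <empty/>
</element>
```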
The fact that a text pattern matches zero or more text nodes means that the definition of the name element above would also match empty elements such as:
<name/>
as well as non-empty elements such as:
<name>Charles M Schulz</name>
Text nodes can be constrained further. In Chapter 7: Constraining Text Values, we will see how to add restrictions to text nodes to avoid empty elements if necessary. In Chapter 8: Datatype Libraries, we'll learn how to use the datatypes from W3C XML Schema to add more specific restrictions, such as requiring a date or a number.
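As a preview of those two chapters, here is a minimal sketch of what such restrictions might look like. It assumes the W3C XML Schema datatype library, identified by the URI http://www.w3.org/2001/XMLSchema-datatypes, and uses the data and param patterns that the later chapters introduce. The minLength value is an illustrative choice, not something taken from our example.

```xml
<!-- Sketch only: the "data" and "param" patterns are covered in later chapters. -->
<element name="author" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="id"/>
  <!-- A name that must contain at least one character. -->
  <element name="name">
    <data type="token">
      <param name="minLength">1</param>
    </data>
  </element>
  <!-- A birth date that must be a valid W3C XML Schema date. -->
  <optional>
    <element name="born">
      <data type="date"/>
    </element>
  </optional>
</element>
```

With this definition, an empty name element would be rejected, whereas the plain text pattern accepts it.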
Attributes can be added within elements. To define the title element we will write:
<element name="title">
  <attribute name="xml:lang"/>
  <text/>
</element>
Here we had to define an attribute (xml:lang) from the XML namespace. We will cover namespace support in Chapter 11: Namespaces, but we can already see how straightforward it is: we described this attribute simply by using xml:lang as the name of the attribute. The xml prefix is predeclared to refer to the XML namespace, http://www.w3.org/XML/1998/namespace, so this URI doesn't need to be written out. For other namespaces, however, we would need to declare the namespace using the mechanisms described in Chapter 11: Namespaces.
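To give a rough idea of the difference, the sketch below adds a second attribute from a namespace other than the XML namespace. The prefix ex, the URI http://example.org/ns/annotations and the attribute name source are all hypothetical. The point is only that, unlike xml, this prefix has to be declared with an ordinary namespace declaration before it can be used in a name.

```xml
<!-- The "ex" prefix, its URI and the "source" attribute are hypothetical. -->
<element name="title" xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:ex="http://example.org/ns/annotations">
  <!-- The xml prefix needs no declaration. -->
  <attribute name="xml:lang"/>
  <!-- Any other prefix must be declared, here on the enclosing element. -->
  <attribute name="ex:source"/>
  <text/>
</element>
```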
Note that RELAX NG is clever enough to know that attributes are always located in the start tag of XML elements and that the order in which they are written is not considered significant. This means that the attribute pattern can be located anywhere in the definition of an element. It would not make a difference if we had written:
<element name="title">
  <attribute name="xml:lang"/>
  <text/>
</element>
as we did before, or if we had placed the attribute pattern after the text pattern, like this:
<element name="title">
  <text/>
  <attribute name="xml:lang"/>
</element>
In addition to text nodes and attributes, elements can also include child elements. We can define the "author" element this way:
<element name="author">
  <attribute name="id"/>
  <element name="name">
    <text/>
  </element>
  <element name="born">
    <text/>
  </element>
  <element name="died">
    <text/>
  </element>
</element>
That's not exactly the definition we want, since we want the born and died elements to be optional. To make this happen, we need to introduce a new pattern:
The optional pattern makes its content just that, optional; the element doesn't have to be there. To specify that the born and died elements are optional, we will write:
<optional>
  <element name="born">
    <text/>
  </element>
</optional>
<optional>
  <element name="died">
    <text/>
  </element>
</optional>
Note that the markup and meaning are different from
<optional>
  <element name="born">
    <text/>
  </element>
  <element name="died">
    <text/>
  </element>
</optional>
And also different from
<optional>
  <element name="born">
    <text/>
  </element>
  <optional>
    <element name="died">
      <text/>
    </element>
  </optional>
</optional>
In the first case, each element is embedded in its own optional pattern. The two elements are thus independently optional: valid instance documents can include both of them, neither of them, or just one of them.
In the second case, both elements are embedded in the same optional pattern. Thus instance documents can include either both of them or neither of them.
In the third case, the outer optional pattern contains the born element and an optional died element. As before, an instance document can contain both elements or neither, and the born element can also appear alone. The died element, however, can't appear without the born element, because of the way the patterns are nested.
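To make the difference concrete, the following sketch lists the four possible combinations of born and died and notes which of the three patterns accepts each one. The samples and case wrapper elements are hypothetical, added only to keep the fragment well-formed, and the dates are sample values.

```xml
<!-- "samples" and "case" are hypothetical wrappers; the dates are sample values. -->
<samples>
  <!-- Neither element: accepted by all three patterns. -->
  <case/>
  <!-- Both elements: accepted by all three patterns. -->
  <case><born>1922-11-26</born><died>2000-02-12</died></case>
  <!-- born only: accepted by the first and third patterns, not by the second. -->
  <case><born>1922-11-26</born></case>
  <!-- died only: accepted by the first pattern only. -->
  <case><died>2000-02-12</died></case>
</samples>
```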
None of these combinations is "right" or "wrong"; they are just different pattern combinations that allow different element combinations in the instance document. What's nice about RELAX NG is that it imposes so few restrictions that almost any combination is allowed. There are a few restrictions, but we don't need to think about them until they're covered in Chapter 15: Simplification And Restrictions.
The oneOrMore pattern specifies, as you might have guessed, that its content may appear one or more times. The use case for oneOrMore in our example is to define that a book must have one or more authors:
<oneOrMore>
  <element name="author">
    <attribute name="id"/>
    <element name="name">
      <text/>
    </element>
    <optional>
      <element name="born">
        <text/>
      </element>
    </optional>
    <optional>
      <element name="died">
        <text/>
      </element>
    </optional>
  </element>
</oneOrMore>
The last pattern needed in our example is zeroOrMore. As you will have guessed, it specifies that its content may appear zero or more times. In our example, it applies to the character elements:
<zeroOrMore>
  <element name="character">
    <attribute name="id"/>
    <element name="name">
      <text/>
    </element>
    <optional>
      <element name="born">
        <text/>
      </element>
    </optional>
    <element name="qualification">
      <text/>
    </element>
  </element>
</zeroOrMore>
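Putting the pieces together, one possible sketch of a complete schema follows the plain-English description given at the start of this section. The only addition is the declaration of the RELAX NG namespace, http://relaxng.org/ns/structure/1.0, on the root element, which a standalone schema document needs. Treat this as an illustration assembled from the fragments above rather than as the definitive schema for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch assembled from the patterns discussed in this section. -->
<element xmlns="http://relaxng.org/ns/structure/1.0" name="library">
  <oneOrMore>
    <element name="book">
      <attribute name="id"/>
      <attribute name="available"/>
      <element name="isbn">
        <text/>
      </element>
      <element name="title">
        <attribute name="xml:lang"/>
        <text/>
      </element>
      <!-- A book has at least one author. -->
      <oneOrMore>
        <element name="author">
          <attribute name="id"/>
          <element name="name">
            <text/>
          </element>
          <optional>
            <element name="born">
              <text/>
            </element>
          </optional>
          <optional>
            <element name="died">
              <text/>
            </element>
          </optional>
        </element>
      </oneOrMore>
      <!-- A book may have any number of characters, including none. -->
      <zeroOrMore>
        <element name="character">
          <attribute name="id"/>
          <element name="name">
            <text/>
          </element>
          <optional>
            <element name="born">
              <text/>
            </element>
          </optional>
          <element name="qualification">
            <text/>
          </element>
        </element>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```

The nesting of the schema mirrors the nesting of the instance document, which is why the plain-English outline translates so directly.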
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.