by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)

Flattening the First Schema

Table of Contents

5.1. Defining Named Patterns
5.2. Referencing Named Patterns
5.3. The grammar and start Elements
5.4. Assembling the Parts
5.5. Problems That Never Arise
5.6. Recursive Models
5.7. Escaping Named Pattern Identifiers in the Compact Syntax

If you look at the structure of the Russian doll-style schema, you'll see that it follows the structure of the instance document it applies to, as shown in Figure 3-1. Writing the first schema has pretty much been limited to inserting text, element, or attribute elements into the schema each time a text node, element, or attribute was encountered in the instance document. This method of creating schemas can be seen as a serialization of the XML infoset (i.e., of the structure available in the document) and could, therefore, be easily automated.


Automated serialization is the principle behind Examplotron, a program described in Chapter 14.

There are a couple of drawbacks to modeling documents with Russian doll-style schemas, however. First, they aren't modular and therefore become difficult to read and maintain when documents are large or complex. Second, they can't represent recursive (self-referencing) models. (Lists that may themselves contain lists are a common example of such a model.)

The lack of modularity can be seen in a document as simple as the first schema, shown in Example 3-1. There's a name element that uses the same model within both the character and author elements.

Figure 5-1 shows how, in the first schema, you need to give the definition of what name means in each context:

Figure 5-1. Two different definitions of name in the same schema

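
As a rough textual stand-in for the figure, the fragment below sketches how the two definitions might sit side by side in a Russian doll-style schema. It is not copied from Example 3-1: the enclosing book element is an assumption made for illustration, and the other children of author and character are left out.

```xml
<!-- Sketch only: "book" is an assumed wrapper, and the other
     children of author and character are omitted. -->
<element name="book" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="author">
    <!-- First definition of name -->
    <element name="name">
      <text/>
    </element>
  </element>
  <element name="character">
    <!-- Second, identical definition of name -->
    <element name="name">
      <text/>
    </element>
  </element>
</element>
```

The two name definitions are identical, yet nothing in the schema ties them together, so they can drift apart as soon as one of them is edited.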

You might think that the extra text won't make a difference, but that's not completely true. The duplication is harmless here because the definition of the name element is short. The principle is the same when the definition is complex, however: it must be repeated in full each time it is used. This redundancy makes maintenance of the schema more error-prone. If I need to update the definition of the name element, I'll have to update it as many times as it appears, and each copy gives me more room for mistakes. Common sense applies the same rules to XML schema languages as to any programming language. Limiting repetitive work makes developers happy!
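
To make the alternative concrete, here is a minimal sketch of the same content written with a named pattern, using the define, ref, grammar, and start elements that the sections of this chapter cover. The pattern name name-element and the enclosing book element are assumptions chosen for illustration, not names taken from the first schema.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- "book" is an assumed wrapper element -->
    <element name="book">
      <element name="author">
        <!-- Reference to the shared definition -->
        <ref name="name-element"/>
      </element>
      <element name="character">
        <!-- Same reference, so no second copy to maintain -->
        <ref name="name-element"/>
      </element>
    </element>
  </start>

  <!-- The name element is defined once; "name-element" is a hypothetical pattern name -->
  <define name="name-element">
    <element name="name">
      <text/>
    </element>
  </define>
</grammar>
```

With this layout, a change to the name element is made in one place and picked up by every reference.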

Another rule borrowed from programming languages concerns recursive models. Recursive models (models that reference themselves) are those like XHTML, in which, for example, div elements can be embedded within other div elements with no restriction on the number of levels of embedding. You could copy the definition of the div element again and again, but that's both inefficient and limiting: the schema would allow only as many levels as you had copied. We need a way to define and reference the content model of the div element recursively. In the course of this chapter, we'll examine cases of both modularity and recursive models.
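
The following sketch shows what such a recursive definition might look like once named patterns are available. It is deliberately simplified and is not the real XHTML content model: the div element here accepts only text and nested div elements, and the pattern name div-element is an assumption.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="div-element"/>
  </start>

  <!-- "div-element" is a hypothetical pattern name -->
  <define name="div-element">
    <element name="div">
      <zeroOrMore>
        <choice>
          <text/>
          <!-- The pattern refers to itself, so div may nest to any depth -->
          <ref name="div-element"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```

Because the self-reference sits inside the element pattern, a single definition covers any depth of nesting, which no amount of copying in a Russian doll-style schema can do.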

This text is released under the Free Software Foundation GFDL.