by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
This pattern-based approach is both new and old. It's new in the sense that the idea of patterns was first applied to XML in RELAX and now in RELAX NG. It's old because it adapts techniques and theories developed around regular expressions in the 1960s. The name "RELAX," which stands for REgular LAnguage for XML, hints at this lineage. ("NG" stands for Next Generation.) RELAX NG relies both on the strong mathematical theory underlying regular expressions and on additional work done by Murata Makoto, which adapts the mathematical concept of "hedges" to XML.
When I asked Murata Makoto, one of the fathers of RELAX NG, my first questions, he kindly pointed me to the details of his work. I was shocked to see that I had forgotten all the mathematics I had learned at school. I couldn't understand a word of it. Fortunately, I can assure you that you won't need to understand hedges or any of the other math behind RELAX NG. Nevertheless, it's very comforting to know that the schema language you are using has an elegant mathematical background. It ensures that the design will work, and work well. While the math behind it is difficult, the results it produces are surprisingly intuitive.
In keeping with the language's mathematical foundation, RELAX NG patterns are defined as logical operations performed on sets of XML structures. This gives the specification a formalism that leaves no room for ambiguous interpretation. That lack of ambiguity is a great help in ensuring that different implementations of RELAX NG interoperate.
The strong mathematical background of RELAX NG doesn't mean that implementers had to reinvent everything. On the contrary, the derivative algorithm used by James Clark in his Jing RELAX NG processor was inspired by work done in 1964 on the derivatives of regular expressions. It recursively removes the nodes found in the instance document from the patterns: the document is valid if whatever remains of the patterns after the last node is entirely optional.
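The idea of "removing nodes from the patterns" can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not Jing's actual code: it validates a flat sequence of element names rather than a real XML tree, and the class and function names (Empty, NotAllowed, Name, Group, Choice, ZeroOrMore, nullable, derive, matches) are hypothetical names chosen to loosely echo RELAX NG vocabulary.

```python
from dataclasses import dataclass

# Hypothetical pattern constructors for a toy, string-only model.

@dataclass(frozen=True)
class Empty:            # matches the empty sequence
    pass

@dataclass(frozen=True)
class NotAllowed:       # matches nothing at all
    pass

@dataclass(frozen=True)
class Name:             # matches exactly one element with this name
    name: str

@dataclass(frozen=True)
class Group:            # first pattern followed by second pattern
    first: object
    second: object

@dataclass(frozen=True)
class Choice:           # either alternative
    left: object
    right: object

@dataclass(frozen=True)
class ZeroOrMore:       # any number of repetitions, including none
    inner: object

def optional(p):
    """An optional pattern is a choice between the pattern and nothing."""
    return Choice(p, Empty())

def nullable(p):
    """True if the pattern can match the empty sequence (is 'optional')."""
    if isinstance(p, (Empty, ZeroOrMore)):
        return True
    if isinstance(p, (NotAllowed, Name)):
        return False
    if isinstance(p, Group):
        return nullable(p.first) and nullable(p.second)
    if isinstance(p, Choice):
        return nullable(p.left) or nullable(p.right)
    raise TypeError(f"unknown pattern: {p!r}")

def derive(p, name):
    """Return the pattern that remains after consuming one element name."""
    if isinstance(p, (Empty, NotAllowed)):
        return NotAllowed()
    if isinstance(p, Name):
        return Empty() if p.name == name else NotAllowed()
    if isinstance(p, Choice):
        return Choice(derive(p.left, name), derive(p.right, name))
    if isinstance(p, ZeroOrMore):
        return Group(derive(p.inner, name), p)
    if isinstance(p, Group):
        head = Group(derive(p.first, name), p.second)
        # If the first part may be skipped, the name may belong to the second.
        if nullable(p.first):
            return Choice(head, derive(p.second, name))
        return head
    raise TypeError(f"unknown pattern: {p!r}")

def matches(p, names):
    """Valid if what is left after the last name can match nothing more."""
    for n in names:
        p = derive(p, n)
    return nullable(p)

# A title, then any number of authors, then an optional isbn.
book = Group(Name("title"),
             Group(ZeroOrMore(Name("author")), optional(Name("isbn"))))
```

With this sketch, matches(book, ["title", "author", "author"]) succeeds, while matches(book, ["isbn"]) fails because the mandatory title is never consumed. A practical implementation would also simplify the intermediate patterns as it goes (for instance, collapsing a group that contains NotAllowed) so that they do not grow with every node; that step is left out here for brevity.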
Murata Makoto has adapted well-known finite state machine algorithms to cope with the level of nondeterminism that RELAX NG allows. He has used this approach, for example, to develop a RELAX NG validator that is lightweight enough to run on a mobile phone.
Besides the fact that RELAX NG can be implemented with well-known and well-documented algorithms, developers of RELAX NG processors also appreciate the simplicity of its underlying model. This simplicity should also help ensure strong interoperability between implementations, which is harder to achieve with some more complex schema languages.
This text is released under the Free Software Foundation GFDL.