by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
Unified Modeling Language (or UML) is an Object Management Group (OMG) standard and a successor to many of the object-oriented methods developed in the 1980s and 1990s. The idea of using UML to model XML documents isn't new. Much that is good has already been published on the subject (see, for instance, the book Modeling XML Applications with UML by David Carlson (Addison Wesley) and his articles on XML.com).
There are two different levels at which UML and XML can be mapped:
UML can be used to model the structure of XML documents directly. XML schemas can be generated for the purpose of validating the documents, but they are provided as a convenience for application developers. At this level, the schema details don't matter to the UML model: the style and modularity of the generated schemas aren't their most important features. The algorithm for producing these schemas is focused on expressing validation rules that make the XML data match the UML diagram as closely as possible.
UML can be used to model an XML schema. The UML diagram is a higher-level view of the schema, and the schema by itself is the main delivery. The UML diagram needs to be able to control exactly how each schema structure is described. Specific stereotypes and parameters are often added to customize the level of control.
One point that emerges clearly from all the work on this topic is that it is quite easy to map UML objects into XML, or to use UML to describe classes of instance documents. The most difficult issue when doing so is that UML operates on graphs, whereas XML is a tree structure. Some links need to be either removed or serialized using linking techniques to make the mapping work cleanly (you can use XLink, but it isn't built into XML 1.0). Except for this issue, the relationship between UML and XML is quite natural in both directions: UML provides a simple language to model XML documents and XML provides a natural serialization syntax for UML objects.
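The graph-versus-tree problem can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Author` and `Book` classes, the placeholder data, and the `author-ref` element with its `ref` attribute are hypothetical names, not conventions defined in this chapter or by XLink. An author shared by two books is written once, and later occurrences are replaced by a reference to its `id`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Author:
    id: str
    name: str


@dataclass
class Book:
    id: str
    title: str
    authors: list = field(default_factory=list)


def serialize(books):
    """Build a <library> tree; a shared author is written once, then referenced by id."""
    library = ET.Element("library")
    written = set()  # ids of authors already serialized in full
    for book in books:
        book_el = ET.SubElement(library, "book", id=book.id)
        ET.SubElement(book_el, "title").text = book.title
        for author in book.authors:
            if author.id in written:
                # Second link to the same object: emit a reference, not a copy.
                # "author-ref" and "ref" are hypothetical names.
                ET.SubElement(book_el, "author-ref", ref=author.id)
            else:
                author_el = ET.SubElement(book_el, "author", id=author.id)
                ET.SubElement(author_el, "name").text = author.name
                written.add(author.id)
    return library


# Placeholder data: one author object linked from two books (a graph, not a tree).
shared = Author("a1", "Author One")
books = [Book("b1", "First Book", [shared]), Book("b2", "Second Book", [shared])]
print(ET.tostring(serialize(books), encoding="unicode"))
```

The alternative is to copy the author into each book, which keeps the document a pure tree at the cost of duplicated data.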
Another point concerning XML and UML is that it's not simple to generate DTDs and W3C XML Schemas from UML. When generating DTDs or W3C XML schemas from UML, you have to cope with the restrictions of these languages, notably those related to unordered content models. Unordered content models are a natural fit for UML, in which the attributes of a class are unordered. The limitations of DTDs and W3C XML Schemas create problems when UML attributes are serialized as XML elements.
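To give a feel for the cost of these restrictions, here is a minimal Python sketch (the function names are hypothetical and belong to no tool mentioned here). DTD content models offer only sequences (`,`) and choices (`|`), so "each of these elements exactly once, in any order" has to be expanded by hand. The sketch builds such a fragment and compares it with the equivalent RELAX NG compact-syntax interleave, using `text` as a stand-in for the element content.

```python
def dtd_unordered(names):
    """Return a DTD content-model fragment accepting each name once, in any order.

    The choice is made on the first element and the function recurses on the
    rest, so every branch of a choice starts with a different element name.
    """
    names = list(names)
    if len(names) == 1:
        return names[0]
    branches = []
    for i, name in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({name}, {dtd_unordered(rest)})")
    return "(" + " | ".join(branches) + ")"


def rnc_unordered(names):
    """Return the same constraint as a RELAX NG compact-syntax interleave."""
    return " & ".join(f"element {name} {{ text }}" for name in names)


children = ["name", "born", "died"]
print(dtd_unordered(children))
print(rnc_unordered(children))
```

The interleave grows linearly with the number of children, while the expanded DTD fragment grows factorially, and neither form yet covers repeated children such as `author`.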
The issue when modeling W3C XML schemas in UML is that the model needs to describe the XML instances and the schema itself. This is where all the complexity of W3C XML schemas enters the UML world. While there is a good overlap between UML and XML, the overlap isn't so good between XML and W3C XML schemas. W3C XML schemas have in some ways enriched XML with their own expectations, and their expectations don't match those of UML. Figure 14-4 shows how the overlaps work and don't work.
With RELAX NG, on the contrary, the overlap between XML and the schema language is nearly perfect: RELAX NG can describe almost any XML structure. As it has no notion of a Post Schema Validation Infoset (PSVI), RELAX NG doesn't want to add anything to XML. As a result, the overlap between UML, XML, and RELAX NG is almost as big as the overlap between UML and XML, as shown in Figure 14-5.
Designed with a UML editor such as ArgoUML, our library can be pictured as the model shown in Figure 14-6.
This example uses conventions that may look natural but are far from being official. For instance, I have prefixed attribute names with @, an idea borrowed from Will Provost's work on XML.com. Also, to model the title element with its text node and attribute, I have used the name rng:data to identify its text content as a UML attribute.
ArgoUML saves its documents using the XML Metadata Interchange (XMI) format defined by the Object Management Group (OMG). (You can find more information about XMI at http://www.omg.org/technology/xml/.) XMI is verbose; the XMI document generated by ArgoUML for this diagram is more than 800 lines long. I won't include it here, but it's not difficult to generate a schema from this document with unordered content models, such as:
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="library">
      <interleave>
        <zeroOrMore>
          <element name="book">
            <interleave>
              <element name="isbn">
                <data type="token"/>
              </element>
              <attribute name="id">
                <data type="token"/>
              </attribute>
              <attribute name="available">
                <data type="boolean"/>
              </attribute>
              <zeroOrMore>
                <element name="author">
                  <interleave>
                    <attribute name="id">
                      <data type="token"/>
                    </attribute>
                    <element name="name">
                      <data type="token"/>
                    </element>
                    <element name="born">
                      <data type="date"/>
                    </element>
                    <element name="died">
                      <data type="date"/>
                    </element>
                  </interleave>
                </element>
              </zeroOrMore>
              <zeroOrMore>
                <element name="character">
                  <interleave>
                    <attribute name="id">
                      <data type="token"/>
                    </attribute>
                    <element name="name">
                      <data type="token"/>
                    </element>
                    <element name="born">
                      <data type="date"/>
                    </element>
                    <element name="qualification">
                      <data type="token"/>
                    </element>
                  </interleave>
                </element>
              </zeroOrMore>
              <element name="title">
                <attribute name="xml:lang">
                  <data type="language"/>
                </attribute>
                <data type="token"/>
              </element>
            </interleave>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </start>
</grammar>
or, after conversion into the compact syntax with Trang:
start =
  element library {
    element book {
      element isbn { xsd:token }
      & attribute id { xsd:token }
      & attribute available { xsd:boolean }
      & element author {
          attribute id { xsd:token }
          & element name { xsd:token }
          & element born { xsd:date }
          & element died { xsd:date }
        }*
      & element character {
          attribute id { xsd:token }
          & element name { xsd:token }
          & element born { xsd:date }
          & element qualification { xsd:token }
        }*
      & element title {
          attribute xml:lang { xsd:language },
          xsd:token
        }
    }*
  }
The only trouble I've had with RELAX NG itself comes from one of its few restrictions, mentioned in Chapter 7: data patterns can't be interleaved. When generating this schema, you must be careful to treat complex-type simple-content models (i.e., elements such as the title element, which accept attributes and text nodes but no child elements) as an exception. This straight translation is, of course, impossible with W3C XML schemas because of the cardinality of the character and author subelements; containers need to be added to fit the limitations of that language.
Note that I've generated a Russian-doll design; depending on the strategy used in the translation, I can generate other designs as well.
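The translation rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it doesn't read XMI, the `UmlClass` structure and the `to_rnc` function are hypothetical, and the only conventions taken from the text are the `@` prefix for attributes and the `rng:data` name for text content. Classes that carry `rng:data` are emitted as a sequence rather than an interleave, which is the exception required by the restriction on data patterns.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """Hypothetical stand-in for a UML class read from a model."""
    name: str
    attributes: dict                               # UML attribute name -> datatype
    children: list = field(default_factory=list)   # (UmlClass, "", "?", "*" or "+")


def to_rnc(cls):
    """Return a Russian-doll RELAX NG compact pattern for cls."""
    if "rng:data" in cls.attributes:
        # Simple content: attributes plus text, no child elements allowed.
        extra = [n for n in cls.attributes if n != "rng:data" and not n.startswith("@")]
        if cls.children or extra:
            raise ValueError(f"{cls.name}: data can't be mixed with child elements")
        parts = [f"attribute {n[1:]} {{ xsd:{t} }}"
                 for n, t in cls.attributes.items() if n.startswith("@")]
        parts.append(f"xsd:{cls.attributes['rng:data']}")
        body = ", ".join(parts)        # sequence: data patterns can't be interleaved
    else:
        parts = []
        for n, t in cls.attributes.items():
            if n.startswith("@"):
                parts.append(f"attribute {n[1:]} {{ xsd:{t} }}")
            else:
                parts.append(f"element {n} {{ xsd:{t} }}")
        for child, cardinality in cls.children:
            parts.append(to_rnc(child) + cardinality)
        body = " & ".join(parts) or "empty"   # unordered content model
    return f"element {cls.name} {{ {body} }}"


title = UmlClass("title", {"@xml:lang": "language", "rng:data": "token"})
author = UmlClass("author", {"@id": "token", "name": "token"})
book = UmlClass("book", {"isbn": "token", "@id": "token"}, [(author, "*"), (title, "")])
print("start = " + to_rnc(book))
```

Because each class is expanded in place, the result is a Russian-doll schema; emitting one named pattern per class instead would be one way to produce a different design from the same model.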
This text is released under the Free Software Foundation GFDL.