by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)


UML

Unified Modeling Language (or UML) is an Object Management Group (OMG) standard and a successor to many of the object-oriented methods developed in the 1980s and 1990s. The idea of using UML to model XML documents isn't new. Much that is good has already been published on the subject (see, for instance, the book Modeling XML Applications with UML by David Carlson (Addison Wesley) and his articles on XML.com).

UML and XML can be mapped at two different levels: UML can describe the instance documents themselves, or it can describe the schema that constrains them.

One point that emerges clearly from all the work on this topic is that it is quite easy to map UML objects into XML or to use UML to describe classes of instance documents. The most difficult issue in doing so is that UML operates on graphs, while XML is a tree structure. Some links must be either removed or serialized using a linking technique to make the mapping happen cleanly (you can use XLink, but it isn't built into XML 1.0). Except for this issue, the relationship between UML and XML is quite natural in both directions: UML provides a simple language to model XML documents, and XML provides a natural serialization syntax for UML objects.
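One common way to serialize a link is to write each shared object once and point to it by identifier. The following Python sketch illustrates the idea with the standard library only. The Author and Book classes, the sample data, and the author-ref element name are hypothetical, chosen for this illustration; they aren't part of the library model used in this chapter.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Author:
    id: str
    name: str


@dataclass
class Book:
    id: str
    title: str
    authors: list = field(default_factory=list)


def serialize(books):
    """Flatten an object graph into a tree: each author is written once,
    and books point to their authors through an identifier."""
    library = ET.Element("library")
    seen = set()
    for book in books:
        for author in book.authors:
            if author.id not in seen:  # emit each shared object only once
                seen.add(author.id)
                a = ET.SubElement(library, "author", id=author.id)
                ET.SubElement(a, "name").text = author.name
    for book in books:
        b = ET.SubElement(library, "book", id=book.id)
        ET.SubElement(b, "title").text = book.title
        for author in book.authors:
            # the graph edge becomes a reference instead of a nested copy
            ET.SubElement(b, "author-ref", ref=author.id)
    return library


# Two books share the same author object (hypothetical sample data).
shared = Author("a1", "Example Author")
books = [Book("b1", "First Title", [shared]),
         Book("b2", "Second Title", [shared])]
print(ET.tostring(serialize(books), encoding="unicode"))
```

The author appears once in the output, and both books refer to it through the ref attribute. Nothing in XML 1.0 itself gives that attribute a linking meaning; it is up to the application or the schema to interpret it.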

Another point concerning XML and UML is that it's not simple to generate DTDs and W3C XML schemas from UML, because you have to cope with the restrictions of these languages, notably those related to unordered content models. Unordered content models are a natural fit for UML, in which the attributes of a class are unordered. The limitations of DTDs and W3C XML schemas therefore create problems when UML attributes are serialized as XML elements.

The issue when modeling W3C XML schemas in UML is that the model needs to describe the XML instances and the schema itself. This is where all the complexity of W3C XML schemas enters the UML world. While there is a good overlap between UML and XML, the overlap isn't so good between XML and W3C XML schemas. W3C XML schemas have in some ways enriched XML with their own expectations, and their expectations don't match those of UML. Figure 14-4 shows how the overlaps work and don't work.

Figure 14-4. Overlaps between XML, UML, and W3C XML schema

With RELAX NG, by contrast, the overlap between XML and the schema language is nearly perfect: RELAX NG can describe almost any XML structure. Because it has no notion of a Post Schema Validation Infoset (PSVI), RELAX NG doesn't add anything to XML. As a result, the overlap between UML, XML, and RELAX NG is almost as big as the overlap between UML and XML, as shown in Figure 14-5.

Figure 14-5. Overlaps between UML and RELAX NG

Designed with a UML editor such as ArgoUML, our library can be pictured as the model shown in Figure 14-6.

Figure 14-6. A UML model for the library

This example uses conventions that may look natural but are far from being official. For instance, I have prefixed attribute names with @, an idea borrowed from Will Provost's work on XML.com. Also, to model the title element with its text node and attribute, I have used the name rng:data to identify its text content as a UML attribute.

ArgoUML saves its documents using the XML Metadata Interchange (XMI) format defined by the Object Management Group (OMG). (You can find more information about XMI at http://www.omg.org/technology/xml/.) XMI is verbose; the XMI document generated by ArgoUML for this diagram is more than 800 lines long. I won't include it here, but it's not difficult to generate a schema from this document with unordered content models, such as:

 <?xml version="1.0" encoding="utf-8"?>
 <grammar xmlns="http://relaxng.org/ns/structure/1.0"
          datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
   <start>
     <element name="library">
       <interleave>
         <zeroOrMore>
           <element name="book">
             <interleave>
               <element name="isbn">
                 <data type="token"/>
               </element>
               <attribute name="id">
                 <data type="token"/>
               </attribute>
               <attribute name="available">
                 <data type="boolean"/>
               </attribute>
               <zeroOrMore>
                 <element name="author">
                   <interleave>
                     <attribute name="id">
                       <data type="token"/>
                     </attribute>
                     <element name="name">
                       <data type="token"/>
                     </element>
                     <element name="born">
                       <data type="date"/>
                     </element>
                     <element name="died">
                       <data type="date"/>
                     </element>
                   </interleave>
                 </element>
               </zeroOrMore>
               <zeroOrMore>
                 <element name="character">
                   <interleave>
                     <attribute name="id">
                       <data type="token"/>
                     </attribute>
                     <element name="name">
                       <data type="token"/>
                     </element>
                     <element name="born">
                       <data type="date"/>
                     </element>
                     <element name="qualification">
                       <data type="token"/>
                     </element>
                   </interleave>
                 </element>
               </zeroOrMore>
               <element name="title">
                 <attribute name="xml:lang">
                   <data type="language"/>
                 </attribute>
                 <data type="token"/>
               </element>
             </interleave>
           </element>
         </zeroOrMore>
       </interleave>
     </element>
   </start>
 </grammar>

or, after conversion into the compact syntax with Trang:

 start =
   element library {
     element book {
       element isbn { xsd:token }
       & attribute id { xsd:token }
       & attribute available { xsd:boolean }
       & element author {
           attribute id { xsd:token }
           & element name { xsd:token }
           & element born { xsd:date }
           & element died { xsd:date }
         }*
       & element character {
           attribute id { xsd:token }
           & element name { xsd:token }
           & element born { xsd:date }
           & element qualification { xsd:token }
         }*
       & element title {
           attribute xml:lang { xsd:language },
           xsd:token
         }
     }*
   }

The only trouble I've had with RELAX NG itself comes from one of its few restrictions, which was mentioned in Chapter 7: data patterns can't be interleaved. When generating this schema, you must be careful to treat complex-type simple-content models (i.e., elements such as the title element, which accepts attributes and text nodes but no child elements) as an exception; this is why the content of title is written as an ordered group rather than an interleave. This straight translation is of course impossible with W3C XML schemas, because of the cardinality of the character and author subelements: these elements can occur more than once, which the unordered content model of W3C XML Schema 1.0 doesn't allow. Containers need to be added to fit the limitations of that language.
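The exception for text content can be sketched in a few lines of Python. This isn't the transformation I ran against the XMI document; it is a minimal illustration that works on a hand-written model. The Member class and the pattern and content functions are hypothetical names, and the sketch assumes that a class with an rng:data member otherwise has only attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str            # "@id" for an attribute, "rng:data" for text content
    datatype: str = ""   # e.g. "token"; left empty for a child class
    children: list = field(default_factory=list)  # members of a child class
    many: bool = False   # True for a 0..* association


def pattern(member, indent=""):
    """Return the compact-syntax pattern for one member of the model."""
    if member.name == "rng:data":
        return f"{indent}xsd:{member.datatype}"
    if member.name.startswith("@"):
        return f"{indent}attribute {member.name[1:]} {{ xsd:{member.datatype} }}"
    if not member.children:
        return f"{indent}element {member.name} {{ xsd:{member.datatype} }}"
    body = content(member.children, indent + "  ")
    star = "*" if member.many else ""
    return f"{indent}element {member.name} {{\n{body}\n{indent}}}{star}"


def content(members, indent):
    """Interleave the members, unless one of them is text content."""
    if any(m.name == "rng:data" for m in members):
        # Data patterns can't be interleaved: fall back to an ordered
        # group, with the attributes first and the data pattern last.
        members = sorted(members, key=lambda m: m.name == "rng:data")
        separator = ",\n"
    else:
        separator = " &\n"
    return separator.join(pattern(m, indent) for m in members)


title = Member("title", children=[
    Member("rng:data", "token"),
    Member("@xml:lang", "language"),
])
book = Member("book", many=True, children=[
    Member("isbn", "token"),
    Member("@id", "token"),
    title,
])
schema = pattern(book)
print(schema)
```

For this model, the members of book are joined with the & operator, while the content of title comes out as a group in which the xml:lang attribute is followed by a comma and the xsd:token data pattern.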

Note that I've generated a Russian-doll design; depending on the strategy used in the translation, I can generate other designs as well.
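As an illustration of another strategy, the sketch below emits a flat design, with one named pattern per class instead of nested definitions. It is a simplified, hypothetical example: the flatten function, the dictionary-based model, and the -element naming convention are assumptions made for this illustration. It also treats every child class as a 0..* association, ignores the text-content exception discussed above, and would overwrite a definition if two classes shared the same name.

```python
def flatten(name, members, definitions):
    """Add a named pattern for element `name` to `definitions` and return
    the name of that pattern. `members` maps a member name to a datatype
    name (str) or to a nested dict describing a child class."""
    parts = []
    for member, spec in members.items():
        if isinstance(spec, dict):
            # child class: define it separately and refer to it by name
            parts.append(flatten(member, spec, definitions) + "*")
        elif member.startswith("@"):
            parts.append(f"attribute {member[1:]} {{ xsd:{spec} }}")
        else:
            parts.append(f"element {member} {{ xsd:{spec} }}")
    ref = f"{name}-element"
    definitions[ref] = f"element {name} {{ {' & '.join(parts)} }}"
    return ref


# A cut-down version of the library model (hypothetical input format).
model = {
    "book": {
        "isbn": "token",
        "@id": "token",
        "author": {"@id": "token", "name": "token"},
    }
}
definitions = {}
start = flatten("library", model, definitions)
lines = [f"start = {start}"]
lines += [f"{name} = {body}" for name, body in definitions.items()]
grammar = "\n".join(lines)
print(grammar)
```

The result describes the same documents as the corresponding Russian-doll schema, but each element definition can now be referenced from several places.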


This text is released under the Free Software Foundation GFDL.