RELAX NG by Eric van der Vlist will be published by O'Reilly & Associates (ISBN: 0596004214)
A pattern is ambiguous when a fragment of an instance document can be valid against more than one alternative in one of its choice patterns. RELAX NG allows ambiguous patterns, but they can be a problem for annotation and datatype assignment.
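As a rough illustration of this definition, the following Python sketch models a choice between two datatypes as a set of predicates and lists the alternatives that accept a given string. This is a toy model, not a RELAX NG processor: the names `ALTERNATIVES` and `matching_alternatives` are hypothetical, and the two predicates are simplified stand-ins for an integer datatype and a token datatype.

```python
import re

# Hypothetical stand-ins for the two alternatives of a choice pattern.
ALTERNATIVES = {
    "integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
    "token": lambda text: True,  # simplified: any string is accepted
}

def matching_alternatives(text):
    """Return the names of all alternatives that accept the text."""
    return [name for name, accepts in ALTERNATIVES.items() if accepts(text)]

# "42" is accepted by both alternatives, so this choice is ambiguous for it.
print(matching_alternatives("42"))   # ['integer', 'token']
print(matching_alternatives("abc"))  # ['token']
```

When more than one alternative matches, a tool that needs to assign a single datatype to the value has no unique answer, which is the difficulty the definition refers to.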
Specifying a namespace in include, externalRef or parentRef to give a namespace to grammars or patterns defined without a namespace is known as "chameleon design." This is because the imported grammar or pattern takes on the new namespace like a chameleon takes on the color of the environment in which it is placed.
In a regular expression, a character class is an atom that matches a set of characters. Character classes may be classical Perl character classes, Unicode character classes, or user-defined character classes.
A set of character classes designated by a single letter, for which upper- and lowercases of the same letter are complementary (for instance, "\d" is all the decimal digits, and "\D" is all the characters that are not decimal digits).
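The complementarity of "\d" and "\D" can be sketched with Python's `re` module. This is only an approximation: Python's regular expression dialect is not the W3C XML Schema dialect, although both inherit these classes from Perl. The function name `split_by_class` is hypothetical.

```python
import re

def split_by_class(text):
    """Split the characters of text into those matching \\d and those matching \\D."""
    digits = re.findall(r"\d", text)
    non_digits = re.findall(r"\D", text)
    return digits, non_digits

digits, non_digits = split_by_class("a1b22")
print(digits)      # ['1', '2', '2']
print(non_digits)  # ['a', 'b']
```

Every character lands in exactly one of the two lists, which is what "complementary" means here.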
A compositor is a pattern which can be used to combine other patterns. RELAX NG has three basic compositors: group, choice, and interleave. A fourth compositor, mixed, is a shortcut for interleave with an embedded text pattern.
A description of the structure of child elements and text nodes (independent of attributes). The content model is "simple" when there is a text node but no elements, "complex" when there are element nodes but no text, "mixed" when there are text and element nodes, and "empty" when there are neither text nor element nodes.
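The four cases can be sketched as a small classifier built on Python's `xml.etree.ElementTree`. The function name `content_model` is hypothetical, and the sketch assumes that whitespace-only text does not count as a text node, which is a simplification made only for this example.

```python
import xml.etree.ElementTree as ET

def content_model(element):
    """Classify an element's content; attributes are ignored."""
    has_elements = len(element) > 0
    # Text directly inside the element: its own text plus the tail of each child.
    chunks = [element.text or ""] + [child.tail or "" for child in element]
    # Assumption: whitespace-only text is treated as absent.
    has_text = any(chunk.strip() for chunk in chunks)
    if has_elements and has_text:
        return "mixed"
    if has_elements:
        return "complex"
    if has_text:
        return "simple"
    return "empty"

print(content_model(ET.fromstring("<title>RELAX NG</title>")))      # simple
print(content_model(ET.fromstring("<book><title/></book>")))        # complex
print(content_model(ET.fromstring("<p>Hello <b>world</b></p>")))    # mixed
print(content_model(ET.fromstring('<br class="x"/>')))              # empty
```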
A term used by RELAX NG to qualify the content of a simple content element or attribute. Datatypes should not be confused with XML 1.0 element types--those are called element names by RELAX NG.
A pattern is deterministic if a schema processor can always determine which path through the schema to follow by looking only at the current element under validation. Unlike W3C XML Schema, RELAX NG does not require deterministic patterns.
Document Object Model. An object-oriented model of XML documents, including the definition of the API allowing its manipulation. The third version of DOM (DOM Level 3) will include an API named "Abstract Schemas" to facilitate schema-guided editing of XML documents (see http://www.w3.org/TR/DOM-Level-3-Core).
Document Schema Definition Languages (DSDL) is a project undertaken by ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is "to create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology" (see http://dsdl.org).
Document Type Definition. XML 1.0 DTDs are inherited from SGML, where DTDs played a central role and included rules that allow the customization of the markup itself. Because of the syntactical rules included in their DTDs, SGML applications need a DTD to be able to read an SGML document. One of the simplifications of XML is to state that an XML parser should be able to read a document without needing a DTD. DTDs have therefore been simplified from their SGML ancestors and remain the first incarnation of what is today called an XML Schema language.
One of the basic types of nodes in the tree represented by an XML document. An element is delimited by start and end tags. In the corresponding tree, an element is a nonterminal node, which may have subnodes of type element, character (text), namespace, and attribute, as well as comment and processing instruction nodes.
Term used in the XML 1.0 Recommendation, which is equivalent to the notion of element names in W3C XML Schema and should not be confused with the simple or complex datatype of an element.
An element that has neither child elements nor text nodes (with or without attributes).
A constraint added to the lexical or value space of a simple datatype of the W3C XML Schema datatype system. The list of facets that can be used depends on the simple datatype. W3C XML Schema's facets can be used as parameters in RELAX NG data patterns.
A grammar is a pattern which acts as a container for a start pattern and any number of named patterns.
XML Information Set. A formal description of the information that may be found in a well-formed XML document.
An XML document that is a candidate for being validated by a schema. Any well-formed XML 1.0 document that conforms to the Namespaces in XML 1.0 Recommendation can be considered a valid or invalid instance document.
The set of all representations (after parsing and whitespace processing) allowed for a simple datatype.
The name of a component within its namespace. It is the part of the qualified name that comes after the namespace prefix.
The content of an element that contains both child element and text nodes.
Named patterns are globally defined in a grammar. These patterns may be referenced from anywhere in this grammar or in the child grammars.
A unique identifier that can be associated with a set of XML elements and attributes. This identifier is a URI, which is not required to point to an actual resource but must "belong" to the author of these elements and attributes. Since the full URI can't be included in the name of each element and attribute, a namespace prefix is assigned to the namespace URI by using a namespace declaration. This prefix is added to the local name of the elements and attributes to form a qualified name. Namespaces are optional and elements and attributes may have no namespaces attached.
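The relation between prefix, namespace URI, and local name can be sketched with Python's `xml.etree.ElementTree`, which reports element names in the form `{uri}local`. The namespace URI, the prefixes, and the function name `expanded_name` below are hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefixes, chosen only for this illustration.
doc_a = '<lib:book xmlns:lib="http://example.org/library"/>'
doc_b = '<l:book xmlns:l="http://example.org/library"/>'

def expanded_name(xml_text):
    """Return (namespace URI, local name) of the root element."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        uri, local_name = tag[1:].split("}", 1)
        return uri, local_name
    return None, tag  # the element has no namespace

print(expanded_name(doc_a))      # ('http://example.org/library', 'book')
print(expanded_name(doc_b))      # ('http://example.org/library', 'book')
print(expanded_name("<book/>"))  # (None, 'book')
```

The two documents use different prefixes but yield the same pair, since the prefix is only a local shorthand for the namespace URI.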
Any part of a RELAX NG schema that can be matched against a set of attributes and a sequence of elements and strings is a pattern. With the exception of name classes, all parts (including the whole schema) of a RELAX NG schema are patterns.
Regular expressions (or patterns) are composed of pieces. Each piece is itself composed of an atom describing a condition on a substring and an optional quantifier defining the expected number of occurrences of the atom.
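A single piece can be tried out with Python's `re` module, used here as an approximation of the W3C XML Schema dialect. In the sketch below, the names `PIECE` and `matches` are hypothetical; `re.fullmatch` is used because a W3C XML Schema pattern always has to match the whole string.

```python
import re

# The piece \d{2,4} is the atom \d (one decimal digit) followed by the
# quantifier {2,4} (between two and four occurrences of the atom).
PIECE = r"\d{2,4}"

def matches(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(PIECE, "123"))    # True
print(matches(PIECE, "1"))      # False: too few occurrences
print(matches(PIECE, "12345"))  # False: too many occurrences
```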
The complete name of a component, including the prefix associated with its target namespace if one is defined.
Recursive content models are content models in which elements can be included directly or indirectly within themselves (such as XHTML div or span elements).
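To make the idea of direct or indirect self-inclusion concrete, the following Python sketch measures how deeply elements of a given name are nested inside one another in an instance document. The function name `max_depth` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def max_depth(element, name):
    """Largest number of elements called `name` found on any path from
    `element` down to a leaf."""
    nested = max((max_depth(child, name) for child in element), default=0)
    return nested + (1 if element.tag == name else 0)

# A div containing a div, which in turn contains a p and another div.
doc = ET.fromstring("<div><div><p/><div/></div></div>")
print(max_depth(doc, "div"))  # 3
print(max_depth(doc, "p"))    # 1
```

A recursive content model places no fixed upper bound on this depth.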
Recursive patterns are named patterns which include direct or indirect references to themselves. RELAX NG only allows recursive patterns which describe recursive content models -- those for which the definition of the named pattern is isolated from its reference by an element pattern.
A syntax used to express conditions on strings. The syntax used by the W3C XML Schema for its patterns is very close to the syntax introduced by the Perl programming language. A regular expression is composed of elementary pieces.
A grammar-based XML Schema language developed by Murata Makoto and published in March 2000 as a Japanese ISO Standard (see http://www.xml.gr.jp/relax).
A grammar-based XML Schema language resulting from a merger between RELAX and TREX (see http://relaxng.org).
A schema where the definitions of elements and attributes are embedded one inside the other without using named patterns.
Simple API for XML. A streaming, event-based API used between parsers and applications. Its streaming nature means that pipelines of XML processing may be created using SAX (see http://www.saxproject.org).
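The event-based style can be sketched with the `xml.sax` module from the Python standard library. The handler class `NameCollector` is hypothetical; it simply records the name carried by each start-of-element event as the parser streams through the document.

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Record element names in the order their start tags are reported."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

handler = NameCollector()
xml.sax.parseString(b"<a><b/><c><b/></c></a>", handler)
print(handler.names)  # ['a', 'b', 'c', 'b']
```

The application never sees the document as a whole, only a sequence of events, which is what makes it possible to chain handlers into pipelines.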
A rule-based XML Schema language, developed by Rick Jelliffe, using XPath expressions to describe validation rules (see http://www.ascc.net/xml/resource/schematron/schematron.html).
Standard Generalized Markup Language. Created in 1980, the ancestor of XML. XML was designed as a simplified subset of SGML to be used on the Web.
An element has a simple content model when it has a child text node only (and no subelements). A simple content element has a simple type if it has no attributes, and it has a complex type if it has any attributes.
The action of simplifying and normalizing a RELAX NG schema to remove the syntactical variations and use only a few basic patterns and name classes.
A character that may be used as an atom after a "\" to accept a specific character, either for convenience or because this character is interpreted differently in the context of a regular expression.
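The effect of the escape can be seen with the "." character, which otherwise means "any character". The sketch below uses Python's `re` module as an approximation of the W3C XML Schema dialect; the names `matches`, `ESCAPED`, and `UNESCAPED` are hypothetical.

```python
import re

def matches(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

ESCAPED = r"a\.b"   # "\." accepts only a literal dot
UNESCAPED = r"a.b"  # "." accepts any single character

print(matches(ESCAPED, "a.b"))    # True
print(matches(ESCAPED, "axb"))    # False
print(matches(UNESCAPED, "axb"))  # True
```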
When a grammar is used to validate an instance document, its start pattern is matched against the root element of the instance document. When a grammar is embedded in another grammar, the embedded grammar is replaced by its start pattern during the simplification of the schema.
A grammar-based XML Schema language developed by James Clark (see http://www.thaiopensource.com/trex).
A pattern is unambiguous when any fragment of an instance document that is valid per this pattern is valid for precisely one of the choices in the schema. RELAX NG does not require the use of unambiguous patterns, but they can be considered good practice for annotation and datatype assignment, especially when conversion from RELAX NG to another schema language is necessary.
A set of characters classified by their "localization" (Latin, Arabic, Hebrew, Tibetan, and even Gothic or musical symbols).
A set of characters classified by their usage (letters, uppercase, digit, punctuation, etc.).
A set of character classes based on the Unicode blocks and categories.
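Unicode categories can be inspected with Python's `unicodedata` module, which returns the two-letter general category of a character. This sketch covers categories only, since the module does not expose block names, and it does not use regular expressions at all. The function name `general_category` is hypothetical.

```python
import unicodedata

def general_category(char):
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(char)

samples = {char: general_category(char) for char in "Aa7,"}
print(samples)  # {'A': 'Lu', 'a': 'Ll', '7': 'Nd', ',': 'Po'}
```

Here "Lu" is an uppercase letter, "Ll" a lowercase letter, "Nd" a decimal digit, and "Po" a punctuation character.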
Uniform Resource Identifier. Defined by RFCs 2396 and 2732. URIs were created to extend the notion of URLs (Uniform Resource Locators) to include abstract identifiers that do not necessarily need to "locate" a resource.
Uniform Resource Locator, a common identifier used on the Web. URLs are absolute when the full path to the resource is indicated, and relative when a partial path is given that needs to be evaluated relative to a base URL.
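Evaluating a relative URL against a base URL can be sketched with `urljoin` from the Python standard library. The URLs below are hypothetical examples.

```python
from urllib.parse import urljoin

# Hypothetical base URL of a schema and a relative reference found inside it.
base = "http://example.org/schemas/main.rng"
relative = "common/types.rng"

resolved = urljoin(base, relative)
print(resolved)  # http://example.org/schemas/common/types.rng
```

The last segment of the base path is dropped and the relative path is appended in its place.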
An XML document that is well-formed and conforms to a schema (RELAX NG, DTD, W3C XML Schema, etc.) of some kind.
The set of all the possible values for a simple datatype, independent of their actual representation in the instance documents.
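The difference between the lexical space and the value space can be sketched with Python's `Decimal` type, used here as a stand-in for a decimal datatype. The variable names are hypothetical.

```python
from decimal import Decimal

# Three distinct members of the lexical space...
lexical_forms = ["1", "1.0", "01.00"]

# ...that all denote the same member of the value space.
values = {Decimal(form) for form in lexical_forms}

print(len(set(lexical_forms)))  # 3
print(len(values))              # 1
```

Comparing the strings would treat the three forms as different, while comparing the values treats them as equal.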
World Wide Web Consortium. Originally created to settle HTML and HTTP as de facto standards. The main specification body for the core specifications of the World Wide Web and the keeper of the core XML specifications (see http://www.w3.org).
An XML document that meets the conditions defined in the XML 1.0 Recommendation: it must be readable without ambiguity. Syntax errors will be detected by an XML parser even without a schema of any kind.
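Since well-formedness is checked by the parser alone, it can be tested without any schema. A minimal sketch using Python's `xml.etree.ElementTree` follows; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True when the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<a><b></b></a>"))  # True
print(is_well_formed("<a><b></a>"))      # False: <b> is never closed
```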
Characters #x9 (tab), #xA (linefeed), #xD (carriage return), and #x20 (space). These are often used to indent the XML documents to make them more readable, and are filtered by an operation named "whitespace processing."
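One common form of whitespace processing is the "collapse" behavior defined by W3C XML Schema: each whitespace character is replaced by a space, runs of spaces are reduced to one, and leading and trailing spaces are removed. The sketch below is an illustration under that assumption, and the function name `collapse` is hypothetical.

```python
import re

def collapse(text):
    """Collapse runs of XML whitespace (#x9, #xA, #xD, #x20) to single
    spaces and trim both ends."""
    return re.sub(r"[\t\n\r ]+", " ", text).strip(" ")

print(repr(collapse("  a\t\nb  ")))  # 'a b'
```

Only the four XML whitespace characters are handled, so other Unicode space characters are left untouched.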
A W3C specification defining a general purpose inclusion mechanism for XML documents (see http://www.w3.org/TR/xinclude).
Extensible Markup Language. A subset of SGML created to be used on the Web. Its core specification (XML 1.0) was published by the W3C in February 1998. New XML specifications have been added since this date, and the W3C considers that, with the addition of W3C XML Schema, the core specifications are now complete.
A query language used to identify a set of nodes within an XML document. Originally defined to be used with XSLT, it is also used by other specifications such as Schematron, XPointer, W3C XML Schema or XForms (see http://www.w3.org/TR/xpath).
Extensible Stylesheet Language Transformations. A programming language specialized for the transformation of XML documents (see http://www.w3.org/TR/xslt).
All text is copyright Eric van der Vlist, Dyomedea. During development, I give permission for non-commercial copying for educational and review purposes. After publication, all text will be released under the Free Software Foundation GFDL.