by Eric van der Vlist is published by O'Reilly & Associates (ISBN: 0596004214)
Specifying a namespace in include, externalRef, or parentRef to give a namespace to grammars or patterns defined without a namespace is known as chameleon design, because the imported grammar or pattern takes on the new namespace like a chameleon takes on the color of the environment in which it is placed.
In a regular expression, a character class is an atom that matches a set of characters. Character classes may be classical Perl character classes, Unicode character classes, or user-defined character classes.
A set of character classes designated by a single letter, in which the uppercase and lowercase forms of the same letter are complementary (for instance, \d is all the decimal digits, and \D is all the characters that aren't decimal digits).
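The complementary behavior of \d and \D can be checked in a few lines. This is only a sketch: it uses Python's re module, whose syntax is close to, but not identical with, the regular expressions used in schema languages, and the sample string is made up for illustration.

```python
import re

sample = "RELAX NG 2003"  # hypothetical sample string

# \d matches decimal digits; \D matches every character that is not one.
digits = re.findall(r"\d", sample)
non_digits = re.findall(r"\D", sample)

# The two classes are complementary: each character belongs to exactly one.
assert len(digits) + len(non_digits) == len(sample)

print("".join(digits))      # 2003
print("".join(non_digits))  # "RELAX NG " (including the trailing space)
```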
A compositor is a pattern that can be used to combine other patterns. RELAX NG has three basic compositors: group, choice, and interleave. A fourth compositor, mixed, is a shortcut for interleave with an embedded text pattern.
A description of the structure of child elements and text nodes (independent of attributes). The content model is simple when there is a text node but no elements, complex when there are element nodes but no text, mixed when there are text and element nodes, and empty when there are neither text nor element nodes.
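The four cases can be sketched as a small classification function. The code below is an illustration only: it relies on Python's xml.etree.ElementTree, the function name content_model is hypothetical, and it assumes that whitespace-only text does not count as a text node, which is a simplification chosen for this example.

```python
import xml.etree.ElementTree as ET

def content_model(element):
    """Classify an element as 'simple', 'complex', 'mixed', or 'empty'.

    Attributes are ignored, as in the definition above.
    Assumption: whitespace-only text is treated as absent.
    """
    has_elements = len(element) > 0
    # ElementTree stores text before the first child in .text and text
    # following each child in that child's .tail.
    texts = [element.text] + [child.tail for child in element]
    has_text = any(t and t.strip() for t in texts)

    if has_text and has_elements:
        return "mixed"
    if has_text:
        return "simple"
    if has_elements:
        return "complex"
    return "empty"

print(content_model(ET.fromstring("<title>RELAX NG</title>")))        # simple
print(content_model(ET.fromstring("<book><title/></book>")))          # complex
print(content_model(ET.fromstring("<p>some <em>mixed</em> text</p>")))  # mixed
print(content_model(ET.fromstring("<br class='x'/>")))                # empty
```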
A term used by RELAX NG to qualify the content of a simple content element or attribute. Datatypes shouldn't be confused with XML 1.0 element types; those are called element names by RELAX NG.
A pattern is deterministic if a schema processor can always determine which path through the schema to follow by looking only at the current element under validation. Unlike W3C XML Schema, RELAX NG doesn't require deterministic patterns.
Document Object Model. An object-oriented model of XML documents, including the definition of the API used to manipulate them. The third version of DOM (DOM Level 3) includes an API called Abstract Schemas, which facilitates schema-guided editing of XML documents; see also http://www.w3.org/TR/DOM-Level-3-Core.
Document Schema Definition Languages (DSDL) is a project undertaken by ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is "to create a framework within which multiple validation tasks of different types can be applied to an XML document to achieve more complete validation results than just the application of a single technology"; see http://dsdl.org.
Document Type Definition. XML 1.0 DTDs are inherited from SGML, where DTDs played a central role and included rules for customizing the markup itself. Because of the syntactical rules included in their DTDs, SGML applications need a DTD to read an SGML document. One of the simplifications of XML is to state that an XML parser should be able to read a document without needing a DTD. DTDs have therefore been simplified from their SGML ancestors and remain the first incarnation of what is today called an XML Schema language.
One of the basic types of nodes in the tree represented by an XML document. An element is delimited by start and end tags. In the corresponding tree, an element is a nonterminal node, which may have subnodes of type element, character (text), namespace, and attribute, as well as comment and processing instruction nodes.
Term used in the XML 1.0 Recommendation, which is equivalent to the notion of element names in W3C XML Schema and shouldn't be confused with the simple or complex datatype of an element.
An element that has neither child elements nor text nodes (with or without attributes).
XML Information Set. A formal description of the information that may be found in a well-formed XML document.
An XML document that is a candidate for being validated by a schema. Any well-formed XML 1.0 document that conforms to the Namespaces in XML 1.0 Recommendation can be considered a valid or invalid instance document.
Named patterns are globally defined in a grammar. These patterns may be referenced from anywhere in this grammar or in the child grammars.
A unique identifier that can be associated with a set of XML elements and attributes. This identifier is a URI that isn't required to point to an actual resource but must "belong" to the author of these elements and attributes. Because the full URI can't be included in the name of each element and attribute, a namespace prefix is assigned to the namespace URI using a namespace declaration. This prefix is added to the local name of the elements and attributes to form a qualified name. Namespaces are optional, and elements and attributes may have no namespaces attached.
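To see how a prefix stands in for the full URI, the sketch below parses a small document with Python's xml.etree.ElementTree, which reports each name in an expanded `{uri}local` notation. The namespace URI and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The prefix "lib" is bound to a (hypothetical) namespace URI by the
# xmlns:lib declaration; <note> has no namespace attached.
document = """<lib:library xmlns:lib="http://example.org/ns/library">
  <lib:book/>
  <note/>
</lib:library>"""

root = ET.fromstring(document)
names = [child.tag for child in root]

print(root.tag)  # {http://example.org/ns/library}library
print(names)     # ['{http://example.org/ns/library}book', 'note']
```

The prefix itself does not appear in the result: only the namespace URI and the local name identify the element.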
Any part of a RELAX NG schema that can be matched against a set of attributes and a sequence of elements and strings is a pattern. With the exception of name classes, all parts (including the whole schema) of a RELAX NG schema are patterns.
Regular expressions (or patterns) are composed of pieces. Each piece is itself composed of an atom describing a condition on a substring and an optional quantifier defining the expected number of occurrences of the atom.
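The decomposition into pieces can be sketched by assembling a pattern from a list of atom and quantifier pairs. The example uses Python's re module as a stand-in for a schema regular expression engine; the pattern and the helper name matches are invented for illustration, and fullmatch is used because W3C XML Schema patterns must match the whole value.

```python
import re

# Each piece is an atom followed by an optional quantifier.
pieces = [
    r"[A-Z]",   # atom: character class; no quantifier (exactly one)
    r"\d{3}",   # atom: \d; quantifier {3} (exactly three occurrences)
    r"-?",      # atom: literal hyphen; quantifier ? (zero or one)
    r"[a-z]+",  # atom: character class; quantifier + (one or more)
]
pattern = re.compile("".join(pieces))

def matches(value: str) -> bool:
    """Return True when the whole string matches the sequence of pieces."""
    return pattern.fullmatch(value) is not None

print(matches("A123-abc"))  # True
print(matches("A123abc"))   # True: the hyphen piece is optional
print(matches("A12-abc"))   # False: only two digits
```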
Recursive content models are content models in which elements can be included directly or indirectly within themselves (such as XHTML div or span elements).
Recursive patterns are named patterns that include direct or indirect references to themselves. RELAX NG allows only recursive patterns that describe recursive content models—those for which the definition of the named pattern is isolated from its reference by an element pattern.
A syntax that expresses conditions on strings. The syntax used by the W3C XML Schema for its patterns is very close to the syntax introduced by the Perl programming language. A regular expression is composed of elementary pieces.
A grammar-based XML Schema language developed by Murata Makoto and published in March 2000 as a Japanese ISO Standard; see http://www.xml.gr.jp/relax.
A grammar-based XML Schema language resulting from a merger between RELAX and TREX; see http://relaxng.org.
A schema in which the definitions of elements and attributes are embedded one inside the other without using named patterns.
Simple API for XML. A streaming, event-based API used between parsers and applications. Its streaming nature means that pipelines of XML processing may be created using SAX; see http://www.saxproject.org.
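As an illustration of the event-based style, the sketch below uses the SAX binding in Python's standard library (xml.sax). The handler class ElementCounter and the sample document are hypothetical; the parser calls startElement once for every start tag as it streams through the input.

```python
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser for each start tag, in document order.
        self.counts[name] = self.counts.get(name, 0) + 1

document = b"<library><book/><book/><author/></library>"
handler = ElementCounter()
xml.sax.parseString(document, handler)

print(handler.counts)  # {'library': 1, 'book': 2, 'author': 1}
```

Because the handler only reacts to events, the document never has to be held in memory as a tree.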
A rule-based XML Schema language, developed by Rick Jelliffe, using XPath expressions to describe validation rules; see http://www.ascc.net/xml/resource/schematron/schematron.html.
Standard Generalized Markup Language, the ancestor of XML. XML was designed as a simplified subset of SGML to be used on the Web.
An element has a simple-content model when it has a child text node only (and no subelements). A simple content element has a simple type if it has no attributes; it has a complex type if it has any attributes.
The process of simplifying and normalizing a RELAX NG schema to remove the syntactical variations and use only a few basic patterns and name classes.
A character that may be used as an atom after a backslash (\) to accept a specific character, either for convenience or because this character is interpreted differently in the context of a regular expression.
When a grammar validates an instance document, its start pattern is matched against the root element of the instance document. When a grammar is embedded in another grammar, the embedded grammar is replaced by its start pattern during the simplification of the schema.
A pattern is unambiguous when any fragment of an instance document that is valid per this pattern is valid for precisely one of the choices in the schema. RELAX NG doesn't require the use of unambiguous patterns, but they can be considered good practice for annotation and datatype assignment, especially when conversion from RELAX NG to another schema language is necessary.
A set of characters classified by their localization (Latin, Arabic, Hebrew, Tibetan, and even Gothic or musical symbols).
A set of characters classified by their usage (letters, uppercase, digit, punctuation, etc.).
A set of character classes based on the Unicode blocks and categories.
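The Unicode categories can be inspected with Python's unicodedata module, as in the sketch below. The sample characters are arbitrary; the two-letter codes returned are the general category abbreviations (for example, Lu for uppercase letter). The standard library has no equivalent lookup for Unicode blocks, so only categories are shown.

```python
import unicodedata

# Map a few sample characters to their Unicode general category.
samples = ["A", "a", "1", ",", " "]
categories = {ch: unicodedata.category(ch) for ch in samples}

print(categories)
# {'A': 'Lu', 'a': 'Ll', '1': 'Nd', ',': 'Po', ' ': 'Zs'}
```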
Uniform Resource Identifier. Defined by RFCs 2396 and 2732. URIs were created to extend the notion of URLs (Uniform Resource Locators) to include abstract identifiers that don't necessarily need to locate a resource.
Uniform Resource Locator, a common identifier used on the Web. URLs are absolute when the full path to the resource is indicated, and relative when a partial path is given that needs to be evaluated relative to a base URL.
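The resolution of a relative URL against a base URL can be sketched with urljoin from Python's standard library. The URLs below are hypothetical and only illustrate the mechanism.

```python
from urllib.parse import urljoin

base = "http://example.org/schemas/library.rng"  # hypothetical base URL

# A relative URL is evaluated against the base URL.
sibling = urljoin(base, "common/types.rng")
parent = urljoin(base, "../other.rng")

# An absolute URL already carries the full path and is left unchanged.
absolute = urljoin(base, "http://example.com/x.rng")

print(sibling)   # http://example.org/schemas/common/types.rng
print(parent)    # http://example.org/other.rng
print(absolute)  # http://example.com/x.rng
```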
World Wide Web Consortium. Originally created to settle HTML and HTTP as de facto standards. The main specification body for the core specifications of the World Wide Web and the keeper of the core XML specifications; see http://www.w3.org.
An XML document that meets the conditions defined in the XML 1.0 Recommendation: it must be readable without ambiguity. Syntax errors are detected by an XML parser even without a schema of any type.
Characters #x9 (tab), #xA (linefeed), #xD (carriage return), and #x20 (space). These are often used to indent XML documents and make them more readable, and are filtered by an operation called whitespace processing.
A W3C specification defining a general purpose inclusion mechanism for XML documents; see http://www.w3.org/TR/xinclude.
Extensible Markup Language. A subset of SGML created to be used on the Web. Its core specification (XML 1.0) was published by the W3C in February 1998. New XML specifications have been added since this date, and the W3C considers that, with the addition of W3C XML Schema, the core specifications are now complete.
A query language that identifies a set of nodes within an XML document. Originally defined to be used with XSLT, it's also used by other specifications such as Schematron, XPointer, W3C XML Schema or XForms; see http://www.w3.org/TR/xpath.
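A minimal sketch of node selection follows. It uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath, so it should be read as an approximation of the language rather than a full XPath implementation; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

document = """<library>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
  <author id="a1"><name>Someone</name></author>
</library>"""
root = ET.fromstring(document)

# A location path selects a set of nodes: here, every title of a book.
titles = [t.text for t in root.findall("./book/title")]

# A predicate narrows the set: the title of the book whose id is "b2".
second = root.find(".//book[@id='b2']/title").text

print(titles)  # ['First', 'Second']
print(second)  # Second
```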
Extensible Stylesheet Language Transformations. A programming language specialized for the transformation of XML documents; for more information, see http://www.w3.org/TR/xslt.
This text is released under the Free Software Foundation GFDL.