ISO/IEC JTC 1/SC34 N0275

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

Title: CD 19757-0 - DSDL Part 0 - Overview
Source: G. Ken Holman
Project: DSDL
Project editor: G. Ken Holman
Status: Draft for comment
Action:
Date: 11 December 2001
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:mxm@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd Street
New York, NY 10036
Tel: +1 212 642-4937
Fax: +1 212 840-2298
E-mail: shafele@ansi.org

ISO/IEC JTC 1/SC 34 N-0275 - CD 19757-0 - DSDL Part 0 - Overview

Author: G. Ken Holman
Date: $Date: 2001/12/11 18:46:03 $(UTC)

Copyright (C) 2001 SC34

Table of Contents

1 Document Schema Definition Language (DSDL)
2 Composition of the IS
2.1 Part 1 - Framework
2.2 Part 2 - Grammar-oriented schema languages
2.3 Part 3 - Primitive data type semantics
2.4 Part 4 - Path-based integrity constraints
2.5 Part 5 - Object-oriented schema languages
2.6 Part 6 - Information item manipulation
2.7 Part 7 - Namespace-aware processing with DTD syntax

1: Document Schema Definition Language (DSDL)

Document Schema Definition Language (DSDL) is a multipart International Standard defining a modular set of specifications for describing the document structures, data types, and data relationships in structured information resources. Two kinds of integrated specifications are included: specifications for describing aspects of validity of a document, and rules for combining and packaging a collection of processes applicable to the task of validating a document. This integration makes DSDL applicable to both business and publishing applications of structured information resources. This applicability reflects the expansion of Extensible Markup Language (XML) applications beyond the publishing environment in which XML and its foundation - the Standard Generalized Markup Language (SGML) - were first developed.

2: Composition of the IS

In addition to this overview document (Part 0), this International Standard comprises the following parts.

2.1: Part 1 - Framework

The information items in structured information resources are not always ordered as a simple hierarchy. The abstraction representing all of the relationships within the information items is often a graph where there exist some links relating items in ways that are non-hierarchical. Expressing such a graph of items in a tree structure breaks those links that cannot be expressed by the grammar of the hierarchical representation. Reconstituting the structured information resource restores the cut links to make the resource intact.

Validating the hierarchical representation of the graph involves checking the tree structure against the grammar of the structure with cut links, as well as checking the links that were cut. This IS describes the task of validating all of the relationships in a document using a pipeline of consecutively applied composite processes of validation and simple transformation. This IS includes a standardized set of basic composite processes and an extensible mechanism of referencing other possible composite processes.

An example of a pipeline of three consecutively applied processes could include validating the structural hierarchy of an information resource, followed by ascribing default values for absent information items in the structure, followed by validating the non-hierarchical links between the resulting information items. This pipeline would produce different results than first validating non-hierarchical links between information items in the resource, followed by supplying default values for absent items, followed by validating the structural hierarchy of the result of the manipulation.
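The order dependence described above can be illustrated with a minimal sketch. It is not part of this IS: the three process functions, the sample document, and the rule that every item element carries a status attribute are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for illustration only: every <item> must carry a
# "status" attribute, and every "ref" attribute must name an existing "id".

def validate_structure(doc):
    """Report each <item> that lacks the required 'status' attribute."""
    return [f"item {item.get('id')!r} lacks status"
            for item in doc.iter("item") if item.get("status") is None]

def apply_defaults(doc):
    """Supply status='draft' where absent (modifies the tree in place)."""
    for item in doc.iter("item"):
        if item.get("status") is None:
            item.set("status", "draft")
    return []  # a transformation step reports no validation errors

def validate_links(doc):
    """Report each 'ref' attribute that does not match any 'id' attribute."""
    ids = {e.get("id") for e in doc.iter() if e.get("id") is not None}
    return [f"ref {e.get('ref')!r} has no target"
            for e in doc.iter()
            if e.get("ref") is not None and e.get("ref") not in ids]

def run_pipeline(xml_text, steps):
    """Apply the steps consecutively to one tree and collect all errors."""
    doc = ET.fromstring(xml_text)
    errors = []
    for step in steps:
        errors.extend(step(doc))
    return errors

SAMPLE = '<doc id="d1"><item id="a" ref="b"/><item id="b" status="final"/></doc>'

# Structure is checked before defaults are supplied, so item "a" is reported.
first = run_pipeline(SAMPLE, [validate_structure, apply_defaults, validate_links])
# Defaults are supplied before structure is checked, so nothing is reported.
second = run_pipeline(SAMPLE, [validate_links, apply_defaults, validate_structure])
```

In this sketch the same document yields one error under the first ordering and none under the second, because the defaulting step changes what the structural check sees.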

The DSDL framework includes:

Portions of this Part will be initially based on RELAX Namespaces.

2.2: Part 2 - Grammar-oriented schema languages

Grammar-oriented schema languages validate that the structure of information items in an instance conforms to a set of constraints described by a tree grammar. This includes constraining the text found at the terminal symbols of the grammar to the data types and parameters described in Part 3 of this IS.

This Part includes a syntax for specifying and identifying:

This Part is initially based on RELAX NG.
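As an informal illustration of validating a tree against a grammar whose terminal symbols are constrained by data types, consider the following sketch. It does not use RELAX NG syntax or semantics; the GRAMMAR table, the element names, and the is_integer check are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

def is_integer(text):
    """Toy data type check standing in for a Part 3 primitive data type."""
    try:
        int(text)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical toy grammar: an element name maps either to a tuple of
# required child names (in order) or to a callable that checks its text.
GRAMMAR = {
    "order": ("customer", "quantity"),
    "customer": lambda text: bool(text and text.strip()),
    "quantity": is_integer,
}

def conforms(element, grammar=GRAMMAR):
    """Return True if the element and its descendants satisfy the grammar."""
    rule = grammar.get(element.tag)
    if rule is None:
        return False  # element name not described by the grammar
    if callable(rule):
        # Terminal symbol: no child elements, and text must pass the check.
        return len(element) == 0 and rule(element.text)
    children = list(element)
    return (tuple(child.tag for child in children) == rule
            and all(conforms(child, grammar) for child in children))

good = ET.fromstring("<order><customer>Acme</customer><quantity>12</quantity></order>")
bad = ET.fromstring("<order><customer>Acme</customer><quantity>many</quantity></order>")
```

Here conforms(good) holds, while conforms(bad) fails at the quantity terminal because its text is not an integer.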

2.3: Part 3 - Primitive data type semantics

Terminal symbols of text in the hierarchical tree may represent values of a data type.

This Part defines:

This Part is initially based on a subset of primitive data types and their facets from Part 2 of W3C XML Schema.

2.4: Part 4 - Path-based integrity constraints

The non-hierarchical links between information items in a structured resource can be reconstituted by addressing the items and expressing the relationship between them found in the original graph of information. The addressing mechanism includes hierarchy-based paths of steps along the tree to the information item being addressed.

This Part defines:

This Part is initially based on Schematron.
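A path-based check of non-hierarchical links can be sketched as follows. This is not Schematron syntax; the check_links function, its parameters, and the sample document are assumptions for illustration, and the paths use only the limited path support of the Python standard library.

```python
import xml.etree.ElementTree as ET

def check_links(root, source_path, ref_attr, target_path, id_attr):
    """Return the values of ref_attr on items addressed by source_path
    that match no id_attr value on the items addressed by target_path."""
    targets = {t.get(id_attr) for t in root.findall(target_path)}
    return [s.get(ref_attr) for s in root.findall(source_path)
            if s.get(ref_attr) not in targets]

# Hypothetical document: each <xref to="..."> should address a <chapter id="...">.
book = ET.fromstring(
    '<book>'
    '<chapter id="c1"><xref to="c2"/></chapter>'
    '<chapter id="c2"><xref to="c9"/></chapter>'
    '</book>'
)

# Paths of steps along the tree address both ends of each link.
dangling = check_links(book, ".//xref[@to]", "to", "./chapter[@id]", "id")
```

In this sketch dangling holds the single value "c9", the one link whose target cannot be found by following the target path.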

2.5: Part 5 - Object-oriented schema languages

Object-oriented schema languages validate that the structure of information items in an instance conforms to a set of constraints described using inheritance. These constraints can be useful when using XML in conjunction with the object-oriented concepts widely used in modern programming languages (e.g. Java) and modern modeling languages (e.g. UML).

This Part is initially based on Part 1 of W3C XML Schema and the sections of Part 2 of W3C XML Schema describing the derivation of new simple types and describing the syntax for referring to primitive data types.

2.6: Part 6 - Information item manipulation

Structured information resources may need to be augmented, reduced, or have information items otherwise manipulated as part of the validation process. XML Document Type Definitions (DTDs) and HyTime include methods of defaulting attributes and information item renaming that characterize the changes that are sometimes necessary.

These highly-limited micro-transformations can:

This Part will be declarative in nature and will not attempt to provide totally general purpose transformation requirements.
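The kind of declarative manipulation described in this clause, attribute defaulting and information item renaming, can be sketched as below. The RULES table and the manipulate function are hypothetical and do not reflect any syntax defined by this Part.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative rule set: the rules are data, not procedure.
RULES = {
    "rename": {"para": "p"},                 # old element name -> new name
    "defaults": {"p": {"align": "left"}},    # element name -> default attributes
}

def manipulate(root, rules):
    """Rename elements, then supply defaults for absent attributes.
    Defaults are keyed by the element name after renaming."""
    for element in root.iter():
        element.tag = rules["rename"].get(element.tag, element.tag)
        for name, value in rules["defaults"].get(element.tag, {}).items():
            if element.get(name) is None:
                element.set(name, value)  # explicit values are never overridden
    return root

result = manipulate(ET.fromstring('<doc><para/><p align="right"/></doc>'), RULES)
```

After the call, both children of the document element are named p; the first has received the default align value and the second keeps its explicit one.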

2.7: Part 7 - Namespace-aware processing with DTD syntax

Ed. Note: could this be titled "Syntax-oriented schema language" or "Legacy-oriented schema language"?

Existing structural constraints on, and defaulted values for, information items in a structured resource may already be described using XML Document Type Definition (DTD) syntax. These constraints could be interpreted in a namespace-aware manner. These constraints need not be directly coupled to the XML document through a document type declaration.

This Part will address:

