ISO/IEC 13250:2000: How Two Syntaxes Can Make One Standard

ISO/IEC JTC 1/SC34 N0235

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

Title:	Discussion Paper on ISO/IEC 13250:2000 Topic Maps - Defect Report
Source:	ISO/IEC JTC 1 / SC 34 / WG 3
Project:
Project editor:	Michel Biezunski and Steve Newcomb
Status:
Action:	For review and comment.
Date:	25 July 2001
Summary:	This document is distributed for review and discussion at the WG 3 meeting on 11 August 2001 in Montréal, Quebec. The intent is to provide guidance in the development of the full defect report of 13250.
Distribution:	SC34 and Liaisons
Refer to:
Supersedes:
Reply to:	Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman)
	Y-12 National Security Complex
	Information Technology Services
	Bldg. 9113 M.S. 8208
	Oak Ridge, TN 37831-8208 U.S.A.
	Telephone: +1 865 574-6973
	Facsimile: +1 865 574-1896
	E-mail: mxm@y12.doe.gov
	http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

	Ms. Sara Hafele, ISO/IEC JTC 1/SC 34 Secretariat
	American National Standards Institute
	25 West 43rd Street
	New York, NY 10036
	Tel: +1 212 642 4976
	Fax: +1 212 840 2298
	E-mail: shafele@ansi.org

ISO/IEC 13250:2000: How Two Syntaxes Can Make One Standard

July 24, 2001

Michel Biezunski and Steven R. Newcomb

Introduction
------------------------------------------

The ISO/IEC 13250:2000 "Topic Maps" International Standard, which appears about to incorporate a second interchange syntax (the XTM DTD), does not explain to what degree, or exactly how, the two syntaxes are functionally equivalent. The standard should explain this.

How to Describe the Semantic Commonalities of the Syntaxes?
------------------------------------------

One might think that there are two ways to formalize the semantic commonalities of the two syntaxes:

(1) Describe a rigorous syntactic transformation
      process that will show how instances of one
      syntax can be transformed into instances of the
      other syntax, or

(2) Describe how instances of each syntax can be
      transformed into instances of the common
      underlying model (which could be, but need not
      be, a syntactic model), and describe how
      instances of the underlying model can be
      transformed into instances of each syntax.

The first approach might seem easier, at least superficially. However, if we select it, we focus on just two syntaxes instead of recognizing that information having the character of topic map information may be expressed in many different notations. It is highly desirable to be able to federate all kinds of "finding information", not just the finding information that happens to be expressed in one of only two syntaxes. For example, it would be inappropriate to exclude instances of RDF or NewsML from the possibility of being understood as interchangeable topic map documents whose information is directly available to topic map application software. If we adopt the first approach, RDF and NewsML instances would be available only indirectly, by means of some syntactic transformation into a syntactic topic map, which would then, in turn, be parsed as a topic map and made available to topic map applications. The extra overhead and inconvenience of this transformation would be a barrier to the use of RDF and NewsML instances.

Unlike the first approach, the second approach is applicable to any number of notations, although ISO 13250 itself would apply it only to the two syntaxes. The second approach is more ambitious in that it requires the underlying foundational model to be made explicit, but over the long term it will make topic map applications far more ubiquitous and omnivorous.
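To make the scaling argument concrete, the following Python sketch counts the transformations each approach would need if every notation is to be interchangeable with every other. It assumes that approach (1) needs one directed syntax-to-syntax transformation per ordered pair of notations, and that approach (2) needs one transformation into, and one out of, the common model per notation; the function names and notation counts are illustrative only.

```python
def pairwise_transformations(n: int) -> int:
    # Approach (1): one directed transformation for each ordered pair of notations.
    return n * (n - 1)


def model_transformations(n: int) -> int:
    # Approach (2): for each notation, one transformation into the common model
    # and one transformation out of it.
    return 2 * n


for n in (2, 3, 4, 10):
    print(n, pairwise_transformations(n), model_transformations(n))
# prints:
# 2 2 4
# 3 6 6
# 4 12 8
# 10 90 20
```

Under these assumptions, approach (1) is indeed cheaper for exactly two syntaxes, the two approaches break even at three notations, and with four (for example 13250, XTM, RDF and NewsML) approach (2) already needs fewer transformations; the gap widens quadratically as notations are added.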

The Difference Between Topic Map Syntax and Topic Map Information
------------------------------------------

The structure of the topic maps that are represented for interchange in either the existing HyTime-based syntax of 13250 or the newly contributed XTM syntax is *not* identical to the syntactic structure of the documents used to interchange them. Therefore, neither 13250-based nor XTM-based topic map documents are "ready-to-use" by application-specific logic. In other words, a syntactically represented topic map does not reflect exactly what a topic map application would be expected to understand from it; for example, two topic elements that share a subject indicator represent a single topic. Before a topic map application can perform its application-specific functions, generic processing must be performed to make the topic map "ready-to-use": that is, the processing required to understand the topic map that an interchangeable instance is designed to represent.

From an economic standpoint, there are significant advantages in using a distinct software module, commonly called a "topic map engine" or a "topic map parser", to implement this generic processing. We urge that the term "topic map parsing" be reserved for all of the aspects of "topic map processing" that must be performed by all topic map software that takes as input interchangeable topic maps conforming to either the HyTime-based or the XTM-based syntax. We urge that the term "topic map processing" be used generically, to refer to any kind of processing, including both topic map parsing (as just defined) and application-specific processing of ready-to-use topic maps.

Four rules must be applied by all topic map parsers:

-- the subject-based merging rule
-- the name-based merging rule
-- the node-demander rule
-- the no-redundancy rule
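As an illustration only, the following Python sketch shows how the first two rules might be applied. It assumes a deliberately simplified topic that carries nothing but subject indicator URIs and (base name, scope) pairs; the `Topic` class, the `merge_topics` function and the example.org URIs are hypothetical. The node-demander rule and most details of the real rules (subject-constituting resources, variants, occurrences, associations) are not modeled; the sketch only shows that merging must be applied transitively, so that topics which never share a property directly can still end up as one topic.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical, simplified topic: only identity-bearing data is kept.
    subject_ids: set[str] = field(default_factory=set)  # subject indicator URIs
    names: set[tuple[str, frozenset[str]]] = field(default_factory=set)  # (base name, scope)


def merge_topics(topics: list[Topic]) -> list[Topic]:
    """Group topics that share a subject indicator (subject-based merging)
    or a base name in the same scope (name-based merging), transitively."""
    parent = list(range(len(topics)))  # union-find forest over topic indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Remember the first topic holding each identity key; later holders join its group.
    first_owner: dict[tuple, int] = {}
    for i, topic in enumerate(topics):
        keys = [("sid", s) for s in topic.subject_ids]
        keys += [("name", n) for n in topic.names]
        for key in keys:
            if key in first_owner:
                union(i, first_owner[key])
            else:
                first_owner[key] = i

    # Build one new topic per group; sets collapse duplicate identifiers and names.
    merged: dict[int, Topic] = {}
    for i, topic in enumerate(topics):
        group = merged.setdefault(find(i), Topic())
        group.subject_ids |= topic.subject_ids
        group.names |= topic.names
    return list(merged.values())


if __name__ == "__main__":
    unscoped = frozenset()  # the unconstrained scope
    paris = Topic({"http://example.org/psi/paris"}, {("Paris", unscoped)})
    paris_again = Topic(set(), {("Paris", unscoped)})
    london = Topic({"http://example.org/psi/london"}, {("London", unscoped)})
    print(len(merge_topics([paris, paris_again, london])))  # 2
```

A union-find structure is used here simply because it computes the transitive closure of "shares an identity key" in a single pass; a conforming parser is free to reach the same result in other ways.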

These rules are already implicit in 13250. We propose that 13250 state their definitions explicitly and explain their ramifications. These explanations will be invaluable to users of the standard who need to create conventions for understanding instances of various notations (both ISO and non-ISO) as sources of topic map information.

We urge that 13250 fully explain and constrain the topic map parsing process, but only to the extent of describing the rules and goals of the parsing process and how they are to be applied in the case of each of the two syntaxes. For the Topic Maps software industry, this is the least-constraining approach that is consistent with 13250's goal of facilitating universal and accurate understanding of topic map information. It allows software vendors to compete on the grounds of product differentiation, without unduly increasing the cost of merging disparate topic maps produced by multiple, differently specialized software applications.

Two Underlying Models Have Been Proposed
------------------------------------------

Two different underlying models, both expressed in terms of how XTM instances should be understood by topic map parsers, have been contributed to the discussion. Both deserve serious attention.

- An "XML Infoset"-like model, called "A Topic Map
   Data Model", has been proposed by Lars Marius
   Garshol.

- A "Processing Model for XTM 1.0" has been proposed
   by Michel Biezunski and Steven R. Newcomb.

The two proposals do not necessarily contradict each other, and the advantages and drawbacks of each of them should be studied.

The underlying model that will be adopted by ISO must clarify how specific applications of Topic Maps can be defined and identified.

The documents that are available for study include:

- Lars Marius Garshol, "A Topic Map Data Model -- An
   infoset-based proposal",
   http://www.ontopia.net/topicmaps/materials/proc-model.html

- Michel Biezunski and Steven R. Newcomb,
   "Topicmaps.net's Processing Model for XTM 1.0,
   version 1.0.1" [now sometimes called "PMTM4"],
   http://www.topicmaps.net/pmtm4.htm

   Other materials offer help in understanding PMTM4:

   - Biezunski/Newcomb, "The Structure of Topic Maps
     Foundations," http://www.topicmaps.net/struct.htm

   - Biezunski/Newcomb, "A Topic Maps Graph in XML,"
     http://www.topicmaps.net/simpleTMGraph3.htm and
     http://www.topicmaps.net/simpleTMGraph3.dtd

   - Biezunski/Newcomb, "An API to a Topic Maps Graphs
     in XML", http://www.topicmaps.net/TMGraphAPI3.htm
     and http://www.topicmaps.net/TMGraphAPI3.dtd

The decisions taken on these issues will influence the work needed to complete the topic map query language and the topic map constraint language that are now in progress.

We encourage the members of ISO working group WG3 to read these documents and to send questions and comments to the newly created discussion mailing list. (The subscription server is http://www.isotopicmaps.org/mailman/listinfo/sc34wg3)