Title: | Requirements for the Canonical XML Topic Maps Specification |
Source: | Khalil Ahmed, JTC1/SC34 |
Project: | JTC 1 NP Number – ISO/IEC 13250-4 |
Project editor: | Steve Pepper |
Status: | Author's draft |
Action: | For Review and Comment |
Date: | 2003/07/28 |
Summary: | |
Distribution: | National Bodies and Liaisons of SC34 |
Refer to: | |
Supersedes: | |
Version: |
The development of a canonicalisation for the topic map data model was first proposed for the roadmap for the ISO 13250 family of standards in [N278]. The first draft proposal for Canonical XML Topic Maps [N0395] was developed by Steve Pepper.
Canonicalisation is the process by which an instance of a data model is reduced to a serialised form such that any two logically equivalent instances yield byte-for-byte identical serialisations.
Canonical XML Topic Maps (CXTM) will define a means to express, in a canonical form, a topic map processed according to the processing rules defined in ISO 13250-2: Topic Maps -- Data Model [DM]. The canonicalisation will be based on the model defined by ISO 13250-2, henceforth referred to as the Topic Maps Data Model. Such a canonical form will enable the instance of the Topic Maps Data Model constructed by one topic map processor to be directly compared to that constructed by another topic map processor.
This document defines the requirements for the CXTM specification and is intended as a guide to the principles which should be applied in the development of the specification.
CXTM is to be developed by ISO/IEC JTC1/SC34 WG3, and this requirements document is
intended for members of the committee and for implementors who have expressed an
interest in the development of a canonical expression of the Topic Maps Data Model.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" will be used in this document as defined in [RFC 2119].
Pre-existing algorithms for canonicalisation of data structures and especially of graph structures may contribute to the definition of CXTM. CXTM will be based on the next version of ISO 13250 as defined by [N323] and will be defined in terms of a canonical representation of the Topic Maps Data Model as defined by [DM].
The W3C recommendation on Canonical XML [CXML] may contribute to the definition of CXTM.
CXTM shall be specified in terms of the data model defined in ISO 13250-2 and will not be specified in terms of any serialisation format for topic maps. It will therefore support any syntax from which valid instances of the Topic Maps Data Model may be constructed, including, but not limited to, the XML syntax defined by ISO 13250-3: Topic Maps -- XML Syntax and the HyTime-based syntax defined by ISO 13250-4: Topic Maps -- HyTime Syntax.
For each type of information item defined by the Topic Maps Data Model, CXTM must define an algorithm for expressing the identity of an information item of that type. The information item identity must be unique to that information item in a given instance of the Topic Maps Data Model.
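As an illustration only, one way to satisfy this requirement is to number the items of each type by their position in some canonical order. The sketch below assumes a hypothetical Item class whose sort_key field stands in for whatever canonical ordering the specification eventually defines; neither the class nor the "type-number" identity format comes from [DM].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    kind: str      # hypothetical item type name, e.g. "topic"
    sort_key: str  # hypothetical stand-in for the item's canonical sort key

def assign_identities(items):
    """Map each item to an identity string unique within the instance."""
    # Assumption: (kind, sort_key) is distinct for every item in the instance.
    ordered = sorted(items, key=lambda item: (item.kind, item.sort_key))
    counters = {}
    identities = {}
    for item in ordered:
        # Number items of each type separately, in canonical order.
        counters[item.kind] = counters.get(item.kind, 0) + 1
        identities[item] = f"{item.kind}-{counters[item.kind]}"
    return identities
```

Because the identity is derived from canonical position rather than from processor-internal object identifiers, two processors holding equivalent instances would assign the same identities.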
3.2.3.1 Information Item Type Sort Order
CXTM must define a canonical ordering of the types of information items defined
by the Topic Maps Data Model. If information item type A is sorted higher than
information item type B, then all information items of type A will be treated as
sorting higher than any information item of type B.
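The effect of a type sort order can be sketched as follows. The ranking in TYPE_RANK is purely hypothetical (the actual order is for the CXTM specification to define), and items are modelled as simple (type, value) pairs for illustration.

```python
# Hypothetical ranking of item types; a lower rank sorts first.
TYPE_RANK = {"topic": 0, "association": 1, "occurrence": 2}

def type_sort_key(item):
    """Sort key in which the item type always dominates the item value."""
    kind, value = item
    return (TYPE_RANK[kind], value)

def sort_items(items):
    # All items of one type are grouped together, whatever their values.
    return sorted(items, key=type_sort_key)
```

With this key, every "topic" item precedes every "association" item regardless of value, which is the property the requirement above describes.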
3.2.3.2 Instance Sort Order
For each information item type defined by the Topic Maps Data Model, CXTM must
define a comparison algorithm which enables two information items of the same
type from the same instance of the Topic Maps Data Model to be compared and
their relative positions in a canonical sorting order established. The canonical
sorting order must be transitive, such that if A > B and B > C then A > C, and
consistent under equality, such that items A1 and B1 are ordered in the same way
as items A2 and B2 whenever A1 == A2 and B1 == B2. Additionally, no two different
information items in a given instance of the Topic Maps Data Model may be defined
to be equal under the canonical sort order.
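These properties amount to a strict total order, and a candidate comparison algorithm can be checked against them mechanically on sample data. The following is a minimal sketch; the compare function and the tuple representation of items are hypothetical stand-ins, not part of any specification.

```python
from itertools import combinations, permutations

def compare(a, b):
    """Hypothetical comparator: returns -1, 0 or 1 for two item tuples."""
    return (a > b) - (a < b)

def is_strict_total_order(items, cmp):
    """Check the sort order requirements on a list of distinct items."""
    # No two different items may compare as equal, and swapping the
    # arguments must reverse the result.
    for a, b in combinations(items, 2):
        if cmp(a, b) == 0 or cmp(a, b) != -cmp(b, a):
            return False
    # Transitivity: a > b and b > c must imply a > c.
    for a, b, c in permutations(items, 3):
        if cmp(a, b) > 0 and cmp(b, c) > 0 and not cmp(a, c) > 0:
            return False
    return True
```

A comparator that looks only at part of an item (for example, only its type) fails the check, because two different items of the same type would then compare as equal.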
The instance sort order algorithms must produce the same results on all machines, regardless of locale, operating system, programming language and input format.
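One way to meet this requirement for string values, offered here only as an assumption about what the specification might choose, is to compare strings by Unicode code point rather than by locale-dependent collation. Python's built-in string comparison already works this way, and the resulting order matches a byte-wise comparison of the UTF-8 encodings.

```python
def codepoint_sort(strings):
    """Sort strings by Unicode code point, independent of the current locale."""
    # Python compares str values code point by code point.
    return sorted(strings)

def utf8_byte_sort(strings):
    """Equivalent ordering obtained by comparing UTF-8 encoded bytes."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))
```

Locale-aware collation would typically place "apple" before "Zebra"; code point order places all uppercase ASCII letters first, and does so identically on every platform.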
The instance sort order must not be sensitive to changes in the URI from which the input files are loaded.
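To illustrate what this insensitivity could mean in practice, the sketch below derives a sort key from a locator by removing the base URI of the loaded document. The function name and the prefix-stripping approach are assumptions made for illustration; the specification may take a different approach.

```python
def locator_sort_key(locator, base):
    """Return a sort key that does not depend on where the file was loaded from."""
    # Assumption: locators that point into the loaded document start with
    # the document's base URI.
    if locator.startswith(base):
        return locator[len(base):]
    # Locators pointing elsewhere are kept unchanged.
    return locator
```

Under this scheme, the same topic map loaded from an HTTP URI and from a local file yields identical sort keys for its internal locators.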
The Topic Maps Data Model instance comparison algorithm will enable two model instances to be compared and be declared to be either canonically equivalent or not.
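Given a canonical serialisation, the comparison reduces to a byte-level equality test. The sketch below uses a toy line-based serialisation of (type, value) pairs as a stand-in for the real canonical form, which this document does not define.

```python
def canonical_bytes(items):
    """Toy canonical serialisation: one sorted line per (type, value) item."""
    # Assumption: sorting the rendered lines stands in for the canonical order.
    lines = sorted(f"{kind}\t{value}" for kind, value in items)
    return "\n".join(lines).encode("utf-8")

def canonically_equivalent(first, second):
    """Two instances are equivalent exactly when their canonical bytes match."""
    return canonical_bytes(first) == canonical_bytes(second)
```

The order in which a processor happens to hold its items does not affect the result, so two processors that built the same model in different ways are still reported as equivalent.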
Editor's Note: This section will be removed. It remains at the moment to preserve section numbering for comments.
An XML representation of the canonicalised model will allow the canonicalised output of topic map processors to be written to a file to support the development of test suites. Further, it may be possible to use XML canonicalisation algorithms to directly compare XML representations of canonicalised instances of the Topic Maps Data Model.
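The second point can be sketched with the XML canonicalisation function in Python's standard library (version 3.8 or later). Note that this function implements C14N 2.0, which may differ in detail from the version referenced as [CXML], and that the element and attribute names below are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical serialisations of the same item; they differ only in
# attribute order, quoting style and empty-element syntax.
first = '<topic number="1" id="t1"/>'
second = "<topic id='t1' number='1'></topic>"

def same_canonical_xml(a, b):
    """Compare two XML strings after XML canonicalisation (C14N 2.0)."""
    return canonicalize(a) == canonicalize(b)
```

Here same_canonical_xml(first, second) holds because canonicalisation normalises the purely syntactic differences, while any difference in content would still be detected.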
It has not yet been proven that a deterministic canonicalisation algorithm can be found for all topic maps. The committee will strive to develop a completely deterministic algorithm. If the final algorithm cannot handle some specific instances of the Topic Maps Data Model, the specification will define rules for identifying such instances so that users of this standard can avoid them.
Some topic map processing may require a processor to indicate one or more error conditions, and it may then not be possible for the processor to construct an instance of the Topic Maps Data Model. In such cases, there will be no defined canonicalisation.
It is recommended that a test suite be developed, but its development is outside the scope of this ISO project. However, we encourage other standards bodies such as OASIS to consider developing one.
If a suite were developed, it could consist of a set of topic maps, in a variety of syntaxes that map to the Topic Maps Data Model, together with XML representations of the CXTM instance expected after processing those topic maps. In addition, the test suite may define combinations of topic maps to be merged and the CXTM instance expected after merge processing. The test suite may also define topic maps which are either invalid against their syntax schema or expected to cause one or more errors to be raised during processing.
The editor would like to thank the following individuals for their contributions of review comments and additional requirements to this document:
Lars Marius Garshol, Steve Pepper