ISO/IEC JTC 1/SC34 N439

ISO/IEC JTC 1/SC34

Information Technology --

Document Description and Processing Languages

TITLE: Report of the SC34 WG3 Meeting, Montreal, 1–4 August 2003
SOURCE: Patrick Durusau, acting convenor
PROJECT: All WG3 projects
PROJECT EDITOR: All WG3 editors
STATUS: Working Group Report
ACTION: For information
DATE: 7 May 2003
DISTRIBUTION: SC34 and Liaisons
REFER TO: Documents in internal references
REPLY TO: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Bldg. 9113, M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-18964
Network: masonjd@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/
ftp://ftp.y12.doe.gov/pub/sgml/sc34/

Mrs. Sara Desautels, ISO/IEC JTC 1/SC 34 Secretariat
American National Standards Institute
25 West 43rd St - 4th floor
New York, NY 10036
Tel: +1 212 642 4937
Fax: +1 212 840 2298
Email: sdesaute@ansi.org

ISO/IEC JTC 1/SC34/WG3 Meeting Report (Montreal, 1-4 August 2003)

ISO/IEC JTC 1/SC34/WG3 met for four very productive days in Montreal, Canada, and tenders the following report of its activities:

Friday, 1 August 2003

Attendees

Agenda for Friday, 1 August 2003

WG3 Timetable of Work

A timetable for pending work was agreed upon by the WG and is set forth below.

Document                                Responsible        Deadline
TMQL requirements 1.1                   Garshol            2003-08-11
TMCL requirements & use cases           Moore, Nishikawa   2003-08-11
CXTM requirements, final draft          Ahmed              2003-08-18
ISO 13250-2 (DM), review draft          Garshol, Moore     2003-09-01
ISO 13250-3 (XTM), committee draft      Garshol, Moore     2003-09-15
ISO 13250-2 (DM), committee draft       Garshol, Moore     2003-10-01
TMQL use cases, first draft             Barta              2003-10-01
CXTM, editor's draft                    Ahmed              2003-10-15
RM requirements, next version           Durusau            2003-10-15
ISO 13250, intro annex, editor's draft  Pepper, Naito-san  2003-11-01
Query language survey                   Garshol, Durusau   2003-11-01
RM, editor's draft                      Newcomb

Detailed notes on work schedule

ISO 13250-2 and ISO 13250-3

Work to do: apply agreed issue resolutions from London and Montréal, rewrite to ISO style. Will be sent out for a brief review period (a few days), then sent to ISO as a committee draft for ballot.

TMQL documents

All three should be ready before the Philadelphia meeting, so that they can be used as a foundation for decisions about TMQL at that meeting, with the goal of having a real TMQL proposal in Q1 2004. The survey should list query languages, grouped by evaluation model, and list the main features supported, plus references to specs etc.

WG3 Input to Editors of ISO 13250-1

Editors need to write a first draft to be reviewed by the NBs, who should have comments ready for the Philadelphia meeting.

If this is only going to be an introduction to topic maps that does not actually standardize anything it cannot be a part of the standard, though it can be an annex. We agree that this is worthwhile to have in the standard, and so think it should become an annex rather than a full part.

(Added after the WG meeting and does not constitute part of the WG activities) Note: After this decision by the WG, Jim Mason confirmed that introductory material should appear in an annex and advised that ISO 8879 should be followed as a guide on this issue. ISO/IEC Directives Part 2: Rules for the structure and drafting of International Standards, Section 6.4.1.1 in particular appears applicable to some of the content of the proposed part 1. It is suggested that the editors of Part 1 may wish to consider what material from the currently proposed part 1 meets ISO requirements to appear in part 1 and what should appear in an annex prior to the WG3 meeting in December and revise their proposal to WG3. That could assist WG3 in resolving this issue at the meeting in December without further delay. (Added by Patrick Durusau, acting convenor for WG3 at this meeting.)

Proposed content:

CXTM

Process

  1. Preprocess data model
  2. Sort information item instances
  3. Serialise information item instances and their properties

Preprocess data model

Sort information item instances

Assign identifiers to information items

Serialise information item instances and their properties

Issues

Note: In all of the following where there are numbered steps it is sufficient to proceed only as far as the first step which results in a non-equal comparison. If all the steps are exhausted, the compared items are considered equal under the comparison algorithm.

Handling of NULL value

Ordering of Information Item Types and Basic Types

Comparison Algorithm For Strings

Comparison Algorithm For Collections

  1. Sort by the collection size. Collections sort in order of size
  2. Sort each collection. Starting with the lowest item in each collection, sort by pairwise comparison of items in each collection until a non-equal comparison is found. Collections sort in....
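The note on numbered steps and the collection rules above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the CXTM algorithm itself: the function names are hypothetical, strings are compared by code point because the string comparison algorithm is not recorded in this report, and smaller collections are assumed to sort first because the report does not record the direction.

```python
from functools import cmp_to_key


def compare_strings(a, b):
    # Assumption: plain code-point ordering; the report does not give
    # the string comparison algorithm.
    return (a > b) - (a < b)


def compare_collections(xs, ys, compare_items):
    """Compare two collections; negative, zero or positive like a C comparator."""
    # Step 1: compare by size (assumption: the smaller collection sorts first).
    size_cmp = (len(xs) > len(ys)) - (len(xs) < len(ys))
    if size_cmp != 0:
        return size_cmp
    # Step 2: sort each collection, then compare pairwise starting with the
    # lowest item, stopping at the first non-equal comparison.
    key = cmp_to_key(compare_items)
    for x, y in zip(sorted(xs, key=key), sorted(ys, key=key)):
        result = compare_items(x, y)
        if result != 0:
            return result
    # All steps exhausted: the collections are considered equal.
    return 0
```

The order in which items were stored in a collection does not affect the result, because each collection is sorted before the pairwise comparison.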

Comparison Order For Locator Items

  1. [address] property compared as Strings
  2. [notation] property compared as Strings

Canonical Sort Order For Topic Items # SAM issue topic-identity-required.

  1. [source locators] property compared as a Collection of Locator items
  2. [subject indicators] property compared as a Collection of Locator items
  3. [subject addresses] property compared as a Collection of Locator items.
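The comparison order for Locator and Topic items listed above could be expressed along the following lines. This is a hedged sketch: the class and function names are hypothetical, the default notation value "URI" is an assumption, strings are compared by code point, and smaller collections are assumed to sort first.

```python
from dataclasses import dataclass, field
from functools import cmp_to_key


@dataclass(frozen=True)
class Locator:
    address: str
    notation: str = "URI"  # assumed default, for illustration only


@dataclass
class Topic:
    source_locators: list = field(default_factory=list)
    subject_indicators: list = field(default_factory=list)
    subject_addresses: list = field(default_factory=list)


def cmp(a, b):
    return (a > b) - (a < b)


def compare_locators(a, b):
    # [address] first, then [notation]; `or` moves to the next step only
    # when the previous step compared equal (returned 0).
    return cmp(a.address, b.address) or cmp(a.notation, b.notation)


def compare_collections(xs, ys, compare_items):
    size_cmp = cmp(len(xs), len(ys))  # assumption: smaller sorts first
    if size_cmp != 0:
        return size_cmp
    key = cmp_to_key(compare_items)
    for x, y in zip(sorted(xs, key=key), sorted(ys, key=key)):
        result = compare_items(x, y)
        if result != 0:
            return result
    return 0


def compare_topics(a, b):
    # Steps 1 to 3 of the canonical sort order for Topic items.
    return (
        compare_collections(a.source_locators, b.source_locators, compare_locators)
        or compare_collections(a.subject_indicators, b.subject_indicators, compare_locators)
        or compare_collections(a.subject_addresses, b.subject_addresses, compare_locators)
    )
```

A list of topics can then be put in order with `sorted(topics, key=cmp_to_key(compare_topics))`.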

Canonical Sort Order For TopicName Information Items # SAM issue item-parent-required

  1. Sort by [value] as String
  2. Sort by [type] as Topic
  3. Sort by [scope] as Collection of Topic items
  4. Sort by {parent topic} as Topic
  5. #Sort by [variants] as Collection of VariantName items

Canonical Sort Order For VariantName Information Items

  1. Sort by [value] as String
  2. Sort by [resource] as Locator
  3. Sort by [scope] as Collection of Topic items
  4. Sort by {parent topic name} as TopicName

Canonical Sort Order For Occurrence Information Items # SAM issue prop-parent

 

  1. Sort by [value] as String
  2. Sort by [resource] as Locator
  3. Sort by [type] as Topic
  4. Sort by [scope] as Collection of Topic items
  5. Sort by {parent topic} as Topic

Canonical Sort Order For Association Information Items

  1. Sort by [type] as Topic
  2. Sort by [roles] as Collection of AssociationRole items
  3. Sort by [scope] as Collection of Topic items

Canonical Sort Order For AssociationRole Information Items

  1. Sort by [role playing topic] as Topic
  2. Sort by [type] as Topic
  3. Sort by [scope] as Collection of Topic items
  4. Sort by {parent association} as Association

Saturday, 2 August 2003

Attendees

Agenda for Saturday, 2 August 2003

Reference Model

NOTE: Need examples that illustrate the different requirements.

DOCUMENTATION

  1. Establishes requirements for documenting for human readers the ontological commitments made in constructing an application-specific information model.
  2. Facilitates merging in a multi-world-view scenario by establishing requirements for documenting for human readers
    1. the basis for deciding whether or not two proxies represent the same subject, and
    2. the consequence of discovering that two proxies represent the same subject.
  3. Establishes requirements for documenting for human readers the mapping of the interchange syntax(es), if any, to subject proxies governed by application-specific information models.
  4. Establishes terminology for describing application-specific information models and their merging rules, such that different models can be compared and contrasted.
  5. Defines a MODEL which
    1. makes a minimum of ontological commitments,
    2. preserves integrity across the process of merging (by putting a floor on reification), and
    3. describes the information being represented.

This MODEL must not have an interchange syntax, a schema language, a query language, or an API.

HyTM

All reporting national bodies indicated there were no users that would be impacted by dropping HyTM from ISO 13250. Unfortunately, a report was not available from Canada so no decision is recommended at this time.


WG3 Meeting at XML 2003

The meeting of WG3 for XML 2003 was discussed and it was concluded that it would take three (3) days to cover all the outstanding materials. To efficiently cover all the materials in the time allowed will require advance discussion and preparation by all participants. The estimated time for each segment and issues to be covered are as follows:


XTM

xtm-href-whitespace

Whitespace is forbidden in URIs unless it is escaped.

xtm-mergemap-and-topicref

When encountering a mergeMap only load the external topic map if that topic map has not been loaded before with the same set of added themes. For each XTM document, topicRef elements pointing to external documents for which there is no mergeMap reference in the same XTM document are considered to imply a mergeMap reference to that external document with no added themes.
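One way a processor might keep track of this resolution is sketched below in Python. The class, its method names, and the `fetch` callback are all hypothetical; the sketch also assumes that topicRef targets have already been reduced to document URIs (fragment identifiers stripped).

```python
class MergeMapLoader:
    """Loads each external topic map at most once per set of added themes."""

    def __init__(self, fetch):
        # `fetch` is a hypothetical callable (uri, added_themes) that
        # actually retrieves and merges the external topic map.
        self._fetch = fetch
        self._loaded = set()

    def merge_map(self, uri, added_themes=()):
        """Load `uri` unless it was already loaded with the same added themes."""
        key = (uri, frozenset(added_themes))
        if key in self._loaded:
            return False
        self._loaded.add(key)
        self._fetch(uri, key[1])
        return True

    def process_document(self, merge_maps, external_topic_refs):
        """merge_maps: list of (uri, added_themes) pairs found in one XTM
        document; external_topic_refs: document URIs referenced by its
        topicRef elements."""
        explicit = {uri for uri, _ in merge_maps}
        for uri, themes in merge_maps:
            self.merge_map(uri, themes)
        for uri in external_topic_refs:
            # A topicRef to a document with no mergeMap in the same XTM
            # document implies a mergeMap with no added themes.
            if uri not in explicit:
                self.merge_map(uri)
```

Keying the cache on the pair of URI and theme set means the same external document is loaded again when it is referenced with a different set of added themes.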

xtm-same-doc-refs

XTM should follow RFC 2396.

xtm-member-id

The id of a member element only becomes a source locator if that member element only has a single player. If it has more than one player the id is ignored.

xtm-subjectidentity-children

Keep the content model as it is.

xtm-topicref-notatopic

This is not an error; instead, it is treated as a reference to a non-existent topic.

xtm-resourcedata-markup

Arbitrary XML markup is allowed inside the resourceData and baseNameString elements, and it must be preserved by XTM processors.

xtm-namespace-support

XTM processors are required to do namespace processing.

xtm-fixed-attributes

The #FIXED attributes will be declared as optional, but with specific required values, if given.

xtm-normative-schema

The RELAX-NG schema is the normative schema.

xtm-href-xpointer

Full XPointer references are disallowed on all elements.

xtm-unknown-elements

We don't ignore unknown elements.

xtm-namespace-uri

XTM 1.1 uses the same namespace URI as XTM 1.0. Note that this is only acceptable so long as XTM 1.1 is backwards compatible with version 1.0.

xtm-topicmap-version

A version attribute should be added to the topicMap element. If an XTM processor finds an XTM document in a version it does not support, that is an error.

XML Strategy

Data Model

prop-parent

Add the parent property as a computed property.

topic-identity-required

A value is required for at least one of these properties.

item-parent-required

Items are required to be reachable from the topic map item.

prop-subject-addresses

The name should be 'subject locator'.

XML-Data-Representation: It was decided to write up two different approaches, to be reviewed by the committee.


Scope Montreal 2003 Proposal

The following is drawn from slides prepared and reviewed by the WG3 group in Montreal.

This proposal was developed by Kal Ahmed, Ann Wrightson, and Dmitry Bogachev in a sub-group meeting.

Kal Ahmed also prepared a report on the issue of structured scope for the topic maps data model. That report is attached hereto as an appendix for the convenience of the reader.


Sunday, 3 August 2003

Attendees

Agenda for Sunday, 3 August 2003

TMQL

Requirements document edited and issued by the WG.

Feedback on the TMQL use cases

Sections 1-3

Sections 1, 2, and 3 were not reviewed at the meeting, being considered more appropriate for individual reading and commenting.

Section 4

Section 4 was thought to be too general. Preferably it should be broken up into three use cases like the one in section 5, and each use case presented complete with ontology and example queries.

Section 4.1 was considered a good use case that offered something different from the other three.

Section 4.2 could perhaps be changed to "Visualization of arbitrary topic maps," and then be extended with specific queries stolen from TMNav and the Omnigator. Lars Marius Garshol volunteered to supply Omnigator queries.

Section 4.3 was considered too specialized, and should probably be changed to a different content generation use case. The Italian Opera Topic Map web site was suggested as one alternative. Lars Marius Garshol volunteered to supply the ontology and specific queries.

Section 5

It was thought that the XTM making up the data should not be given verbatim in the document, but rather be made available through a link. The ontology would perhaps best be represented with a diagram showing the structure, or, if this is too difficult, as AsTMa= examples.

The list of queries in section 5.2 was thought to be very good in general. We still need more use cases and queries, but the diversity and range of the queries was considered very good. The number of queries for one use case was also thought to be about right.

In section 5.2 the results should be taken out, and instead specified in terms of the data model. For example, query 1 has a query result that is "a list of topic items".

Query 3 in 5.2 should be specified more precisely, and the same applies to most of the rest of the queries. What are the "heads" mentioned in query 5, and also in several of the later queries? To query 11 should be added a note that TMQL is unlikely to support this query. In query 14 there is a typo: "No are duplicates allowed".

Section 5.3 should be called "XML output". The DTDs should be cut, since they may be thought to be input to the query processor, which of course is not the case. Query 2 should be replaced with one that has namespaces and attributes, and where some of the attribute values are computed by the query. A third query with some conditional content should be added.

Section 5.4 should be called "topic map output" (see the updated requirements document for clarification). In query 1 the reference to fragments should be removed. We should also consider the relationship between TMQL and XTM Fragments more carefully.

Section 6

The meeting was not sure what to make of this section, and decided to ask the author what the intent behind it was before forming any opinion on the section.

Appendix B

The hyphen in Ann Wrightson's name should be removed :-).

General

For more use cases and queries it might be worthwhile to look at the RDF query use cases and test cases and also the XML Query use cases.

The references section appears to be missing. Stylesheet bug?

Monday, 4 August 2003

Attendees

Agenda for Monday, 4 August 2003

The following discussion notes will be rather cryptic to readers who were not present at the meeting or who do not have access to the document under discussion. The discussion document was not available at the time these notes were posted.

TMCL - Use Cases and Requirements

2.1

2.2

SR2

SR3 data typing

SR4 by -> on. Also applies to general typing issues.

3.3 general : comment on the forcefulness of this section.

sr5 what can be scoping topic.

sr6 for example needs to be added. add 'no more than', 'topic map wide properties'

sr7 ditch.

sr8 keep and add Rnx. clearer definition of equality. refine last para to indicate that tmcl may create named complex conditions.

sr9 (sr10) permissive and (sr9) prescriptive validation. redo along lines of sr10.

sr11, s12 ditch.

----------------------------

rnx tmcl shall define 2 levels (tmcl lite). One level makes use of tmql, the other level (tmcl lite) makes use of types for selectors.

the syntax of tmcl full extends the syntax of tmcl lite. A tmcl lite schema is also a tmcl full schema. There is more work here on defining tmcl lite.

graham shall encourage those tasked with actions from london to actually do something.

r1 re-phrase to say constraints upon instances of the TM DM.

rn1 tmcl will have a defined data model preferably the TM DM.

rn2 tmcl should enable alternative serialisations of a TMCL schema.

rn3 tmcl must define a) a syntax for expressing constraints b) a model for internal representation for these constraints, c) behaviour of TMCL validation.

r2 clarify use of violation. defn. causes an exception.

r3 composition of schema rather than 'merge'.

r4 relax must to should.

r7 a topic map.

r9 remove 'comparing and'

r10 change to type-hierarchy constraints. remove complex definitions.

r11 definitions changes to 'constraints on'

r12 thoughts : link between DSDL, TM DM issue for XML representation and TMQL basic types. - requirement to DSDL WG1 regarding use of DSDL in basic data typing. - tmcl should not specify which basic type system is used.

r12 XML Schema to Relax-NG note: try to colocate reqs under headings.

r13 re-word. use tm info items and literals.

r14 keep 1st sentence only.

r15 lose 2nd sentence.

r16 steal from LMG - same as TMQL.

R18 Anything retrievable by TMQL can be constrained.

r19 type. and re-write.

r20 sp. re-write to reference tmcl model.

New suggested requirement: ability to state evaluation heuristics against a set of constraints.

Scope of TMCL:

Contrasting TMRM and TMCL merging:

tm1 (references ->) tmcl_schema1

Requirement:

Process Issues:


Appendix

Proposal For Structuring Scope In The Topic Maps Data Model (submitted by Kal Ahmed)

1. Background

This document was prepared to summarise the results of an informal working group within the ISO/IEC SC34 WG3 group. The group was given the task of reviewing the existing use of scope in ISO 13250:2003 and making a proposal for how scope should be treated in the next version of the standard. The primary context for this work was as input to the development of ISO 13250-2, Topic Maps Data Model but the working group also considered the requirements of and possibilities presented by the development of XTM (ISO 13250-3), TMQL (ISO 18408) and TMCL (ISO 19756).

2. Scope as it is used now

The working group found that because the semantics of scope are underspecified in the existing version of ISO 13250, a number of interpretations have been applied by topic map users. These differ in the way in which a scope of a characteristic is evaluated against an application context and in the way in which a scoped characteristic may be treated as a result of such an evaluation.

2.1 Evaluation of Scope

The common model for evaluating the scope of a characteristic is to define a process which takes as its input the set of topics which define the scope and a set of topics which define a user context against which the scope is to be evaluated. The process then applies simple set operations to determine whether the characteristic is in-scope or out-of-scope. The three most common processing algorithms are:

  1. The set of topics that define the scope must be a subset of the set of topics that define the user context (called the 'ALL' algorithm).
  2. The set of topics that define the scope must have some intersection with the set of topics that define the user context (called the 'ANY' algorithm).
  3. The set of topics that define the scope must be exactly the same as the set of topics that define the user context (called the 'EXACT' algorithm).

For clarity, the following table shows the way in which a scoped characteristic would be evaluated against a user context in each of these three mechanisms.

Inputs            Evaluation Algorithm
Scope   Context   ALL            ANY            EXACT
A,B     A         out-of-scope   in-scope       out-of-scope
A,B     A,B       in-scope       in-scope       in-scope
A,B     A,B,C     in-scope       in-scope       out-of-scope
A,B     C         out-of-scope   out-of-scope   out-of-scope
A,B     EMPTY     out-of-scope   out-of-scope   out-of-scope
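The three algorithms map directly onto set operations. The following Python sketch (the function name `in_scope` and the string labels for the algorithms are illustrative choices, not part of any specification) reproduces the rows of the table above.

```python
def in_scope(scope, context, algorithm):
    """Return True if a characteristic with the given scope is in-scope
    for the given user context under the named algorithm."""
    scope, context = set(scope), set(context)
    if algorithm == "ALL":
        # The scope must be a subset of the user context.
        return scope <= context
    if algorithm == "ANY":
        # The scope must share at least one topic with the user context.
        return bool(scope & context)
    if algorithm == "EXACT":
        # The scope and the user context must be the same set.
        return scope == context
    raise ValueError(f"unknown algorithm: {algorithm}")


# Rows of the table: (scope, context, ALL, ANY, EXACT)
TABLE = [
    ({"A", "B"}, {"A"}, False, True, False),
    ({"A", "B"}, {"A", "B"}, True, True, True),
    ({"A", "B"}, {"A", "B", "C"}, True, True, False),
    ({"A", "B"}, {"C"}, False, False, False),
    ({"A", "B"}, set(), False, False, False),
]
```

Note that the table does not cover an empty scope. With plain set semantics an empty scope is in-scope under ALL for every context, which a real processing model would have to confirm or override.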

The committee found that in practice, each of these approaches has benefits and drawbacks and that for maximally reliable interchange, a topic map author should be able to declare the model to be applied to the scopes that they create.

2.2 Effect of the evaluation of scope

The committee found that while there is common agreement that a characteristic that is in scope is considered to be a valid statement about the subject represented by the topic that it is a characteristic of, there is not the same clarity for out-of-scope characteristics.

An out-of-scope characteristic may be treated under a negation or a no negation model. Under the negation model the processor infers for an out-of-scope characteristic that the inverse is true, so if topic T has characteristic C which is out-of-scope, this model would infer (NOT C) applies to T. Under the no negation model, the processor infers only that the characteristic does not apply.

3. Proposals

The committee makes the following recommendations.

3.1 Scope processing must follow a no negation model

The Topic Maps Data Model must make it clear that when a characteristic is out of scope, the application should infer only that the characteristic does not apply and should not infer that the statement is negated.

3.2 Scope Should Be Structured

The topic map author should be allowed to create a scope as a logical expression consisting of topics composed by AND and OR operators only.

3.3 A Processing Model For Scope Should Be Defined

The Topic Maps Data Model must define a processing model for evaluating a user context against a scope to determine if the scoped characteristic is in-scope or out-of-scope. This processing model may allow scope to be evaluated against either:

While the former option is the easier to implement, the latter option provides the greatest expressive power in the processing of scopes.

3.4 Scope Comparison Rules

When scopes are compared for equality, they must be compared in a normalised form. The Topic Maps Data Model must define the algorithm for this normalisation process.
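The normalisation algorithm is left to the Topic Maps Data Model and is not defined in this proposal. Purely as an illustration of what a normalised form could look like, the Python sketch below flattens nested scopes that use the same compositor, removes duplicate operands, and ignores operand order. The tuple representation and the function name are assumptions, and the sketch does not attempt a full logical normal form (it does not, for example, apply distributive laws).

```python
def normalise(expr):
    """Normalise a scope expression.

    A scope expression is either a topic identifier (str) or a tuple
    (compositor, operand, operand, ...) where compositor is "AND" or "OR".
    The result is either a str or a pair (compositor, frozenset_of_operands).
    """
    if isinstance(expr, str):
        return expr
    compositor, *operands = expr
    flat = set()
    for operand in map(normalise, operands):
        if isinstance(operand, tuple) and operand[0] == compositor:
            # AND(a, AND(b, c)) becomes AND(a, b, c); likewise for OR.
            flat |= operand[1]
        else:
            flat.add(operand)
    if len(flat) == 1:
        # AND(a) and OR(a, a) both reduce to a.
        return next(iter(flat))
    return (compositor, frozenset(flat))


def scopes_equal(a, b):
    return normalise(a) == normalise(b)
```

Under this sketch `("AND", "a", ("AND", "b", "c"))` and `("AND", "c", "b", "a")` compare equal, while an AND of topics never compares equal to an OR of the same topics.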

3.5 XTM Considerations

The following recommendations are made with regard to the XTM syntax.

3.5.1 scope element

The content model of the scope element must be altered. The scope element content model must allow references to topics and other nested scope elements to be freely mixed. The scope element must have an optional attribute used to specify the compositor used for the list of child elements (AND or OR).

    <occurrence>
      <scope compositor="AND">
        <topicRef xlink:href="english"/>
        <scope compositor="OR">
          <topicRef xlink:href="beginner"/>
          <topicRef xlink:href="intermediate"/>
        </scope>
      </scope>
      <resourceRef xlink:href="...."/>
    </occurrence>
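To make the intent of the nested example concrete, the Python sketch below evaluates such a scope against a user context. It is only an assumption about how a processing model might work: the proposal requires a processing model to be defined but does not define one. Here the context is taken to be a flat set of topics, a topic reference is satisfied when the topic is in the context, AND requires all operands to be satisfied, and OR requires at least one. The tuple representation and the function name are illustrative.

```python
def evaluate(expr, context):
    """Evaluate a structured scope against a set of context topics.

    expr is a topic identifier (str) or a tuple
    (compositor, operand, ...) with compositor "AND" or "OR".
    """
    if isinstance(expr, str):
        # A topic reference is satisfied if the topic is in the context.
        return expr in context
    compositor, *operands = expr
    results = (evaluate(operand, context) for operand in operands)
    if compositor == "AND":
        return all(results)
    if compositor == "OR":
        return any(results)
    raise ValueError(f"unknown compositor: {compositor}")


# The occurrence scope from the example above:
# english AND (beginner OR intermediate)
EXAMPLE_SCOPE = ("AND", "english", ("OR", "beginner", "intermediate"))
```

With this reading, the occurrence is in-scope for a context of english and beginner, but out-of-scope for a context of english alone.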
  

3.5.2 Defaulting Of Compositor Attribute Values

The topicMap element should have an additional, optional attribute to specify the default compositor for a scope which consists of a list of themes only. This compositor would then be applied to all elements which do not explicitly override the default value. To preserve backwards compatibility to XTM 1.0, a default value for this attribute should be provided. This attribute would also provide a migration path for XTM 1.0 documents to XTM 1.1 documents and provide a syntactic shortcut for XTM 1.1 documents.

This means that the following would be equivalent to the example given above:

  <topicMap compositorDefault="AND">
    <topic>
    ...
    <occurrence>
      <scope>
        <topicRef xlink:href="english"/>
        <scope compositor="OR">
          <topicRef xlink:href="beginner"/>
          <topicRef xlink:href="intermediate"/>
        </scope>
      </scope>
      <resourceRef xlink:href="...."/>
    </occurrence>
    ...
    </topic>
  </topicMap>
 

Because the default would apply to any scope element with no explicitly set compositor, the following would be equivalent to both of the preceding examples.

  <topicMap compositorDefault="OR">
    <topic>
    ...
    <occurrence>
      <scope compositor="AND">
        <topicRef xlink:href="english"/>
        <scope>
          <topicRef xlink:href="beginner"/>
          <topicRef xlink:href="intermediate"/>
        </scope>
      </scope>
      <resourceRef xlink:href="...."/>
    </occurrence>
    ...
    </topic>
  </topicMap>
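The defaulting rule can be checked mechanically. The Python sketch below uses the standard library's `xml.etree.ElementTree` to resolve the effective compositor of every scope element and shows that the two defaulted forms yield the same expression. It makes several assumptions for the sake of a short example: element names are unqualified as in the examples above (a real XTM document would use the XTM namespace), the xlink prefix is bound to the standard XLink namespace, the `compositorDefault` attribute is required because this proposal does not name its default value, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def scope_expr(scope_el, default):
    """Turn a scope element into a (compositor, operands) tuple."""
    # An explicit compositor attribute overrides the topic map default.
    compositor = scope_el.get("compositor", default)
    operands = []
    for child in scope_el:
        if child.tag == "scope":
            operands.append(scope_expr(child, default))
        elif child.tag == "topicRef":
            operands.append(child.get(XLINK_HREF))
    return (compositor, tuple(operands))


def occurrence_scopes(xtm_text):
    """Return the resolved scope expression of every scoped occurrence."""
    root = ET.fromstring(xtm_text)
    # Raises KeyError if the attribute is absent: the proposal does not
    # say what the fallback value should be.
    default = root.attrib["compositorDefault"]
    return [
        scope_expr(occ.find("scope"), default)
        for occ in root.iter("occurrence")
        if occ.find("scope") is not None
    ]


# Template covering both defaulted examples from the text.
TEMPLATE = (
    '<topicMap xmlns:xlink="http://www.w3.org/1999/xlink" compositorDefault="{d}">'
    "<topic><occurrence><scope{outer}>"
    '<topicRef xlink:href="english"/>'
    "<scope{inner}>"
    '<topicRef xlink:href="beginner"/><topicRef xlink:href="intermediate"/>'
    "</scope></scope></occurrence></topic></topicMap>"
)
```

Filling the template once with a default of AND and an explicit inner OR, and once with a default of OR and an explicit outer AND, gives identical resolved expressions.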
 

4. Issues

The following issues have not yet been addressed by the group.

4.1 Determining redundancy for duplicate elimination

Consider the following topic:

  <topic>
    <baseName>
      <scope>
        <topicRef xlink:href="#A"/>
      </scope>
      <baseNameString>foo</baseNameString>
    </baseName>
    <baseName>
      <scope compositor="OR">
        <topicRef xlink:href="#A"/>
        <topicRef xlink:href="#B"/>
      </scope>
      <baseNameString>foo</baseNameString>
    </baseName>
  </topic>

The two base names do not have the same scope, but the first is logically redundant as whenever it applies, the second also applies. In this case, should the first name be removed?

Author's Note: My feeling is that duplicate elimination rules should be concerned with the elimination of exact duplicates only and not with the elimination of redundant characteristics as this could lead to hard-to-explain side-effects for topic map authoring processes.