ISO/IEC JTC 1/SC34 N0396, 2003-04-03
Title: The Standard Application Model for Topic Maps
Source: Lars Marius Garshol, Graham Moore, JTC1/SC34
Project: ISO 13250
Project editor: Steven R. Newcomb, Michel Biezunski, Martin Bryan
Status: First committee draft
Action: For review and comment
Date: 2003-04-03
Summary:
Distribution: SC34 and Liaisons
Refer to: ISO/IEC JTC 1/SC34 N0356, 2002-12-04
Supersedes: ISO/IEC JTC 1/SC34 N0356, 2002-12-04
Reply to: Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Information Technology Services Bldg. 9113 M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mailto:mxm@y12.doe.gov http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm
Mrs. Sara Desautels, ISO/IEC JTC 1/SC 34 Secretariat American National Standards Institute 25 West 43rd Street New York, NY 10036 Tel: +1 212 642-4937 Fax: +1 212 840-2298 E-mail: sdesaute@ansi.org
This document defines the standard data model and related semantics of topic maps. The model is defined using prose, the Information Set formalism and UML notation to provide an unambiguous definition of topic map structures. In addition, this document defines the rules for merging and provides a set of published subjects.
Together with the Reference Model specification and the HyTM syntax specification this document will supersede [ISO13250]. Together with the XTM syntax specification this document will supersede [XTM]. It is intended to become part of the new ISO 13250 standard. For more information on this process, see [tm-guide].
This is $Revision: 1.51 $.
1 Introduction
2 The metamodel
2.1 The basic types
2.2 Constraints
3 Information item types
3.1 Locator items
3.2 Source locators
3.3 The topic map item
3.4 Topic items
3.4.1 Identifying subjects
3.4.2 Topic characteristics
3.4.3 Scope
3.4.4 Reification
3.4.5 Properties
3.5 Topic name items
3.6 Variant items
3.7 Occurrence items
3.8 Association items
3.9 Association role items
4 Merging
4.1 Merging topics
4.2 Merging topic names
4.3 Merging variant items
4.4 Merging occurrence items
4.5 Merging association items
4.6 Merging association role items
4.7 Merging locator items
5 Published subjects
5.1 The type-instance relationship
5.2 The supertype-subtype relationship
5.3 Variant name scopes
5.4 Topic characteristic types
5.5 Topic map constructs
6 Conformance
A References
B Correspondence with XTM 1.0 PSIs
C Guide to terminology (Non-Normative)
D Resolved issues (Non-Normative)
Topic maps are abstract structures that can encode knowledge about a domain and connect this encoded knowledge to information resources that are considered relevant to the domain. Topic maps are organized around topics, which represent subjects of discourse; associations, representing relationships between the subjects; and occurrences, which connect the subjects to pertinent information resources.
Topic maps may be represented in many ways: using topic map syntaxes in files, inside databases, as internal data structures in running programs, and even mentally in the minds of humans. All these forms are different ways of representing the same abstract structure, and it is that structure that is defined in this document, in the form of a data model.
Topic map implementations must have internal representations of topic maps that have a documented correspondence to the model defined in this Technical Specification. A number of structural constraints and operations on instances of the model are defined, to which implementations must conform.
The process of exporting topic maps from an implementation's internal representation to an instance of a topic map syntax is known as serialization. The opposite process, that of building such a representation from information encoded using a topic map syntax, is known as deserialization. Specifications of topic map syntaxes can define these processes in terms of the model specified in this Technical Specification.
A topic map processor is any module or system that can process topic maps in conformance with this standard. It is assumed that the topic map processor does its work on behalf of another module known as the topic map application. It is assumed that a topic map processor will do deserialization on behalf of the application, and that the processor will manage the topic maps on behalf of the application.
Ed. Note:
There should be a clear mapping between all ISO 13250 and XTM 1.0 defined terms with terms in SAM.
Ed. Note:
Go through XTM 1.0: ensure consistency, and make notes about divergences. Ditto for ISO 13250:2000.
Ed. Note:
Rewrite this document in the correct style for an ISO standard.
The metamodel used in this document is the same as that used by the XML Information Set [infoset]. A topic map's information set consists of a number of information items, which are abstract representations of some part of the topic map. Every information item is an instance of some information item type, which specifies a number of named properties which the information item must have. Throughout this Technical Specification the term "information item" refers to the information item types defined in this model, while information items of particular types are referred to as "topic items", "topic name items", and so on.
The names of these properties are written in square brackets: [property name], following the convention used in [infoset]. Every property has an associated type that constrains what values it may have. The values of the information item's properties constitute the information recorded about that part of the topic map.
Certain properties in the model are specified as computed properties, which means that they are specified in terms of how their values may be produced from other properties in the model. Such properties, while present conceptually, are strictly speaking redundant.
All types defined in this Technical Specification, whether basic types or information item types, have a well-defined test of equality. This equality test is used to avoid duplicate values in properties whose values are of type set. Information items have identity, independent of their values, so items can be compared both by identity and by value.
UML diagrams [UML] are used in addition to the infoset formalism for purposes of illustration. These diagrams are purely informative, and in cases of discrepancy between the diagrams and normative prose, the prose is definitive.
The values of information item properties may be either other information items, or values of the following three basic types.
Strings are sequences of abstract Unicode characters conforming to Unicode Normalization Form C [unicode].
Strings are equal if they consist of the exact same sequence of abstract Unicode characters. This implies that all comparisons are case-sensitive.
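The string rules above can be illustrated with a minimal Python sketch. The helper names `to_sam_string` and `strings_equal` are hypothetical, and normalizing at comparison time is an assumption made for illustration; an implementation could equally normalize once, when a string enters the model.

```python
import unicodedata


def to_sam_string(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def strings_equal(a: str, b: str) -> bool:
    """Compare two strings as exact sequences of characters after NFC.

    No case folding is applied, so the comparison is case-sensitive.
    """
    return to_sam_string(a) == to_sam_string(b)


composed = "\u00e9"     # e with acute accent as one precomposed character
decomposed = "e\u0301"  # 'e' followed by a combining acute accent
```

Before normalization the two spellings differ as raw character sequences; once both are in Normalization Form C they are the same sequence and therefore equal under the rule above.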
Sets are collections of zero or more unordered elements that contain no elements that are equal to each other. Attempts to add a new element that is equal to one already in the set will not cause the set to change; instead the new element must be merged with the equal element already in the set, following the rules for merging information items of that particular type (see section 4 Merging). In topic map information sets, the elements of a set are always information items.
Two sets are equal if every element in each set has an equal element in the other.
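The add-or-merge behaviour of sets described above might be sketched as follows. `MergingSet`, its `key` function and its `merge` callback are hypothetical names introduced for illustration; the sketch assumes that equality for the element type can be reduced to a hashable key.

```python
from typing import Callable, Dict, Generic, Hashable, Iterator, TypeVar

T = TypeVar("T")


class MergingSet(Generic[T]):
    """A set that merges an incoming element into an equal existing one."""

    def __init__(self, key: Callable[[T], Hashable],
                 merge: Callable[[T, T], None]) -> None:
        self._key = key      # maps an element to the value that defines equality
        self._merge = merge  # folds the incoming element into the kept one
        self._items: Dict[Hashable, T] = {}

    def add(self, item: T) -> T:
        """Add item, or merge it into the equal element already present."""
        k = self._key(item)
        if k in self._items:
            kept = self._items[k]
            self._merge(kept, item)
            return kept
        self._items[k] = item
        return item

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[T]:
        return iter(self._items.values())


def merge_names(kept: dict, incoming: dict) -> None:
    # Illustrative merge rule: the survivor collects all source locators.
    kept["source_locators"] |= incoming["source_locators"]


names = MergingSet(key=lambda n: n["value"], merge=merge_names)
first = {"value": "Suomi", "source_locators": {"a.xtm#n1"}}
second = {"value": "Suomi", "source_locators": {"b.xtm#n7"}}
names.add(first)
survivor = names.add(second)
```

A key-based design does not cover every equality rule in this model: topic items are equal when they share at least one locator, which cannot be expressed as a single key, so a real processor would need a different lookup strategy for them.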
Null is used to indicate that properties have no value; it does not necessarily indicate that the value of the property is unknown. In this model null can never be contained in a set.
Null is distinct from all other values (including the empty set and the empty string); it is only equal to itself.
The model defined in this Technical Specification contains not only basic types and information item types with named properties, but also constraints on the allowed instances of the model. The purpose of these constraints is to prevent inconsistencies with respect to the data model described herein.
Topic map processors are required to be able to detect violations of the constraints marked as 'SAM constraints' on behalf of topic map applications. There are no requirements regarding when or how such detection is done, nor regarding how violations are reported.
Note:
Other specifications building on this Technical Specification should take care to specify at which time violations of SAM constraints must be reported.
The Standard Application Model for Topic Maps is the set of information item types and properties that is presented in this section.
An information resource is a resource that can be represented as a sequence of bytes, and thus could potentially be retrieved over a network. Topic maps can refer to information resources external to themselves in order to make statements about them. These information resources are not part of the topic map; they are only referenced from it.
A locator is a string that references one or more information resources. Locators are always expressed in some notation, which defines their formal syntax and interpretation. The definition of locator notations is outside the scope of this Technical Specification.
In instances of this model locator items represent locators. Locator items have the following properties:
[notation]: A non-empty string. The string is the name of the notation used by this locator. If the string is "URI" the notation is that described in [RFC2396] and modified in [RFC2732]; if it is "HyTime" the notation is one of those described in [HyTime]. If it is neither, the first two characters of the string must be "X-"; all values that do not begin with "X-" are reserved.
[reference]: A non-empty string. The string is the locator, whose interpretation and syntax is governed by the value of the [notation] property.
Equality rule: Locator items are equal if they have the same values in their [notation] and [reference] properties.
Note:
Processors are not required to apply normalization to the syntactical expressions of locators in order to detect that syntactically different but logically equivalent locators are in fact equivalent. Processors are, however, encouraged to implement such logic. As such logic cannot be expected to be present in all processors, or to be the same in the processors that do implement it, applications are strongly discouraged from relying on locator normalization for merging.
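As an illustration of the locator item properties and equality rule, here is a minimal Python sketch. The class name `Locator`, the constant `DEFINED_NOTATIONS` and the choice to raise `ValueError` are assumptions of this sketch, not part of the model. In line with the note above, no normalization is applied to references.

```python
from dataclasses import dataclass

# Notation names given a meaning by the text; any other name must start with "X-".
DEFINED_NOTATIONS = {"URI", "HyTime"}


@dataclass(frozen=True)
class Locator:
    notation: str
    reference: str

    def __post_init__(self) -> None:
        if not self.notation or not self.reference:
            raise ValueError("[notation] and [reference] must be non-empty strings")
        if (self.notation not in DEFINED_NOTATIONS
                and not self.notation.startswith("X-")):
            raise ValueError(f"notation name is reserved: {self.notation!r}")


# The frozen dataclass compares field by field, which matches the equality
# rule: same [notation] and same [reference].
a = Locator("URI", "http://www.topicmaps.org")
b = Locator("URI", "http://www.topicmaps.org")
c = Locator("URI", "http://www.topicmaps.org/")  # differs by a trailing slash
private = Locator("X-example", "item-42")        # hypothetical private notation
```

Here `a` and `b` are equal, while `c` is a different locator item because the references are compared as plain strings.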
The source locators of an information item are a set of locators that may be used to refer to the item. When a topic map information set is created through deserialization from some topic map syntax, source locators are created that point back to the syntactical constructs that gave rise to the information items in the information set. In these cases the source locators will point to the minimal syntactical construct of origin, which means that for topic items created from the XTM syntax, for example, the source locator will point to the originating topic element, rather than the containing topicMap element.
It is not specified how and when source locators are assigned to information items; this is left to the deserialization specifications for each syntax. For topic maps not created by deserialization from a syntax it is not required that any source locators be assigned. Applications may also freely assign source locators to information items in any way they wish, for example in order to use them to refer to the information items.
Source locators are used to define reification, and in the syntax specifications to ensure that when information is deserialized from different information resources cross references to topic map constructs are correctly interpreted. Other specifications will use source locators to define mechanisms for referencing topic map constructs.
Topic map constructs may have any number of source locators, since when duplicate constructs are merged the resulting construct inherits all the source locators of the original constructs.
SAM Constraint: Duplicate source locators
It is an error for two different information items to have locator items that are equal in their [source locators] properties, unless they are topic items. If they are topic items they must be merged according to the procedure in 4.1 Merging topics.
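A processor could detect this constraint along the following lines. This is a sketch only: the tuple layout, the item identifiers and the function name `check_source_locators` are hypothetical, and treating a locator shared between a topic item and a non-topic item as an error is this sketch's reading of the constraint.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Locator = Tuple[str, str]             # (notation, reference)
Item = Tuple[str, str, Set[Locator]]  # (item id, item kind, source locators)


def check_source_locators(items: Iterable[Item]):
    """Return (errors, merges) for source locators shared between items.

    errors: locators shared by items that are not all topic items.
    merges: locators shared only by topic items, which must then be merged.
    """
    owners: Dict[Locator, List[Tuple[str, str]]] = defaultdict(list)
    for item_id, kind, locators in items:
        for locator in locators:
            owners[locator].append((item_id, kind))

    errors, merges = [], []
    for locator, sharing in owners.items():
        if len(sharing) < 2:
            continue  # locator used by a single item: nothing to report
        ids = sorted(item_id for item_id, _ in sharing)
        if all(kind == "topic" for _, kind in sharing):
            merges.append((locator, ids))
        else:
            errors.append((locator, ids))
    return errors, merges


loc_a = ("URI", "http://example.org/map.xtm#a")
loc_b = ("URI", "http://example.org/map.xtm#b")
errors, merges = check_source_locators([
    ("t1", "topic", {loc_a}),
    ("t2", "topic", {loc_a}),
    ("o1", "occurrence", {loc_b}),
    ("n1", "topic name", {loc_b}),
])
```

In this example the two topic items are reported as candidates for merging, while the occurrence and the topic name sharing a locator are reported as a constraint violation.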
A topic map is a set of topics and associations, which may be represented in many forms. Its purpose is to convey information about subjects through the assignment of characteristics to topics representing those subjects. The topic map itself has no meaning or significance beyond its use as a container for the information about those subjects; in particular, the topic map does not represent anything but itself.
The topic map item may be reified, however, in order to make statements about the topic map (that is, the collection of topics and associations) as a whole. These statements may for example provide traditional metadata such as author, version, copyright, or they may reference system metadata such as a schema for the topic map, external documentation of it, and so on.
There is exactly one topic map item in each information set, and all information in the set is available from the properties of that item. Every topic map item represents a single topic map.
Topic map items have the following properties:
[topics]: A set of topic items. This is the set of all the topics in the topic map.
[associations]: A set of association items. This is the set of all the associations in the topic map.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[base locator]: A locator item, or null. The locator item refers to the location where the topic map is stored.
[source locators]: A set of locator items. This is the set containing the source locators of the topic map item.
A subject can be anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In particular, it is anything on which the creator of a topic map chooses to discourse.
Note:
Examples of subjects for which topics may be created are:
The moon.
The Soviet Union. This subject no longer exists as an organizational unit, but the idea still exists, and so is still a subject.
The letters 'A', 'B', 'C', and 'D'. This is a single subject, a set with four elements.
Plato's notion of the good. This subject is different from, but related to, "good" in the abstract, and John Stuart Mill's notion of "good".
A topic is a symbol used within a topic map to represent some subject, about which the creator of the topic map wishes to make statements. Topics are proxies for the subjects they represent in order to allow statements to be made about the subjects through the assignment of characteristics to the topics that represent them. A statement is the assignment of a value to one of the properties of a topic item representing topic characteristics.
Every topic represents one, and only one, subject. The process of merging ensures that whenever two topics are known to represent the same subject they are merged. It may well be, however, that two topics represent the same subject without this being detectable by the rules of this standard. Applications and users are therefore free to merge topics as they see fit. Most commonly this will be done by inferring the subject of the topics from their characteristics.
Subjects may be identified in one or more of the following ways:
A subject indicator is an information resource that is referred to from a topic map in an attempt to unambiguously identify the subject of a topic to a human being. Any information resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.
A subject identifier is a locator that refers to a subject indicator. Topic maps contain only subject identifiers, and consequently it is the subject identifier that is the basis for merging; the subject indicator is ignored during merging.
A subject address is a locator that refers to the information resource that is the subject of a topic. The topic thus represents that particular information resource. Different locators are considered to address different information resources. If a topic item has a subject address it is assumed that the topic represents the information resource the subject address refers to.
Note:
Consider the URI http://www.topicmaps.org. If given as the subject address of a topic A this would mean that that topic represents the information resource identified by this URI. However, using it as the subject identifier of a topic B would mean that B represents what is described in that information resource. At the time of writing this would seem to be the organization known as TopicMaps.Org. (Note: the organization; the real-world institution known by that name.)
Note the uncertainty in the last sentence above. The information resource in question is a subject indicator for topic B, but it was not written to be a subject indicator (that is, it is not a published subject indicator), and so is not entirely unambiguous with respect to what subject it indicates. Nor is it guaranteed to be stable, so at the time of reading it may indicate some other subject, or it may no longer exist.
Merging of topics in topic maps is defined in terms of subject identifiers, subject addresses, and source locators.
Topic names, occurrences, and association roles are collectively known as topic characteristics, as they are the only characteristics topics may have in a topic map. A topic characteristic assignment is the statement that a certain topic characteristic belongs to a certain topic. In the information set this is represented by the inclusion of an information item representing a topic characteristic in the value of a property of a topic item. Any topic characteristic assignment constitutes a statement about the subject represented by the topic.
The properties of topic items that do not represent topic characteristics are not statements about the subject; they are statements about the topic. As such they are part of the topic map machinery, rather than statements about the subject represented in the topic map.
All topic characteristic assignments have a scope, which defines the context within which the assignment is valid. Outside the context represented by the scope the assignment is not known to be valid. Formally, a scope is composed of a set of subjects that together define the context. That is, the topic characteristic is known to be valid only in contexts where all the subjects in the scope apply.
If the scope of a topic characteristic assignment is the empty set the statement is considered to have unlimited validity, and the assignment is said to be in the unconstrained scope.
Precisely how a subject defines a context is not defined by this standard, but left for those creating topic maps to define as part of the definition of their subjects.
Examples of the use of scope are given below:
The term "Suomi" is the name of Finland in the context of Finnish. This corresponds to assigning the base name "Suomi" to a topic representing Finland, and giving it as scope a topic representing Finnish.
According to Norman Davies, World War II started on June 6, 1937 [Davies]. This corresponds to creating a topic representing WWII, and assigning to it the string "June 6, 1937" as an occurrence of type "start date", and giving this occurrence as scope a topic representing the person Norman Davies.
According to Peter T. Daniels, the Devanagari script is an instance of the script type "abugida," whereas according to William Bright it is an "alphasyllabary". This corresponds to having two "class-instance" associations, each scoped with a topic representing the relevant authority.
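The formal reading of scope given above (an assignment is known to be valid only where all the subjects in its scope apply) can be sketched as a subset test. Representing subjects as plain strings and the names `UNCONSTRAINED` and `known_valid_in` are assumptions made for illustration.

```python
from typing import FrozenSet

# The empty scope: the assignment has unlimited validity.
UNCONSTRAINED: FrozenSet[str] = frozenset()


def known_valid_in(scope: FrozenSet[str], context: FrozenSet[str]) -> bool:
    """True if every subject in the scope applies in the given context.

    A False result means only that the assignment is not known to be valid
    in that context, not that it is known to be invalid.
    """
    return scope.issubset(context)


suomi_scope = frozenset({"finnish"})        # scope of the base name "Suomi"
reader_context = frozenset({"finnish", "geography"})
```

Because the empty set is a subset of every set, an assignment in the unconstrained scope passes this test in any context, which matches the unlimited validity described above.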
Every topic represents one subject, and the relationship between the two is always one of representation. However, the term reification is used for situations where the subject represented by the topic is part of the topic map.
In many cases it is desirable to be able to attach additional information to topic map constructs such as topic names or associations. One may want to give an association occurrences, or to give an occurrence a name. The basic topic map model does not allow this, but through reification this can be done by creating a topic that reifies the topic map construct. The necessary information can then be attached to the reifying topic, and the reification relationship is present in structured form, and can reliably be detected by software.
Reification is achieved by giving the reifying topic a subject identifier that refers to the topic map construct that is being reified. In model terms, this means that if an information item has a source locator item that is equal to one of the items in the [subject identifiers] property of a topic, that topic item reifies the information item.
Note:
One topic cannot reify another. To make one topic the subject indicator of another implies that the two topics represent the same subject, and they will therefore be merged, and thus become a single topic.
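The reification relationship described above can be computed by comparing locators. The following is a minimal sketch, assuming locators are (notation, reference) pairs; the class names `Item` and `Topic`, the function `reifier_of` and the example.org locators are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple

Locator = Tuple[str, str]  # (notation, reference)


@dataclass(eq=False)
class Item:
    """Any non-topic construct, such as an association or a topic name."""
    source_locators: Set[Locator] = field(default_factory=set)


@dataclass(eq=False)
class Topic:
    subject_identifiers: Set[Locator] = field(default_factory=set)


def reifier_of(item: Item, topics: Iterable[Topic]) -> Optional[Topic]:
    """Return the topic that reifies the item, or None if there is none.

    A topic reifies the item when one of its subject identifiers equals
    one of the item's source locators.
    """
    for topic in topics:
        if topic.subject_identifiers & item.source_locators:
            return topic
    return None


assoc = Item({("URI", "http://example.org/map.xtm#assoc1")})
about_assoc = Topic({("URI", "http://example.org/map.xtm#assoc1")})
unrelated = Topic({("URI", "http://example.org/psi/finland")})
```

Returning the first match is sufficient only in a fully merged topic map, where topics sharing a subject identifier have already become a single topic.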
Topic items represent topics, and have the following properties:
[topic names]: A set of topic name items. This is the set of topic names assigned to this topic.
[occurrences]: A set of occurrence items. This is the set of occurrences assigned to this topic.
[roles played]: A set of association role items. This is the set of association roles played by this topic.
Computed value: the set of all association role items whose [role playing topic] property contains this topic item.
[subject identifiers]: A set of locator items. The locator items refer to the subject indicators of this topic.
[subject addresses]: A set of locator items. The locators, if present, refer to the information resource that is the subject of this topic. If the set contains more than one locator this implies that the locators all address the same information resource.
[reified]: an information item, or null. This is the information item reified by this topic item; that is, the topic map construct that is the subject of this topic.
Computed value: if any information item has in its [source locators] property a locator item equal to one in the [subject identifiers] property of this topic item, that information item is the value of the [reified] property. If no such information item is found the value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the topic item.
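Computed properties such as [roles played] need not be stored; they can be derived on demand from other properties. A minimal sketch, with hypothetical class and function names and with topics compared by identity, using the parenthood example from the section on association items:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(eq=False)
class Topic:
    label: str  # for display in this example only


@dataclass(eq=False)
class AssociationRole:
    role_playing_topic: Topic
    type: Topic


def roles_played(topic: Topic,
                 all_roles: Iterable[AssociationRole]) -> List[AssociationRole]:
    """Derive [roles played]: every role whose player is this topic item."""
    return [role for role in all_roles if role.role_playing_topic is topic]


hamlet, gertrude = Topic("Hamlet"), Topic("Queen Gertrude")
child, mother = Topic("child"), Topic("mother")
roles = [AssociationRole(hamlet, child), AssociationRole(gertrude, mother)]
hamlet_roles = roles_played(hamlet, roles)
```

This illustrates why the metamodel calls computed properties redundant: the result carries no information beyond what the [role playing topic] properties already record.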
SAM Constraint: Single reified
The computation that produces the value of the [reified] property must yield a single information item, as topics are required to have only one subject.
Equality rule: Two topic items are equal if they have:
at least one equal locator item in their [subject identifiers] properties,
at least one equal locator item in their [source locators] properties,
at least one equal locator item in their [subject addresses] properties,
an equal locator in the [subject identifiers] property of the one topic item and the [source locators] property of the other, or
the same information item in their [reified] property
Topics which have a non-empty [subject addresses] property are considered to represent the information resource they reference in that property.
Note:
Locators which refer directly to subjects which are not information resources must be used with caution. They should not be used in the [subject addresses] property, as this is intended only for references to information resources. Rather, they should be placed in the [subject identifiers] property.
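The equality rule for topic items can be written as a single predicate over the locator properties. In this sketch the class and function names are hypothetical, locators are (notation, reference) pairs, and two null [reified] values are assumed not to make topics equal.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Locator = Tuple[str, str]  # (notation, reference)


@dataclass(eq=False)
class Topic:
    subject_identifiers: Set[Locator] = field(default_factory=set)
    subject_addresses: Set[Locator] = field(default_factory=set)
    source_locators: Set[Locator] = field(default_factory=set)
    reified: Optional[object] = None


def topics_equal(a: Topic, b: Topic) -> bool:
    """Apply the five alternatives of the topic item equality rule."""
    return bool(
        a.subject_identifiers & b.subject_identifiers
        or a.source_locators & b.source_locators
        or a.subject_addresses & b.subject_addresses
        # Subject identifier of one topic equal to a source locator of the other.
        or a.subject_identifiers & b.source_locators
        or b.subject_identifiers & a.source_locators
        # Assumption: only a shared non-null reified item counts.
        or (a.reified is not None and a.reified is b.reified)
    )


uri = ("URI", "http://www.topicmaps.org")
topic_a = Topic(subject_addresses={uri})    # represents the resource itself
topic_b = Topic(subject_identifiers={uri})  # represents what the resource describes
topic_c = Topic(subject_identifiers={uri})
```

With these values, `topic_b` and `topic_c` are equal and would be merged, while `topic_a` is not equal to either: the rule never compares a subject address with a subject identifier, reflecting the distinction drawn in the note on http://www.topicmaps.org.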
A topic name is a base name together with its associated variant names. It is the topic name which is a topic characteristic; the base name and variant names are only parts of the topic name characteristic.
Topic names may have a type, which defines what kind of name the topic name represents. They always have a scope, which defines in what contexts the topic name is an appropriate label for the subject. A subject may have any number of topic names, and the only basis for choosing which topic name(s) to use in any given situation is their type and scope.
A base name is a name or label for a subject, expressed as a string. That is, it is something that identifies the subject (though not necessarily uniquely) and can be used as a label for the subject in user interfaces. The notion of a base name corresponds closely to the common sense notion of a name. Suitable base names for people, countries, and organizations are their names, while base names for documents, musical works, and movies might be their titles. Base names may have variant names, which are alternative forms of the base name that may be more appropriate in specific contexts. Essentially, a base name is a specialized kind of occurrence.
Topic name items represent topic names, and have the following properties:
[value]: A string. This string is the base name.
[type]: A topic item, or null. The topic item represents the subject that defines what kind of topic name this is.
[scope]: A set of topic items. This set is the scope that represents the validity of this topic name as a label for the subject.
[variants]: A set of variant items. This set contains the variant names that are alternative forms of the base name.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the topic name item.
Equality rule: Topic name items are equal if the values of their [value], [type], and [scope] properties are equal and they are contained in the [topic names] property of the same topic item.
A variant name is an alternative form of a base name that may be more suitable in certain contexts than the base name itself. The scope of the variant name is the only basis for establishing what variant name is most suitable in any given situation. A variant name may be a string, but it may also be any other kind of information resource.
When choosing a label for a topic, applications are expected to select the base name they consider most appropriate, and then evaluate which of the forms of that base name is best suited for display in that particular context, which may be the base name or one of its variants. This standard does not constrain the process by which this is done.
Section 5.3 Variant name scopes defines some published subjects that may be useful for scoping variant names.
Variant items represent variant names, and have the following properties:
[value]: A string, which may be empty, or null. The string, if set, is the variant name.
[resource]: A locator item, or null. The locator, if set, refers to the information resource that is the variant name.
[scope]: A non-empty set of topic items. This set is the scope that describes in what context(s) the variant name may be preferred as a label for the topic.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the variant item.
The value of the [scope] property of each variant item must be a proper superset of the value of the [scope] property of the topic name item that is its parent.
SAM Constraint: Value/resource exclusion
Exactly one of the [value] and [resource] properties must contain null.
Equality rule: Variant items are equal if the values of their [value], [resource], and [scope] properties are equal and they are contained in the [variants] property of the same topic name item.
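The two restrictions on variant items, the superset requirement on [scope] and the value/resource exclusion, can be checked mechanically. A minimal sketch: the function name `variant_problems` is hypothetical, and scope topics and the resource locator are represented as plain strings for brevity.

```python
from typing import FrozenSet, List, Optional


def variant_problems(value: Optional[str],
                     resource: Optional[str],
                     variant_scope: FrozenSet[str],
                     name_scope: FrozenSet[str]) -> List[str]:
    """Return a description of each restriction the variant item violates."""
    problems = []
    # Exactly one of the two properties must be null (None here).
    if (value is None) == (resource is None):
        problems.append("exactly one of [value] and [resource] must be null")
    # '>' on frozensets tests for a strict superset.
    if not variant_scope > name_scope:
        problems.append("[scope] must strictly contain the scope of the parent name")
    return problems


ok = variant_problems("suomi", None,
                      frozenset({"finnish", "sort"}), frozenset({"finnish"}))
bad = variant_problems(None, None,
                       frozenset({"finnish"}), frozenset({"finnish"}))
```

An empty string is still a value, so a variant whose [value] is "" and whose [resource] is null satisfies the exclusion constraint. The strict superset test also guarantees that the variant's scope is non-empty.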
An occurrence is a relationship between a subject and an information resource. The subject in question is that represented by the topic in whose [occurrences] property the occurrence item can be found. The precise nature of the relationship is described by the occurrence type, a subject which is attached to the occurrence. Occurrences are generally used to attach information resources to the subjects they are relevant to. The information resource may either be a string inside the topic map or an external information resource.
Note that the occurrence is properly not the resource, but the relationship between it and the subject. Occurrences are essentially a specialized kind of association, where one participant in the association must be an information resource.
All occurrences have a scope, which defines the contexts in which the occurrence relationship between the information resource and the subject is valid.
Occurrence items represent occurrences, and have the following properties:
[value]: A string, or null. The string, if present, is the information resource the occurrence connects with the subject.
[resource]: A locator item, or null. The locator, if set, is a reference to the information resource the occurrence connects with the subject.
[scope]: A set of topic items. This set is the scope that describes in what context the occurrence relationship may be considered valid.
[type]: A topic item, or null. The topic item represents the subject that defines the nature of the occurrence relationship.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the occurrence item.
SAM Constraint: Value/resource exclusion
Exactly one of the [value] and [resource] properties must contain null.
Equality rule: Occurrence items are equal if the values of their [value], [resource], [scope], and [type] properties are equal and they are contained in the [occurrences] property of the same topic item.
An association is a relationship involving one or more subjects. Associations have an association type, a subject which describes the nature of the relationship. The involvement of each subject in the relationship is called its association role.
Note:
An example of an association might be the 'authorship' relationship between Henrik Ibsen and the play 'Peer Gynt'. In this relationship there are two roles: Ibsen plays the role of 'author', while 'Peer Gynt' plays the role of 'work'.
Another example might be the 'parenthood' relationship between Hamlet, King Hamlet, and Queen Gertrude. This relationship has three roles: Hamlet plays the role of 'child', the King that of 'father', and the Queen that of 'mother'.
All associations have a scope, which defines the context in which the statement represented by the association can be considered valid. The scope applies to the association roles as characteristics of the topics that play these roles, but all association roles in an association have the same scope, and so the scope is considered to apply to the association as a whole as well.
Association items represent associations, and have the following properties:
[scope]: A set of topic items. This set is the scope that describes in what context the association may be considered valid.
[type]: A topic item, or null. The topic item represents the association type of the association.
[roles]: A non-empty set of association role items. The association role items represent the association roles that make up this association.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the association item.
Equality rule: Association items are equal if the values of their [scope], [type], and [roles] properties are equal.
An association role connects two pieces of information within an association: the subject participating in the association, known as the association role player, and the subject defining the nature of its participation, known as the association role type.
Ed. Note:
The UML should declare that there is an inverse of rolePlayingTopic, which is roles.
Association role items represent association roles, and may have the following properties:
[role playing topic]: A topic item. This is the topic item that represents the association role player.
[type]: A topic item. This is the topic item that represents the association role type.
[reifier]: A topic item, or null. The topic item is the topic that reifies this information item.
Computed value: if there exists a topic item whose [subject identifiers] property contains a locator item equal to one in the [source locators] property of this information item, that topic item is the value of the [reifier] property. If not, its value is null.
[source locators]: A set of locator items. This is the set containing the source locators of the association role item.
Equality rule: Association role items are equal if the values of their [type] and [role playing topic] properties are equal and they are contained in the [roles] property of the same association item.
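The computed [reifier] property, which is defined identically for occurrence, association, and association role items, can be sketched as a lookup over the topics of the topic map. The names `Topic`, `Locator`, and `compute_reifier` are illustrative assumptions, and locator equality is simplified to equality of (notation, reference) pairs.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple

# Assumed representation of a locator item: (notation, reference).
Locator = Tuple[str, str]


@dataclass
class Topic:
    label: str
    subject_identifiers: Set[Locator] = field(default_factory=set)


def compute_reifier(source_locators: Set[Locator],
                    topics: Iterable[Topic]) -> Optional[Topic]:
    """Return the topic with a subject identifier equal to one of the
    item's source locators, or None if no such topic exists."""
    for topic in topics:
        if topic.subject_identifiers & source_locators:
            return topic
    return None
```

An implementation could equally store the reifier directly, provided the stored value matches what this computation would yield.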
Merging is a process applied to topic maps in order to reduce the number of redundant information items representing the same information. Merging is required to be performed in certain cases, but this is insufficient to guarantee that there will always be one topic per subject. Applications are therefore allowed to merge topics as they see fit.
Merging is triggered for information items of all types whenever an attempt is made to add to a set an information item that is equal to another item already in that set.
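One way to picture this trigger is an "add" operation that checks for an equal item before inserting. The sketch below is only an illustration: `add_item` and its `equal` and `merge` callbacks are hypothetical names, and the set property is modelled as a Python list.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def add_item(items: List[T], new: T,
             equal: Callable[[T, T], bool],
             merge: Callable[[T, T], T]) -> T:
    """Add `new` to `items`; if an equal item is already present,
    replace it with the result of merging the two."""
    for i, existing in enumerate(items):
        if equal(existing, new):
            items[i] = merge(existing, new)
            return items[i]
    items.append(new)
    return new
```

The `equal` callback would implement the equality rule for the item type in question, and `merge` the corresponding merging procedure.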
Topics are merged whenever the [topics] property of the topic map item contains two equal topic items. The merging of the topic items is done by the following procedure. The two topic items to be merged are known as A and B.
Create a new topic item C.
Replace A by C wherever it appears in one of the following properties of some information item: [topics], [scope], [type], and [role playing topic].
Repeat for B.
Set C's [source locators] property to the union of the values of A and B's [source locators] properties.
Set C's [subject identifiers] property to the union of the values of A and B's [subject identifiers] properties.
Set C's [subject addresses] property to the union of the values of A and B's [subject addresses] properties.
Set C's [topic names] property to the union of the values of A and B's [topic names] properties.
Set C's [occurrences] property to the union of the values of A and B's [occurrences] properties.
Remove A and B from the topic map item's [topics] property.
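The procedure above can be sketched in a few lines. This is a simplified illustration under stated assumptions: `Topic`, `Statement`, and `merge_topics` are hypothetical names, locators and contained items are plain strings, and `Statement` stands in for any information item that has [scope], [type], or [role playing topic] properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity-based equality and hashing, so topics can live in sets
class Topic:
    source_locators: Set[str] = field(default_factory=set)
    subject_identifiers: Set[str] = field(default_factory=set)
    subject_addresses: Set[str] = field(default_factory=set)
    topic_names: Set[str] = field(default_factory=set)  # stand-ins for topic name items
    occurrences: Set[str] = field(default_factory=set)  # stand-ins for occurrence items


@dataclass(eq=False)
class Statement:
    scope: Set[Topic] = field(default_factory=set)
    type_: Optional[Topic] = None
    role_playing_topic: Optional[Topic] = None


def merge_topics(topics: List[Topic], statements: List[Statement],
                 a: Topic, b: Topic) -> Topic:
    c = Topic()
    # Replace A, then B, by C in [scope], [type], and [role playing topic].
    for old in (a, b):
        for s in statements:
            if old in s.scope:
                s.scope.discard(old)
                s.scope.add(c)
            if s.type_ is old:
                s.type_ = c
            if s.role_playing_topic is old:
                s.role_playing_topic = c
    # Each set-valued property of C is the union of those of A and B.
    for prop in ("source_locators", "subject_identifiers", "subject_addresses",
                 "topic_names", "occurrences"):
        setattr(c, prop, getattr(a, prop) | getattr(b, prop))
    # A and B leave [topics]; C takes their place.
    topics[:] = [t for t in topics if t is not a and t is not b] + [c]
    return c
```

In a fuller model the unions of [topic names] and [occurrences] may themselves bring equal items together, which would trigger the merging procedures for those item types.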
Topic names are merged when the [topic names] property of a topic item contains two equal topic name items. The procedure for merging two topic name items A and B is given below.
Create a new topic name item C.
Set C's [source locators] value to the union of the value of the [source locators] properties of A and B.
Set C's [value] value to the value of the [value] property of A. B's value is equal to that of A, and need therefore not be taken into account.
Set C's [scope] value to the value of the [scope] property of A. B's value is equal to that of A, and need therefore not be taken into account.
Set C's [variants] value to the union of the [variants] properties of A and B.
Remove A and B from the parent topic item's [topic names] property, and add C in their place.
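As a rough sketch of these steps, assuming a hypothetical `TopicName` class in which scope themes, variants, and locators are all plain strings:

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(eq=False)
class TopicName:
    value: str
    scope: FrozenSet[str] = frozenset()
    variants: Set[str] = field(default_factory=set)  # stand-ins for variant items
    source_locators: Set[str] = field(default_factory=set)


def merge_topic_names(topic_names: List[TopicName],
                      a: TopicName, b: TopicName) -> TopicName:
    # [value] and [scope] come from A; B's are equal by the equality rule.
    c = TopicName(value=a.value, scope=a.scope,
                  variants=a.variants | b.variants,
                  source_locators=a.source_locators | b.source_locators)
    # Remove A and B from the parent's [topic names] and add C in their place.
    topic_names[:] = [n for n in topic_names if n is not a and n is not b] + [c]
    return c
```

The other merging procedures in this section follow the same pattern: union the [source locators], copy the properties that take part in the equality rule from A, and swap A and B for C in the containing property.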
Variant items are merged whenever the [variants] property of a topic name item contains two equal variant items. Two variant items, A and B, are merged by following the procedure below.
Create a new variant item, C.
Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.
Set C's [value], [resource], and [scope] properties to the value of A's [value], [resource], and [scope] properties, respectively. B's values are equal to those of A, and need therefore not be taken into account.
Remove A and B from the parent topic name item's [variants] property, and add C in their place.
Occurrence items are merged whenever the [occurrences] property of a topic item contains two equal occurrence items. Two occurrence items, A and B, are merged by following the procedure below.
Create a new occurrence item, C.
Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.
Set C's [value], [resource], [scope], and [type] properties to the value of A's [value], [resource], [scope], and [type] properties, respectively. B's values are equal to those of A, and need therefore not be taken into account.
Remove A and B from the parent topic item's [occurrences] property, and add C in their place.
Association items are merged whenever the [associations] property of the topic map item contains two equal association items. To merge two association items, A and B, follow the procedure below.
Create a new association item, C.
Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.
Set C's [scope], [roles], and [type] properties to the value of A's [scope], [roles], and [type] properties, respectively. B's values are equal to those of A, and need therefore not be taken into account.
Remove A and B from the topic map item's [associations] property, and add C in their place.
Association role items are merged whenever the [roles] property of an association item contains two equal association role items. Two association role items, A and B, are merged by following the procedure below.
Create a new association role item, C.
Set C's [source locators] property to the union of the values of A's and B's [source locators] properties.
Set C's [type] and [role playing topic] properties to the value of A's [type] and [role playing topic] properties, respectively. B's values are equal to those of A, and need therefore not be taken into account.
Remove A and B from the parent association's [roles] property, and add C in their place.
A published subject indicator is a subject indicator that is published and maintained at an advertised location for the purposes of supporting topic map interchange and mergeability. A published subject is any subject for which there exists at least one published subject indicator. A published subject identifier is the subject identifier of a published subject indicator.
This section defines a number of published subjects in the expectation that many topic map applications will need them. These subjects form a central part of the topic map standard, yet there is no requirement that applications use them. Applications are free to define their own alternatives.
All published subjects defined by this Technical Specification are distinct.
A type is a set of individual subjects, each of which is an instance of the type. Types are generally used to represent sets of subjects which share some commonality, but the possible uses of types are not required, nor are their meanings. A type may itself be an instance of another type, and there is no limit to the number of types a subject may be an instance of. Scope applies to this association type in just the same way as it does to any other.
The type-instance relationship is not transitive. That is, if A is a type of which B is an instance, and B is a type of which C is an instance, it does not follow that C is an instance of A.
The type-instance relationship between two topic items can be asserted in topic maps using an association item that conforms to the following rules:
The [type] property must be set to a topic item that has in its [subject identifiers] property a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#type-instance".
The [roles] property must contain exactly two association role items.
One of the association role items in the [roles] property must have its [type] property set to a topic whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#type". The [role playing topic] property will then contain the topic item representing the type.
One of the association role items in the [roles] property must have its [type] property set to a topic whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#instance". The [role playing topic] property will then contain the topic item representing the instance.
Associations that use one or more of the published subjects defined in this section, but which do not conform to these structural rules, are not considered to represent type-instance relationships.
Note:
Implementations are not required to actually represent the type-instance relationship using associations.
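The structural rules for type-instance associations lend themselves to a mechanical check. The following is a minimal sketch, not a normative algorithm: the classes and the functions `has_psi` and `type_instance_pair` are hypothetical, and a locator item is modelled as a (notation, reference) pair.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

SAM = "http://psi.topicmaps.org/sam/1.0/#"


@dataclass(eq=False)
class Topic:
    subject_identifiers: Set[Tuple[str, str]] = field(default_factory=set)


@dataclass(eq=False)
class Role:
    type_: Topic
    role_playing_topic: Topic


@dataclass(eq=False)
class Association:
    type_: Optional[Topic]
    roles: List[Role]


def has_psi(topic: Optional[Topic], name: str) -> bool:
    """True if the topic carries the SAM subject identifier with this fragment."""
    return topic is not None and ("URI", SAM + name) in topic.subject_identifiers


def type_instance_pair(assoc: Association) -> Optional[Tuple[Topic, Topic]]:
    """Return (type, instance) if the association follows the rules, else None."""
    if not has_psi(assoc.type_, "type-instance") or len(assoc.roles) != 2:
        return None
    first, second = assoc.roles
    for t, i in ((first, second), (second, first)):
        if has_psi(t.type_, "type") and has_psi(i.type_, "instance"):
            return t.role_playing_topic, i.role_playing_topic
    return None
```

An association that uses the published subjects but fails the check yields `None`, matching the statement that such associations are not considered type-instance relationships.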
A type may be a subtype of another, which is then considered the supertype of the first. If B is the subtype of A, it follows that every instance of B is also an instance of A. The converse is not necessarily true. The relationship between a supertype and its subtypes is known as the supertype-subtype relationship. A type may have any number of subtypes and supertypes. Scope applies to this association type in just the same way as it does to any other.
Note:
This means that if 'a' is an instance of 'b' in scope 'Y' and 'X', and 'b' is a subtype of 'c' in scope 'Y' and 'Z', then 'a' is an instance of 'c' only in the context of 'Y', 'X', and 'Z'.
The supertype-subtype relationship is transitive, which means that if B is a subtype of A, and C a subtype of B, C is also a subtype of A.
Note:
Loops in this relationship are allowed, and should be interpreted to mean that the instances of the types in the loop are the same. This does not, however, necessarily imply that the types are the same.
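The difference between the transitive supertype-subtype relationship and the non-transitive type-instance relationship can be made concrete with a small sketch. It ignores scope entirely, represents types by plain strings, and uses the hypothetical function names `supertypes` and `types_of`.

```python
from typing import Dict, Set


def supertypes(direct: Dict[str, Set[str]], t: str) -> Set[str]:
    """All supertypes of t, following direct supertype links transitively.
    Loops are tolerated: each type is visited at most once."""
    seen: Set[str] = set()
    stack = list(direct.get(t, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(direct.get(s, ()))
    return seen


def types_of(instance_of: Dict[str, Set[str]],
             direct: Dict[str, Set[str]], x: str) -> Set[str]:
    """Types of x: its asserted types plus all of their supertypes.
    Only one type-instance step is taken, since that relationship
    is not transitive."""
    result = set(instance_of.get(x, ()))
    for t in list(result):
        result |= supertypes(direct, t)
    return result
```

With a loop such as 'a' being a subtype of 'b' and 'b' a subtype of 'a', each type ends up among its own supertypes, which is consistent with the types sharing the same instances.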
The supertype-subtype relationship between two types can be asserted in topic maps using an association item that conforms to the following rules:
The [type] property must be set to a topic item that has in its [subject identifiers] property a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#supertype-subtype".
The [roles] property must contain exactly two association role items.
One of the association role items in the [roles] property must have its [type] property set to a topic whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#supertype". The [role playing topic] property will then contain the topic item representing the supertype.
One of the association role items in the [roles] property must have its [type] property set to a topic whose [subject identifiers] property contains a locator item with [notation] set to "URI" and [reference] set to "http://psi.topicmaps.org/sam/1.0/#subtype". The [role playing topic] property will then contain the topic item representing the subtype.
Associations that use one or more of the published subjects defined in this section, but which do not conform to these structural rules, are not considered to represent supertype-subtype relationships. Their interpretation is not defined.
Note:
Although these published subjects are included as part of this Technical Specification there is no requirement that applications must actually use them. Applications that wish to have different semantics for their supertype-subtype relationships are free to define their own published subjects for this purpose.
The subject identifier http://psi.topicmaps.org/sam/1.0/#sort (notation "URI") identifies the notion of "suitability of a variant name for use as a sort key for a subject". A variant item that has a topic with this subject identifier in its [scope] property represents a variant name intended to be used as one of the possible sort keys for the topic item it belongs to in contexts which are indicated by the scope of the variant name.
Sort names will be sorted in Unicode code point order. Applications are expected to produce sort names that, when sorted with this algorithm, will give the sort order desired.
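To illustrate code point ordering: Python compares strings by Unicode code point, so its built-in `sorted` shows the behaviour directly. The helper `sort_topics` and the sample names are illustrative assumptions only.

```python
def sort_topics(sort_names: dict) -> list:
    """Order topic labels by their sort names, compared in code point order."""
    return sorted(sort_names, key=lambda label: sort_names[label])


# Raw names sort by code point: 'Z' (U+005A) < 'a' (U+0061) < 'Å' (U+00C5).
raw = sorted(["abel", "Zola", "Åse"])

# Applications therefore supply sort names crafted to give the desired order.
by_sort_name = sort_topics({
    "Henrik Ibsen": "ibsen, henrik",
    "Émile Zola": "zola, emile",
    "Niels Henrik Abel": "abel, niels henrik",
})
```

Here `raw` places "Zola" before "abel", which is rarely what a reader expects; the lower-cased, accent-free sort names restore the intended alphabetical order.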
The subject identifier http://psi.topicmaps.org/sam/1.0/#display (notation "URI") identifies the notion of "suitability of a variant name for use as a display name for a subject". A variant item that has a topic with this subject identifier in its [scope] property represents a variant name intended to be used as one of the possible labels for the topic item it belongs to in contexts which are indicated by the scope of the variant name.
The subject identifier http://psi.topicmaps.org/sam/1.0/#unique-characteristic (notation "URI") identifies the type of topic characteristic types that are unique. A topic characteristic whose type is an instance of this type must be unique across all topics in the topic map. This means that if two topics are found to have equal information items representing such a topic characteristic, they must be merged according to the rules of 4.1 Merging topics.
This section describes published subjects for the main topic map constructs, useful as types for topics reifying topic map constructs of various kinds.
The subject identifier http://psi.topicmaps.org/sam/1.0/#topic-name (notation "URI") identifies the type of topic names, as described in 3.5 Topic name items.
The subject identifier http://psi.topicmaps.org/sam/1.0/#variant (notation "URI") identifies the type of variant names, as described in 3.6 Variant items.
The subject identifier http://psi.topicmaps.org/sam/1.0/#occurrence (notation "URI") identifies the type of occurrences, as described in 3.7 Occurrence items.
The subject identifier http://psi.topicmaps.org/sam/1.0/#association (notation "URI") identifies the type of associations, as described in 3.8 Association items.
The subject identifier http://psi.topicmaps.org/sam/1.0/#association-role (notation "URI") identifies the type of association roles, as described in 3.9 Association role items.
A topic map processor conforms to this standard provided it meets the requirements listed below.
The topic map processor must make all the information described in 3 Information item types available to applications, and document how its representation of topic maps corresponds to the model defined in that section.
Information beyond that described by the model defined in this Technical Specification may be provided, but none may be left out. It is permitted to only allow locators that follow the "URI" notation, and to represent these as strings.
The topic map processor must be able to detect and report all violations of the SAM constraints.
The topic map processor must detect all attempts to add duplicate values to set properties, and also perform all merges according to the rules of section 4 Merging.
It is not required that topic map processors treat computed properties differently from the other properties in any way. Their values must be the same as if they had been computed using the procedure defined in this Technical Specification.
This Technical Specification defines new published subjects to replace those defined in XTM 1.0. This section defines the correspondence between the old subject identifiers defined by XTM 1.0 and those defined in this Technical Specification.
In the table below, the subject identifiers in the left-hand column are equivalent to those in the right-hand column.
SAM subject identifier | XTM 1.0 subject identifier |
---|---|
http://psi.topicmaps.org/sam/1.0/#type-instance | http://www.topicmaps.org/xtm/1.0/core.xtm#class-instance |
http://psi.topicmaps.org/sam/1.0/#type | http://www.topicmaps.org/xtm/1.0/core.xtm#class |
http://psi.topicmaps.org/sam/1.0/#instance | http://www.topicmaps.org/xtm/1.0/core.xtm#instance |
http://psi.topicmaps.org/sam/1.0/#supertype-subtype | http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass |
http://psi.topicmaps.org/sam/1.0/#supertype | http://www.topicmaps.org/xtm/1.0/core.xtm#superclass |
http://psi.topicmaps.org/sam/1.0/#subtype | http://www.topicmaps.org/xtm/1.0/core.xtm#subclass |
http://psi.topicmaps.org/sam/1.0/#sort | http://www.topicmaps.org/xtm/1.0/core.xtm#sort |
http://psi.topicmaps.org/sam/1.0/#display | http://www.topicmaps.org/xtm/1.0/core.xtm#display |
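A processor that reads XTM 1.0 data might apply this correspondence as a simple lookup when loading subject identifiers. The sketch below is one possible approach rather than a requirement of this specification; the names `XTM_TO_SAM` and `to_sam` are hypothetical.

```python
SAM = "http://psi.topicmaps.org/sam/1.0/#"
XTM = "http://www.topicmaps.org/xtm/1.0/core.xtm#"

# (XTM 1.0 fragment, SAM fragment) pairs taken from the table above.
_PAIRS = [
    ("class-instance", "type-instance"),
    ("class", "type"),
    ("instance", "instance"),
    ("superclass-subclass", "supertype-subtype"),
    ("superclass", "supertype"),
    ("subclass", "subtype"),
    ("sort", "sort"),
    ("display", "display"),
]

XTM_TO_SAM = {XTM + old: SAM + new for old, new in _PAIRS}


def to_sam(reference: str) -> str:
    """Map an XTM 1.0 subject identifier to its SAM equivalent;
    any other reference is returned unchanged."""
    return XTM_TO_SAM.get(reference, reference)
```

Because the identifiers are declared equivalent, a processor could alternatively keep both identifiers on the same topic instead of rewriting one into the other.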
This section provides a guide to how the terminology of topic maps has evolved since the publication of ISO 13250:2000. Only changes in terminology are documented.
Term | Defined in | Comment |
---|---|---|
Added themes | 13250:2000 | Syntax-specific. |
Addressable information resource | XTM 1.0 | Synonymous with XTM 1.0 'resource', which is again synonymous with 'information resource' in this specification. Superfluous. |
Addressable subject | XTM 1.0 | Synonymous with XTM 1.0 'resource', which is again synonymous with 'information resource' in this specification. Superfluous. |
Association link | 13250:2000 | Synonymous with 'association', but also used to mean an association element. The former is superfluous, the latter syntax-specific. |
Association role | 13250:2000, SAM | XTM 1.0 replaced it by the concept of 'member'; SAM uses 'association role', not 'member'. |
Association role type | SAM | New. |
Association role player | SAM | New. |
Basic types | SAM | New. |
Bounded object set | 13250:2000 | Syntax-specific. |
Characteristic | XTM 1.0 | Synonym for 'topic characteristic'. Superfluous. |
Consistent topic map | XTM 1.0 | Synonym for what SAM calls 'topic map'. Superfluous. |
Deserialization | SAM | New. |
Display name | 13250:2000 | No longer a formal term. |
Facet | 13250:2000 | No longer a formal term. |
Facet link | 13250:2000 | No longer a formal term. |
Facet value | 13250:2000 | No longer a formal term. |
Facet type | 13250:2000 | No longer a formal term. |
Hub document | 13250:2000 | Syntax-specific. |
Information item | SAM | New. |
Information resource | SAM | Replaces XTM 1.0 'resource', which conflicted with RFC 2396 'resource'. |
Instance | SAM | New. |
Locator | SAM | New. |
Member | XTM 1.0 | Replaced by 'association role'. |
Merging | XTM 1.0, SAM | Introduced in XTM 1.0, generalized in SAM. |
Non-addressable subject | XTM 1.0 | Not needed. |
Notation | SAM | New. |
Occurrence | XTM 1.0, SAM | Replaces 'topic occurrence'. |
Occurrence role | 13250:2000 | Replaced by 'occurrence type'. |
Occurrence type | XTM 1.0, SAM | Replaces 'occurrence role'. |
Parameters | XTM 1.0 | Replaced by 'scope'. |
Processed topic map | XTM 1.0 | Synonym for 'topic map'. Superfluous. |
Processing requirements | XTM 1.0 | Not needed. |
PSI | XTM 1.0 | Synonym for 'published subject indicator'. Superfluous. |
Public subject descriptor | 13250:2000 | Replaced by 'published subject indicator'. |
Published subject | SAM | New. |
Published subject identifier | SAM | New. |
Published subject indicator | XTM 1.0, SAM | Replaces 'public subject descriptor'. |
Reification | XTM 1.0, SAM | New. |
Resource | XTM 1.0 | Conflicted with RFC 2396 'resource', so replaced by 'information resource'. |
Role | XTM 1.0 | Synonymous with 'association role'. Superfluous. |
Serialization | SAM | New. |
Sort key | 13250:2000 | No longer a formal term. |
Sort name | 13250:2000 | No longer a formal term. |
Source locator | SAM | New. |
Subject address | SAM | New. |
Subject descriptor | 13250:2000 | Replaced by 'subject indicator'. |
Subject identifier | SAM | New. |
Subject identity | XTM 1.0 | No longer a formal term. |
Subject indicator | XTM 1.0, SAM | Replaces 'subject descriptor'. |
Ed. Note:
Finish this table.
Resolutions to the issues listed below have been proposed by the authors of this document, but have not been approved by the committee.
Where should the indicators for the subjects published in the new ISO 13250 be published?
Issue (constr-single-subject-address):
What happens when the single subject address constraint is violated?
Should base name items be merged, so that assertions made about one base name will also apply to all other base names that have the same identity? (This also applies to occurrences.)
Issue (prop-subj-address-class):
Are topics representing information resources allowed as types?
Is [label] a better name?
Issue (prop-variant-scope-superset):
ISO 13250:2000 allows a display name to have a scope that is a subset of that of the corresponding base name. This apparent contradiction needs to be resolved.
How does one uniquely identify the set of published subjects defined in SAM? Is there a need to do so? Is a published subject for these published subjects needed? (Does it include itself?)
Issue (psi-subclassing-loops):
Are subtype loops allowed?
Issue (psi-type-instance-scope):
What does it mean when type-instance associations are scoped? How is this to be interpreted, and what is the interaction with scoped supertype-subtype associations?
Issue (prop-subj-address-values):
Should this property only accept a single value?
Issue (prop-subj-address-scope):
Are topics representing resources allowed as themes in scopes?
The scope of ISO 13250 is currently restricted to only defining the issues related to the interchange of topic maps. Should that scope be extended so that the standard can also cover application-internal issues?
This definition of scope is different from that of XTM 1.0 and ISO 13250:2000, in that it explicitly says topic characteristic assignments are valid for each of the subjects in its scope individually. Is that acceptable?
Issue (scope-unconstrained-rep):
How should the unconstrained scope be represented?
Should it be possible to create topics that represent strings, and for it to be formally clear that these topics do represent particular strings? If so, how?
How does one determine which subjects are published subjects and which are not? Is it necessary for the SAM model to provide a mechanism for this at all?
Issue (term-subject-indicator-def):
If the subject identifier is a locator that does not refer to an information resource, what is the subject indicator then? This also applies to the subject address.
Should occurrences be allowed to have variants in the same way that topic names are?
Must locators really refer to information resources? Some URN schemes allow resources that are not information resources to be addressed. This affects the definitions of "information resource", "locator", as well as the [subject identifiers] and [subject addresses] properties.
Issue (locator-notation-support):
What locator notations, if any, must be supported?
If you reify a topic name, does that affect your allowed type? If you reify an association, must you inherit its type?
Below are listed all the issues that have been present in earlier versions of this document, but have since been resolved. They are included here for reference. Follow the links in order to find the resolutions, as well as background material on each issue.
Should all types be called classes, all classes types, or should both terms be used? Which terms should be used where?
Should we have the topic.[classes] property? It means either that classness cannot be scoped, or that classness has a double representation. The question is: is scoping of classness important? Does it cause problems for implementations and applications?
Should the UML diagrams be made normative?
Do the 'linktrav' and 'listtrav' attributes of the HyTM syntax have model significance?
Issue (prop-subj-address-name):
Is topic.[subject address] the right name for the property? We have [subject indicator], not [subject identifier], so why [subject address] rather than [subject resource]?
Issue (xtm-def-occurrence-type):
According to XTM 1.0 the default occurrence type is the "occurrence" published subject. Should this standard follow its lead? If so, what does it mean?
Should the "topic", "association", and "occurrence" PSIs be specified in the SAM? If so, what do they mean, and what is their function?
The term "subject address" does not correlate with the term "subject indicator", since "subject address" corresponds more clearly to "subject identifier". A better name should be found.
Do the definitions of the terms 'topic map processor' and 'topic map applications' have unwanted consequences for what software architectures can be conforming implementations of this standard?
Is the term 'theme' useful, or best forgotten?
Issue (xtm-implicit-constraints):
The XTM DTD contains a number of implicit constraints, such as that an addressable subject may not be used as a theme or a type. Should these constraints be mirrored in the SAM?
Do we need to define what the empty string is? What about non-empty string (used liberally throughout)?
Do we need to define what the empty set is? And the empty string?
Should we define an equality criterion for topic map items? There is no need for duplicate removal for topic maps, but on the other hand that would be what is needed to define the conformance requirements on serialization implementations.
Should topic map items have a [schema] property that may contain their schema definition(s)? This would make it clear where to find the topic map schema. On the other hand, the TMCL specification should perhaps have its own rules for specifying how to find the schema of a topic map. It may be better to keep the levels strictly apart.
Should the standard say as little as possible about the nature of subjects, or should it be more detailed in order to provide guidance to readers? The current text is detailed, but may be too much so.
Issue (term-subject-identity):
Is the term "subject identity" needed? It is defined in XTM 1.0, but it is not clear that there is any use for it. The XTM 1.0 definition is: "That which makes two subjects identical, or distinguishes one subject from another."
Issue (term-topic-characteristic-assignment):
Are topic characteristic assignments statements about the topic's subject, or about the topic?
Issue (merge-same-subj-ind-addr):
Is it allowed for a topic to have the same locator item both as subject identifier and as subject address? If it does, must not this mean that the topic has two subjects?
XTM 1.0 has a term "topic name", but it is not clear how it relates to the term "base name". Its use in XTM 1.0 seems to be inconsistent. Is the term useful, or should it be abandoned?
If it may not be null, why may it be empty?
ISO 13250:2000 does not restrict display/sort names to a single base name, by design. Is it acceptable for SAM to do so?
Issue (assoc-role-player-type):
Must both [role playing topic] and [role type] have values?
Should source locators of B be copied into A? If they are, it is implied that A is the same topic map as B, which is not true. Also, topics reifying B will then also reify A, which means that any statements made about B will also apply to A.
Should the standard state outright that "subject" and "resource" (as per RFC 2396) are the same thing? (Quote: "A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.")
Issue (merge-srcloc-vs-subjid):
What happens when the same locator appears as a source locator for one topic and as a subject identifier for another? Does that locator then become a source locator or a subject identifier of the merged topic? This question arises both under deserialization and under merging.
What happens if two different topic items reify the same item? Should they be merged?
Is it likely that the term "IRI" will replace "URI" in the foreseeable future? Does there need to be a well-defined mechanism for adding new possible values for the [notation] property?
Issue (prop-reifier-computed):
Is it acceptable for the [reified] property to be computed, or must it be a fundamental property?
Should topic items have a [reifier] property? Should it be possible to reify topics? If so, how?
The text as written implies that processors must use a Unicode normalization form, without requiring any particular one. The Web Character Model requires Normalization Form C, as does the current XML 1.1 Working Draft. Requiring normalization improves string comparison, but imposes a possibly unwelcome burden on implementors.
For full internationalization it is necessary to support bidirectional text in names and occurrences. This requires that certain kinds of information be provided about the text.
Distinguish between properties which have containment semantics, and those which are references?
Issue (term-topic-characteristic-reified):
Does the thing reified by a topic count as a characteristic of the topic? It is the subject of the topic, so the question is perhaps whether we are interested in the characteristics of topics or subjects.
Do we need to define the term 'application' formally?
Issue (prop-srcloc-interchange):
None of the interchange syntaxes allow source locators to be interchanged. Is interchange of source locators a desired feature? Do the syntaxes need to be extended to accommodate source locators?
Does this standard need to define how sorting of topics is done? It is a highly fundamental operation. On the other hand, users may want flexibility in this regard.
If the topic naming constraint is not retained by the standard, is there then any need for this published subject? Or will the base name then take over the function previously fulfilled by this published subject?
Issue (locator-normalization):
If a locator syntax allows equivalent locators to be given different syntactical expressions, normalization must be applied in order to take this into account. Where should the text that sets out this requirement go? Does it belong in this document, or in the syntax specifications?
Should it become possible for the scopes of topic characteristic assignments to have internal structure?
Should we define operations that describe how modification of SAM instances is done?
Should we speak about 'equality' of items, or 'equivalence'?
Issue (infinite-subject-spaces):
How should values from infinite subject spaces be represented in topic maps?
Should we name the type properties [association type], [role type], and [occurrence type], or should they all be called [type]?
Issue (term-subject-address-def):
At what level of interpretation does the topic represent the resource? Does it represent that storage location? The stream of bytes? The stream of bytes interpreted in some particular way? The standard must either leave the details open or clarify this. Note that it may be impossible to clarify when the interpretation of locators is left undefined.
Is a base locator property on the topic map item needed by other specifications?
Issue (subject-identity-establish):
ISO 13250 states that subject identity may be "inferred from the topic's characteristics." Does SAM need words to the same effect?
The definitions of 'base name' and basename.[value] are too naïve in the presence of the TNC. If we are to have the TNC they must change.
Should base names be allowed to have both types and scopes in the same way that occurrences do?
Should the subject identifiers defined by XTM 1.0 be retained as they are, or should new equivalent ones be defined to replace the originals?
Issue (topic-naming-constraint):
Should the standard retain the topic naming constraint?
Issue (err-constraint-violations):
Is it acceptable that constraint violations may be reported at any time?
The presence of a TMCL schema may allow applications to improve the result of merging topics/topic maps by providing enough information to allow implementations to do additional transformations and redundancy removal. How should the SAM specification deal with this possibility?