TITLE: | Structuring Scope Document |
SOURCE: | M. de Graauw |
PROJECT: | |
PROJECT EDITORS: | |
STATUS: | |
ACTION: | For information and discussion in Baltimore, MD. |
DATE: | 22 November 2002 |
SUMMARY: | |
DISTRIBUTION: | SC34 and Liaisons |
REFER TO: | |
SUPERCEDES: | |
REPLY TO: | Dr. James David Mason (ISO/IEC JTC1/SC34 Chairman) Y-12 National Security Complex Information Technology Services Bldg. 9113 M.S. 8208 Oak Ridge, TN 37831-8208 U.S.A. Telephone: +1 865 574-6973 Facsimile: +1 865 574-1896 E-mail: mxm@y12.doe.gov http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm Ms. Sara Hafele Desautels, ISO/IEC JTC 1/SC 34 Secretariat American National Standards Institute 25 West 43rd Street New York, NY 10036 Tel: +1 212 642-4937 Fax: +1 212 840-2298 E-mail: sdesaute@ansi.org |
Author | Marc de Graauw |
Version | 2 |
Date | November 20, 2002 |
Copyright Marc de Graauw 2002. The right is hereby given to all to reproduce and distribute this work in its entirety as long as the authorship of Marc de Graauw is recognized and this copyright notice is included.
There are three important aspects to scope:
In this piece I will study aspects 1 and 3, and pay only fleeting attention to 2. For the examples it does not matter what causes a merge to occur; just assume there is a good reason for it (for instance, you told the TM engine to merge the topics).
First I will delve into scope and merging behaviour. This is the actual use case: what behaviour do I want? Next I will survey some possible ways of structuring scope and look at their consequences for the use case. The first attempt seems to fail; I have included it because the reasons why it fails are noteworthy. I do not see this paper as a concrete proposal for a structured scope scenario which I support, but rather as an exploration which should be read in conjunction with other proposals for structuring scope.
This paper was not conceived out of the blue. Steve Pepper and Geir Ove Grønmo, Bernard Vatant and Kal Ahmed all have explored structured scope. If one notes any similarities between my proposal(s) and theirs, one can be pretty sure I've stolen them. See below for a very short discussion of their proposals.
Thanks to Lars Marius Garshol for commenting on an earlier draft.
I will use an ultra-shorthand to denote topics, characteristics, scope and merging.
Topics | T, U, VERYNICETOPIC |
Characteristics | Ta, Tb, Tsomename |
Scope | Ta{X, Y} |
Merging | => : Ta{X}b{Y}, Ta{X}c{Z} => Ta{X}b{Y}c{Z} |
Scope matters for merging because merging should eliminate redundant characteristics. I want to be able to have this merge occur:
Example 1: Merging should remove unnecessary scoped topic characteristics |
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN,FR}frankrijk{NL}frankreich{DE} |
Topic T says 'france' is the English name for the topic and 'frankrijk' the Dutch name; topic U says 'france' is the English and French name and 'frankreich' the German one; the merged topic V says it all. (Of course, for country names one would prefer to use the country PSIs, but as an example this will do fine.) Note that this merge does not occur under the topic naming constraint (TNC), since the two topics do not have identical scope sets on 'france'. I do not want this merge:
Example 2: Current merging leaves redundant topic characteristics |
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN}france{EN,FR}frankrijk{NL}frankreich{DE} |
This says twice that 'france' is the English name for this topic. Merging many topic maps with many shared characteristics this way will create a hopeless tangle. The merge of Example 1 is what we want when scope is interpreted as: a topic characteristic is valid when ANY of the topics in the scope apply. In general:
Merging Rule 1: ANY-scopes on the same characteristic merge to the union of the originating scopes |
Ta{X, Y}, Ua{X} => Va{X, Y} |
Since a is valid when ANY of {X, Y} apply, Ta{X, Y} already expresses Ta{X}.
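Merging Rule 1 can be sketched in a few lines of Python. This is a minimal illustration, not an XTM processor, and it rests on assumptions of my own: a topic is just a list of (value, scope) pairs, two characteristics count as "the same" when their values are equal (characteristic types are ignored), and a scope is a frozenset of scoping-topic identifiers. The name `merge_any` is hypothetical.

```python
from collections import defaultdict

# Illustrative model (an assumption, not XTM syntax): a topic is a list of
# (value, scope) pairs, and a scope is a frozenset of scoping-topic ids.
Characteristic = tuple[str, frozenset[str]]


def merge_any(*topics: list[Characteristic]) -> dict[str, frozenset[str]]:
    """Merging Rule 1: ANY-scopes on the same characteristic merge to their union.

    Characteristics count as "the same" when their values are equal.
    """
    merged: dict[str, set[str]] = defaultdict(set)
    for topic in topics:
        for value, scope in topic:
            merged[value] |= scope
    return {value: frozenset(scope) for value, scope in merged.items()}


# Example 1: Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE}
t = [("france", frozenset({"EN"})), ("frankrijk", frozenset({"NL"}))]
u = [("france", frozenset({"EN", "FR"})), ("frankreich", frozenset({"DE"}))]
v = merge_any(t, u)
print(sorted((value, sorted(scope)) for value, scope in v.items()))
# [('france', ['EN', 'FR']), ('frankreich', ['DE']), ('frankrijk', ['NL'])]
```

Because union is idempotent and order-independent, merging the same topics again, or in a different order, yields the same result.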
When scope is interpreted as: a topic characteristic is valid when ALL of the topics in the scope apply, this merge is not wanted. Instead we would want:
Merging Rule 2: ALL-scopes on the same characteristic merge to the smaller scope when one is a subset of the other |
Ta{X, Y}, Ua{X} => Va{X} |
Since a is valid when ALL of {X} apply, it is already implied that a is valid when ALL of {X, Y} apply. An example:
Example 3: Merging should remove unnecessary scoping topics |
Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL} => Vmass{EN}massa{NL} |
T says 'massa' is the Dutch translation of 'mass' in the realm of physics. U says 'massa' is always the Dutch equivalent of 'mass'. Assuming this is true, the extra condition that the realm is physics is no longer necessary and is removed by the merge. It might be tempting to mirror the behaviour of ANY-scopes and assume ALL-scopes merge to the intersection (I did so when first drafting this paper). But in the following example neither topic claims that 'massa' translates 'mass' in every Dutch context, so the intersection {NL} asserts more than either source does:
Example 4: Merging ALL-scopes to the intersection of both is wrong |
Tmass{EN}massa{NL,PHYSICS}, Umass{EN}massa{NL,DIETING} => Vmass{EN}massa{NL} |
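Merging Rule 2 differs from an intersection in one essential way: a scope is only dropped when a smaller scope for the same characteristic exists, so unrelated scopes survive side by side. The sketch below assumes, as an illustration only, that a topic is a list of (value, scope) pairs, that characteristics are identified by value, and that scopes are frozensets; `merge_all` and `minimal_scopes` are hypothetical names.

```python
from collections import defaultdict


def minimal_scopes(scopes: list[frozenset[str]]) -> set[frozenset[str]]:
    """Keep each distinct scope that has no proper subset among the others."""
    unique = set(scopes)
    return {s for s in unique if not any(other < s for other in unique)}


def merge_all(*topics: list[tuple[str, frozenset[str]]]) -> dict[str, set[frozenset[str]]]:
    """Merging Rule 2 for ALL-scopes.

    A characteristic valid in scope S is also valid in every superset of S,
    so a superset scope for the same value is redundant and is dropped.
    Scopes where neither is a subset of the other are both kept.
    """
    collected: dict[str, list[frozenset[str]]] = defaultdict(list)
    for topic in topics:
        for value, scope in topic:
            collected[value].append(scope)
    return {value: minimal_scopes(scopes) for value, scopes in collected.items()}


# Example 3: the PHYSICS condition disappears.
t3 = [("mass", frozenset({"EN"})), ("massa", frozenset({"NL", "PHYSICS"}))]
u3 = [("mass", frozenset({"EN"})), ("massa", frozenset({"NL"}))]
print(merge_all(t3, u3)["massa"])  # {frozenset({'NL'})}

# Example 4: neither scope is a subset of the other, so both are kept
# instead of being collapsed to the intersection {NL}.
t4 = [("mass", frozenset({"EN"})), ("massa", frozenset({"NL", "PHYSICS"}))]
u4 = [("mass", frozenset({"EN"})), ("massa", frozenset({"NL", "DIETING"}))]
print(len(merge_all(t4, u4)["massa"]))  # 2
```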
There is another relation between scope and merging, which I shall call EXACT-scope; Example 2 above shows it. This is the kind of merging that ISO 13250:2000, XTM 1.0 and the SAM support. Topic characteristics are only merged when their two scope sets match exactly:
Merging Rule 3: EXACT-scopes only merge characteristics with equivalent scope sets |
Ta{X, Y}, Ua{X} => Va{X, Y}a{X} |
Frankly, I find it hard to think of a good use case for this merging behaviour. It seems to me that merging as shown for ALL-scopes or for ANY-scopes would be preferable in most circumstances. EXACT-scope merging also seems at odds with how the standards interpret scope. ISO 13250:2000 views scopes as ANY-scopes ("a given scope is the union of the subjects of the set of themes used to specify that scope", "If it is desired to specify a scope which is the intersection (rather than the union) of two topics, this can be accomplished by creating a topic whose subject is that intersection, and then by using that topic as a theme.") and the SAM interprets scopes as ALL-scopes ("Formally, a scope is composed of a set of subjects that together define the context. That is, the topic characteristic is known to be valid only in contexts where all the subjects in the scope apply."). XTM does not provide an interpretation for scope. Yet the merging mechanisms for topic characteristics merge as would be expected for EXACT-scopes.
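For comparison, EXACT-scope merging (Merging Rule 3) only removes a characteristic when both its value and its scope set are identical to another one. This sketch, assuming characteristics are identified by value and scopes are frozensets (`merge_exact` is a hypothetical name), reproduces the redundant result of Example 2.

```python
def merge_exact(*topics: list[tuple[str, frozenset[str]]]) -> list[tuple[str, frozenset[str]]]:
    """Merging Rule 3: merge only characteristics with equal value and equal scope."""
    merged: list[tuple[str, frozenset[str]]] = []
    seen: set[tuple[str, frozenset[str]]] = set()
    for topic in topics:
        for value, scope in topic:
            key = (value, frozenset(scope))
            if key not in seen:  # keep the first occurrence, preserve order
                seen.add(key)
                merged.append(key)
    return merged


# Example 2: 'france' survives twice, once scoped {EN} and once {EN,FR}.
t = [("france", frozenset({"EN"})), ("frankrijk", frozenset({"NL"}))]
u = [("france", frozenset({"EN", "FR"})), ("frankreich", frozenset({"DE"}))]
v = merge_exact(t, u)
print([value for value, _ in v])  # ['france', 'frankrijk', 'france', 'frankreich']
```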
So both ANY-scopes and ALL-scopes have their own proper uses. Now how could ANY-scopes and ALL-scopes be expressed?
The basic idea of the first approach, which I will call the Simple Theory of Types, is to retain the scope element in XTM and to add types to the scoping topics, carrying extra information about the kind of scope and the kind of merging which is wanted. The same would apply to HyTM and the SAM, but I will use XTM in this paper.
Point 1 ensures maximum backward compatibility with the ISO interpretation of scope. However, point 1 could equally read "interpret as ALL" and be in line with the current SAM proposal. At first sight this neatly yields the desired merging behaviours.
Example 5: Scoping topics with one shared type are ANY-scopes |
EN, NL, FR and DE are topics of type LANGUAGE. |
Example 6: Scoping topics with only distinct types are ALL-scopes |
EN and NL are topics of type LANGUAGE. PHYSICS is a topic of type DISCIPLINE. |
Unfortunately things get messier when we add more types.
Example 7: Simple Theory of Types with multiple types: a mess? |
P has types A and B |
So it seems the Simple Theory of Types holds up well when scoping topics have only a single type, but breaks down when they have multiple types (multiple inheritance). We could limit this problem by only looking at types which carry a certain PSI. There could then be a processing requirement stating that a scoping topic may have only a single type with this PSI. But still, this does not look like the most elegant of solutions.
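The restricted Simple Theory of Types can be sketched as follows. The assumptions are mine: a caller-supplied set `scope_types` stands in for "types which carry a certain PSI", a scope mixing several types is read as an ALL over per-type ANY groups (a generalization of Examples 5 and 6), and `interpret_scope` is a hypothetical name. A scoping topic with zero or several scope-relevant types is rejected, which is the processing requirement suggested above.

```python
from collections import defaultdict


def interpret_scope(
    scope: set[str],
    types_of: dict[str, set[str]],
    scope_types: set[str],
) -> list[frozenset[str]]:
    """Read a scope as ANY-groups (one per type) that must ALL apply.

    Only types in scope_types are considered. Each scoping topic must have
    exactly one such type; otherwise the grouping is ambiguous and a
    ValueError is raised.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for topic in scope:
        relevant = types_of.get(topic, set()) & scope_types
        if len(relevant) != 1:
            raise ValueError(f"{topic!r} has {len(relevant)} scope-relevant types")
        (scope_type,) = relevant
        groups[scope_type].add(topic)
    # Sort by type name so the result has a stable order.
    return [frozenset(groups[t]) for t in sorted(groups)]


types_of = {
    "EN": {"LANGUAGE"}, "NL": {"LANGUAGE"}, "FR": {"LANGUAGE"}, "DE": {"LANGUAGE"},
    "PHYSICS": {"DISCIPLINE"},
    "P": {"A", "B"},  # Example 7: a scoping topic with two types
}
known = {"LANGUAGE", "DISCIPLINE", "A", "B"}

print(interpret_scope({"EN", "FR"}, types_of, known))       # one ANY-group
print(interpret_scope({"NL", "PHYSICS"}, types_of, known))  # two groups, ALL apply

try:
    interpret_scope({"P", "NL"}, types_of, known)
except ValueError as err:
    print("ambiguous:", err)

# Restricting the relevant types removes the ambiguity for P.
print(interpret_scope({"P", "NL"}, types_of, {"A", "LANGUAGE"}))
```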
A second approach, which I will call the Good Ol' ISO Way, starts from ISO 13250:2000, Note 5: "If it is desired to specify a scope which is the intersection (rather than the union) of two topics, this can be accomplished by creating a topic whose subject is that intersection, and then by using that topic as a theme." Intersection and union are misnomers here, but the note can be read as: "If it is desired to specify a scope which is an ALL-scope (rather than an ANY-scope), this can be accomplished by creating a topic whose subject is that ALL-scope, and then by using that topic as a theme." If we follow this approach, the question boils down to how a processor could recognize that a scoping topic represents an ALL-scope, and how it could recognize the topics in that ALL-scope. This could be done using an association whose type is "ALL-scope" and whose members are the scoping topics comprising the ALL-scope. The association could be reified, and the reifying topic then used as a scoping topic in another scope (which would be an ANY-scope).
In an ANY-scope, each individual scoping topic constitutes a complete context of validity in itself.
Example 8: Scoping topics are ANY-scopes |
Tfrance{EN}frankrijk{NL}, Ufrance{EN,FR}frankreich{DE} => Vfrance{EN,FR}frankrijk{NL}frankreich{DE} |
Syntactically ANY-scopes would look just like scope looks now:
Syntax: ANY-scopes |
<!-- Scoping topics EN and FR omitted --> |
This means merging of topic characteristics would have to differ from what is currently specified in XTM 1.0 or the SAM. Since changing the merging behaviour of every topic map is not desirable, the new behaviour could be made opt-in: a Topic Map could indicate that it wants its topics to be subjected to Merging Rule 1 (though I am of the opinion that current merging, as described in Merging Rule 3, is counterintuitive in most, if not all, contexts).
Example 9: ALL-scopes as associations |
The reified association of type ALL-scope is indicated as ALL{NL,PHYSICS} |
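How a processor might recognize such ALL-scopes can be sketched as below. Everything here is an illustrative assumption: `reified` is a hypothetical index from a reifying topic to the (type, members) of the association it reifies, the association type is identified by the plain string "ALL-scope" rather than a PSI, and the topic id "nl-physics" is made up. Each theme becomes one ALL-scope; an ordinary topic becomes a one-topic ALL-scope, which means the same as an ordinary ANY-scope theme.

```python
ALL_SCOPE = "ALL-scope"  # hypothetical identifier for the association type


def resolve_scope(
    scope: set[str],
    reified: dict[str, tuple[str, set[str]]],
) -> set[frozenset[str]]:
    """Return a scope as a set of ALL-scopes, any one of which suffices.

    A theme that reifies an association of type ALL-scope contributes the
    association's members; any other theme stands for itself.
    """
    resolved: set[frozenset[str]] = set()
    for theme in scope:
        assoc = reified.get(theme)
        if assoc is not None and assoc[0] == ALL_SCOPE:
            resolved.add(frozenset(assoc[1]))
        else:
            resolved.add(frozenset({theme}))
    return resolved


# Example 9: the theme "nl-physics" reifies ALL{NL,PHYSICS}.
reified = {"nl-physics": (ALL_SCOPE, {"NL", "PHYSICS"})}
print(resolve_scope({"nl-physics"}, reified))  # one ALL-scope with NL and PHYSICS
print(resolve_scope({"EN", "FR"}, reified))    # two one-topic ALL-scopes
```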
Syntactically ALL-scopes could look like this:
Syntax: ALL-scopes |
<topic id="mass"> |
More examples:
Example 10: Interpretation of more complex uses of ALL-scopes |
Ta{ALL{P, Q}, ALL{R, S}}: a is valid when both P and Q apply, or when both R and S apply |
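Purely to illustrate the interpretation (testing a scope against a user's context belongs to the application, as argued at the end of this paper), validity of an ANY-scope of ALL-scopes can be checked like this. The name `is_valid` is hypothetical, the context is modelled as a set of topics that apply, and an empty scope is treated as unconstrained, i.e. valid everywhere.

```python
def is_valid(scope: set[frozenset[str]], context: set[str]) -> bool:
    """True if ANY of the ALL-scopes in scope is fully contained in context."""
    if not scope:
        return True  # unconstrained scope: valid in every context
    return any(all_scope <= context for all_scope in scope)


# Example 10: Ta{ALL{P, Q}, ALL{R, S}}
scope = {frozenset({"P", "Q"}), frozenset({"R", "S"})}
print(is_valid(scope, {"P", "Q"}))       # True: P and Q both apply
print(is_valid(scope, {"P", "R"}))       # False: neither pair is complete
print(is_valid(scope, {"R", "S", "X"}))  # True: extra topics do not matter
```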
The Good Ol' ISO Way is thus quite expressive: any combination of ANY and ALL can be written as an ANY-scope of ALL-scopes. What cannot be expressed is negation, such as: this characteristic is only valid when X does not apply. Possibly a NOT-scope could be added; a possible use case is: do not show this when the user's level is 'beginner'. A set of processing rules could be:
Rule | Example |
Merge ANY-scopes to their union | Ta{X,Y}, Ua{X,Z} => Va{X,Y,Z} |
Eliminate equal ALL-scopes | Ta{ALL{X,Y}, ALL{Y,X}} => Ta{ALL{X,Y}} |
Eliminate redundant ALL-scopes | Ta{ALL{X,Y}, ALL{X}} => Ta{ALL{X}} |
Reduce single-topic ALL-scopes to ANY-scopes | Ta{ALL{X},Y} => Ta{X,Y} |
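The four processing rules can be captured by one representation choice plus one reduction. In this sketch (`normalize` and `merge` are hypothetical names), a scope is a set of ALL-scopes, each a frozenset, and a plain scoping topic X is simply the one-topic ALL-scope {X}, which makes the last rule automatic. Using a set removes equal ALL-scopes, and dropping every ALL-scope that is a proper superset of another removes redundant ones.

```python
from collections.abc import Iterable


def normalize(scope: Iterable[Iterable[str]]) -> set[frozenset[str]]:
    """Remove equal and redundant ALL-scopes from one scope."""
    unique = {frozenset(all_scope) for all_scope in scope}
    # An ALL-scope containing another one is implied by it and can go.
    return {s for s in unique if not any(other < s for other in unique)}


def merge(*scopes: Iterable[Iterable[str]]) -> set[frozenset[str]]:
    """Merge scopes of the same characteristic: take the union, then normalize."""
    combined: list[Iterable[str]] = []
    for scope in scopes:
        combined.extend(scope)
    return normalize(combined)


# The four rows of the table:
print(merge([{"X"}, {"Y"}], [{"X"}, {"Z"}]))  # union: {X}, {Y}, {Z}
print(normalize([{"X", "Y"}, {"Y", "X"}]))    # one ALL{X,Y}
print(normalize([{"X", "Y"}, {"X"}]))         # only {X} remains
print(normalize([{"X"}, {"Y"}]))              # same as Ta{X,Y}
```

The same reduction also yields Merging Rule 2: merging ALL{NL,PHYSICS} with ALL{NL} leaves only {NL}.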
In "Towards a general theory of scope", Steve and Geir discuss scope and conclude that a more structured scope might be needed. They briefly explore some variants:
Steve and Geir also explore 'context', a set of topics a user can use to filter or rank a Topic Map. Context is the counterpart of scope, and rules could be established for how certain kinds of context operate on certain kinds of scope.
Bernard Vatant submitted some proposals to the SC34WG3 mailing list in summer 2002: 1, 2, and 3. His approach starts from the roles scoping topics play within a scope, e.g. language, region or time. Scoping topics with the same role form an ANY-scope; the groups of scoping topics for different roles combine into an ALL-scope. Bernard's example:
Example 11: Bernard Vatant: Using roles on scopes |
Ta{france, navarre, 1589-1610} |
Bernard's approach has similarities to the Simple Theory of Types, but does not suffer from the same problems since it uses roles instead of types (the Simple Theory of Types actually was an investigation into whether Bernard's approach would work with types instead of roles). Nevertheless, this approach is less expressive than the Good Ol' ISO Way sketched above, since it does not allow ANY-scopes on scoping topics with different roles, nor ALL-scopes on scoping topics with the same role. (An ALL-scope on scoping topics with the same role makes no sense anyway if those topics are mutually exclusive.) The use of roles for scoping topics is quite natural.
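One way to see why role-based scope is less expressive is to translate it into the ANY-of-ALL form of the Good Ol' ISO Way: picking one scoping topic per role yields each ALL-scope. This sketch is my own illustration, not Bernard's notation; the role names "region" and "time" and the function name `roles_to_any_of_all` are assumptions. The reverse translation is not always possible: an ANY-scope that mixes roles, such as {france, 1589-1610}, has no role-based equivalent.

```python
from itertools import product


def roles_to_any_of_all(scope_by_role: dict[str, set[str]]) -> set[frozenset[str]]:
    """Translate a role-based scope into a set of ALL-scopes (any one suffices).

    Topics sharing a role are alternatives (ANY); different roles must all
    be satisfied (ALL). Each choice of one topic per role is one ALL-scope.
    """
    roles = sorted(scope_by_role)
    choices = product(*(scope_by_role[role] for role in roles))
    return {frozenset(choice) for choice in choices}


# Example 11: Ta{france, navarre, 1589-1610}, with france and navarre
# playing the region role and 1589-1610 the time role.
scope = {"region": {"france", "navarre"}, "time": {"1589-1610"}}
for all_scope in sorted(sorted(s) for s in roles_to_any_of_all(scope)):
    print(all_scope)
# ['1589-1610', 'france']
# ['1589-1610', 'navarre']
```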
Kal Ahmed has written a fairly complete proposal. Kal also distinguishes user context from scope and explores how to process a Topic Map with scoped characteristics in a certain user context. Kal also uses roles to further qualify scoping topics. The Good Ol' ISO Way resembles Kal's approach a lot: no coincidence, since I took a good look at Kal's work before I sat down to write this. In fact, they resemble each other so much that when I commented on some aspects of his syntax, he scribbled a variant of his syntax proposal which is basically the Good Ol' ISO Way with roles added.
Both approaches I have sketched show a way to pursue a more structured scope. Both do so while maintaining near 100% backward compatibility: the way scope is used now is retained, and an interpretation is only provided for special cases, namely when scoping topics have a type of a certain kind, and when scoping topics reify a certain kind of association. The Simple Theory of Types seems to fail: in more complex contexts it is not intuitive, and fixing it is not elegant. The Good Ol' ISO Way is promising. Then there are Bernard's and Kal's approaches, which to me seem at least as promising. I believe the introduction of 'user context' belongs firmly in the application domain, not in the standards (which doesn't mean exploring this notion is not useful). Some believe structured scope and the interpretation of scope belong entirely in the application domain. I would prefer the meaning of scope to be something the standards describe, and I would draw the line between standard and application where the interpretation of scope ends and user context begins.
What next? It seems to me that none of this should go into the SAM. It could, however, be used as input for the TMCL effort; it would be nice if TMCL supported implementing the behaviour described above. I have already stated why the EXACT-scope behaviour of the current specs is unsatisfying; more specifically, it seems at odds with interpreting 'normal' scope as either "ANY" or "ALL". Most importantly, this paper hopefully drafts a use case for further pursuing structured scope, and may be a building block in such an effort.