ISO/IEC JTC 1/SC 22/WG 20 N 527R
Title: Disposition of comments on ballot JTC1/SC22 N 2466
ISO/IEC CD 14651, International String
Ordering
Date: 1997-07-01
Project: JTC 1.22.30.02.02
Cross Reference: SC22 N2364
Source: Alain LaBonté, Project editor,
on behalf of SC22/WG20
Status: Information required according to
directives by SC22 Secretariat
Action: For National Body Consideration
REVISED SUMMARY OF VOTING ON
Letter Ballot Reference No: SC22 N2364
Circulated by: JTC 1/SC22
Circulation Date: 01-20-1997
Closing Date: 04-24-1997
SUBJECT: CD Approval for CD 14651 - Information technology
International String Ordering - Method for Comparing Character
Strings and Description
of a Default Tailorable Ordering
The following responses have been received on the subject of approval:
"P" Members supporting approval without comment 10
"P" Members supporting approval with comment 2
"P" Members not supporting approval 4
"P" Members abstaining 2
"P" Members not voting 7
"O" Members supporting approval without comment 1
"O" Members not supporting approval 1
"O" Members abstaining 1
ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY
SUMMARY OF VOTING AND COMMENTS RECEIVED
Approve Disapprove Abstain Comments Not Voting
'P' Members
Australia (X) ( ) ( ) ( ) ( )
Austria ( ) (X) ( ) (X) ( )
Belgium ( ) ( ) ( ) ( ) (X)
Brazil ( ) ( ) ( ) ( ) (X)
Canada (X) ( ) ( ) (X) ( )
China ( ) ( ) ( ) ( ) (X)
Czech Republic (X) ( ) ( ) ( ) ( )
Denmark (X) ( ) ( ) (X) ( )
Egypt ( ) ( ) ( ) ( ) (X)
Finland (X) ( ) ( ) ( ) ( )
France (X) ( ) ( ) ( ) ( )
Germany ( ) ( ) (X) (X) ( )
Ireland ( ) ( ) ( ) ( ) (X)
Japan ( ) (X) ( ) (X) ( )
Netherlands ( ) (X) ( ) (X) ( )
Norway (X) ( ) ( ) ( ) ( )
Romania (X) ( ) ( ) ( ) ( )
Russian Federation (X) ( ) ( ) ( ) ( )
Slovenia (X) ( ) ( ) ( ) ( )
Sweden ( ) ( ) ( ) ( ) (X)
Switzerland (X) ( ) ( ) ( ) ( )
UK ( ) ( ) (X) (X) ( )
Ukraine (X) ( ) ( ) ( ) ( )
USA ( ) (X) ( ) (X) ( )
'O' Members
Argentina ( ) ( ) ( ) ( ) ( )
Bulgaria ( ) ( ) ( ) ( ) ( )
Cuba ( ) ( ) ( ) ( ) ( )
Greece ( ) ( ) ( ) ( ) ( )
Hungary ( ) ( ) ( ) ( ) ( )
Iceland ( ) ( ) ( ) ( ) ( )
India ( ) ( ) ( ) ( ) ( )
Indonesia ( ) ( ) ( ) ( ) ( )
Israel ( ) (X) ( ) (X) ( )
Italy ( ) ( ) ( ) ( ) ( )
Korea Republic (X) ( ) ( ) ( ) ( )
New Zealand ( ) ( ) ( ) ( ) ( )
Poland ( ) ( ) ( ) ( ) ( )
Portugal ( ) ( ) (X) ( ) ( )
Singapore ( ) ( ) ( ) ( ) ( )
Thailand ( ) ( ) ( ) ( ) ( )
Turkey ( ) ( ) ( ) ( ) ( )
Yugoslavia ( ) ( ) ( ) ( ) ( )
US National Body comments
AF-1
The specification of the sorting algorithm must be made independently of a
programming model.
Sorting is a process that is used in an incredible variety of circumstances
and on widely different systems, including object-oriented systems. Care
should be taken in preparing the normative specifications for CD 14651 that
they are usable independent of a particular programming model, programming
language, or environment.
A language- and environment-independent
model will be used for the API description, to respect language
and environment independence as far as possible. Three models,
brought by the participants at the Québec meeting, are under
consideration; the editor will have to choose among them, respecting
all constraints expressed, for the final CD.
In particular, the descriptions of the sorting operations should be
expressed in an abstract form, specifying IN, OUT and RETURN parameters but "without"
language binding. Also, no parameters needed for the sorting operation may be
presumed to hide in some semi-opaque state, but rather they should always be
specified explicitly in
the description of the operation.
Solved by the previous. However,
in addition, SC22 requires that an actual binding to at least
one programming language be provided normatively.
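As a hedged illustration of what "explicit IN and RETURN parameters, no hidden state" can look like, the following Python sketch passes the ordering table and the number of levels as arguments. The name `compare_strings`, the table layout and the toy weights are assumptions made for this example; they are not taken from CD 14651 or from any of the three models considered.

```python
from typing import Mapping, Sequence

# Hypothetical table layout: each character maps to one weight per level.
Table = Mapping[str, Sequence[int]]

def compare_strings(s1: str, s2: str, table: Table, levels: int) -> int:
    """IN: s1, s2, table, levels. RETURN: -1, 0 or 1.

    Nothing is read from global or locale state; every input the
    comparison depends on is an explicit parameter.
    """
    for level in range(levels):
        k1 = [table[c][level] for c in s1]
        k2 = [table[c][level] for c in s2]
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0

# Toy two-level table (assumed values): level 0 = base letter, level 1 = case.
TOY_TABLE = {"a": (1, 0), "A": (1, 1), "b": (2, 0), "B": (2, 1)}
```

With this toy table, "ab" and "AB" compare equal at one level and differ only when the second level is requested. A binding for a POSIX-like environment could wrap such a function and supply the table from locale data.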
If it is desired to show how the standard might be implemented in a POSIX
environment, that could be the subject of an informative annex. Function
bindings for POSIX could assume transparent access to locale data from the
POSIX locale model, if that is desired. The annex would specify how the
proposed POSIX functions make use of the abstract operations defined in the
normative part of the standard, and how their parameters are set either
explicitly or implicitly.
Conformance to POSIX is not required.
However it was decided long ago by a majority of experts that
the specification would use POSIX LC_COLLATE specification as
a starting point to avoid reinventing the wheel in ISO work. This
is what is done.
RLG 1:
The body of the standard includes material which belongs in an informative
annex, specifically the
"Tutorial on problems solved by this standard."
Accepted. Most of it (except
really introductory material) will be moved to an informative
annex.
RLG 2:
The order specified for two Cyrillic characters (p. 95-100 of the CD)
conflicts with the order in Table 2 of ISO/R9 and other sources (cited
below).
The characters in question are these two case pairs: CYRILLIC CAPITAL
LETTER
TSHE/CYRILLIC SMALL LETTER TSHE and CYRILLIC CAPITAL LETTER DZE/CYRILLIC
SMALL
LETTER DZE.
Cyrillic letter TSHE:
In the CD, TSHE follows KA WITH HOOK and precedes EL.
In ISO/R9 and other sources,
TSHE follows TE and precedes U.
Cyrillic letter DZE:
In the CD, DZE follows KOPPA and precedes CHE.
In ISO/R9 and other sources,
DZE follows ZE and precedes I.
Other differences in the order of Cyrillic characters between the CD and
Table
2 of ISO/R9 are either
not supported by the other sources or are arbitrary.
These tables will be checked with
the Irish national body which provided the data and corrected,
probably along the lines specified by the US national body.
RLG 3:
The order of scripts on p. 31 differs slightly from the order in ISO/IEC
10646. Specifically:
- Georgian follows Cyrillic; in ISO/IEC 10646, it follows Tibetan (pDAM-6)
- Hebrew follows Arabic, in ISO/IEC 10646, it follows Armenian (and
precedes Arabic).
These differences are not
explained.
Differences will be explained
as far as possible.
RLG 4:
Hangul is positioned between Tibetan and Cherokee (i.e., consistent with the
location of Hangul Jamo in ISO/IEC 10646). There is no explanation as to
why this position was chosen, rather than that of Hangul Syllables. Since
Korean may be written with a mixture of ideographs and Hangul syllables,
the Hangul Syllables position established by pDAM-5, immediately after the
CJK Unified Ideographs,
might be preferable.
That should be according to the
explanation unless some arbitrary order remains, in which case
it will be stated too.
HP 1
The outline of the document does not follow the well defined and established
method already used in other JTC1 standards. For example, the Introduction
is too big and the reader gets lost and might decide not to continue to
read the document. Usually such information belongs to an informative
annex otherwise it becomes
normative.
The structure of the document
follows that of many other ISO/IEC standards. Most of the current
introduction material will be moved to an informative annex.
HP 2
The structure of the document has the "Scope" clause on page 11. This
clause should come immediately after a newly written short Introduction
clause.
This is not required by ISO directives
and many other ISO standards have exactly the same structure as
the current one.
In addition, this clause needs clarifications. For example, does
it describe the APIs needed by applications to specify character string
ordering? It is also not clear what is meant by the phrase "full
repertoire of ISO/IEC 10646 (independently of coding)". The part that is
not clear in the previous
statement is the one in parentheses.
It may appear unclear to some
people indeed. Better wording is welcome.
In addition, the "Scope" clause talks about a specific default ordering but
it is not clear where in the CD it is explained how it was derived or how
it is related to the APIs.
This will be clarified. The term
default will also be changed to common template
(to be tailored) and explanation added on the goals to be achieved.
HP 3
The "Conformance"
clause should follow immediately the "Scope" clause.
This is not required by ISO directives
and many other ISO standards have exactly the same structure as
the current one. ISO directives do not even require a conformance
clause for a standard.
It should be combined with the "Requirements" clause. It should be rewritten
to make it easy to understand how to conform without having to go through the
syntax and content complexity
of the "Requirements" clause.
Conformance is difficult to determine from the document; the document
requires a table of precisely which features are required. Moreover, the
functions levels are, in general, independent of the previous level; there
is little reason to force all features of one level before the next higher
is reached.
Requirement clause will be completely
revisited.
Post handling is informative, and has no place in
conformance.
Post handling is not informative.
When prehandling has made changes for comparison purposes, the post
handling phase is there to reestablish full predictability, for
example in cases of collisions due only to the modifications
made in the prehandling phase.
HP 4
In the clause "Tailoring Mechanism", it is not clear at all what an
application developer needs to do to override the default ordering that is
specified in Annex 1.
An example of tailoring will be
given using the ISO/IEC 14652 specification.
HP 5
Maybe it would be better to have this CD become a Technical Report rather
than a standard since it allows users to override the default ordering
proposed and there might be more users overriding the default, with an
undefined and nowhere described
mechanism, than what the CD proposes.
Not accepted. NP defined an IS
to be produced as the result of this work and the NP was approved
according to directives.
HP 6
Dependency on an unpublished standard 14652, Cultural Conventions
Specification is too high. Currently, 14652 is still in the CD stage as
mentioned in clause 2,
Normative References, of this CD (14651).
In summary, there is a lot of structural and technical fine tuning that is
necessary to make this document complete. If such an effort takes too much
time, maybe the industry could be served better if the proposal is modified
for publication as a TR rather than an ISO standard. This work can be later
converted to an ISO publication when CD 14652, Cultural Conventions
Specification, is accepted
and is published as an ISO standard.
The goal is to synchronize ISO/IEC
14652 publication with ISO/IEC 14651 final publication. In the
meanwhile it is believed that ISO/IEC 14652 is already pretty
stable as most of it is an excerpt of ISO/IEC 9945-2 so that the
specification used in ISO/IEC 14651 should be independent of POSIX
conformance.
TG 1
The organization and nomenclature (e.g. COMPCAR) is unnecessarily obscure.
Names should be spelled
out completely for clarity.
Not accepted. Certain programming
languages have binding restrictions on name lengths. Spelling function
names out is therefore typically not language-independent in the
context of this standard.
TG 2
The requirement that the original string be recoverable is unnecessary; many
applications, such as databases, will have a sort key be an alternate field
in the record. They may only need to have a level 1 sort for their
application. In that case, storing the original string twice or requiring
internal structure that enables reconstruction is unnecessary and only
increases storage to no
purpose.
Accepted in principle. The requirement
was removed. Text will be checked for residual statements implying
such a requirement. If some are found they will be removed.
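The database use case described in TG 2 can be sketched as follows: a level-1 sort key is stored as an alternate field next to the record, and nothing requires that the original string be recoverable from the key. The weights in `LEVEL1` and the name `level1_key` are assumptions for illustration only, not values from Annex 1.

```python
# Assumed toy level-1 weights: case and accents collapse to the base letter.
LEVEL1 = {"a": 1, "A": 1, "à": 1, "b": 2, "B": 2, "c": 3, "C": 3}

def level1_key(text: str) -> bytes:
    """Build a compact level-1 key; the original string is not recoverable."""
    return bytes(LEVEL1[ch] for ch in text)

# Each record keeps its own text; the key is only an alternate sort field.
records = [{"name": n, "key": level1_key(n)} for n in ["Cab", "àb", "BA", "ab"]]
records.sort(key=lambda r: r["key"])
```

Because "ab", "àb" and "AB" all produce the same key in this sketch, the key alone cannot reproduce the original string, which is the storage saving the comment asks for.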
TG 3
Use of NBSP is in practice an unacceptable overload of its primary function.
Being able to functionally tailor just space and nbsp is in practice not
useful; in general a whole host of similar characters, punctuation and
symbols, behave the same
way.
The function parameters dealing
with this will be removed. Tailorability will always be possible
in the data though. The template will use NBSP according to Canadian
standards and CEN preliminary specifications with a possibility
of easy toggling in addition to full tailorability of the tables.
TG 4
The algorithm for comparison must be stated in terms of results, NOT a
specific mechanism.
Conformance clause will be revisited
to try to maximize consensus as per the Québec meeting
results.
TG 5
The format in Annex 1 is unnecessarily complex. It is impossible to assess
and recommend this standard where we cannot clearly determine the result
of the default sorting order rules in this annex. It forces use of a
whole separate notation for characters. To correct this, characters must
always be referred to by their full 10646 name for clarity, rather than
arbitrary notations such as AYEHS, AIGUT, POINN, QARNP, or many other
examples. Script names
should always be the 10646 block name.
This notation does not apply to
characters but to internal weights used for ordering at any given
level of precision. Character names are not standard-version-independent
while UCS identifiers are and SC22 has a requirement on unique
identifiers. Furthermore script names do not necessarily coincide
with blocks in ISO/IEC 10646 (the best example being the special
characters). The syntax is the syntax agreed by the group to build
on POSIX specifications without requiring POSIX conformance.
TG 6
The equivalencies of composed characters vs. composite character sequences;
e.g. a + umlaut and a-umlaut
can be stated much more succinctly.
This is a controversial matter in
ISO and consensus cannot be reached on equivalencing. However,
it is possible to give an example of tailoring that allows
this. This will be done.
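To make the equivalence in question concrete, the sketch below treats the composed character a-umlaut and the sequence a + combining diaeresis as equal by comparing their canonical decompositions. It uses Unicode normalization from Python's standard `unicodedata` module as a stand-in; this is not the tailoring syntax of the CD, and the helper name `canonically_equal` is hypothetical.

```python
import unicodedata

def canonically_equal(s1: str, s2: str) -> bool:
    """Compare two strings after Unicode canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)

composed = "\u00e4"    # LATIN SMALL LETTER A WITH DIAERESIS
composite = "a\u0308"  # a followed by COMBINING DIAERESIS
```

The two strings differ code point by code point, yet `canonically_equal(composed, composite)` treats them as the same text. A tailoring could aim for the same effect by giving both forms identical weights.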
TG 7
The relative ordering of characters cannot be determined from the character
lists, since they are not
even remotely in the resulting order.
Characters are currently presented
in the vast majority of cases in the exactly intended resulting
order. This is the case for special characters, and this is the
case for characters part of the scripts of the world specified
in this standard.
To correct this, the ordering of characters within a script must be presented in the
resulting order as much
as possible. Example:
<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL
<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING
<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING
<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT
<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT
<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT
<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT
...
The fourth column (in this case) determines the final ordering of the
characters, which is NOT
the order presented. It must be presented as:
<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING
<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT
<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT
...
<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL
<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING
<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT
<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT
No correction is required. The
order is not the order of ISO/IEC 10646. It is totally decoupled
from the coding tables.
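The table lines quoted in TG 7 can be read mechanically. The following Python sketch parses lines of the form shown above and records each entry's position in the table. It assumes, as in POSIX LC_COLLATE, that relative position in the table rather than the code value expresses the order; the names `Entry`, `parse_entry` and `table_rank` are hypothetical and the regular expression covers only the simple four-hex-digit form shown in the examples.

```python
import re
from typing import NamedTuple

class Entry(NamedTuple):
    code_point: int
    weights: tuple  # one weight symbol (or "IGNORE") per level
    comment: str

# Matches "<Uxxxx> w1;w2;w3;w4 % COMMENT" as in the examples quoted above.
LINE = re.compile(r"^<U([0-9A-Fa-f]{4})>\s+(\S+)(?:\s*%\s*(.*))?$")

def parse_entry(line: str) -> Entry:
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised table line: {line!r}")
    return Entry(int(m.group(1), 16), tuple(m.group(2).split(";")), m.group(3) or "")

sample = [
    "<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL",
    "<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL",
    "<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING",
]
entries = [parse_entry(line) for line in sample]
# Assumption: position in the table, not the code point, gives the order.
table_rank = {e.code_point: i for i, e in enumerate(entries)}
```

Under that assumption U+2400 ranks before U+0001 in this sample even though its code value is larger, which is the decoupling from the coding tables that the response describes.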
TG 8
The Annex also does not make clear that the vast majority of its characters
are sorted in character code order. This requires the reader to visually
inspect every line to no purpose. These should be replaced by one statement:
"Except where otherwise
noted, all symbols are sorted as:
<Uxxxx> IGNORE;IGNORE;IGNORE;<Uxxxx>"
The vast majority of characters
are not sorted in character code order. This is totally decoupled
and any resemblance is purely coincidental due to the decoupling
of ordering from coding.
TG 9
Annex 2
List #1 is superfluous. The statement should be that the words in List#2 in
any initial order, when
sorted will result in List #2.
The specific normative input is
deliberately designed to catch some implementation problems and
quickly detect non-conformance. Otherwise it would be possible to
choose an initial order, or even several initial orders, that
accidentally produce the required order. Of course other
orders might also give the same results, and other lists can be
part of additional private test requirements.
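The point about a prescribed input order can be shown with a small sketch. A faulty implementation that sorts nothing passes when the input already happens to be in the required order, and fails on a deliberately scrambled input. The lists below are toy data, not the Annex 2 lists, and `passes_benchmark` and `identity_sort` are hypothetical names.

```python
from typing import Callable, List, Sequence

def passes_benchmark(sort_fn: Callable[[List[str]], Sequence[str]],
                     list1: Sequence[str], list2: Sequence[str]) -> bool:
    """Feed the prescribed input (list 1) and compare with the required output (list 2)."""
    return list(sort_fn(list(list1))) == list(list2)

def identity_sort(items: List[str]) -> List[str]:
    """A broken implementation that returns its input unchanged."""
    return items

expected = ["a", "b", "c"]          # stands in for List #2
prescribed_input = ["c", "a", "b"]  # stands in for List #1
```

Here `identity_sort` passes when handed `expected` as its input but fails on `prescribed_input`, while a real sort such as Python's built-in `sorted` passes on both.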
The Netherlands' National
Body comments
The Netherlands vote negative on CD 14651. To turn our vote to positive
modifications shall be made in accordance with our comments. We reserve
our final position regarding
the CD until we have seen the Final CD.
Technical comments:
1. Remove Annex 1 and
all references to an International Default Order.
Annex 1 is an essential normative
part of this standard as per international consensus.
-- SC22 has no expertise in this field, and cannot check for correctness
Most NBs in SC22 are not able to check whether a proposed ordering
for a certain unfamiliar script is in agreement to actual practice
far from home. Those NBs that are familiar are not represented in
SC22, nor have been
asked for comment.
All NBs in SC22, as well as countries not represented in ISO, have
been asked for input on different occasions, and much data was
gathered on many different scripts from many different sources,
from multinational companies to minority groups to official standards
bodies, before the tables were made up. SC22 has expertise in this
field: WG20 is a working group where this expertise exists. It
cannot deal with the 6000 languages of the world at once, but more
specifications will be added to the template over time if data
becomes available for more scripts. Ultimately JTC1 will vote on
this standard, which will widen the public review to even more
national bodies. Moreover, tailoring can be done for very vernacular
languages.
-- Default order is an instrument of cultural imperialism.
In several countries more than one ordering rule is in use without
any agreed preference. Calling one of these the "default" is
imposing an extraneous pressure, and will involve interference with
national habits.
-- No need for a default.
The term default
will be replaced by common template. It shall be
tailorable, which is the best way to deal with this concern
constructively.
No country uses always all characters from 10646. They should not be
burdened with unwanted features. A method for supplying ordering
information for a given restricted character set to an API should be
contained in 14651
itself, without reference to 14652.
The scope says that the standard
is applicable to subsets as small as ISO/IEC 8859 parts. With
a binding to actual restricted coding, this is achieved.
2. Remove all references
to 14652.
-- Needless complexity should be avoided.
An ISO standard should be as independent as possible of other ISO
standards. If ordering information can only be supplied by way
of a complete set of cultural conventions, as specified in 14652,
there is involved an enormous overhead, and an obligation to NBs of
also having to specify non-ordering information which is irrelevant
to 14651, but nevertheless
required in this CD.
Specification of LOCALE categories
beyond ordering (e.g. LC_MONETARY) is not required to conform
to ISO/IEC 14652. The latter is for most of it an excerpt of ISO/IEC
9945-2 to make sure that the specification does not require conformance
to POSIX. Hence the overhead is considerably less than what the
NNB believed.
Editorial comments:
The text of this document leaves much to be desired regarding
precision of definition, clarity of presentation and conformance to
ISO directives part-3.
The NNI cannot give detailed comments here, nor offer replacement text as
doing so would require rewriting more than half of the document for which
we have no resources available. The NNI already gave some directions with
its vote on CD-registration,
but found almost no improvement in this CD.
That is purely an ITTF matter. ITTF will, as it always does, correct
any style that falls short of its publishing standards. Furthermore,
ISO directives part 3 do not even require a conformance clause.
Austrian National Body
comments
ON (the Austrian NB) votes NO on CD Ballot SC22 N2364
(CD 14651 - Information technology - International String
Ordering - Method for Comparing Character Strings and
Description of a Default Tailorable Ordering) with the
following comments:
(1) It seems doubtful (to say the least) that a reasonable
Default Ordering for all -- or even most -- of the languages
of the world can be found. Consequently, there is reason to
doubt the usefulness of
the proposed International Standard.
Correct. There is no worldwide-recognized
ordering that can be used without tailoring. The common template
(term default will be changed) may require and allows tailoring
to local needs.
(2) The "Tutorial" contained in the Introduction should be
moved to an informative annex; it should not remain in the
main part of the document which would have to be considered
normative.
Accepted.
(3) Even though there is a "Tutorial", the proposed methods
do not seem to be well explained. It could at least be
expected that one should be able to read and understand the
tables in Annex 1 without
having to consult other sources.
ISO/IEC CD 14652 is a normative
reference and is an important complement to ISO/IEC 14651. Although
most of the syntax comes directly from POSIX standard ISO/IEC
9945-2, it was made a separate standard so that POSIX conformance
should not be required to implement the ordering standard, which
also has some extra features.
For an example, see page 51 where a rather poor comment, in
itself encoded, supposedly explains the structure of the
following tables by cryptically stating:
"% <Uxxxx> <Base>;<Accent>;<Case>;<Special>"
The sudden change of typeface on the same page seems equally
confusing and unmotivated
(except possibly by line length).
Typeface change was a bug with
the printing software. These problems will be corrected before
final publication by ITTF if some incidentally remain at this
point.
Also, it seems that a more detailed description of a
possible practical implementation
could prove helpful.
The tutorial gives a lengthy explanation
of an actual implementation. This explanation, though, is considered
already very long by some national bodies and will be moved to
an informative annex.
(4) The "Benchmark" in Annex 2 adds to the general confusion
by showing the "sorted" version to be (in excerpt):
"vice-president's"
"offices"
"vice-presidents'"
"offices"
The problem obviously lies in automatic line breaks and can
easily be corrected, but seems to raise the question whether
similar errors have been introduced in areas which are very
difficult -- if not impossible -- to check. To mention the
most prominent example, some errors in Annex 1 might never
be found because this part of the document can hardly be
checked exhaustively.
The cause of the line-break problems is
known and a reliable solution has apparently been found. This
will be corrected to satisfaction.
(5) It is rather difficult to determine the necessity of
text that is not present. ON does therefore not feel able
to decide on Annexes F,
G, and H.
(6) The document has obviously been translated from French
to English, which would not be a problem if the process had
been completed. For a counterexample see the description
of procedures chbin1 and chbin2 on page 18. Also, the name
of procedure sign_espace (on page 19) seems to be partially
French.
Partly translated texts will be
fixed. Programming symbols are symbols though and are not French
nor English per se. « Sign » is an English
word while « espace » is a French word, but
« sign_espace » is neither French nor English.
(7) The document does not appear to have been spell-checked.
Some examples:
p. 19: "precedenceof" should be "precedence of"
p.109: "deafult" should be "default"
p.114: "standaredized"
should be "standardized"
That will be fixed in the next
version. There was no English spell-checker available to the editor
at the time of releasing the CD. This is unfortunate indeed, but it
is a practical inconvenience of using national versions of proprietary
word processing software (which by default provide a dictionary
only in the user's language), as allowed by ISO.
(8) Anticipating the answer that ON experts should actively
participate in the process of correction and development of
the document in question, ON states that expert resources
in this area are too limited at this time. However, this
does not imply that any
document can be accepted. Sorry.
We take note.
UK National Body comments
> The UK ABSTAINS on this ballot, due to lack of participation in this area.
> The UK would however like to bring the following issues to the attention of
> SC 22 :
>
> - a tutorial on problems solved is inappropriate for an IS; either the
> document should be a TR or the tutorial moved to an appendix.
Accepted. This will be moved to
an annex.
> - the statement on page 10 about information being obtainable from
> Alain LaBonte' is also inappropriate for a formal document.
No such statement is made. This
is a reference indicating a publication title, a publisher, an
author, and an ISBN for a publicly available document. This
document is also freely available (mirrored) on the WWW on at
least four different sites in France and in Canada. A search with
any major global search engine will retrieve it easily
online. The document also exists on paper, and the ISBN reference
is sufficient to retrieve it at least in the official national
library where this document is registered as per international
conventions.
> There are also a number of minor points:
>
> - there are a disturbingly high number of elementary typographical
> errors (e.g. p 18 'starings' (strings); 'compariosn', 'aat'; also mixed
> languages in chbin1, chbin2 heading). On page 19 there are French
> quotation marks rather than English ones.
That will be fixed in the next
version. There was no English spell-checker available to the editor
at time of releasing the CD. This is unfortunate indeed. Furthermore
in the French version of WinWord, it was not easily possible to
enter English quotes because of unfortunate automatisms in Word.
Please accept our apologies for this editorial problem.
> - p 25 there is a reference to section 5.8, which does not exist.
That will be fixed and reference
will be made to ISO/IEC 14652 instead, which took back the info
that was located in this section in an earlier draft.
> - subprogramme is consistently spelled thus, although `subprogram' is
> the correct form in both US and UK (don't know about Canada, Australia
> etc).
Accepted.
Japanese National Body
comments
Japan disapproves CD 14651
proposed in SC22 N2364.
The CD is not mature enough to proceed to DIS from view point of
completeness as a JTC1 standard as follows.
- not yet tuned precisely enough from a technical viewpoint,
- still not reaching a consensus on the expected ordering result,
- high dependency on ISO/IEC 14652, which is not in CD stage, and
- style of the document does not meet the JTC1 requirement.
Therefore, because of high dependency of this CD on ISO/IEC 14652, Japan
requests to wait and synchronize the review and ballot of CD 14651 until
CD 14652 is registered, or to change the scope of the standard to
"ordering result"
only and move API part to i18n API project.
Thus, Japan sees absolutely
no reason why we need to proceed to DIS now.
Comment detail.
1. Style (major editorial)
The CD is very different from what the ISO/JTC1 directive requires (and
also different from the template provided by ITTF and from many JTC1
standards). For example, there is a very high dependency on font selection
(usage of bold, slant, point size variation and/or unnecessary typeface
mixture are prohibited).
This is purely an ITTF editorial matter
and will ultimately be corrected before final publication.
The Definitions clause needs a sub-clause
for each term, and the annexes need two groups, one normative and one
informative. Review and rewrite all text according to the ISO/JTC1 directive
and the template supplied by
ITTF.
Not accepted. Different ISO/IEC
JTC1 directives are in conflict (necessity to have alphabetical
order in both French and English versions and, if numbering is used,
the same numbers should be used in both versions, not necessarily
in sequence [in sequence of the original language; furthermore
we have seen cases where ITTF itself numbers definitions differently
in English and French versions, breaking its own structural equivalence
rules, without anybody complaining, although this is unfortunate.
If ITTF wants to take this anomaly on itself at the time of final
publication, then so be it, but the editor will not do it at this
point]). It is felt unnecessary to introduce numbering, as some
English speakers indicated that they will not accept
an alphabetic sequence that is not in numerical clause order.
2. Relation with ISO/IEC 14652. (General process)
The syntax and semantics of Annex 1 are not defined in this draft and are
depending on ISO/IEC 14652 which is not available yet. Synchronize the
project with ISO/IEC 14652 development -- wait for decision until CD 14652
is available at least, or, if it is not accepted, move related part of
the ISO/IEC 14652 into this CD.
Final publication will be synchronized.
However, most of the syntax and semantics of ISO/IEC 14652 is a pure
excerpt of ISO/IEC 9945-2, made so that conformance to POSIX should not
be a requirement. This syntax and semantics is widely known in SC22,
except for some minor additions required for multiscript ordering
and tailoring.
3. Tutorial (major editorial)
A heavy tutorial clause at the beginning is inappropriate; move it to an
appropriate place and rewrite it to fit the new place. In addition,
there is much "information only" text in the main clauses (such as clause
5.3). Remove it from the main (and mostly normative) part of the
standard, and place it (if really necessary) in appropriate related
place(s).
Tutorial will be moved to its
original location in the first drafts, i.e. in an informative
annex.
4. Scope (major technical)
Describe what this standard defines in a clear and straightforward way.
For example, change the word "a method" to a much clearer, more specific
word (which is API).
Accepted.
Once the above change is made, it may affect the title of the
standard. Also the phrase "Default Tailorable Ordering" does not have
logical meaning. One possibility for the new title would be "API with
default order for International
string ordering".
Title will be changed to something
else.
Last part of 2nd bullet (on an order which is culturally---of that script)
should be removed because "order which is acceptable culturally" is not a
scope of this standard. This part should be re-written something like
"The default order is aiming for easy understanding of non-casual user of
the script, cultural correctness/acceptance is not a purpose of the
default order. The correctness/acceptance by the casual (or native) user
to be provided by tailoring
by the user or as a country profile".
Rationale: The above has been the agreement on the project scope from the
beginning. There were many discussions of the impracticality of having a
single default order that would satisfy all cultures. The conclusion
was that such an ideal default order is not practical, and it
was said that "this is why tailoring is needed". Japan, then, did not
request cultural correctness
for ordering.
To satisfy Japan, the text will
be modified to respect the spirit of document WG20/N526 prepared
by the Japanese expert present, which was welcome.
The same applies to French: since
French ordering is so sophisticated that no outsider understands it easily,
it is not practical to use true French order as the international
default order; it may cause misunderstanding among people of other
cultures. Such sophisticated ordering (such as French) can be
satisfactorily supported by tailoring anyway. (See clause 4.2.7 of DTR
11017. This IS is not
i18n per 4.2.6 nor 4.2.4. This IS is aiming at 4.2.7.)
All this will be tailorable from
the common template and by agreement with CEN, different toggles
will be offered to simplify the users' task.
5. Definitions (major technical)
5-1. Each definition should
have a separate sub-clause number.
Not accepted for reasons explained
above.
5-2. API: The initial text "for purpose of..... standard" is not
necessary.
Accepted.
5-3. equivalence: Too long; reduce it to about 1/3 by eliminating
"informative" text within this definition (for example: last 4 lines).
Accepted.
5-4. field, first order token, fourth order token level, level, second
order token, transformation, third order token: Eliminate "informative"
explanations.
« Level » will be removed from « n order token level ».
Most experts present felt that
informative text was useful and will be preserved except when
superfluous.
5-5. posthandling, prehandling: These definitions should be moved to the
related clause.
Not accepted.
5-6 telephone-book-type transformation: This term need not be defined
in Definitions because it appears only once in Introduction (5th para.,
Page 5). Although Japan considers that the paragraph is understandable
in itself, we propose to change the first sentence to:
More generally, specific requirements exist for a kind of complex
transformation -- e.g. phonetic transformation adopted in some
telephone-book systems
because telephone-book ordering differs from culture to culture, so
this wording may confuse the user.
Definition for « telephone-book-type transformation » will be removed.
New text for « transformation »
will be drafted.
6. Conformance (major technical)
6-1. The conformance clause(s) should come after the scope clause, not
after the requirements clause. The current location of the conformance
clause makes it difficult to understand each conformance level
clearly.
Reason (rationale) why the conformance clause should be clause 2:
If the requirements are simple and no levels are employed, the conformance
clause can in theory be placed anywhere. Note that ISO/IEC Directives Part 3
do not even require a "conformance clause". However, in the case of ISO/IEC
CD 14651, the situation is different: it should be clause 2.
14651 is a very complicated multilevel standard, so the scope clause
cannot cover all that a "scope" clause should say. The conformance clause, in
particular the clean and clear descriptions of the "levels", acts in
reality as a set of sub-scope clauses as well as real conformance descriptions.
If it does not come after the "scope" clause, it is almost impossible for the
user of the standard to understand "what is defined in this standard and
how to read the standard
efficiently and accurately".
Not accepted. Editorial as per
ISO/IEC JTC1 Directives. Other standards use a totally different
approach, and comments recently received in SC18/WG9
for another standard say exactly the opposite of what is written
here, asking on procedural grounds (grounds which do not exist
in any case) that the conformance clause be placed as the last
clause of the standard instead of after the scope statement as
requested here.
6-2. The conformance clause should have exact pointer(s) to the conformance
requirements (clause and sub-clause numbers). Umbrella conformance for
requirements buried within main clauses (as in this CD) should not be
used. (The current CD is too unfriendly to the reader.)
6-3. In case of leveled conformance, provide a sub-clause to explain
what those levels are in a much more direct way. (There are too many
indirect explanations now.)
6-3-1. Conformance level-1 should be defined as "Generic API only", and
should not make some of the parameters "optional". Such options cause
incompatibility problems between conforming level-1 APIs. Further,
define two options (not parameter options), one for COMPCAR and another
for COMPBIN + CARABIN.
6-3-2 Conformance level-2 should be defined and stated as "Generic API
and table format".
6-3-3 Conformance level-3: Change prehandling to a normative requirement
on string input. Prehandling is then out of scope of this standard
(remove 5.1.2 at least). Then change the description of this
conformance level accordingly.
Conformance clause will be limited
to two conformance levels and be made much more simple and friendly
as agreed by experts present in Québec after a long discussion.
By the way, in the current text, a normative
clause (5.1.2) refers to an
informative annex. This is a prohibited practice.
It only says to see the informative annex
for an example; there is no prescriptive statement.
6-3-4 Conformance level-4. Remove the word "possibility". The
result might then be "Add to the API an access method for a specific table".
6-4. Add a concept of
conformance for "ordering result only".
Conformance clause rewritten.
6-5 Add a method to specify partial conformance of the ordering result. For
example, a method to state "everything but the Japanese repertoire
conforms to this default order and the Japanese repertoire is per JIS" would
be a real-life use of this standard (as one subset of the "ordering
result only" conformance).
It is already explicitly allowed
to use subsets by tailoring. This method is therefore not necessary
as the requirement is already satisfied.
6-6. Add a method to swap
the order of the scripts while the order within each script
still conforms to the default order.
That possibility is already offered
by ISO/IEC 14652 tailoring statements.
6-7. Add a method to state that only selected scripts in comment 6-6
conform to the default
order.
This can easily be done by tailoring
(restricted CHARMAP or restricted binding to a subset of the repertoire).
6-8. Maintain compatibility with POSIX and C. Providing an independent
conformance level may be
one way to respond to this comment.
After discussion, it has been
established that POSIX allows creation of extra syntax if required.
So compatibility with POSIX is maintained.
6-9. Remove all "best
guess" dependencies. Write exactly what is needed. For example,
there is no description of what the "default order" is. There
is a default table and an API (and conformance levels), so the best guess
may be to use the "default table" with the APIs.
The notion of default will be
changed in line with Japanese expert input. New conformance clause
will also simplify the standard a lot.
7. Requirements (major technical)
7-1. There are many options in one conformance level; these should be
further levels of conformance if they are really necessary.
7-2. The "Toggle" mechanism, which is realized by parameters
"order_accent", "order_case" and "sign_escape", should be removed
because:
1) it contradicts the concept of the locale mechanism -- it allows an
ordering regardless of the ordering table defined as a locale,
2) the concepts of "case" and "accents" are specific to some scripts and
they are not defined in this draft where these script-dependent concepts
have been resolved into universal rules in tables.
Instead of the current "Toggle" mechanism, Japan proposes to reconsider
the specification of ordering tables, which will be defined in ISO/IEC
14652, so as to enable variants of the default table to be defined more
flexibly -- for example, by introducing some preprocessing elements
#define ...
#ifdef ...
#include ...
etc.
See disposition of Canadian and
Danish comments.
7-3. table
Specifying the name of an ordering table in COMPCAR and CARABIN as a
parameter "table" will put a heavy burden on implementations. At runtime,
the processes COMPCAR and CARABIN would have to check on every call whether
the table has changed from that of the previous call and/or whether the
table needs to be compiled.
There are two alternative solutions to this problem:
1) to remove the parameter "table" from the two processes and define a
new process "set_collating_table" which has a parameter "table",
2) to define a new process "open_table" which has an input parameter
"table" and returns a pointer to a protected structure derived from that
"table", while the parameter "table" in the two processes is changed to
"table_pointer".
Agreed in principle. Programming-language-specific data types such as pointers will be avoided, though.
7-4 "chbin1" and "chbin2" in COMPCAR are not necessary. Furthermore,
options within an API specification
do not make any sense at all.
Accepted.
7-5. The whole contents of 5.3 should be removed or put into an
informative annex because those contents are to be defined in ISO/IEC
14652 in the current framework.
Accepted.
7-6. Add text for the case where characters are not encoded in ISO/IEC
10646. Some character sets, e.g. ISO 6937, are not in ISO/IEC 10646, and
some do not (yet) have a conversion table to (or the same character names as)
ISO/IEC 10646.
There must be a binding between
ISO/IEC 10646 and actual coding in some way as specified in a
note after 1st paragraph of the requirements. Also CHARMAPS play
this rôle as specified in ISO/IEC 14652.
8. Data table (such as Annex A) (major technical)
8-1. Japan confirms the principles of the default order table as:
- The default order is friendly to non-native users (easy to understand,
simple rules, fewer exceptions)
- Cultural correctness for the native user of the script should be done
by tailoring. APIs and data format should have enough room for the
necessary tailoring.
- Therefore, cultural correctness of the default order is not a goal of
this standard.
This remains a goal as far as
possible. The term "Default" is changed to "Common
template".
Based on the principle above, the Japanese proposal on
Japanese scripts is not correct from the Japanese point of view; however, it is easy
for people who are not familiar with Japanese scripts.
8-2 Collation for HIRAGANA and KATAKANA
Japan proposes to add the attached set of collating rules for HIRAGANA and
KATAKANA.
Accepted.
The order defined in the Attachment is different from the one defined in JIS X
4061, which was published in February 1997. The main difference is in the
handling of the prolonged sound mark <U30FC>. Roughly speaking, JIS X 4061
replaces the prolonged sound mark with the vowel of the most recent
letter, while the Attachment at first neglects the prolonged sound mark in
the same way as a hyphen.
The second difference is the handling of the iteration marks <U309D>, <U309E>,
<U30FD>, <U30FE>. Roughly speaking, JIS X 4061 replaces the iteration
marks with the most recent KANA letter, while the Attachment handles the
iteration marks as they
are.
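The first difference can be illustrated with a rough sketch. It shows only the first-level view of a string under each approach. The vowel table is a tiny assumed subset for illustration, the function names are hypothetical, and neither function reproduces JIS X 4061 or the Attachment in full.

```python
# Illustrative sketch only: the vowel table is a small assumed subset of katakana.
PROLONGED = "\u30FC"  # prolonged sound mark <U30FC>

VOWEL_OF = {
    "ア": "ア", "カ": "ア",
    "イ": "イ", "キ": "イ",
    "ウ": "ウ", "ク": "ウ",
    "エ": "エ", "ケ": "エ",
    "オ": "オ", "コ": "オ",
}


def jis_style(text):
    """Replace each prolonged sound mark with the vowel of the preceding letter."""
    out = []
    for ch in text:
        if ch == PROLONGED and out and out[-1] in VOWEL_OF:
            out.append(VOWEL_OF[out[-1]])
        else:
            out.append(ch)
    return "".join(out)


def attachment_style(text):
    """Neglect the prolonged sound mark at the first level, as for a hyphen."""
    return text.replace(PROLONGED, "")


word = "カ" + PROLONGED + "キ"
jis_form = jis_style(word)
attachment_form = attachment_style(word)
```

The first function needs to know how the preceding letter is pronounced, while the second needs only a per-character rule, which is the property the reasons below rely on.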
The reasons for proposing the Attachment are as follows:
1) JIS X 4061 cannot be realized by an LC_COLLATE representation
unless some rules using regular expressions, which would put a heavy
burden on implementations, are introduced,
2) the ordering results of JIS X 4061 are hard to understand for
foreigners without knowledge of how letter sequences are
pronounced -- it is not cross-culture friendly,
3) the ordering results of the Attachment are easy to understand for
foreigners without knowledge of the pronunciation of letter sequences,
and even in Japan a number of encyclopedias order their items in
the same way as the Attachment does -- it is cross-culture friendly.
Reason accepted. SC22/WG20 is
also pleased that the scheme presented by Japan is also an actual
scheme used in Japan in some actual context.
8-3 Consideration of compatibility characters of ISO/IEC 10646.
Consideration of the compatibility characters is missing. At least
the following are needed.
8-3-1 UFF00-FF9F, FFE0-FFE8
Handle those characters the same as the equivalent characters in A-zone.
8-3-2 F900-FA0D, FA10, FA12, FA15-FA1E, FA20, FA22, FA25, FA26,
FA2A-FA2D of ISO/IEC 10646-1
Handle those characters the same as the equivalent characters in I-zone.
8-4 FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27-FA29 of
ISO/IEC 10646-1 and future additions of CJK ideographs (ext-A and B).
Merge them with the I-zone characters according to a defined rule. Provide an
informative annex which describes the rule (radical, number of strokes and so
on.....)
8-5 Character combination type symbols.
Characters which are made up of a combination of two or more
Japanese characters, such as 3300-336F, should be handled as if they were
strings of independent characters.
Accepted. Will be implemented
if the input expected from the Japanese National Body is received by
June 15, 1997.
8-6. Symbols of character(s) and symbol(s)
Symbols with character(s) should be handled by one of the following methods.
a) Character(s) and symbol(s) that are like a "short form" of normal writing, such as
2480 which looks like "( 13 )": split the symbol as if it were a
normal string.
b) Character(s) and symbol(s) that cannot be split into one unambiguous sequence,
such as 2470, in which the circle can be either before or after the characters 17:
handle as if it is a special form of the character(s) part of the symbol.
We will try to make sure that
symbols used as numeric characters are not inconsistent with other
classification.
8-7. Symbols for making combining sequences, such as 20E0.
Follow the rule proposed in 8-6 above; the process might be different
from the method for combining sequences.
8-8. Japan expects that many countries have the same kinds of comments as above.
Japan requests, therefore, that confirmation specific to the data table
be circulated to all JTC1 member countries (not only SC22 P-members) for
review.
Review beyond SC22 will mandatorily
be made at DIS ballot.
9. Other comments
Japan recognizes many editorial issues as well as technical issues which
are not in this ballot comment; the many major technical comments (and
maybe more to come) did not leave us time to scan all of them.
Japan thinks the minor editorial comments are unnecessary components of
this ballot comment because of the immaturity of CD 14651.
In any case, the text should be totally rewritten for full acceptance of the
technical comments.
We take note.
Danish National Body comments
The Danish ballot is: Yes,
with general and technical comments
The comments are directed towards the English version of the text,
although the same comments
can be made wrt. the French text.
1. The overall technical content of CD 14652 is sound and as agreed
by the working group, and
thus we can accept the document as a CD.
General comments:
2. There is too much emphasis on the "binary sorting string" concept.
The concept of just comparing two strings should be catered for
throughout the document. In some places, the functionality can only be
reached by sorting on binary prepared strings. Also, there should be ample
warnings in a number of places about the binary sorting string concept, as it
is culturally dependent, that is, it is dependent on the sorting specification
used to produce the binary representation. Storing data in the precompiled
binary string representation should thus be recommended only for monocultural
environments, and those are actually environments that we should advise against,
having internationalization
as our goal.
Some text will be added to reduce
emphasis as far as possible.
3. A formal description language, such as ISO 11404 or the IDL of ISO 13788 (PCTE),
should be used in the specification of the APIs. The description of
the APIs now lacks a number of specifications, including a description of the
types of the parameters and specifications of how to bind to programming
languages, that are inherent in the 11404 and 13788 specification languages.
We are willing to help rewrite the API specifications in light of this
comment.
An improved formal method not
referring to external standards and easy to understand while remaining
general and maximally language-independent will be used as agreed
by experts present at the Québec meeting.
4. We recommend that a thin binding method be used, as demonstrated
in other API papers of WG20. We can provide text for this, in conjunction
with text to address the
problems mentioned in comment 3.
Denmark to provide text before
June 15 1997.
5. The APIs have 3 parameters, that should not occur in the API, because
all localisation should be done via the locale. These are the parameters
order_accents, order_case
and sign_espace of the COMPCAR and CARABIN functions.
Accepted.
6. The LC_COLLATE specification in 14652 format should be readily useable
and referenceable, without need for retailoring. The different options,
as expressed by the parameters of the 3 parameters in our comment 5, should
be available as different LC_COLLATE specifications each with a well-defined
name.
A single common template will
be developed which will be enabled for easily using single line
toggles for each parameter in any source LOCALE using the template
as a model.
7. The definitions in section 3 should be numbered and not ordered
alphabetically (in either
English or French).
Not accepted. According to ISO
directives, alphabetical order is necessary. Numbering will not
be done unless ITTF decides to do so at final publishing stage
for reasons given previously in the current document.
8. The definitions are too centered on a precompiled sorting string
concept. Terminology should also be applicable to comparisons on the
string encoding. Terms that should be useable with plain string comparisons
include: equivalence, ordering
key, ordering subkey.
Emphasis on this will be reduced
as far as possible.
9. The technical specifications should be aligned with 14652, especially
hexadecimal symbolic ellipses
"..".
Accepted.
10. The names of the APIs
should be less French-oriented.
Not accepted.
11. The tables should use names established from the POSIX locale
work, such as ISO/IEC 9945-2 annex G names or 14652 names from the
repertoiremap, especially
when not using <Uxxxx> names.
All character names use
standardized UCS identifiers. Other symbols for weights are internal
to the standard and are never referenced outside those tables.
This is editorial matter and is not considered a priority.
12. A number of scripts have not been ordered properly, such as hiragana,
katakana and Thai.
Kanas have been defined as an
outcome of Japanese comments. Other scripts' reserved order does
no harm and may be useful for future development.
13. A reversibility function from binary sort strings to character strings
seems to be missing.
Comment withdrawn by Denmark.
14. There are some spelling errors, and we suggest a spell-checker be used
for production of further
documents.
Accepted.
Technical comments:
15. page 5: first paragraph: It is not always required to transform, for example
"4" into a number of strings, sometimes it is only necessary to transform
it into one string. Thus change "requires" to "may require" and "is hence"
to "may thus be".
Accepted.
16. Page 5, last paragraph and following paragraphs: Too much emphasis on the
precompiled sorted character
string data type. This is
not a general type as noted in our comment 2.
Text will be added to reduce
emphasis as far as possible.
17. Page 8, Add after "Scandinavian" "and several other". This includes languages
like Polish, Finnish, Hungarian,
Turkish, and many others.
Accepted.
18. Page 14: "subprogramme" - rather use the word "function". All APIs in this
standard are functions. All references to "subprogrammes" should be
changed to "functions"
in the standard.
References to subprogrammes
will be changed to procedures to be more language independent.
19. page 15, first paragraph: we recommend that only uppercase characters be
used in hexadecimal numbers,
and this is also the specification in CD 14652.
Accepted.
20. Page 15, last paragraph: it seems to be a requirement that an LC_COLLATE
specification, like the default, can be tailored on the fly. This is not
recommendable, as it would take quite some processing time, and thus delay
the processing considerably. On-the-fly tailoring should thus not be a
requirement.
The term "invoke" is changed
to "use". The term "using" is changed to "built".
21. Page 16, 5.1.1 last paragraph: use the name of the API (COMPCAR)
instead of the number "API
1".
Accepted.
22. Page 17: last paragraph: the names of the functions should be used for
the binding. Of course the names of the functions may vary for the
different programming languages, but the names are more than "only
indicative".
Restrictions exist for programming
language procedure names. Therefore if this comment is taken into
account short names as used in the current standard are advisable.
23. Page 20: The COMPCAR function seems to lack a result value indicating
whether the first string was lexicographically less than, equal to or greater
than the second string. We propose the values -1, 0 and 1 for the three
possibilities, in line with current C practice. Also, return values seem to
be missing for the other
functions.
Current C practice is language-specific.
However binding to C should be able to use such a result as a
returned function value.
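How a three-way result of this kind might be consumed by a language binding can be sketched as follows. The comparison rule used here (case-folded code point order) is only a stand-in assumption for a table-driven comparison, and the name "three_way" is hypothetical.

```python
from functools import cmp_to_key


def three_way(first, second):
    """Return -1, 0 or 1 as `first` orders before, equal to or after `second`.

    Case-folded code point order is a placeholder for table-driven ordering.
    """
    k1, k2 = first.casefold(), second.casefold()
    return (k1 > k2) - (k1 < k2)


words = ["pear", "Apple", "fig"]
# A binding can hand the three-way function directly to the host language's sort.
ordered = sorted(words, key=cmp_to_key(three_way))
```

The same function value can serve both a direct comparison and a sort, which is the kind of binding the disposition leaves open.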
24. Page 21: It should not be normatively required that COMPCAR be equivalent
to CARABIN and COMPBIN. CARABIN produces output that is not necessary for
some use of COMPCAR.
The binary output strings will
be removed as a result of this function.
25. Page 21, last paragraph: It should not be prescribed that there be
binary strings used for comparisons, in the COMPCAR function. Also the
"default" table mentioned here is the global locale, and not the 14651 default.
This should be clarified,
maybe using "global" instead of "default".
Accepted.
26. Page 22: all parameters should be spelled out, and references to other
APIs when defining the
parameters should be avoided.
Parameter explanations will be
repeated instead of referring to other procedures' parameter descriptions.
27. Page 25 second paragraph: the default table cannot be used per se, as it
needs tailoring. See our
comment 6 on how to solve this.
Tailoring facilities will be mandated.
28. Page 27, first paragraph: this description is very oriented towards
the binary sort string. Descriptions also valid for the COMPCAR method
without binary sort strings should be present. We would request a separate
description of how COMPCAR can be implemented, especially pointing out that only
comparison of the first (few) characters is necessary in many cases, and
that generating binary
sort strings is typically not necessary.
Emphasis will be reduced as far
as possible.
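The point made in comment 28 can be sketched as a comparison that walks the two strings level by level and stops at the first differing weight, so that no binary sort string is ever built. The two-level toy table and the name "compare_lazy" are assumptions for illustration. IGNORE weights and backward scanning are deliberately left out.

```python
from itertools import zip_longest

# Toy table (assumption): character -> (level 1 weight, level 2 weight).
TABLE = {"a": (1, 1), "á": (1, 2), "b": (2, 1), "c": (3, 1)}


def compare_lazy(s1, s2, table=TABLE, levels=2):
    """Return (result, pairs_examined); result is -1, 0 or 1.

    Weights are looked up pair by pair and the scan stops at the first
    difference, so long strings that differ early cost very little.
    """
    examined = 0
    for level in range(levels):
        for c1, c2 in zip_longest(s1, s2):
            examined += 1
            w1 = table[c1][level] if c1 is not None else 0  # shorter string sorts first
            w2 = table[c2][level] if c2 is not None else 0
            if w1 != w2:
                return (w1 > w2) - (w1 < w2), examined
    return 0, examined


early_exit = compare_lazy("bcccc", "acccc")
accent_only = compare_lazy("a", "á")
```

In the first call the difference is found on the first pair of characters; only strings that are equal at level 1 cause a second pass.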
29. Page 27: level 1: Some non-letters, for example Kana, may have more than
one character at the first
level.
The term "Normally" is
changed to "Generally". The terms "each character" are changed
to "basic characters".
30. Page 27: note of 5.3.2.1: Combining accents may have ignore at level 1,
and then values at level
2. Should that not lead to full predictability?
If meaningful results are to be
achieved with an implementation which processes sensible equivalences,
there is then indeed a loss of result predictability unless an
extra level is added (which makes processing more complex and
less economical for little advantage). Otherwise, predictability
can be achieved but it can at the limit make no sense as unrelated
objects are then compared.
31. Page 29: level one:
Use the API names instead of "SUBPROGRAMME"
Accepted.
32. Page 29: what is the difference between level 2 and 4? In traditional
locale invocation there is not that difference, but some other
difference. Maybe level
4 should always be required.
Conformance will be considerably
simplified and number of conformance levels reduced.
33. Page 31: COLL_WEIGHT_MAX
is not a directive of 14652.
We take note.
34. Page 31: Some scripts are not (yet) in IS 10646, for example the Yi and
Canadian syllable scripts.
This is true. There is no harm.
35. Page 31: We should ensure that comments are allowed in all the places they are used
here according to 14652,
and possibly change 14652 to allow them.
Good remark to be done to ISO/IEC
14652 if this is lacking in the latter.
36. Page 41-51: a number of the symbols defined here are also defined later.
Example <a8> defined on page 46 and page 79. This is not allowed according
to 14652 (giving a symbol
two weights).
This will be fixed.
37. Page 111: (4) There needs to be a strong warning that binary strings stored
cannot be used internationally for culturally correct sorting,
as they are stored in a
localized form. Or we should simply advise against it.
Accepted.
38. Page 112: the text
seems obsolete, as these concepts have been proven.
Current Annex A will be removed.
39. Page 115: Also list ISO/IEC 9945-2 POSIX shell and utilities, especially
annex G, as a source.
Accepted.
40. Page 118, paragraph
7: There is only a need for 4 levels, not 5.
Accepted.
41. Page 118, paragraph 7:
Is it necessary to have an extra level for 10646 conformance level 3? Maybe
in some cases but not generally. When sorting the combining characters
per se, there is no need
for a further level.
See disposition of comment 30
above.
42. Page 119: paragraph 9: We thought this was proven not to be true. Or is this
some implementation guideline
(which then should be noted as such).
Paragraph 9 will be removed.
43. Page 120: Annex I should be explained further, especially how it fits into
the internationalization
model.
Annex I will be changed to avoid
using elements that are irrelevant to parameters used in the API
or the tables.
Israeli National Body comments
The SII votes NO on CD 14651. If items 1, 2 and 3 were to be accepted,
our vote would become YES.
1. Hebrew Accents
The Hebrew accents (U0591 to U05AF), Meteg (U05BD) and Upper Dot (U05C4)
do not participate in the string ordering process. They relate, in fact,
to the whole word, rather than to the letter to which they are attached,
and are never used in the lexicographic order or in any other ordering of
Hebrew texts.
- The Hebrew accents should be removed from the list of collating
symbols, page 35, and from
page 45.
- On page 56 they should
all be defined as:
- IGNORE; IGNORE;
IGNORE; IGNORE;
Not accepted. The predictability
requirement is essential to this standard. It does no harm where
predictability is not required. The reverse harms other international
requirements.
2. Composite characters
and combining characters.
It seems that combining characters do not sort and compare as equivalent
to their precomposed encoding. For instance, the two strings "Gu:nther"
and "Gu:nther", the first coded with U00FC, the second with U0075 followed
by U0308, are equivalent and should not be distinguished but are not
equivalent in the CD. The particular coding used is an artifact, possibly
not under the control of
the user, and is normally meaningless.
This is correct. Equivalences
can be made absolutely equal by prehandling, which is part of
the general model of ordering.
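One way such a prehandling step might look is sketched below. It assumes that canonical composition (Unicode normalization form NFC, via Python's standard "unicodedata" module) is an acceptable stand-in for the prehandling meant here. The CD itself does not name a normalization form, and "prehandle" is a hypothetical name.

```python
import unicodedata


def prehandle(text):
    """Bring equivalent codings to one form before comparison.

    Assumption: NFC composition stands in for the CD's prehandling phase.
    """
    return unicodedata.normalize("NFC", text)


precomposed = "G\u00FCnther"   # u with diaeresis coded as one character
decomposed = "Gu\u0308nther"   # u followed by a combining diaeresis

equal_before = precomposed == decomposed
equal_after = prehandle(precomposed) == prehandle(decomposed)
```

Before prehandling the two codings differ as character sequences; after it they are identical, so any subsequent comparison treats them as absolutely equal.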
3. Introduction, page 6, last paragraph: "If two equivalent strings are
not absolutely identical,
then the tie must be broken."
This sentence is not acceptable. If two strings are equivalent they
should be treated as such. For example, Hebrew strings that are
equivalent but have different
accents.
Different levels of precision
exist. What is equivalent up to a certain level of precision
may not be equivalent beyond it, at a higher precision level.
4. Introduction, page
4 (Editorial):
The introduction begins with a negative statement and continues with a
criticism of past practices. The SII suggests it would be preferable to
begin with a positive statement describing what the standard is and what
its benefits are.
Most of the current introduction
will be moved to an informative annex and this will diminish emphasis
on this statement which will not present the standard.
5. Tutorial, page 7 (Editorial).
The tutorial would be better
placed in an informative appendix.
Accepted.
6. Page 35 (Editorial).
The comment should be qubuts
(the s is missing).
Accepted.
Canadian National Body
comments
The Canadian NB votes YES to the question asking if this CD is satisfactory to be voted on as a Draft International Standard with the following comments that have to be accommodated.
___________________
Technical comments:
___________________
1. Add a note at the end
of 5.1.2 to indicate that escape sequences should be filtered
out or transformed during the prehandling phase.
Accepted.
2. At least in annex 1 put the following note: "In this default ordering
table, a number of scripts are missing, in some cases due to lack of data at
time of editing, in other cases due to the non-inclusion of those scripts in
10646 at time of publishing. It is the intent of ISO/IEC to complete
ordering of those scripts explicitly in the default whenever data becomes
available by way of amendments to this international standard. In the
meantime, implementers are encouraged to take advantage of tailoring to meet
user sorting requirements. If the default table is not tailored for
unspecified characters, then an implicit order is assigned in the following
table, which might not
meet user requirements of a particular community".
Accepted.
3. The toggles should be
moved from the API to the pre-handling phase
A liaison statement from WG14 has informed us that having toggles in the
API clashed with the C/POSIX i18n model, in which the APIs should be
culturally neutral and all cultural/localization data should be given via
the locale.
In order to preserve the model, yet retain the ease of use of the toggles
for tailoring the table,
we propose the following:
p. 19 and 23: Remove the order_accents, order_case and sign_espace
parameters and modify
the text accordingly.
Accepted.
We disagree with the intentional mistakes made for implementing the toggles
in the table data in annex 1. We strongly suggest that a syntactic
preprocessing mechanism handling toggles specifically be implemented in
ISO/IEC 14651 and documented in ISO/IEC 14652. Where the intentional
mistakes have been done in the default table, replace the occurrences by
conditional statements
similar to these :
a) for space/NBSP toggling:
#if ToggleSpace
<00A0> IGNORE... [definition of NBSP]
#else
<0020> IGNORE... [definition of SPACE]
.
.
.
#if ToggleSpace
<0020> <espace>... [definition of SPACE]
#else
<00A0> <espace>... [definition of NBSP]
b) for accent scanning toggling:
#if ToggleAccents
order_start <La>;forward;forward...
#else
order_start <La>;forward;backward...
c) for precedence of case toggling (the idea is here in one occurrence to omit
or put the MIN symbol before CAP in sequence and in the other occurrence
to put or omit the MIN symbol after CAP in sequence. The CAP symbol
occurs somewhere between the two statements below):
#ifnot ToggleCapsMin <MIN>
.
.
.
#if ToggleCapsMin <MIN>
With such a syntax, toggles "ToggleSpace", "ToggleAccents" and
"ToggleCapsMin" shall be turned ON by a specific statement specified in
ISO/IEC 14652. If not turned ON, they are OFF by default and need not be
mentioned in the tailoring.
Before final compilation of the table in prehandling, the toggles shall
modify the table and
thereby affect the ordering as intended.
Accepted in principle. Syntax
will be slightly modified.
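The proposed mechanism can be pictured with a minimal preprocessing sketch. It assumes a hypothetical "#endif" terminator, which does not appear in the Canadian proposal, and handles only non-nested "#if"/"#else" blocks. The function name and the way toggles are passed in are likewise assumptions, and the final syntax is stated above to be subject to change.

```python
def preprocess(lines, toggles):
    """Keep only the table lines whose enclosing #if/#else branch is active.

    `toggles` is the set of toggle names turned ON; all others are OFF.
    Assumption: blocks end with a hypothetical #endif and are not nested.
    """
    kept = []
    active = True  # outside any conditional block
    for line in lines:
        words = line.split()
        if words and words[0] == "#if":
            active = words[1] in toggles
        elif words and words[0] == "#else":
            active = not active
        elif words and words[0] == "#endif":
            active = True
        elif active:
            kept.append(line)
    return kept


table = [
    "#if ToggleAccents",
    "order_start <La>;forward;forward",
    "#else",
    "order_start <La>;forward;backward",
    "#endif",
]
default_table = preprocess(table, set())
toggled_table = preprocess(table, {"ToggleAccents"})
```

Because the toggles are resolved before the table is compiled, the comparison API itself needs no toggle parameters.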
Editorial comments
________________
I. Make the introduction smaller and move the explanations to an informative
annex. Essential paragraphs that should stay in the introduction are
suggested to be the
1st, 2nd, 6th, 8th and 10th.
Accepted.
II. In 5.1.2 after "to transform fields" add "and unstructured records into
structured fields".
Accepted.
III. Annex I Dialog Box should be only applicable to what is in ISO/IEC 14651.
Irrelevant elements such as system direction, bidirectional options and
text behaviour should
be removed.
Accepted.
IV. p. 17, put explanatory parenthesis after each subprogramme name the first
time it is used.
Accepted.
V. p. 18 change "parameters
de sortie" to "output parameters".
Accepted.
VI. p.26 rephrase 5.3.2 to make understood the relationship between symbolic
tables and numeric tokens. Indicate how subkeys are arranged to form a
key and suggest how to use that key to make binary comparisons (with
special considerations for the delimiters between keys and a note, mentioning
that there is a possibility
to avoid them, with references).
Accepted.
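The relationship asked for in comment VI can be sketched as follows: the subkeys of each level are concatenated, with a delimiter after each level, into one key that can be compared as plain bytes. The three-level toy table, the delimiter value 0 and the name "make_key" are assumptions for illustration. Real weights are symbolic in the tables and would be assigned numbers by an implementation.

```python
DELIMITER = 0  # assumed to be lower than every real weight

# Toy table (assumption): character -> weights for levels 1, 2 and 3,
# each weight in the range 1..255 so that it fits in one byte.
TABLE = {"a": (10, 1, 1), "A": (10, 1, 2), "b": (11, 1, 1)}


def make_key(text, table=TABLE, levels=3):
    """Concatenate the per-level subkeys, each followed by the delimiter."""
    key = []
    for level in range(levels):
        key.extend(table[ch][level] for ch in text)  # subkey for this level
        key.append(DELIMITER)                        # closes the subkey
    return bytes(key)


key_a = make_key("a")
ordered = sorted(["ab", "A", "a"], key=make_key)
```

Since the delimiter is lower than any weight, a shorter level-1 subkey sorts before a longer one that extends it, and a plain binary comparison of the keys gives the multi-level order.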
VII. Put the tutorial as
an informative annex.
Accepted.
VIII. Remove or complete incomplete references in Annex C, i.e. references to
non-documents.
Accepted.
IX. p. 51 Title for "Control characters" should read "Control characters and
associated symbols;
caractères de commande et symboles associés".
Accepted.