ISO/IEC JTC 1/SC 22/WG 20 N 527R

ISO

ORGANISATION INTERNATIONALE DE NORMALISATION

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION

МЕЖДУНАРОДНАЯ ОРГАНИЗАЦИЯ ПО СТАНДАРТИЗАЦИИ

CEI (IEC)

COMMISSION ÉLECTROTECHNIQUE INTERNATIONALE

INTERNATIONAL ELECTROTECHNICAL COMMISSION

МЕЖДУНАРОДНАЯ ЭЛЕКТРОТЕХНИЧЕСКАЯ КОМИССИЯ

Title: Disposition of comments on ballot JTC1/SC22 N 2466

ISO/IEC CD 14651, International String Ordering

Date: 1997-07-01

Project: JTC 1.22.30.02.02

Cross Reference: SC22 N2364

Source: Alain LaBonté, Project editor, on behalf of SC22/WG20

Status: Information required according to directives by SC22 Secretariat

Action: For National Body Consideration

REVISED SUMMARY OF VOTING ON

Letter Ballot Reference No: SC22 N2364

Circulated by: JTC 1/SC22

Circulation Date: 01-20-1997

Closing Date: 04-24-1997

SUBJECT: CD Approval for CD 14651 - Information technology

International String Ordering - Method for Comparing Character

Strings and Description of a Default Tailorable Ordering

The following responses have been received on the subject of approval:

"P" Members supporting approval without comment   10
"P" Members supporting approval with comment       2
"P" Members not supporting approval                4
"P" Members abstaining                             2
"P" Members not voting                             7

"O" Members supporting approval without comment    1
"O" Members not supporting approval                1
"O" Members abstaining                             1

ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY

SUMMARY OF VOTING AND COMMENTS RECEIVED

                    Approve  Disapprove  Abstain  Comments  Not Voting

'P' Members

Australia             (X)       ( )        ( )      ( )        ( )
Austria               ( )       (X)        ( )      (X)        ( )
Belgium               ( )       ( )        ( )      ( )        (X)
Brazil                ( )       ( )        ( )      ( )        (X)
Canada                (X)       ( )        ( )      (X)        ( )
China                 ( )       ( )        ( )      ( )        (X)
Czech Republic        (X)       ( )        ( )      ( )        ( )
Denmark               (X)       ( )        ( )      (X)        ( )
Egypt                 ( )       ( )        ( )      ( )        (X)
Finland               (X)       ( )        ( )      ( )        ( )
France                (X)       ( )        ( )      ( )        ( )
Germany               ( )       ( )        (X)      (X)        ( )
Ireland               ( )       ( )        ( )      ( )        (X)
Japan                 ( )       (X)        ( )      (X)        ( )
Netherlands           ( )       (X)        ( )      (X)        ( )
Norway                (X)       ( )        ( )      ( )        ( )
Romania               (X)       ( )        ( )      ( )        ( )
Russian Federation    (X)       ( )        ( )      ( )        ( )
Slovenia              (X)       ( )        ( )      ( )        ( )
Sweden                ( )       ( )        ( )      ( )        (X)
Switzerland           (X)       ( )        ( )      ( )        ( )
UK                    ( )       ( )        (X)      (X)        ( )
Ukraine               (X)       ( )        ( )      ( )        ( )
USA                   ( )       (X)        ( )      (X)        ( )

'O' Members

Argentina             ( )       ( )        ( )      ( )        ( )
Bulgaria              ( )       ( )        ( )      ( )        ( )
Cuba                  ( )       ( )        ( )      ( )        ( )
Greece                ( )       ( )        ( )      ( )        ( )
Hungary               ( )       ( )        ( )      ( )        ( )
Iceland               ( )       ( )        ( )      ( )        ( )
India                 ( )       ( )        ( )      ( )        ( )
Indonesia             ( )       ( )        ( )      ( )        ( )
Israel                ( )       (X)        ( )      (X)        ( )
Italy                 ( )       ( )        ( )      ( )        ( )
Korea Republic        (X)       ( )        ( )      ( )        ( )
New Zealand           ( )       ( )        ( )      ( )        ( )
Poland                ( )       ( )        ( )      ( )        ( )
Portugal              ( )       ( )        (X)      ( )        ( )
Singapore             ( )       ( )        ( )      ( )        ( )
Thailand              ( )       ( )        ( )      ( )        ( )
Turkey                ( )       ( )        ( )      ( )        ( )
Yugoslavia            ( )       ( )        ( )      ( )        ( )

US National Body comments

AF-1

The specification of the sorting algorithm must be made independently of a

programming model.

Sorting is a process that is used in an incredible variety of circumstances

and on widely different systems, including object-oriented systems. Care

should be taken in preparing the normative specifications for CD 14651 that

they are usable independent of a particular programming model, programming

language, or environment.

A language- and environment-independent model will be used for the API description, as far as possible. Participants at the Québec meeting brought three candidate models; for the final CD, the editor will have to choose one that respects all the constraints expressed.

In particular, the descriptions of the sorting operations should be

expressed in an abstract form, specifying IN, OUT and RETURN parameters but "without"

language binding. Also, no parameters needed for the sorting operation may be

presumed to hide in some semi-opaque state, but rather they should always be

specified explicitly in the description of the operation.

Resolved by the previous response. In addition, however, SC22 requires that an actual binding to at least one programming language be provided normatively.
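
As an illustration only, an abstract comparison operation of the kind requested might be sketched as follows. The names `compare_strings`, `weight_table` and `max_level`, and the toy weights, are hypothetical and do not come from the CD; the sketch only shows every input passed as an explicit IN parameter and the result delivered as the RETURN value, with no hidden state.

```python
from typing import Dict, Tuple

# Hypothetical weight table: each character maps to one weight per level.
# A weight of 0 stands for "ignored at this level" (an assumption of this sketch).
WeightTable = Dict[str, Tuple[int, ...]]


def compare_strings(s1: str, s2: str, weight_table: WeightTable, max_level: int) -> int:
    """Return -1, 0 or 1. All inputs are explicit; no global state is consulted."""
    for level in range(max_level):
        k1 = [weight_table[c][level] for c in s1 if weight_table[c][level] != 0]
        k2 = [weight_table[c][level] for c in s2 if weight_table[c][level] != 0]
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0


# Toy table: level 1 = base letter, level 2 = accent, level 3 = case.
toy_table: WeightTable = {
    "a": (1, 1, 1), "A": (1, 1, 2), "á": (1, 2, 1),
    "b": (2, 1, 1), "B": (2, 1, 2),
}

result_level1 = compare_strings("ab", "AB", toy_table, max_level=1)
result_level3 = compare_strings("ab", "AB", toy_table, max_level=3)
```

With this toy table the two strings compare equal at level 1 and differ only once the case level is taken into account.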

If it is desired to show how the standard might be implemented in a POSIX

environment, that could be the subject of an informative annex. Function

bindings for POSIX could assume transparent access to locale data from the

POSIX locale model, if that is desired. The annex would specify how the

proposed POSIX functions make use of the abstract operations defined in the

normative part of the standard, and how their parameters are set either

explicitly or implicitly.

Conformance to POSIX is not required. However, a majority of experts decided long ago that the specification would use the POSIX LC_COLLATE specification as a starting point, to avoid reinventing the wheel in ISO work. This is what has been done.

RLG 1:

The body of the standard includes material which belongs in an informative

annex, specifically the "Tutorial on problems solved by this standard."

Accepted. Most of it (except really introductory material) will be moved to an informative annex.

RLG 2:

The order specified for two Cyrillic characters (p. 95-100 of the CD)

conflicts with the order in Table 2 of ISO/R9 and other sources (cited

below).

The characters in question are these two case pairs: CYRILLIC CAPITAL LETTER

TSHE/CYRILLIC SMALL LETTER TSHE and CYRILLIC CAPITAL LETTER DZE/CYRILLIC SMALL

LETTER DZE.

Cyrillic letter TSHE:

In the CD, TSHE follows KA WITH HOOK and precedes EL.

In ISO/R9 and other sources, TSHE follows TE and precedes U.

Cyrillic letter DZE:

In the CD, DZE follows KOPPA and precedes CHE.

In ISO/R9 and other sources, DZE follows ZE and precedes I.

Other differences in the order of Cyrillic characters between the CD and

Table 2 of ISO/R9 are either not supported by the other sources or are arbitrary.

These tables will be checked with the Irish national body, which provided the data, and corrected, probably along the lines specified by the US national body.

RLG 3:

The order of scripts on p. 31 differs slightly from the order in ISO/IEC

10646. Specifically:

- Georgian follows Cyrillic; in ISO/IEC 10646, it follows Tibetan (pDAM-6)

- Hebrew follows Arabic; in ISO/IEC 10646, it follows Armenian (and

precedes Arabic).

These differences are not explained.

Differences will be explained as far as possible.

RLG 4:

Hangul is positioned between Tibetan and Cherokee (i.e., consistent with the

location of Hangul Jamo in ISO/IEC 10646). There is no explanation as to

why this position was chosen, rather than that of Hangul Syllables. Since

Korean may be written with a mixture of ideographs and Hangul syllables,

the Hangul Syllables position established by pDAM-5, immediately after the

CJK Unified Ideographs, might be preferable.

This should be covered by the explanation of script order; if some arbitrary order remains, that will be stated as well.

HP 1

The outline of the document does not follow the well defined and established

method already used in other JTC1 standards. For example, the Introduction

is too big and the reader gets lost and might decide not to continue to

read the document. Usually such information belongs to an informative

annex otherwise it becomes normative.

The structure of the document follows that of many other ISO/IEC standards. Most of the current introductory material will be moved to an informative annex.

HP 2

The structure of the document has the "Scope" clause on page 11. This

clause should come immediately after a newly written short Introduction

clause.

This is not required by ISO directives and many other ISO standards have exactly the same structure as the current one.

In addition, this clause needs clarifications. For example, does

it describe the APIs needed by applications to specify character string

ordering? It is also not clear what is meant by the phrase "full

repertoire of ISO/IEC 10646 (independently of coding)". The part that is

not clear in the previous statement is the one in parenthesis.

It may appear unclear to some people indeed. Better wording is welcome.

In addition, the "Scope" clause talks about a specific default ordering but

it is not clear where the CD explains how it was derived or how it is

related to the APIs.

This will be clarified. The term "default" will also be changed to "common template" (to be tailored), and an explanation of the goals to be achieved will be added.

HP 3

The "Conformance" clause should follow immediately the "Scope" clause.

This is not required by ISO directives and many other ISO standards have exactly the same structure as the current one. ISO directives do not even require a conformance clause for a standard.

It should be combined with the "Requirements" clause. It should be rewritten

to make it easy to understand how to conform without having to go through the

syntax and content complexity of the "Requirements" clause.

Conformance is difficult to determine from the document; the document

requires a table of precisely which features are required. Moreover, the

function levels are, in general, independent of the previous level; there

is little reason to force all features of one level before the next higher

is reached.

The Requirements clause will be completely revisited.

Post handling is informative, and has no place in

conformance.

Post handling is not informative. When prehandling has modified the strings for comparison purposes, the post handling phase is there to re-establish full predictability where required, for example in cases of collisions due only to the modifications made in the prehandling phase.

HP 4

In the clause "Tailoring Mechanism", it is not clear at all as to what an

application developer needs to do to override the default ordering that is

specified in Annex 1.

An example of tailoring will be given using the ISO/IEC 14652 specification.

HP 5

Maybe it would be better to have this CD become a Technical Report rather

than a standard since it allows users to override the default ordering

proposed and there might be more users overriding the default, with an

undefined and nowhere described mechanism, than what the CD proposes.

Not accepted. The NP specified that an IS be produced as the result of this work, and the NP was approved according to the directives.

HP 6

Dependency on an unpublished standard 14652, Cultural Conventions

Specification is too high. Currently, 14652 is still in the CD stage as

mentioned in clause 2, Normative References, of this CD (14651).

In summary, there is a lot of structural and technical fine tuning that is

necessary to make this document complete. If such an effort takes too much

time, maybe the industry could be served better if the proposal is modified

for publication as a TR rather than an ISO standard. This work can be later

converted to an ISO publication when CD 14652, Cultural Conventions

Specification, is accepted and is published as an ISO standard.

The goal is to synchronize the publication of ISO/IEC 14652 with the final publication of ISO/IEC 14651. In the meantime, ISO/IEC 14652 is believed to be already quite stable, since most of it is an excerpt of ISO/IEC 9945-2, reproduced so that the specification used in ISO/IEC 14651 does not depend on POSIX conformance.

TG 1

The organization and nomenclature (e.g. COMPCAR) is unnecessarily obscure.

Names should be spelled out completely for clarity.

Not accepted. Certain programming languages have binding restrictions on name lengths. Spelling function names out is therefore typically not language-independent in the context of this standard.

TG 2

The requirement that the original string be recoverable is unnecessary; many

applications, such as databases, will have a sort key be an alternate field

in the record. They may only need to have a level 1 sort for their

application. In that case, storing the original string twice or requiring

internal structure that enables reconstruction is unnecessary and only

increases storage to no purpose.

Accepted in principle. The requirement was removed. Text will be checked for residual statements implying such a requirement. If some are found they will be removed.
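
To illustrate why a level 1 sort key need not allow the original string to be recovered, here is a minimal sketch. The table `base_weights` and the function `level1_key` are hypothetical and use toy weights, not values from Annex 1.

```python
# Toy level 1 (base letter) weights; case and accents are deliberately not recorded.
base_weights = {"a": 1, "A": 1, "á": 1, "b": 2, "B": 2}


def level1_key(text: str) -> tuple:
    """Build a sort key that keeps only the level 1 weight of each character."""
    return tuple(base_weights[char] for char in text)


key_lower = level1_key("ab")
key_mixed = level1_key("áB")
```

Both strings yield the same key, so the key alone cannot reproduce the original; an application that needs the original string keeps it in its own field.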

TG 3

Use of NBSP is in practice an unacceptable overload of its primary function.

Being able to functionally tailor just space and nbsp is in practice not

useful; in general a whole host of similar characters, punctuation and

symbols, behave the same way.

The function parameters dealing with this will be removed. Tailorability will always be possible in the data though. The template will use NBSP according to Canadian standards and CEN preliminary specifications with a possibility of easy toggling in addition to full tailorability of the tables.

TG 4

The algorithm for comparison must be stated in terms of results, NOT a

specific mechanism.

The conformance clause will be revisited to try to maximize consensus, as per the results of the Québec meeting.

TG 5

The format in Annex 1 is unnecessarily complex. It is impossible to assess

and recommend this standard where we cannot clearly determine the result

of the default sorting order rules in this annex. It forces use of a

whole separate notation for characters. To correct this, characters must

always be referred to by their full 10646 name for clarity, rather than

arbitrary notations such as AYEHS, AIGUT, POINN, QARNP, or many other

examples. Script names should always be the 10646 block name.

This notation applies not to characters but to the internal weights used for ordering at any given level of precision. Character names are not independent of the version of the standard, whereas UCS identifiers are, and SC22 has a requirement on unique identifiers. Furthermore, script names do not necessarily coincide with blocks in ISO/IEC 10646 (the best example being the special characters). The syntax is the one agreed by the group, building on POSIX specifications without requiring POSIX conformance.

TG 6

The equivalencies of composed characters vs. composite character sequences;

e.g. a + umlaut and a-umlaut can be stated much more succinctly.

This is a controversial matter in ISO, and consensus cannot be reached on equivalencing. However, it is possible to give an example of tailoring that makes such equivalences possible. This will be done.
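
As a hedged illustration of what such a tailoring or prehandling step could achieve, the sketch below uses Python's standard `unicodedata` module to map a composite sequence (a followed by COMBINING DIAERESIS) and the precomposed character to the same form before comparison. The use of Unicode normalization and the helper name `prehandle` are assumptions of this example, not mechanisms specified in the CD.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
composite = "a\u0308"    # LATIN SMALL LETTER A + COMBINING DIAERESIS


def prehandle(text: str) -> str:
    """Hypothetical prehandling step: fold both spellings into one composed form."""
    return unicodedata.normalize("NFC", text)


raw_equal = precomposed == composite
folded_equal = prehandle(precomposed) == prehandle(composite)
```

Without the folding step the two spellings are different code sequences; after it they are identical and therefore receive the same weights.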

TG 7

The relative ordering of characters cannot be determined from the character

lists, since they are not even remotely in the resulting order.

Characters are currently presented, in the vast majority of cases, in exactly the intended resulting order. This is the case for special characters, and it is the case for characters that are part of the scripts of the world specified in this standard.

To correct this, the ordering of characters within a script must be presented in the

resulting order as much as possible. Example:

<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL

<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL

<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING

<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING

<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT

<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT

<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT

<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT

...

The fourth column (in this case) determines the final ordering of the

characters, which is NOT the order presented. It must be presented as:

<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL

<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING

<U0002> IGNORE;IGNORE;IGNORE;<U0002> % START OF TEXT

<U0003> IGNORE;IGNORE;IGNORE;<U0003> % END OF TEXT

...

<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL

<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING

<U2402> IGNORE;IGNORE;IGNORE;<U2402> % SYMBOL FOR START OF TEXT

<U2403> IGNORE;IGNORE;IGNORE;<U2403> % SYMBOL FOR END OF TEXT

No correction is required. The order is not the order of ISO/IEC 10646. It is totally decoupled from the coding tables.
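
The point can be illustrated with a minimal sketch. It assumes POSIX-style semantics, in which the weight denoted by a symbol such as `<U2400>` is given by the position of its entry in the table rather than by its code value; the parsing helper and the variable names are hypothetical.

```python
import re

# Entries in the order in which the CD presents them.
TABLE = """\
<U0000> IGNORE;IGNORE;IGNORE;<U0000> % NULL
<U2400> IGNORE;IGNORE;IGNORE;<U2400> % SYMBOL FOR NULL
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING
<U2401> IGNORE;IGNORE;IGNORE;<U2401> % SYMBOL FOR START OF HEADING
"""

ENTRY = re.compile(r"<U([0-9A-F]{4})>\s+(\S+)")


def parse(table: str):
    """Return (character, weight symbols) pairs in table order."""
    entries = []
    for line in table.splitlines():
        match = ENTRY.match(line)
        if match:
            char = chr(int(match.group(1), 16))
            entries.append((char, match.group(2).split(";")))
    return entries


entries = parse(TABLE)
# Assumption of this sketch: a symbol's weight is its position in the listing.
position = {f"<U{ord(char):04X}>": index for index, (char, _) in enumerate(entries)}
fourth_level = {char: position[weights[3]] for char, weights in entries}

by_table = sorted(fourth_level, key=fourth_level.get)
by_code = sorted(fourth_level)  # plain code-point order, for contrast
table_order = [f"U+{ord(c):04X}" for c in by_table]
code_order = [f"U+{ord(c):04X}" for c in by_code]
```

Under that assumption, SYMBOL FOR NULL sorts immediately after NULL, as presented, and not after all the control characters as it would in code order.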

TG 8

The Annex also does not make clear that the vast majority of its characters

are sorted in character code order. This requires the reader to visually

inspect every line to no purpose. These should be replaced by one statement:

"Except where otherwise noted, all symbols are sorted as:

<Uxxxx> IGNORE;IGNORE;IGNORE;<Uxxxx>"

The vast majority of characters are not sorted in character code order. Ordering is totally decoupled from coding, and any resemblance to code order is purely coincidental.

TG 9

Annex 2

List #1 is superfluous. The statement should be that the words in List#2 in

any initial order, when sorted will result in List #2.

The specific normative input is deliberately designed to catch some implementation problems and to detect non-conformance quickly. Otherwise it would be possible to choose an initial order, or even several initial orders, that accidentally produce the required order. Of course other orders might also give the same results, and other lists can be part of additional private test requirements.
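
A conformance check of this kind could be sketched as follows. The two lists and both key functions are placeholders invented for this example and are not the lists or the ordering of Annex 2; the sketch only shows the prescribed input (List #1) being sorted and the result compared with the prescribed output (List #2).

```python
from typing import Callable, List


def passes_benchmark(list_1: List[str], list_2: List[str],
                     sort_key: Callable[[str], object]) -> bool:
    """Sort the prescribed input and compare it with the prescribed output."""
    return sorted(list_1, key=sort_key) == list_2


# Placeholder data, NOT the Annex 2 lists.
list_1 = ["b", "A", "a", "B"]
list_2 = ["a", "A", "b", "B"]


def toy_key(word: str):
    """Toy ordering: letters first, lowercase before uppercase on ties."""
    return (word.lower(), word.isupper())


def naive_key(word: str):
    """Plain code-point order, standing in for a non-conforming implementation."""
    return word


toy_passes = passes_benchmark(list_1, list_2, toy_key)
naive_passes = passes_benchmark(list_1, list_2, naive_key)
```

A fixed, deliberately scrambled input makes it harder for an implementation to produce the required output by accident.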


The Netherlands' National Body comments

The Netherlands vote negative on CD 14651. To turn our vote to positive

modifications shall be made in accordance with our comments. We reserve

our final position regarding the CD until we have seen the Final CD.

Technical comments:

1. Remove Annex 1 and all references to an International Default Order.

Annex 1 is an essential normative part of this standard, as per international consensus.

-- SC22 has no expertise in this field, and cannot check for correctness

Most NBs in SC22 are not able to check whether a proposed ordering

for a certain unfamiliar script is in agreement to actual practice

far from home. Those NBs that are familiar are not represented in

SC22, nor have been asked for comment.

All NBs in SC22, as well as countries not represented in ISO, have been asked for input on different occasions, and much data on many different scripts was gathered from many different sources, from multinational companies to minority groups to official standards bodies, before the tables were drawn up. SC22 does have expertise in this field: WG20 is a working group where this expertise exists. It cannot deal with the 6000 languages of the world at once, but specifications for more scripts will be added to the template over time as data becomes available. Ultimately JTC1 will vote on this standard, which will widen the public review to even more national bodies. Moreover, tailoring can be done for very vernacular languages.

-- Default order is an instrument of cultural imperialism.

In several countries more than one ordering rule is in use without

any agreed preference. Calling one of these the "default" is

imposing an extraneous pressure, and will involve interference with

national habits.

-- No need for a default.

The term "default" will be replaced by "common template". The template shall be tailorable, which is the best way to deal with this concern constructively.

No country uses always all characters from 10646. They should not be

burdened with unwanted features. A method for supplying ordering

information for a given restricted character set to an API should be

contained in 14651 itself, without reference to 14652.

The scope says that the standard is applicable to subsets as small as ISO/IEC 8859 parts. With a binding to actual restricted coding, this is achieved.

2. Remove all references to 14652.

-- Needless complexity should be avoided.

An ISO standard should be as independent as possible of other ISO

standards. If ordering information can only be supplied by way

of a complete set of cultural conventions, as specified in 14652,

there is involved an enormous overhead, and an obligation to NBs of

also having to specify non-ordering information which is irrelevant

to 14651, but nevertheless required in this CD.

Specification of LOCALE categories beyond ordering (e.g. LC_MONETARY) is not required to conform to ISO/IEC 14652. Most of the latter is an excerpt of ISO/IEC 9945-2, reproduced so that the specification does not require conformance to POSIX. Hence the overhead is considerably less than the Netherlands national body believed.

Editorial comments:

The text of this document leaves much to be desired regarding

precision of definition, clarity of presentation and conformance to

ISO directives part-3.

The NNI cannot give detailed comments here, nor offer replacement text as

doing so would require rewriting more than half of the document for which

we have no resources available. The NNI already gave some directions with

its vote on CD-registration, but found almost no improvement in this CD.

That is purely an ITTF matter. ITTF will, as it always does, correct any style that does not meet its publishing standards. Furthermore, ISO directives part 3 do not even require a conformance clause.

Austrian National Body comments

ON (the Austrian NB) votes NO on CD Ballot SC22 N2364

(CD 14651 - Information technology - International String

Ordering - Method for Comparing Character Strings and

Description of a Default Tailorable Ordering) with the

following comments:

(1) It seems doubtful (to say the least) that a reasonable

Default Ordering for all -- or even most -- of the languages

of the world can be found. Consequently, there is reason to

doubt the usefulness of the proposed International Standard.

Correct. There is no worldwide-recognized ordering that can be used without tailoring. The common template (the term "default" will be changed) allows, and may require, tailoring to local needs.

(2) The "Tutorial" contained in the Introduction should be

moved to an informative annex; it should not remain in the

main part of the document which would have to be considered

normative.

Accepted.

(3) Even though there is a "Tutorial", the proposed methods

do not seem to be well explained. It could at least be

expected that one should be able to read and understand the

tables in Annex 1 without having to consult other sources.

ISO/IEC CD 14652 is a normative reference and an important complement to ISO/IEC 14651. Although most of the syntax comes directly from the POSIX standard ISO/IEC 9945-2, it was made a separate standard so that POSIX conformance is not required to implement the ordering standard, which also has some extra features.

For an example, see page 51 where a rather poor comment, in

itself encoded, supposedly explains the structure of the

following tables by cryptically stating:

"% <Uxxxx> <Base>;<Accent>;<Case>;<Special>"

The sudden change of typeface on the same page seems equally

confusing and unmotivated (except possibly by line length).

The typeface change was a bug in the printing software. Any such problems that remain at this point will be corrected by ITTF before final publication.

Also, it seems that a more detailed description of a

possible practical implementation could prove helpful.

The tutorial gives a lengthy explanation of an actual implementation. Some national bodies, though, already consider this explanation very long, and it will be moved to an informative annex.

(4) The "Benchmark" in Annex 2 adds to the general confusion

by showing the "sorted" version to be (in excerpt):

"vice-president's"

"offices"

"vice-presidents'"

"offices"

The problem obviously lies in automatic line breaks and can

easily be corrected, but seems to raise the question whether

similar errors have been introduced in areas which are very

difficult -- if not impossible -- to check. To mention the

most prominent example, some errors in Annex 1 might never

be found because this part of the document can hardly be

checked exhaustively.

The cause of the line break problems is known and a reliable solution has apparently been found. This will be corrected to satisfaction.

(5) It is rather difficult to determine the necessity of

text that is not present. ON does therefore not feel able

to decide on Annexes F, G, and H.

(6) The document has obviously been translated from French

to English, which would not be a problem if the process had

been completed. For a counterexample see the description

of procedures chbin1 and chbin2 on page 18. Also, the name

of procedure sign_espace (on page 19) seems to be partially

French.

Partly translated texts will be fixed. Programming symbols are just symbols, though, and are neither French nor English per se. « Sign » is an English word while « espace » is a French word, but « sign_espace » is neither French nor English.

(7) The document does not appear to have been spell-checked.

Some examples:

p. 19: "precedenceof" should be "precedence of"

p.109: "deafult" should be "default"

p.114: "standaredized" should be "standardized"

That will be fixed in the next version. There was no English spell-checker available to the editor at the time the CD was released. This is unfortunate indeed, but it is a practical drawback of using national versions of proprietary word processing software (which by default provide a dictionary only in the user's language), as allowed by ISO.

(8) Anticipating the answer that ON experts should actively

participate in the process of correction and development of

the document in question, ON states that expert resources

in this area are too limited at this time. However, this

does not imply that any document can be accepted. Sorry.

We take note.

UK National Body comments

> The UK ABSTAINS on this ballot, due to lack of participation in this area.

> The UK would however like to bring the following issues to the attention of

> SC 22 :

>

> - a tutorial on problems solved is inappropriate for an IS; either the

> document should be a TR or the tutorial moved to an appendix.

Accepted. This will be moved to an annex.

> - the statement on page 10 about information being obtainable from

> Alain LaBonte' is also inappropriate for a formal document.

No such statement is made. This is a reference giving the publication title, publisher, author, and ISBN of a publicly available document. The document is also freely available (mirrored) on the WWW on at least four different sites in France and in Canada, and a search with any major global search engine will retrieve it easily online. The document also exists on paper, and the ISBN reference is sufficient to retrieve it, at least in the official national library where it is registered as per international conventions.

> There are also a number of minor points:

>

> - there are a disturbingly high number of elementary typographical

> errors (e.g. p 18 'starings' (strings); 'compariosn', 'aat'; also mixed

> languages in chbin1, chbin2 heading). On page 19 there are French

> quotation marks rather than English ones.

That will be fixed in the next version. There was no English spell-checker available to the editor at the time the CD was released, which is unfortunate indeed. Furthermore, in the French version of WinWord it was not easy to enter English quotation marks because of Word's automatic substitutions. Please accept our apologies for this editorial problem.

> - p 25 there is a reference to section 5.8, which does not exist.

That will be fixed, and reference will be made instead to ISO/IEC 14652, which took over the information that was located in this section in an earlier draft.

> - subprogramme is consistently spelled thus, although `subprogram' is

> the correct form in both US and UK (don't know about Canada, Australia

> etc).

Accepted.

Japanese National Body comments

Japan disapproves CD 14651 proposed in SC22 N2364.

The CD is not mature enough to proceed to DIS from view point of

completeness as a JTC1 standard as follows.

- not precise enough tuned yet from technical view point,

- still not reaching a consensus on the expected ordering result.

- high dependency on ISO/IEC 14652 which is not in CD stage, and

- style of the document does not meet the JTC1 requirement

Therefore, because of high dependency of this CD on ISO/IEC 14652, Japan

requests to wait and synchronize the review and ballot of CD 14651 until

CD 14652 is registered, or to change the scope of the standard to

"ordering result" only and move API part to i18n API project.

Thus, Japan sees absolutely no reason why we need to proceed to DIS now.

Comment detail.

1. Style (major editorial)

The CD is very different from what the ISO/JTC1 directive requires (and

also different from the template provided by ITTF and from many JTC1

standards). For example, there is a very high dependency on font selection

(usage of bold, slant, point size variation and/or unnecessary typeface

mixture is prohibited).

This is purely an ITTF editorial matter and will ultimately be corrected before final publication.

The Definition clause needs to have a sub-clause

for each term, and two groups of annexes, one normative and the other

informative. Review and rewrite all text according to the ISO/JTC1 directive

and the template supplied by ITTF.

Not accepted. Different ISO/IEC JTC1 directives conflict here: definitions must be in alphabetical order in both the French and English versions, and, if numbering is used, the same numbers must be used in both versions, so they cannot be in sequence in both (only in the original language). Furthermore, we have seen cases where ITTF itself numbers definitions differently in the English and French versions, breaking its own structural equivalence rules, without anybody complaining, although this is unfortunate. If ITTF wants to take this anomaly upon itself at the time of final publication, so be it, but the editor will not do so at this point. Introducing numbering is felt to be unnecessary, as some English speakers indicated that they will not accept an alphabetic sequence that is not in numerical clause order.

2. Relation with ISO/IEC 14652. (General process)

The syntax and semantics of Annex 1 are not defined in this draft and are

depending on ISO/IEC 14652 which is not available yet. Synchronize the

project with ISO/IEC 14652 development -- wait for decision until CD 14652

is available at least, or, if it is not accepted, move related part of

the ISO/IEC 14652 into this CD..

Final publication will be synchronized. However, most of the syntax and semantics of ISO/IEC 14652 is a pure excerpt of ISO/IEC 9945-2, reproduced so that conformance to POSIX is not a requirement. This syntax and semantics is widely known in SC22, except for some minor additions required for multiscript ordering and tailoring.

3. Tutorial (major editorial)

Heavy tutorial clause at the beginning is not a thing to do, move them to

appropriate place and rewrite them to fit the new place. In addition,

there are many "information only" text in main clauses (such as clause

5.3). Remove them out from main (and mostly normative) part of the

standard, and place them (if really necessary) to appropriate related

place(s).

Tutorial will be moved to its original location in the first drafts, i.e. in an informative annex.

4. Scope (major technical)

Describe what this standard defines in a clear and straightforward way.

For example, change the word "a method" to a much clearer, more specific word (which

is API).

Accepted.

Once the above change is made, it may affect the title of the

standard. Also the word "Default Tailorable Ordering" does not have

logical meaning. One possibility of the new title would be "API with

default order for International string ordering".

Title will be changed to something else.

Last part of 2nd bullet (on an order which is culturally---of that script)

should be removed because "order which is acceptable culturally" is not a

scope of this standard. This part should be re-written something like

"The default order is aiming for easy understanding of non-casual user of

the script, cultural correctness/acceptance is not a purpose of the

default order. The correctness/acceptance by the casual (or native) user

to be provided by tailoring by the user or as a country profile".

Rationale: Above has been an agreement on the project scope from the

beginning. There were many discussions of impracticalness of having a

single default order which may satisfy all of cultures. The conclusion

has been it is not practical to have such an ideal default order, and it

was said that "this is why tailoring is needed". Japan, then, did not

request culturally correctness for ordering.

To satisfy Japan, the text will be modified to respect the spirit of document WG20/N526 prepared by the Japanese expert present, which was welcome.

Same story for French, since

French ordering is so sophisticated no outsider understand it easily,

therefore, it is not practical to use true French order as international

default order, it may causes mis-understanding of peoples of other

cultures. Such sophisticated ordering (such as French) can be

satisfactorily supported by tailoring anyway. (See clause 4.2.7 of DTR

11017, This IS is not i18n per 4.2.6 nor 4.2.4. This IS is aiming 4.2.7)

All this will be tailorable from the common template and by agreement with CEN, different toggles will be offered to simplify the users' task.
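
As an illustration of why French ordering is considered sophisticated, the following minimal sketch compares accents at the second level either forward or backward (from the end of the word), which is the traditional French dictionary convention. The names split_levels and sort_key are hypothetical, and the two-level model is a simplification, not the table of this CD.

```python
import unicodedata

def split_levels(word):
    """Return (base letters, accent marks per letter) for a word."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0])
        accents.append(decomposed[1:])  # "" (no accent) sorts first
    return bases, accents

def sort_key(word, accents_backward):
    """Level 1: base letters. Level 2: accents, optionally scanned backward."""
    bases, accents = split_levels(word)
    if accents_backward:
        accents = accents[::-1]
    return (bases, accents)

words = ["c\u00f4t\u00e9", "cote", "cot\u00e9", "c\u00f4te"]  # côté, cote, coté, côte
forward = sorted(words, key=lambda w: sort_key(w, False))
backward = sorted(words, key=lambda w: sort_key(w, True))
print(forward)   # cote, coté, côte, côté
print(backward)  # cote, côte, coté, côté
```

With forward scanning the first accent met from the left decides; with backward scanning the last accent in the word decides, which is what places "côte" before "coté".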

5. Definitions (major technical)

5-1, Each definitions should have separated sub-clause number.

Not accepted for reasons explained above.

5-2. API: Initial text of "for purpose of..... standard" is not

necessary.

Accepted.

5-3. equivalence: Too much, make it almost 1/3 by eliminating

"informative" texts with in this definition. (for example: last 4 lines)

Accepted.

5-4. field, first order token, fourth order token level, level, second

order token, transformation, third order token: Eliminate "informative"

explanations.

« Level » will be removed from « n order token level ».

Most experts present felt that informative text was useful and will be preserved except when superfluous.

5-5. posthandling, prehandling : Those definition should be moved to the

related clause.

Not accepted.

5-6 telephone-book-type transformation: This term need not be defined

in Definitions because it appears only once in Introduction (5th para.,

Page 5). Although Japan considers that the paragraph is understandable

in itself, we propose to change the first sentence to:

More generally, specific requirements exist for a kind of complex

transformation

-- e.g. phonetic transformation adopted in some telephone-book systems

because telephone-book ordering means differ from culture to culture, so,

this wording may confuse the user.

Definition for « telephone-book-type transformation » will be removed.

New text for « transformation » will be drafted.

6. Conformance (major technical)

6-1. Conformance clause(s) should come after the scope clause it should

not be after the requirements clause. The location of the conformance

clause is inviting difficulty of understanding of each conformance levels

clearly.

Reason (rationale) why conformance clause should be clause 2:

If requirement is simple and no leveling are employed, the conformance

clause can be any place in theory. Note that ISO/IEC directive part-3

does not require "conformance clause" even. However, in case of ISO/IEC

CD 14651, the condition is different, it should be clause 2.

Since 14651 is a very complicated multilevel standard, the scope clause

cannot cover all that a "scope" clause should say. The conformance, in

particular, the clean and clear "levels" descriptions are acting, in

reality, as a sub-scope clauses as well as real conformance descriptions.

If it does not come after "scope" clause, it is almost impossible for the

user of the standard to understand "what are defined in this standard and

how to read the standard efficiently and accurately".

Not accepted. This is an editorial matter governed by the ISO/IEC JTC1 Directives. Other standards use a totally different approach: comments recently received in SC18/WG9 on another standard say exactly the opposite of what is written here and ask, on procedural grounds, that the conformance clause be placed as the last clause of the standard rather than after the scope statement as requested here. Such procedural grounds do not exist in either case.

6-2, Conformance clause should have exact pointer(s) for the conformance

requirement (clause and sub-clause numbers). Umbrella conformance for

buried requirements with in main clauses (like this CD) should not be

used. (Current CD is too unkindly for reader)

6-3. In case of leveled conformance, provide a sub-clause to explain

what those levels are much straight way. (Too many indirect explanation

now).

6-3-1. Conformance level-1 should be defined as "Generic API only". And

should not make some of the parameters as "option". The option causes

incompatibility problems between conforming level-1 APIs. Further

define two options (not parameter options), one for COMPCAR and another

for COMPBIN + CARABIN.

6-3-2 Conformance level-2 should be defined and stated as "Generic API

and table format"

6-3-3 Conformance level-3: Change prehandling to requirement for string

input as normative. Thus prehandling is out of scope of this standard

(remove 5.1.2 at least). Then, change the description of this

conformance level accordingly.

Conformance clause will be limited to two conformance levels and be made much more simple and friendly as agreed by experts present in Québec after a long discussion.

By the way, in current text, normative

clause (5.1.2) refers to an informative annex. This is prohibited practice.

It only says to see the informative annex for an example; there is no prescriptive statement.

6-3-4 Conformance level-4. Remove the word "possibility". then

resultant might be "Add API an access method for specific table".

6-4. Add a concept of conformance for "ordering result only"

Conformance clause rewritten.

6-5 Add a method to specify partial conformance of ordering result, for

example, a method to state "every thing but Japanese repertoire are

conforming this default order and Japanese repertoire are per JIS" would

be a real life use of this standard. (as one of sub-set of the ordering

result only conformance)

It is already explicitly allowed to use subsets by tailoring. This method is therefore not necessary as the requirement is already satisfied.

6-6, Add a method to swap the order of the scripts, but still the orders within each scripts are conforming default order.

That possibility is already offered by ISO/IEC 14652 tailoring statements.

6-7, Add a method to state only selected scripts in comment 6-6 are

conforming the default order.

This can easily be done by tailoring (restricted CHARMAP or restricted binding to a subset of the repertoire).

6-8, Maintain compatibility with POSIX and C. Providing independent

conformance level may be one of the choice to respond for this comment.

After discussion, it has been established that POSIX allows creation of extra syntax if required. So compatibility to POSIX is maintained.

6-9, Remove all of "best guess" dependency. Write exactly what is needed. For example, there is no description what "default order" is. There is default table and API (and conformance levels), so best guess may be use the "default table" with the APIs.

The notion of default will be changed in line with Japanese expert input. New conformance clause will also simplify the standard a lot.

7. Requirements (major technical)

7-1. There are many options in one conformance level, those should be

another levels of conformance if those are really necessary.

7-2. The "Toggle" mechanism, which is realized by parameters

"order_accent", "order_case" and "sign_escape", should be removed

because:

1) it contradicts with the concept of the locale mechanism -- it allows an

ordering regardless of the ordering table defined as a locale,

2) the concepts of "case" and "accents" are specific to some scripts and

they are not defined in this draft where these script-dependent concepts

have been resolved into universal rules in tables.

Instead of the current "Toggle" mechanism, Japan proposes to reconsider

the specification of ordering tables, which will be defined in ISO/IEC

14652, so as to enable variants of the default table be defined more

flexibly -- for example, by introducing some preprocessing elements

#define ...

#ifdef ...

#include ...

etc.

See disposition of Canadian and Danish comments.

7-3. table

To specify a name of an ordering table in COMPCAR and CARABIN as a

parameter "table" will put a heavy burden on implementations. At runtime

the processes COMPCAR and CARABIN should check every time whenever the

table is changed from that of the previous call and/or the table should

be compiled.

There are two alternatives to this problem:

1) to remove the parameter "table" from the two processes and define a

new process "set_collating_table" which has a parameter "table",

2) to define a new process "open_table" which has an input parameter

"table" and returns a pointer to a protected structure derived from that

"table" while the parameter "table" in the two process is changed to

"table_pointer"

Agreed in principle. Programming-language-specific data types such as pointer will be avoided though.
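
The second alternative can be sketched as follows, using an opaque integer handle rather than a pointer so that no programming-language-specific data type is exposed. This is only an illustration: open_table, compare_with_table and the toy "compilation" step (position in a string gives the weight) are hypothetical and are not the API of this CD.

```python
_compiled_tables = {}  # handle -> compiled table, private to the implementation

def _compile(source):
    """Stand-in for the costly compilation step: character -> weight."""
    return {ch: weight for weight, ch in enumerate(source)}

def open_table(source):
    """Compile a table once and return an opaque handle to it."""
    handle = len(_compiled_tables) + 1
    _compiled_tables[handle] = _compile(source)
    return handle

def compare_with_table(handle, s1, s2):
    """Compare two strings with an already compiled table; no recompilation."""
    table = _compiled_tables[handle]
    unknown = len(table)  # characters missing from the table sort last
    k1 = [table.get(ch, unknown) for ch in s1]
    k2 = [table.get(ch, unknown) for ch in s2]
    return (k1 > k2) - (k1 < k2)

# Usage: a toy table in which the alphabet is reversed.
reversed_alphabet = open_table("zyxwvutsrqponmlkjihgfedcba")
print(compare_with_table(reversed_alphabet, "b", "a"))  # -1
```

The table is compiled once in open_table; each later comparison only looks the handle up, which removes the per-call check that the comment objects to.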

7-4 "chbin1" and "chbin2" in COMPCAR are not necessary. Furthermore,

options within an API specification does not make any sense at all.

Accepted.

7-5. The whole contents of 5.3 should be removed or put into an

informative annex because those contents are to be defined in ISO/IEC

14652 in the current framework.

Accepted.

7-6. Add text for the case where characters are not encoded in ISO/IEC

10646. Some character set, e.g. ISO 6937 are not in ISO/IEC 10646, and

some do not have conversion table (or same character names) with ISO/IEC

10646 (yet).

There must be a binding between ISO/IEC 10646 and actual coding in some way as specified in a note after 1st paragraph of the requirements. Also CHARMAPS play this rôle as specified in ISO/IEC 14652.

8. Data table (such as Annex A) (major technical)

8-1. Japan confirms a principle of default order table as:

- The default order is non-native user friendly (easy to understand,

simple rule, less exceptions)

- Cultural correctness for the native user of the script should be done

by tailoring. APIs and data format should have enough room for the

necessary tailoring.

- Therefore, cultural correctness of the default order is not a goal of

this standard.

This remains a goal as far as possible. The term « Default » is changed to « Common template ».

Based on the principle above, Japanese proposal on

Japanese scripts are not correct for Japanese view, however, it is easy

for the people who are not familiar with Japanese scripts.

8-2 Collation for HIRAGANA and KATAKANA

Japan proposes to add a set of collating rules for HIRAGANA and KATAKANA

attached.

Accepted.

The order defined in Attachment is different from one defined in JIS X

4061 which was published in February 1997. The main difference is in the

handling of the prolonged sound mark <U30FC>. Roughly speaking, JIS X 4061

replaces the prolonged sound mark with the vowel of the most recent

letter, while Attachment neglects the prolonged sound mark at first in

the same way as a hyphen.

The second difference is handling of the iteration marks <U309D>, <U309E>,

<U30FD>, <U30FE>. Roughly speaking, JIS X 4061 replaces the iteration

marks with the most recent KANA letter, while Attachment handles the

iteration marks as they are.
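
The first difference can be illustrated with a rough sketch. It is only an approximation of the two behaviours as described above ("roughly speaking"): the vowel table is a hypothetical, deliberately tiny excerpt, and the function names are illustrative only.

```python
PROLONGED = "\u30fc"  # KATAKANA-HIRAGANA PROLONGED SOUND MARK

# Hypothetical, deliberately tiny vowel table; a real one covers every kana.
VOWEL_OF = {
    "ア": "ア", "イ": "イ", "ウ": "ウ", "エ": "エ", "オ": "オ",
    "カ": "ア", "キ": "イ", "ク": "ウ", "ケ": "エ", "コ": "オ",
}

def jis_like(text):
    """Replace the prolonged sound mark by the vowel of the preceding letter."""
    out = []
    for ch in text:
        if ch == PROLONGED and out and out[-1] in VOWEL_OF:
            out.append(VOWEL_OF[out[-1]])
        else:
            out.append(ch)
    return "".join(out)

def attachment_like(text):
    """Neglect the prolonged sound mark at first, in the same way as a hyphen."""
    return text.replace(PROLONGED, "")

word = "カード"
print(jis_like(word))         # カアド
print(attachment_like(word))  # カド
```

The first function needs knowledge of pronunciation (the vowel table), while the second needs none, which is the cross-culture argument made in the reasons that follow.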

The reasons for proposing Attachment are as follows:

1) JIS X 4061 cannot be realized by LC_COLLATE representation

unless some rules using regular expression, which will put a heavy

burden on implementations, are introduced,

2) ordering results of JIS X 4061 are hard to understand for

foreigners without knowledge of how letter sequences are

pronounced -- it is not cross-culture friendly,

3) ordering results of Attachment are easy to understand for

foreigners without knowledge of pronunciation of letter sequences

and even in Japan, a number of encyclopedia order their items in

the same way as Attachment does -- it is cross-culture friendly,

Reason accepted. SC22/WG20 is also pleased that the scheme presented by Japan is also an actual scheme used in Japan in some actual context.

8-3 Consideration on Compatibility characters of ISO/IEC 10646.

Consideration of the compatibility characters is missing. At least the

following are needed.

8-3-1 UFF00-FF9F, FFE0-FFE8

Handle those characters the same as the equivalent characters in A-zone.

8-3-2 F900-FA0D, FA10, FA12, FA15-FA1E, FA20, FA22, FA25, FA26,

FA2A-FA2D of ISO/IEC 10646-1

Handle those characters the same as the equivalent characters in I-zone.

8-4 FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27-FA29 of

ISO/IEC 10646-1 and future addition of CJK ideographs (ext-A and B).

Merge them with I-zone characters with defined rule. Provide informative

annex which describe the rule (radical, number of the stroke and so

on.....)

8-5 Character combination type symbols.

For those characters which are made up combination of two or more

Japanese characters such as 3300-336F, Handle those as if those are

string of independent characters.

Accepted. Will be implemented if the input expected from the Japanese National Body is received by June 15, 1997.

8-6. Symbols of character(s) and symbol(s)

Symbols with character(s) should be handled one of following methods.

a) Character(s) and symbol(s) like "short form" of normal writing such as

2480 which is looked like "( 13 )". Split the symbol as if it is a

normal string.

b) Character(s) and symbols can not split into one unambiguous sequence

such as 2470 which the circle can be either before or after character 17.

We will try to make sure that symbols used as numeric characters not be inconsistent with other classification.

Handle as if it is a special form of the character(s) part of the symbol.
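
The handling requested in 8-3, 8-5 and 8-6 can be approximated with Unicode compatibility normalization (NFKC), as in the minimal sketch below. This is an assumption for illustration only: NFKC is a later mechanism than this document and is not what was specified here, and the function name split_symbol is hypothetical.

```python
import unicodedata

def split_symbol(text):
    """Replace compatibility characters by their plain equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# 2480: parenthesized number, split as if it were a normal string.
print(split_symbol("\u2480"))  # (13)
# 2470: circled number, handled as a special form of its character part.
print(split_symbol("\u2470"))  # 17
# FF21: fullwidth letter, handled like the equivalent A-zone character.
print(split_symbol("\uff21"))  # A
```

After this step the ordinary table can order the result, so no separate entries are needed for the compatibility forms; keeping the original string for a final tie-break would preserve the distinction between the two forms.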

8-7. Symbols for making combining sequence such as 20E0.

Follow the rule proposed at 8-6 above, the process might be different

from the method for combining sequences.

8-8. Japan expect many countries have same kinds of comments above.

Japan request, therefore, confirmation of specific to the data table to

be circulated to all JTC1 member countries (not only SC22 p-member) for

review.

Review beyond SC22 will mandatorily be made at DIS ballot.

9. Other comments

Japan recognizes many editorial issues as well as technical issues which

are not on this ballot comment, too many major technical comments (and

may be more to expect) does not give us a time to scan all of them.

Japan thinks the minor editorial comment are unnecessary components of

this ballot comments because of un-matureness of the CD 14651.

Anyway, the text should be rewritten totally for full acceptance of the

technical comments.

We take note.



Danish National Body comments

The Danish ballot is: Yes, with general and technical comments

The comments are directed towards the English version of the text,

although the same comments can be done wrt. the French text.

1. The overall technical contents of CD 14652 is sound, and as agreed

by the working group, and thus we can accept the document as a CD.

General comments:

2. There is too much emphasis on the "binary sorting string" concept.

The concept of just comparing two strings should be catered for

overall in the document. Some places only sorting on binary prepared

strings are possible, to reach the functionality. Also there should be ample

warnings a number of places on the binary sorting string concept, as it

is culturally dependent, that is it is dependent on the sorting specification

used to produce the binary representation. Storing data in the precompiled

binary string representation should thus be recommended only for monocultural

environments, and that is actually environments that we should advise against,

having internationalization as our goal.

Some text will be added to reduce emphasis as far as possible.

3. Formal description language, such as ISO 11404 or IDL of ISO 13788 (PCTE)

should be used in the specification of the APIs. The description of

the APIs lack a number of specifications now, including description of the

types of the parameters, and specifications of how to bind to programming

languages, that are inherent in the 11404 and 13788 specification languages.

We are willing to help rewriting the API specifications in light of this

comment.

An improved formal method, not referring to external standards and easy to understand while remaining general and maximally language-independent, will be used as agreed by experts present at the Québec meeting.

4. We recommend that a thin binding method be used, as demonstrated

in other API papers of WG20. We can provide text for this, in conjunction

with text to address the problems mentioned in comment 3.

Denmark to provide text before June 15 1997.

5. The APIs have 3 parameters, that should not occur in the API, because

all localisation should be done via the locale. These are the parameters

order_accents, order_case and sign_espace of the COMPCAR and CARABIN functions.

Accepted.

6. The LC_COLLATE specification in 14652 format should be readily useable

and referenceable, without need for retailoring. The different options,

as expressed by the parameters of the 3 parameters in our comment 5, should

be available as different LC_COLLATE specifications each with a well-defined

name.

A single common template will be developed which will be enabled for easily using single line toggles for each parameter in any source LOCALE using the template as a model.

7. The definitions in section 3 should be numbered and not ordered

alphabetically (in either English or French).

Not accepted. According to ISO directives, alphabetical order is necessary. Numbering will not be done unless ITTF decides to do so at final publishing stage for reasons given previously in the current document.

8. The definitions are too centered about a precompiled sorting string

concept. Terminology should also be applicable to comparisons on the

string encoding. Terms that should be useable with plain string comparisons

include: equivalence, ordering key, ordering subkey.

Emphasis on this will be reduced as far as possible.

9. The technical specifications should be aligned with 14652, especially

hexadecimal symbolic ellipses "..".

Accepted.

10. The names of the APIs should be less French-oriented.

Not accepted.

11. The tables should use names established from the POSIX locale

work, such as ISO/IEC 9945-2 annex G names or 14652 names from the

repertoiremap, especially when not using <Uxxxx> names.

All character names use standardized UCS identifiers. Other symbols for weights are internal to the standard and are never referenced outside those tables. This is editorial matter and is not considered a priority.

12. A number of scripts have not been ordered properly, such as hiragana and

katakana and thai.

Kanas have been defined as an outcome of Japanese comments. Other scripts' reserved order does no harm and may be useful for future development.

13. A reversability function from binary sort strings to character strings

seems to be missing.

Comment withdrawn by Denmark.


14. There are some spelling errors, and we suggest a spell-checker be used

for production of further documents.

Accepted.

Technical comments:

15. page 5: first paragraph: It is not always required to transform, for example

"4" into a number of strings, sometimes it is only necessary to transform

it into one string. Thus change "requires" to "may require" and "is hence"

to "may thus be".

Accepted.

16. Page 5, last paragraph and following paragraphs: Too much emphasis on the

precompiled sorted character string data type. This is not a general type,

as noted in our comment 2.

Text will be added to reduce emphasis as far as possible.

17. Page 8, Add after "Scandinavian" "and several other". This includes languages

like Polish, Finnish, Hungarian, Turkish, and many others.

Accepted.

18. Page 14: "subprogramme" - rather use the word "function". All APIs in this

standard are functions. All references to "subprogrammes" should be

changed to "functions" in the standard.

References to subprogrammes will be changed to procedures to be more language independent.

19. page 15, first paragraph: we recommend that only uppercase characters be

used in hexadecimal numbers, and this is also the specification in CD 14652.

Accepted.

20. Page 15, last paragraph: it seems like it is a requirement that a LC_COLLATE

specification, like the default, can be tailored on the fly. This is not

recommendable, as it would take quite some processing time, and thus delay

the processing considerable. On the fly tailoring should thus not be a

requirement.

Term « invoke » changed to « use ». Term « using » changed to « built ».

21. Page 16, 5.1.1 last paragraph: use the name of the API (COMPCAR)

instead of the number "API 1".

Accepted.

22. Page 17: last paragraph: the names of the functions should be used for

the binding. Of course the names of the functions may vary for the

different programming languages, but the names are more than "only

indicative".

Restrictions exist for programming language procedure names. Therefore if this comment is taken into account short names as used in the current standard are advisable.

23. Page 20: The COMPCAR function seems to miss a result value on

whether the first string was lexicographically less, equal or greater

than the second string. We propose the values -1, 0 and 1 for the three

possibilities, in line with current C practice. Also return values seems to

be missing for the other functions.

Current C practice is language-specific. However binding to C should be able to use such a result as a returned function value.
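
A binding that returns such a three-valued result can be sketched as follows. This is a minimal illustration only: the function names and the toy key (case-folded text, then the original text) are hypothetical and do not reproduce the table of this CD.

```python
from functools import cmp_to_key

def collation_key(s):
    """Toy two-level key: case-insensitive first, exact text as tie-breaker."""
    return (s.casefold(), s)

def compcar(s1, s2):
    """Return -1, 0 or 1 when s1 orders before, equal to or after s2."""
    k1, k2 = collation_key(s1), collation_key(s2)
    return (k1 > k2) - (k1 < k2)

print(compcar("a", "b"))  # -1
print(compcar("a", "a"))  # 0
print(compcar("b", "a"))  # 1

# The three-valued result plugs directly into a comparison-based sort.
print(sorted(["b", "A", "a"], key=cmp_to_key(compcar)))  # ['A', 'a', 'b']
```

A language without function return values could expose the same three values through an output parameter, which keeps the procedure description language-independent.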

24. Page 21: It should not be normatively required that COMPCAR be equivalent

to CARABIN and COMPBIN. CARABIN produces output that is not necessary for

some use of COMPCAR.

The binary output strings will be removed from the results of this function.

25. Page 21, last paragraph: It should not be prescribed that there be

binary strings used for comparisons, in the COMPCAR function. Also the

"default" table mentioned here is the global locale, and not the 14651 default.

This should be clarified, maybe using "global" instead of "default".

Accepted.

26. Page 22: all parameters should be spelled out, and references to other

APIs when defining the parameters should be avoided.

Parameter explanations will be repeated instead of referring to other procedures' parameter descriptions.

27. Page 25 second paragraph: the default table cannot be used per se, as it

needs tailoring. See our comment 6 on how to solve this.

Tailoring facilities will be mandated.

28. Page 27, first paragraph: this description is very oriented towards

the binary sort string. Descriptions also valid for COMPCAR method

without binary sort strings should be present. We would request a separate

description of how COMPCAR can be implemented, especially pointing out that only

comparison of the first (few) characters are necessary in many cases, and

that generating binary sort strings is typically not necessary.

Emphasis will be reduced as far as possible.
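
The point made in comment 28 can be sketched as a comparison that produces weights lazily and stops at the first difference, without ever building a binary sort string. This is a hedged illustration: the two-level weighting (base letter, then accent) and the names level_weights and compcar_lazy are assumptions, not the method of this CD.

```python
import unicodedata
from itertools import zip_longest

def level_weights(s, level):
    """Yield weights one at a time: level 1 = base letter, level 2 = accents."""
    for ch in unicodedata.normalize("NFC", s):
        decomposed = unicodedata.normalize("NFD", ch)
        yield decomposed[0].lower() if level == 1 else decomposed[1:]

def compcar_lazy(s1, s2):
    """Compare level by level, stopping at the first differing weight."""
    for level in (1, 2):
        pairs = zip_longest(level_weights(s1, level), level_weights(s2, level))
        for w1, w2 in pairs:
            if w1 != w2:
                if w1 is None:      # s1 ran out first: shorter string sorts first
                    return -1
                if w2 is None:
                    return 1
                return -1 if w1 < w2 else 1
    return 0

print(compcar_lazy("apple", "banana"))             # -1, decided on the first character
print(compcar_lazy("resume", "r\u00e9sum\u00e9"))  # -1, decided at level 2
```

For strings that differ early at the first level, only the first few characters are examined; the later levels are visited only when the earlier ones tie.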

29. Page 27: level 1: Some non-letters, for example Kana, may have more than

one character at the first level.

Term « Normally » changed to « Generally ». Terms « each character » changed to « basic characters ».

30. Page 27: note of 5.3.2.1: Combining accents may have ignore at level 1,

and then values at level 2. Should that not lead to full predictability?

If meaningful results are to be achieved with an implementation which processes sensible equivalences, there is then indeed a loss of result predictability unless an extra level is added (which makes processing more complex and less economical for little advantage). Otherwise, predictability can be achieved, but in the limit it can make no sense, as unrelated objects are then compared.

31. Page 29: level one: Use the API names instead of "SUBPROGRAMME"

Accepted.

32. Page 29: what is the difference between level 2 and 4? In traditional

locale invocation there is not that difference, but some other

difference. Maybe level 4 should always be required.

Conformance will be considerably simplified and number of conformance levels reduced.

33. Page 31: COLL_WEIGHT_MAX is not a directive of 14652.

We take note.

34. Page 31: Some scripts are not (yet) in IS 10646, for example the Yi and

Canadian syllable scripts.

This is true. There is no harm.

35. Page 31: We should assure that comments are allowable all the places used

here according to 14652, and possibly change 14652 to allow them.

Good remark to be done to ISO/IEC 14652 if this is lacking in the latter.

36. Page 41-51: a number of the symbols defined here are also defined later.

Example <a8> defined on page 46 and page 79. This is not allowed according

to 14652 (giving a symbol two weights).

This will be fixed.



37. Page 111: (4) There needs to be a strong warning that binary strings stored

cannot be used internationally for culturally correct sorting,

as they are stored in a localized form. Or we should simply advise against it.

Accepted.

38. Page 112: the text seems obsolete, as these concepts have been proven.

Current Annex A will be removed.

39. Page 115: Also list ISO/IEC 9945-2 POSIX shell and utilities, especially

annex G, as a source.

Accepted.

40. Page 118, paragraph 7: There is only a need for 4 levels, not 5.

Accepted.

41. Page 118, paragraph 7:

Is it necessary to have an extra level for 10646 conformance level 3? Maybe

in some cases but not generally. When sorting the combining characters

per se, there is no need for a further level.

See disposition of comment 30 above.

42. Page 119: paragraph 9: We thought this was proven not to be true. Or is this

some implementation guideline (which then should noted as such).

Paragraph 9 will be removed.

43. Page 120: Annex I should be explained further, especially how it fits into

the internationalization model.

Annex I will be changed to avoid using elements that are irrelevant to parameters used in the API or the tables.



Israeli National Body comments

The SII votes NO on CD 14651. If items 1, 2 and 3 were to be accepted,

our vote would become YES.

1. Hebrew Accents

The Hebrew accents (U0591 to U05AF), Meteg (U05BD) and Upper Dot (U05C4)

do not participate in the string ordering process. They relate, in fact,

to the whole word, rather than to the letter to which they are attached,

and are never used in the lexicographic order or in any other ordering of

Hebrew texts.

- The Hebrew accents should be removed from the list of collating

symbols, page 35, and from page 45.

- On page 56 they should all be defined as:

- IGNORE; IGNORE; IGNORE; IGNORE;

Not accepted. The predictability requirement is essential to this standard. Satisfying it does not harm those who have no predictability requirement, whereas the reverse harms other international requirements.

2. Composite characters and combining characters.

It seems that combining characters do not sort and compare as equivalent

to their precomposed encoding. For instance, the two strings "Gu:nther"

and "Gu:nther", the first coded with U00FC, the second with U0075 followed

by U0308, are equivalent and should not be distinguished but are not

equivalent in the CD. The particular coding used is an artifact, possibly

not under the control of the user, and is normally meaningless.

This is correct. Equivalences can be made absolutely equal by prehandling, which is part of the general model of ordering.
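
Such a prehandling step can be sketched with canonical normalization, as below. This is an illustration under stated assumptions: the function name prehandle is hypothetical, and Unicode normalization form NFC stands in for whatever equivalence mapping a given prehandling phase would use.

```python
import unicodedata

def prehandle(text):
    """Map canonically equivalent codings to one single coding (NFC)."""
    return unicodedata.normalize("NFC", text)

precomposed = "G\u00fcnther"   # U00FC
combining = "Gu\u0308nther"    # U0075 followed by U0308

print(precomposed == combining)                        # False: codings differ
print(prehandle(precomposed) == prehandle(combining))  # True after prehandling
```

Once both strings have passed through the same prehandling, the comparison itself no longer sees the coding artifact.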

3. Introduction, page 6, last paragraph: "If two equivalent strings are

not absolutely identical, then the tie must be broken."

This sentence is not acceptable. If two strings are equivalent they

should be treated as such. For example, Hebrew strings that are

equivalent but have different accents.

Different levels of precision exist. What is equivalent up to a certain level of precision may not be equivalent beyond, at a higher precision level.

4. Introduction, page 4 (Editorial):

The introduction begins with a negative statement and continues with a

criticism of past practices. The SII suggests it should be preferable to

begin with a positive statement describing what the standard is and what

are its benefits.

Most of the current introduction will be moved to an informative annex; this will diminish the emphasis on this statement, which will no longer open the standard.

5. Tutorial, page 7 (Editorial).

The tutorial would be better placed in an informative appendix.

Accepted.

6. Page 35 (Editorial).

The comment should be qubuts (the s is missing).

Accepted.


Canadian National Body comments

The Canadian NB votes YES to the question asking if this CD is satisfactory to be voted on as a Draft International Standard with the following comments that have to be accommodated.

___________________

Technical comments:

___________________

1. Add a note at the end of 5.1.2 to indicate that escape sequences should be filtered out or transformed during the prehandling phase.

Accepted.

2.At least in annex 1 put the following note: "In this default ordering

table, a number of scripts are missing, in some cases due to lack of data at

time of editing, in other cases due to the non-inclusion of those scripts in

10646 at time of publishing. It is the intent of ISO/IEC to complete

ordering of those scripts explicitly in the default whenever data becomes

available by way of amendments to this international standard. In the

meantime, implementers are encouraged to take advantage of tailoring to meet

user sorting requirements. If the default table is not tailored for

unspecified characters, then an implicit order is assigned in the following

table, which might not meet user requirements of a particular community".

Accepted.

3. The toggles should be moved from the API to the pre-handling phase

A liaison statement from WG14 has informed us that having toggles in the

API clashed with the C/POSIX i18n model, in which the APIs should be

culturally neutral and all cultural/localization data should be given via

the locale.

In order to preserve the model, yet retain the ease of use of the toggles

for tailoring the table, we propose the following:

p. 19 and 23: Remove the order_accents, order_case and sign_espace

parameters and modify the text accordingly.

Accepted.

We disagree with the intentional mistakes made for implementing the toggles

in the table data in annex 1. We strongly suggest that a syntactic

preprocessing mechanism handling toggles specifically be implemented in

ISO/IEC 14651 and documented in ISO/IEC 14652. Where the intentional

mistakes have been done in the default table, replace the occurrences by

conditional statements similar to these :

a) for space/NBSP toggling:

#if ToggleSpace

<00A0> IGNORE... [definition of NBSP]

#else

<0020> IGNORE... [definition of SPACE]

.

.

.

#if ToggleSpace

<0020> <espace>... [definition of SPACE]

#else

<00A0> <espace>... [definition of NBSP]

b) for accent scanning toggling:

#if ToggleAccents

order_start <La>;forward;forward...

#else

order_start <La>;forward;backward...

c) for precedence of case toggling (the idea is here in one occurrence to omit

or put the MIN symbol before CAP in sequence and in the other occurrence

to put or omit the MIN symbol after CAP in sequence. The CAP symbol

occurs somewhere between the two statements below):

#ifnot ToggleCapsMin <MIN>

.

.

.

#if ToggleCapsMin <MIN>

With such a syntax, toggles "ToggleSpace", "ToggleAccents" and

"ToggleCapsMin" shall be turned ON by a specific statement specified in

ISO/IEC 14652. If not turned ON, they are OFF by default and need not be

mentioned in the tailoring.

Before final compilation of the table in prehandling, the toggles shall

modify the table and thereby affect the ordering as intended.

Accepted in principle. Syntax will be slightly modified.
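
How such toggles could modify the table before final compilation can be sketched as a small line filter. This is only an illustration: the final syntax is still to be modified, the "#endif" terminator is an assumption not present in the comment above, nesting is not handled, and the function name apply_toggles is hypothetical.

```python
def apply_toggles(lines, toggles_on):
    """Keep only the table lines selected by the toggles that are turned ON."""
    out = []
    keep = True
    for line in lines:
        words = line.split()
        if words and words[0] == "#if":
            keep = words[1] in toggles_on   # OFF by default
        elif words and words[0] == "#else":
            keep = not keep
        elif words and words[0] == "#endif":  # assumed terminator
            keep = True
        elif keep:
            out.append(line)
    return out

template = [
    "#if ToggleAccents",
    "order_start <La>;forward;forward...",
    "#else",
    "order_start <La>;forward;backward...",
    "#endif",
]

print(apply_toggles(template, set()))              # backward accent scanning
print(apply_toggles(template, {"ToggleAccents"}))  # forward accent scanning
```

A toggle that is never mentioned stays OFF, so an untailored use of the template yields the second branch of each conditional.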


Editorial comments

________________

I. Make the introduction smaller and the explanations in an informative

annex. Essential paragraphs that should stay in the introduction are

suggested to be the 1st, 2nd, 6th, 8th and 10th.

Accepted.

II. In 5.1.2 after "to transform fields" add "and unstructured records into

structured fields".

Accepted.

III. Annex I Dialog Box should be only applicable to what is in ISO/IEC 14651.

Irrelevant elements such as system direction, bidirectional options and

text behaviour should be removed.

Accepted.

IV. p. 17, put explanatory parenthesis after each subprogramme name the first

time it is used.

Accepted.

V. p.18 change "parameters de sortie" to "output parameters".

Accepted.

VI. p.26 rephrase 5.3.2 to make understood the relationship between symbolic

tables and numeric tokens. Indicate how subkeys are arranged to form a

key and suggest how to use that key to make binary comparisons (with

special considerations for the delimiters between keys and a note, mentioning

that there is a possibility to avoid them, with references).

Accepted.
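
One possible arrangement of subkeys into a key is sketched below, assuming one subkey per level, weights from 1 to 255 and a delimiter value lower than every weight. These choices and the name make_key are hypothetical; they only illustrate the kind of explanation requested in comment VI.

```python
DELIMITER = 0  # assumed lower than every weight, so a shorter subkey sorts first

def make_key(subkeys):
    """Concatenate the per-level subkeys, each followed by the delimiter."""
    key = bytearray()
    for subkey in subkeys:
        key.extend(subkey)      # weights of one level, each in 1..255
        key.append(DELIMITER)
    return bytes(key)

# Level-1 and level-2 weights for two hypothetical strings.
short_key = make_key([[1, 2], [9, 9]])
long_key = make_key([[1, 2, 3], [5, 5, 5]])

print(short_key)             # b'\x01\x02\x00\t\t\x00'
print(short_key < long_key)  # True: level 1 decides before level 2 is reached
```

Because the delimiter is lower than any weight, a plain binary comparison of the keys respects the level order: a difference at level 1 always outranks anything at level 2.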

VII. Put the tutorial as an informative annex.

Accepted.

VIII. Remove or complete incomplete references in Annex C, i.e. references to

non-documents.

Accepted.

IX. p.51 Title for "Control characters" should read "Control characters and

associated symbols; caractères de commande et symboles associés".

Accepted.