SC22/WG20 N775
L2/00-308
Collection of reactions to the WG20 convenor's
"Personal thoughts about the future of WG20"
Part 2, from September 7 through September 13, 2000
Akio Kido suggested that I collect all reactions to my proposal about the future of WG20 in one document for easy reference. Due to the interest in this subject, it became a rather lengthy document and I decided to put a linked index in front of it – that allows you to go straight to the contribution that interests you. I did not do any formatting – please excuse it if the text in HTML does not look as good as it could, but I wanted to maintain the original form of the e-mails the way I received them.
The document got too long – I had to split it into parts:
Parts (SC22/WG20 | NCITS/L2 - UTC):
- Part 1, from August 30 – September 6, 2000
- Part 2, from September 6 on …
Index with the latest document on top (status September 13, 2000):
National Body | Name | Date | Supports
USA | Ken Whistler | 2000-09-13 | Y
USA | Ken Whistler | 2000-09-12 | Y
Norway | Keld Simonsen | 2000-09-12 | N
W3C | Martin Dürst | 2000-09-11 | Y
Norway | Keld Simonsen | 2000-09-07 | N
Canada | Glen Seeds | 2000-09-07 | N
France | Antoine Leca | 2000-09-07 | ?
Individual contributions on e-mail:
France, Antoine Leca, September 7, 2000
From: Kenneth Whistler [kenw@sybase.com]
>
> 3. Character Properties
>
> The most contentious issue regarding DTR 14652 is the effort to
> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending
> positions effectively reflect a worldview divide among the participants
> regarding character properties:
>
> Position A: Character properties have not traditionally been covered
> by character encoding standards, and have not been viewed as the
> domain of the ISO committee responsible for encoding characters: SC2.
> Instead, character properties are an implementation issue, traditionally
> dealt with in the standards most directly concerned with character
> implementation -- namely the formal language standards -- and are
> dealt with in ISO by the working groups under SC22. In the context
> of 14652, the appropriate place to define character properties is
> LC_CTYPE, where the properties would be usable in a POSIX context as
> part of locale definitions.
May I point out that POSIX in this area provides just two things:
- a portable way to "formalize" LC_CTYPE (the localedef mechanism), which
is the very thing that PDTR 14652 is improving; this is covered by
Ken's previous discussion, as I see things;
- a mandatory implementation of the minimum subset, the "POSIX" locale,
which POSIX really inherited from Unix V7 and its successors, but
formally inherits from the C Standard.
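For readers unfamiliar with the localedef mechanism mentioned above, an LC_CTYPE definition in POSIX locale source format looks roughly like the following. This is an illustrative fragment only; the symbolic character names and the selection of classifications are chosen for the example, not taken from any of the standards under discussion:

```
LC_CTYPE
# Classify a few characters; <U0410> and <U0430> stand for
# CYRILLIC CAPITAL/SMALL LETTER A in the symbolic-name convention.
upper   <A>;<B>;<U0410>
lower   <a>;<b>;<U0430>
toupper (<a>,<A>);(<b>,<B>);(<U0430>,<U0410>)
END LC_CTYPE
```

A localedef utility compiles such a source file into the binary locale data that functions like iswupper() consult at run time.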
As such, one may also consider involving WG14. Furthermore, the new revision
of the C standard provides some support for the UCS. If these extensions
are used (and this is a prerequisite for them to be used in a POSIX context),
then it would be a natural extension in the next amendment/revision of the
C Standard to provide mandatory rules for the character properties: for
example, to require iswupper(L'\u0410') to return nonzero;
currently, this is not the case (nothing is required here).
<sidenote>
Furthermore, I had a discussion within the POSIX group some months ago.
As a result, in the "POSIX" locale, iswupper(L'\u0410') is expected to return 0.
</sidenote>
Canada, Glen Seeds, September 7, 2000
Sounds like we've started another really interesting thread.
My reactions on postings so far:
- I agree with those who say that trying to develop an API in a horizontal
group makes no sense. That should be left to the individual programming
language groups.
- I also agree that we should try to avoid invention wherever possible, and
search for existing practice that can be codified. Where invention is
unavoidable, we should try to keep it at as high a level as possible.
- I disagree that this topic is outside the proper province of programming
languages. All such specifications include libraries or similar facilities
that, while not central to the language syntax and semantics, are needed in
order to make the average programmer's life tenable, and to facilitate
common solutions to common problems. This includes things such as I/O and
string handling. i18n is another thing of this type.
- I agree that the single most important issue is enabling conformance to
10646. I don't agree that this is the only important issue. Handling of
other cultural conventions in a standard way is also extremely important.
- I don't agree that leaving this to individual vendors would be a
reasonable way to address this need. As a user, it costs my company a great
deal to have to work around the differences between vendors in this area,
and having standard solutions that they all conform to would be of
considerable value to us. I have to tell you that in the face of the absence
of this, we are adopting the same approach as was described for Metaphor: we
are forbidding use of the vendors' facilities, and implementing our own. We
are not at all happy about having to do this, and are critical of vendors'
slow adoption of things such as UTF-8 and 14651.
- I don't agree that the differences between the different approaches in
different programming languages are in the same category as the problem
above. However, I would like to make a point here that has not been made
yet:
The most significant objective that an i18n standardization group could
achieve is a specification for a minimum *set of cultural issues* where
conforming systems should support variability, and a standard way of
*interchanging the encoding rules* for that variability. This is where
international expertise is most needed and most effective. It is also where
existing vendors tend to have the fewest opinions and vested interests that
bog down the standardization process. The Austin Group in particular has
said that they are waiting for direction from ISO before doing any further
work in these areas.
(It's also the area where they get themselves into the most trouble on their
own. A good example of this is the current Austin Group discussion on
"collation order" versus "collation sequence" in regular expressions.)
We have already achieved a lot in this area in the form of 10646 and 14651.
It's unfortunate that the next step, 14652, failed to become anything beyond
a TR. As a user, I have a strong interest in seeing this work go forward. I
can't imagine a better place than SC22/WG20. The areas most affected are the
PL's and OS's, but none of them have the expertise to put together a
statement of generic issues and solutions.
Having said all that, I agree the lack of new progress shows that a review
is in order. We should ask the other groups, especially those in SC22, what
their concerns are in this area, and what sort of process they would buy
into that would allow us to move forward.
/glen
Norway, Keld Simonsen, September 7, 2000
On Wed, Sep 06, 2000 at 09:43:45AM -0400, Winkler, Arnold F wrote:
>
> From: Kenneth Whistler [kenw@sybase.com]
> Sent: Friday, September 01, 2000 4:40 PM
> Subject: Some technical issues regarding the future of SC22/WG20
>
> ================================================================
>
> Arnold Winkler has recently raised a number of issues regarding the future
> of SC22/WG20 and the standards that it maintains or has under
> development, for consideration at the upcoming SC22 plenary in Nara.
> Chief among the issues he raised is whether WG20 is now at the
> end of its useful life, and whether it should be sunsetted, with
> its various projects redistributed over time to other committees as
> appropriate for maintenance.
As I have written in another message, I see WG20 as just now
being able to get to work on real issues. WG20 has been in
the process of taking control over its subject, producing
standards that were the best in SC22 on the subject, but
in essence not much better than say, C, C++, Ada, Fortran, COBOL
or POSIX specifications.
There is a long way to go, if we want ISO standards to be leading in
the field, e.g. in the area of APIs. Current widespread APIs on the
market, like the Microsoft NT APIs or IBM ICU, have maybe 3 times as many
APIs as what we have worked on in WG20.
It is quite normal that ISO standards do not contain all functionality,
and for WG20 the work item was actually restricted to a specific,
quite small set of functionality when the NP was accepted.
> 1. Collation
>
> Furthermore, among the active participants in WG2 are the experts
> on collation (with implementation experience) who actually ended
> up authoring much of the content of 14651. Comparable experience is
> not obviously available in the SC22 committees other than WG20.
> Furthermore, because of the current close working relationship
> between WG2 and the Unicode Technical Committee, WG2 is also the
> best place to maintain a standard that should stay in synch with
> the Unicode Collation Algorithm maintained by the UTC, to prevent
> unanticipated "drift" between the two standards.
The argument that the sorting expertise is in SC2 is a myth.
I do not encounter sorting experts in SC2 beyond the ones I already
know in SC22. And some SC22 experts I rarely see in SC2.
Furthermore, it is important that there be a strong relation to SC22
producers, so that there is no "drift" from other SC22 sorting
specifications in this area, such as the POSIX or C specs.
> 3. Character Properties
>
> The most contentious issue regarding DTR 14652 is the effort to
> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending
> positions effectively reflect a worldview divide among the participants
> regarding character properties:
>
> Position A: Character properties have not traditionally been covered
> by character encoding standards, and have not been viewed as the
> domain of the ISO committee responsible for encoding characters: SC2.
> Instead, character properties are an implementation issue, traditionally
> dealt with in the standards most directly concerned with character
> implementation -- namely the formal language standards -- and are
> dealt with in ISO by the working groups under SC22. In the context
> of 14652, the appropriate place to define character properties is
> LC_CTYPE, where the properties would be usable in a POSIX context as
> part of locale definitions.
>
> Position B: Character properties for the *universal* character set --
> namely ISO 10646 (= Unicode) are inherent to *characters*, and should
> *not* be defined in locales. The locale model and LC_CTYPE were an
> attempt to provide a mechanism for dealing with properties of characters
> in alternate encodings, but that model does not scale well for dealing
> with properties for the universal repertoire of 10646. Furthermore,
> it is inappropriate to assert that character properties are defined
> in locales, and are thus subject to locale-specific variation, since
> such a position would lead to inconsistent and inexplicable differences
> in application behavior, depending on locale, in ways that have
> no bearing on the usually understood issues of locale-specific
> formatting differences, etc. Because character properties are closely
> tied to the characters themselves, responsibility for defining them
> should belong with the character encoding committees, rather than
> with the language committees -- and thus in SC2, rather than SC22.
>
> It is clear that among the rather large community of implementers
> of 10646 (= Unicode), Position B has much more widespread support
> than Position A. Position A is, however, a vocally held minority
> opinion among those committed to the extension of the POSIX framework.
On the other hand, in the UNIX/POSIX/C circles Position A is much
more widespread. Position B is voiced very actively by a small
group of about 20 companies in the Unicode consortium.
In terms of machines actually employing the two different positions,
there are about 20 million or more in the UNIX/Linux community using
it in the Position A way, while Position B is standard only on
Windows 2000, which has fewer than 10 million systems installed.
However, the difference between Position A and B is in practice
not big. Most agree that attributes are associated with characters;
however, there are some culturally dependent character properties,
such as the Turkish mappings between uppercase and lowercase
for the letter "I" and the display of native digits.
> In point of actual fact, the *real* work on standardization of
> 10646 character properties is being done almost entirely
> by the Unicode Technical Committee, which for years now has been
> publishing machine-readable tables of character properties and
> associated technical reports that are in widespread implementation
> in many products. A very few character properties, most notably
> "combining" and "mirroring", are also formally maintained by SC2/WG2 in
> ISO 10646 itself, and those properties are tracked in parallel by
> the UTC.
There has also been a lot of work going on in POSIX circles,
with character properties for more than 20,000 characters
already defined in the POSIX.2 standard that was finished
in 1992. It is maybe a sign of how well researched the Unicode
specifications are that this fact is still unnoticed by prominent
Unicode people.
> On balance, it would seem far preferable to conclude that within
> JTC1 any responsibility for character properties should belong
> to SC2, rather than SC22. Once again, this is a matter of expertise
> regarding the huge number of characters in 10646. That expertise
> is in SC2, and not in SC22. And the implementation experience
> regarding character properties resides in the UTC, which has a
> firm working relationship with SC2, but no close ties to SC22.
Again, the existence of SC2 experts in this area is a myth.
I believe that Unicode has experts, but they are as well connected
to WG20 as to SC2, having C liaison status in both groups.
Furthermore the Unicode technical committee chairman, Arnold Winkler,
is the convener of WG20. No high-ranking Unicode officers have
the same level of office in SC2.
SC2 has for a long time said that they were only into the encoding
of characters, not the meaning. I still think this is a reasonable
approach.
> Regarding LC_CTYPE in particular, the maintenance or extension of
> LC_CTYPE should be remanded to WG15, along with all of DTR 14652,
> but with the following recommendations: Rather than attempting to
> independently extend LC_CTYPE definitions to cover 10646, a mechanism
> should be developed whereby POSIX implementations using LC_CTYPE
> can make use of the more widespread and better researched and
> reviewed character property definitions developed by the UTC, in
> cooperation with SC2/WG2's development of 10646. This should be
> done by *reference*, rather than by enumerating lists of characters
> in SC22 standards or TR's, because of the danger of those lists
> getting out of synch or introducing errors that cause interoperability
> problems. Furthermore, this practice of dealing with character
> properties by reference to UTC and/or SC2 developed standards
> for them, should be recommended to *all* the SC22 committees, as
> the generic way to deal with character properties in formal
> language standards.
As said before, POSIX specs are more widespread than Unicode's,
in terms of systems employing them, and it seems like they may be
better researched, as they have included Unicode specifications
in their research, while Unicode still to this date is unaware of
their bigger competitor...
>
> 4. Internationalization API Standard
>
The i18n API project is another WG20 project to take control of
the subject of i18n, to become masters of our own house.
It is admittedly not very advanced compared to some industry
APIs; this is partly due to SC22's decision to make a restricted
API. It does, however, offer more functionality than most
programming languages standardized in SC22, and it aims to take a
lead for SC22 standardization in this area.
> No one in WG20 but the project editor seems to be doing any active
> work to develop the API standard for internationalization, and the
> committee feedback to date has largely been that the quality of
> the drafts is poor. Fundamental questions regarding the nature
> of the API design have not been resolved. Furthermore, there has
> been a lot of hand-waving over the issue of how closely tied the
> proposed API is to the locale extension constructs of DTR 14652.
> The API under development for 15435 is locale-centric, in that
> it requires information in an "FDCC-set" defined a la DTR 14652,
> assuming API behavior will depend on that information, resident
> in some implementation-defined "database".
> Modern internationalization libraries have largely eschewed that
> kind of locale-centric design as too constrained, instead breaking up
> the problem of internationalization support into more modular
> designs that separate out different aspects of the problems
> involved.
Some modern i18n libraries still use locale-centric behaviour,
including POSIX compatible systems. As POSIX compatible
operating systems are the only major operating systems
gaining significant market shares these days, it cannot be all
that bad. The i18n system of POSIX furthermore has facilities
so that you can orchestrate your own localization, which is
a virtue of the model. It is only recently that these mechanisms
have been taken up, e.g. in Microsoft systems, while POSIX systems
have done this for years. Java also has very similar concepts,
although its proponents may maintain that it is completely different.
Seen from a user's perspective, i18n using the POSIX model works
very well, in my personal experience.
The POSIX model is extensible, and is the only ISO standardized
model.
> Furthermore, the proposed API standard aspires to platform
> independent design. That, however, inappropriately conflates the
> issue of designing appropriate behavior for internationalization
> with the problem of designing appropriately abstracted API's
> for that behavior on distinct platforms. In actual practice,
> implementers are tending to make use of available libraries that
> surface correct internationalization behavior (such as the
> ICU classes) and then writing whatever wrappers are necessary to
> abstract that behavior into their systems. The days of trying
> to define complex behavior via ISO API standards, to be rolled
> out by language compiler vendors in standard C libraries and such,
> are being overtaken by object-oriented design and software
> component models.
Portability across platforms is one of SC22's hallmarks,
and we achieve it well with other standards such as the programming
language standards. The situation described above is
just the one SC22 is set up to solve.
> At this point, WG20's project 15435 should just be abandoned as
> a well-intentioned but obsolete project that has no demonstrated
> need or support for its development.
The 15435 standard is primarily set up for other PL standards.
And furthermore, it is already implemented on major platforms
in major compilers (GNU C/C++).
> 6. Identifiers
WG20 was quite capable of producing the annex to TR 10176 on identifiers,
and quite successful in getting it adopted by the Programming
Languages. WG20 has thus demonstrated its capabilities in
this area, and there is no need to move the subject to somebody else.
WG20 even succeeded in getting Unicode to adopt the specifications.
> This entire issue, is, by the way, also of intense interest to
> the Database standards arena, where it is of direct relevance
> to the SQL standard, for example. So the SC22 working groups are
> not the only JTC1 groups with an interest in standard,
> interoperable results in this area for 10646 characters.
WG20 has liaison to the SQL WG, and furthermore acts as a focal
point for i18n for all of JTC 1, according to JTC 1 decisions.
Kind regards
Keld Simonsen
W3C, Martin Dürst, September 11, 2000
From: Martin J. Duerst [duerst@w3.org]
Sent: Monday, September 11, 2000 2:10 AM
To: John Hill, ISO/IEC JTC1 SC22 Chair
Cc: Lisa Rajchel, ISO/IEC JTC1 SC22 Secretariat at ANSI
Arnold Winkler, ISO/IEC JTC1 SC22 WG20 Convener
Type of document: Liaison Contribution
Subject: Future directions for WG20
For consideration at the Nara meeting of SC22
W3C herewith supports Arnold Winkler's recent proposal for the
future of SC22/WG20.
The experience with the internationalization of a wide range of
specifications at W3C strongly shows the following:
- The range of specifications with internationalization needs
extends far beyond programming languages and includes document
and data formats and protocols.
- Programming languages become more and more diverse, and most
of a program's internationalization functionality is handled
as part of libraries (input/output and user interface) where
diversity is even bigger than in the programming language core.
- Internationalization cannot be done in isolation, but needs to
be done by the committee responsible for the 'base' standard,
with the participation, contribution, and review from
internationalization experts. The main common base is the
universal character set (ISO/IEC 10646).
With respect to the current work items of SC22/WG20, our input
is as follows:
- Sorting/Collation Standard (14651): The standard itself is close
to completion, and should be completed by SC22/WG20. SC2/WG2 is the
optimal place for further work on the data needed for the standard.
- List of characters for identifiers (Appendix to TR 10176):
Again SC2/WG2 is the optimal place to extend this work to
newly encoded characters.
- API for Internationalization (15435): Given the large variance
across programming languages, and the increased importance of
libraries and user interface components, a general API for
internationalization is highly inappropriate.
- Registry for cultural conventions (ISO/IEC 15897): A good
documentation on cultural conventions is very helpful for
implementers of all kinds of information technology. In order
to be of real value, the registry should:
- Make the full information available on the World Wide Web.
- Accept incomplete contributions (e.g. when only part
of some cultural conventions are known or established).
- Provide a full revision history for official registrations.
- Accept contributions not only from the relevant national
bodies, but also from the general public (and e.g. label
them as 'not verified').
- Accept multiple contributions for the same locale
(and label them appropriately).
- Besides registered information, provide pointers to related
information elsewhere, in print or on the WWW.
Once the registry is set up appropriately, the task of
WG20 in this area can be considered completed.
The Type C Liaison between SC22/WG20 and the World Wide Web
Consortium (W3C), in particular the W3C Internationalization
Working Group (SC22 N3073) has been established to coordinate
internationalization issues between these two groups. Completion
of the current SC22/WG20 tasks as proposed by Arnold Winkler
and as discussed above, and transfer of the remaining
character-related responsibilities to SC2/WG2 completely satisfy the
needs of W3C and simplify the interaction between W3C and
ISO/IEC JTC1 in the area of internationalization, because
W3C has already established a liaison with SC2/WG2.
Yours sincerely, Martin J. Dürst.
Norway, Keld Simonsen, September 12, 2000
Arnold Winkler has recently raised a number of issues regarding the future
of SC22/WG20 and the standards that it maintains or has under
development, for consideration at the upcoming SC22 plenary in Nara.
Chief among the issues he raised is whether WG20 is now at the
end of its useful life, and whether it should be sunsetted, with
its various projects redistributed over time to other committees as
appropriate for maintenance.
However, I see WG20 as just now
being able to get to work on real issues. WG20 has been in
the process of taking control over its subject, producing
standards that were the best in SC22 on the subject, but
in essence not much better than say, C, C++, Ada, Fortran, COBOL
or POSIX specifications.
There is a long way to go, if we want truly internationalized,
portable applications, and ISO standards to be leading in
the field, here in the area of APIs. Current widespread APIs on the
market, like the Microsoft NT APIs or IBM ICU, have maybe 3 times as many
APIs as what we have worked on in WG20.
It is quite normal that ISO standards do not contain all functionality,
and for WG20 the work item was actually restricted to a specific,
quite small set of functionality when the NP was accepted.
In general, I think the standardization of APIs and formats for data
specifications is best done in SC22, which standardizes
libraries, and also interacts with the many ISO programming languages.
Moving WG20 activities into SC2, as Arnold Winkler proposes,
would be an error, IMHO.
APIs are not in the scope of SC2. Neither are sorting or
character attributes. And sorting and character attributes
have for a long time been an SC22 issue, viz. islower(), isupper(),
etc. in C and other programming languages.
In the following I will give some comments on each of WG20's projects.
1. Collation
The argument that the sorting expertise is in SC2 is a myth.
The only sorting expert I encounter in SC2, beyond the ones I already
know in SC22, is Michael Everson. And a number of SC22 experts who
always come to the WG20 meetings (at least during the last 2 years)
come less regularly to SC2 meetings; this includes Ken Whistler,
Marc Küster, Kent Karlsson, Takata-san, and myself.
3. Character Properties
One school of thought, represented foremost by Unicode people,
thinks that character properties, such as what is a letter, digit,
or special character, are inherent properties of the character itself
and cannot be changed, while another school thinks that character
properties may be culturally dependent, as per a C/C++/POSIX locale.
In terms of machines actually employing the two different positions,
there are about 20 million or more in the UNIX/Linux community using
it in the locale way, while the Unicode way is standard only on
Windows 2000, which has fewer than 10 million systems installed.
However, the difference between the two schools of thought is in practice
not big. Most agree that attributes are associated with characters in a
fixed way; however, there are some culturally dependent character
properties, such as the Turkish mappings between uppercase and lowercase
for the letter "I" and the display of native digits.
On character properties, there has been some work going on
in Unicode, but also work going on in POSIX circles,
with character properties for more than 20,000 characters
already defined in the POSIX.2 standard that was finished
in 1992. It seems that this work has to this date not been
noticed by prominent Unicode people.
It is also a myth that the experts in this area are foremost
in SC2. I believe that Unicode has experts, but they are as
well connected to WG20 as to SC2, having C liaison status in both
groups. Beyond the Unicode people I see very few experts in SC2 on
this matter. On the other hand there are experts in SC22, including
experts in the different language WGs, the POSIX WG, and myself.
That Unicode should be less connected to WG20 than to SC2 is for
me hard to understand, with Unicode having category C liaison in both
places, and furthermore the Unicode technical committee chairman,
Arnold Winkler, being the convener of WG20. No high-ranking Unicode
officers have the same level of office in SC2.
SC2 has for a long time said that they were only into the encoding
of characters, not the meaning. I still think this is a reasonable
approach.
3. Cultural conventions specification standard, TR 14652
As said before, POSIX specs are more widespread than Unicode's,
in terms of systems employing them, and it seems like they may be
better researched, as they have included Unicode specifications
in their research, while Unicode still to this date is unaware of
their bigger competitor...
4. Internationalization API Standard
Some modern i18n libraries use locale-centric behaviour,
including POSIX compatible systems. As POSIX compatible
operating systems are the only major operating systems
gaining significant market shares these days, it cannot be all
that bad. The i18n system of POSIX furthermore has facilities
so that you can orchestrate your own localization, which is
a virtue of the model. It is only recently that these mechanisms
have been taken up, e.g. in Microsoft systems, while POSIX systems
have done this for years. Java also has very similar concepts,
although its proponents may maintain that it is completely different.
Seen from a user's perspective, i18n using the POSIX model works
very well, in my personal experience.
The POSIX model is extensible, and is the only ISO standardized
model.
Portability across platforms is one of SC22's hallmarks,
and we achieve it well with other standards such as the programming
language standards. Also in the area of i18n, SC22 and JTC 1
should strive for application portability.
The 15435 standard is primarily set up for other PL standards.
And furthermore, it is already implemented on major platforms
in major compilers (GNU C/C++).
6. Identifiers
WG20 was quite capable of producing the annex to TR 10176 on identifiers,
and quite successful in getting it adopted by the Programming
Languages. WG20 has thus demonstrated its capabilities in
this area, and there is no need to move the subject to somebody else.
WG20 even succeeded in getting Unicode to adopt the specifications.
WG20 has liaison to many parties inside and outside of SC22,
including the SQL WG, and furthermore acts as a focal
point for i18n for all of JTC 1, according to JTC 1 decisions.
Kind regards
Keld Simonsen
USA, Ken Whistler, September 12, 2000
Keld responded to a number of the concerns I had surfaced on
behalf of the U.S. committee. Here are some countercomments
which may lead into the discussion which is sure to ensue during
the upcoming Malvern meeting of WG20.
> > From: Kenneth Whistler [kenw@sybase.com]
> > Sent: Friday, September 01, 2000 4:40 PM
> > Subject: Some technical issues regarding the future of SC22/WG20
> >
> > ================================================================
> >
> > Arnold Winkler has recently raised a number of issues regarding the future
> > of SC22/WG20 and the standards that it maintains or has under
> > development, for consideration at the upcoming SC22 plenary in Nara.
> > Chief among the issues he raised is whether WG20 is now at the
> > end of its useful life, and whether it should be sunsetted, with
> > its various projects redistributed over time to other committees as
> > appropriate for maintenance.
>
> As I have written in another message, I see WG20 as just now
> being able to get to work on real issues. WG20 has been in
> the process of taking control over its subject, producing
> standards that were the best in SC22 on the subject, but
> in essence not much better than say, C, C++, Ada, Fortran, COBOL
> or POSIX specifications.
This is, unfortunately, a sad commentary on the quality of the
I18N work coming out of WG20 to date, and I concur with Keld's
assessment!
>
> There is a long way to go, if we want ISO standards to be leading in
> the field, eg in the area of APIs. Current widespread APIs on the
> market like Microsoft NT APIs or IBM ICU have maybe 3 times as many
> APIs that what we have worked on in WG20.
...and much greater sophistication, as well as precision of definition.
And you neglected to mention Java in this list.
As for the presupposition here, that ISO standards should be leading
this field, see below. I agree with the essential assessment
that WG20 is *way* behind. But I differ with Keld in that I don't
think there is any feasible way for WG20 to do a decent job of
providing an I18N API standard.
>
> It is quite normal that ISO standards do not contain all functionality,
> and for WG20 the work item was actually restricted to a specific
> quite small set of functionality when the NP was accepted.
I don't think there was any "specific quite small set of
functionality" defined in the NP. All along, the coverage of
15435 has essentially been precisely what the editor intended
it to be; I see no evidence of principled direction from the
committee that set or constrained the initial scope of the
proposed standard.
>
> > 1. Collation
> >
> > Furthermore, among the active participants in WG2 are the experts
> > on collation (with implementation experience) who actually ended
> > up authoring much of the content of 14651. Comparable experience is
> > not obviously available in the SC22 committees other than WG20.
> > Furthermore, because of the current close working relationship
> > between WG2 and the Unicode Technical Committee, WG2 is also the
> > best place to maintain a standard that should stay in synch with
> > the Unicode Collation Algorithm maintained by the UTC, to prevent
> > unanticipated "drift" between the two standards.
>
> The argument that the sorting expertise is in SC2 is a myth.
> I do not encounter sorting experts in SC2 - beyond the ones I already
> know in SC22. And some SC22 experts I rarely see in SC2.
Perhaps this is a result of attending more to SC2 committee matters
per se, rather than to WG2 or its liaison relation to the UTC.
Here are some examples: 4 experts on Myanmar sorting issues
at WG2 in London; 1 expert on Tibetan sorting at WG2 in London,
and *megabytes* of Tibetan input on a UTC hosted discussion list;
input on Kannada sorting from an expert just last week at the
International Unicode Conference; numerous other Indic inputs
from Jeroen Hellingham and other experts on the Unicode discussion
lists; Chinese input on Yi sorting issues in WG2 in London,
Fukuoka, and Beijing; participation from Arabic and Syriac
experts; Joe Becker; Asmus Freytag; Tex Texin (implemented at
Progress); Gary Richards (implemented at NCR); implementers
from Oracle; the designers and implementers of sorting in Java;
the designers and implementers of sorting in the IBM ICU; and
last, but not least, the designers and implementers of ML sorting
at Microsoft.
Would you care to make a corresponding, explicit list of the
SC22 experts in sorting that you rarely see in SC2, and what
their contributions might be to solving issues that must be
faced in extending the 14651 tables to cover such scripts as
Myanmar, Khmer, Mongolian, and Yi?
> Furthermore it is important that there be a strong relation to SC22
> producers so there will not be a "drift" from other SC22 sorting
> specifications in this area, such as POSIX or C specs.
Cute, but irrelevant. The standards to maintain in synch now
are ISO 14651 (when it is published), and the Unicode Collation
Algorithm. Everything else constitutes defined deltas from those
standards, if you are talking about the tables to specify
ordering. If, on the other hand, you are talking about drift
in API's, that is also irrelevant, since my claim is that WG20
should not be making an API in this area.
>
> > 3. Character Properties
I'll pick up this topic separately.
> > 4. Internationalization API Standard
> >
> The i18n API project is another WG20 project to take control of
> the subject of i18n, to become masters of our own house.
> It is admittedly not very advanced, compared to some industry
> APIs; this is partly due to SC22's decision to make a restricted
> API. It does, however, provide more functionality than most
> programming languages standardized in SC22, and aims to take a
> lead for SC22 standardization in this area.
This point was addressed by the W3C contribution on this topic from
Martin Dürst.
It is one thing to set general direction and requirements for
internationalization of programming languages, as in TR 11017
and TR 10176, but it is quite another to set out to create and
standardize an API in this area. The approach that WG20 is taking
flies in the face of good practice in API design: it has no
clear set of requirements to begin with, it has no guiding
architecture for the specifics of the API, and it has no well-defined
relationship to the *particular* language standards it is supposedly
being developed for.
The editor of 15435 has been pointedly ignoring the message from
internationalization experts from the OS and tools vendors that
no such ISO standard is needed or desired, and instead seems to
be listening primarily to the GNU C/C++ developers and to
plaintive calls from other SC22 working groups hoping that
WG20 will *solve* their internationalization problems. Even the
Linux internationalization experts have rejected involvement with
15435, and that is a community that the editor indirectly keeps
pointing to in order to justify WG20 projects.
>
> > Modern internationalization libraries have largely eschewed that
> > kind of locale-centric design as too constrained, instead breaking up
> > the problem of internationalization support into more modular
> > designs that separate out different aspects of the problems
> > involved.
>
> Some modern i18n libraries still use locale-centric behaviour,
> including POSIX compatible systems. As POSIX compatible
> operating systems are the only major operating systems
> gaining significant market shares these days, it cannot be all
> that bad.
Excuse me, but unless you have something strange in mind, you are
talking here about the growth in popularity of Linux. But the
Linux I18N group has rejected 15435 as an approach to dealing
with internationalization of Linux. How is that an argument for WG20
continuing work on 15435?
> The i18n system of POSIX furthermore has facilities
> so that you can orchestrate your own localization, which is
> a virtue of the model. It is only recently that these mechanisms
> have been taken up, e.g., in Microsoft systems, while POSIX systems
> have done this for years. Java also has very similar concepts,
> although they may maintain that it is completely different.
> Seen from a user's perspective, i18n using the POSIX model works
> very well, in my personal experience.
I am afraid this is looking at the problem with rose-colored
microscopes.
It is generally acknowledged that Unix systems have the least
flexible internationalization, least complete localization, and
least advanced Unicode support of all the major platforms. That
doesn't mean the Unix implementers aren't working on it, but from
an end-user's point of view they don't hold a candle to what is
available on Microsoft, Apple, or Java platforms.
Sure the POSIX model lets a Unix *developer* "orchestrate your
own localization", but you have to be a programmer, a standards
reader, and a system administrator as well to do so on most
systems. Most Unix end users are simply enslaved to whatever
templates got rolled out by their company's system administrators,
and cannot change a damn thing on their own. Most *real* Unix
installations, as opposed to developer machines with Unixoids
playing with the source code, simply run out some defined list
of precompiled locales, and the SA establishes settings for
the installation scripts. Woe to any end user who actually tries
to create some non-standardized behavior by manipulating LC_XXX
environment settings on their own -- that usually just results in
some program refusing to run or dishing out error messages
about missing files or messages.
And the computer end user who actually knows enough to even
try manipulating LC_XXX environment values is already in
the 99th percentile of computer experts. Try talking to some
*real* end users sometime.
>
> The POSIX model is extensible, and is the only ISO standardized
> model.
Another sad state of affairs.
>
> > Furthermore, the proposed API standard aspires to platform
> > independent design. That, however, inappropriately conflates the
> > issue of designing appropriate behavior for internationalization
> > with the problem of designing appropriately abstracted API's
> > for that behavior on distinct platforms. In actual practice,
> > implementers are tending to make use of available libraries that
> > surface correct internationalization behavior (such as the
> > ICU classes) and then writing whatever wrappers are necessary to
> > abstract that behavior into their systems. The days of trying
> > to define complex behavior via ISO API standards, to be rolled
> > out by language compiler vendors in standard C libraries and such,
> > are being overtaken by object-oriented design and software
> > component models.
>
> Portability across platforms is one of SC22's hallmarks,
> and we achieve it well with other standards such as the programming
> language standards. The situation described above is
> just the one SC22 is set up to solve.
This is baloney.
The portability across platforms that SC22 aspires to (and largely
achieves) is portability of the source code for a particular
language (and its associated algorithmic semantics) across
platforms. This enables the building of conformant language
compilers on many platforms, and even cross-platform compilers
that merely substitute out machine-specific code-generation and
optimizer modules.
What I was alluding to is the issue of portability of API's across
different language standards, which is a whole different kettle
of fish. Even between C and C++, which are designed to be closely
compatible, you cannot take an object oriented C++ API and simply
"port" it to C -- the principles are just entirely different,
and C doesn't have the mechanisms to express an object-oriented
API (although you can try to emulate it with clever fakery). Now
consider trying to do the same thing for C++ and FORTRAN. It would
be ludicrous.
>
> > At this point, WG20's project 15435 should just be abandoned as
> > a well-intentioned but obsolete project that has no demonstrated
> > need or support for its development.
>
> The 15435 standard is primarily set up for other PL standards.
> And furthermore, it is already implemented on major platforms
> in major compilers (GNU C/C++)
"The 15435 standard ... is already implemented on major platforms..."
Pardon me if I do a double take on this particular one.
15435 has not yet even seen a draft that has been approved to go
out for a CD ballot. It is 2 years away from being a standard, even
if we all agreed on its content in November and decided to progress
it to a CD ballot.
And if the whole point of this exercise is to standardize some
practice in the GNU C/C++ compiler community, why isn't that
implementation clearly on the table, identified as such, with
the appropriate manpages and documentation, so that WG20 can
evaluate existing practice and its appropriateness for
standardization? Instead, WG20 has been treated to very bad
drafts in 15435 that have waffled all over the map about their
approach, and which have no clear relation to *any* implementation.
>
> > 6. Identifiers
>
> WG20 was quite capable of producing the annex on 10176 on identifiers
> and quite successful in getting it adopted by the Programming
> Languages. WG20 has thus demonstrated its capabilities in
> this area and there is no need to move the subject to somebody else.
> WG20 even succeeded in getting Unicode to adopt the specifications.
No, this is a misrepresentation of the facts.
WG20 succeeded in getting Unicode's attention regarding unprincipled
differences between what WG20 was recommending and what the Unicode
Consortium was recommending. Then there was joint work which resulted
in some changes to both (including an Amendment to 10176), so as
to minimize interoperability differences in the two approaches.
The UTC still recommends a superset of what TR 10176 suggests,
and TR 10176 has not yet addressed issues of normalization or
other specifics regarding use of extended identifiers on the
Internet.
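As a small editorial illustration of why normalization matters for extended identifiers (using Python's standard library, not anything specified in TR 10176 itself): two identifier spellings that render identically can differ at the code point level until normalized.

```python
import unicodedata

# Hypothetical identifier containing a compatibility character:
# U+FB01 LATIN SMALL LIGATURE FI versus the two letters "f" + "i".
ligature = "\ufb01le"   # "file" spelled with the fi ligature
plain = "file"

assert ligature != plain    # distinct code point sequences

# NFKC normalization folds the ligature into "fi", so the two
# spellings become the same identifier:
assert unicodedata.normalize("NFKC", ligature) == plain
```

A language standard that admits extended identifiers without specifying a normalization form leaves it open whether these two spellings name the same thing.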
>
> > This entire issue, is, by the way, also of intense interest to
> > the Database standards arena, where it is of direct relevance
> > to the SQL standard, for example. So the SC22 working groups are
> > not the only JTC1 groups with an interest in standard,
> > interoperable results in this area for 10646 characters.
>
> WG20 has liaison to the SQL WG, and furthermore acts as a focal
> point for i18n for all of JTC 1, according to JTC 1 decisions.
We all know there has been zero input in either direction between
the SQL WG and WG20. The input on internationalization in the
SQL WG has all been coming in from external connections -- through
communications between the internationalization experts and the
SQL experts in the database companies, largely. The internationalization
in SQL is the result of Jim Melton working in database companies,
not the result of Jim Melton talking to WG20.
JTC 1 may decide that WG20 *shall* act as a focal point for all
internationalization in JTC 1 committees, but that doesn't make it
happen.
--Ken
USA, Ken Whistler, September 13, 2000
Now I am going to take up Keld's assertions about character properties.
> > 3. Character Properties
> >
> > The most contentious issue regarding DTR 14652 is the effort to
> > extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending
> > positions effectively reflect a worldview divide among the participants
> > regarding character properties:
[snip]
> >
> > It is clear that among the rather large community of implementers
> > of 10646 (= Unicode), Position B has much more widespread support
> > than Position A. Position A is, however, a vocally held minority
> > opinion among those committed to the extension of the POSIX framework.
>
> On the other hand, in the UNIX/POSIX/C circles Position A is much
> more widespread. Position B is voiced very actively by a small
> group of about 20 companies in the Unicode consortium.
This "small group" includes Sun, IBM, HP, and Compaq, which companies,
between them, account for the majority of enterprise Unix installations.
It also includes Oracle, Sybase, NCR, IBM, Microsoft, and Progress, which
between them account for the vast majority of commercial database
installations, many of them running on Unix platforms -- including
Linux.
> In terms of machines actually employing the two different positions,
> there is about 20 million or more in the UNIX/Linux community using
> it in the Position A way, while Position B is only standard on
> Windows 2000, which has fewer than 10 million systems installed.
Well, this just goes to show, there are lies, damn lies, and then
there are statistics.
How about some counter-statistics...
Information from International Data Corporation (IDC), IT Forecaster, August 8, 2000.
Worldwide Client Operating Environment New License Shipment Shares 1999
Windows 87.7%
MacOS 5.0%
Linux 4.1%
Other 3.8%
Worldwide Server Operating Environment New License Shipment Shares 1999
Windows NT 36%
Linux 24%
NetWare 19%
Unix 15%
Host/Server 3%
Other 3%
Linux is growing rapidly in the low-end server market, it is true.
But a very large proportion of the Linux installations are running
web servers and/or file and print services, with no significant
front end user interaction. And on the web servers at least, the
installations are dishing up HTML pages, JavaScript, and Java apps
that are Unicode compliant. So even if the lowest level OS is
POSIX compliant, they are running layers of software that deal
with characters the Unicode way. Another example: Sybase ships its
database software for Linux platforms now (as do most other database
companies). That software supports character properties in databases
the Unicode way, and does not depend on POSIX-compliant localization
at the OS level to make decisions about how to treat data.
So simple-minded tossing out of numbers about X-million systems
installed as a way of supporting a particular technical approach
to defining character properties is nothing more than smoke and
mirrors.
>
> However, the difference between Position A and B is in practice
> not big. Most agree that attributes are associated to characters,
> however there are some culturally dependent character properties,
> such as the Turkish mappings between uppercase and lowercase
> for the letter "I"
The local specifics for Turkish case mapping are well known. All
this (and all other instances we know about) is documented in
SpecialCasing.txt on the Unicode website.
But case mapping is not even properly a "property" of characters --
it is a *relation* between pairs (or triples) of characters, and
only the most obvious of many such types of relations. The
relation between Hiragana and Katakana is another such relation.
And simply because there are some celebrated (and acknowledged)
locale-specific differences in case mapping for a few characters
does not mean that one therefore needs to specify the entire
apparatus of character property definition *inside* locale
definitions. That is definitely an instance of the tail wagging the
dog.
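The point that case mapping is a relation between character sequences, not a per-character attribute, can be checked directly; a minimal sketch in Python, whose str casing follows the Unicode default and SpecialCasing mappings (an editorial illustration, not part of either standard):

```python
# The default (locale-independent) Unicode lowercase mapping of
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is the two-character
# sequence "i" + U+0307 COMBINING DOT ABOVE: one character maps to
# two, so "lowercase of" cannot be a simple per-character property.
dotted_I = "\u0130"

assert dotted_I.lower() == "i\u0307"
assert len(dotted_I.lower()) == 2

# The Turkish tailoring (I-dot lowercases to plain "i"; dotless I
# pairs with "I") is a locale-specific *exception* recorded in
# SpecialCasing.txt; it does not require moving the whole character
# property apparatus inside locale definitions.
```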
> and display of native digits.
This is an artifact of incorrect old implementations of Arabic that
did not have proper encodings for characters. Those mistakes,
which are not necessary to duplicate in a Unicode implementation,
have no bearing on which committee is in a better position *now*
to specify character properties for 10646.
>
> > In point of actual fact, the *real* work on standardization of
> > 10646 character properties is being done almost entirely
> > by the Unicode Technical Committee, which for years now has been
> > publishing machine-readable tables of character properties and
> > associated technical reports that are in widespread implementation
> > in many products. A very few character properties, most notably
> > "combining" and "mirroring", are also formally maintained by SC2/WG2 in
> > ISO 10646 itself, and those properties are tracked in parallel by
> > the UTC.
>
> There has also been a lot of work going on in POSIX circles,
> with character properties for more that 20.000 characters
> already defined in the POSIX.2 standard that was finished
> in 1992. It is maybe a sign of how well researched the Unicode
> specifications are that this fact is still unnoticed by prominent
> Unicode people.
Well, now, let's just take a look, shall we?
ISO/IEC 9945-2:1993 (E) (= IEEE Std 1003.2-1992), in two volumes, right?
Part 2 is Shell and Utilities, and the majority of the normative
text constitutes the specification of the behavior of the shell,
and of all the POSIX utility programs. Other than a short few pages
about character set definition in general, the only significant
specification of character properties in the entire document can
be found in Annex G (informative) Sample National Profile, pp. 1063 -
1192. And guess what, that sample national profile is none other
than the *Danish* National Profile Example, authored by Keld.
Note also, that Annex G is *informative* in POSIX.2, though
it contains normative-sounding "shall" terminology that sounds
as if it is lifted from a DSA specification.
Annex G is effectively the source of the "i18n" FDCC-set definition
proposed in DTR 14562, minus its Danish-specific component, which
lives on, instead, in Annex B.1.3.3, the Sample FDCC-set
specification for Danish.
The Unicode participants in the WG20 work have assumed all along
that the "son of POSIX" work in 14652 represented extensions,
corrections, and emendations of any previous work. That would mean
that the 14652 drafts would be the more pertinent to consider in
comparison with current work on Unicode character properties. And
such would not conflict with the editor's own representations about
the status of the tables in the DTR 14652.
But since Keld claims that the Unicode character property work
is not well-researched, because it hasn't taken into account the
published POSIX.2 standard from 1992, maybe it does make sense
to skip past the 14652 drafts and go back to the earlier source.
The only mention of "more tha[n] 20.000 characters" in Annex G
can be found on p. 1066:
"The symbolic ellipsis benefits especially those locale definitions
with large character sets. For example there are about 6000 Kanji
characters in JIS X0208 {B26} and about 20 000 ideographic
characters (in a different order) in ISO/IEC 10646-1 {B13}. To
create a Japanese locale that can support JIS X0208 {B26} and
ISO/IEC 10646-1 {B13} code sets with code-value ellipses, two
separate charmaps and two separate locale definitions must be
created."
Note this is telling you how to create a shortcut representation
of a *charmap*, and doesn't in fact specify any character properties
for anything.
If we actually look for character properties per se, they are found
in the LC_CTYPE section of the Danish sample, pp. 1141 - 1148.
But the characters defined in the LC_CTYPE section depend on
the charmap itself, as specified in section G.6.1, starting on
p. 1152. The introduction to that charmap states: "Symbolic
character names are defined for about 1900 characters, covering
many coded character sets." By the way, notably *not* covering
10646-1:1993. So we are down quite a peg here, from a wild
claim about more than 20,000 characters, to an actual list
of "about 1900" characters.
And just as for the i18n repertoire in DTR 14652, this list is
seemingly arbitrarily culled from 10646-1:1993 to fit some
preconceived notion of what characters might be of particular
interest in Europe (or Denmark in particular, I suppose), but
with all kinds of omissions. Most Latin letters are included,
including those for Vietnamese, but not those for African
languages. Exactly one IPA character is in the list (ezh).
Greek spacing accents are included from the 1FXX block, but
not the rest of precomposed polytonic Greek. In addition to
8859-5 Cyrillic, 5 historic letters (but not all) are included
for OCS, plus one letter for Old Ukrainian, but no other
Cyrillic extensions. Hebrew but no Hebrew points. Basic
Arabic, but only 3 (arbitrary) Arabic extension letters, insufficient
to cover either Persian or Urdu. A bunch of fixed-width spaces,
but not the zero-width space. 4 arbitrarily picked currency signs
from the currency block -- but not all of them. Hiragana and
Katakana, but not halfwidth Katakana forms, nor sufficient
Asian symbols to cover any of the Asian standards. And so
on. In other words, a complete implementation hodge-podge,
full of all kinds of holes.
Well, so the repertoire is an arbitrarily chosen subset, but
what about the properties defined on that repertoire? Let's
take a look.
digit
Defines 0..9, but not the Arabic digits, which are in the repertoire
(unlike all the other digit sets from 10646-1). Presumably this
is not just an oversight, but is related to the claims about
culturally-specific implementation of national digit shapes.
But the fact remains that the Arabic digit characters in the
repertoire are not themselves given any character properties
in the LC_CTYPE definition -- and that just has to be the
wrong approach to those characters.
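For contrast, the Unicode character database assigns the Arabic-Indic digits full numeric properties independent of any locale; a quick check via Python's unicodedata module (an editorial illustration, not drawn from POSIX.2):

```python
import unicodedata

# U+0660..U+0669 ARABIC-INDIC DIGIT ZERO..NINE are category Nd
# (decimal digit) with numeric values 0..9, locale or no locale.
for offset in range(10):
    ch = chr(0x0660 + offset)
    assert unicodedata.category(ch) == "Nd"
    assert unicodedata.digit(ch) == offset

# Numeric parsing works on them out of the box:
assert int("\u0664\u0662") == 42
```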
blank
space
Both of these classes completely overlook the various fixed-width
space characters which are included in the charmap. Oops!
upper
Somehow this definition manages to miss the uppercase Roman
numerals and parenthesized letter compatibility characters whose
lowercase forms *are* listed in the lower class. Oops!
lower
This section incorrectly specifies small Hiragana and small Katakana
characters as belonging to this lower class. Oops!
alpha
This class incorrectly specifies the parenthesized letter and
circled letter compatibility characters as being in the alpha
class. Any parsing operation depending on isalpha() will get the
wrong answer in that case, since those characters are used as
bullet symbols, not as letters per se, and cannot form parts of
words. Oops!
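Each of these cases can be verified against the Unicode character database; again via Python's unicodedata, purely as an editorial illustration:

```python
import unicodedata

# Roman numerals are letterlike numbers (category Nl) that nonetheless
# carry case mappings: U+2160 ROMAN NUMERAL ONE lowercases to U+2170.
assert unicodedata.category("\u2160") == "Nl"
assert "\u2160".lower() == "\u2170"

# Small Hiragana U+3041 is category Lo (letter, no case), not lowercase:
assert unicodedata.category("\u3041") == "Lo"

# Circled "letters" are symbols (So), not letters, so a Unicode-based
# isalpha() correctly rejects them as word constituents:
assert unicodedata.category("\u24b6") == "So"   # CIRCLED LATIN CAPITAL A
assert not "\u24b6".isalpha()
```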
punct
This class follows the venerable (and incorrect) POSIX tradition
of conflating true punctuation with all other kinds of symbols
that happen not to be letters, spaces or digits. Included in
this list are Roman numerals (which are letterlike numeric
symbols), arrows, and math operators, for example. Also included
are the masculine and feminine ordinal symbols, which *do* form
parts of words, and which therefore should be part of the alpha
class, not the punct class. Clearly there is no particular
lesson to be gained here for Unicode character properties, except
a further demonstration that the "punct" class was misconceived
in POSIX in the first place.
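Unicode's General_Category values draw exactly the distinctions this class conflates; a brief check (illustrative Python, not part of either specification):

```python
import unicodedata

# True punctuation versus other symbol classes:
assert unicodedata.category("!") == "Po"        # punctuation proper
assert unicodedata.category("+") == "Sm"        # math operator, not punct
assert unicodedata.category("\u2190") == "Sm"   # LEFTWARDS ARROW

# The ordinal indicators are letters (Lo) that can occur inside words,
# so they belong with the alpha class, not with punct:
assert unicodedata.category("\u00aa") == "Lo"   # FEMININE ORDINAL INDICATOR
assert "\u00aa".isalpha()
```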
So what do we learn by doing the research in the POSIX.2 document?
Basically that the character properties defined there are of
very little elucidative value for Unicode as a whole. And that
the particular class definitions have obvious errors in them,
as well as being incomplete and out-of-date.
Not much of a model to rely on, I would say.
>
> > On balance, it would seem far preferable to conclude that within
> > JTC1 any responsibility for character properties should belong
> > to SC2, rather than SC22. Once again, this is a matter of expertise
> > regarding the huge number of characters in 10646. That expertise
> > is in SC2, and not in SC22. And the implementation experience
> > regarding character properties resides in the UTC, which has a
> > firm working relationship with SC2, but no close ties to SC22.
>
> Again, the existence of SC2 experts in this area is a myth.
Untrue.
For each new script that is encoded in 10646, WG2 (and UTC) depends
on information provided by experts on that script (or elicitable
from experts on that script) to help determine character properties
for those characters. Many of those experts can only participate
in this work through their national body, and may attend WG2
meetings, but not UTC meetings. They certainly don't come to
WG20 meetings.
> I believe that Unicode has experts, but they are as well connected
> to WG20 as to SC2, having C liaison status in both groups.
The people who actually maintain lists of character properties,
write technical reports about them, implement them in libraries
or languages, do tend to be in the UTC, rather than in WG2, for
sure. But they depend on the experts from WG2 (among other sources)
for the primary information about character behavior that is
required for newly encoded characters.
> Furthermore the Unicode technical committee chairman, Arnold Winkler,
> is the convener of WG20. No high-ranking Unicode officers have
> the same level of office in SC2.
Arnold is the *vice*-chair of the UTC, but that is just a quibble.
However, your claim about SC2 is misleading. Michel Suignard is
a technical director of Unicode, Inc., and he is editor of
10646-2. Mike Ksar is a member of the board of directors
of Unicode, Inc., and he is convenor of WG2. Asmus Freytag is
a VP of Unicode, Inc., and he is the UTC liaison officer to WG2.
The UTC has good, working lines of communication into both working
groups. Any attempt to decide where to deal with character properties
on an imagined difference in these lines of communication is doomed
to the dustheap.
> SC2 has for a long time said that they were only into the encoding
> of characters, not the meaning. I think still this is a reasonable
> approach.
This is an incorrect characterization of the current *facts* about
10646, which does include some character semantic specifications
(combining and mirroring). Furthermore, your assertion that it
is a reasonable approach, when it comes to consideration of the
UCS, is not shared by many of the people participating in this
effort.
> > Furthermore, this practice of dealing with character
> > properties by reference to UTC and/or SC2 developed standards
> > for them, should be recommended to *all* the SC22 committees, as
> > the generic way to deal with character properties in formal
> > language standards.
>
> As said before, POSIX specs are more widespread than Unicode's,
> in terms of systems employing them,
This claim is just ludicrous -- either in terms of systems or
in terms of the availability and use of the specifications.
> and it seems like they may be
> better researched, as they have included Unicode specifications
> in their research, while Unicode still to this date is unaware of
> their bigger competitor...
I'll leave others to draw the obvious conclusion here.
--Ken
Please find links to more reactions on the top of this document.