SC22/WG20 N775
L2/00-308

Collection of reactions to the WG20 convenor's
"Personal thoughts about the future of WG20"

Part 2, from September 7 through September 13, 2000

 

 

Akio Kido suggested that I collect all reactions to my proposal about the future of WG20 in one document for easy reference.  Due to the interest in this subject, it became a rather lengthy document, and I decided to put a linked index in front of it – that allows you to go straight to the contribution that interests you.  I did not do any formatting – my apologies if the text in HTML does not look as good as it could, but I wanted to maintain the original form of the e-mails the way I received them.

 

The document got too long – I had to split it into parts:

 

Parts                                        SC22/WG20   NCITS/L2 - UTC
Part 1, from August 30 – September 6, 2000   N774        L2/00-307
Part 2, from September 6 on …                N775        L2/00-308

 

Index with the latest document on top: (Status September 13, 2000)

 

National Body   Name            Date         Content                       Supports N3164
USA             Ken Whistler    2000-09-13   Character properties, 9945-2  Y
USA             Ken Whistler    2000-09-12   WG20 projects                 Y
Norway          Keld Simonsen   2000-09-12   Comments on WG20 projects     N
W3C             Martin Dürst    2000-09-11   I18N in W3C                   Y
Norway          Keld Simonsen   2000-09-07   Answer to Whistler's e-mail   N
Canada          Glen Seeds      2000-09-07   Value of I18N                 N
France          Antoine Leca    2000-09-07   LC_CTYPE in POSIX             ?

 

 

 

Individual contributions on e-mail:

 

 

France, Antoine Leca, September 7, 2000

 

From: Kenneth Whistler [kenw@sybase.com]

>

> 3. Character Properties

>

> The most contentious issue regarding DTR 14652 is the effort to

> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending

> positions effectively reflect a worldview divide among the participants

> regarding character properties:

>

> Position A: Character properties have not traditionally been covered

> by character encoding standards, and have not been viewed as the

> domain of the ISO committee responsible for encoding characters: SC2.

> Instead, character properties are an implementation issue, traditionally

> dealt with in the standards most directly concerned with character

> implementation -- namely the formal language standards -- and are

> dealt with in ISO by the working groups under SC22. In the context

> of 14652, the appropriate place to define character properties is

> LC_CTYPE, where the properties would be usable in a POSIX context as

> part of locale definitions.

 

 

May I point out that POSIX in this area provides just two things:

- a portable way to "formalize" LC_CTYPE (the localedef mechanism), which
  is the very thing that PDTR 14652 is improving; this is covered by
  Ken's previous discussion, as I see things;

- a mandatory implementation of the minimum subset, the "POSIX" locale,
  which it really inherited from Unix V7 ff., but formally inherits
  from the C Standard.

 

As such, one may also consider involving WG14. Furthermore, the new revision
of the C standard provides some support for the UCS. If these extensions
are used (and this is a prerequisite for them to be used in a POSIX context),
then it would be a natural extension in the next amendment/revision of the
C Standard to provide mandatory rules for the character properties: for
example, to require iswupper(L'\u0410') to return nonzero;
currently, this is not the case (nothing is required here).

<sidenote>

Furthermore, I had a discussion within the POSIX group some months ago.

As a result, in the "POSIX" locale, iswupper(L'\u0410') is expected to return 0.

</sidenote>

 

 

Canada, Glen Seeds, September 7, 2000

 

Sounds like we've started another really interesting thread.

My reactions on postings so far:

 

- I agree with those who say that trying to develop an API in a horizontal

group makes no sense. That should be left to the individual programming

language groups.

 

- I also agree that we should try to avoid invention wherever possible, and

search for existing practice that can be codified. Where invention is

unavoidable, we should try to keep it at as high a level as possible.

 

- I disagree that this topic is outside the proper province of programming

languages.  All such specifications include libraries or similar facilities

that, while not central to the language syntax and semantics, are needed in

order to make the average programmer's life tenable, and to facilitate

common solutions to common problems. This includes things such as I/O and

string handling. i18n is another thing of this type.

 

- I agree that the single most important issue is enabling conformance to

10646. I don't agree that this is the only important issue. Handling of

other cultural conventions in a standard way is also extremely important.

 

- I don't agree that leaving this to individual vendors would be a

reasonable way to address this need. As a user, it costs my company a great

deal to have to work around the differences between vendors in this area,

and having standard solutions that they all conform to would be of

considerable value to us. I have to tell you that in the face of the absence

of this, we are adopting the same approach as was described for Metaphor: we

are forbidding use of the vendors' facilities, and implementing our own. We

are not at all happy about having to do this, and are critical of vendors'

slow adoption of things such as UTF-8 and 14651.

 

- I don't agree that the differences between the different approaches in

different programming languages are in the same category as the problem

above. However, I would like to make a point here that has not been made

yet:

 

The most significant objective that an i18n standardization group could

achieve is a specification for a minimum *set of cultural issues* where

conforming systems should support variability, and a standard way of

*interchanging the encoding rules* for that variability. This is where

international expertise is most needed and most effective. It is also where

existing vendors tend to have the fewest opinions and vested interests that

bog down the standardization process. The Austin Group in particular has

said that they are waiting for direction from ISO before doing any further

work in these areas.

 

(It's also the area where they get themselves into the most trouble on their

own.  A good example of this is the current Austin Group discussion on

"collation order" versus "collation sequence" in regular expressions.)

 

We have already achieved a lot in this area in the form of 10646 and 14651.

It's unfortunate that the next step, 14652, failed to become anything beyond

a TR. As a user, I have a strong interest in seeing this work go forward. I

can't imagine a better place than SC22/WG20. The areas most affected are the

PL's and OS's, but none of them have the expertise to put together a

statement of generic issues and solutions.

 

Having said all that, I agree the lack of new progress shows that a review

is in order. We should ask the other groups, especially those in SC22, what

their concerns are in this area, and what sort of process they would buy

into that would allow us to move forward.

 

   /glen

 

 

Norway, Keld Simonsen, September 7, 2000

 

On Wed, Sep 06, 2000 at 09:43:45AM -0400, Winkler, Arnold F wrote:

>

> From: Kenneth Whistler [kenw@sybase.com]

> Sent: Friday, September 01, 2000 4:40 PM

> Subject: Some technical issues regarding the future of SC22/WG20

>

> ================================================================

>

> Arnold Winkler has recently raised a number of issues regarding the future

> of SC22/WG20 and the standards that it maintains or has under

> development, for consideration at the upcoming SC22 plenary in Nara.

> Chief among the issues he raised is whether WG20 is now at the

> end of its useful life, and whether it should be sunsetted, with

> its various projects redistributed over time to other committees as

> appropriate for maintenance.

 

As I have written in another message, I see WG20 as just now

being able to get to work on real issues. WG20 has been in

the process of taking control over its subject, producing

standards that were the best in SC22 on the subject, but

in essence not much better than say, C, C++, Ada, Fortran, COBOL

or POSIX specifications.

 

There is a long way to go, if we want ISO standards to be leading in
the field, e.g. in the area of APIs. Current widespread APIs on the
market, like the Microsoft NT APIs or IBM ICU, have maybe 3 times as many
APIs as what we have worked on in WG20.

 

It is quite normal that ISO standards do not contain all functionality,
and for WG20 the work item was actually restricted to a specific,
quite small set of functionality when the NP was accepted.

 

> 1. Collation

>

> Furthermore, among the active participants in WG2 are the experts

> on collation (with implementation experience) who actually ended

> up authoring much of the content of 14651. Comparable experience is

> not obviously available in the SC22 committees other than WG20.

> Furthermore, because of the current close working relationship

> between WG2 and the Unicode Technical Committee, WG2 is also the

> best place to maintain a standard that should stay in synch with

> the Unicode Collation Algorithm maintained by the UTC, to prevent

> unanticipated "drift" between the two standards.

 

The argument that the sorting expertise is in SC2 is a myth.
I do not encounter sorting experts in SC2 - beyond the ones I already
know in SC22. And some SC22 experts I rarely see in SC2.

 

Furthermore, it is important that there be a strong relation to SC22
producers, so that there is no "drift" from other SC22 sorting
specifications in this area, such as the POSIX or C specs.

 

> 3. Character Properties

>

> The most contentious issue regarding DTR 14652 is the effort to

> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending

> positions effectively reflect a worldview divide among the participants

> regarding character properties:

>

> Position A: Character properties have not traditionally been covered

> by character encoding standards, and have not been viewed as the

> domain of the ISO committee responsible for encoding characters: SC2.

> Instead, character properties are an implementation issue, traditionally

> dealt with in the standards most directly concerned with character

> implementation -- namely the formal language standards -- and are

> dealt with in ISO by the working groups under SC22. In the context

> of 14652, the appropriate place to define character properties is

> LC_CTYPE, where the properties would be usable in a POSIX context as

> part of locale definitions.

>

> Position B: Character properties for the *universal* character set --

> namely ISO 10646 (= Unicode) are inherent to *characters*, and should

> *not* be defined in locales. The locale model and LC_CTYPE were an

> attempt to provide a mechanism for dealing with properties of characters

> in alternate encodings, but that model does not scale well for dealing

> with properties for the universal repertoire of 10646. Furthermore,

> it is inappropriate to assert that character properties are defined

> in locales, and are thus subject to locale-specific variation, since

> such a position would lead to inconsistent and inexplicable differences

> in application behavior, depending on locale, in ways that have

> no bearing on the usually understood issues of locale-specific

> formatting differences, etc. Because character properties are closely

> tied to the characters themselves, responsibility for defining them

> should belong with the character encoding committees, rather than

> with the language committees -- and thus in SC2, rather than SC22.

>

> It is clear that among the rather large community of implementers

> of 10646 (= Unicode), Position B has much more widespread support

> than Position A. Position A is, however, a vocally held minority

> opinion among those committed to the extension of the POSIX framework.

 

On the other hand, in the UNIX/POSIX/C circles Position A is much
more widespread.  Position B is voiced very actively by a small
group of about 20 companies in the Unicode consortium.
In terms of machines actually employing the two different positions,
there are some 20 million or more in the UNIX/Linux community using
it in the Position A way, while Position B is only standard on
Windows 2000, which has fewer than 10 million systems installed.

 

However, the difference between Position A and B is in practice
not big. Most agree that attributes are associated with characters;
however, there are some culturally dependent character properties,
such as the Turkish mappings between uppercase and lowercase
for the letter "I", and the display of native digits.

 

> In point of actual fact, the *real* work on standardization of

> 10646 character properties is being done almost entirely

> by the Unicode Technical Committee, which for years now has been

> publishing machine-readable tables of character properties and

> associated technical reports that are in widespread implementation

> in many products. A very few character properties, most notably

> "combining" and "mirroring", are also formally maintained by SC2/WG2 in

> ISO 10646 itself, and those properties are tracked in parallel by

> the UTC.

 

There has also been a lot of work going on in POSIX circles,
with character properties for more than 20,000 characters
already defined in the POSIX.2 standard that was finished
in 1992. It is maybe a sign of how well researched the Unicode
specifications are that this fact is still unnoticed by prominent
Unicode people.
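For readers who have not seen the POSIX.2 mechanism Keld refers to: character classes and case mappings are declared in a localedef source file and compiled into a locale. A minimal, purely illustrative LC_CTYPE fragment might look like this (the symbolic names come from the charmap in use, and "..." is localedef's own ellipsis notation for a range; this is not a complete definition):

```
LC_CTYPE
# Classes are lists of symbolic character names from the charmap.
upper   <A>;...;<Z>
lower   <a>;...;<z>
digit   <zero>;...;<nine>
# Case mappings are pairs of (from,to) characters.
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>)
END LC_CTYPE
```

The dispute in this thread is essentially whether such per-locale tables should enumerate properties for the whole 10646 repertoire, or defer to property data maintained alongside the character encoding itself.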

 

> On balance, it would seem far preferable to conclude that within

> JTC1 any responsibility for character properties should belong

> to SC2, rather than SC22. Once again, this is a matter of expertise

> regarding the huge number of characters in 10646. That expertise

> is in SC2, and not in SC22. And the implementation experience

> regarding character properties resides in the UTC, which has a

> firm working relationship with SC2, but no close ties to SC22.

 

Again, the existence of SC2 experts in this area is a myth.

I believe that Unicode has experts, but they are as well connected

to WG20 as to SC2, having C liaison status in both groups.

Furthermore the Unicode technical committee chairman, Arnold Winkler,

is the convener of WG20. No high-ranking Unicode officers have

the same level of office in SC2.

 

SC2 has for a long time said that they were only into the encoding

of characters, not the meaning. I think still this is a reasonable

approach.

 

 

> Regarding LC_CTYPE in particular, the maintenance or extension of

> LC_CTYPE should be remanded to WG15, along with all of DTR 14652,

> but with the following recommendations: Rather than attempting to

> independently extend LC_CTYPE definitions to cover 10646, a mechanism

> should be developed whereby POSIX implementations using LC_CTYPE

> can make use of the more widespread and better researched and

> reviewed character property definitions developed by the UTC, in

> cooperation with SC2/WG2's development of 10646. This should be

> done by *reference*, rather than by enumerating lists of characters

> in SC22 standards or TR's, because of the danger of those lists

> getting out of synch or introducing errors that cause interoperability

> problems. Furthermore, this practice of dealing with character

> properties by reference to UTC and/or SC2 developed standards

> for them, should be recommended to *all* the SC22 committees, as

> the generic way to deal with character properties in formal

> language standards.

 

As said before, POSIX specs are more widespread than Unicode's,
in terms of systems employing them, and it seems like they may be
better researched, as they have included Unicode specifications
in their research, while Unicode still to this date is unaware of
their bigger competitor...

>

> 4. Internationalization API Standard

>

The i18n API project is another WG20 project to take control of
the subject of i18n, to become masters of our own house.
It is admittedly not very advanced compared to some industry
APIs; this is partly due to SC22's decision to make a restricted
API. It does, however, offer more functionality than most
programming languages standardized in SC22, and aims to take a
lead for SC22 standardization in this area.

 

> No one in WG20 but the project editor seems to be doing any active

> work to develop the API standard for internationalization, and the

> committee feedback to date has largely been that the quality of

> the drafts is poor. Fundamental questions regarding the nature

> of the API design have not been resolved. Furthermore, there has

> been a lot of hand-waving over the issue of how closely tied the

> proposed API is to the locale extension constructs of DTR 14652.

> The API under development for 15435 is locale-centric, in that

> it requires information in an "FDCC-set" defined a la DTR 14652,

> assuming API behavior will depend on that information, resident

> in some implementation-defined "database".

 

> Modern internationalization libraries have largely eschewed that

> kind of locale-centric design as too constrained, instead breaking up

> the problem of internationalization support into more modular

> designs that separate out different aspects of the problems

> involved.

 

Some modern i18n libraries still use locale-centric behaviour,
including POSIX compatible systems. As POSIX compatible
operating systems are the only major operating systems
gaining significant market share these days, it cannot be all
that bad.  The i18n system of POSIX furthermore has facilities
so that you can orchestrate your own localization, which is
a virtue of the model. It is only recently that these mechanisms
have been taken up, e.g. in Microsoft systems, while POSIX systems
have done this for years. Java also has very similar concepts,
although they may maintain that it is completely different.
Seen from a user's perspective, i18n using the POSIX model works
very well, in my personal experience.

 

The POSIX model is extensible, and is the only ISO standardized

model.

 

> Furthermore, the proposed API standard aspires to platform

> independent design. That, however, inappropriately conflates the

> issue of designing appropriate behavior for internationalization

> with the problem of designing appropriately abstracted API's

> for that behavior on distinct platforms. In actual practice,

> implementers are tending to make use of available libraries that

> surface correct internationalization behavior (such as the

> ICU classes) and then writing whatever wrappers are necessary to

> abstract that behavior into their systems. The days of trying

> to define complex behavior via ISO API standards, to be rolled

> out by language compiler vendors in standard C libraries and such,

> are being overtaken by object-oriented design and software

> component models.

 

Portability across platforms is one of SC22's hallmarks,
and we achieve it well with other standards such as the programming
language standards. The situations described above are
just the ones SC22 is set up to solve.

 

> At this point, WG20's project 15435 should just be abandoned as

> a well-intentioned but obsolete project that has no demonstrated

> need or support for its development.

 

The 15435 standard is primarily set up for other PL standards.
Furthermore, it is already implemented on major platforms
in major compilers (GNU C/C++).

 

> 6. Identifiers

 

WG20 was quite capable of producing the annex of TR 10176 on identifiers,
and quite successful in getting it adopted by the Programming
Languages. WG20 has thus demonstrated its capabilities in
this area, and there is no need to move the subject to somebody else.
WG20 even succeeded in getting Unicode to adopt the specifications.

 

> This entire issue, is, by the way, also of intense interest to

> the Database standards arena, where it is of direct relevance

> to the SQL standard, for example. So the SC22 working groups are

> not the only JTC1 groups with an interest in standard,

> interoperable results in this area for 10646 characters.

 

WG20 has liaison to the SQL WG, and furthermore acts as a focal

point for i18n for all of JTC 1, according to JTC 1 decisions.

 

Kind regards

Keld Simonsen

 

 

W3C, Martin Dürst, September 11, 2000

 

From: Martin J. Duerst [duerst@w3.org]

Sent: Monday, September 11, 2000 2:10 AM

To: John Hill, ISO/IEC JTC1 SC22 Chair

Cc: Lisa Rajchel ISO/IEC JTC1 SC22 Secretariat at ANSI

     Arnold Winkler, ISO/IEC JTC1 SC22 WG20 Convener

 

Type of document: Liaison Contribution

Subject: Future directions for WG20

 

For consideration at the Nara meeting of SC22

 

 

W3C herewith supports Arnold Winkler's recent proposal for the

future of SC22/WG20.

 

The experience with the internationalization of a wide range of

specifications at W3C strongly shows the following:

 

- The range of specifications with internationalization needs

   extends far beyond programming languages and includes document

   and data formats and protocols.

 

- Programming languages become more and more diverse, and most

   of a program's internationalization functionality is handled

   as part of libraries (input/output and user interface) where

   diversity is even bigger than in the programming language core.

 

- Internationalization cannot be done in isolation, but needs to

   be done by the committee responsible for the 'base' standard,

   with the participation, contribution, and review from

   internationalization experts. The main common base is the

   universal character set (ISO/IEC 10646).

 

 

With respect to the current work items of SC22/WG20, our input

is as follows:

 

- Sorting/Collation Standard (14651): The standard itself is close

   to completion, and should be completed by SC22/WG20. SC2/WG2 is the

   optimal place for further work on the data needed for the standard.

 

- List of characters for identifiers (Appendix to TR 10176):

   Again SC2/WG2 is the optimal place to extend this work to

   newly encoded characters.

 

- API for Internationalization (15435): Given the large variance

   across programming languages, and the increased importance of

   libraries and user interface components, a general API for

   internationalization is highly inappropriate.

 

- Registry for cultural conventions (ISO/IEC 15897): A good

   documentation on cultural conventions is very helpful for

   implementers of all kinds of information technology. In order

   to be of real value, the registry should:

   - Make the full information available on the World Wide Web.

   - Accept incomplete contributions (e.g. when only part

     of some cultural conventions are known or established).

   - Provide a full revision history for official registrations.

   - Accept contributions not only from the relevant national

     bodies, but also from the general public (and e.g. label

     them as 'not verified').

   - Accept multiple contributions for the same locale

     (and label them appropriately).

   - Besides registered information, provide pointers to related

     information elsewhere, in print or on the WWW.

   Once the registry is set up appropriately, the task of

   WG20 in this area can be considered completed.

 

 

The Type C Liaison between SC22/WG20 and the World Wide Web

Consortium (W3C), in particular the W3C Internationalization

Working Group (SC22 N3073) has been established to coordinate

internationalization issues between these two groups. Completion

of the current SC22/WG20 tasks as proposed by Arnold Winkler

and as discussed above, and transfer of the remaining character-

related responsibilities to SC2/WG2 completely satisfy the

needs of W3C and simplify the interaction between W3C and

ISO/IEC JTC 1 in the area of internationalization, because

W3C has already established a liaison with SC2/WG2.

 

 

Yours sincerely,   Martin J. Dürst.

 

 

Norway, Keld Simonsen, September 12, 2000

 

 

Arnold Winkler has recently raised a number of issues regarding the future

of SC22/WG20 and the standards that it maintains or has under

development, for consideration at the upcoming SC22 plenary in Nara.

Chief among the issues he raised is whether WG20 is now at the

end of its useful life, and whether it should be sunsetted, with

its various projects redistributed over time to other committees as

appropriate for maintenance.

 

However, I see WG20 as just now

being able to get to work on real issues. WG20 has been in

the process of taking control over its subject, producing

standards that were the best in SC22 on the subject, but

in essence not much better than say, C, C++, Ada, Fortran, COBOL

or POSIX specifications.

 

There is a long way to go, if we want truly internationalized,
portable applications, and ISO standards to be leading in
the field, here in the area of APIs. Current widespread APIs on the
market, like the Microsoft NT APIs or IBM ICU, have maybe 3 times as many
APIs as what we have worked on in WG20.

 

It is quite normal that ISO standards do not contain all functionality,
and for WG20 the work item was actually restricted to a specific,
quite small set of functionality when the NP was accepted.

 

In general, I think the standardization of APIs and formats for data

specifications are best done in SC22, which standardizes

libraries, and also interacts with the many ISO programming languages.

 

Moving WG20 activities into SC2, as Arnold Winkler proposes,

would be an error, IMHO.

APIs are not in the scope of SC2. Neither are sorting or

character attributes. And sorting and character attributes

have for a long time been a SC22 issue, viz. C, and other

programming languages islower(), isupper() etc.

 

In the following I will give some comments on each of WG20's projects.

 

1. Collation

 

The argument that the sorting expertise is in SC2 is a myth.
The only sorting expert I encounter in SC2 - beyond the ones I already
know in SC22 - is Michael Everson. And a number of SC22 experts who
always come to the WG20 meetings (at least during the last 2 years)
come less regularly to SC2 meetings; this includes Ken Whistler,
Marc Küster, Kent Karlsson, Takata-san, and myself.

 

3. Character Properties

 

One school of thought, represented foremost by Unicode people,
thinks that character properties, such as what is a letter, digit,
or special character, are inherent properties of the character itself
and cannot be changed, while another school thinks that character
properties may be culturally dependent, as per a C/C++/POSIX locale.

In terms of machines actually employing the two different positions,
there are some 20 million or more in the UNIX/Linux community using
it in the locale way, while the Unicode way is only standard on
Windows 2000, which has fewer than 10 million systems installed.

 

However, the difference between the two schools of thought is in practice
not big. Most agree that attributes are associated with characters in a
fixed way; however, there are some culturally dependent character
properties, such as the Turkish mappings between uppercase and lowercase
for the letter "I", and the display of native digits.

 

On character properties, there has been some work going on
in Unicode, but also work going on in POSIX circles,
with character properties for more than 20,000 characters
already defined in the POSIX.2 standard that was finished
in 1992. It seems that this work has not to this date been
noticed by prominent Unicode people.

 

There is also a myth that the existence of experts in this area is
foremost in SC2.  I believe that Unicode has experts, but they are as
well connected to WG20 as to SC2, having C liaison status in both
groups. Beyond the Unicode people I see very few experts in SC2 on
this matter. On the other hand there are experts in SC22, including
experts in the different language WGs, the POSIX WG, and myself.
That Unicode should be less connected to WG20 than to SC2 is for
me hard to understand, with Unicode having C category liaison in both
places, and furthermore the Unicode technical committee chairman,
Arnold Winkler, being the convener of WG20. No high-ranking Unicode
officers have the same level of office in SC2.

 

SC2 has for a long time said that they were only into the encoding

of characters, not the meaning. I think still this is a reasonable

approach.

 

2. Cultural conventions specification standard, TR 14652

 

As said before, POSIX specs are more widespread than Unicode's,

in terms of systems employing them, and it seems like they may be

better researched, as they have included Unicode specifications

in their research, while Unicode still to this date is unaware of

their bigger competitor...

 

4. Internationalization API Standard

 

Some modern i18n libraries use locale-centric behaviour,
including POSIX compatible systems. As POSIX compatible
operating systems are the only major operating systems
gaining significant market share these days, it cannot be all
that bad.  The i18n system of POSIX furthermore has facilities
so that you can orchestrate your own localization, which is
a virtue of the model. It is only recently that these mechanisms
have been taken up, e.g. in Microsoft systems, while POSIX systems
have done this for years. Java also has very similar concepts,
although they may maintain that it is completely different.
Seen from a user's perspective, i18n using the POSIX model works
very well, in my personal experience.

 

The POSIX model is extensible, and is the only ISO standardized

model.

 

Portability across platforms is one of SC22's hallmarks,
and we achieve it well with other standards such as the programming
language standards. Also in the area of i18n, SC22 and JTC 1
should strive for application portability.

 

The 15435 standard is primarily set up for other PL standards.
Furthermore, it is already implemented on major platforms
in major compilers (GNU C/C++).

 

6. Identifiers

 

WG20 was quite capable of producing the annex of TR 10176 on identifiers,
and quite successful in getting it adopted by the Programming
Languages. WG20 has thus demonstrated its capabilities in
this area, and there is no need to move the subject to somebody else.
WG20 even succeeded in getting Unicode to adopt the specifications.

 

WG20 has liaison to many parties inside and outside of SC22,

including the SQL WG, and furthermore acts as a focal

point for i18n for all of JTC 1, according to JTC 1 decisions.

 

Kind regards

Keld Simonsen

 

 

USA, Ken Whistler, September 12, 2000

 

Keld responded to a number of the concerns I had surfaced on

behalf of the U.S. committee. Here are some countercomments

which may lead into the discussion which is sure to ensue during

the upcoming Malvern meeting of WG20.

 

> > From: Kenneth Whistler [kenw@sybase.com]

> > Sent: Friday, September 01, 2000 4:40 PM

> > Subject: Some technical issues regarding the future of SC22/WG20

> >

> > ================================================================

> >

> > Arnold Winkler has recently raised a number of issues regarding the future

> > of SC22/WG20 and the standards that it maintains or has under

> > development, for consideration at the upcoming SC22 plenary in Nara.

> > Chief among the issues he raised is whether WG20 is now at the

> > end of its useful life, and whether it should be sunsetted, with

> > its various projects redistributed over time to other committees as

> > appropriate for maintenance.

>

> As I have written in another message, I see WG20 as just now

> being able to get to work on real issues. WG20 has been in

> the process of taking control over its subject, producing

> standards that were the best in SC22 on the subject, but

> in essence not much better than say, C, C++, Ada, Fortran, COBOL

> or POSIX specifications.

 

This is, unfortunately, a sad commentary on the quality of the

I18N work coming out of WG20 to date, and I concur with Keld's

assessment!

 

>

> There is a long way to go, if we want ISO standards to be leading in

> the field, e.g. in the area of APIs. Current widespread APIs on the

> market like Microsoft NT APIs or IBM ICU have maybe 3 times as many

> APIs as what we have worked on in WG20.

 

...and much greater sophistication, as well as precision of definition.

And you neglected to mention Java in this list.

 

As for the presupposition here, that ISO standards should be leading

this field, see below. I agree with the essential assessment

that WG20 is *way* behind. But I differ with Keld in that I don't

think there is any feasible way for WG20 to do a decent job of

providing an I18N API standard.

 

>

> It is quite normal that ISO standards do not contain all functionality,

> and for WG20 the work item was actually restricted to a specific

> quite small set of functionality when the NP was accepted.

 

I don't think there was any "specific quite small set of

functionality" defined in the NP. All along, the coverage of

15435 has essentially been precisely what the editor intended

it to be; I see no evidence of principled direction from the

committee that set or constrained the initial scope of the

proposed standard.

 

>

> > 1. Collation

> >

> > Furthermore, among the active participants in WG2 are the experts

> > on collation (with implementation experience) who actually ended

> > up authoring much of the content of 14651. Comparable experience is

> > not obviously available in the SC22 committees other than WG20.

> > Furthermore, because of the current close working relationship

> > between WG2 and the Unicode Technical Committee, WG2 is also the

> > best place to maintain a standard that should stay in synch with

> > the Unicode Collation Algorithm maintained by the UTC, to prevent

> > unanticipated "drift" between the two standards.

>

> The argument that the sorting expertise is in SC2 is a myth.

> I do not encounter sorting experts in SC2 - beyond the ones I already

> know in SC22. And some SC22 experts I rarely see in SC2.

 

Perhaps this is a result of attending more to SC2 committee matters

per se, rather than to WG2 or its liaison relation to the UTC.

 

Here are some examples: 4 experts on Myanmar sorting issues

at WG2 in London; 1 expert on Tibetan sorting at WG2 in London,

and *megabytes* of Tibetan input on a UTC hosted discussion list;

input on Kannada sorting from an expert just last week at the

International Unicode Conference; numerous other Indic inputs

from Jeroen Hellingham and other experts on the Unicode discussion

lists; Chinese input on Yi sorting issues in WG2 in London,

Fukuoka, and Beijing; participation from Arabic and Syriac

experts; Joe Becker; Asmus Freytag; Tex Texin (implemented at

Progress); Gary Richards (implemented at NCR); implementers

from Oracle; the designers and implementers of sorting in Java;

the designers and implementers of sorting in the IBM ICU; and

last, but not least, the designers and implementers of ML sorting

at Microsoft.

 

Would you care to make a corresponding, explicit list of the

SC22 experts in sorting that you rarely see in SC2, and what

their contributions might be to solving issues that must be

faced in extending the 14651 tables to cover such scripts as

Myanmar, Khmer, Mongolian, and Yi?

 

> Furthermore it is important that there be a strong relation to SC22

> producers so there not be a "drift" from other SC22 sorting

> specifications in this area, such as POSIX or C specs.

 

Cute, but irrelevant. The standards to maintain in synch now

are ISO 14651 (when it is published), and the Unicode Collation

Algorithm. Everything else constitutes a defined delta from those

standards, if you are talking about the tables to specify

ordering. If, on the other hand, you are talking about drift

in API's, that is also irrelevant, since my claim is that WG20

should not be making an API in this area.
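The "defined delta" idea can be sketched concretely. The following is a deliberately minimal illustration, not the actual 14651 table format: a shared base table of collation weights, plus a small locale tailoring that is maintained as a delta against that base. The weights and the Danish-style "å sorts after z" rule are simplified assumptions for the example.

```python
# Minimal sketch of "base table + tailoring delta" collation.
# Not the 14651 format; weights are invented for illustration.
base = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

# A Danish-style tailoring moves 'å' after 'z'; only this small
# delta needs to be maintained against the common base table.
tailoring = {"å": len(base)}  # weight just past 'z'

def sort_key(word, delta):
    """Build a comparable weight list from base table plus delta."""
    table = {**base, **delta}
    return [table[ch] for ch in word]

words = ["år", "ask", "zebra"]
print(sorted(words, key=lambda w: sort_key(w, tailoring)))
# ['ask', 'zebra', 'år']
```

Keeping the tailoring separate is exactly what prevents "drift": the base table can track the reference standard while each locale carries only its deviations.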

 

>

> > 3. Character Properties

 

I'll pick up this topic separately.

 

> > 4. Internationalization API Standard

> >

> The i18n API project is another WG20 project to take control of

> the subject of i18n, to become masters of our own house.

> It is admittedly not very advanced, compared to some industry

> APIs; this is partly due to SC22's decision to make a restricted

> API. It does, however, offer more functionality than most

> programming languages standardized in SC22, and aims to take a

> lead for SC22 standardization in this area.

 

This point was addressed by the W3C contribution on this topic from

Martin Dürst.

 

It is one thing to set general direction and requirements for

internationalization of programming languages, as in TR 11017

and TR 10176, but it is quite another to set out to create and

standardize an API in this area. The approach that WG20 is taking

flies in the face of good practice in API design: it has no

clear set of requirements to begin with, it has no guiding

architecture for the specifics of the API, and it has no well-defined

relationship to the *particular* language standards it is supposedly

being developed for.

 

The editor of 15435 has been pointedly ignoring the message from

internationalization experts from the OS and tools vendors that

no such ISO standard is needed or desired, and instead seems to

be listening primarily to the GNU C/C++ developers and to

plaintive calls from other SC22 working groups hoping that

WG20 will *solve* their internationalization problems. Even the

Linux internationalization experts have rejected involvement with

15435, and that is a community that the editor indirectly keeps

pointing to in order to justify WG20 projects.

 

>

> > Modern internationalization libraries have largely eschewed that

> > kind of locale-centric design as too constrained, instead breaking up

> > the problem of internationalization support into more modular

> > designs that separate out different aspects of the problems

> > involved.

>

> Some modern i18n libraries still use locale-centric behaviour,

> including POSIX compatible systems. As POSIX compatible

> operating systems are the only major operating systems

> gaining significant market shares these days, it cannot be all

> that bad.

 

Excuse me, but unless you have something strange in mind, you are

talking here about the growth in popularity of Linux. But the

Linux I18N group has rejected 15435 as an approach to dealing

with internationalization of Linux. How is that an argument for WG20

continuing work on 15435?

 

> The i18n system of POSIX furthermore has facilities

> so that you can orchestrate your own localization, which is

> a virtue of the model. It is only recently that these mechanisms

> have been taken up e.g. in Microsoft systems, while POSIX systems

> have done this for years. Java also has very similar concepts,

> although they may maintain that it is completely different.

> Seen from a user's perspective, i18n using the POSIX model works

> very well, in my personal experience.

 

I am afraid this is looking at the problem with rose-colored

microscopes.

 

It is generally acknowledged that Unix systems have the least

flexible internationalization, least complete localization, and

least advanced Unicode support of all the major platforms. That

doesn't mean the Unix implementers aren't working on it, but from

an end-user's point of view they don't hold a candle to what is

available on Microsoft, Apple, or Java platforms.

 

Sure the POSIX model lets a Unix *developer* "orchestrate your

own localization", but you have to be a programmer, a standards

reader, and a system administrator as well to do so on most

systems. Most Unix end users are simply enslaved to whatever

templates got rolled out by their company's system administrators,

and cannot change a damn thing on their own. Most *real* Unix

installations, as opposed to developer machines with Unixoids

playing with the source code, simply run out some defined list

of precompiled locales, and the SA establishes settings for

the installation scripts. Woe to any end user who actually tries

to create some non-standardized behavior by manipulating LC_XXX

environment settings on their own -- that usually just results in

some program refusing to run or dishing out error messages

about missing files or messages.

 

And the computer end user who actually knows enough to even

try manipulating LC_XXX environment values is already in

the 99th percentile of computer experts. Try talking to some

*real* end users sometime.
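The failure mode described above is easy to reproduce from a program: asking the C library for a locale that has not been installed on the system simply fails, rather than giving the user any way to define new behavior. A minimal Python sketch (the locale name below is deliberately bogus):

```python
# Requesting an uninstalled locale fails outright -- the end user
# manipulating LC_XXX values gets an error, not new behavior.
import locale

try:
    locale.setlocale(locale.LC_ALL, "xx_YY.nonexistent")
except locale.Error as e:
    print("locale not available:", e)
```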

 

>

> The POSIX model is extensible, and is the only ISO standardized

> model.

 

Another sad state of affairs.

 

>

> > Furthermore, the proposed API standard aspires to platform

> > independent design. That, however, inappropriately conflates the

> > issue of designing appropriate behavior for internationalization

> > with the problem of designing appropriately abstracted API's

> > for that behavior on distinct platforms. In actual practice,

> > implementers are tending to make use of available libraries that

> > surface correct internationalization behavior (such as the

> > ICU classes) and then writing whatever wrappers are necessary to

> > abstract that behavior into their systems. The days of trying

> > to define complex behavior via ISO API standards, to be rolled

> > out by language compiler vendors in standard C libraries and such,

> > are being overtaken by object-oriented design and software

> > component models.

>

> Portability across platforms is one of SC22's hallmarks,

> and we achieve it well with other standards such as the programming

> language standards. The situations described above are

> just the ones SC22 is set up to solve.

 

This is baloney.

 

The portability across platforms that SC22 aspires to (and largely

achieves) is portability of the source code for a particular

language (and its associated algorithmic semantics) across

platforms. This enables the building of conformant language

compilers on many platforms, and even cross-platform compilers

that merely substitute out machine-specific code-generation and

optimizer modules.

 

What I was alluding to is the issue of portability of API's across

different language standards, which is a whole different kettle

of fish. Even between C and C++, which are designed to be closely

compatible, you cannot take an object oriented C++ API and simply

"port" it to C -- the principles are just entirely different,

and C doesn't have the mechanisms to express an object-oriented

API (although you can try to emulate it with clever fakery). Now

consider trying to do the same thing for C++ and FORTRAN. It would

be ludicrous.

 

>

> > At this point, WG20's project 15435 should just be abandoned as

> > a well-intentioned but obsolete project that has no demonstrated

> > need or support for its development.

>

> The 15435 standard is primarily set up for other PL standards.

> And furthermore, it is already implemented on major platforms

> in major compilers (GNU C/C++)

 

"The 15435 standard ... is already implemented on major platforms..."

 

Pardon me if I do a double take on this particular one.

 

15435 has not yet even seen a draft that has been approved to go

out for a CD ballot. It is 2 years away from being a standard, even

if we all agreed on its content in November and decided to progress

it to a CD ballot.

 

And if the whole point of this exercise is to standardize some

practice in the GNU C/C++ compiler community, why isn't that

implementation clearly on the table, identified as such, with

the appropriate manpages and documentation, so that WG20 can

evaluate existing practice and its appropriateness for

standardization? Instead, WG20 has been treated to very bad

drafts in 15435 that have waffled all over the map about their

approach, and which have no clear relation to *any* implementation.

 

>

> > 6. Identifiers

>

> WG20 was quite capable of producing the annex to 10176 on identifiers

> and quite successful in getting it adopted by the Programming

> Languages. WG20 has thus demonstrated its capabilities in

> this area and there is no need to move the subject to somebody else.

> WG20 even succeeded in getting Unicode to adopt the specifications.

 

No, this is a misrepresentation of the facts.

 

WG20 succeeded in getting Unicode's attention regarding unprincipled

differences between what WG20 was recommending and what the Unicode

Consortium was recommending. Then there was joint work which resulted

in some changes to both (including an Amendment to 10176), so as

to minimize interoperability differences in the two approaches.

 

The UTC still recommends a superset of what TR 10176 suggests,

and TR 10176 has not yet addressed issues of normalization or

other specifics regarding use of extended identifiers on the

Internet.

 

>

> > This entire issue, is, by the way, also of intense interest to

> > the Database standards arena, where it is of direct relevance

> > to the SQL standard, for example. So the SC22 working groups are

> > not the only JTC1 groups with an interest in standard,

> > interoperable results in this area for 10646 characters.

>

> WG20 has liaison to the SQL WG, and furthermore acts as a focal

> point for i18n for all of JTC 1, according to JTC 1 decisions.

 

We all know there has been zero input either direction from

the SQL WG and WG20. The input on internationalization in the

SQL WG has all been coming in from external connections -- through

communications between the internationalization experts and the

SQL experts in the database companies, largely. The internationalization

in SQL is the result of Jim Melton working in database companies,

not the result of Jim Melton talking to WG20.

 

JTC 1 may decide that WG20 *shall* act as a focal point for all

internationalization in JTC 1 committees, but that doesn't make it

happen.

 

--Ken

 

 

USA, Ken Whistler, September 13, 2000

 

Now I am going to take up Keld's assertions about character properties.

 

> > 3. Character Properties

> >

> > The most contentious issue regarding DTR 14652 is the effort to

> > extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending

> > positions effectively reflect a worldview divide among the participants

> > regarding character properties:

[snip]

> >

> > It is clear that among the rather large community of implementers

> > of 10646 (= Unicode), Position B has much more widespread support

> > than Position A. Position A is, however, a vocally held minority

> > opinion among those committed to the extension of the POSIX framework.

>

> On the other hand, in the UNIX/POSIX/C circles Position A is much

> more widespread.  Position B is voiced very actively by a small

> group of about 20 companies in the Unicode consortium.

 

This "small group" includes Sun, IBM, HP, and Compaq, which companies,

between them, account for the majority of enterprise Unix installations.

It also includes Oracle, Sybase, NCR, IBM, Microsoft, and Progress, which

between them account for the vast majority of commercial database

installations, many of them running on Unix platforms -- including

Linux.

 

> In terms of machines actually employing the two different positions,

> there are about 20 million or more in the UNIX/Linux community using

> it in the Position A way, while Position B is only standard on

> Windows 2000, which has fewer than 10 million systems installed.

 

Well, this just goes to show, there are lies, damn lies, and then

there are statistics.

 

How about some counter-statistics...

 

Information from International Data Corporation (IDC), IT Forecaster, August 8, 2000.

 

Worldwide Client Operating Environment New License Shipment Shares 1999

 

Windows  87.7%

MacOS     5.0%

Linux     4.1%

Other     3.8%

 

Worldwide Server Operating Environment New License Shipment Shares 1999

 

Windows NT 36%

Linux      24%

NetWare    19%

Unix       15%

Host/Server 3%

Other       3%

 

Linux is growing rapidly in the low-end server market, it is true.

But a very large proportion of the Linux installations are running

web servers, and/or file and print services, with no significant

front-end user interaction. And on the web servers at least, the

installations are dishing up HTML pages, JavaScript, and Java apps

that are Unicode compliant. So even if the lowest level OS is

POSIX compliant, they are running layers of software that deal

with characters the Unicode way. Another example: Sybase ships its

database software for Linux platforms now (as do most other database

companies). That software supports character properties in databases

the Unicode way, and does not depend on POSIX-compliant localization

at the OS level to make decisions about how to treat data.

 

So simple-minded tossing out of numbers about X-million systems

installed as a way of supporting a particular technical approach

to defining character properties is nothing more than smoke and

mirrors.

 

>

> However, the difference between Position A and B is in practice

> not big. Most agree that attributes are associated to characters,

> however there are some culturally dependent character properties,

> such as the Turkish mappings between uppercase and lowercase

> for the letter "I"

 

The local specifics for Turkish case mapping are well known. All

this (and all other instances we know about) is documented in

SpecialCasing.txt on the Unicode website.
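As an illustration, the default (locale-independent) Unicode case mapping and the Turkish special case can be contrasted in a few lines. The `upper_tr` helper here is hypothetical, hand-applying the i → U+0130 (dotted capital I) rule that SpecialCasing.txt records; it handles only that one rule, not the full Turkish casing data.

```python
# Default Unicode case mapping is locale-independent: 'i' -> 'I'.
# The Turkish locale-specific rule maps 'i' -> U+0130 instead.
import unicodedata

def upper_tr(s):
    """Hand-apply the Turkish dotted-i rule, then the default mapping."""
    return s.replace("i", "\u0130").upper()

print("i".upper())              # I   (default mapping)
print(upper_tr("istanbul"))     # İSTANBUL
print(unicodedata.name("\u0130"))
```

The point is that the special case is carried as a small data exception on top of the default mapping, not as a separate per-locale property table.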

 

But case mapping is not even properly a "property" of characters --

it is a *relation* between pairs (or triples) of characters, and

only the most obvious of many such types of relations. The

relation between Hiragana and Katakana is another such relation.
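The Hiragana-Katakana relation is a good example of a relation rather than a property: the two blocks sit at a fixed offset of 0x60 in Unicode (Hiragana U+3041..U+3096 against the corresponding Katakana letters), so the mapping can be computed rather than stored per character. A small sketch:

```python
# Hiragana letters and their Katakana counterparts differ by a
# fixed code point offset of 0x60 in Unicode.
import unicodedata

def hiragana_to_katakana(s):
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in s
    )

print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
print(unicodedata.name("あ"), "->",
      unicodedata.name(hiragana_to_katakana("あ")))
```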

 

And simply because there are some celebrated (and acknowledged)

locale-specific differences in case mapping for a few characters

does not mean that one therefore needs to specify the entire

apparatus of character property definition *inside* locale

definitions. That is definitely an instance of the tail wagging the

dog.

 

> and display of native digits.

 

This is an artifact of incorrect old implementations of Arabic that

did not have proper encodings for characters. Those mistakes,

which are not necessary to duplicate in a Unicode implementation,

have no bearing on which committee is in a better position *now*

to specify character properties for 10646.

 

>

> > In point of actual fact, the *real* work on standardization of

> > 10646 character properties is being done almost entirely

> > by the Unicode Technical Committee, which for years now has been

> > publishing machine-readable tables of character properties and

> > associated technical reports that are in widespread implementation

> > in many products. A very few character properties, most notably

> > "combining" and "mirroring", are also formally maintained by SC2/WG2 in

> > ISO 10646 itself, and those properties are tracked in parallel by

> > the UTC.

>

> There has also been a lot of work going on in POSIX circles,

> with character properties for more that 20.000 characters

> already defined in the POSIX.2 standard that was finished

> in 1992. It is maybe a sign of how well researched the Unicode

> specifications are that this fact is still unnoticed by prominent

> Unicode people.

 

Well, now, let's just take a look, shall we?

 

ISO/IEC 9945-2:1993 (E) (= IEEE Std 1003.2-1992), in two volumes, right?

 

Part 2 is Shell and Utilities, and the majority of the normative

text constitutes the specification of the behavior of the shell,

and of all the POSIX utility programs. Other than a short few pages

about character set definition in general, the only significant

specification of character properties in the entire document can

be found in Annex G (informative) Sample National Profile, pp. 1063 -

1192. And guess what, that sample national profile is none other

than the *Danish* National Profile Example, authored by Keld.

Note also, that Annex G is *informative* in POSIX.2, though

it contains normative-sounding "shall" terminology that sounds

as if it is lifted from a DSA specification.

 

Annex G is effectively the source of the "i18n" FDCC-set definition

proposed in DTR 14652, minus its Danish-specific component, which

lives on, instead, in Annex B.1.3.3, the Sample FDCC-set

specification for Danish.

 

The Unicode participants in the WG20 work have assumed all along

that the "son of POSIX" work in 14652 represented extensions,

corrections, and emendations of any previous work. That would mean

that the 14652 drafts would be the more pertinent to consider in

comparison with current work on Unicode character properties. And

such would not conflict with the editor's own representations about

the status of the tables in the DTR 14652.

 

But since Keld claims that the Unicode character property work

is not well-researched, because it hasn't taken into account the

published POSIX.2 standard from 1992, maybe it does make sense

to skip past the 14652 drafts and go back to the earlier source.

 

The only mention of "more tha[n] 20.000 characters" in Annex G

can be found on p. 1066:

 

"The symbolic ellipsis benefits especially those locale definitions

with large character sets. For example there are about 6000 Kanji

characters in JIS X0208 {B26} and about 20 000 ideographic

characters (in a different order) in ISO/IEC 10646-1 {B13}. To

create a Japanese locale that can support JIS X0208 {B26} and

ISO/IEC 10646-1 {B13} code sets with code-value ellipses, two

separate charmaps and two separate locale definitions must be

created."

 

Note this is telling you how to create a shortcut representation

of a *charmap*, and doesn't in fact specify any character properties

for anything.
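For scale, the ideographic range the quoted passage alludes to can be counted directly; the CJK Unified Ideographs block in 10646-1:1993 runs from U+4E00 to U+9FA5, which is why a charmap needs an ellipsis shorthand rather than one line per character:

```python
# Size of the CJK Unified Ideographs block in 10646-1:1993,
# the "about 20 000 ideographic characters" of the POSIX.2 text.
first, last = 0x4E00, 0x9FA5
count = last - first + 1
print(count)  # 20902
```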

 

If we actually look for character properties per se, they are found

in the LC_CTYPE section of the Danish sample, pp. 1141 - 1148.

But the characters defined in the LC_CTYPE section depend on

the charmap itself, as specified in section G.6.1, starting on

p. 1152. The introduction to that charmap states: "Symbolic

character names are defined for about 1900 characters, covering

many coded character sets." By the way, notably *not* covering

10646-1:1993. So we are down quite a peg here, from a wild

claim about more than 20,000 characters, to an actual list

of "about 1900" characters.

 

And just as for the i18n repertoire in DTR 14652, this list is

seemingly arbitrarily culled from 10646-1:1993 to fit some

preconceived notion of what characters might be of particular

interest in Europe (or Denmark in particular, I suppose), but

with all kinds of omissions. Most Latin letters are included,

including those for Vietnamese, but not those for African

languages. Exactly one IPA character is in the list (ezh).

Greek spacing accents are included from the 1FXX block, but

not the rest of precomposed polytonic Greek. In addition to

8859-5 Cyrillic, 5 historic letters (but not all) are included

for OCS, plus one letter for Old Ukrainian, but no other

Cyrillic extensions. Hebrew but no Hebrew points. Basic

Arabic, but only 3 (arbitrary) Arabic extension letters, insufficient

to cover either Persian or Urdu. A bunch of fixed-width spaces,

but not the zero-width space. 4 arbitrarily picked currency signs

from the currency block -- but not all of them. Hiragana and

Katakana, but not halfwidth Katakana forms, nor sufficient

Asian symbols to cover any of the Asian standards. And so

on. In other words, a complete implementation hodge-podge,

full of all kinds of holes.

 

Well, so the repertoire is an arbitrarily chosen subset, but

what about the properties defined on that repertoire? Let's

take a look.

 

digit

 

Defines 0..9, but not the Arabic digits, which are in the repertoire

(unlike all the other digit sets from 10646-1). Presumably this

is not just an oversight, but is related to the claims about

culturally-specific implementation of national digit shapes.

But the fact remains that the Arabic digit characters in the

repertoire are not themselves given any character properties

in the LC_CTYPE definition -- and that just has to be the

wrong approach to those characters.

 

blank

space

 

Both of these classes completely overlook the various fixed-width

space characters which are included in the charmap. Oops!

 

upper

 

Somehow this definition manages to miss the uppercase Roman

numerals and parenthesized letter compatibility characters whose

lowercase forms *are* listed in the lower class. Oops!

 

lower

 

This section incorrectly specifies small Hiragana and small Katakana

characters as belonging to this lower class. Oops!

 

alpha

 

This class incorrectly specifies the parenthesized letter and

circled letter compatibility characters as being in the alpha

class. Any parsing operation depending on isalpha() will get the

wrong answer in that case, since those characters are used as

bullet symbols, not as letters per se, and cannot form parts of

words. Oops!

 

punct

 

This class follows the venerable (and incorrect) POSIX tradition

of conflating true punctuation with all other kinds of symbols

that happen not to be letters, spaces or digits. Included in

this list are Roman numerals (which are letterlike numeric

symbols), arrows, and math operators, for example. Also included

are the masculine and feminine ordinal symbols, which *do* form

parts of words, and which therefore should be part of the alpha

class, not the punct class. Clearly there is no particular

lesson to be gained here for Unicode character properties, except

a further demonstration that the "punct" class was misconceived

in POSIX in the first place.
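The specific misclassifications noted above are easy to verify against the Unicode general-category data (UnicodeData.txt); here via Python's unicodedata module, used purely as a convenient window onto that data:

```python
# Checking the disputed classifications against Unicode
# general-category values.
import unicodedata

# Feminine/masculine ordinals are letters (Lo), not punctuation.
print(unicodedata.category("\u00AA"))  # FEMININE ORDINAL INDICATOR
print(unicodedata.category("\u00BA"))  # MASCULINE ORDINAL INDICATOR

# Roman numerals are letterlike numeric symbols (Nl), not punctuation.
print(unicodedata.category("\u2160"))  # ROMAN NUMERAL ONE

# Small Katakana is a caseless letter (Lo), not lowercase (Ll).
print(unicodedata.category("\u30A1"))  # KATAKANA LETTER SMALL A

# Fixed-width spaces are space separators (Zs), like U+0020.
print(unicodedata.category("\u2009"))  # THIN SPACE
```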

 

So what do we learn by doing the research in the POSIX.2 document?

Basically that the character properties defined there are of

very little elucidative value for Unicode as a whole. And that

the particular class definitions have obvious errors in them,

as well as being incomplete and out-of-date.

 

Not much of a model to rely on, I would say.

 

>

> > On balance, it would seem far preferable to conclude that within

> > JTC1 any responsibility for character properties should belong

> > to SC2, rather than SC22. Once again, this is a matter of expertise

> > regarding the huge number of characters in 10646. That expertise

> > is in SC2, and not in SC22. And the implementation experience

> > regarding character properties resides in the UTC, which has a

> > firm working relationship with SC2, but no close ties to SC22.

>

> Again, the existence of SC2 experts in this area is a myth.

 

Untrue.

 

For each new script that is encoded in 10646, WG2 (and UTC) depends

on information provided by experts on that script (or elicitable

from experts on that script) to help determine character properties

for those characters. Many of those experts can only participate

in this work through their national body, and may attend WG2

meetings, but not UTC meetings. They certainly don't come to

WG20 meetings.

 

> I believe that Unicode has experts, but they are as well connected

> to WG20 as to SC2, having C liaison status in both groups.

 

The people who actually maintain lists of character properties,

write technical reports about them, implement them in libraries

or languages, do tend to be in the UTC, rather than in WG2, for

sure. But they depend on the experts from WG2 (among other sources)

for the primary information about character behavior that is

required for newly encoded characters.

 

> Furthermore the Unicode technical committee chairman, Arnold Winkler,

> is the convener of WG20. No high-ranking Unicode officers have

> the same level of office in SC2.

 

Arnold is the *vice*-chair of the UTC, but that is just a quibble.

 

However, your claim about SC2 is misleading. Michel Suignard is

a technical director of Unicode, Inc., and he is editor of

10646-2. Mike Ksar is a member of the board of directors

of Unicode, Inc., and he is convenor of WG2. Asmus Freytag is

a VP of Unicode, Inc., and he is the UTC liaison officer to WG2.

 

The UTC has good, working lines of communication into both working

groups. Any attempt to decide where to deal with character properties

on an imagined difference in these lines of communication is doomed

to the dustheap.

 

> SC2 has for a long time said that they were only into the encoding

> of characters, not the meaning. I think still this is a reasonable

> approach.

 

This is an incorrect characterization of the current *facts* about

10646, which does include some character semantic specifications

(combining and mirroring). Furthermore, your assertion that it

is a reasonable approach, when it comes to consideration of the

UCS, is not shared by many of the people participating in this

effort.

 

> > Furthermore, this practice of dealing with character

> > properties by reference to UTC and/or SC2 developed standards

> > for them, should be recommended to *all* the SC22 committees, as

> > the generic way to deal with character properties in formal

> > language standards.

>

> As said before, POSIX specs are more widespread than Unicode's,

> in terms of systems employing them,

 

This claim is just ludicrous -- either in terms of systems or

in terms of the availability and use of the specifications.

 

> and it seems like they may be

> better researched, as they have included Unicode specifications

> in their research, while Unicode still to this date is unaware of

> their bigger competitor...

 

I'll leave others to draw the obvious conclusion here.

 

--Ken

 

 

 

Please find links to more reactions on the top of this document.