SC22/WG20 N417
Minutes of the WG20 meeting #9 - Copenhagen
ISO/IEC JTC1/SC22/WG20
Internationalization
September 29, 1995
DATE: September 25-29, 1995
LOCATION: Danish Standards Association
Baunegårdsvej 73
Hellerup
AGENDA ITEMS:
Japan: T.K. Sato (HP), A. Kido (IBM)
Canada: A. LaBonté (Trésor de Québec)
Denmark: K. Simonsen (RAP, consultant)
USA: M. Kung (SGI)
Liaison: A. Wallace (IBM, COBOL)
Cooperations: Þorvaður Kári Ólafsson (CEN TC304 secretary)
Convenor: A. Winkler (Unisys)
Chair: Winkler
Secretary: Winkler
Drafting: Simonsen, Sato
371R Draft minutes of the WG20 meeting #8 in Paris, May 15-19, 1995 (Winkler)
The minutes were approved without changes.
411 Cultural element specification WD #2 (IS 14652) (agenda item 15)
412 Summary report of CEN/TC304/PT01 project team report (agenda item 16.1)
416 Input to WD#3 of 14651 (A9505-08) (agenda item 13)
418 Revised input for TR 10176 (agenda item 12)
419 Some text for TR 10176 WD5 (A9505-12, A9410-07, A9410-05) (agenda item 12)
421 Non editorial comments to TR 10176 WD4 (agenda item 12)
375 Preliminary agenda for Meeting in Copenhagen (Winkler)
Agenda was approved with additions. Final agenda is document N401.
379 Resolutions of the 8th plenary meeting of JTC1, June 13-16, 1995 in Kista, Sweden (SC22 N1882, JTC1 N3586)
380 SC22 secretariat report to the SC22 plenary Sep-95 (SC22 N1883)
381 SC22 Program of Work (SC22 N1884)
382 Revised agenda for SC22 plenary September 18-22, 1995 in Annapolis (SC22 N1887)
383 Retention of projects, not reaching CD stage within 3 years of NP approval (SC22 N1886)
408 Draft Resolutions from SC22 plenary in Annapolis, September 18-22, 1995 (SC22)
409 Rationale for the inclusion of paragraph numbers in SC22 standards (SC22 N1967)
The convenor reported on the SC22 plenary, with information about the resolutions that affect WG20:
AN - concurrent PDTR registration and ballot for 10176
AO - change of titles
AP - request for short identifiers
AQ - NP for APIs for I18N
BD - paragraph numbering of standards
BE & BF - cooperation with CEN TC304
Lots of discussion and work on proposals for electronic document transfer and use of WWW.
X3L2 for project JTC1.22.30.02.02 (International string ordering) requested by X3L2.
414 Liaison report from WG4 - COBOL (A. Wallace)
Ann gave a report based on N414. She expressed the desire to synchronize with WG20 the availability of standards for I18N with their use in COBOL. Identifiers of 10646 need case information for use in COBOL. Ann is writing most papers for I18N functionality; only numeric formatting is written by somebody else.
The new verb VALIDATE will be based on external locales. An API standard is also needed. More features will be included in the next release of the standard.
Question: should WG4 include features even if no standard is in sight, or would that be too fast and thus incompatible with future standards?
Should there be a Right-case function? This is beyond the compiler functionality, more an orthographic application.
Should there be an I18N tag for data items that are culture dependent? Good idea.
Timing of sorting standard - CEN
WG20 has composed an answer to COBOL's liaison statement and submits it as N422 to WG4.
Ann wants the tables of the 10646 character properties made available in machine and human readable form so that COBOL implementers have access to them (for free). We could make them an informative annex for a TR or write the character properties standard asap.
No specific report. The amendment to the shell and utilities is very much I18N oriented; it progresses very slowly.
386 Resolutions SC2 meeting Helsinki (SC2 N2616)
387 Resolutions SC2/WG2 meeting Helsinki (SC2/WG2 N1254)
No additional report besides resolutions and minutes, which are available. Korean "old" characters might cause problems with short identifiers.
Revision of the C standard with input from WG20: enhanced character set support and a better approach to character handling and localization. The amendment for internationalization (MSE) has been published. Proposal for POSIX alignment.
CD ballot, object oriented locales, new string class, object oriented APIs.
Nothing to report, merge of the organizations not finished.
Input for 10646 - second CD with fewer specifications, no composition. Complex things will go into TR. Symbols on numeric keypad - decimal point is a function, not a character.
See 16.1 and N413. Agreement on necessary cooperation, especially for sorting.
New copies of their sorting standard. Quite advanced in their specification, also for different languages. Are harmonizing with CEN.
391 Liaison document on alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet (ISO/TC37, SC3/WG3 N58, N59, N60)
Alain will write personal letter and send it to Winkler for forwarding to Hans Wellisch.
385 ITU-T recommendations and ISO standards dealing with character coding (Stefan Fuchs)
SD-5 List of Action Items (Winkler)
The action items from prior meetings were reviewed and the list updated.
354R Disposition of comments to TR 11017, including conclusions of Paris meeting (May 95) (T.K. Sato)
359R Proposed text including comments from Paris and e-mail on PDTR 11017 (T.K. Sato)
360R Proposed addition of Management Summary to PDTR 11017 (T.K. Sato)
361R Proposed changes to section 6.2 (SCRIPT) of PDTR 11017 (T.K. Sato)
362R Proposed revision of section 4.2.7 (cross cultural friendliness) of PDTR 11017 (T.K. Sato)
366R Final draft for ISO/IEC DTR 11017: Framework for internationalization (T.K. Sato)
395 Addition of Annex C to PDTR 11017 (Bi-directional text) (T.K. Sato, R. Belhadj)
396 Disposition results on section 5.6 and 5.7 of PDTR 11017 (T.K. Sato)
397 Summary of differences N277 (PDTR) to N366R (DTR 11017) (T.K. Sato)
No SC22 resolution necessary for further processing, but clear understanding that the TR will be submitted for DTR ballot after Miles Ellis has edited it for proper English.
Discussion on inclusion of a long annex on bidi presentation. If bidi, what about Hangul, Thai, etc.? We decided against such examples.
N396: add explanation of "customization" in 5.6
Management overview: many editorial changes.
More editorial changes throughout the document. After addition of bidi contribution, the TR text will be frozen. No more changes are allowed.
Convenor will get the DTR document and forward it to IETF and/or SC22.
377 ISO/IEC PDTR 10176, 2nd edition: Guidelines for preparation of programming language standards (A. Kido, M. Noda; SC22 N1931)
389 Broadening the revision criteria for TR 10176 to accommodate WG11 issues (Winkler)
392 Proposed disposition of comments to WD5 of PDTR 10176 (Comments: B. Meek, A. Winkler) (Akio Kido)
394 More comments to TR 10176 from Brian Meek and Keld Simonsen (Brian Meek, Keld Simonsen)
Schedule:
Oct. 95 - Nov. 95: Kido: preparation of WD#6 for concurrent registration and approval, send to Winkler. Kido to request NB comments from people that sent technical comments now.
End Nov. 95: Winkler: send to SC22 secretariat for registration and approval ballot
April 96: Kido: prepare comment disposition on ballot comments for discussion in Kyoto
June 96: second PDTR ballot (final PDTR)
September 96: Kido: disposition of comments from final PDTR ballot and discussion in WG20 meeting in Vienna
November 1996: ?: Editing for proper English
Kido explained the current organization of the document. Statements and built-in functions are in the body of the document; internationalization specific details are described in annexes, normative and non-normative.
After discussion it was agreed that all I18N functionality should be described in the body of the TR as desired in the programming language, details might be in an annex.
Next discussion about binding of cultural conventions to processes or threads. Multi-locale processing must be synchronized - each thread must know which locale is to be used in the processing, otherwise default locale applies. Agreement: description of need for multi-locale support is in main body, a description of how C or POSIX handles this subject is in an annex (with or without coding example).
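The requirement that each thread knows its own locale, with the default applying otherwise, can be illustrated with a minimal sketch. This is not taken from the meeting documents: the names `DEFAULT_LOCALE`, `set_thread_locale`, `current_locale` and the small convention table are hypothetical, and the sketch does not use the C or POSIX locale machinery.

```python
import threading

DEFAULT_LOCALE = "C"           # hypothetical process-wide default locale
_state = threading.local()     # holds one independent value per thread

# Toy table of cultural conventions (illustrative values only).
DECIMAL_SEPARATOR = {"C": ".", "da_DK": ",", "fr_CA": ","}


def set_thread_locale(name):
    """Bind a locale to the calling thread only."""
    _state.locale = name


def current_locale():
    """Return the calling thread's locale, or the default if none was set."""
    return getattr(_state, "locale", DEFAULT_LOCALE)


def format_decimal(value):
    """Format a number with the decimal separator of the thread's locale."""
    return f"{value:.2f}".replace(".", DECIMAL_SEPARATOR[current_locale()])


def worker(name, out):
    if name is not None:
        set_thread_locale(name)
    out[name] = format_decimal(3.5)


results = {}
threads = [threading.Thread(target=worker, args=(n, results))
           for n in ("da_DK", None)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The thread that set `da_DK` formats with a comma, while the thread that set nothing falls back to the default locale. The main thread is unaffected by either.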
Discussion of character set support: character types. Must be coding independent, although languages might need single byte data types in addition to multi-byte types, with a conversion function between the two. A character boundary detection function is also needed. Programming language dependent; subtypes can be implemented.
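As an illustration of what a character boundary detection function does, here is a minimal sketch. It assumes UTF-8 as the multi-byte encoding, which is an assumption made for the example and not something the minutes specify. The function names are hypothetical, the input is assumed to be valid UTF-8, and combining sequences are not considered.

```python
def is_char_boundary(data: bytes, index: int) -> bool:
    """Return True if a character starts at data[index], or index is the end."""
    if index < 0 or index > len(data):
        return False
    if index == len(data):
        return True
    # In UTF-8, continuation bytes have the bit pattern 10xxxxxx;
    # any other byte starts a new character.
    return (data[index] & 0xC0) != 0x80


def char_starts(data: bytes) -> list:
    """List the byte offsets at which characters begin."""
    return [i for i in range(len(data)) if is_char_boundary(data, i)]


# "Þór" encoded in UTF-8: two 2-byte characters followed by one 1-byte character.
sample = "\u00de\u00f3r".encode("utf-8")
starts = char_starts(sample)
```

Here `starts` is `[0, 2, 4]`: offsets 1 and 3 fall in the middle of a character and are rejected.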
Keld recommends not only a code independent data type but also an encoding dependent data type for things such as UTF-16 data. Ann sees no need for a coding dependent data type; translation can be done in the I/O interface. Fortran has an N-data type for an implementation dependent type where the meaning is defined by sub-type (attribute). Charmap dependence?? Mike supports a UCS data type to make applications portable to various platforms. Kido: POSIX binds the charmap to the character set independent character convention. Ann wants a data type that allows the use of combining sequences (high level). Keld and Mike volunteered to prepare text for the document this evening for discussion tomorrow.
Equivalence discussion: Glyph shape or meaning of character? Kido: a programming language should have an equivalence table for characters to be treated equally, e.g. Latin upper case A and Latin lower case A and Greek upper case A, etc. Alain and Keld: no, and also no equivalence of accented letters. Ann: COBOL does not look at different A's; they are expected to be the same. Problem: there are about 16 blank characters. Are they the same? Keld: identifiers need different handling from text. Ann: source code portability is important when transferring the program from Russian to Latin. Ann and Alain: source code should be locale independent.
We stopped the discussion on source locale, but will have to come back to it.
Back to equivalence: Cyrillic A is the same as Greek A. Ann: Language syntax is ISO 646 invariant. If other character sets are used, these characters should not be equivalent. Alain: it is necessary to declare the UCS repertoire as the language independent portable character set. Ann: backward compatibility is so important that ISO 646 has to be the portable character set, not 10646. Also, the portable character set ought to be defined in the programming language, not in a locale. Equivalence is in the domain of the language.
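The equivalence question can be made concrete with a short sketch using Python's standard `unicodedata` module. The look-alike capital A's are distinct coded characters, so whether identifiers containing them match depends entirely on whether the language defines an equivalence table. The table `EQUIVALENT` and the helper names are hypothetical illustrations, not proposals from the meeting.

```python
import unicodedata

# Latin A, Greek Alpha, Cyrillic A: same glyph shape, three coded characters.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(c) for c in LOOKALIKES]

# Hypothetical language-defined equivalence table (the option Kido described).
EQUIVALENT = {"\u0391": "A", "\u0410": "A"}


def fold_identifier(name, table=None):
    """Map each character through the equivalence table, if one is given."""
    table = table or {}
    return "".join(table.get(c, c) for c in name)


def same_identifier(a, b, table=None):
    """Compare two identifiers after applying the equivalence table."""
    return fold_identifier(a, table) == fold_identifier(b, table)


# The "blank" characters: everything in the space separator category (Zs).
# The exact count depends on the Unicode version shipped with Python.
space_separators = [chr(cp) for cp in range(0x10000)
                    if unicodedata.category(chr(cp)) == "Zs"]
```

Without a table, `same_identifier("\u0410BC", "ABC")` is false; with `EQUIVALENT` it is true. This is the portability risk raised in the discussion: the outcome changes with the table, so the table has to be fixed by the language rather than by a locale.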
Comparison and collation of data in execution: Ann recommends that the code point is the default, but that programming languages might define "fuzzy" functionality, i.e. a level of equivalence from the sorting standard. Keld: at execution time all comparisons should be locale defined, with varying precision levels according to the sorting standard. Literal comparison: at run time, based on the character set to which it is compared? At compile time the default order is locale dependent.
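A minimal sketch of the two positions: code point comparison as the default, and an optional precision level for "fuzzy" comparison. The three levels used here (base letters, then accents, then case) are a simplification chosen for illustration. They are not the level definitions of the sorting standard, and `compare_key` and `equal_at` are hypothetical names.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def compare_key(text, level=None):
    """Build a comparison key; level=None means plain code point order."""
    if level is None:
        return text
    base = strip_accents(text).casefold()                       # level 1: letters only
    accented = unicodedata.normalize("NFD", text).casefold()    # level 2: adds accents
    if level == 1:
        return (base,)
    if level == 2:
        return (base, accented)
    return (base, accented, text)                               # level 3: adds case


def equal_at(a, b, level=None):
    """Test equality at the requested precision level."""
    return compare_key(a, level) == compare_key(b, level)


words = ["cote", "Zebra", "c\u00f4te"]
by_code_point = sorted(words)
by_levels = sorted(words, key=lambda w: compare_key(w, 3))
```

With code point order "Zebra" sorts first because upper case letters precede lower case ones. With the level-based key the order is "cote", "côte", "Zebra". At level 1 "côte" and "Cote" compare equal; at level 2 they do not.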
Range: to define ranges in programming languages is an execution time problem. We have no solution; we need to define the problem. Perhaps there are 2 different ranges, one of which is binary, the other locale dependent. Keld thinks that he has a proposed solution; we need more contributions and also text in the document to trigger comments. New syntax might be needed for culturally correct range finding, according to locale.
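The difference between the two kinds of range can be shown with a small sketch. It assumes a hand-written ordering table for the lower case Danish alphabet (a to z, then æ, ø, å) standing in for a locale. The table and the function names are illustrative assumptions and do not come from any locale source discussed at the meeting.

```python
# Lower case Danish alphabet in its conventional order: a..z, then æ, ø, å.
DANISH_ORDER = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"
RANK = {ch: i for i, ch in enumerate(DANISH_ORDER)}


def in_binary_range(ch, lo, hi):
    """Range test on coded values only."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_locale_range(ch, lo, hi, rank=RANK):
    """Range test on positions in the locale's ordering table.

    Characters absent from the table are never in range.
    The bounds lo and hi must be present in the table.
    """
    if ch not in rank:
        return False
    return rank[lo] <= rank[ch] <= rank[hi]
```

For the range "a" to "å", the binary test rejects "ø" (its code point lies above that of "å") but accepts "{", which is not a letter at all. The locale dependent test accepts "ø" and rejects "{".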
Brian Meek's comments: conflict with Antoni's comments to WD#2; a US contribution is needed. Kido may use abbreviated words for redundant specifications.
Definitions: Terminology will be in the glossary
Extended identifier list: corrected list will be included, no modification needed at this time
I18N library: include list from TR 11017, perhaps more needed
Multiple culture support: concept in main document, example in annex
Identification announcement mechanism:
Mapping:
Multiple character set support:
Discussion about text for annex B (or 4.7): Long discussion about Keld's proposed text for the guidelines in annex B. The group agrees that the proposed text is counterproductive. It is better for the future to wait for the API standard to be advanced enough to be referenced in the TR rather than give incomplete guidelines in the TR itself. That could lead to incomplete and differing implementations which might have to be changed when the API standard is complete. A list of services will be taken out of the TR 11017 and the standards developer will be informed that these services will be available via the internationalization APIs.
Discussion about character set announcement mechanism: No mechanism or tagging schema is available, we want to stay away from 2022 tags. Better no recommendation than 2022. Applications might have to do the announcement and recognition of character sets. Message must contain the warning that "old" character sets will not go away quickly.
Instructions to the editor: In 4.7 add a description of the model: intrinsic functions and platform provided services where available via internationalization APIs. Add a note in which the current status of the API standard is described and a list of the services that will be covered by it. Add a section for the (non-existing) character set announcement method and point to the API standard note. Also point to the list of services described in TR 11017 and ask the convenors to send us their groups' requirements for internationalization APIs and other functionalities.
Kido's question: does the TR 10176 have a message to the developers of programming language standards about the use of culturally correct ordering during execution time? Message: at compilation time use the unchangeable default locale; language specific wording is needed in the p.l. to allow invocation of LC-LOCALE, and a verb to invoke the comparison mechanism.
Discussion of proposed text from Keld and Mike for character handling: abstract? code independent? literals? how to discourage the use of e.g. REDEFINE to get to the coding level and thus make portability impossible? do programming language committees agree with the paper and do they want this kind of guidance? are there locale dependent literals? should literals be in the compiler internal codeset (abstract in the sense of this paper)? Decision: rewrite according to discussion, discuss in group, send to PL standard groups for comments.
Discussion on resolution: Doubt whether the document is ready for registration and approval. The majority understanding is that it is necessary to get the document registered and open it up for official comments from NB's and from other specialists and working groups.
Discussion of N423 - Guidelines on character data type in p.l. support: agreed text. Sorting data type has to go into the TR (text from Alain).
388 ISO/IEC WD 14651 International String Ordering (working draft #3) (Alain LaBonté; SC22 N1924)
378 ANS: Alphabetic Arrangement of Letters and the Sorting of Numerals and Other Symbols (ANSI/NISO Z39.75-199x)
393 Comments on working draft #3 of ISO/IEC 14651 (Hans Wellisch, NISO, chair AK)
398 Instructions for ordering tables from NB's (Alain LaBonté)
391 Liaison document on alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet (TC37)
Registration means that all changes from the base document on must be traceable through ballot comments from national bodies and their resolutions !!!
Ordering standard should be synchronized with CEN and other groups that create such drafts.
Alain says that he might be better off to get the tables from CEN or from contacts in NB's.
Discussion about a message to be given to programming language standards developers: whether and how to inform them about the presence of a culturally correct ordering method. The TR 10176 has to deliver a message to all developers of programming languages: all P.L. have a default behaviour (default locale); locale switching invokes the user locale at execution time, and this functionality must be provided.
Compile time locales must not be changeable. Programming languages have the right to add characters for use in identifiers in addition to the "letters" as proposed in WG20's paper of extended identifiers.
Discussion and comments to WD#3 in the meeting:
Sato wants a list of requirements and a conformance statement for tailored use of the comparison engine.
Arnold promised to get a copy of the LI procedure call standard to Alain and to Keld.
Possibility to allow tailoring in such a way that the default sequence applies with the exception of the user's own script. That would allow a simpler definition of tailoring without touching the "foreign" scripts. A nice and simple way for tailoring is needed.
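One way to picture this kind of tailoring is the following minimal sketch. The user supplies only the order of the letters of their own script, and every other character keeps its default position. Code point order stands in for the default sequence here, and tailored characters are placed ahead of untailored ones; both are assumptions made for the example, and `make_sort_key` is a hypothetical name.

```python
def make_sort_key(tailoring=""):
    """Return a key function; `tailoring` lists the user's own letters in order."""
    own = {ch: i for i, ch in enumerate(tailoring)}

    def char_key(ch):
        if ch in own:
            return (0, own[ch])     # user's own script: tailored order
        return (1, ord(ch))         # "foreign" scripts: untouched default order

    return lambda text: [char_key(c) for c in text]


# Tailoring for lower case Danish: a..z, then æ, ø, å.
danish = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"

# "åben", "zebra", "ære", and a Greek word left at its default position.
words = ["\u00e5ben", "zebra", "\u00e6re", "\u03b1\u03bb\u03c6\u03b1"]

default_order = sorted(words, key=make_sort_key())
tailored_order = sorted(words, key=make_sort_key(danish))
```

In the default order "åben" precedes "ære" because of their code points. With the tailoring the order becomes "zebra", "ære", "åben", and the Greek word stays last in both cases without the user having specified anything about Greek.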
407 Unique identifiers for characters in ISO/IEC 10646 SC22 N1968
SC22 is asking SC2/WG2 to define short, unique identifiers, based on a US request at the plenary (Hart / Winkler).
384 prENV 12005:1995 Procedure for European Registration of Cultural Elements (final draft) (CEN TC 304, prENV 12005)
411 Cultural convention specification IS 14652 WD #2 (K. Simonsen)
The working draft was discussed, especially the need for language independent specifications, possibly with an example of a binding to "C". The time schedules for WD #3 and WD #4 were discussed (November 1995 and March 1996 respectively). By mid 1996 we want to have a document for registration.
390 CEN and ISO cooperation (Þ.K. Ólafsson)
405 CEN/TC304/PT01 User requirements study in the field of character set technology (CEN/TC304)
412 CEN presentation to SC22 by Þorvaður Kári Ólafsson
Þorvaður Kári Ólafsson gave a presentation about CEN and its TC304. CEN is the European equivalent to ISO, but takes instructions from the European Commission. First work was on character sets; later it was extended to cover the use of character sets, thus overlapping with work in SC2, SC22, SC18, SC21.
CEN members are the 15 EU countries, Finland, Iceland, Switzerland. Affiliated members can be any ISO members in Europe; nine are members now.
Character sets: a mandatory set of about 1200 characters for all official languages, including Russian, Ukrainian, Greek, Turkish, polytonic Greek, etc. An extended subset of about 3000 characters, based on pages of 10646, all Latin, Greek, also Vietnamese, African, etc.
TC304 next meeting in Barcelona
WG1: ordering rules
WG2: cultural elements, registration, TR on locales
WG3: character sets, subsets
WG4: transformations, conversions, transliterations
User requirement study PT01: to show EU what TC304 is doing and what their plans are.
Interesting projects for WG20:
L/11113a: Glossasoft project
L/11113b: Message interface
L/1311: European default locale, based on European sub repertoire of 10646
C/1213b: General European rules for fallback representation
C/313: Tools and transformation tables
L/1311: European default locale
L/1213a: European conversion and fallback rules
L/111381a: Ordering
L/111381b: Ordering
C/211: Cultural elements (unregistered)
C/31311: General model for character transformation
L/111382: Ordering of UCS characters
L/1312: More cultural conventions
L/132: Formal specifications techniques for cultural data
C3131: Guide on conversion between UCS coding forms (APIs only)
C331: UCS in programming languages
C/3311b: Guidelines for the design of internationalization (for programming languages)
C/3312: Language independent API specification for internationalization and UCS
C/3311a: Support for UCS in programming languages
Ideas for cooperation:
- combined meetings, planning and technical
- allow access to mailing lists
A breakout group (Keld, Alain, Þorvaður) prepared a paper (N414) as a first plan and a listing of overlapping projects and current status. The paper recommends common meetings, cross membership, fast track procedures, and the development of a synchronization mechanism. Questions were raised about what counts more in case of conflicts, the European position or the CEN position.
After discussing the paper, agreement for a resolution to forward to SC22 and to CEN/TC304 was reached.
Specific discussion on cooperation on the sorting standard:
1. Exchange documents October 10
2. Member body comments November 15
3. Keld to arrange for a meeting of Alain and possibly other WG20 members with CEN members or
4. Arrange for telephone bridge for discussion of differences with good preparations
399 Modification of DIS balloting procedures (SC22 N1938)
403 SC22 ad hoc report on the use of WWW (SC22 AN-7)
404 SC22 Recommendations on JTC1's Electronic Document Formatting Guidelines (AN-1R2) (SC22 N1965)
406 WWW Sample pages (SC22 ad hoc AN-8)
Winkler wants to reduce the paper mailing, and also to remove from the mailing list people who don't need the documents any more. The group decided that a request for confirmation can be sent to the recipients of WG20 mailings. This questionnaire could also contain the question about medium. Some documents should remain on paper, especially working documents. National bodies have to get documents.
402 NP for Internationalization API standard (WG20; SC22 N1962)
Keld presented a draft document and explained his plan for keeping the standard language independent, but easy to bind to C.
The action items were discussed and agreed upon. They will be added to SD-5.
The draft resolutions were discussed and approved. (SC22/WG20 N420)
The meeting was adjourned at 4:45pm.