From owner-sc22wg5  Mon Sep  9 01:33:31 2002
Received: from imo-d07.mx.aol.com (imo-d07.mx.aol.com [205.188.157.39])
	by dkuug.dk (8.9.2/8.9.2) with ESMTP id BAA18420
	for <SC22WG5@dkuug.dk>; Mon, 9 Sep 2002 01:33:27 +0200 (CEST)
	(envelope-from Wclodius@aol.com)
From: Wclodius@aol.com
Received: from Wclodius@aol.com
	by imo-d07.mx.aol.com (mail_out_v34.10.) id 9.109.184b191a (3890)
	 for <SC22WG5@dkuug.dk>; Sun, 8 Sep 2002 19:33:50 -0400 (EDT)
Message-ID: <109.184b191a.2aad385e@aol.com>
Date: Sun, 8 Sep 2002 19:33:50 EDT
Subject: Re: (SC22WG5.2544) SC22 meeting
To: SC22WG5@dkuug.dk
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: AOL 5.0 for Mac sub 39


In a message dated 9/4/02 10:19:25 AM, jkr@rl.ac.uk writes:

><snip>
>
>1. Rather than specifying ISO_10646 in our SELECTED_CHAR_KIND 
>   intrinsic, we should perhaps consider the three encodings of it:
>   UTF-8, UTF-16, UTF-32.  UTF-16 appears to have a lot of merit and is
>   catching on. It involves variable-width 16-bit strings. There are
>   2048 special 16-bit values, which allow the frequently-used
>   characters to be represented directly in 16 bits and the rest
>   (actually up to 1,048,576) to be represented as a pair of specials.
>   No 'escape' mechanism is needed since the special characters may be
>   recognized directly. The Unicode Consortium wishes programming
>   languages to support this data type.
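
A minimal Python sketch of the UTF-16 scheme described in the quoted item. Python is used only for illustration, and utf16_units and decode_utf16_units are hypothetical helper names. The 2048 special values are the surrogates U+D800 to U+DFFF: 1024 "high" and 1024 "low" values, whose pairings give 1024 x 1024 = 1,048,576 additional characters. Because a high or low surrogate never stands for an ordinary character, a reader can recognize a pair directly, without any escape mechanism.

```python
def utf16_units(code_point: int) -> list[int]:
    """Encode one ISO 10646 code point as UTF-16 code units (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point < 0x10000:
        return [code_point]               # fits directly in one 16-bit unit
    offset = code_point - 0x10000         # 20-bit value in 0..0xFFFFF
    high = 0xD800 + (offset >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]


def decode_utf16_units(units: list[int]) -> list[int]:
    """Recover code points; surrogates are recognized directly, with no escapes."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            code_points.append(u)
            i += 1
    return code_points


# 2048 surrogates, split into 1024 high and 1024 low values.
assert 0xDFFF - 0xD800 + 1 == 2048
assert 1024 * 1024 == 1_048_576
print([hex(u) for u in utf16_units(0x1D11E)])   # ['0xd834', '0xdd1e']
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range and is written as the pair 0xD834, 0xDD1E, while a character such as U+0041 is stored as a single unit.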

For what it's worth, I think that they are being too specific in providing
direction to the Fortran committee.  They should not be concerned with the
internal representation used by a processor for the KINDs of the CHARACTER
data type.  Fortran's intrinsic CHARACTER type has a lot of associated
procedures and operations, and so far Fortran's vendors have been very
reluctant to provide more than one KIND of it.  They will not duplicate all
that effort for three different encodings of the same critical
functionality.  Further, if the internal representation is critical to the
use of the processor, then the standard has not done a good job of
encapsulating the CHARACTER type's functionality.  Finally, the standard
probably does not do an ideal job of that encapsulation, and verifying that
such encodings are consistent with its detailed semantics will require work
that will significantly delay F2000.

What SC22 should be concerned with is whether a KIND will be available that
can represent all the defined ISO-10646 characters, and whether that
representation can be made available to the external world in those
encodings.  The current draft appears to allow the desired KIND.  The
minimal way to make the encodings accessible to the outside world is the
ability to read and write files in them.  The obvious way to provide this
capability is through a connection specifier in the OPEN statement, e.g.,
FORM='UTF-16'.  This might be doable without significant delays in F200x.
Perhaps the standard could also provide procedures to translate character
data to and from those representations.  Anything else should be considered
after the publication of F2000.
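
The translation procedures suggested above can be illustrated in Python, assuming its str type plays the role of a processor's internal CHARACTER kind and its built-in codecs stand in for the proposed conversions. This is only a sketch under that assumption: to_external and from_external are hypothetical names, not proposed Fortran intrinsics, and the codec names pin big-endian byte order with no byte-order mark. The proposed FORM='UTF-16' specifier itself is not modeled.

```python
# Python codec names for the three encoding forms (big-endian, no byte-order mark).
ENCODINGS = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}


def to_external(text: str, form: str) -> bytes:
    """Translate internal character data to one external encoding form."""
    return text.encode(ENCODINGS[form])


def from_external(data: bytes, form: str) -> str:
    """Translate externally encoded bytes back to internal character data."""
    return data.decode(ENCODINGS[form])


# German, Greek, and a character outside the 16-bit range (U+1D11E).
sample = "Gr\u00fc\u00dfe \u03a9 \U0001D11E"

for form in ENCODINGS:
    data = to_external(sample, form)
    assert from_external(data, form) == sample   # lossless round trip
    print(form, len(data), "bytes")
```

The same nine-character string occupies 15 bytes in UTF-8, 20 in UTF-16, and 36 in UTF-32, yet comes back unchanged from each, which illustrates how the external encoding can be chosen independently of the internal representation.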
>
>2. Other languages are beginning to allow international characters in
>   identifiers (names). I can see their merit for codes that are intended
>   never to leave the host culture or for private names within a module
>   that will always be maintained within the host culture.

Yes, Java, C++, and C99 provide that functionality, but the defined way of
providing it is complicated, difficult to use, and error prone.  The assumed
semantics, that every character code is unique whether or not the visual
representations are virtually identical, is particularly error prone, and is
not compatible with Fortran's caseless lexing.  Defining a lexical mapping
for ISO-10646 similar to what Fortran now does for ASCII probably requires
defining a mapping for every non-control character in ISO-10646.  I wouldn't
want to think about tackling that.
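
A short Python illustration of both problems, using the standard unicodedata module. caseless_key is a hypothetical identifier-folding rule (full Unicode case folding followed by NFC normalization), chosen only to show that even an aggressive caseless mapping keeps visually identical letters apart, and that case mappings outside ASCII do not always take one character to one character.

```python
import unicodedata


def caseless_key(name: str) -> str:
    """Hypothetical identifier key: full Unicode case folding, then NFC."""
    return unicodedata.normalize("NFC", name.casefold())


# Latin A, Greek Alpha and Cyrillic A: near-identical glyphs, distinct codes.
lookalikes = ["A", "\u0391", "\u0410"]
for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Even after case folding, the three "same-looking" names remain distinct.
print(len({caseless_key(c) for c in lookalikes}))   # 3

# Case mappings outside ASCII are not always one character to one character.
print("Stra\u00dfe".upper() == "STRASSE")            # True: the sharp s becomes "SS"
print(len("\u0130".lower()))                         # 2: dotted capital I becomes i + U+0307
```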

What might be doable would be providing a standard-defined mapping for the
Latin-1 and Latin-0 character sets.  I believe Ada does this, so we would
have an example to follow, and it would make Fortran more user friendly for
most of western Europe, Africa, Latin America, and a few Asian, East
European, and Oceanic countries.
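
A sketch of what a standard-defined caseless mapping for Latin-1 (ISO 8859-1) could look like, written in Python for illustration. latin1_fold and same_name are hypothetical names, and the table is simply Latin-1's own upper/lower-case letter pairs; Latin-0 (ISO 8859-15) would need a slightly different table, since it adds pairs such as Š/š and Œ/œ. Because ß and ÿ have no capital inside Latin-1, a fixed table like this treats STRASSE and straße as different names.

```python
def latin1_fold(ch: str) -> str:
    """Hypothetical rule: map one Latin-1 character to its lower-case partner."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError(f"{ch!r} is outside ISO 8859-1 (Latin-1)")
    # Capitals: ASCII A-Z and U+00C0..U+00DE, except U+00D7 MULTIPLICATION SIGN.
    # Each lower-case partner sits exactly 0x20 positions higher.
    if 0x41 <= code <= 0x5A or (0xC0 <= code <= 0xDE and code != 0xD7):
        return chr(code + 0x20)
    # Everything else maps to itself, including U+00DF and U+00FF,
    # whose capitals lie outside Latin-1.
    return ch


def same_name(a: str, b: str) -> bool:
    """Caseless identifier comparison under the Latin-1 folding above."""
    return "".join(map(latin1_fold, a)) == "".join(map(latin1_fold, b))


print(same_name("M\u00dcLLER_\u00c4", "m\u00fcller_\u00e4"))  # True
print(same_name("STRASSE", "stra\u00dfe"))                     # False
```

A table of this size can be written out in full in a standard, which is what makes the Latin-1 case tractable where a mapping for all of ISO-10646 is not.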
><snip>