From owner-sc22wg5  Wed Sep  4 22:06:17 2002
Message-Id: <200209042007.g84K7Be28561@math.jpl.nasa.gov>
To: SC22WG5@dkuug.dk
Subject: Re: John Reid's notes (SC22WG5.2544) SC22 meeting
Mime-Version: 1.0
Content-Type: text/plain
Date: Wed, 04 Sep 2002 13:07:11 -0700
From: Van Snyder <vsnyder@math.jpl.nasa.gov>


In (SC22WG5.2544), John Reid remarked that other languages are beginning
to allow international characters in identifiers.  I'm not convinced this
is possible in Fortran, as the lexer and parser sometimes don't know
where an identifier is until the end of a statement is reached, so it may
not be possible to know when to switch alphabets.  Even if it could be
allowed, care would be needed not to allow more than one character set
to be in use at one time.  The reason is that some characters that have
different encodings in different parts of ISO/IEC 10646 have the same
appearance.  Consider Latin B, Russian B (named "vuh") and Greek B (named
"veeta" -- digression:  we say "beta", but the Latin/English "b" sound is
written in Greek as mu pi).  If I write identifiers BBB (Latin, Latin,
Latin) and BBB (Latin, Greek, Russian), are they the same or different? 
If they're different, this represents an enormous opportunity to multiply
maintenance costs by a large factor, and leads one to suspect the
proposal originally arose in the International Brotherhood of Maintenance
Programmers.  It's bad enough allowing O and 0 in the same identifier.
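
To make the BBB example concrete, here is a minimal Python sketch (not
Fortran, and not a proposal for how a processor should behave).  The
helper name scripts_used is hypothetical, and taking the first word of
the ISO/IEC 10646 character name as the "alphabet" is a crude assumption
that only works for simple letters like these.

```python
import unicodedata

latin = "BBB"              # Latin, Latin, Latin
mixed = "B\u0392\u0412"    # Latin B, Greek capital beta, Cyrillic capital ve

def scripts_used(identifier):
    """Hypothetical helper: guess each character's alphabet from the
    first word of its character name (crude, for illustration only)."""
    return {unicodedata.name(ch).split()[0] for ch in identifier}

# The two identifiers look alike in many fonts but compare unequal.
print(latin == mixed)
print(sorted(scripts_used(latin)))
print(sorted(scripts_used(mixed)))

# A "one set at a time" rule could reject any identifier for which
# scripts_used() returns more than one name.
print(len(scripts_used(mixed)) > 1)
```

A check of this kind needs the whole identifier before it can decide,
which is the same difficulty noted above for the lexer.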

In any case, at this time, allowing characters outside the invariant set of
ISO/IEC 646 goes against the guidelines proposed in the fourth edition of
ISO/IEC JTC1/SC22 TR 10176 "Guidelines for the preparation of programming
language standards," which is the subject of a current DTR ballot (paper
JTC1 -- NOT WG5! -- N6815).  In 4.1.3.1.1, it says "As far as possible,
the language should be defined in terms only of the characters included
within ISO/IEC 646, avoiding use of any that are in national use
positions." It goes on to say "The guideline relates to the need for
international interchange of programs, and hence is based on the
principle of using a minimal set of characters which can be expected to
be common to all systems likely to use the programs."
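
As an illustration of what the guideline implies for program text, the
following Python sketch flags characters that fall outside the ISO/IEC
646 invariant set.  The function name is hypothetical, and the list of
twelve national use positions is written from memory of the standard, so
treat it as an assumption to be checked against ISO/IEC 646 itself.

```python
# Assumed national use positions of ISO/IEC 646 (shown as their IRV glyphs).
NATIONAL_USE = set("#$@[\\]^`{|}~")

def outside_invariant_set(text):
    """Hypothetical helper: return the distinct characters in text that
    are either national use positions or beyond the 7-bit graphic range."""
    return sorted({ch for ch in text
                   if ord(ch) > 0x7E or ch in NATIONAL_USE})

print(outside_invariant_set("x = a(i) + b*2"))   # nothing flagged
print(outside_invariant_set("a[i] = 1"))         # the brackets are flagged
```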

I also agree with Richard's reservations about UTF-16.  Many years ago,
CDC had a "6-12" encoding system.  There were very complicated rules to
compute the length of a character literal (actually a Hollerith literal
at that time).  It was a mess that I don't want to duplicate.  It is
remotely possible that strenuous pondering of the issue may find a way to
support it, but I, for one, don't want to hurt myself in trying. 
Besides, I have a long list of stuff I'd rather do.
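
The length problem can be seen in a short Python sketch.  It is only an
analogy for the CDC 6-12 situation, not a description of it; the helper
name utf16_units is hypothetical, and the second test character is simply
one chosen from outside the 16-bit range.

```python
def utf16_units(s):
    """Hypothetical helper: number of 16-bit units needed for s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2

one_unit = "B"             # fits in a single 16-bit unit
two_units = "\U0001D401"   # outside the 16-bit range, needs a surrogate pair

# Both are one character, but they occupy different amounts of storage,
# so the length of a literal depends on which characters it contains.
for ch in (one_unit, two_units):
    print(len(ch), utf16_units(ch))
```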

-- 
Van Snyder                    |  What fraction of Americans believe 
vsnyder@math.jpl.nasa.gov     |  Wrestling is real and NASA is fake?
Any alleged opinions are my own and have not been approved or disapproved
by JPL, CalTech, NASA, Sean O'Keefe, George Bush, the Pope, or anybody else.


