From LJM@SLACVM.BITNET Mon Sep  6 06:31:00 1993
Received: from vm.uni-c.dk by dkuug.dk with SMTP id AA10912
  (5.65c8/IDA-1.4.4j for <SC22WG5@dkuug.dk>); Mon, 6 Sep 1993 23:35:29 +0200
Message-Id: <199309062135.AA10912@dkuug.dk>
Received: from vm.uni-c.dk by vm.uni-c.dk (IBM VM SMTP V2R2) with BSMTP id 6540;
   Mon, 06 Sep 93 23:35:57 DNT
Received: from SLACVM.SLAC.STANFORD.EDU by vm.uni-c.dk (Mailer R2.07) with
 BSMTP id 2989; Mon, 06 Sep 93 23:35:56 DNT
Received: by SLACVM (Mailer R2.08 R208004) id 2163;
          Mon, 06 Sep 93 14:35:44 PST
Date: Mon, 06 Sep 1993   14:31 -0800 (PST)
From: "Len Moss"                                     <LJM%SLACVM@vm.uni-c.dk>
To: "SC22/WG5 Mailing List"                        <SC22WG5@dkuug.dk>
Subject: Re: (SC22WG5.403) Letter Ballot to move Varying Strings to DIS
X-Charset: ASCII
X-Char-Esc: 29

In-Reply-To: martin@ocfmail.ocf.llnl.gov -- 07/27/93 17:52


   The question is:

   Do you approve WG5-N939 for submission to the SC22 Secretariat for
   registration as a draft international standard (DIS)?

                      yes____                   no__X_

   If your vote is no, supply revisions to N939 that would change your
   vote to yes.

                        * * * * * * * * * *

I vote NO on the WG5 ballot to submit WG5-N939 to the SC22
Secretariat for registration as a draft international standard
(DIS).

My vote will change to YES if the following changes are made.

A.1 Global:
   The word "must" is currently used in a number of places (e.g.,
   line 10 of page 5, line 9 of page 7, etc.) where "shall"
   should be used instead.

A.2 Sections 2.3 and 2.4:
   There should be clear statements that the results for
   operators and the extended generic functions must be scalars.

A.3 Sections 2.3 and 2.3.1:
   Assignment should not be classified as an intrinsic operator.
   Doing so will not only confuse users, it will produce an
   ongoing interpretation nightmare.

A.4 Section 2.3.3:
   In the description of the result value, the phrase
   "<b>string_a</> stands in the indicated relation to
   <b>string_b</>" is, by itself, too vague.  I would prefer
   language as parallel as possible to the description of these
   operators in Fortran 90 itself (see the last few paragraphs of
   7.2.3); however, at the very least the description should make
   it clear that the inequality comparisons proceed a character
   at a time from left to right until a non-matching character is
   found.
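
   A minimal sketch of the left-to-right comparison described above,
   written in Python purely as an executable model.  It assumes, as in
   7.2.3 of Fortran 90, that the shorter operand is treated as if
   padded on the right with blanks, and it lets Python's code point
   order stand in for the processor collating sequence.  The name
   compare_strings is hypothetical and appears nowhere in N939.

```python
def compare_strings(string_a, string_b):
    """Return -1, 0 or 1 as string_a is less than, equal to or greater than string_b."""
    # Assumption: the shorter operand is padded on the right with blanks,
    # mirroring the Fortran 90 rule for fixed-length character operands.
    width = max(len(string_a), len(string_b))
    a = string_a.ljust(width)
    b = string_b.ljust(width)
    # Scan left to right; the first non-matching character decides the result.
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            return -1 if ch_a < ch_b else 1
    # No non-matching character was found, so the operands compare equal.
    return 0
```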

A.5 Section 2.4.6:
   The description of the result value for LEN_TRIM is
   significantly different from (though probably equivalent to)
   that for the LEN_TRIM function in Fortran 90 and should be
   changed to correspond more clearly to the latter (for example,
   change "position ... <b>string</>" to "number of characters
   remaining after any trailing blanks in <b>string</> are
   removed").

A.6 Sections 2.4.7 and 2.4.8:
   The descriptions of the result values for ADJUSTL and ADJUSTR
   are significantly different from (though probably equivalent
   to) those for the corresponding functions in Fortran 90 and
   should be changed to correspond more clearly to the latter
   (for example, for ADJUSTL change "contains ... non-blank" to
   "is the same as <b>string</> except that any leading blanks
   have been deleted and the same number of trailing blanks have
   been inserted").
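
   The Fortran 90 style wording can be read as the following Python
   model, a sketch only.  The function names mirror the intrinsics,
   and the result length is assumed to equal the argument length, as
   it does for the fixed-length functions.

```python
def adjustl(string):
    # Delete any leading blanks and insert the same number of trailing blanks.
    stripped = string.lstrip(" ")
    return stripped + " " * (len(string) - len(stripped))


def adjustr(string):
    # Delete any trailing blanks and insert the same number of leading blanks.
    stripped = string.rstrip(" ")
    return " " * (len(string) - len(stripped)) + stripped
```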

A.7 Section 2.4.10:
   The complete names of the four lexical comparison
   functions, LLT, LLE, LGE, and LGT, should occur explicitly
   somewhere in the standard proper (at present the initial L is
   factored out).  My preference is to handle these as four
   separate subsections as in Fortran 90; however, at a minimum
   the beginning of this section should be modified so that the
   four-line table contains the complete, three-character names
   of these functions.

   The description of the result value of these functions should
   be modified to clearly indicate that if both strings are of
   zero length, LLE and LGE return true while LLT and LGT return
   false.
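
   The zero-length case can be checked against a small Python model
   of the four functions.  This is a sketch under two assumptions:
   the operands contain only ASCII characters, so that Python's code
   point order coincides with the ASCII collating sequence, and the
   shorter operand is blank-padded on the right as for the Fortran 90
   functions.  The helper _pad is hypothetical.

```python
def _pad(string_a, string_b):
    # Assumption: extend the shorter operand on the right with blanks.
    width = max(len(string_a), len(string_b))
    return string_a.ljust(width), string_b.ljust(width)


def llt(string_a, string_b):
    a, b = _pad(string_a, string_b)
    return a < b


def lle(string_a, string_b):
    a, b = _pad(string_a, string_b)
    return a <= b


def lge(string_a, string_b):
    a, b = _pad(string_a, string_b)
    return a >= b


def lgt(string_a, string_b):
    a, b = _pad(string_a, string_b)
    return a > b
```

   With two zero-length operands, lle and lge yield true while llt
   and lgt yield false, which is the behaviour requested above.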

A.8 Section 2.6:
   A clear statement should be added that the unit numbers in I/O
   procedures are part of the same global unit number space used
   for the I/O statements in Fortran 90, and thus, if a program
   intermixes I/O statements and procedure calls with the same
   unit number, data is transferred across the same file
   connection.  Similarly, it should be made clear that the
   "default input unit" is the same unit specified by an asterisk
   in the READ statement, and the "default output unit" is the
   same unit specified by an asterisk in the WRITE statement.

   An explicit statement should also be added that all the other
   I/O statements (OPEN, CLOSE, INQUIRE, BACKSPACE, REWIND, and
   ENDFILE) may specify the same unit, and that the set of
   allowable specifier values in the OPEN statement for such a
   unit is exactly the same as that for any other formatted
   sequential file.

A.9 Section 2.6.1:
   In the forms of GET that include a "set" argument, an
   additional optional VARYING_STRING argument should be added to
   return the actual terminal character found (or a zero length
   string if the input terminates due to EOR or maxlen).

   The description of IOSTAT should be modified to indicate that
   a second negative value, distinct from that returned on
   end-of-file, is returned if an end-of-record condition occurs.

A.10 Sections 2.6.2 and 2.6.3:
   A clarification should be added to the description of both PUT
   and PUT_LINE indicating that the string argument may be of
   zero length (note that it is important to permit an empty
   string in calls to PUT_LINE in order to permit a record to be
   ended without transferring any more characters).

A.11 Section 2.7.1:
   The description of the result value for INSERT is very poorly
   worded and should be improved.  For example, the sentence,
   "The remainder of the result string is shifted to the right
   and enlarged as necessary", is quite confusing: the result
   string, after all, is presumably what the function returns
   after all such manipulations are done.  I suggest that the
   result value should simply be given as an expression involving
   the EXTRACT function and the concatenation operator.
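
   One way such an expression might look, modelled in Python rather
   than Fortran.  The clamping rules assumed here for EXTRACT (a start
   below 1 is treated as 1, a finish beyond the string length is
   treated as the length, and a finish below start gives a zero-length
   result) are assumptions about the intended semantics, not text
   taken from N939.

```python
def extract(string, start=1, finish=None):
    """Model of EXTRACT with 1-based, inclusive positions (assumed semantics)."""
    if finish is None:
        finish = len(string)
    start = max(start, 1)              # assumed: start < 1 is treated as 1
    finish = min(finish, len(string))  # assumed: finish > length is treated as length
    if finish < start:
        return ""                      # assumed: zero-length result
    return string[start - 1:finish]


def insert(string, start, substring):
    # The result is given directly as EXTRACT // substring // EXTRACT.
    return extract(string, 1, start - 1) + substring + extract(string, start)
```

   Under these assumptions insert("abcdef", 3, "XY") gives "abXYcdef",
   a start below 1 prepends the substring, and a start beyond the end
   of the string appends it, with no need to describe any shifting.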

A.12 Section 2.7.2:
   The third form of REPLACE provides pattern manipulation rather
   than basic substring manipulation and is out of place in this
   standard.  Although the functionality it provides may be
   useful, one can imagine a great many other equally useful
   routines that might be added; for example: word- rather than
   character-oriented analogs of INSERT, REPLACE, REMOVE and
   EXTRACT; functions to count the number of words or instances
   of a pattern within a string, or to find the character offset
   to the nth word or pattern instance; a grep-like pattern
   matcher; etc.  This form of REPLACE should be deleted.

A.13 Section 2.7.5:
   The fourth sentence of the description of the action of the
   SPLIT subroutine should be amended by adding "or if <b>set</>
   is of zero length" following "If no character from <b>set</>
   is found".

   When I examined the implementation of the SPLIT subroutine in
   Annex A, I discovered that, unlike all the other (function)
   procedures, SPLIT returns pointers into the input string
   rather than creating independent copies of the returned
   substrings.  In an implementation with garbage collection,
   this distinction would be critical to the user; however, the
   only indication of this difference in the standard itself is
   the absence of phrases like "returns a copy of the characters"
   in the description of the action of SPLIT.  This is not
   sufficiently precise and a more formal way should be found to
   specify when a returned VARYING_STRING entity is pointing to
   an independent string and when it is associated with other
   VARYING_STRING entities -- or at the very least, when this is
   processor dependent.

A.14 Missing from section 2.7:
   Some basic character manipulations peculiar to varying-length
   strings require fairly awkward expressions with the set of
   functions and operators provided in this standard: namely, the
   ability to pad a string to a specified length, either with
   leading or trailing blanks; and the ability to extract the
   rightmost characters of a string.

   While I agree with the editor that it would be inappropriate
   to add a LENGTH argument to the VAR_STR conversion function,
   the functionality of padding a string to a specified length is
   nevertheless frequently needed (for example, in constructing a
   table a column at a time).  Moreover, one often wants to pad
   with leading, rather than trailing blanks.  The ADJUSTL and
   ADJUSTR functions provide the analog of this for fixed-length
   strings, but they are inadequate by themselves for
   varying-length strings.

   Extracting the leftmost characters of a string can be done
   quite easily with the EXTRACT function, but the equivalent
   operation on the rightmost characters of a string requires a
   somewhat more awkward expression.  In both cases, if the goal
   is to produce a string of exactly the specified length, the
   expression becomes even more complicated.

   I suggest addressing both of these problems by adding two
   additional functions, LEFT and RIGHT.  Each function would
   take two arguments, a string (fixed or varying) and an integer
   length, and would return a string of exactly the specified
   length.  In the case of LEFT, the string would consist of a
   copy of the leftmost characters of the input string, truncated
   or padded on the right if necessary.  Similarly, the result
   from RIGHT would be the rightmost characters of the input
   string, truncated or padded on the left if necessary.
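
   The proposed semantics can be modelled in a few lines of Python.
   This is a sketch only: the names left and right follow the
   proposal, the pad character is assumed to be a blank, and the
   length argument is assumed to be non-negative.

```python
def left(string, length):
    # Leftmost characters, truncated or blank-padded on the right
    # so that the result has exactly `length` characters.
    return string[:length].ljust(length)


def right(string, length):
    # Rightmost characters, truncated or blank-padded on the left
    # so that the result has exactly `length` characters.
    first = max(len(string) - length, 0)
    return string[first:].rjust(length)
```

   For a table built a column at a time, left(name, 10) produces a
   left-justified field and right(name, 10) a right-justified one,
   each exactly ten characters wide.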

                        * * * * * * * * * *

I suggest making the following additional changes; however, a change
to my NO vote is not contingent upon doing so.

B.1 General Style:
   I feel very strongly that the requirements of this standard
   should be stated in language as close as possible to that of
   Fortran 90.  Since many of these requirements are extensions
   of similar facilities in Fortran 90, couching their
   descriptions in highly parallel language will help both users
   and implementers (and especially the maintainers of both
   standards) to keep in mind precisely where the similarities
   and differences lie.  The intimate connection between these
   two standards more than justifies such an approach, even if
   the result is sometimes more stilted or verbose than would be
   necessary in a stand-alone varying string standard.

   Some improvement has been made in this area since the last
   major draft, for example, in the description of the INDEX
   function.  However, I believe the descriptions of the
   following requirements could also benefit from such a rewrite:
   the concatenation operation (2.3.2); the comparison operations
   (2.3.3); the ADJUSTL and ADJUSTR functions (2.4.7 and 2.4.8);
   the REPEAT function (2.4.9); the lexical comparison functions,
   i.e., after dividing into four separate subsections (2.4.10);
   and the VERIFY function (2.4.13).

B.2 Lack of examples:
   Another issue of general style is the lack of examples in the
   text of the standard itself.  Nearly every procedure described
   in section 13 of Fortran 90 contains some examples -- the same
   should be done for this standard.

B.3 Section 2.7:
   I indicated above that the description of the INSERT procedure
   was unacceptably confusing and needed to be rewritten, and I
   suggested simply giving an expression for the result using the
   EXTRACT function and the concatenation operation.  I suggest
   similar rewrites for the first two forms of the REPLACE
   function, the REMOVE function, and the SPLIT subroutine.

B.4 Minor editorial points:
   In the second paragraph of section 1, the use of "However,"
   and "Nevertheless," at the beginning of two consecutive
   sentences is very awkward.

   In the first sentence of the third paragraph, "the name of the
   derived data type" should be changed to "the name of a derived
   data type" (this standard is not intended to preclude the use
   of other derived data types to represent varying length
   character strings).

   The last sentence of the fifth paragraph should be modified to
   make it clear that extensions to the varying string
   facilities provided in this standard must not conflict with
   the requirements of either this standard or Fortran 90; for
   example, by deleting the words "those defined in" and
   appending to the end of the sentence "or with ISO/IEC
   1539:1991".

   In the sixth paragraph, the words "a standard conforming
   module could be written" are confusing, since we usually use
   words like "written" for programs produced by end users, not
   implementers.  I suggest replacing this phrase with "the
   facilities described in this standard could be provided".

   In the second sentence of the last paragraph of section 1,
   after "which apply to fixed length character strings" add
   "(with the exception of substring selection)".

B.5 Annex B:
   The second example in Annex B is grossly inefficient and
   suggests incompetence on the part of the committees
   responsible for this standard.  It should either be made at
   least moderately efficient (e.g., by using insertion sort to
   maintain the vocabulary list and binary search to access it)
   or else it should simply be deleted.
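
   A sketch of the suggested approach, in Python rather than Fortran
   and not based on the actual Annex B code.  The vocabulary is
   assumed to be a list of words kept in sorted order, and the names
   add_word and contains are hypothetical.

```python
import bisect


def add_word(vocabulary, word):
    """Insert word into the sorted list if it is absent; return its index."""
    index = bisect.bisect_left(vocabulary, word)  # binary search for the position
    if index == len(vocabulary) or vocabulary[index] != word:
        vocabulary.insert(index, word)  # insertion step keeps the list sorted
    return index


def contains(vocabulary, word):
    """Binary search for word in the sorted list."""
    index = bisect.bisect_left(vocabulary, word)
    return index < len(vocabulary) and vocabulary[index] == word
```

   Each lookup then costs a number of comparisons logarithmic in the
   size of the vocabulary, rather than a scan of the whole list.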

--
Leonard J. Moss <ljm@slac.stanford.edu>   | My views don't necessarily
Stanford Linear Accelerator Center, MS 97 | reflect those of SLAC,
P.O. Box 4349; Stanford, CA  94309        | Stanford or the DOE
