From zongaro@VNET.IBM.COM  Thu Dec 19 00:10:05 1996
Received: from VNET.IBM.COM (vnet.ibm.com [199.171.26.4]) by dkuug.dk (8.6.12/8.6.12) with SMTP id AAA07177 for <sc22wg5@dkuug.dk>; Thu, 19 Dec 1996 00:09:45 +0100
Received: from TOROLAB by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 7390;
   Wed, 18 Dec 96 18:09:32 EST
Received: by TOROLAB (XAGENTA 4.0) id 3034; Wed, 18 Dec 1996 18:08:32 -0500 
Received: by twinpeaks.torolab.ibm.com (AIX 4.1/UCB 5.64/4.03)
          id AA14284; Wed, 18 Dec 1996 18:09:04 -0500
From: <zongaro@VNET.IBM.COM> (Henry Zongaro)
Message-Id: <9612182309.AA14284@twinpeaks.torolab.ibm.com>
Subject: Re: (SC22WG5.1211) *** N1235 / 97-107 ***
To: hennecke@rz.uni-karlsruhe.de
Date: Wed, 18 Dec 1996 18:09:02 -0500 (EST)
Cc: sc22wg5@dkuug.dk, sc22wg5-interop@ncsa.uiuc.edu
In-Reply-To: <199612040023.BAA02979@dkuug.dk> from "Michael Hennecke" at Dec 3, 96 11:04:57 pm
X-Mailer: ELM [version 2.4 PL24alpha3]
Content-Type: text
Content-Length: 19357

Hi Michael,

     Now that I have five minutes away from looking at XL HPF APARs you've
submitted ;-), I've decided to try to give some comments on your note.  My
comments are interspersed in your original posting.


>   "Interoperability with C and Binding to POSIX.1"
>
> which reviews the question of binding to POSIX.1 by means of the
> Interoperability TR.  I discovered several problems with the
> current interoperability approach.  Could WG5 and X3J3 members
> please review this document, and send me as many answers /
> suggestions as possible to the following questions?
>
>  1. Is it still the intent to interface to system routines
>     by using the TR-C features, as expressed in N1131?
>     Does this mean support for the POSIX.1 C interface?

     Personally, I never thought this was absolutely required - something that
can help bridge the gap between the two worlds in a portable way should be the
minimum requirement.  That way, a user might need some stub routines written in
C to be able to get from Fortran to the POSIX.1 C interface, but these stubs
would work on all POSIX.1 compliant systems.

     These stubs would do the dirty work of handling things like NULL pointers,
or converting a C pointer function result into something that the Fortran side
is able to deal with.  This is a lot like the way people have to work today,
except that the piece of code required on the Fortran side to call C is not
portable.
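
     To make this concrete, here is a minimal sketch of such a stub for
getenv().  The name stub_getenv, the caller-supplied buffer and the status
codes are my own assumptions for illustration, not anything proposed in the
TR.  The stub hides the pointer result and the possible null pointer behind
an interface that only needs a character buffer and integers.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stub: copy the value of environment variable `name` into a
 * caller-supplied buffer of `buflen` bytes.
 * Returns 0 on success, 1 if the variable is not set (getenv() returned a
 * null pointer), and 2 if the buffer is too small to hold the value.
 */
int stub_getenv(const char *name, char *buf, size_t buflen)
{
    const char *value = getenv(name);   /* may be a null pointer */
    size_t len;

    if (value == NULL)
        return 1;

    len = strlen(value);
    if (len + 1 > buflen)
        return 2;

    memcpy(buf, value, len + 1);        /* includes the terminating '\0' */
    return 0;
}
```

     The Fortran side would then only see a status code and a buffer, and
never the char * itself.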

>  2. Are there any ideas how to implement a better way to
>     deal with dummy arguments of C pointer type?????

     One possibility (and this is a *huge* change) would be to use OPTIONAL
arguments.  (This is an idea that's just off the top of my head, so you don't
have to take it seriously.)  If an actual argument that is PASSBY("*") is not
present, the compiler could pass a NULL pointer of the appropriate type to the
C procedure.

     I personally don't like the idea, but it's a quick and dirty way of
getting to C for those people who don't like the idea of using stubs.
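
     Written out by hand as a C stub, the effect would be roughly the
following sketch.  The routine c_query, the stub stub_c_query and the
`present` flag (standing in for Fortran's PRESENT) are all made up for
illustration.

```c
#include <stddef.h>

static int counter = 0;

/*
 * Hypothetical C routine with documented null-pointer behaviour:
 * given a null pointer it only reports the counter; otherwise it
 * first adds *i to the counter.
 */
int c_query(int *i)
{
    if (i != NULL)
        counter += *i;
    return counter;
}

/*
 * Hypothetical stub: `present` plays the role of PRESENT(j) on the
 * Fortran side.  When the argument is absent, a null pointer is
 * passed on instead of the address of j.
 */
int stub_c_query(int present, int *j)
{
    return c_query(present ? j : NULL);
}
```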

>  3. Apart from the problem of null pointers etc, are there any objections
>     (esp. implementation issues) against allowing PASSBY("*") for function
>     result variables, with the understanding that the Fortran compiler
>     receives a pointer result from the C function, de-references it, and
>     copies that target to the Fortran variable/temporary?

     One concern I have is with what the C side is expecting to have happen
to this thing that it's returning.  It might be returning a pointer to storage
that was malloc'ed, and there might be an expectation on the calling side to
free() the storage.  This would work fine if the Fortran side proceeded to
free() the storage.  But what if the calling side was expected to call a
cleanup procedure after it had finished with the storage?  This cleanup
procedure might be responsible for calling free() itself.  The pointer that was
returned by the initial C function would already have been freed by code that
the Fortran compiler generated, which could cause serious problems.  A third
possibility is that the pointer returned is not associated with malloc'ed
memory, but with memory that is statically defined by the C function, and it
would not even be eligible for release by a call to free().

     The Fortran compiler could always take no responsibility for freeing the
memory associated with the pointer; this could of course lead to memory leaks.

     Having the user write a stub that worries about the particular situation
that applies in their particular case seems preferable to having the Fortran
compiler get it right 1/3 of the time.  The other alternative would be to
introduce all of the semantics and functionality of C pointers into Fortran,
which is something WG5 has said it does not want to do.
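
     The three situations can be sketched as follows.  All of the names here
(make_name, open_name, close_name, static_name and the stubs) are
hypothetical.  The point is only that each stub encodes the ownership rule
for its own case, which a compiler cannot deduce from the return type alone.

```c
#include <stdlib.h>
#include <string.h>

/* Case 1: the result is malloc'ed and the caller is expected to free() it. */
char *make_name(void)
{
    char *p = malloc(6);
    if (p != NULL)
        strcpy(p, "alpha");
    return p;
}

/* Case 2: the result is malloc'ed, but must be released through a
 * cleanup routine, which calls free() itself. */
static int live_handles = 0;

char *open_name(void)
{
    char *p = malloc(5);
    if (p != NULL) {
        strcpy(p, "beta");
        live_handles++;
    }
    return p;
}

void close_name(char *p)
{
    if (p != NULL) {
        free(p);
        live_handles--;
    }
}

/* Case 3: the result points to static storage and must never be freed. */
const char *static_name(void)
{
    static const char name[] = "gamma";
    return name;
}

/* Copy src into buf, truncating if needed; returns 1 for a null pointer. */
static int copy_out(const char *src, char *buf, size_t buflen)
{
    size_t len;

    if (src == NULL || buflen == 0)
        return 1;
    len = strlen(src);
    if (len >= buflen)
        len = buflen - 1;
    memcpy(buf, src, len);
    buf[len] = '\0';
    return 0;
}

int stub_make_name(char *buf, size_t buflen)
{
    char *p = make_name();
    int rc = copy_out(p, buf, buflen);
    free(p);                /* case 1: the caller frees; free(NULL) is a no-op */
    return rc;
}

int stub_open_name(char *buf, size_t buflen)
{
    char *p = open_name();
    int rc = copy_out(p, buf, buflen);
    close_name(p);          /* case 2: only the cleanup routine frees */
    return rc;
}

int stub_static_name(char *buf, size_t buflen)
{
    return copy_out(static_name(), buf, buflen);   /* case 3: nothing to free */
}
```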

>  4. Lacking a good general solution to the C pointer problem, is there
>     any objection against providing a derived type for the special case
>     #char *#, which I called TYPE(C_CHAR_PTR), plus ASSIGNMENT(=) from
>     TYPE(C_CHAR_PTR) to CHARACTER? What about TYPE(C_CHAR_PTR_PTR)?
>
> I have tried hard to find a solution to deal with C pointers, but up to now
> my thoughts haven't been quite right :-(   Any help/ideas appreciated...
>
> Thanks,
> Michael
>
>  ======================================================================
>   Michael Hennecke      http://www.uni-karlsruhe.de/~Michael.Hennecke/
>  ----------------------------------------------------------------------
>   University of Karlsruhe         RFC822: hennecke@rz.uni-karlsruhe.de
>   Computing Center (G20.21 R210)               No longer on BITNET :-(
>   Zirkel 2  *  P.O. Box 69 80                 Phone: +49 721  608-4862
>   D-76128  Karlsruhe                               Fax: +49 721  32550
>  ======================================================================
>
> 
>                                            ISO/IEC JTC1/SC22/WG5/N1235
>                                                              X3J3/97-107
>                                                              Page 1 of 5
>
>                                                               1996-12-03
>
> From:       Michael Hennecke
> To:         WG5, X3J3
> Subject:    Interoperability with C and Binding to POSIX.1
> References: ISO/IEC 9945-1:1990 (IEEE Std 1003.1-1990)
>             WG5/N1178 (X3J3/96-069), WG5/N1229 (X3J3/96-153), WG5/N1131
>             HPF Language Specification 2.0.delta (1996-10-19)
>
> I have now finished my first reading of the POSIX.1 standard, which
> specifies the System Application Programming Interface (API) of POSIX
> for the C language. As the request for subdivision for the Technical
> Report on Interoperability with C, N1131, expresses WG5's intent to
> be able to interface to system routines using this TR, I felt a review
> of that standard in the light of the current interoperability approach
> would be useful. This revealed a number of outstanding problems, some
> of them of a very fundamental nature. They are summarized below.
>
>
> 1. Arguments passed by address (C pointers), and null pointers
>
> Up to now, WG5 has taken the approach (or at least considered it) of
> passing scalar dummy arguments inside a BIND(C) interface
> **by value**, and of defining a BYVAL/BYREF attribute for dummy arguments
> to switch to passing **by reference**. The HPF-2 attribute
> PASSBY("VAL") or PASSBY("*") does exactly this inside an EXTRINSIC(C)
> interface, and this functionality might be taken over by the WG5 TR.
> The semantics are that for a C function
>
>   extern void c_func ( int *i );
>
> and a Fortran interface (different function/argument names for clarity)
>
>   INTERFACE
>     BIND(C, NAME="c_func") SUBROUTINE f_func ( j )
>       USE ISO_C ; INTEGER(c_int), PASSBY("*") :: j
>     END SUBROUTINE f_func
>   END INTERFACE
>
> a Fortran call like CALL F_FUNC(K) will cause the compiler (not the
> user) to get the address of the Fortran actual argument K, and pass
> that address to #c_func#. This avoids putting C pointers in the Fortran
> user's hands. It works fine as long as #c_func# only expects #i# to
> point to a location which holds an #int# value (and only modifies the
> value of #*i#, not #i# itself). BUT:
>
>  * It may well happen that #c_func# also modifies the value of #i#
>    itself, because in C there are two possible uses of such a function:
>
>      int f;                  int *g;
>      (void) c_func(&f);      (void) c_func(g);    /* these are CALLs */
>
>    The call using the address #&f# of #f# is what the PASSBY("*") with
>    automatic address-of-actual-arg semantics supports, but not the call
>    with #g#. C functions dealing with the more general call with a
>    pointer #g# may modify the dummy #i# itself, e.g. store any address
>    in #i# instead of just modifying its target. This will break the
>    Fortran program (at best).

     You noted in a later note that there wasn't a problem with the C function
modifying g.  However, this is something that might at least be worth noting in
the TR - the fact that a C procedure can modify a formal argument without
affecting the actual argument that was passed.  This is something that Fortran
cannot do, so the difference in semantics might cause mild confusion.
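
     A small sketch of the C behaviour in question, reusing the name c_func
from your example (the body of c_func and the helper
caller_pointer_unchanged are assumptions of mine, purely for illustration):

```c
static int other = 99;

/* Modifies the target of i, then redirects the formal argument i itself. */
void c_func(int *i)
{
    *i = 42;        /* visible to the caller: the target changes */
    i = &other;     /* local only: i is a copy of the caller's pointer */
    *i = 7;         /* writes to `other`, not to the caller's variable */
}

/* Returns 1 if the caller's pointer and target are as expected afterwards. */
int caller_pointer_unchanged(void)
{
    int f = 0;
    int *g = &f;

    c_func(g);
    return g == &f && f == 42 && other == 7;
}
```

     The pointer g in the caller still designates f after the call, because
C passed the pointer itself by value.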

>  * It may also happen that the function has a different (but documented
>    and important-to-have) behavior when it is passed a null pointer
>    (like C's NULL) instead of the address of an #int# variable.
>    This cannot be supported by the PASSBY("*") mechanism combined with
>    automatic address-of-actual-arg semantics, because the compiler
>    **always** passes the address of a Fortran actual argument -- which
>    is different from C's NULL.
>
> A number of POSIX.1 functions make use of this special case of passing
> a null pointer as actual argument. Examples include:
>
>  * The #argv# and #envp# arguments to the <exec> family of functions.
>  * The #const struct utimbuf *times# argument of #utime#.
>  * The #struct sigaction *# arguments to #sigaction()#.
>  * #ctermid# may also have a NULL argument.
>
> So the handling of dummy arguments of C pointer types should be
> reconsidered. If a PASSBY("*") approach with automatic
> address-of-actual-arg by the compiler is taken, many C functions will
> not be callable from within Fortran (or at least strange results may
> occur if they are called).
>
>         ********************************************************
>         ***  This is a very fundamental design problem,      ***
>         ***  which should be addressed as soon as possible!  ***
>         ********************************************************
>
>
> 2. Function result values which are C pointers
>
> A significant portion of the POSIX.1 functions return a C pointer.
> Examples of such functions and their return types are:
>
>   * The #getlogin#, #getenv#, #ctermid#, #getcwd# and #setlocale#
>     functions have a return type of #char *#.
>   * #readdir# has return type #struct dirent *#, and #opendir# has
>     return type #DIR *#, where #DIR# is some typedef-ed type.
>   * #getgrgid# and #getgrnam# have return type #struct group *#.
>   * #getpwuid# and #getpwnam# have return type #struct passwd *#.
>
> In order to support binding to these functions, some mechanism to
> handle C pointers as result types of functions seems to be inevitable.
> At first sight, the extension of something like the HPF-2 PASSBY("*")
> spec to function result variables seems natural.
> But a very severe problem with such an approach is that some of these
> functions may return a null pointer (notably #getlogin# and #getenv#)
> under some conditions, which would (at best) break the program when
> the Fortran compiler does an automatic de-referencing of that result
> on return from the C function. This is essentially the same problem
> as in topic (1).
>
> Even if this problem is ignored, dealing with pointer function results
> would either imply copying of data from the de-referenced C result
> into the Fortran variable (or temporary if the function reference is
> in an expression), or require providing C-pointer datatypes to the
> Fortran programmer.
> Neither HPF-2 nor the ISO TR currently provides such functionality.
>
>
> 3. Structures defined in POSIX.1
>
> A variety of #struct# derived types is defined by POSIX.1. However, for
> all of these structures only the names of the types and their required
> components are specified. POSIX.1 does not define the actual order of
> these components in the #struct#, nor does it require that the
> specified components are the only components present in an actual
> implementation. In reality, there are sometimes many more components
> since vendors include their own extensions in these structures.
>
>   NOTE:
>   This is another argument against the MAP_TO approach to
>   interoperability: Since the actual contents of such a #struct# are
>   not standardized, it is not possible to specify a portable MAP_TO
>   for it in an interface block residing in application programs.
>
> The structures defined by POSIX.1 are:
>
>   struct dirent           struct flock
>   struct group            struct lconv
>   struct passwd           struct sigaction
>   struct stat             struct tms
>   struct utsname          struct termios
>   struct tm               struct utimbuf
>
> Most of the components are intrinsic or primitive system data types
> (see topic 7), some are character arrays of implementation-dependent
> (but fixed?) size. These can all be modeled by using a BIND(C) spec
> inside a Fortran derived type definition. But some exceptions are
> important (and difficult):
>
> * A #char*# component which does not hold the actual character data,
>   but only points to its location (possibly in system memory rather
>   than in user memory) is contained in #group# and #passwd#:
>
>     group.gr_name
>     passwd.pw_name
>     passwd.pw_dir
>     passwd.pw_shell
>
> * The list of group members is attached to the #group# structure by a
>   #char**# component. This points to a NULL-terminated list, again
>   possibly in system memory and possibly static.
>
> * A function pointer is contained in #sigaction#:
>
>     sigaction.sa_handler  void(*)()
>
>   It is perhaps not necessary to access this component directly.
>   But users must be able to declare objects of this structure type,
>   so at least some kind of dummy field (of suitable size) must be
>   declared instead of the function pointer when binding to such
>   structures.
>
> A derived type TYPE(C_CHAR_PTR) seems to be necessary to be able to
> bind to the #group# and #passwd# structures, as well as a means to
> read the string it points to into a Fortran CHARACTER object.
>
>   NOTE:
>   Maybe also the reverse functionality of storing the
>   "address-of a Fortran CHARACTER object" in a C_CHAR_PTR.
>
> Additionally, a TYPE(C_CHAR_PTR_PTR) including increment/decrement and
> de-reference operations may be useful. The former would allow moving
> through such a pointer list, the latter results in a C_CHAR_PTR which
> can then be accessed as above.
>
>   NOTE:
>   These problems are different from the problem of argument association
>   (by value, by address): they occur in a structure component, and a
>   data type for these pointer components must be provided to the
>   application programmer in order to be able to bind to this API.
>   Automatic handling of the address-of and de-reference operations by
>   the compiler is not possible here. The same holds for global
>   variables (see topic 4).
>
> Dealing with function pointers is not possible with F95 facilities
> since Fortran up to now does not support procedure variables/pointers
> to procedures. F2000 developments in this area may be incorporated
> when integrating the TR into IS 1539-1, but this is out of the scope of
> the TR itself.
>
>
> 4. Global variables
>
> POSIX.1 uses at least three external variables:
>
>   extern int errno;
>   extern char **environ;
>   extern char *tzname[2];
>
> Note that POSIX.1 is more restrictive than ISO C because it requires
> #errno# to be an external variable: the C standard also allows #errno#
> to be a macro. It is necessary that applications can check the value of
> #errno# directly; this shows the need for a means to bind to extern
> data objects, not only to extern procedures.
> Section 3.4 of N1178 (96-069) will be enhanced to support this
> requirement by a module variable approach.
>
> If #environ# were to be accessed directly from Fortran, a corresponding
> datatype would be required, as well as operations to de-reference that
> pointer. This may not be critical, since the same information may be
> accessed by the #getenv()# function. But that same facility is also
> necessary to access some structure components, see above.

     I don't like the idea of getting involved with errno.  Would Fortran have
to specify what the value of errno is after various intrinsic function
references, for instance?
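
     One way to sidestep the question, in the spirit of the stub approach,
might be to have the C stub capture errno immediately after the call and
hand it back as an ordinary integer result.  The following is only a sketch
under that assumption; stub_strtol is a made-up name, and strtol() is used
merely because it sets errno in a well-defined way.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stub: parse `text` as a base-10 long and store it in *value.
 * Returns 0 on success, or the errno value set by strtol() (for example
 * ERANGE when the number does not fit in a long).
 */
int stub_strtol(const char *text, long *value)
{
    errno = 0;                          /* strtol() only sets errno on failure */
    *value = strtol(text, NULL, 10);
    return errno;                       /* captured before anything else runs */
}
```

     The Fortran side would then never need to look at errno itself, so
nothing would have to be said about its value after intrinsic references.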

> 5. Underscore as the first character in identifiers
>
> POSIX.1 defines one function, #_exit#, and a number of numerical limits
> and other symbolic constants which have an underscore character as
> their first character (e.g. all symbols starting with #_POSIX#).
> This is not allowed in Fortran, but is not a severe restriction since
> a Fortran binding to POSIX.1 may establish naming conventions that
> circumvent the leading underscore.

     This should be handled by the name binding you've already proposed.  I
don't think there should be a problem with leading underscores.

> 6. Unsigned integers
>
> Some POSIX.1 functions have unsigned integer arguments or result types,
> notably the #alarm# and #sleep# functions. Some of the baud rate
> functions for terminal control have return type #speed_t#, which is
> also an unsigned integral type.
> This is a minor difficulty: these types may be mapped to their
> corresponding signed types, leaving the interpretation of "negative"
> values implementation dependent.
>
>
> 7. Type name aliases for primitive system data types
>
> For portability, POSIX.1 defines a number of so-called primitive system
> data types. They are all #typedef#s to arithmetic types and include:
>
>   dev_t    gid_t    ino_t    mode_t
>   nlink_t  off_t    pid_t    size_t
>   ssize_t  time_t   uid_t
>
> The <type-alias-stmt> of N1178 (96-069) is necessary and sufficient to
> establish corresponding derived type names for a Fortran binding.
>
>   NOTE:
>   Working with the original intrinsic types for which these datatypes
>   are aliases would be possible, but would sacrifice source code
>   portability across platforms. This is a key issue of POSIX.1.
>
>
> 8. Varying length argument lists
>
> There are several POSIX.1 functions which include an <ellipsis> in
> their argument list, using the features of #<stdarg.h>#. These include
> the three members #execl#, #execle# and #execlp# of the <exec> family
> of functions, and the #open# and #fcntl# functions.
>
> The features specified in N1229 (96-153), improved along the lines
> of X3J3's comments from meeting 139, should be sufficient to provide
> an interface to these functions.

Thanks,

Henry