From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Mon Mar 24 14:30:31 2014
Date: 24 Mar 2014 13:30:27 +0000
From: "N.M. Maclaren" <nmm1@cam.ac.uk>
To: John Reid <John.Reid@stfc.ac.uk>
Cc: WG5 <sc22wg5@open-std.org>
Subject: Re: [ukfortran] (SC22WG5.5192) Ballot on draft DTS
Message-ID: <Prayer.1.3.5.1403241330271.4769@hermes-1.csi.cam.ac.uk>
In-Reply-To: <20140312154430.1B7899EB083@www.open-std.org>
References: <20140312154430.1B7899EB083@www.open-std.org>

This is a WG5 letter ballot on N2007, the fourth draft DTS for TS 18508,
Additional Parallel Features in Fortran.

Please answer the following question "Is N2007 ready for forwarding to 
SC22 as the DTS?" in one of these ways. 


3) No, for the following reasons.


Regards,
Nick Maclaren.



Because this is a long response, the most serious and difficult points
are:

Events.2) This TS introduces a facility that is arguably in conflict
with Fortran 2008 page 23:1-2, and is not described as processor
dependent.  It is not clear how to fix it, and it should be removed.
At the very least, it should be explicitly specified to be processor-
dependent.

Collectives.1) This TS introduces a conflict with Fortran 2008's concept
and specification of variable definition context.  This is definitely
fixable, but may need significant changes to the collectives.

General.1 and General.2) This TS should be explicitly removed from
consideration for inclusion in the next Fortran standard (see N1979),
because it will probably be infeasible to specify a data consistency
model by the deadline for the first CD ballot, and there is very
unlikely to be sufficient implementation and user experience.



EVENTS
------

Events.1) The wording in 8.7 page 32:26-27 still refers to EVENT POST
and EVENT WAIT being within segments.

8.7 page 32:26-27 should replace:

    A coarray that is of type EVENT TYPE may be referenced or defined
    during the execution of a segment that is unordered relative to
    the execution of another segment in which that coarray of type
    EVENT TYPE is defined.

by:
    A coarray that is of type EVENT TYPE may be referenced or defined
    during the execution of a segment that is unordered relative to the
    execution of the EVENT POST and EVENT WAIT statements by which that
    coarray is defined.


Events.2) Reinhold Bader's (6B) [lines 258-264] and my Comment I [lines
923-965] in N1999 have still not been addressed.  In addition, other
comments along the same lines have been made in the past by other
people.  As a consequence, it is still completely unclear whether the
statements in the example in 7.4.11 page 23:40-44 and 24:1-2 are
actually correct.

Indeed, in the absence of any progress guarantee, it is unclear whether
the call to EVENT_QUERY in example A.2.1 is permitted to return zero
every time, so that the example goes into an infinite loop irrespective
of whether any events have been posted.  If that is allowed, the example
would not have a defined interpretation, and would therefore not be a
standard-conforming program (Fortran 2008 page 23:1-2).  See also
Atomics.2 and General.1.

Unless those questions can be resolved, and I do not see how, I remain
of the opinion that EVENT_QUERY should be removed.  If not, its effect
and whether example A.2.1 will work at all should be explicitly stated
to be processor dependent.
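A minimal sketch of the kind of polling loop at issue, in the style of
example A.2.1 but not a copy of it, may make the concern concrete.  It
uses the syntax later adopted by Fortran 2018, which may differ in
detail from N2007, and the module and procedure names are hypothetical.
If the processor gives no progress guarantee, nothing forces a post
made on another image to become visible to EVENT_QUERY, so the loop in
CONSUMER_POLL might never exit.

```fortran
! Sketch only: hypothetical module illustrating a busy-wait on EVENT_QUERY.
module event_poll_demo
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: ready[*]   ! one event variable per image
contains
  ! Executed on some image to signal the image TARGET_IMAGE.
  subroutine producer(target_image)
    integer, intent(in) :: target_image
    event post (ready[target_image])
  end subroutine producer

  ! Busy-waits until at least one post is observed, then consumes it.
  ! POLLS counts the calls to EVENT_QUERY.  Without a progress
  ! guarantee, N_POSTED could stay zero for a post made on another
  ! image, and this loop would never exit.
  subroutine consumer_poll(polls)
    integer, intent(out) :: polls
    integer :: n_posted
    polls = 0
    do
      call event_query(ready, n_posted)
      polls = polls + 1
      if (n_posted > 0) exit
    end do
    event wait (ready)
  end subroutine consumer_poll
end module event_poll_demo
```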


TEAMS
-----

Teams.1) The clarification that variables of type TEAM_TYPE have value
semantics is welcome, but the TS also needs to state whether the values
can be assigned between images and used on an image other than the one on
which they were defined.  It doesn't matter whether they can or cannot,
but it should be explicitly stated which.


Teams.2) It is strange to provide intrinsics like GET_TEAM and TEAM_ID
yet provide no way to test whether two team values are the same.
Equality and inequality comparison should be supported for TEAM_TYPE.

After 5.2 page 9:30+ add:

    The module defines the following:

        The elemental operator == for two values of type TEAM_TYPE to
        return true if the values are the same and false otherwise.

        The elemental operator /= for two values of type TEAM_TYPE to
        return true if the values differ and false otherwise.
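A minimal sketch of what the proposed operators could look like, written
as if the processor implemented them in Fortran.  TEAM_HANDLE,
MAKE_TEAM_HANDLE and the SERIAL component are hypothetical stand-ins,
since the contents of TEAM_TYPE are processor dependent; the point is
only that "the values are the same" means "specify the same team",
which a processor can decide cheaply but a program cannot, because the
representation is opaque to it.

```fortran
! Sketch only: TEAM_HANDLE is a hypothetical stand-in for TEAM_TYPE,
! whose real representation is processor dependent and opaque.
module team_compare_sketch
  implicit none
  private
  public :: team_handle, make_team_handle, operator(==), operator(/=)

  type :: team_handle
    private
    ! Hypothetical: a serial number unique to one FORM TEAM execution
    ! and one team-id value of that execution.
    integer :: serial = 0
  end type team_handle

  interface operator(==)
    module procedure team_eq
  end interface

  interface operator(/=)
    module procedure team_ne
  end interface

contains

  function make_team_handle(serial) result(t)
    integer, intent(in) :: serial
    type(team_handle) :: t
    t%serial = serial
  end function make_team_handle

  ! True if both values specify the same team.
  elemental logical function team_eq(a, b)
    type(team_handle), intent(in) :: a, b
    team_eq = (a%serial == b%serial)
  end function team_eq

  ! True if the values specify different teams.
  elemental logical function team_ne(a, b)
    type(team_handle), intent(in) :: a, b
    team_ne = .not. team_eq(a, b)
  end function team_ne

end module team_compare_sketch
```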


Teams.3) The current wording for CHANGE TEAM still contains some
implications of association semantics, which forbids VALUE and intrinsic
copying.  It has also been extended since N1996 to allow changing to the
initial team, which is not in accordance with N1981, and I assume is an
accident.

5.3 page 10:18-21 should replace:

    The <team-variable> shall have been defined by execution of a FORM
    TEAM statement in the team that executes the CHANGE TEAM statement
    or be the value of a team variable for the initial team. The values
    of the <team-variable>s on the images of the team shall be those
    defined by execution of the same FORM TEAM statement on all the
    images of the team.

by:

    The values of <team-variable> shall have been established by
    execution of a FORM TEAM statement in the team that executes the
    CHANGE TEAM statement.  The values of the <team-variable>s on all
    of the images of the team shall have been established by execution
    of the same FORM TEAM statement.


Teams.5) The wording of SYNC TEAM seems to allow mixing values created
by FORM TEAM with the initial ones, and is ambiguous about exactly which
team it is collective over.  It also refers to "the team variables for
the initial team", which doesn't make sense as there will usually be no
such variables.

5.6 page 12:21-25 should replace:

    The SYNC TEAM statement is an image control statement.  The value of
    <team-variable> shall have been established by execution of a FORM
    TEAM statement by the current team or an ancestor of the current
    team, or be the value of a team variable for the initial team.  The
    values of the <team-variable>s on the images of the team shall be
    those defined by execution of the same FORM TEAM statement on all
    the images of the team or shall be the values of the team variables
    for the initial team.

by:

    The SYNC TEAM statement is an image control statement.  The value of
    <team-variable> shall have been established by execution of a FORM
    TEAM statement by the current team or an ancestor of the current
    team, or shall specify the initial team.  The values of the
    <team-variable>s on the images of the team specified by the
    <team-variable> shall all specify the same team.


Teams.6) The example in 7.4.15 page 26:8-19 still asserts more than is
in the normative text.  The whole reason for adding NEW_INDEX to FORM
TEAM was to allow control over the image ordering but leave the default
ordering processor dependent.  This is not a mere nitpick - examples
like this are a major cause of erroneous interpretations by
programmers.

7.4.15 page 26:15 should be replaced by:

     :  ! Code for half of the images in the current team

7.4.15 page 26:17 should be replaced by:

     :  ! Code for the other half of the images in the current team
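Where an example does need a particular ordering, it can ask for one
with NEW_INDEX.  The sketch below uses Fortran 2018 syntax (for
instance TEAM_NUMBER where N2007 has TEAM_ID), and its names are
hypothetical.  It puts the first half of the images in team 1 and the
rest in team 2, each in its original relative order, so the ordering
no longer depends on the processor.

```fortran
! Sketch only: split the current team into two halves with a
! program-specified (not processor-dependent) image ordering.
module halves_sketch
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
contains
  subroutine split_in_halves(new_me)
    integer, intent(out) :: new_me   ! image index in the new team
    type(team_type) :: half_team
    integer :: me, n, half, team_num, idx

    me = this_image()
    n = num_images()
    half = (n + 1) / 2
    if (me <= half) then
      team_num = 1
      idx = me            ! images 1..half keep their relative order
    else
      team_num = 2
      idx = me - half     ! images half+1..n become 1..n-half
    end if

    form team (team_num, half_team, new_index=idx)
    change team (half_team)
      new_me = this_image()   ! equals idx, by construction
      if (team_number() == 1) then
        continue              ! code for the first half of the images
      else
        continue              ! code for the second half of the images
      end if
    end team
  end subroutine split_in_halves
end module halves_sketch
```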


Teams.7) 5.5 page 12:7 refers to "the same team", but that is not
wholly clear.  A Note should be sufficient.  See below.


Teams.8) There is still nothing said about when resources may be
released, which is a sure recipe for some programmers assuming that they
will be and some implementors not doing it.  A simple Note would be
better than nothing.


The following Note addresses both points.  After 5.5 page 12:18+,
append:

    NOTE
    Each execution of the FORM TEAM statement establishes a new team
    for each value of <team-id>.  Separate executions or separate
    statements establish separate teams, even if the images and values
    of <team-id> are identical.  Programmers should not assume that
    FORM TEAM can be executed an indefinite number of times, though
    processors are encouraged to support that.
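The situation the Note addresses arises naturally in phase-structured
codes, where FORM TEAM sits inside a loop.  In the minimal sketch below
(Fortran 2018 syntax; names hypothetical), every iteration establishes
a new team even though the images and team number never change, and
N2007 says nothing about whether the previous iteration's team can be
released.

```fortran
! Sketch only: FORM TEAM executed once per phase of a computation.
module repeated_form_team
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
contains
  subroutine run_phases(nphases, last_team_number)
    integer, intent(in) :: nphases
    integer, intent(out) :: last_team_number
    type(team_type) :: pair_team
    integer :: phase

    last_team_number = 0
    do phase = 1, nphases
      ! Same images and same team number on every iteration, yet each
      ! execution establishes a new team.
      form team (mod(this_image() - 1, 2) + 1, pair_team)
      change team (pair_team)
        last_team_number = team_number()
      end team
      ! Whether this iteration's team is released before the next
      ! FORM TEAM is not specified.
    end do
  end subroutine run_phases
end module repeated_form_team
```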



ATOMICS
-------

Atomics.1) 7.2 and A.3.2 are truly massive improvements and, as far as I
can see, make the intent, requirements and non-requirements clear.
However, I think that there should be a paragraph saying explicitly that
the matter of progress is not addressed by the TS, and is processor
dependent.

A.3.2.2 page 46:9+, add:

    This includes the concept often called 'progress'.  A change to an
    atomic variable in a segment may become visible to another image
    'immediately', or may not until that image reaches a segment that
    is ordered after the segment in which it was changed.
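The proposed paragraph is aimed at code such as the following spin-wait
on an atomic variable, sketched with hypothetical names and Fortran
2018 intrinsics.  When another image sets FLAG, the waiting image might
see the change on its next ATOMIC_REF, or only after reaching a later
segment; since the loop executes no image control statement, whether
it terminates is exactly the progress question that, on this proposal,
is processor dependent.

```fortran
! Sketch only: spin-wait on an atomic flag set by another image.
module atomic_flag_demo
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  implicit none
  integer(atomic_int_kind) :: flag[*] = 0
contains
  ! Sets the flag on image TARGET_IMAGE.
  subroutine raise_flag(target_image)
    integer, intent(in) :: target_image
    call atomic_define(flag[target_image], 1_atomic_int_kind)
  end subroutine raise_flag

  ! Re-reads the local flag until it is nonzero; SPINS counts the reads.
  ! No image control statement is executed inside the loop, so a change
  ! made by another image need never become visible here unless the
  ! processor provides progress.
  subroutine spin_until_raised(spins)
    integer, intent(out) :: spins
    integer(atomic_int_kind) :: val
    spins = 0
    do
      call atomic_ref(val, flag)
      spins = spins + 1
      if (val /= 0) exit
    end do
  end subroutine spin_until_raised
end module atomic_flag_demo
```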


Atomics.2) In 7.2 page 17:20-21, the TS says that developing a formal
data consistency model is left until integration into the main standard,
but it is important to note that such a model is NOT a matter of the
atomics alone.  It will be much harder to specify the interactions
between the atomics, collectives, events and the existing execution
order model, as well as when a processor is required to make progress
(if ever).  See also Events.2 and General.1.


Collectives
-----------

Collectives.1) This is an old objection, where I was mollified by
people saying that intrinsics need not follow the same rules as
ordinary procedures.  However, on thinking it over, that is not an
adequate answer.

Variable definition context is strictly defined (i.e. not processor
dependent) in Fortran 2008 16.6.7 and elsewhere, and some constraints
(especially C539) require a compile-time diagnostic to be issued if a
read-only entity is used in one.  The question is when SOURCE
constitutes a variable definition context.  N2007 does not specify this, and
it is not one of the items listed in 16.6.7.

    a) If it always is, then collectives cannot be used on INTENT(IN)
arguments at all, which is very poor software engineering.

    b) If it never is, then we have defined a circumstance under which a
variable can become defined that is not a variable definition context,
which is worse.

    c) It cannot be dependent on whether it can be modified, as that
is not statically determinable, though that is all that can be deduced
from the current wording.

    d) It could be dependent on the presence of a RESULT argument.  That
introduces a new syntactic concept into the base standard, which is
dubiously within the scope of N1981, but is feasible.


Collectives.2) The specification introduces a 'gotcha' where code like
the following will cause data corruption:

    SUBROUTINE Collective (array, target_image)
        INTEGER :: array, target_image
        CALL CO_MAX(array,target_image)
    END SUBROUTINE Collective

While that is the user's mistake, the standard prevents any kind of
diagnostic, which is poor software engineering.
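Assuming, as the gotcha implies, that the second positional argument of
N2007's CO_MAX is RESULT rather than RESULT_IMAGE, TARGET_IMAGE above
receives the computed maximum instead of selecting an image, and
nothing can be diagnosed because both dummies are integer scalars with
no INTENT.  A defensive user-level sketch (module name hypothetical,
written against the Fortran 2018 CO_MAX, which takes A and then
RESULT_IMAGE) names the optional argument by keyword and gives
TARGET_IMAGE INTENT(IN), so a compiler could reject any association of
it with an argument that is defined.

```fortran
! Sketch only: name every optional argument of a collective by keyword,
! so that a positional slip cannot bind it to a different dummy.
module collective_keywords
  implicit none
contains
  subroutine collective(array, target_image)
    integer, intent(inout) :: array
    integer, intent(in) :: target_image
    ! RESULT_IMAGE= makes the intent explicit.  INTENT(IN) on
    ! TARGET_IMAGE means that, were it passed to an argument that is
    ! defined, the compiler could diagnose it.
    call co_max(array, result_image=target_image)
  end subroutine collective
end module collective_keywords
```

The keyword form is equally valid against the N2007 interface, since
RESULT_IMAGE is among the arguments marked as unchanged in the rewrite
of CO_MAX in this comment.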


Collectives.3) They are unnecessarily restrictive, in that there is no
good reason that the result should not be obtainable on an arbitrary
subset of images.


Collectives.4) While I can see why CO_BROADCAST is exceptional in having
solely an in-place form, there are circumstances under which a copying
form is wanted.  I agree that a foolish consistency is the hobgoblin of
little minds, but this is a case where there are good reasons to be
consistent, even if the extended function is rarely needed.


If this specification is preserved, at least the first issue should be
resolved in normative text, and preferably the others as well.

As an example of what I regard as a better approach, here is a rewrite
of CO_MAX.  The replicated wording for SOURCE and OBJECT could easily be
condensed, but I have not done so here.


7.4.7 CO_MAX (SOURCE [, RESULT, RESULT_IMAGE, STAT, ERRMSG]) or
      CO_MAX_IN_PLACE (OBJECT [, RESULT_IMAGE, STAT, ERRMSG])

Description. Compute elemental maximum value on the current team of
images.

Class. Collective subroutine.

Arguments.

SOURCE shall be of type integer, real, or character.  It is an
INTENT(IN) argument.  It shall have the same type and type parameters on
all images of the current team.  If it is a scalar, the computed value
is equal to the maximum value of SOURCE on all images of the current
team.  If it is an array it shall have the same shape on all images of
the current team and each element of the computed value is equal to the
maximum value of all the corresponding elements of SOURCE on the images
of the current team.

RESULT (optional) shall be of the same type, type parameters, and shape
as SOURCE. It is an INTENT(OUT) argument.

OBJECT shall be of type integer, real, or character.  It is an
INTENT(INOUT) argument.  It shall have the same type and type parameters
on all images of the current team.  If it is a scalar, the computed
value is equal to the maximum value of OBJECT on all images of the
current team.  If it is an array it shall have the same shape on all
images of the current team and each element of the computed value is
equal to the maximum value of all the corresponding elements of OBJECT
on the images of the current team.

[[[ Unchanged

RESULT_IMAGE (optional) shall be a scalar of type integer. It is an
INTENT(IN) argument.  If it is present, it shall be present on all
images of the current team, have the same value on all images of the
current team, and that value shall be the image index of an image of the
current team.

STAT (optional) shall be a scalar of type default integer. It is an
INTENT(OUT) argument.

ERRMSG (optional) shall be a scalar of type default character. It is an
INTENT(INOUT) argument.

]]]

For the purpose of matching the sequence of invocations of collective
subroutines [7.3 paragraph 1], CO_MAX and CO_MAX_IN_PLACE are the same
collective.

If RESULT_IMAGE is present, either OBJECT or RESULT shall be present on
the image specified by RESULT_IMAGE, and the computed value is assigned
to that argument.  If RESULT_IMAGE is not present, the computed value is
assigned to OBJECT or RESULT on all the images of the current team that
have either argument present.

[ Issues:

    1) Would it be better to say that any argument associated with
OBJECT or RESULT on images other than the one specified by RESULT_IMAGE
becomes undefined?  Or not used?  Or processor dependent?  Or simply
unspecified, as in N2007?  I have no strong view.

    2) If no image has either OBJECT or RESULT present, this is a waste of time,
but is perfectly well-defined.  That could easily be forbidden, though
I can't see why it need be.

]

[[[ Unchanged

The effect of the presence of the optional arguments STAT and ERRMSG is
described in 7.3.

Example. If the number of images in the current team is two and SOURCE
is the array [1, 5, 3] on one image and [4, 1, 6] on the other image,
the value of RESULT after executing the statement
CALL CO_MAX(SOURCE, RESULT) is [4, 5, 6] on both images.

]]]



GENERAL
-------

General.1) I remain extremely concerned about potential inconsistencies
and unimplementabilities in the specification, especially caused by
interactions between features.  In particular, several aspects
introduced since N1958/N1863 are beyond the domain of established
computer science - and, as far as I know, support for 'process' failure
has never been successfully standardised before, despite several
attempts over the past half-century.

The point here is not that this TS should be delayed until it has a data
consistency model, but that we cannot be certain that one is even
definable.  Half a century of experience in this area has taught us that
intuition is not a reliable guide, and 'obvious truths' are often false
when parallel data consistency is involved.  As an example in a much
more widely-used language than Fortran, Java has had threading support
and a memory model since about 1995, but that model was shown around
2000 to be badly flawed, and it was not replaced until 2004 (JSR 133).

    http://www.ibm.com/developerworks/library/j-jtp02244/index.html

In the case of the facilities in this TS, I can neither produce
examples of deadlock or causal loops, nor convince myself that they do
not exist.  However, I can convince myself that livelock and infinite
loops (see Events.2), in the case where the program's logic contains no
such flaws, are possible with the current specification of EVENT_QUERY.

I do not believe that developing a proper data consistency model, formal
or informal, will be possible by February 2015.  As mentioned in
Atomics.2 and Events.2, this is NOT a matter of the atomics alone, but
of the interactions between the atomics, collectives, events and the
existing execution order model, as well as when a processor is required
to make progress (if ever).

A proposal is made below.


General.2) I also remain extremely concerned about whether this
specification will be reliably and efficiently implementable, for a
sufficiently high proportion of user programs, except on systems where
the hardware or operating system has special support for it.  I accept
that people who understand the implementation issues will be able to
write portably efficient programs, and SOME users will do so, but
the question is how many.

A specific issue here is that Fortran is maintaining its position
largely because it can be used to write more efficient code than its
competitors.  If, in the eyes of the user community, it loses that edge,
it will disappear.  In particular, I cannot see how to implement it
portably, reliably and efficiently, using only the reliably portable
facilities of POSIX, MPI and TCP/IP.  These do not include MPI passive
one-sided communication or even MPI_THREAD_MULTIPLE, largely because
there is no one-sided synchronisation in POSIX.

My understanding is that there are, as yet, only two full and released
implementations of Fortran 2008 coarrays (Cray's and Intel's), and very
little user experience with the latter; IBM's may be imminent, but I
know of no others.  In all cases, the implementation works only with the
vendor's own MPI, as far as I know.  It seems very unlikely indeed that
there will be significant experience with implementing and using the
facilities in this TS before February 2015. 

A proposal is made below.


General.3) This comment is made mainly because it supports the previous
points in this section.

It is dubious that this TS meets requirement S1 in N1981 any longer, as
it is more than twice as long as, and much more complex than, the original
proposal (N1858 and N1863).  Events can be regarded as replacing
NOTIFY/QUERY and parallel I/O has been dropped, but major new,
interacting, features include the new atomics, non-hierarchical team
usage, and support for image failure.


As a consequence of the above comments, I believe that this TS should be
explicitly removed from consideration for inclusion in the next Fortran
standard (see N1979), because the first CD ballot of the next revision
of the standard is due in February 2015.

