From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Mon Sep 12 15:36:13 2011
Return-Path: <owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org>
X-Original-To: sc22wg5-dom8
Delivered-To: sc22wg5-dom8@www.open-std.org
Received: by www.open-std.org (Postfix, from userid 521)
	id 5EE0A3568E8; Mon, 12 Sep 2011 15:36:13 +0200 (CEST)
Delivered-To: sc22wg5@open-std.org
Received: from mk-filter-4-a-1.mail.uk.tiscali.com (mk-filter-4-a-1.mail.tiscali.co.uk [212.74.100.55])
	by www.open-std.org (Postfix) with ESMTP id 6B20C3568A3
	for <sc22wg5@open-std.org>; Mon, 12 Sep 2011 15:36:00 +0200 (CEST)
Received: from host-92-21-163-83.as13285.net (HELO [127.0.0.1]) ([92.21.163.83])
  by smtp.tiscali.co.uk with ESMTP; 12 Sep 2011 14:35:59 +0100
Message-ID: <4E6E0ABE.4050500@stfc.ac.uk>
Date: Mon, 12 Sep 2011 14:35:58 +0100
From: John Reid <John.Reid@stfc.ac.uk>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20110902 Firefox/6.0.2 SeaMonkey/2.3.3
MIME-Version: 1.0
To: WG5 <sc22wg5@open-std.org>
Subject: Coarray comments from Germany
Content-Type: multipart/mixed;
 boundary="------------070709000505040104050300"
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

This is a multi-part message in MIME format.
--------------070709000505040104050300
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

WG5,

Reinhold has sent me two coarray comments from Germany.

The first comes from Reinhold himself, who says "I've talked with Uwe 
Küster (HLRS), who has experience with the Cray implementation, as well 
as Tobias Burnus; both share my serious doubts that the technical 
content of N1858 is suitable (without a serious redesign effort)."

The second comes from Uwe, who wants an improved version of NOTIFY/QUERY 
to be very high on the priority list.

Cheers,

John.

--------------070709000505040104050300
Content-Type: text/plain;
 name="comment_coarray_TS.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="comment_coarray_TS.txt"

Answer to N1868
~~~~~~~~~~~~~~~

The question to be answered is: 

"Is the technical content of N1858 suitable for the TS on further coarray
features?"

My answer is: NO, with comments

[I'm aware that this is not one of the alternatives I was allowed to 
 choose among, but after much consideration I still think this is the 
 appropriate answer].


Reasons for the NO vote and comments: 

(1) WG5/N1858 Coarray TS draft (Long) was extracted from a Fortran 
    2008 draft standard, among other reasons, because its technical 
    content was considered at least partially controversial. In its 
    present form, too many issues and open questions remain, as is 
    indicated by the fact that a lot of additional suggestions were 
    made in 
    WG5/N1835 Requirements for TR of further coarray features (Reid), 
    as well as 
    WG5/N1856 Addition/Modification of CAF Features (Authors from 
    Rice University). 
    These also reflect new knowledge obtained from a number of years of 
    research, which should be taken into account when re-designing the 
    basic ideas for the coarray extensions. 

(2) While I agree that the workload on J3 may be an issue, and that the 
    number and complexity of the features dealt with by the TS should be 
    limited, based on the observations in (1) I think it is unrealistic 
    to expect that the minimal reasonable feature set will only be as 
    big and complex as the one defined in N1858. My opinion is that the 
    correct way to deal with this situation is to 

    (a) set up, in a manner analogous to how 
        WG5/N1820 C Interoperability Objectives (Maclaren/Long)
        did for the interop TR, a document which describes the 
        objectives, including features, requirements, constraints and 
        excluded features for the coarray TS. It would be nice if the
        feature items on this list could be voted on individually at 
        the WG5 level.

    (b) estimate the additional amount of work needed to get the work
        done by J3. If this takes longer than originally envisioned, 
        this is still a better situation than rapidly churning out a 
        badly designed coarray extension.

    In my opinion the sweet spot is a feature set which is 
    approximately 30% bigger in complexity and workload on J3 than the
    one implied by N1858, but will provide a significantly higher 
    enhancement of programming productivity as well as performance 
    scalability to coarray Fortran users. The price to pay (probably 
    2-3 additional J3 meetings compared to the present schedule in
    WG5/N1859 Strategic plans for WG5 (Reid)) seems adequate; 
    having an overlap of at most one year with the startup of the work 
    for the next Fortran standard also seems acceptable. 

(3) I also do not consider it a good idea to freeze part of the features
    before all others, at least not unless the process suggested in (2a) 
    makes it possible to determine that a particular feature decided 
    there does not interact in any relevant way with a feature targeted 
    for "early release" (unlikely). This is also an argument against 
    attempting to split off parts of the coarray TS contents to be 
    treated separately in future Fortran extensions. 


--------------070709000505040104050300
Content-Type: text/plain; charset=windows-1252;
 name="Notify_Query_Kuester.txt"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment;
 filename="Notify_Query_Kuester.txt"


Coarray Fortran enables the programmer to formulate one-sided 
communication in a simple and intuitive way via the codimension syntax.

Why is this important?
In a modern computer we see latencies for data access of various kinds:
memory and cache latencies, and much larger latencies in the 
interconnection network.
Latencies hinder good performance because they limit the bandwidth 
that is actually reachable for small messages.
In a given architecture latencies cannot be reduced. They can be avoided
by aggregating many individual data items into a single stream, so that 
the latency is incurred only once, at the beginning of the stream.
Or they can be hidden behind other useful operations.
When a consumer requests data from a remote source, the latency 
typically appears twice: once for sending the request (the remote 
address) and once for transferring the data back. The advantage is that 
the consumer can consume the data directly after arrival.

a = b[remote_proc]

allows for the immediate use of a after this statement. Unless the 
compiler can move the fetch of b[] to an earlier point in time, 
we have to wait for a long latency, assuming that b[] is already 
defined in the remote memory.

Using the opposite direction

a[target_proc] = b

would imply nearly no latency for the image where "b" resides.
The image target_proc pays with uncertainty about when the 
data will arrive. A synchronizing call 

sync images([target_proc,remote_proc])

ensures that "a" can be used on target_proc. But it requires more 
than twice the latency.
A well formulated and well programmed parallel algorithm should 
contain as few synchronization points as possible to ensure high 
performance for a large number of active images.
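
As a minimal sketch of the put-then-synchronize pattern described 
above, written in standard Fortran 2008: the program name, the roles 
"sender" and "receiver" and the value 42 are assumptions made for 
illustration and do not come from any proposal.

```fortran
program put_then_sync
  implicit none
  integer :: a[*]              ! one copy of "a" on every image
  integer :: b
  integer :: sender, receiver  ! illustrative roles (assumed)

  sender   = 1
  receiver = num_images()      ! equals sender when run on one image

  if (this_image() == sender) then
    b = 42
    a[receiver] = b            ! one-sided put; no reply is awaited
  end if

  ! Pairwise synchronization between the two images involved.
  ! It is skipped when sender and receiver are the same image.
  if (sender /= receiver) then
    if (this_image() == sender)   sync images(receiver)
    if (this_image() == receiver) sync images(sender)
  end if

  if (this_image() == receiver) then
    ! After the synchronization the put is complete and "a" is usable.
    if (a /= 42) error stop "a was not delivered"
    print *, "image", this_image(), "received a =", a
  end if
end program put_then_sync
```

Note that the sender also has to take part in the synchronization, 
which is the coupling the following paragraphs argue against.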

The flow of information should go only in one direction in order to
decouple sender and receiver. This removes some latencies and 
allows for pipelining.

If the order of the information transfer is not changed, a special
trailer at the end of the transmitted data can inform the target 
processor about successful arrival of data.
The consuming processor may wait for the data or can do other 
work in the meantime.

That is the purpose of NOTIFY --> QUERY pairs.
The sending processor informs via NOTIFY that it has initiated the 
transmission and has transferred the data to the transmitting hardware.
The image target_proc recognizes the message as the trailer of the data.

Without the NOTIFY --> QUERY dependence the one-sided communication 
capabilities of Coarray Fortran are not complete. Unwanted 
synchronization via "sync images" or "sync all" is needed.
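
For illustration only, the one-directional pattern might look as 
follows. The sketch does not use the NOTIFY/QUERY syntax of N1858. It 
substitutes EVENT POST and EVENT WAIT with EVENT_TYPE from 
ISO_FORTRAN_ENV, as defined in the later Fortran 2018 standard, which 
offer a comparable notification in one direction. The names "sender", 
"receiver" and "data_ready" and the value 42 are assumptions.

```fortran
program put_then_post
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: data_ready[*]  ! event variables must be coarrays
  integer :: a[*]
  integer :: b
  integer :: sender, receiver        ! illustrative roles (assumed)

  sender   = 1
  receiver = num_images()            ! equals sender when run on one image

  if (this_image() == sender) then
    b = 42
    a[receiver] = b                  ! one-sided put
    event post(data_ready[receiver]) ! the "trailer": tell the receiver
    ! The sender continues here without waiting for the receiver.
  end if

  if (this_image() == receiver) then
    event wait(data_ready)           ! blocks only the receiving image
    if (a /= 42) error stop "a was not delivered"
    print *, "image", this_image(), "received a =", a
  end if
end program put_then_post
```

Only the receiving image waits; the sending image never executes a 
matching synchronization statement.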


Remark 1: 
Because the notifying image could proceed to another context and 
issue further NOTIFYs there for other purposes, 
it will be necessary to differentiate between the different contexts.


Remark 2:
QUERY([proc]) will wait and block image proc if 
the image target_proc has not yet received the data.

This is very different from the behaviour of QUERY([proc], READY=ready), 
which will block neither image target_proc nor image proc.

I would recommend differing names, e.g. BLOCKING_QUERY for the first 
case.
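
The difference between the two behaviours can be sketched as follows. 
Again this is not the QUERY syntax of N1858: the sketch assumes the 
later Fortran 2018 facilities, where the intrinsic subroutine 
EVENT_QUERY never blocks and EVENT WAIT does. The names "sender", 
"receiver", "data_ready", "pending" and "other_work" are hypothetical.

```fortran
program poll_then_wait
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: data_ready[*]
  integer :: a[*]
  integer :: pending, other_work
  integer :: sender, receiver        ! illustrative roles (assumed)

  sender     = 1
  receiver   = num_images()          ! equals sender when run on one image
  other_work = 0

  if (this_image() == sender) then
    a[receiver] = 42                 ! one-sided put
    event post(data_ready[receiver])
  end if

  if (this_image() == receiver) then
    do
      call event_query(data_ready, pending)  ! non-blocking test
      if (pending > 0) exit
      other_work = other_work + 1            ! stand-in for useful work
    end do
    ! The event has been posted, so this wait returns without delay.
    ! It is still needed to order the put before the read of "a".
    event wait(data_ready)
    if (a /= 42) error stop "a was not delivered"
    print *, "image", this_image(), "polled", other_work, "times"
  end if
end program poll_then_wait
```

Having two visibly different constructs for the blocking and the 
non-blocking case is in line with the recommendation above.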


Uwe Küster (kuester@hlrs.de)
 

--------------070709000505040104050300--