From owner-sc22wg5@open-std.org  Fri Apr  1 20:32:08 2011
Return-Path: <owner-sc22wg5@open-std.org>
X-Original-To: sc22wg5-dom8
Delivered-To: sc22wg5-dom8@www2.open-std.org
Received: by www2.open-std.org (Postfix, from userid 521)
	id 907E4C3BA28; Fri,  1 Apr 2011 20:32:08 +0200 (CET DST)
X-Original-To: sc22wg5@open-std.org
Delivered-To: sc22wg5@open-std.org
Received: from mailrelay1.lrz-muenchen.de (mailrelay1.lrz-muenchen.de [129.187.254.106])
	by www2.open-std.org (Postfix) with ESMTP id E642DC178E4
	for <sc22wg5@open-std.org>; Fri,  1 Apr 2011 20:32:07 +0200 (CET DST)
Received: from postout2.mail.lrz.de ([10.156.6.19] [10.156.6.19]) by mailrelay1.lrz-muenchen.de with ESMTP; Fri, 1 Apr 2011 20:31:55 +0200
Received: from BADWLRZ-SWHBT1.ads.mwn.de (BADWLRZ-SWHBT1.ads.mwn.de [IPv6:2001:4ca0:0:108::125])
	(using TLSv1 with cipher AES128-SHA (128/128 bits))
	(No client certificate requested)
	by postout2.mail.lrz.de (Postfix) with ESMTPS id 03D89A4121;
	Fri,  1 Apr 2011 20:31:54 +0200 (CEST)
Received: from BADWLRZ-SWMBX1.ads.mwn.de ([fe80::11b4:b130:c4e2:2d0e]) by
 BADWLRZ-SWHBT1.ads.mwn.de ([fe80::e42f:e9f5:bde9:f99b%17]) with mapi id
 14.01.0270.001; Fri, 1 Apr 2011 20:31:54 +0200
From: "Bader, Reinhold" <Reinhold.Bader@lrz.de>
To: WG5 <sc22wg5@open-std.org>, Tobias Burnus <burnus@net-b.de>
Subject: Re: (j3.2006) Comments to 10-166 (early coarray TR draft)
Thread-Topic: (j3.2006) Comments to 10-166 (early coarray TR draft)
Thread-Index: AQHL8I4wCPF8xJ9jqkG2S2R/LxsUp5RJUpUp
Date: Fri, 1 Apr 2011 18:31:53 +0000
Message-Id: <166ED263DF83324D9A3BA67FB6772B2BB5F40C@BADWLRZ-SWMBX1.ads.mwn.de>
References: <4D960451.7080909@net-b.de>
In-Reply-To: <4D960451.7080909@net-b.de>
Accept-Language: de-DE, en-US
Content-Language: de-DE
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [129.187.48.234]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

Hello Tobias,

a more recent WG5 paper with further suggestions is available at

ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1835.txt

It also gives an estimate of when the feature list could be fixed.

Regards
Reinhold

________________________________________
From: j3-bounces@j3-fortran.org [j3-bounces@j3-fortran.org] on behalf of Tobias Burnus [burnus@net-b.de]
Sent: Friday, 1 April 2011 18:59
To: fortran standards email list for J3
Subject: (j3.2006) Comments to 10-166 (early coarray TR draft)

Hi all,

admittedly, this is probably bad timing, as everyone is focused on TR
29113 rather than on other work items. However, I happened to have time to
glance at 10-166 (draft of the coarray TR, dated 2010/02/18).

First, I spotted an "ALL STOP", which should be an ERROR STOP (in A.1.1).

Second, I miss a way to broadcast values to all images (or to a
team); unless I have missed something, even with the TR one still has to write:

   if (this_image()==1) then
     ! READ input file
     ! Distribute values:
     do image = 2, num_images()
       z[image] = z
     end do
   end if
   SYNC ALL

(Alternatively, use "SYNC IMAGES(*)" in the IF branch and "SYNC IMAGES(1)"
in an ELSE branch.) I think sending the value to every other image, one
image at a time, is rather slow when many images are involved, as image 1
has to perform num_images()-1 transfers in sequence. (Consider a
calculation on 6k Blue Gene processors, or one using all 294,912
processors of the HPC system 600 metres from here.) On such systems,
distributing the configuration can take a significant fraction of the
total computation time. That time is all the more wasted because there is
a dedicated collective network with one-to-all broadcast functionality.

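A binomial-tree broadcast illustrates the point: in each round, every image that already holds the value passes it on to one more image, so the number of images holding it doubles per round and image 1 issues only ceiling(log2(num_images())) transfers instead of num_images()-1 (with 294,912 images, 19 instead of 294,911). The module below is a minimal sketch using only Fortran 2008 coarray features; the names tree_bcast_sketch and tree_broadcast are hypothetical and not taken from 10-166, and a tuned implementation would rather use the machine's dedicated collective network.

```fortran
! Hypothetical sketch (not part of 10-166): broadcast the value of z on
! image 1 to all images along a binomial tree, using only Fortran 2008
! coarray features (coindexed assignment plus SYNC IMAGES).
module tree_bcast_sketch
  implicit none
  private
  public :: tree_broadcast
contains
  subroutine tree_broadcast(z)
    ! On entry only image 1's z matters; on exit every image holds it.
    ! Images other than 1 must not reference or define z before the call.
    real, intent(inout) :: z[*]
    integer :: me, n, stride

    me = this_image()
    n  = num_images()
    stride = 1
    ! At the start of each round, images 1..stride hold the value.
    do while (stride < n)
      if (me <= stride) then
        ! Sender: copy the value to the partner image (if it exists),
        ! then synchronize with that partner only.
        if (me + stride <= n) then
          z[me + stride] = z
          sync images (me + stride)
        end if
      else if (me <= 2*stride) then
        ! Receiver: wait for the one image that sends to it in this round.
        sync images (me - stride)
      end if
      stride = 2*stride
    end do
  end subroutine tree_broadcast
end module tree_bcast_sketch
```

Every image must call tree_broadcast(z); each pair of images synchronizes exactly once, so no SYNC ALL is needed. For comparison, Fortran 2018 later added this operation as the collective subroutine CO_BROADCAST.
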
For reductions, the draft provides the most important operations. However,
I again see some unneeded communication, as "A collective subroutine is
one that is invoked on a team of images to perform a calculation on
those images and which assigns the value of the result on all of them"
(4.1.1). While that is often the desired result, one frequently needs
the result on only one image. Coming back to calculations on a
many-processor system: performing the reduction in a tree-like manner and
sending the result to a single reduction-master image is faster than
making it available on all images, especially since there is a barrier
(team synchronization) after the reduction, which could be avoided on all
images but the one that is interested in the result.

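The reduction to a single image can be sketched the same way: partial sums move up a binomial tree towards image 1, and each image synchronizes only with its tree partners, so no image waits at a team-wide barrier and image 1 finishes after ceiling(log2(num_images())) rounds. Again this is a minimal Fortran 2008 sketch under stated assumptions: tree_reduce_sketch and tree_sum_to_image1 are hypothetical names, and summing integer(int64) values is an arbitrary choice for the example.

```fortran
! Hypothetical sketch (not part of 10-166): sum x over all images along a
! binomial tree so that only image 1 ends up with the total.
module tree_reduce_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: tree_sum_to_image1
contains
  subroutine tree_sum_to_image1(x)
    ! On entry x is this image's contribution. On exit x on image 1 is the
    ! sum over all images; on other images x holds a partial sum.
    integer(int64), intent(inout) :: x[*]
    integer :: me, n, stride

    me = this_image()
    n  = num_images()
    stride = 1
    ! At the start of a round, an image with mod(me-1, stride) == 0 holds
    ! the sum over images me .. min(me+stride-1, n).
    do while (stride < n)
      if (mod(me - 1, 2*stride) == 0) then
        ! Receiver: once the partner's partial sum is final, pull and add it.
        if (me + stride <= n) then
          sync images (me + stride)
          x = x + x[me + stride]
        end if
      else if (mod(me - 1, 2*stride) == stride) then
        ! Sender: its partial sum is final; signal the receiver and leave.
        sync images (me - stride)
        exit
      end if
      stride = 2*stride
    end do
  end subroutine tree_sum_to_image1
end module tree_reduce_sketch
```

Images other than 1 must not redefine x until a later synchronization, since their receiver may still be reading it. Fortran 2018 later covered this case with the optional RESULT_IMAGE argument of CO_SUM.
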
Tobias

PS: It would be nice if someone could record these three comments so that
they can be discussed when the topic comes up again after TR 29113.

PPS: I hope I have found the latest draft.
_______________________________________________
J3 mailing list
J3@j3-fortran.org
http://j3-fortran.org/mailman/listinfo/j3
