From owner-sc22wg5@open-std.org  Thu Jun 25 08:27:28 2009
Return-Path: <owner-sc22wg5@open-std.org>
X-Original-To: sc22wg5-dom7
Delivered-To: sc22wg5-dom7@www2.open-std.org
Received: by www2.open-std.org (Postfix, from userid 521)
	id 4904FC4596C; Thu, 25 Jun 2009 08:27:28 +0200 (CET DST)
X-Original-To: sc22wg5@open-std.org
Delivered-To: sc22wg5@open-std.org
Received: from ns.nag-j.co.jp (218-42-159-107.cust.bit-drive.ne.jp [218.42.159.107])
	by www2.open-std.org (Postfix) with ESMTP id 3A82EC3BB09
	for <sc22wg5@open-std.org>; Thu, 25 Jun 2009 08:27:02 +0200 (CET DST)
Received: from 218-42-159-108.cust.bit-drive.ne.jp ([218.42.159.108] helo=[127.0.0.1])
	by ns.nag-j.co.jp with esmtp (Exim 4.50)
	id 1MJiQ9-0006My-KB
	for sc22wg5@open-std.org; Thu, 25 Jun 2009 15:26:53 +0900
Message-ID: <4A4318BE.8040108@nag-j.co.jp>
Date: Thu, 25 Jun 2009 15:27:10 +0900
From: Malcolm Cohen <malcolm@nag-j.co.jp>
User-Agent: Thunderbird 3.0a1pre (Windows/2008022014)
MIME-Version: 1.0
To: WG5 <sc22wg5@open-std.org>
Subject: Re: (j3.2006) (SC22WG5.4023) Late in the day question
References: <Pine.LNX.4.21.0906242048120.2308-100000@mjolnir> <4A430B68.4040008@sun.com>
In-Reply-To: <4A430B68.4040008@sun.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-sc22wg5@open-std.org
Precedence: bulk


Robert Corbett wrote:
>
> In any case, the FORALL construct in
> Fortran 95 and FORTRAN 2003 is largely the same as that of HPF 2.0.
>   
HPF copied theirs from Fortran 8X, which in turn copied it from CM Fortran, IIRC.


FORALL as written makes sense when (very crudely) parallel evaluation 
trumps array temporaries.  This was at least semi-true for various 
massively parallel machines in the 1980s which had 100s or 1000s of 
nodes, each with significant local storage; as long as the array 
temps fitted in the local storage you could get 1000-fold speedup over 
the DO loops.  (The array temps being the size of the iteration space, 
not the size of the data being worked on.)  Once the iteration space got 
too large though, this did not work so well.
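
As a rough illustration of why those temporaries arise, here is a small 
sketch in Python rather than Fortran.  It assumes only the standard 
FORALL rule that every right-hand side is evaluated before any 
assignment takes place; the function names do_loop and forall_assign 
are made up for this example and are not from any library.

```python
def do_loop(a):
    """Sequential DO: each iteration sees the results of earlier ones."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + a[i]
    return a


def forall_assign(a):
    """FORALL-style: evaluate all right-hand sides, then assign."""
    a = list(a)
    index_set = range(1, len(a))
    # The temporary holds one value per active index, so its size
    # follows the iteration space.
    temp = [a[i - 1] + a[i] for i in index_set]
    for i, value in zip(index_set, temp):
        a[i] = value
    return a


data = [1, 2, 3, 4]
print(do_loop(data))        # [1, 3, 6, 10]
print(forall_assign(data))  # [1, 3, 5, 7]
```

The two results differ because the DO loop reads elements it has 
already updated, while the FORALL-style version reads only the original 
values.  The right-hand sides in the second version are independent of 
one another, which is what makes parallel evaluation possible, at the 
price of the temporary.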

Note that these machines had pretty good communication facilities for 
the time; in other distributed-memory systems the communications 
overheads are the limiting factor rather than the parallelising.  (Hmm, 
that's not expressed very well, what I'm trying to get across is that 
the focus in a normal distributed-memory system is on partitioning the 
data sets, which is maybe helped a bit by FORALL but not really enough.)

It's quite possible that FORALL might once again give good performance, 
if we get to systems with many hundreds of nodes each of which has 
significant local memory - and if we have programs that can use FORALL's 
semantics, and if the parallel hardware is not already maxed out by our 
other parallel facilities.  That's a lot of ifs.

> In my work at Sun, I have had to explain what FORALL does to many users.
> A lot of them thought that FORALL was a parallel DO.
>
>   
Some of the users I ran into seemed to think that it was a normal DO 
"that ran faster" without any specific reason other than "it's a new 
feature, therefore it must run faster".  Mostly those users didn't even 
have dual processors so parallelism wasn't even on the map!

Cheers,
-- 
........................Malcolm Cohen, Nihon NAG, Tokyo.