From owner-sc22wg5@open-std.org Thu Jun 25 08:27:28 2009
Message-ID: <4A4318BE.8040108@nag-j.co.jp>
Date: Thu, 25 Jun 2009 15:27:10 +0900
From: Malcolm Cohen
To: WG5
Subject: Re: (j3.2006) (SC22WG5.4023) Late in the day question
References: <4A430B68.4040008@sun.com>
In-Reply-To: <4A430B68.4040008@sun.com>

Robert Corbett wrote:
>
> In any case, the FORALL construct in
> Fortran 95 and FORTRAN 2003 is largely the same as that of HPF 2.0.
>

HPF copied theirs from Fortran 8X, which was itself copied from CM
Fortran, IIRC.

FORALL as written makes sense when (very crudely) the gain from parallel
evaluation outweighs the cost of array temporaries. This was at least
semi-true for various massively parallel machines in the 1980s that had
hundreds or thousands of nodes, each with significant local storage; as
long as the array temps fitted in local storage you could get a
1000-fold speedup over the DO loops. (The array temps being the size of
the iteration space, not the size of the data being worked on.) Once the
iteration space got too large, though, this did not work so well.
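
To make the point about array temporaries concrete, here is a minimal
sketch that models the two behaviours for the statement a(i) = a(i-1)
over i = 2..n. It is written in Python purely as an illustration, not
as Fortran, and the function names forall_shift and do_shift are
hypothetical. The FORALL model evaluates every right-hand side into a
temporary with one element per point of the iteration space before it
stores anything; the DO model lets each iteration see the stores made
by earlier ones.

```python
def forall_shift(a):
    """Model of FORALL (i = 2:n) a(i) = a(i-1): evaluate all, then assign."""
    n = len(a)
    # One temporary element per point of the iteration space (n - 1 here).
    temp = [a[i - 1] for i in range(1, n)]
    result = list(a)
    for i in range(1, n):
        result[i] = temp[i - 1]
    return result


def do_shift(a):
    """Model of DO i = 2, n with a(i) = a(i-1): each step sees earlier stores."""
    result = list(a)
    for i in range(1, len(result)):
        result[i] = result[i - 1]
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(forall_shift(data))  # [1, 1, 2, 3, 4]
    print(do_shift(data))      # [1, 1, 1, 1, 1]
```

The two results differ, which is why a FORALL cannot in general be
treated as a DO loop whose iterations happen to run in parallel. A
compiler may of course avoid the temporary when it can prove that no
iteration reads what another one writes.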

Note that these machines had pretty good communication facilities for
the time; in other distributed-memory systems the communications
overheads are the limiting factor rather than the parallelising. (Hmm,
that's not expressed very well; what I'm trying to get across is that
the focus on a normal distributed-memory system is on partitioning the
data sets, which is maybe helped a bit by FORALL but not really enough.)

It's quite possible that FORALL might once again give good performance,
if we get to systems with many hundreds of nodes each of which has
significant local memory, and if we have programs that can use its
semantics, and if the parallel hardware is not already maxed out by our
other parallel facilities. That's a lot of ifs.

> In my work at Sun, I have had to explain what FORALL does to many users.
> A lot of them thought that FORALL was a parallel DO.
>

Some of the users I ran into seemed to think that it was a normal DO
"that ran faster", without any specific reason other than "it's a new
feature, therefore it must run faster". Mostly those users didn't even
have dual processors, so parallelism wasn't even on the map!

Cheers,
-- 
........................Malcolm Cohen, Nihon NAG, Tokyo.