From owner-sc22wg5@open-std.org  Fri Mar 20 23:45:25 2009
From: Aleksandar Donev <donev1@llnl.gov>
Organization: LLNL
To: Van.Snyder@jpl.nasa.gov, sc22wg5 <sc22wg5@open-std.org>
Subject: Re: (j3.2006) (SC22WG5.3961) [Fwd: Fortran concurrency memory model vs. its C++ counterpart]
Date: Fri, 20 Mar 2009 15:32:05 -0700
References: <20090320215810.99DEDC76BB3@www2.open-std.org>
In-Reply-To: <20090320215810.99DEDC76BB3@www2.open-std.org>

Some answers:

On Friday 20 March 2009 14:57, Van Snyder wrote:
> 1) I couldn't find a clear description of the semantics of a data
> race.
We do; we just don't use the words "definition" or "race". Instead, it
is expressed as a set of restrictions.

> If I access a coarray element while someone else is writing it,
> what are the possible outcomes?
That is not allowed. We do not prescribe what happens with
non-conforming programs; the compiler/RTL can do whatever it likes.

>         c) The program behavior becomes undefined.  This is the
> approach taken by Posix and C++0x.  Mostly allows existing
> optimizations.
Yes, this is our choice too, though I do not know exactly what POSIX
does.

> 2) Presumably the intent is to prohibit the compiler from introducing
> new "speculative" stores to co-arrays that may add data races?
Correct. Such "optimizations", harmless in a serial context, can cause
problems with shared data, and compilers must disable them. We don't
say such things explicitly in Fortran; compilers simply have to produce
the correct answer for any conforming program. A compiler that
introduces such stores is in error.
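
To make the hazard concrete, here is a minimal C++ sketch (C++ only
because the quoted text compares against C++0x; all names are
hypothetical). It replays, on a single thread, one interleaving in
which another image writes x while this image runs code that a compiler
has rewritten to load x and store it back unconditionally. When cond is
false the original code never touches x, so the other image's write
survives; the rewritten code silently erases it.

```cpp
#include <cassert>

// Hypothetical shared location that another image/thread may also write.
int shared_x = 0;

// As written: x is stored to only when cond is true.
void store_as_written(bool cond) {
    if (cond) shared_x = 1;
}

// A rewrite that is equivalent for serial code: load x, maybe change the
// copy, then store it back unconditionally (note the store when cond is false).
void store_speculatively(bool cond) {
    int tmp = shared_x;
    if (cond) tmp = 1;
    shared_x = tmp;
}

// Replays one interleaving step by step on a single thread and returns the
// final value of x. Only cond == false is interesting: with cond == true the
// original program itself has two conflicting writes, i.e. it is non-conforming.
int replay(bool speculative, bool cond) {
    shared_x = 0;
    if (!speculative) {
        shared_x = 7;               // other image writes x
        if (cond) shared_x = 1;     // original code: touches x only if cond
        return shared_x;
    }
    int tmp = shared_x;             // rewritten code loads x ...
    shared_x = 7;                   // ... the other image writes x in between ...
    if (cond) tmp = 1;
    shared_x = tmp;                 // ... and the unconditional store erases it
    return shared_x;
}
```

With cond false, replay(false, false) leaves x == 7 while
replay(true, false) leaves x == 0: the "harmless" store has introduced
a data race that the source program did not contain.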

> C++ and C are expected to outlaw this, with exceptions
> for sequences of contiguous bit-fields (which I assume don't exist in
> Fortran?)
They don't, yet :-)

> 3) Fortran relies at least superficially on explicit memory fences
> ("sync_memory") to provide memory ordering guarantees.  C++0x and
> Java instead provide a "sequential consistency for data-race-free
> programs" guarantee by default, with some esoteric constructs to
> defeat that for performance.
By contrast, Fortran's default is performance, and we have "some
esoteric constructs" to ensure some form of sequential consistency.

> (On X86, with shared memory, I'd expect a factor
> of 10 or so difference between bracketing every atomic access with
> sync_memory vs. the C++ approach.
Yes, but what would it cost an implementation to ensure sequential
consistency on, say, a cluster with 1000 nodes???

We hope not to see coarray codes with loads of SYNC MEMORY statements.
Use another language for that; coarrays are meant to cover
coarse-grained parallelism for the most part. We don't even provide
real atomic features, just a simple atomic load and store!
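
A minimal C++ sketch of the two defaults being contrasted, assuming a
simple producer/consumer handoff (all names are hypothetical). The
first function uses C++0x-style sequentially consistent atomics; the
second uses relaxed atomic load/store, which by themselves give no
ordering, bracketed by explicit fences that loosely play the role of
SYNC MEMORY. This is an analogy, not a statement of Fortran semantics:
SYNC MEMORY is an image control statement, while the fences here are
only release/acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Ordinary (non-atomic) data plus an atomic flag used to hand it over.
int payload = 0;
std::atomic<int> ready{0};

// Style 1 (C++0x default): sequentially consistent atomics, no explicit fences.
int handoff_seq_cst() {
    payload = 0;
    ready.store(0);
    int received = -1;
    std::thread producer([] {
        payload = 42;
        ready.store(1);                        // seq_cst store publishes payload
    });
    std::thread consumer([&received] {
        while (ready.load() == 0) { }          // seq_cst load: spin until published
        received = payload;
    });
    producer.join();
    consumer.join();
    return received;
}

// Style 2 (loosely the Fortran approach): relaxed atomic load/store order
// nothing by themselves; explicit fences placed around them provide ordering.
int handoff_relaxed_with_fences() {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);
    int received = -1;
    std::thread producer([] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);  // fence before publishing
        ready.store(1, std::memory_order_relaxed);
    });
    std::thread consumer([&received] {
        while (ready.load(std::memory_order_relaxed) == 0) { }
        std::atomic_thread_fence(std::memory_order_acquire);  // fence after observing flag
        received = payload;
    });
    producer.join();
    consumer.join();
    return received;
}
```

Both functions return 42. The difference is which style is the
default: C++0x makes the strongly ordered form the easy one, while the
fence-based form makes every ordering point explicit in the source.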

>  (For example, you get into subtle issues as to whether data
> dependencies are required to enforce memory ordering.  Programmers
> tend to automatically assume yes.  Implementers tend to automatically
> assume no.)
These are indeed tricky, and we have argued over them at length.
However, again, the hope is that most Fortran programmers (recall that
they tend to be different from people such as yourselves!) will not
delve into such depths.

> I am actually hoping that hardware implementations gradually favor
> the C++/Java approach even more
Perhaps for shared-memory, single-machine hardware. Surely it won't
happen for large clusters anytime soon, will it?

> 4) I'm not sure what, if anything, volatile means with respect to
> concurrent access by multiple images.
Neither do we, and I do not care to know :-) I am not trying to be
annoying; it is just a question I have had to discuss on this list
waaaay too many times, for no good reason or outcome.

Implementations will probably follow the C model, whatever that turns
out to be. The standard itself will leave this as a processor-dependent
issue... I hope.

Best,
Aleks