From owner-sc22wg5@open-std.org  Fri Mar 20 22:58:10 2009
Return-Path: <owner-sc22wg5@open-std.org>
X-Original-To: sc22wg5-dom7
Delivered-To: sc22wg5-dom7@www2.open-std.org
Received: by www2.open-std.org (Postfix, from userid 521)
	id 544F8C76BB5; Fri, 20 Mar 2009 22:58:10 +0100 (CET)
X-Original-To: sc22wg5@open-std.org
Delivered-To: sc22wg5@open-std.org
Received: from mail.jpl.nasa.gov (sentrion1.jpl.nasa.gov [128.149.139.105])
	by www2.open-std.org (Postfix) with ESMTP id D7D05C76BB3
	for <sc22wg5@open-std.org>; Fri, 20 Mar 2009 22:57:56 +0100 (CET)
Received: from mprox3.jpl.nasa.gov (mprox3.jpl.nasa.gov [137.78.160.171])
	by mail.jpl.nasa.gov (Switch-3.3.2mp/Switch-3.3.2mp) with ESMTP id n2KLvrqq032141
	for <sc22wg5@open-std.org>; Fri, 20 Mar 2009 21:57:54 GMT
Received: from [137.79.7.57] (math.jpl.nasa.gov [137.79.7.57])
	(authenticated bits=0)
	by mprox3.jpl.nasa.gov (Switch-3.2.6/Switch-3.2.6) with ESMTP id n2KLvqlE005246
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
	for <sc22wg5@open-std.org>; Fri, 20 Mar 2009 14:57:52 -0700
Subject: [Fwd: Fortran concurrency memory model vs. its C++ counterpart]
From: Van Snyder <Van.Snyder@jpl.nasa.gov>
Reply-To: Van.Snyder@jpl.nasa.gov
To: sc22wg5 <sc22wg5@open-std.org>
Content-Type: text/plain
Organization: Yes
Date: Fri, 20 Mar 2009 14:57:52 -0700
Message-Id: <1237586272.17416.729.camel@math.jpl.nasa.gov>
Mime-Version: 1.0
X-Mailer: Evolution 2.12.3 (2.12.3-8.el5_2.3) 
Content-Transfer-Encoding: 7bit
X-Source-IP: math.jpl.nasa.gov [137.79.7.57]
X-Source-Sender: Van.Snyder@jpl.nasa.gov
X-AUTH: Authorized
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

Does anybody care to tackle these questions from what appears to be a
member of the C++ committee?

I've already replied with superficial answers, and invited the
correspondents to contact Bill, John or Craig, or to come to a meeting.
Let me know if you want to see exactly what I sent them.

Van

-------- Forwarded Message --------
From: Boehm, Hans <hans.boehm@hp.com>
To: Snyder, W Van <w.van.snyder@jpl.nasa.gov>, johnmc@cs.rice.edu
<johnmc@cs.rice.edu>
Cc: Schreiber, Robert Samuel <rob.schreiber@hp.com>, Boehm, Hans
<hans.boehm@hp.com>
Subject: Fortran concurrency memory model vs. its C++ counterpart
Date: Fri, 20 Mar 2009 13:19:30 -0700

I have been leading an effort to specify clean semantics for shared
variable access in the next C++ standard, commonly referred to as C++0x.
(It is actually expected to be finalized in 2010 or 2011.)  We've made
an attempt for consistency with Java where there were not strong
arguments to the contrary.  (And indeed, the people involved overlapped
appreciably.)  The C committee appears likely to adopt a model similar
to the C++0x one, though at a later time.  We've been working towards
consistency with OpenMP, but I don't think we're quite there yet.

David Hough forwarded a pointer to the Fortran draft on the
numeric-interest mailing list, and I took a quick look at it.  I haven't
had a lot of contact with Fortran during the last 30 years, so some of
this may have to be taken with a grain of salt.  I also don't even know
whether HP has any representation in the Fortran standardization
process.  I'm certainly not officially representing HP here.

Based on what I was quickly able to deduce, there are some possible
discrepancies, which may cause problems, for one of two reasons:

1) Programmers who switch between languages are likely to be confused by
any unmotivated differences in what's already a very confusing part of
language definitions.

2) The actual compiler back-end implementations seem fairly likely to be
shared between languages.  As far as I can tell, at least for certain
kinds of disagreements, this often greatly reduces the chances that
several conflicting standards in this area will all be implemented
correctly.

Here is my initial perception of the issues.  I'm intentionally ignoring
inherent co-array vs. threads distinctions, and focussing on questions
to which we could, and probably should, share the same answer.  I'm no
doubt implicitly focussing on shared memory implementations.

1) I couldn't find a clear description of the semantics of a data race.
If I access a coarray element while someone else is writing it, what are
the possible outcomes?  Possibilities:

	a) I see either the old or the new value, possibly subject to some
additional constraints, as in Java for most data types.  There are great
reasons for Java to follow this route, but they don't apply to C++ or
Fortran.  And the Java spec has some issues.  We don't actually know how
to specify this precisely in a way that avoids significant anomalies.
	b) I see some unspecified value that may be a combination of the old
and new values.  I think no language currently really follows this path,
though it's sometimes useful.  This still disallows some compiler
transformations, such as reloading a spilled register from a shared
variable/coarray.
	c) The program behavior becomes undefined.  This is the approach taken
by Posix and C++0x.  Mostly allows existing optimizations.

Since there are atomic accesses (good!), I assume (a) is not intended.
I would argue for (c), for a number of reasons that I can expand on (or
point you to C++-related documents).  Even for Java, I would greatly
encourage programs to treat data races as bugs; we just can't outlaw
them for a combination of security and historical reasons.

(I actually believe the correct answer in all cases is something like
(d) Data races result in an error report.  I just don't know how to
implement that with reasonable efficiency.)

2) Presumably the intent is to prohibit the compiler from introducing
new "speculative" stores to co-arrays that may add data races?  Current
compiler back-ends sometimes do this, either to implement stores of
small objects adjacent to the one in question, or because a shared value
is promoted to a register during a loop, and then written back even if
the loop executes no iterations, or because it makes it possible to
vectorise a loop that should probably have been written differently.  I
think there is now a near consensus that such systems are not
programmable by mere humans.  Though this rarely breaks code, the cases
in which it may are essentially impossible to characterize.  C++ and C
are expected to outlaw this, with exceptions for sequences of contiguous
bit-fields (which I assume don't exist in Fortran?)  I think it would be
good to be clear on this issue.

3) Fortran relies at least superficially on explicit memory fences
("sync_memory") to provide memory ordering guarantees.  C++0x and Java
instead provide a "sequential consistency for data-race-free programs"
guarantee by default, with some esoteric constructs to defeat that for
performance.  Atomic accesses are normally implemented with some
implicit fences to ensure sequential consistency, but it typically
doesn't take nearly as many as in the Fortran examples.  (On X86, with
shared memory, I'd expect a factor of 10 or so difference between
bracketing every atomic access with sync_memory vs. the C++ approach.
And I don't think that's optimizable without whole program analysis, and
often not then.)  I also believe the "sequential consistency for
data-race-free programs" model is far more intuitive and less error-prone
for those who are not already familiar with the issues.  Even for those
who are, my general experience has been that fence-based code is rarely
correct.  (For example, you get into subtle issues as to whether data
dependencies are required to enforce memory ordering.  Programmers tend
to automatically assume yes.  Implementers tend to automatically assume
no.)

C++0x actually provides equivalents of the Fortran constructs, but I
normally try hard to discourage their use.  (And Microsoft's
representative indicated at some point that they would probably forbid
their use completely.)  Thus this isn't an implementation compatibility
issue, but it is a serious programmability issue.  Even at this late
stage, it seems reasonable to me to consider a switch, given that
hopefully not too much code actually uses explicit sync_memory
statements.  This clearly interacts with the first issue.

I am actually hoping that hardware implementations gradually favor the
C++/Java approach even more, since hardware architects finally have a
reasonably well-defined language-level model that they can target, and
that is likely to cover most of the market.  If that happens, that would
be another reason for consistency among the language standards. 

4) I'm not sure what, if anything, volatile means with respect to
concurrent access by multiple images.  The C++ answer is that it has
nothing to do with threads.  Variables that may be accessed and modified
concurrently are declared as "atomic" variables, not "volatile".  There
are very good reasons for that having to do with incompatibility with
existing uses of "volatile".  But Java uses "volatile" for what became
"atomic" in C++.

In all cases, I'd be happy to send pointers to further
analysis/discussion.  We have been spending a lot of time on these and
related issues.  There is an overview of the C++0x memory model and some
of the issues underlying it in the PLDI 08 proceedings.

Hans

