From owner-sc22wg5@open-std.org Fri Mar 20 23:54:13 2009
Message-ID: <49C41EB4.7050307@cray.com>
Date: Fri, 20 Mar 2009 17:54:44 -0500
From: Bill Long
Reply-To: longb@cray.com
Organization: Cray Inc.
To: Van.Snyder@jpl.nasa.gov, fortran standards email list for J3
Cc: sc22wg5
Subject: Re: (j3.2006) (SC22WG5.3961) [Fwd: Fortran concurrency memory model vs. its C++ counterpart]

Hi Van,

Below are my proposed responses. Feel free to forward them if you like.

Van Snyder wrote:
> Does anybody care to tackle these questions from what appears to be a
> member of the C++ committee?
>
> I've already replied with superficial answers, and invited the
> correspondents to contact Bill, John or Craig, or to come to a meeting.
> Let me know if you want to see exactly what I sent them.
>
> Van
>
> -------- Forwarded Message --------
> From: Boehm, Hans
> To: Snyder, W Van, johnmc@cs.rice.edu
> Cc: Schreiber, Robert Samuel, Boehm, Hans
> Subject: Fortran concurrency memory model vs. its C++ counterpart
> Date: Fri, 20 Mar 2009 13:19:30 -0700
>
> I have been leading an effort to specify clean semantics for shared
> variable access in the next C++ standard, commonly referred to as C++0x.
> (It is actually expected to be finalized in 2010 or 2011.) We've made
> an attempt for consistency with Java where there were not strong
> arguments to the contrary. (And indeed, the people involved overlapped
> appreciably.) The C committee appears likely to adopt a model similar
> to the C++0x one, though at a later time. We've been working towards
> consistency with OpenMP, but I don't think we're quite there yet.

The Fortran parallelism model complements OpenMP. The program is
replicated across multiple processors, with each instance called an
"image". OpenMP can be used (assuming compiler support exists) within a
single image for an additional layer of parallelism.
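To make the image model concrete, here is a minimal sketch (the program
and variable names are invented, and it assumes a compiler with Fortran
2008 coarray support): every image runs the same program, each image
defines only its own copy of the coarray, and after a SYNC ALL image 1
may reference the copies on the other images.

   program image_demo
     implicit none
     integer :: total[*]          ! coarray: one copy of "total" on every image
     integer :: i, s

     total = this_image()         ! each image defines only its own copy
     sync all                     ! make those definitions visible to all images

     if (this_image() == 1) then  ! image 1 references the other images' copies
        s = 0
        do i = 1, num_images()
           s = s + total[i]
        end do
        print *, 'sum over images =', s
     end if
   end program image_demo

Only the coarray "total" is visible across images; everything else in the
program is private to each image.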
> David Hough forwarded a pointer to the Fortran draft on the
> numeric-interest mailing list, and I took a quick look at it. I haven't
> had a lot of contact with Fortran during the last 30 years, so some of
> this may have to be taken with a grain of salt. I also don't even know
> whether HP has any representation in the Fortran standardization
> process. I'm certainly not officially representing HP here.

HP was represented until very recently. I believe the cost of travel
from India was a major consideration in no longer attending. Most of the
Fortran meetings are in Las Vegas.

> Based on what I was quickly able to deduce, there are some possible
> discrepancies, which may cause problems, for one of two reasons:
>
> 1) Programmers who switch between languages are likely to be confused by
> any unmotivated differences in what's already a very confusing part of
> language definitions.

The Fortran model is conceptually similar to MPI, at least in terms of
how a program is structured. Given that MPI is widely used, we hope that
the confusion of migrating to Fortran 2008 will be minimal.

> 2) The actual compiler back-end implementations seem fairly likely to be
> shared between languages. As far as I can tell, at least for certain
> kinds of disagreements, this often greatly reduces the chances that
> several conflicting standards in this area will all be implemented
> correctly.

I can speak only for Cray on this point, but we do use the same compiler
optimizer and back-end infrastructure for both Fortran 2008 and UPC.
Most of the underlying runtime library code is also shared.

> Here is my initial perception of the issues. I'm intentionally ignoring
> inherent co-array vs. threads distinctions, and focusing on questions
> to which we could, and probably should, share the same answer. I'm no
> doubt implicitly focusing on shared-memory implementations.
>
> 1) I couldn't find a clear description of the semantics of a data race.
> If I access a coarray element while someone else is writing it, what are
> the possible outcomes? Possibilities:
>
> a) I see either the old or the new value, possibly subject to some
> additional constraints, as in Java for most data types. There are great
> reasons for Java to follow this route, but they don't apply to C++ or
> Fortran. And the Java spec has some issues. We don't actually know how
> to specify this precisely in a way that avoids significant anomalies.
> b) I see some unspecified value that may be a combination of the old
> and new values. I think no language currently really follows this path,
> though it's sometimes useful. It still disallows some compiler
> transformations, such as reloading a spilled register from a shared
> variable/coarray.
> c) The program behavior becomes undefined. This is the approach taken
> by Posix and C++0x. It mostly allows existing optimizations.

Fortran 2008 specifies option (c) above. Unless explicitly coded using
the atomic intrinsics, potentially colliding accesses are not allowed.
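To give a sense of what "explicitly coded using the atomic intrinsics"
looks like, here is a rough sketch (the names are invented, and it
assumes at least two images): image 1 publishes a flag with ATOMIC_DEFINE
and image 2 spin-waits on it with ATOMIC_REF.

   program flag_demo
     use, intrinsic :: iso_fortran_env, only: atomic_int_kind
     implicit none
     integer(atomic_int_kind) :: flag[*]   ! one copy of the flag per image
     integer(atomic_int_kind) :: val

     flag = 0
     sync all                              ! every image starts with flag = 0

     if (this_image() == 1) then
        ! ... define ordinary (non-atomic) data intended for image 2 ...
        sync memory                        ! complete those definitions first
        call atomic_define(flag[2], 1)     ! then publish the flag atomically
     else if (this_image() == 2) then
        val = 0
        do while (val == 0)
           call atomic_ref(val, flag)      ! atomic read of this image's flag
        end do
        sync memory                        ! order the flag read before later reads
        ! ... it is now safe to reference what image 1 defined ...
     end if
   end program flag_demo

Without the atomic calls, the concurrent definition and reference of the
flag would be exactly the kind of colliding access the standard disallows.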
> Since there are atomic accesses (good!), I assume (a) is not intended.
> I would argue for (c), for a number of reasons that I can expand on (or
> point you to C++-related documents). Even for Java, I would greatly
> encourage programs to treat data races as bugs; we just can't outlaw
> them for a combination of security and historical reasons.
>
> (I actually believe the correct answer in all cases is something like
> (d) Data races result in an error report. I just don't know how to
> implement that with reasonable efficiency.)
>
> 2) Presumably the intent is to prohibit the compiler from introducing
> new "speculative" stores to co-arrays that may add data races? Current
> compiler back-ends sometimes do this, either to implement stores of
> small objects adjacent to the one in question, or because a shared value
> is promoted to a register during a loop and then written back even if
> the loop executes no iterations, or because it makes it possible to
> vectorize a loop that should probably have been written differently. I
> think there is now a near consensus that such systems are not
> programmable by mere humans. Though this rarely breaks code, the cases
> in which it may are essentially impossible to characterize. C++ and C
> are expected to outlaw this, with exceptions for sequences of contiguous
> bit-fields (which I assume don't exist in Fortran?). I think it would be
> good to be clear on this issue.

Fortran does not have a bit-field data type. The language standard
defines behavior for variables, and has no discussion of concepts like
shared cache lines. It is up to the implementation to get things like
that right. (This is not a new concept - in the days of word-addressable
hardware, accesses and definitions of character variables in OpenMP codes
had the same issues.)

> 3) Fortran relies at least superficially on explicit memory fences
> ("sync_memory") to provide memory ordering guarantees. C++0x and Java
> instead provide a "sequential consistency for data-race-free programs"
> guarantee by default, with some esoteric constructs to defeat that for
> performance. Atomic accesses are normally implemented with some
> implicit fences to ensure sequential consistency, but it typically
> doesn't take nearly as many as in the Fortran examples. (On X86, with
> shared memory, I'd expect a factor of 10 or so difference between
> bracketing every atomic access with sync_memory vs. the C++ approach.
> And I don't think that's optimizable without whole-program analysis, and
> often not then.) I also believe the "sequential consistency for
> data-race-free programs" model is far more intuitive and less
> error-prone for those who are not already familiar with the issues.
> Even for those who are, my general experience has been that fence-based
> code is rarely correct. (For example, you get into subtle issues as to
> whether data dependencies are required to enforce memory ordering.
> Programmers tend to automatically assume yes. Implementers tend to
> automatically assume no.)

Many of the examples in the draft Fortran 2008 standard appear to have a
lot of synchronization code because they are so short. Actual coarray
codes tend to have a much lower density of synchronization points.
Furthermore, vendors are free to optimize constructs as long as the
underlying semantics are obeyed. For example, this code segment:

   critical
      n[1] = n[1] + 1
   end critical

might be implemented as a single atomic add instruction if that is
supported by the target hardware. The implied fence could be migrated or
even eliminated depending on the surrounding code. Note that most codes
would not use explicit SYNC MEMORY statements that often. Instead they
would use other synchronization statements (e.g. SYNC ALL) that
incorporate SYNC MEMORY as part of a larger synchronization action.
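As a rough sketch of that last point (the program and variable names are
invented), a typical exchange lets a single SYNC ALL both separate the
two phases and supply the memory ordering that a bare SYNC MEMORY would
otherwise have to provide:

   program halo_demo
     implicit none
     integer, parameter :: m = 8
     real :: edge(m)[*], halo(m)    ! "edge" is a coarray; "halo" is local
     integer :: left

     left = this_image() - 1        ! left neighbor, with periodic wrap
     if (left == 0) left = num_images()

     edge = real(this_image())      ! phase 1: define only this image's copy
     sync all                       ! barrier; includes the SYNC MEMORY effect
     halo = edge(:)[left]           ! phase 2: safely read the neighbor's copy

     print *, 'image', this_image(), 'sees neighbor value', halo(1)
   end program halo_demo

No explicit SYNC MEMORY appears; the segment ordering comes from the
SYNC ALL itself.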
> C++0x actually provides equivalents of the Fortran constructs, but I
> normally try hard to discourage their use. (And Microsoft's
> representative indicated at some point that they would probably forbid
> their use completely.) Thus this isn't an implementation compatibility
> issue, but it is a serious programmability issue. Even at this late
> stage, it seems reasonable to me to consider a switch, given that
> hopefully not too much code actually uses explicit sync_memory
> statements. This clearly interacts with the first issue.

By forcing users to explicitly specify synchronization in the code, the
hope is to reduce confusion and maintenance costs. It is hard to debug
codes where synchronization happens by invisible magic.

> I am actually hoping that hardware implementations gradually favor the
> C++/Java approach even more, since hardware architects finally have a
> reasonably well-defined language-level model that they can target, and
> that is likely to cover most of the market. If that happens, that would
> be another reason for consistency among the language standards.

Hardware vendors have had the MPI, coarray, and UPC models for several
years, and some have improved their hardware to optimize for these cases.
We hope that trend continues.

> 4) I'm not sure what, if anything, volatile means with respect to
> concurrent access by multiple images. The C++ answer is that it has
> nothing to do with threads. Variables that may be accessed and modified
> concurrently are declared as "atomic" variables, not "volatile". There
> are very good reasons for that, having to do with incompatibility with
> existing uses of "volatile". But Java uses "volatile" for what became
> "atomic" in C++.

Volatile variables in Fortran still come under the usual ordering rules.
Volatile is intended for references and definitions of local variables
(which is backward compatible with the previous standard). Unsynchronized
references or definitions of variables on a different image require the
use of the atomic intrinsics.

> In all cases, I'd be happy to send pointers to further
> analysis/discussion. We have been spending a lot of time on these and
> related issues. There is an overview of the C++0x memory model and some
> of the issues underlying it in the PLDI 08 proceedings.

Thanks for your interest. To date we have mainly focused on coexistence
with UPC, as that is an established and implemented language. Coexistence
with other variants of C is a high priority for many Fortran users, and
hence also of significant interest for the Fortran standards committee.

Cheers,
Bill

> Hans
>
> _______________________________________________
> J3 mailing list
> J3@j3-fortran.org
> http://j3-fortran.org/mailman/listinfo/j3
>

--
Bill Long                                  longb@cray.com
Fortran Technical Support  &               voice: 651-605-9024
Bioinformatics Software Development        fax:   651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120