From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Tue May 14 22:12:52 2013
Return-Path: <owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org>
X-Original-To: sc22wg5-dom8
Delivered-To: sc22wg5-dom8@www.open-std.org
Received: by www.open-std.org (Postfix, from userid 521)
	id 46CBD356E85; Tue, 14 May 2013 22:12:52 +0200 (CEST)
Delivered-To: sc22wg5@open-std.org
Received: from exprod6og123.obsmtp.com (exprod6og123.obsmtp.com [64.18.1.241])
	by www.open-std.org (Postfix) with ESMTP id AE916356E4B
	for <sc22wg5@open-std.org>; Tue, 14 May 2013 22:12:32 +0200 (CEST)
Received: from CFWEX01.americas.cray.com ([136.162.34.11]) (using TLSv1) by exprod6ob123.postini.com ([64.18.5.12]) with SMTP
	ID DSNKUZKakKETYhSgTH3VAtDhOFMA8I7M6j7o@postini.com; Tue, 14 May 2013 13:12:51 PDT
Received: from fortran.us.cray.com (172.31.19.200) by
 CFWEX01.americas.cray.com (172.30.88.25) with Microsoft SMTP Server id
 14.2.342.3; Tue, 14 May 2013 15:08:53 -0500
Message-ID: <51929A9A.1090901@cray.com>
Date: Tue, 14 May 2013 15:12:10 -0500
From: Bill Long <longb@cray.com>
Reply-To: <longb@cray.com>
Organization: Cray Inc.
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
MIME-Version: 1.0
To: "N.M. Maclaren" <nmm1@cam.ac.uk>
CC: <sc22wg5@open-std.org>, Mark Batty <mbatty@cantab.net>, "Lionel, Steve"
	<steve.lionel@intel.com>, Lorri Menard <lorri.menard@intel.com>, Daniel C
 Chen <cdchen@ca.ibm.com>
Subject: Re: Existing support for uses of atomics in Fortran coarray codes
References: <Prayer.1.3.5.1305141957470.21184@hermes-2.csi.cam.ac.uk>
In-Reply-To: <Prayer.1.3.5.1305141957470.21184@hermes-2.csi.cam.ac.uk>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-sc22wg5@open-std.org
Precedence: bulk



On 5/14/13 1:57 PM, N.M. Maclaren wrote:
> Mark Batty, Peter Sewell and I had a discussion about atomic semantics and
> Fortran this morning, but there were a couple of things that were rather
> important and I didn't have a feel for the answers.  Specifically, what
> semantics are provided by existing coarray atomics, and how they are used
> in real programs.  We are definitely going to have to decide what to say
> about this at Delft, and the problem isn't simple :-(
>
> In particular:
>
>     Do implementations guarantee coherence of access to a single atomic
> location and, if not, what do they guarantee?

As long as all of the accesses are done using atomic operations, there 
should be no problem. If the network atomics and the local processor 
atomics are not coherent with each other (this depends on the hardware), 
then atomic operations on local memory locations must also be routed 
through the NIC whenever remote atomics to those locations are possible.
>
>     What, if anything, do they guarantee about the consistency of accesses
> to two different atomic locations?

None, without explicit SYNC operations. This is similar to the current 
atomic subroutines (ATOMIC_DEFINE and ATOMIC_REF) in Fortran 2008.
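A rough sketch of the pattern this implies: a producer updates two distinct locations, and the consumer may only rely on seeing both after an explicit synchronization point. Here a threading.Barrier is an assumed stand-in for a SYNC statement, and publish_with_sync is a hypothetical name. Python threads will not actually exhibit the reordering, so this shows the required pattern rather than the hazard itself.

```python
import threading


def publish_with_sync(value):
    """Producer image writes two separate locations; consumer image reads them.

    The Barrier stands in for an explicit SYNC statement. Only because both
    threads pass it may the consumer assume it sees *both* writes; without
    it, seeing flag == 1 would say nothing about data.
    """
    cells = {"data": 0, "flag": 0}  # two distinct "atomic" locations
    barrier = threading.Barrier(2)
    seen = {}

    def producer():
        cells["data"] = value  # analogous to ATOMIC_DEFINE(data, value)
        cells["flag"] = 1      # analogous to ATOMIC_DEFINE(flag, 1)
        barrier.wait()         # explicit synchronization point

    def consumer():
        barrier.wait()                # wait for the producer's segment to end
        seen["flag"] = cells["flag"]  # analogous to ATOMIC_REF
        seen["data"] = cells["data"]

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen["flag"], seen["data"]


if __name__ == "__main__":
    print(publish_with_sync(42))  # (1, 42)
```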

>
> We know what POWER and x86 hardware guarantee, but have no idea of what
> (more? less?) is guaranteed in coarray implementations.  Even with using
> MPI as a basis, it could be anything from sequential consistency to
> nothing, depending on the details of the implementation.  And the fancy
> RDMA networks are another matter entirely!
>
> It would also be useful if there were some examples of how they are used
> for ordering (i.e. in combination with SYNC MEMORY) and running totals
> etc.  Specifically, any use where the consistency semantics matter to the
> program.

Two examples come to mind.

1) The famous (notorious?) Table Toy benchmark code, also known as the 
"Random Access" benchmark. It involves a large distributed table of 
64-bit integers spread across many images, with all of those images 
asynchronously replacing a randomly located table value with the XOR of 
that value and a local value. The "standard" version of the code is 
several pages of MPI calls, well beyond comprehension by normal humans. 
The coarray version is a simple loop of about 10-15 lines, and the loop 
has no explicit synchronization.

2) By far the most common use of atomics I've seen in coarray codes is 
for buffer filling. Suppose you have a "receiving" buffer that is a 
coarray of globally known size on each image, and a separate integer 
coarray holding the subscript of the next free element in the buffer 
array. The goal is for other images to write data into the buffers on 
remote images. The process is simple: if I want to write N elements 
into the buffer on image T, I do an atomic "fetch and add" of N on the 
buffer subscript on image T. The returned value, old_val, is the 
starting point of my part of the buffer on that image. If old_val+N-1 
is still within the buffer, I do the assignment
buffer(old_val:old_val+N-1)[T] = mydata(1:N)
Several images can be "attacking" image T asynchronously, and each gets 
a non-overlapping part of the buffer as the target of its assignment. 
Essentially no synchronization is involved.

This code sequence is, effectively, the guts of the "all-to-all" memory 
rearrangement that seems to pop up in multiple codes. In practice it is 
about twice as fast as the standard-distribution MPI_Alltoall routine 
for the same operation. It also has the significant advantage that 
images can send data to remote locations as soon as it is ready, rather 
than waiting for a global sync (as would be the case with the MPI call). 
This allows some images to be sending data across the network while 
others are still computing. And if the implementation does the sends as 
non-blocking operations (completed by the next image control statement), 
there is also overlap of local computation and communication.

Besides all-to-all, the scheme can also be used to add items to a 
remote work queue without the overhead of locks. In my experience, this 
is the "killer app" for coarray atomics.
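A small model of the Table Toy update loop, assuming Python threads as images and one lock per image's slice as a stand-in for a remote atomic XOR. DistributedTable, random_access and sequential_replay are hypothetical names, and the random values come from Python's random module rather than the benchmark's own generator.

```python
import random
import threading

MASK64 = (1 << 64) - 1


class DistributedTable:
    """Hypothetical table of 64-bit integers spread across `nimages` images,
    `per_image` entries each. One lock per image stands in for a remote
    atomic XOR on that image's memory."""

    def __init__(self, nimages, per_image):
        self.per_image = per_image
        self.slices = [[0] * per_image for _ in range(nimages)]
        self.locks = [threading.Lock() for _ in range(nimages)]

    def atomic_xor(self, global_index, value):
        image, offset = divmod(global_index, self.per_image)
        with self.locks[image]:
            self.slices[image][offset] ^= value & MASK64

    def as_list(self):
        # Image-major order, so entry i of the result is global index i.
        return [x for s in self.slices for x in s]


def random_access(nimages, per_image, updates_per_image, seed=0):
    """Every image asynchronously XORs random 64-bit values into random
    table locations, with no synchronization inside the loop."""
    table = DistributedTable(nimages, per_image)
    size = nimages * per_image

    def image_work(me):
        rng = random.Random(seed + me)  # per-image stream (assumption)
        for _ in range(updates_per_image):
            value = rng.getrandbits(64)
            table.atomic_xor(rng.randrange(size), value)

    threads = [threading.Thread(target=image_work, args=(me,))
               for me in range(nimages)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table


def sequential_replay(nimages, per_image, updates_per_image, seed=0):
    """Apply the same updates serially. XOR is commutative and associative,
    so the result must match the concurrent run whatever the interleaving."""
    size = nimages * per_image
    ref = [0] * size
    for me in range(nimages):
        rng = random.Random(seed + me)
        for _ in range(updates_per_image):
            value = rng.getrandbits(64)
            ref[rng.randrange(size)] ^= value
    return ref


if __name__ == "__main__":
    table = random_access(nimages=4, per_image=256, updates_per_image=2000)
    print(table.as_list() == sequential_replay(4, 256, 2000))  # True
```

Since the order of XOR updates does not matter, per-location atomicity alone is enough for a correct final table, which is why the loop needs no synchronization.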
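A sketch of the buffer-filling scheme, assuming Python threads as images and a lock around the "next free subscript" counter as the stand-in for an atomic fetch-and-add. The data slots themselves are never locked. RemoteBuffer, send and all_send_to are hypothetical names; subscripts are 1-based to mirror the Fortran assignment.

```python
import threading


class RemoteBuffer:
    """Hypothetical receiving buffer on one image: `size` slots with
    Fortran-style subscripts 1..size, plus the integer holding the next free
    subscript. The lock stands in for an atomic fetch-and-add on that
    integer only."""

    def __init__(self, size):
        self.size = size
        self.data = [None] * (size + 1)  # index 0 unused
        self.next_free = 1
        self._lock = threading.Lock()

    def fetch_add(self, n):
        """Atomically add n to next_free and return the old value."""
        with self._lock:
            old = self.next_free
            self.next_free = old + n
            return old


def send(buffers, target, mydata):
    """Reserve len(mydata) slots on image `target`, then copy into them.
    Returns False if the reservation would run past the end of the buffer."""
    buf = buffers[target]
    n = len(mydata)
    old_val = buf.fetch_add(n)
    if old_val + n - 1 > buf.size:
        return False  # overflow handling (e.g. flushing) is left to the caller
    # Equivalent of buffer(old_val:old_val+N-1)[T] = mydata(1:N); no lock
    # is needed because no other image was handed this range.
    buf.data[old_val:old_val + n] = mydata
    return True


def all_send_to(target, nimages, chunks_per_image, chunk_len, size):
    """Every image sends several tagged chunks to image `target` at once."""
    buffers = [RemoteBuffer(size) for _ in range(nimages)]
    results = []

    def image_work(me):
        for c in range(chunks_per_image):
            chunk = [(me, c, k) for k in range(chunk_len)]
            results.append(send(buffers, target, chunk))

    threads = [threading.Thread(target=image_work, args=(me,))
               for me in range(nimages)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers[target], all(results)


if __name__ == "__main__":
    buf, ok = all_send_to(target=0, nimages=4, chunks_per_image=5,
                          chunk_len=3, size=60)
    print(ok, buf.next_free)  # True 61
```

Each chunk lands in a contiguous, non-overlapping range. One design consequence worth noting: a failed reservation still advances the counter, so once the buffer overflows, later senders also see failure until the buffer is reset.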

Cheers,
Bill


>
> Any feedback appreciated.  Thanks.
>
>
> Regards,
> Nick Maclaren.
>
>
>

-- 
Bill Long                                           longb@cray.com
Fortran Technical Support    &                 voice: 651-605-9024
Bioinformatics Software Development            fax:   651-605-9142
Cray Inc./Cray Plaza, Suite 210/380 Jackson St./St. Paul, MN 55101


