Date: 30 Mar 2013 11:30:33 +0000
From: "N.M. Maclaren"
To: "Bader, Reinhold"
Cc: "Van.Snyder@jpl.nasa.gov", fortran standards email list for J3, sc22wg5
Subject: Re: AW: [ukfortran] (SC22WG5.4944) AW: (j3.2006) Thoughts on Reinhold's thoughts

On Mar 30 2013, Bader, Reinhold wrote:
>>
>> 'Outside' images could push data into the subteam, but it could not use
>> it, because there is no way to synchronise.
>
> My data_feeder example effectively uses double buffering and separates
> buffer exchange (a pure memory operation) via a partial synchronization
> operation in the ancestor team.

I don't see that it helps, unfortunately.  At this level, it doesn't make
much difference whether an operation is a pure memory one or involves
computation.

>> Well, actually, there is,
>> but I am in two minds about whether it is a facility or a loophole that
>> needs closing.
>>
>> If 'outside' synchronises with 'inside' using consistent atomics and
>> SYNC MEMORY, should that be legal?
>
> If coindexing ancestor-inherited coarrays is possible. I believe it
> shouldn't be.

Now I am puzzled.  That is almost exactly the case you are trying to
tackle in data_feeder.  I didn't mean data transfer via consistent
atomics, but the ordering.  The data transfer would be via coindexing.
As far as I can see, my suggestion provides precisely the facility you
want.

My concern is that the implementation needs to assume that ordinary
coarrays can change value in the same way that volatile variables can
(i.e. they can be modified by actions in other universes).  For example,
SYNC MEMORY has to assume that any untouched coarrays or sections of
coarrays may have changed, which is NOT good for performance!

On the other hand, I don't see that as catastrophic, given that Fortran's
PGAS model is essentially identical to MPI's passive one-sided and, to
some extent, we already have that issue.

What it does mean is that the implementation CANNOT simply generate fast,
untracked access to coarrays and put all of the synchronisation in SYNC
MEMORY.  That won't fly.

Let's consider a push-driven model.  At the very least, an implementation
will have to implement release in a SYNC MEMORY by handshaking with all
images on which the invoking image has updated data, ensuring that they
know what has changed AND have taken a safe copy.  Worse, that includes
all LOCAL coarray data.
This is because it will have to implement acquire in a SYNC MEMORY by
ensuring that all data updated on its image is merged in, and it has
flushed all cached copies of data held on other images.

The effect of this is that a SYNC ALL on a team is likely to be as
expensive as a SYNC ALL on the whole image space, irrespective of the
size of the team.  That's not pretty, and risks implementations taking
short cuts and getting it wrong.

Naturally, that doesn't necessarily apply to systems with hardware RDMA
support, because the data synchronisation is automatic.  But it applies
to anything based on Ethernet, OpenIB or any other asynchronous message
passing interface.

Regards,
Nick Maclaren.
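
For concreteness, the scheme discussed above (ordering by atomics and
SYNC MEMORY, data transfer by coindexing) might look roughly like the
following in plain Fortran 2008.  This is only a sketch under stated
assumptions: it uses the flat image space rather than teams, so it does
not answer whether the same thing should be legal across a team
boundary, and the names payload, ready, producer and consumer are
invented for illustration.

```fortran
program atomic_order_sketch
  use, intrinsic :: iso_fortran_env, only: atomic_logical_kind
  implicit none

  ! All names here are hypothetical; nothing below comes from data_feeder.
  integer :: payload[*]                      ! ordinary coarray carrying the data
  logical(atomic_logical_kind) :: ready[*]   ! atomic flag, used only for ordering
  logical(atomic_logical_kind) :: seen
  integer, parameter :: producer = 1, consumer = 2

  payload = 0
  ready = .false.
  sync all                                   ! every image starts from a known state

  if (num_images() >= 2) then
    if (this_image() == producer) then
      payload[consumer] = 42                 ! data transfer via coindexing
      sync memory                            ! release: the put precedes the flag
      call atomic_define(ready[consumer], .true.)
    else if (this_image() == consumer) then
      seen = .false.
      do while (.not. seen)                  ! spin until the flag is observed
        call atomic_ref(seen, ready)
      end do
      sync memory                            ! acquire: new segment before the read
      if (payload /= 42) error stop 'payload not visible after acquire'
    end if
  end if
end program atomic_order_sketch
```

Note that the consumer never touches payload between the initial SYNC
ALL and the final SYNC MEMORY, yet its value has changed.  That is
exactly the "untouched coarrays may have changed" assumption that SYNC
MEMORY is forced to make.  Run with fewer than two images, the program
simply does nothing after the initial SYNC ALL.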