From: John Reid
Date: Sun, 11 Aug 2013 19:01:23 +0100
To: WG5
Subject: Ballot on N1983

Please answer the following question "Is N1983 ready for forwarding to SC22 as the DTS?" in one of these ways.

3) No, for the following reasons.

1. Reasons for no vote

1.1 Continued execution in the presence of failed images

It is not clear how execution is intended to continue in the presence of failed images.

For most calculations, the failure of one image leads to the failure of the whole calculation. To recover from this, the program probably needs to revert to a previous "check point" and continue the execution from there using images that have not failed. One possibility is not to use all the images for the calculation but to keep a few "spares". Execution is within a CHANGE TEAM construct, with the spares idle in a separate team. When an image fails, the CHANGE TEAM construct is left, a new team is formed by substituting a spare for the failed image, the check-point data are recovered, and the CHANGE TEAM construct is re-entered. This avoids any need in the main code for remapping of data - it only has to detect failed images and exit the construct if there are any.

Some calculations are "massively parallel". Most of the work is done completely independently on separate images. Perhaps one image acts as "master", handing out tasks and collecting results. As long as the master does not fail, the calculation can continue happily with failed images. The master sends the work that it gave to a failed image to the next image that is free.

I will assume that we want to cater for both situations. Even in the first case, the parent team needs to execute in a team that has failed images while it forms the new team and recovers the check-point data.

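The master/worker case described above can be modelled outside Fortran as a small scheduling loop. The Python sketch below is only a model of the control flow, not TS syntax: the names run_master and fails_on and the image numbering are hypothetical, and image failure is simulated by a lookup table rather than detected through STAT_FAILED_IMAGE.

```python
from collections import deque


def run_master(tasks, workers, fails_on):
    """Model a master image handing tasks to worker images.

    tasks    -- iterable of task identifiers
    workers  -- list of worker image indices (the master is not included)
    fails_on -- hypothetical failure table: worker -> task on which it fails
    Returns (results, failed): results maps task -> worker that completed
    it, and failed is the set of workers that failed.
    """
    pending = deque(tasks)        # work not yet completed
    free = deque(workers)         # nonfailed workers waiting for work
    failed = set()
    results = {}

    while pending:
        if not free:
            raise RuntimeError("all worker images have failed")
        task = pending.popleft()
        worker = free.popleft()   # the next image that is free
        if fails_on.get(worker) == task:
            # The image failed while holding this task: never reuse it,
            # and put its work back at the front of the queue.
            failed.add(worker)
            pending.appendleft(task)
        else:
            results[task] = worker
            free.append(worker)   # the worker is free again
    return results, failed


# Assumed scenario: images 2-4 are workers and image 3 fails on task 2.
results, failed = run_master(range(1, 6), [2, 3, 4], {3: 2})
```

In this model every task is still completed as long as at least one worker survives, which is the behaviour the paragraph on massively parallel calculations asks for. The loss of the master itself is not modelled.
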
The collective procedures are not massively parallel. They should surely fail if any of the images of the team have failed. However, the last paragraph of page 15 says "If an image has failed, but no other error condition occurred, the argument is assigned the value STAT_FAILED_IMAGE.". If this behaviour is retained, the effect of failed images on the result needs to be described.

The effect of SYNC ALL and SYNC TEAM in the presence of failed images should be to synchronize the images that have not failed. This should be stated. I am not sure about SYNC IMAGES.

The FORM SUBTEAM statement should work in the presence of failed images. I am inclined to think that if a subteam has a nonfailed image, all its images should be nonfailed.

The CHANGE TEAM statement should work in the presence of failed images.

The ALLOCATE and DEALLOCATE statements should work in the presence of failed images.

For locks and events, at most one other image is involved and its failure must be regarded as an error.

1.2 Cosubscripts of arrays declared in ancestor teams

New syntax (R624) was added during the Delft meeting to allow a coarray to be addressed by the cosubscripts of a team other than the current team. The team is not restricted to being the current team or an ancestor, but I think that was the intention.

Because there is no means of specifying the mapping between cosubscripts when teams change, the new syntax should be restricted to refer to the team in which the coarray was declared. Alternatively, a mechanism for specifying the mapping should be added. I suggest that it should be as for the association of a dummy coarray with the corresponding actual coarray. A possibility is the statement new cosubscripts () where is []

2. Other comments

2.1 FORM SUBTEAM

What happens if NEW_INDEX is absent? Is the mapping from parent image index to child image index processor dependent? Or is THIS_IMAGE(DISTANCE=1) monotonically increasing?

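The question about an absent NEW_INDEX can be made concrete with a model of the second alternative, in which child image indices follow parent image order within each subteam. The Python sketch below is illustrative only: form_subteams and parent_of are hypothetical names, and nothing here asserts what N1983 actually specifies.

```python
def form_subteams(subteam_ids):
    """Model FORM SUBTEAM with NEW_INDEX absent, assuming order is kept.

    subteam_ids -- subteam_ids[i] is the subteam id given by parent
                   image i+1 (image indices are 1-based, as in Fortran)
    Returns a dict: subteam id -> list of parent image indices, where
    position k in the list (0-based) is child image index k+1.
    """
    teams = {}
    for parent, team_id in enumerate(subteam_ids, start=1):
        # Appending in parent order makes the child -> parent mapping
        # monotonically increasing within every subteam.
        teams.setdefault(team_id, []).append(parent)
    return teams


def parent_of(teams, team_id, child_index):
    """Parent-team image index of a child image (1-based indices)."""
    return teams[team_id][child_index - 1]


# Assumed scenario: six parent images split alternately into subteams 1 and 2.
teams = form_subteams([1, 2, 1, 2, 1, 2])
```

Under this assumption the value playing the role of THIS_IMAGE(DISTANCE=1) increases with the child image index in every subteam. A processor-dependent mapping would correspond to any other permutation of each list.
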
2.2 REDUCE

For CO_MAX, CO_MIN, and CO_SUM, there is a corresponding transformational function so it is easy to write code for the common case where the max, min, or sum of all the elements of the arrays on all the images is wanted. We need to add REDUCE to play the same role for CO_REDUCE.

2.3 Error condition for a collective

The last paragraph of page 15 says that if an error condition occurs and STAT is present, the effect is as if SYNC MEMORY were executed. This seems wrong because RESULT has intent out so one expects it to become undefined. Do we expect SOURCE to be used by the implementation for workspace when RESULT is absent? If so, an error condition should cause SOURCE to become undefined.

2.4 Examples

More examples are needed, particularly of continued execution in the presence of failed images.

3. Edits

[15:22] After "the beginning" add "to the end".