From: John Reid
Date: Tue, 15 Apr 2014 23:31:59 +0100
To: WG5
Subject: Vote on draft TS
Attachment: JKRvote.txt

Please answer the following question "Is N2007 ready for forwarding to
SC22 as the DTS?" in one of these ways.

2) Yes, but I recommend the following changes.

1.
Failed images

There has not been adequate discussion of an image stalling because of
the failure of another image, which will cause it to fail despite there
being nothing wrong with its hardware. All we have is a warning in
NOTE 5.7. If restarting is planned, as in the example A.1.2, such images
should be available for reuse. Such stalling anyway indicates that the
team calculation has gone wrong, so I suggest adding a RENDEZVOUS
specifier to the SYNC ALL statement. If a stalled image is detected, all
images of the team would re-execute the most recently executed
RENDEZVOUS FOR ALL statement with the STAT= value set to
STAT_FAILED_IMAGE, ignoring all pending synchronizations initiated since
that FOR ALL was last executed.

Alternatively, all images of the team could exit the change team
construct. That would give the required effect in A.1.2 without any code
change.

The programmer who wishes to guard against stalling because of accessing
an image that has failed may do so as follows:

   IF (ALL(FAILED_IMAGES()/=i)) THEN
     a = a[i]
   ELSE
     ! Do something else
   END IF

This is illustrated in my rewrite of A.2.1 below. I think we may need a
logical function that tests a given image for failure.

2. Teams

I am unhappy with the idea of code using an image selector for an
ancestor team that is not visible in the scope. For example, a library
code should leave such work to the calling code. We have chosen to
address an ancestor in an image selector by using its team variable
name. We should be doing the same for other references to an ancestor.
An ancestor team may be at different team distances on executing images
of different descendant teams.

I suggest the replacement of DISTANCE by TEAM as arguments of TEAM_ID,
NUM_IMAGES, and THIS_IMAGE, the removal of DISTANCE as an argument of
GET_TEAM, and the removal of TEAM_DEPTH.

3. Edits

[10:27] Change "Apart from its final upper bound, its" to "Its".
[There is nothing special about the codimension-decl here, so there is
no need to say anything about the final codimension.]

[10:30] Change "established." to "established, apart from its final
upper cobound".
[Here, the final upper cobound is likely to be different.]

[13:10-13] Replace by
"The value of the default integer scalar constant STAT_FAILED_IMAGE is
positive and different from the value of STAT_STOPPED_IMAGE,
STAT_LOCKED, STAT_LOCKED_OTHER_IMAGE, or STAT_UNLOCKED. If the processor
detects that an image of the current team has failed, the".
[With the addition of FAIL IMAGE, we want a processor that cannot detect
a truly failed image to respond to the execution of FAIL IMAGE.]

[15:6-24] Should an event variable be atomic?

[15:34] Wording like that at [17:12-14] is needed here.

[23:26] EVENT_QUERY should be an atomic subroutine.

[40:4, 40:25-32] To make the code tolerate failing spare images, replace
the DO loop by the following:

   k = images_used
   DO i = 1, size(failed_img)
     IF (failed_img(i) == 1) ERROR STOP 'cannot recover'
     DO k = k+1, num_images()
       IF (all(failed_img(:)/=k)) EXIT
     END DO
     IF (me == k) THEN
       me = failed_img(i)
       id = 1
       EXIT
     END IF
   END DO

If this is done, [40:4] should be deleted.

[43:6-44:34] This example has bugs. I think the message would be clearer
if this example were replaced by an example that is not tolerant to
failed images, followed by a modification that is. Here is a draft.

A.2.1 EVENT_QUERY example

The following example illustrates the use of events via a program in
which image 1 acts as master and shares out work items to the other
images. Only one work item at a time can be active on a worker image,
and each deals with the result (e.g. via I/O) without directly feeding
data back to the master image.

Because the work items are not expected to be balanced, the master keeps
cycling through all the images to find one that is waiting for work.

An event is posted by each worker to indicate that it has completed its
work item.
Since the corresponding variables are needed only on the master, we
place them in an allocatable array component of a coarray. An event on
each worker is needed for the master to post the fact that it has made a
work item available for it.

PROGRAM work_share
  USE, INTRINSIC :: iso_fortran_env
  USE :: mod_work, ONLY: & ! Module that creates work items
         work, &             ! Type for holding a work item
         create_work_item, & ! Function that creates work item
         process_item, &     ! Function that processes an item
         work_done           ! Logical function that returns true if all work done

  TYPE(event_type) :: submit[*]     ! Whether work ready for a worker
  TYPE :: asymmetric_event
    TYPE(event_type), ALLOCATABLE :: event(:)
  END TYPE
  TYPE(asymmetric_event) :: free[*] ! Whether worker is free
  TYPE(work) :: work_item[*]        ! Holds all the data for a work item
  INTEGER :: count, i, n, nbusy[*]

  IF (this_image() == 1) THEN
    ! Get started
    ALLOCATE(free%event(2:num_images()))
    nbusy = 0 ! This holds the number of workers working
    DO i = 2, num_images() ! Start the workers working
      IF (work_done()) EXIT
      nbusy = nbusy + 1
      work_item[i] = create_work_item()
      EVENT POST (submit[i])
    END DO
    ! Main work distribution loop
    master : DO
      image : DO i = 2, num_images()
        CALL EVENT_QUERY(free%event(i), count)
        IF (count == 0) CYCLE ! Worker is not free
        EVENT WAIT (free%event(i)); nbusy = nbusy - 1
        IF (work_done()) CYCLE
        nbusy = nbusy + 1
        work_item[i] = create_work_item()
        EVENT POST (submit[i])
      END DO image
      IF ( nbusy==0 ) THEN ! All done. Exit on all images.
        DO i = 2, num_images()
          EVENT POST (submit[i])
        END DO
        EXIT master
      END IF
    END DO master
  ELSE
    ! Work processing loop
    worker : DO
      EVENT WAIT (submit)
      IF (nbusy[1] == 0) EXIT
      CALL process_item(work_item)
      EVENT POST (free[1]%event(this_image()))
    END DO worker
  END IF
END PROGRAM work_share

A.2.1a EVENT_QUERY example that tolerates image failure

This example is an adaptation of the example of A.2.1 to make it able to
execute in the presence of the failure of one or more of the worker
images.
The function create_work_item now accepts an integer argument to
indicate which work item is required. It is assumed that the work items
are indexed 1, 2, ... . It is also assumed that if an image fails while
processing a work item, that work item can subsequently be processed by
another image. The internal subroutine failed tests whether a particular
image has failed.

PROGRAM work_share
  USE, INTRINSIC :: iso_fortran_env
  USE :: mod_work, ONLY: & ! Module that creates work items
         work, &             ! Type for holding a work item
         create_work_item, & ! Function that creates work item
         process_item, &     ! Function that processes an item
         work_done           ! Logical function that returns true if all work done

  TYPE(event_type) :: submit[*]     ! Whether work ready for a worker
  TYPE :: asymmetric_event
    TYPE(event_type), ALLOCATABLE :: event(:)
  END TYPE
  TYPE(asymmetric_event) :: free[*] ! Whether worker is free
  TYPE(work) :: work_item[*]        ! Holds all the data for a work item
  INTEGER :: count, i, k, kk, n, nbusy[*], np, status
  INTEGER, ALLOCATABLE :: working(:) ! Items being worked on
  INTEGER, ALLOCATABLE :: pending(:) ! Items pending after image failure

  IF (this_image() == 1) THEN
    ! Get started
    ALLOCATE(free%event(2:num_images()))
    ALLOCATE(working(2:num_images()), pending(num_images()-1))
    nbusy = 0 ! This holds the number of workers working
    k = 1     ! Index of next work item
    np = 0    ! Number of work items in array pending
    DO i = 2, num_images() ! Start the workers working
      IF (work_done()) EXIT
      work_item[i] = create_work_item(k)
      working(i) = k
      k = k + 1
      nbusy = nbusy + 1
      EVENT POST (submit[i], STAT=status)
      IF (status==STAT_FAILED_IMAGE) THEN
        working(i) = 0
        k = k - 1
        nbusy = nbusy - 1
      END IF
    END DO
    ! Main work distribution loop
    master : DO
      image : DO i = 2, num_images()
        IF (ANY(FAILED_IMAGES()==i)) THEN ! Image has failed
          IF (working(i)>0) THEN
            ! It failed while working
            np = np + 1
            pending(np) = working(i)
            working(i) = 0
          END IF
          CYCLE image
        END IF
        CALL EVENT_QUERY(free%event(i), count)
        IF (count == 0) CYCLE image ! Worker is not free
        EVENT WAIT (free%event(i))
        nbusy = nbusy - 1
        IF (np>0) THEN
          kk = pending(np)
          np = np - 1
        ELSE
          IF (work_done()) CYCLE image
          kk = k
          k = k + 1
        END IF
        nbusy = nbusy + 1
        working(i) = kk
        work_item[i] = create_work_item(kk)
        EVENT POST (submit[i],STAT=status)
        ! If image i has failed, this will not hang and the failure
        ! will be handled on the next iteration of the loop
      END DO image
      IF ( nbusy==0 ) THEN ! All done. Exit on all images.
        DO i = 2, num_images()
          EVENT POST (submit[i],STAT=status)
          IF (status==STAT_FAILED_IMAGE) CYCLE
        END DO
        EXIT master
      END IF
    END DO master
  ELSE
    ! Work processing loop
    worker : DO
      EVENT WAIT (submit)
      IF (nbusy[1] == 0) EXIT worker
      CALL process_item(work_item)
      EVENT POST (free[1]%event(this_image()))
    END DO worker
  END IF
END PROGRAM work_share