From: "N.M. Maclaren"
Date: 22 Feb 2015 20:50:25 +0000
To: Bill Long
Cc: fortran standards email list for J3, WG5
Subject: Re: [ukfortran] (SC22WG5.5454) (j3.2006) Response to TS ballot

On Feb 21 2015, Bill Long wrote:
>
> Sorry, Nick. I was mistaken in assuming that the "infeasible" task was
> the compiler and runtime work. I'm glad that we have a demonstration of
> success in that area.

Well, no, we don't. What we have is evidence that it is possible to
provide error recovery with callback to user code on mainframe systems,
at least for single image (but multi-tasking) systems. I was doing it
for a single, non-optimising compiler for one system, and the only other
projects that I know of (IBM CEL and DEC VMS) were for single systems,
and may never have been completed. I have heard rumours that it has been
done in other single language, single system cases, but investigation
has never shown up any evidence.

What I can also say, from subsequent experience, is that even the
simpler task we were tackling would have been completely impossible on
ANY modern mainstream system, let alone in a portable fashion, because
of the lack of adequate system interfaces and tendency of the system
calls to misbehave or hang, unpredictably, on encountering any hardware
or underlying system failure.

In particular, I can witness that the current mainstream networking
interfaces have that defect, badly, including name lookup and TCP/IP.
That is one of the reasons that MPI says that any continuation after it
has detected an error is undefined.

> Did you provide documentation with your project? (I'm guessing so.) Based
> on that, any suggestions for our writing task would be most appreciated.

No. I was planning to, but one of the things that I learnt was that it
was too hard. In particular, any objects active when failure occurs
(including those being passed as arguments but otherwise unused) can end
up in a worse-than-undefined state. I can witness that IBM CEL was not
planning to specify the state of objects, because I was a consultant on
that project.

I have a document that I wrote for WG14 proposing something for C, based
on my implementation experience, but it would not be a great help to WG5.
The main reason is that C's object model is vastly simpler than
Fortran's, but it also needed the introduction of an exception recovery
point (which, in Fortran terms, would be a SYNC ALL across all images,
teams notwithstanding). The last is one of the reasons that I voted
against being able to access images not in the current team, or in an
active subteam.

> In terms of "specifying the circumstances when recovery is possible", I
> see two aspects. One is whether there are syntax and semantics specified
> that allow the program to resume execution at a place that is not part of
> the normal execution sequence. Second is whether the algorithm is such
> that it is possible to restart. Our current position is that the second
> is entirely the programmer's responsibility. We only supply help with the
> first.

Yes. And it is purely the first I am talking about.

> The task of "specifying the state of the program following such recovery"
> is more of a challenge. It depends on how the programmer is trying to
> restart. If, for example, the program includes checkpoints, the desired
> action after the END TEAM statement would be to branch to code that
> restored state from the most recent checkpoint and resume execution just
> after that checkpoint. The method we're supplying additional help for is
> to branch back to the beginning of the current CHANGE TEAM construct and
> re-execute it. There are a lot of potential states for objects in the
> program. Paper 15-138 is an attempt at describing what happens.
> Suggestions for improvement are welcome.

Don't start from here. I don't believe that it can be done, and all
evidence from the past half century is that it is too hard for the best
experts, and possibly flatly impossible.

Regards,
Nick.