From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Sat Jul 27 17:20:53 2019
Return-Path: <owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org>
X-Original-To: sc22wg5-dom8
Delivered-To: sc22wg5-dom8@www.open-std.org
Subject: Re: Two things from IFIP WG 2.5 meeting
From: Van Snyder <van.snyder@jpl.nasa.gov>
Reply-To: van.snyder@jpl.nasa.gov
To: sc22wg5 <sc22wg5@open-std.org>
In-Reply-To: <Prayer.1.3.5.1907271035310.18541@hermes-1.csi.cam.ac.uk>
References: <20190726015634.E052D35860C@www.open-std.org>
	 <Prayer.1.3.5.1907261324060.24019@hermes-1.csi.cam.ac.uk>
	 <20190726172038.9A3DD358B76@www.open-std.org>
	 <Prayer.1.3.5.1907271035310.18541@hermes-1.csi.cam.ac.uk>
Content-Type: text/plain; charset="UTF-8"
Organization: Jet Propulsion Laboratory
Date: Sat, 27 Jul 2019 08:20:48 -0700
Message-ID: <1564240848.27421.261.camel@vanlap.vsnyder>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.3 (2.32.3-37.el6) 
Content-Transfer-Encoding: 7bit
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

On Sat, 2019-07-27 at 10:35 +0100, N.M. Maclaren wrote:
> On Jul 26 2019, Van Snyder wrote:
> >>For binary128, 49,536 bits would be required, but it was not requested
> >for IEEE 1788.
> 
> That's what I meant. Sorry about being confusing. Yes, I agree that it's 
> affordable to implement under many circumstances, but a standard should NOT 
> consider just the easy circumstances. And a specification that is feasible 
> for 64 bits but not 128 bits is dubiously within the spirit of Fortran.

Specification that a DOT_PRODUCT produce a correctly-rounded result does
not depend upon the kind of the arguments.  If a processor has a super
accumulator that only works for binary64 (or binary32), it could use it
for those precisions, and use a software method otherwise.  The
processor could detect at run time whether the CPU (or a coprocessor)
provides a super accumulator, and use the appropriate method.
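
A minimal sketch of such a dispatch, in Python rather than Fortran and
with hypothetical names throughout.  The super accumulator is modelled
with arbitrary-precision integers: every finite binary64 value is an
integer multiple of 2**-1074, so every product is an integer multiple of
2**-2148 and can be added exactly in fixed point, with one rounding at
the end.  HAVE_SUPER_ACCUMULATOR stands in for a run-time hardware
probe, and the software path uses exact rationals only because that is
short to write; neither describes any real processor.

```python
from fractions import Fraction

# Hypothetical flag standing in for a run-time probe of the hardware.
HAVE_SUPER_ACCUMULATOR = True

LSB_EXP = 1074  # the smallest binary64 subnormal is 2**-1074


def to_fixed(x: float) -> int:
    """Return x * 2**1074 as an exact integer (x must be finite)."""
    n, d = x.as_integer_ratio()       # exact; d is a power of two
    return n * ((1 << LSB_EXP) // d)


def dot_super_accumulator(xs, ys) -> float:
    """Model of a long fixed-point register: exact sums, one rounding."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += to_fixed(x) * to_fixed(y)   # exact, in units of 2**-2148
    return float(Fraction(acc, 1 << (2 * LSB_EXP)))


def dot_software(xs, ys) -> float:
    """Software fallback: exact rational arithmetic, one rounding."""
    exact = sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys))
    return float(exact)


def dot_product(xs, ys) -> float:
    """Pick a method at run time; the result must not depend on it."""
    if HAVE_SUPER_ACCUMULATOR:
        return dot_super_accumulator(xs, ys)
    return dot_software(xs, ys)
```

With xs = [1e17, 1.0, -1e17] and ys = [1.0, 1.0, 1.0], both paths return
1.0, whereas a plain left-to-right binary64 loop returns 0.0 because
1e17 + 1.0 rounds back to 1e17.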

> As a slight aside, there are other approaches that give most of the benefit
> for much less cost.  One is (true) probabilistic rounding during the
> accumulation, which may be heretical but has a lot of technical advantages.

The super accumulator is the fastest way possible.  Probabilistic
rounding cannot beat it.  "Most of the benefit" isn't good enough.  The
Cray-1 "approximate inverse" only got 12 correct bits.  That's "most of
the benefit," right?  So why did the Cray-1 compiler ALWAYS follow it
up with two Newton iterations to get 48 correct bits?
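
The arithmetic behind those figures: each Newton step for a reciprocal,
x <- x*(2 - a*x), squares the relative error, so the number of correct
bits roughly doubles per step (12, 24, 48).  A small sketch in Python,
taking the 12-bit figure above at face value; the starting value is
manufactured for the demonstration and does not model the Cray
hardware, and the function names are illustrative.

```python
import math
from fractions import Fraction


def newton_reciprocal(a: float, x0: float, steps: int) -> float:
    """Refine x0 ~ 1/a with `steps` Newton iterations x <- x*(2 - a*x)."""
    x = x0
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def correct_bits(a: float, x: float) -> float:
    """Approximate number of correct bits in x as an estimate of 1/a."""
    rel_err = abs(1 - Fraction(a) * Fraction(x))   # measured exactly
    return math.inf if rel_err == 0 else -math.log2(float(rel_err))


a = 3.0
x0 = (1.0 / a) * (1.0 + 2.0 ** -12)   # artificial seed, about 12 good bits
for steps in range(3):
    x = newton_reciprocal(a, x0, steps)
    print(steps, "step(s):", round(correct_bits(a, x), 1), "bits")
```

Since 1 - a*x_new = (1 - a*x_old)**2 in exact arithmetic, the three
lines printed should be close to 12, 24 and 48 bits.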

Using a floating-point method, correct rounding of each multiply-add is
not sufficient, probabilistic or otherwise.  The defect needs to be
accumulated.  The XBLAS dot product uses extended precision accumulation
(quad or double quad), but this limits the maximum length of vectors for
which a correct result is guaranteed.  Siegfried Rump's method is 40%
faster anyway.
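
What "accumulating the defect" means can be sketched with error-free
transformations: each product and each partial sum is split into a
rounded value plus its exact rounding error, and the errors are summed
separately.  The sketch below is Dot2 from Ogita, Rump and Oishi,
"Accurate Sum and Dot Product"; it is an assumption that this is close
to the method referred to above.  It behaves as if the dot product were
computed in twice the working precision, which is not the same thing as
a guaranteed correctly-rounded result.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e


def split(a):
    """Veltkamp split of a binary64 value into high and low halves."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_product(a, b):
    """Return (p, e) with p = fl(a * b) and a * b = p + e exactly,
    barring overflow and underflow."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = a_lo * b_lo - (((p - a_hi * b_hi) - a_lo * b_hi) - a_hi * b_lo)
    return p, e


def dot2(xs, ys):
    """Compensated dot product: accumulate the rounding errors too."""
    p, s = 0.0, 0.0
    for x, y in zip(xs, ys):
        h, r = two_product(x, y)   # r is the defect of the product
        p, q = two_sum(p, h)       # q is the defect of the addition
        s += q + r                 # running sum of the defects
    return p + s
```

For xs = [1e17, 1.0, -1e17] and ys = [1.0, 1.0, 1.0], dot2 returns 1.0:
the 1.0 that is lost when it is added to 1e17 survives in the error
term.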

> The relevance is the question of whether it is reasonable to exclude other
> approaches by mandating one particular one?

The standard should not mandate a method -- only a result.

If a super accumulator is available for the kind of the arguments, a
processor could use it, and use a software algorithm otherwise.  A super
accumulator with enough bits for binary64 could work for binary128, if
it has a wide enough arithmetic unit, provided the vectors are short
enough or overflow does not occur.  If overflow occurs, one could start
over with a software algorithm.

But all this is below the purview of the standard.  The standard should
say nothing more, in normative text, than "produce a correctly-rounded
dot product."
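
A result-only requirement is also easy to state as a test, whatever
method the processor uses.  A minimal sketch in Python, assuming
binary64 and round-to-nearest; is_correctly_rounded and the other names
are illustrative, and signed zeros, infinities and NaNs are ignored.

```python
from fractions import Fraction


def exact_dot(xs, ys) -> Fraction:
    """Exact dot product as a rational number (finite inputs only)."""
    return sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)),
               Fraction(0))


def is_correctly_rounded(result: float, xs, ys) -> bool:
    """True if `result` is the exact dot product rounded once."""
    return result == float(exact_dot(xs, ys))


def naive_dot(xs, ys) -> float:
    """Plain left-to-right loop, rounding after every operation."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


xs = [1e17, 1.0, -1e17]
ys = [1.0, 1.0, 1.0]
print(is_correctly_rounded(naive_dot(xs, ys), xs, ys))
print(is_correctly_rounded(1.0, xs, ys))
```

The check says nothing about how the result was obtained; any method
that passes it for all inputs would satisfy such a requirement.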

> >If you want papers that describe the problem, I can get them.  Siegfried
> >Rump gave very nice demonstrations of the problem at the WG 2.5 meeting
> >in Sydney last year, and at ICIAM in Valencia last week (and many other
> >times when I was not in attendance).
> 
> I should be interested if you have any that demonstrate it is a critical
> issue in realistic analyses.  All of the ones I have seen were implausible,
> to put it mildly.  Yes, some were of academic interest, but I never saw a
> case that this makes the difference between a complete, practical analysis
> being valid and not being.  As I tried to say, being perfectionist in one
> respect rarely helps with complete problems, which are usually complex.

I recall descriptions of neural network training failing to converge in
even 100,000 iterations without correct linear algebra.  Iterative
refinement was tried, but made only a small dent, because of poor
conditioning of the dot product.  It finally worked when an accurate dot
product was used.  IIRC, this was described in a paper by Victor
Pereyra, in connection with variable projection and separable nonlinear
least-squares problems.  I've asked correspondents for more examples.

> >This is a red herring.  We are discussing the Fortran standard, after
> >all, not the C or C++ or 754 standard.  BTW, Kahan did discuss the
> >80-bit vs. 64-bit problem in the paper attached to the original message.
> 
> Regrettably, it is not, as I can witness from way back when....

The "red herring" was referring to defects in C or C++ in a discussion
of the Fortran standard.

> You will remember when I raised the issue of what IEEE 754 said about the
> evaluation of expressions, and asked whether Fortran should honour that.
> The vast majority of the meeting was adamantly against it.  To enable such
> code to be conforming, reliable and portable, Fortran would have at least
> to provide a mechanism for such an evaluation mode.  C does, in theory.

At some time during the last thirty years, maybe more than once, I
proposed a STRICT block.  This was included in Ada 83, because it was in
the requirements specifications from a very early stage.  The vast
majority of the meeting (or correspondents, I don't recall which) were
against it.

The SGI priorities -- get it out, get it fast, get it right -- permeated
the Fortran committee even in 1986.  You might recall that SGI sometimes
got through the second step, but rarely the third.

> 
> Regards,
> Nick Maclaren.
>