From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Fri Jul 26 14:41:01 2019
Return-Path: <owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org>
X-Original-To: sc22wg5-dom8
Delivered-To: sc22wg5-dom8@www.open-std.org
Received: by www.open-std.org (Postfix, from userid 521)
	id EDA1C358B6E; Fri, 26 Jul 2019 14:41:00 +0200 (CEST)
Delivered-To: sc22wg5@open-std.org
X-Greylist: delayed 1012 seconds by postgrey-1.34 at www5.open-std.org; Fri, 26 Jul 2019 14:41:00 CEST
Received: from ppsw-41.csi.cam.ac.uk (ppsw-41.csi.cam.ac.uk [131.111.8.141])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by www.open-std.org (Postfix) with ESMTP id B9B66356E7B
	for <sc22wg5@open-std.org>; Fri, 26 Jul 2019 14:41:00 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=cam.ac.uk;
	 s=20180806.ppsw; h=Sender:Content-Type:Mime-Version:References:In-Reply-To:
	Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-Transfer-Encoding:
	Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender:
	Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:
	List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=S508uZK6EmU9kquYnYxNv+iKSMfmdsZZTvG+r4OECAE=; b=rSEhpHuH9XQMoCD2cflYn2QeDw
	tetRJ+1a/N6eyPLsrVs1bCBlztROOCelPL1bWzWiNRAmG7boYEPUrXq11pCpWXEaKhoujIpuz/a/2
	6bzREm/6opdIteOpMs4EajRzNZYVRnUU1n/6xsw8zHVluggR1hexcPdibiJ+/hYt3o5M=;
X-Cam-AntiVirus: no malware found
X-Cam-ScannerInfo: http://help.uis.cam.ac.uk/email-scanner-virus
Received: from hermes-1.csi.cam.ac.uk ([131.111.8.51]:44136)
	by ppsw-41.csi.cam.ac.uk (smtp.hermes.cam.ac.uk [131.111.8.159]:25)
	with esmtpa (EXTERNAL:nmm1) id 1hqzGY-001G6Y-RG (Exim 4.92.1)
	(return-path <nmm1@hermes.cam.ac.uk>); Fri, 26 Jul 2019 13:24:06 +0100
Received: from prayer by hermes-1.csi.cam.ac.uk (hermes.cam.ac.uk)
	with local (PRAYER:nmm1) id 1hqzGY-0001wt-EO (Exim 4.92)
	(return-path <nmm1@hermes.cam.ac.uk>); Fri, 26 Jul 2019 13:24:06 +0100
Received: from [51.9.71.185] by old-webmail.hermes.cam.ac.uk
	with HTTP (Prayer-1.3.5); 26 Jul 2019 13:24:06 +0100
Date: 26 Jul 2019 13:24:06 +0100
From: "N.M. Maclaren" <nmm1@cam.ac.uk>
To: Van.Snyder@jpl.nasa.gov
Cc: sc22wg5 <sc22wg5@open-std.org>
Subject: Re: [ukfortran] (SC22WG5.6113) Two things from IFIP WG 2.5 meeting
Message-ID: <Prayer.1.3.5.1907261324060.24019@hermes-1.csi.cam.ac.uk>
In-Reply-To: <20190726015634.E052D35860C@www.open-std.org>
References: <20190726015634.E052D35860C@www.open-std.org>
X-Mailer: Prayer v1.3.5
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset=ISO-8859-1
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

On Jul 26 2019, Van Snyder wrote:

>Two things that ought to be of interest to the Fortran standard were
>discussed at the recent meeting of IFIP Working Group 2.5 (Numerical
>Software).
>
>1. The dot product is an ubiquitous operation in mathematically-related
>problems, not just in library modules.  Floating-point arithmetic is not
>associative.  The dot product can be arbitrarily badly conditioned.  The
>numerical software community would like to have either an exact dot
>product or a correctly-rounded dot product.  Using a super-accumulator
>as originally described by Kulisch, the exact dot product can be
>computed as fast as operands can be fetched.  IBM implemented the exact
>dot product in hardware for the 370, in an option called ACRITH.
>Kulisch and his students implemented the exact dot product on a PCI
>card.  Demmel and his students implemented the exact dot product on a
>PCI card.  Both Kulisch and Demmel observed that the PCI card computed
>an exact dot product six times faster than a double-precision floating
>point method using a Pentium.

In order to do that for IEEE 754 extended precision, you need a huge
accumulation 'register' (c. 4,200 bytes, if I recall).  That's not a
problem in many cases (e.g. the hardware ones you mention) but becomes
one for many others (e.g. some highly parallel implementations).

Are you quite sure that the numerical software community really DOES
want that?  I agree that it wants something better than the obvious code
gives, but I very much doubt that more than a few people could make use
of so-called last-bit accuracy.  Even the people who want repeatability
can't have that without forbidding many important forms of optimisation
and any reasonable performance / numerical improvements in the less
tractable special functions and most forms of solver, not to say
forbidding many important non-deterministic methods.  And that's a
fundamental restriction of the basic mathematics.

>Computing a correctly-rounded dot product is possible using conventional
>floating point arithmetic, at a cost of up to six times the cost of the
>obvious method.  It is difficult for users to implement it because
>processors sometimes do not respect parentheses.  For example, a
>compensated-summation loop might be written

It's not that simple, unfortunately.  While almost all Fortran compilers
have at least an option to honour the standard together with optimisation,
surprisingly few C and C++ ones do in this area.  They do NOT just need to
honour parentheses, but to use the same arithmetic throughout; using
native Intel 80-bit for calculation and IEEE 754 for storage also knackers
that code.  I baulked at teaching the necessary hacks in my C++ course, and
I was notorious for not sparing students the sordid details.  I can post
my example code if you like.

Furthermore, I suspect that the factor of six might well be exceeded in
some cases (e.g. systems with very limited L1 cache and a particularly
foul data distribution and IEEE 128-bit).  There's no getting round that
4,200 byte requirement.


Regards,
Nick Maclaren.