Date: 06 Nov 2008 09:29:41 +0000
From: "N.M. Maclaren" <nmm1@cam.ac.uk>
To: sc22wg5@open-std.org
Subject: Re: [ukfortran] (SC22WG5.3627) N1751

On Nov 6 2008, Michael Ingrassia wrote:

>There's a premiss in N1751.txt that I'm not sure I believe, although
>I may be reading it wrong.
>
>>Segments Pi and Qj are unordered, and it is therefore permitted for a
>>processor to execute segment Qj to completion before starting segment
>>Pi.
>
> I accept that in theory, there is no argument based solely on segment
> ordering to disallow a processor to execute segment Qj to completion
> before starting segment Pi.
>
>But do we actually say that a processor is permitted to predicate
>the start of segment Pi on the completion of a different segment?

You're reading something into N1751 that I didn't say, imply or mean.
The issue is not about a processor adding new ordering constraints, but
about whether it is conforming to ALLOW that ordering to happen.

>And if not, then doesn't the principle of Get On With It
>(or whatever you might call it) say that the processor has to
>eventually start Pi?

Can you point me at anything in the standard that states that principle?

More seriously, why do you think that the processor can (let alone should)
decide that the best strategy is to stop doing what it thinks is useful
work in order to achieve fairness between images?  Knowing when to do that
and when not to is equivalent to solving the Halting Problem ....

>After all, we don't worry that the PRINT will
>never happen in
>
>	R = SIN(3.04789232)
>	PRINT *, R
>
>because the processor might never get around to advancing the program
>counter.

That isn't comparable.  That isn't reasonable processor behaviour, and the
industry consensus has always been that such gratuitously perverse behaviour
is a bug.

The issue I raise IS reasonable behaviour.  One standard scheduling strategy
is to run threads while they are consuming CPU, and to look for another to
run only when they enter a wait state.  The same principle is used in most
HPC job schedulers, where a node is dedicated to a job until that job
finishes, and is only then reallocated.

Furthermore, as I said, it really does happen in existing systems. Please
ask me in Tokyo why even configuring the system to use round-robin
scheduling doesn't always work. The executive summary is that, sometimes,
both threads can be assigned affinity to the same core and the looping
thread can end up with a higher priority. Whereupon, my scenario happens.

>Maybe the principle is that an implementation risks being incorrect if it
>introduces extra segment orderings  (Qj must precede Pi)
>not implied by the standard.   In other words, this is a processor problem
>not a user problem.
>
>What am I missing?

Any wording in the standard that I, as the local expert, could use to beat
the vendor over the head with :-(

Many (most?) vendors point-blank refuse to accept bug reports that are not
clear breaches of the standard or their documentation - and sometimes even
if they are! - security failures, or serious performance issues.

No names - no pack drill :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

