Date: 05 Nov 2008 23:28:01 +0000
From: "N.M. Maclaren"
To: sc22wg5
Cc: sc22wg5
Subject: Re: [ukfortran] (SC22WG5.3622) A comment on John Wallin's comments on Nick MacLaren's comments

John Wallin wrote:
>
> Here is a re-post of my response as per John Reid's request - with a few
> minor changes.

I don't think that you are acceptable to the SC22WG5 list!  Aren't modern
computers wonderful?

> > So serial Fortran will not go away any time soon, and some people will
> > positively want coarray-free compilers (or a fixable mode).
>
> I think that most people would not want to use a code that only runs on
> a single processor of a larger machine if they had access to even a
> basic PC.  The performance of the low-end PC with multiple cores will
> almost always trump the use of a single core on a larger multi-core
> machine.

Yes and no.  There are lots of reasons that people do run such codes -
though I agree that pure performance isn't one of them.

> 1) You can lower the priority of the jobs using the renice command. ...

Priorities haven't worked in 20 years :-(  Are you in Tokyo?  Ask me why,
if you are.

> 2) It would seem sensible to be able to set user limits on the number of
> images available.  This is clearly an implementation issue, not
> something intrinsic to the language.  (See the note by Alex)

Been there - done that :-(  Unfortunately, programs written assuming
multiple images won't necessarily work if restricted to a single image or
even two.  Parallel programmers count one, two, many ....

> > If you have any such examples that apply to distributed memory systems
> > without special RDMA/coarray hardware, please Email them to me.
>
> MPI is a distributed memory system, so my treecode runs on everything.
>
> Here is a simple example of an MPI coding horror.  I have to pass a
> Fortran derived type between nodes.

Oh, the derived type issue.  It's not fundamental to MPI, but is mainly
because MPI currently has only a Fortran 77 interface.  A proper Fortran
2003 interface would be a lot better.  It doesn't affect programs that
don't use derived types, of course.

Anyway, I regard the right solution to this as being that every derived
type should have encode and decode primitives, and that you should
transfer the encoded form.  You need them for a great many purposes,
including unformatted I/O, not just MPI.
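For what it is worth, here is a minimal sketch of the idea, using an
invented particle_t type purely for illustration (none of these names come
from anyone's real code):

      module particle_codec
         implicit none
         integer, parameter :: dp = kind(1.0d0)
         ! Hypothetical derived type, for illustration only.
         type particle_t
            real(dp) :: pos(3), vel(3)
            integer  :: id
         end type particle_t
         ! The encoded length lives in ONE place; adding a component means
         ! updating this module only, not every communication call.
         integer, parameter :: buflen = 7
      contains
         subroutine encode_particle (p, buf)
            type(particle_t), intent(in)  :: p
            real(dp),         intent(out) :: buf(buflen)
            buf(1:3) = p%pos
            buf(4:6) = p%vel
            buf(7)   = real(p%id, dp)
         end subroutine encode_particle
         subroutine decode_particle (buf, p)
            real(dp),         intent(in)  :: buf(buflen)
            type(particle_t), intent(out) :: p
            p%pos = buf(1:3)
            p%vel = buf(4:6)
            p%id  = nint(buf(7))
         end subroutine decode_particle
      end module particle_codec

The encoded buffer is then an ordinary array of double precision, so it
can go through a plain MPI_SEND/MPI_RECV of MPI_DOUBLE_PRECISION, or
straight to an unformatted file, with no per-type MPI bookkeeping at all.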
> The important thing to note is that every time a grad student adds a
> single element to the data structure, you have to alter the block
> counts and sizes by hand.

Sorry, but no.  There are better solutions.  I agree that none are pretty,
and all involve enforcing strict discipline on the programmers, but they
have been known since time immemorial.  Parameterisation is the key (see
the sketch after my signature).

> In short, coarrays would make my head hurt less.

Most of the users I have dealt with have backed off shared-memory
paradigms when they found that they couldn't debug or tune them, and have
gone back to MPI.  The problem is that there are, and can be, no tools to
trap race conditions.

> With the new multicore/multi-box architectures, we actually need a
> change in the language to write codes for these machines.

Agreed.  Now, whether coarrays are that change, I shall not say ....

> I would suggest talking to the UPC forum about this.  They have a lot
> of experience with this, and can address it directly.

Not much, actually.  There aren't many implementations, and UPC isn't much
used by real scientists.  Also, my investigations indicated that it
doesn't actually do what most people think that it does.  I can send you
the document and test program if you want.

> It is very possible to create codes with deadlocks on any machine.  Of
> course, higher latency combined with lower bandwidth brings these
> problems to the forefront more quickly.  However, this would seem to be
> a problem with the program rather than the language.  Non-synchronized
> memory doesn't behave well, so users need to beware.  Similar problems
> exist in all parallel languages.

I am talking about programs that have no deadlocks in them, but where
deadlocks are introduced by the implementation.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
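PS: As a rough illustration of the sort of parameterisation I mean (the
particle_t type and the routine name are invented for this example, not
taken from anyone's real code): the MPI type map can be built in one
routine that sits next to the type definition, so adding a component means
editing a single place rather than every call site:

      subroutine build_particle_type (newtype)
         implicit none
         include 'mpif.h'
         integer, intent(out) :: newtype
         ! Hypothetical derived type, defined here only for illustration.
         type particle_t
            double precision :: pos(3), vel(3)
            integer          :: id
         end type particle_t
         type(particle_t) :: dummy
         ! Block counts, lengths and types are kept in ONE place.
         integer, parameter :: nblocks = 3
         integer :: blocklen(nblocks), types(nblocks), ierr
         integer(kind=MPI_ADDRESS_KIND) :: disp(nblocks), base
         blocklen = (/ 3, 3, 1 /)
         types    = (/ MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, &
                       MPI_INTEGER /)
         ! Displacements are measured, not assumed, so padding is handled.
         call MPI_GET_ADDRESS (dummy,     base,    ierr)
         call MPI_GET_ADDRESS (dummy%pos, disp(1), ierr)
         call MPI_GET_ADDRESS (dummy%vel, disp(2), ierr)
         call MPI_GET_ADDRESS (dummy%id,  disp(3), ierr)
         disp = disp - base
         call MPI_TYPE_CREATE_STRUCT (nblocks, blocklen, disp, types, &
                                      newtype, ierr)
         call MPI_TYPE_COMMIT (newtype, ierr)
      end subroutine build_particle_type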