From owner-sc22wg5+sc22wg5-dom8=www.open-std.org@open-std.org  Wed Jul  5 21:30:58 2017
MIME-Version: 1.0
In-Reply-To: <888EAD5C-B10E-4E55-9F63-35F1BBE2F342@nasa.gov>
References: <20170705131003.C2A753587D1@www.open-std.org> <677196EB-62B3-448D-8AD9-6D0E36BAFD32@cray.com>
 <888EAD5C-B10E-4E55-9F63-35F1BBE2F342@nasa.gov>
From: Keith Bierman <khbkhb@gmail.com>
Date: Wed, 5 Jul 2017 13:30:16 -0600
Message-ID: <CAHSokPwxYxdNMrUZ+o2zb56nphhvCM5RSZ7_ww135UyRh5cicw@mail.gmail.com>
Subject: Re: (j3.2006) (SC22WG5.5889) 3 levels of parallelism?
To: fortran standards email list for J3 <j3@mailman.j3-fortran.org>
Cc: sc22wg5 <sc22wg5@open-std.org>
Content-Type: multipart/alternative; boundary="94eb2c0c3d460c3aab0553970acb"
Sender: owner-sc22wg5@open-std.org
Precedence: bulk

--94eb2c0c3d460c3aab0553970acb
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Out of curiosity, how do folks think we could best leverage/hijack an
integrated FPGA resource? Traditionally these have been "coprocessors",
well separated from the language and typically hanging off an I/O bridge
(conceptually similar to the early array processors). I'm seeing some
movement towards closer integration on the motherboard, so that tight
integration with the application program is at least theoretically
within reach.

Typically the folks doing this sort of thing code for the FPGA in Verilog
... and integrate with that in C.

Compiling directly to the FPGA would be nice, but assuming that's still
done in Verilog ... I assume coroutines would be our best bet. Any other
obvious approaches (that someone is prepared to say anything about ...
without an NDA ;>)?

Keith Bierman
khbkhb@gmail.com
303 997 2749

On Wed, Jul 5, 2017 at 1:01 PM, Clune, Thomas L. (GSFC-6101) <
thomas.l.clune@nasa.gov> wrote:

> Bill,
>
> Thanks.  I should have realized that array notation was the missing bit.
>
> It will be interesting to see if Nvidia sees the situation in a similar
> light.   Gary? ...
>
> - Tom
>
>
>
> > On Jul 5, 2017, at 2:56 PM, Bill Long <longb@cray.com> wrote:
> >
> >
> >> On Jul 5, 2017, at 8:02 AM, Clune, Thomas L. (GSFC-6101)
> >> <thomas.l.clune@nasa.gov> wrote:
> >>
> >>
> >> Coarrays and DO CONCURRENT are major advances for parallel programming
> >> in Fortran.  However, as we look down the road, I think it is important
> >> for us to consider some of the insights that have come from the HPC
> >> community.  In particular, there is fairly clear consensus that it is
> >> important in user code to explicitly manage _3_ different levels of
> >> parallelism.  This is more explicit in cases like GPUs, but even Intel
> >> Phi and conventional processors have shown the importance of carefully
> >> coding at each of the 3 levels.  Roughly speaking, these levels
> >> correspond to (1) coarse-grained message passing (inter-node), (2)
> >> threading (within-node), and (3) vectorization.  But this
> >> correspondence is only suggestive - the actual breakdown in GPUs is
> >> somewhat different.
> >
> > 1) Coarrays, and the general parallel model that goes with them, cover
> > the internode case.  Actually inter-image, to use Fortran language.  The
> > mapping of images to nodes is outside the scope of the standard.  Because
> > you could map images to cores within a node, this model can be applied to
> > option (2) as well - within-node.
> >
> > 2) DO CONCURRENT does not require threading. It provides the compiler
> > with sufficient information to ensure that threaded code is "safe".  The
> > language, in general, provides semantics that permit various forms of
> > optimization.  DO CONCURRENT provides enough to allow the compiler to
> > assign different loop iterations to different threads.  Note that, based
> > on the provided semantics, the compiler can choose threading, or
> > vectorization, or both, for the loop, depending on the code involved in
> > the loop body.
> >
> > 3) Fortran was a pioneer in "vectorization" with arrays being first
> > class objects and "array-syntax" expressions.  Array expressions can
> > usually be vectorized trivially.  If the target hardware can benefit from
> > vector code, you should expect the compiler to generate it.  Automatic
> > vectorization of loops has been available in Fortran compilers for
> > decades.  Since Fortran 90, automatic vectorization of array expressions
> > has been the norm.
> >
> > Optimistically, compilers claiming to be F2008 conforming should be
> > handling all these cases by default.  Although some still require
> > compiler options to enable the coarray-based SPMD capabilities.  I expect
> > that limitation to go away eventually.
> >
> > Cheers,
> > Bill
> >
> >>
> >> In Garching various statements were made that Coarrays are good for
> >> both (1) and (2).  Likewise statements were made that DO CONCURRENT is
> >> good for (2) and (3).  And I'm not arguing for or against this.  But it
> >> would behoove us to be certain that we really can _effectively_ address
> >> all 3 levels of parallelism with the standard as is.  Otherwise, to
> >> ensure that Fortran retains its focus on HPC, we should be looking for a
> >> suitable extension that enables explicit control at all 3 levels in an
> >> architecture independent manner.  Maybe it is obvious to others in the
> >> committee, in which case I'll be happy to sit back and absorb wisdom.
> >>
> >> Cheers,
> >>
> >> - Tom
> >>
> >> _______________________________________________
> >> J3 mailing list
> >> J3@mailman.j3-fortran.org
> >> http://mailman.j3-fortran.org/mailman/listinfo/j3
> >
> > Bill Long                                          longb@cray.com
> > Principal Engineer, Fortran Technical Support &    voice:  651-605-9024
> > Bioinformatics Software Development                fax:    651-605-9143
> > Cray Inc./ 2131 Lindau Lane/  Suite 1000/  Bloomington, MN  55425
>

--94eb2c0c3d460c3aab0553970acb--