mdspan
Document #: | P3242R0 |
Date: | 2024-04-15 |
Project: | Programming Language C++ LEWG |
Reply-to: |
Nicolas Morales <nmmoral@sandia.gov> Christian Trott <crtrott@sandia.gov> Mark Hoemmen <mark.hoemmen@gmail.com> Damien Lebrun-Grandie <lebrungrandt@ornl.gov> |
C++23 introduced mdspan
([P0009R18]), a non-owning
multidimensional array abstraction that has a customizable layout.
Layout customization was originally motivated in [P0009R18] with considerations for
interoperability and performance, particularly on different
architectures. Moreover, [P2630R4] introduced
submdspan
, a slicing function that can yield arbitrarily
strided layouts. However, without standard library support, copying
efficiently between mdspan
s with mixes of complex layouts
is challenging for users.
Many applications, including high-performance computing (HPC), image
processing, computer graphics, etc that benefit from mdspan
also would benefit from basic memory operations provided in standard
algorithms such as copy and fill. Indeed, the authors found that a copy
algorithm would have been quite useful in their implementation of the
copying mdarray
([P1684R5]) constructor. A more
constrained form of copy
is also included in the standard
linear algebra library ([P1673R13]).
However, existing standard library facilities are not sufficient
here. Currently, mdspan
does not have iterators or ranges
that represent the span of the mdspan
. Additionally, it’s
not entirely clear what this would entail.
std::linalg::copy
([P1673R13]) is limited to
mdspan
s of rank 2 or lower.
Moreover, the manner in which an mdspan
is copied (or
filled) is highly performance sensitive, particularly in regards to
caching behavior when traversing mdspan
memory. A naive
user implementation is easy to get wrong in addition to being tedious
for higher rank mdspan
s. Ideally, an implementation would
be free to use information about the layout of the mdspan
known at compile time to perform optimizations; e.g. a continuous span
mdspan
copy for trivial types could be implemented with a
memcpy
.
Finally, providing these generic algorithms would also enable these
operations for types that are representable by mdspan
. For
example, this would naturally include mdarray
, which is
convertible to mdspan
, or for user-defined types whose view
of memory corresponds to mdspan
s (e.g. an image class or
something similar).
Due to the closed nature of mdspan
extents, copy
operations can be checked by the implementation to prevent past-the-end
writes. This is an advantage the proposed copy operation has over the
existing operations in the standard.
The main design direction of this proposal is to provide methods for
copying and filling mdspan
s that may have differing layouts
and accessors, while allowing implementations to provide efficient
implementations for special cases. For example, if a copy occurs between
two mdspan
s with the same layout mapping type that is
contiguous and both use default_accessor
, the intention is
that this could be implemented by a single memcpy
.
Furthermore, accessors as a customization point should be enabled, as
with any other mdspan
operation. For example, a custom
accessor that checks a condition inside of the access
method should still work and check that condition. It’s worth noting
that there may be a high sensitivity of how much implementations able to
optimize if provided custom accessors. For example, optimizations could
be disabled if using a custom accessor that is identical to the default
accessor.
Finally, there is some question as to whether copy
and
fill
should return a value when applied to
mdspan
, as the iterator and ranged-based algorithms do. We
believe that mdspan
copy and fill should return void, as
there is no past-the-end iterator that they could reasonably return.
Currently, we are proposing adding copy
and
fill
algorithms on mdspan
to header
<mdspan>
. We considered other options, namely:
<algorithm>
: This would mean that users of
iterator-based algorithms would need to pull in
<mdspan>
. On the other hand, this is where
iterator-based copy
and fill
live so may be
preferable in that sense.<mdspan_algorithm>
(or similarly any other new
header): This seems like overkill for two functions. However, in the
future, we may want to add new algorithms for mdspan
that
are not strictly covered by existing algorithms in
<algorithm>
, so this option may be more future
proof.We settled on <mdspan>
because as proposed this is
a relatively light-weight addition that reflects operations that are
commonly desired with mdspan
s. However, the authors are
open to changing this.
copy
in
std::linalg
[P1673R13] introduced several linear
algebra operations including std::linalg::copy
. This
operation only applies to mdspan
s with rank ≤ 2.
This paper is proposing a version of copy
that is not
constrained by the number of ranks and differs from
std::linalg::copy
in some important ways outline below.
Note that right now the strict addition of copy
would
potentially cause the following code to be ambiguous, due to ADL-finding
std::copy
:
using std::linalg::copy;
(mds1, mds2); copy
One possibility would be to remove std::linalg::copy
, as
it is a subset of the proposed std::copy
. This was rejected
by the paper authors because of certain requirements in
[linalg.reqs.alg] – that is:
The function may make arbitrarily many objects of any linear algebra value type, value-initializing or direct-initializing them with any existing object of that type.
This requirement is likely undesirable for a generalized copy algorithm.
There is a similar argument against simply generalizing
std::linalg::copy
. In addition to the freedom of
std::linalg::copy
to arbitrarily value or
direct-initializing values, using the linear algebra version of copy
would require the use of unnecessary includes and namespaces. It seems
not very ergonomic for a user to have to use
std::linalg::copy
and include <linalg>
even if the mdspan
operations they are performing are
unrelated to linear algebra.
There are a few additions that are analogous to existing standard
algorithms that are not included in this proposal, both to keep the
proposal small and because some of these algorithms do not make sense in
the context of mdspan
s:
std::move
: Perhaps this should be included for
completeness’s sake. However, it doesn’t seem applicable to the typical
usage of mdspan
.(copy|fill)_n
: As a multidimensional view
mdspan
does not in general follow a specific ordering.
Memory ordering may not be obvious to calling code, so it’s not even
clear how these would work. Any applications intending to copy a subset
of mdspan
should use call copy
on the result
of submdspan
.copy_backward
: As above, there is no specific ordering.
A similar effect could be achieved via transformations with a custom
layout, similar to layout_transpose
in [P1673R13].std::for_each
.
for_each
in particular is a desirable but brings in many
unanswered questions that should be addressed in a different paper.A prototype implementation of this paper can be found in a PR in https://github.com/kokkos/mdspan/pull/325. The PR
includes an example of how vendors could provide an efficient
implementation for certain combinations of mdspan
properties and layout policies.
template<class SrcElementType, class SrcExtents, class SrcLayoutPolicy, class SrcAccessorPolicy,
class DstElementType, class DstExtents, class DstLayoutPolicy, class DstAccessorPolicy>
void copy(mdspan<SrcElementType, SrcExtents, SrcLayoutPolicy, SrcAccessorPolicy> src,
<DstElementType, DstExtents, DstLayoutPolicy, DstAccessorPolicy> dst);
mdspan
template<class ExecutionPolicy,
class SrcElementType, class SrcExtents, class SrcLayoutPolicy, class SrcAccessorPolicy,
class DstElementType, class DstExtents, class DstLayoutPolicy, class DstAccessorPolicy>
void copy(ExecutionPolicy&& policy,
<SrcElementType, SrcExtents, SrcLayoutPolicy, SrcAccessorPolicy> src,
mdspan<DstElementType, DstExtents, DstLayoutPolicy, DstAccessorPolicy> dst); mdspan
1 Constraints:
(1.1)
std::is_assignable_v<typename mdspan<SrcElementType, SrcExtents, SrcLayoutPolicy, SrcAccessorPolicy>::reference, typename mdspan<DstElementType, DstExtents, DstLayoutPolicy, DstAccessorPolicy>::reference>
is true
.
(1.2)
is_constructible_v<DstExtents, SrcExtents>
.
2 Preconditions:
(2.1)
src.extents() == dst.extents()
is
true
.
(2.2)
dst.is_unique()
is true
.
(2.3)
there is no unique multidimensional index i...
in
src.extents()
where there exists a multidimensional index
j...
in dst.extents()
such that
src[i...]
and dst[j...]
refer to the same
element.
3
Effects: for all unique multidimensional indices
i...
in src.extents()
, assigns
src[i...]
to dst[i...]
template<class ElementType, class Extents, class LayoutPolicy, class AccessorPolicy, class T>
void fill(mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy> dst, const T& value);
template<class ExecutionPolicy,
class ElementType, class Extents, class LayoutPolicy, class AccessorPolicy, class T>
void fill(ExecutionPolicy&& policy,
<ElementType, Extents, LayoutPolicy, AccessorPolicy> dst, const T& value); mdspan
4
Constraints:
std::is_assignable_v<typename mdspan<ElementType, Extents, LayoutPolicy, AccessorPolicy>::reference, const T&>
5
Effects: for all unique multidimensional indices
i...
in dst.extents()
, assigns
value
to dst[i...]
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.