aligned_accessor
:
An mdspan accessor expressing pointer overalignmentDocument #: | P2897R2 |
Date: | 2024-07-15 |
Project: | Programming Language C++ LEWG |
Reply-to: |
Mark Hoemmen <mhoemmen@nvidia.com> Damien Lebrun-Grandie <lebrungrandt@ornl.gov> Nicolas Morales <nmmoral@sandia.gov> Christian Trott <crtrott@sandia.gov> |
default_accessor
is_sufficiently_aligned
is not
constexpr
is_sufficiently_aligned
for all accessors?
is_sufficiently_aligned
is specific to
aligned_accessor
detectably_invalid
:
Generic validity check?detectably_invalid
is_sufficiently_aligned
and
detectably_invalid
is_sufficiently_aligned
is_sufficiently_aligned
designdetectably_invalid
nonmember function exampleMark Hoemmen (mhoemmen@nvidia.com) (NVIDIA)
Damien Lebrun-Grandie (lebrungrandt@ornl.gov) (Oak Ridge National Laboratory)
Nicolas Morales (nmmoral@sandia.gov) (Sandia National Laboratories)
Christian Trott (crtrott@sandia.gov) (Sandia National Laboratories)
Revision 0 (pre-Varna) to be submitted 2023-05-19
Revision 1 (pre-Kona) to be submitted 2023-10-15
Implement changes requested by LEWG review on 2023-10-10
Change gcd
converting constructor Constraint to a
Mandate
Add Example in the wording section that uses
is_sufficiently_aligned
to check the pointer overalignment
precondition
Add Example in the wording section that uses
aligned_alloc
to create an overaligned allocation, to show
that aligned_accessor
exists as part of a system
Add an explicit
constructor from default_accessor,
so that users can type aligned_mdspan y{x}
instead of
aligned_mdspan y{x.data_handle(), x.mapping()}
. Add an
explanation in the design discussion section.
Implement other wording changes
aligned_accessor
’s Mandates that
byte_alignment >= alignof(ElementType)
is
true
. This prevents construction of an invalid
aligned_accessor
object.Add more design discussion based on LEWG review on 2023-10-10
Explain why we do not include an aligned_mdspan
alias
Explain aligned_accessor
construction
safety
Revision 2 (post - St. Louis) to be submitted 2024-07-15
Implement required changes from LEWG review of R1 on 2024-06-28
constexpr
from
is_sufficiently_aligned
Discuss optional suggestions from LEWG review of R1 on 2024-06-28
Add explicit
converting constructor vs. named cast
(“naughty_cast
”) discussion
Add detectably_invalid
discussion
Ask LEWG to consider the alternative design that makes
is_sufficiently_aligned
a nonmember function in
<bit>
instead of a member function of
aligned_accessor
, while LWG review of R2 proceeds
concurrently
P2389R2 was voted into the Working Draft at St. Louis, so replace
use of dextents
in examples with
dims
.
Add non-wording section explaining why
aligned_accessor
has no explicit
constructor
from less to more alignment
Add Compiler Explorer link with full implementation and demo
Revision 3 (post - St. Louis) to be submitted 2024-07-15
We propose adding aligned_accessor
to the C++ Standard
Library. This class template is an mdspan
accessor policy
that uses assume_aligned
to decorate pointer access. We
think it belongs in the Standard Library for two reasons. First, it
would serve as a common vocabulary type for interfaces that take
mdspan
to declare their minimum alignment requirements.
Second, it extends to mdspan
accesses the optimizations
that compilers can perform to pointers decorated with
assume_aligned
.
aligned_accessor
is analogous to the various
atomic_accessor_*
templates proposed by P2689. Both that
proposal and this one start with a Standard Library feature that
operates on a “raw” pointer (assume_aligned
or the various
atomic_ref*
templates), and then propose an
mdspan
accessor policy that straightforwardly wraps the
lower-level feature.
We had originally written aligned_accessor
as an example
in P2642, which proposes “padded” mdspan layouts. We realized that
aligned_accessor
was more generally applicable and that
standardization would help the padded layouts proposed by P2642 reach
their maximum value.
offset_policy
is
default_accessor
data_handle_type
is
ElementType*
Permitted implicit conversions
from nonconst to const ElementType
,
from more overalignment to less overalignment, and
from overalignment to no overalignment
(default_accessor
)
explicit
converting constructor from
default_accessor
lets users assert overalignment
is_sufficiently_aligned
checks pointer
alignment
The offset_policy
alias is
default_accessor<ElementType>
, because even if a
pointer p
is aligned, p + i
might not be.
The data_handle_type
alias is ElementType*
.
It needs no further adornment, because alignment is asserted at the
point of access, namely in the access
function. Some
implementations might have an easier time optimizing if they also apply
some implementation-specific attribute to data_handle_type
itself. Examples of such attributes include
__declspec(align_value(byte_alignment))
and
__attribute__((align_value(byte_alignment)))
. However,
these attributes should not apply to the result of offset
,
for the same reason that offset_policy
is
default_accessor
and not aligned_accessor
.
The converting constructor from aligned_accessor
is
analogous to default_accessor
’s constructor, in that it
exists to permit conversion from nonconst element_type
to
const element_type
. It additionally permits implicit
conversion from more overalignment to less overalignment – something
that we expect users may need to do. For example, users may start with
aligned_accessor<float, 128>
, because their
allocation function promises 128-byte alignment. However, they may then
need to call a function that takes an mdspan
with
aligned_accessor<float, 32>
, which declares the
function’s intent to use 8-wide SIMD of float
.
The is_sufficiently_aligned
function checks whether a
pointer has sufficient alignment to be used correctly with the class.
This makes it easier for users to check preconditions, without needing
to know how to cast a pointer to an integer of the correct size and
signedness.
Per discussion below, we would like LEWG to consider the alternative
design that removes is_sufficiently_aligned
from
aligned_accessor
and adds it to the
<bit>
header as a separate nonmember function. We
think LWG review of aligned_accessor
can proceed
concurrently. We present wording for this design alternative below.
We considered making aligned_accessor
“wrap” any
accessor type that meets the right requirements. For example,
aligned_accessor
could take the inner accessor as a
template parameter, store an instance of it, and dispatch to its member
functions. That would give users a way to apply multiple accessor
“attributes” to their data handle, such as atomic access (see P2689) and
overalignment.
We decided against this approach for three reasons. First, we would
have no way to validate that the user’s accessor type has the correct
behavior. We could check that their accessor’s
data_handle_type
is a pointer type, but we could not check
that their accessor’s access
function actually dereferences
the pointer. For instance, access
might instead interpret
the pointer as a file handle or a key into a distributed data store.
Second, even if the inner accessor’s access
function
actually did return the result of dereferencing the pointer, the outer
access
function might not be able to recover the effects of
the inner access
function, because access
computes a reference
, not a pointer. In order for
aligned_accessor
’s access
function to get back
that pointer, it would need to reach past the inner accessor’s public
interface. That would defeat the purpose of generic nesting.
Third, any way (not just this one) of nesting two generic accessors
raises the question of order dependence. Even if it were possible to
apply the effects of both the inner and outer accessors’
access
functions in sequence, it might be unpleasantly
surprising to users if the effects depended on the order of nesting. A
similar question came up in the “properties” proposal P0900, which we
quote here.
Practically speaking, it would be considered a best practice of a high-quality implementation to ensure that a property’s implementation of
properties::element_type_t
(and other traits) are invariant with respect to ordering with other known properties (such as those in the standard library), but with this approach it would be impossible to make that guarantee formal, particularly with respect to other vendor-defined and user-defined properties unknown to the property implementer.
For these reasons, we have made aligned_accessor
stand-alone, instead of having it modify another user-provided
accessor.
default_accessor
LEWG’s 2023-10-10 review of R0 pointed out that in R0,
aligned_accessor
lacks an explicit
constructor
from default_accessor
. Having that constructor would make
it easier for users to create an aligned mdspan
from an
unaligned mdspan
. Making it explicit
would
prevent implicit conversion. Thus, we have decided to add this
explicit
constructor in R1.
Without the explicit
constructor, users have two options
for turning a nonaligned mdspan
into an aligned
mdspan
. First, as in the following example, users could
“take apart” the input nonaligned mdspan
and use the pieces
to construct an aligned mdspan
, whose type they name
completely.
void compute_with_aligned(
::mdspan<float, std::dims<2>, std::layout_left> matrix)
std{
const std::size_t byte_alignment = 4 * alignof(float);
using aligned_matrix_t = std::mdspan<float, std::dims<2>,
::layout_left, std::aligned_accessor<float, byte_alignment>>;
std
{matrix.data_handle(), matrix.mapping()};
aligned_matrix_t aligned_matrix// ... use aligned_matrix ...
}
Second, as in the following example, users could construct an
aligned_accessor
explicitly and use constructor template
argument deduction (CTAD) to construct the aligned mdspan
from its pieces.
void compute_with_aligned(
::mdspan<float, std::dims<2>, std::layout_left> matrix)
std{
const std::size_t byte_alignment = 4 * alignof(float);
::mdspan aligned_matrix{matrix.data_handle(), matrix.mapping(),
std::aligned_accessor<float, byte_alignment>{}};
std// ... use aligned_matrix ...
}
The first approach would likely be more common. This is because
mdspan
users commonly define their own type aliases for
mdspan
, with application-specific names that make code more
self-documenting. The aligned_matrix_t
definition above is
an an example.
Adding an explicit
constructor from
default_accessor
lets users get the same effect more
concisely, without needing to “take apart” the input
mdspan
.
void compute_with_aligned(std::mdspan<float, std::dims<2, int>, std::layout_left> matrix)
{
const std::size_t byte_alignment = 4 * alignof(float);
using aligned_mdspan = std::mdspan<float, std::dims<2, int>,
::layout_left, std::aligned_accessor<float, byte_alignment>>;
std
{matrix};
aligned_mdspan aligned_matrix// ... use aligned_matrix ...
}
The explicit
constructor does not decrease safety, in
the sense that users were always allowed to convert from an
mdspan
with default_accessor
to an
mdspan
with aligned_accessor
. Before, users
could perform this conversion by typing the following.
{matrix.data_handle(), matrix.mapping()}; aligned_matrix_t aligned_matrix
Now, users can do the same thing with fewer characters.
{matrix}; aligned_matrix_t aligned_matrix
As explained in the previous section, aligned_accessor
has an explicit
converting constructor from
default_accessor
so that users can assert overalignment. It
also has an (implicit) converting constructor from another
aligned_accessor
with more alignment, to an
aligned_accessor
with less alignment. However,
aligned_accessor
does not have an
explicit
converting constructor from another
aligned_accessor
with less alignment, to an
aligned_accessor
with more alignment. Why not?
Consider the three typical use cases for
aligned_accessor
.
User knows an allocation’s alignment at compile time.
User knows an allocation’s alignment at run time, but not at compile time. For example, the value might depend on run-time detection of particular hardware features.
User doesn’t know whether an allocation is overaligned. They might need to ask some system at run time, or check the pointer value themselves, in order to decide whether to call code that expects a particular alignment.
In Case (1), users would normally declare the maximum alignment. They
would want to preserve this information at compile time as much as
possible, by keeping the aligned_accessor
mdspan
with maximum compile-time alignment for the entire
scope of its use. Users would only want implicit conversions to less
alignment or default_accessor
when calling functions whose
parameter types encode these requirements.
Case (2) reduces to Case (3).
Case (3) reduces to Case (1). This works like any conversion from
run-time type to compile-time type, with a fixed list of possible
compile-time types (the alignments). As soon as a user’s
mdspan
enters a scope where the alignment is known at
compile time, the user would want to preserve that compile-time
information and maximize the alignment for as large of a scope as
possible.
None of these cases involve starting with more alignment, going to
less (but still some) alignment, and then going back to more alignment
again. Code that does that probably does not correctly use the types of
function parameters to express its overalignment requirements. It’s like
code that uses dynamic_cast
a lot. Users can still convert
from less or more alignment by creating the result’s
aligned_accessor
manually. However, we don’t want to
encourage this pattern, so we don’t offer an explicit conversion for
it.
In LEWG’s 2023-10-10 review of R0, participants observed that this
proposal’s examples define an example-specific type alias for
mdspan
with aligned_accessor
. They asked
whether our proposal should include a standard alias
aligned_mdspan
. We do not object to such an alias,
but we do not find it very useful, for the following reasons.
Users of mdspan
commonly define their own type
aliases whose names are meaningful for their applications.
It would not save much typing.
Examples may define aliases to make them more concise. One example in
this proposal defines the following alias for an mdspan
of
float
with alignment byte_alignment
.
template<size_t byte_alignment>
using aligned_mdspan = std::mdspan<float, std::dims<1, int>,
::layout_right, std::aligned_accessor<float, byte_alignment>>; std
This lets the example use aligned_mdspan<32>
and
aligned_mdspan<16>
.
The above alias is specific to a particular example. A general version of alias would look like this.
template<class ElementType, class Extents, class Layout,
size_t byte_alignment>
using aligned_mdspan = std::mdspan<ElementType, Extents, Layout,
::aligned_accessor<ElementType, byte_alignment>>; std
This alias would save some typing. However, mdspan “power
users” rarely type out all the template arguments. First, they can rely
on CTAD to create mdspan
s, and auto
to return
them. Second, users commonly already define their own aliases whose
names have an application-specific meaning. They define these aliases
once and use them throughout the application. For instance,
users might define the following.
template<class ElementType>
using vector_t = std::mdspan<ElementType,
::dims<1>, std::layout_left>;
stdtemplate<class ElementType>
using matrix_t = std::mdspan<ElementType,
::dims<2>, std::layout_left>;
std
template<class ElementType, size_t byte_alignment>
using aligned_vector_t = std::mdspan<ElementType,
::dims<1>, std::layout_left,
std::aligned_accessor<ElementType, byte_alignment>>;
stdtemplate<class ElementType, size_t byte_alignment>
using aligned_matrix_t = std::mdspan<ElementType,
::dims<2>, std::layout_left,
std::aligned_accessor<ElementType, byte_alignment>>; std
Such users may never type the characters “mdspan
” again.
For this reason, while we do not object to an
aligned_mdspan
alias, we do not find the proliferation of
aliases particularly ergonomic.
LEWG’s 2023-10-10 review of R0 expressed concern that
mdspan
’s constructor has no way to check
aligned_accessor
’s alignment requirements. Users can call
aligned_accessor
’s is_sufficiently_aligned(p)
static
member function with a pointer p
to
check this themselves, before constructing the mdspan
.
However, mdspan
’s constructor generally has no way to check
whether its accessor finds the caller’s data handle acceptable.
This is true for any accessor type, not just for
aligned_accessor
. It is a design feature of
mdspan
that accessors can be stateless. Most of them have
no state. Even if they have state, they do not need to be constructed
with or store the data handle. As a result, an accessor might not see a
data handle until access
or offset
is called.
Both of those member functions are performance critical, so they cannot
afford an extra branch on every call. Compare to
vector::operator[]
, which has preconditions but is not
required to perform bounds checks. Using exceptions in the manner of
vector::at
could reduce performance and would also make
mdspan
unusable in a freestanding or no-exceptions
context.
Note that aligned_accessor
does not introduce
additional preconditions beyond those of the existing C++
Standard Library feature assume_aligned
. In the words of
one LEWG reviewer, aligned_accessor
is not any more
“pointy” than assume_aligned
; it just passes the point
through without “blunting” it.
Before submitting R0 of this paper, we considered an approach
specific to aligned_accessor
, that would wrap the pointer
in a special data handle type. The data handle type would have an
explicit
constructor that takes a raw pointer, with a
precondition that the raw pointer have sufficient alignment. The
constructor would be explicit
, because it would have a
precondition. This design would force the precondition back to
mdspan
construction time. Users would have to construct the
mdspan
like this.
* raw_pointer = get_pointer_from_somewhere();
element_typeusing acc_type = aligned_accessor<element_type, byte_alignment>;
{acc_type::data_handle_type{raw_pointer}, mapping, acc_type{}}; mdspan x
We rejected this approach in favor of
is_sufficiently_aligned
for the following reasons.
Wrapping the pointer in a custom data handle class would make
every access
or offset
call need to reach
through the data handle’s interface, instead of just taking the raw
pointer directly. The access
function, and to some extent
also offset
, need to be as fast as possible. Their
performance depends on compilers being able to optimize through function
calls. The authors of mdspan
carefully balanced generality
with function call depth and other code complexity factors that may
hinder compilers from optimizing. Performance of
aligned_accessor
matters as much or even more than
performance of default_accessor
, because
aligned_accessor
exists to communicate optimization
potential.
The alignment precondition would still exist. Requiring the data
handle type to throw an exception if the pointer is not sufficiently
aligned would make mdspan
unusable in a freestanding or
no-exceptions context.
Users should not have to pay for unneeded checks. The two
examples in the wording express the two most common cases. If users get
a pointer from a function like aligned_alloc
, then they
already know its alignment, because they asked for it. If users are
computing alignment at run time to dispatch to a more optimized code
path, then they know alignment before dispatch. In both cases, users
already know the alignment before constructing the
mdspan
.
The data handle is still a pointer, it’s just a pointer with a
constraint on its values. Users would reasonably expect to be able to
use the result of data_handle()
with existing interfaces
that expect a raw pointer.
An LEWG poll on 2023-10-10, “[b]lock aligned_accessor
progressing until we have a way of checking alignment requirements
during mdspan
construction,” resulted in no consensus.
Attendance was 14.
Strongly Favor | Weakly Favor | Neutral | Weakly Against | Strongly Against |
---|---|---|---|---|
0 | 1 | 1 | 2 | 2 |
LEWG expressed an (unpolled) interest that we explore
mdspan
safety in subsequent work after the fall 2023 Kona
WG21 meeting. LEWG asked us to explore safety in a way that is not
specific to aligned_accessor
. Part of that exploration is
in the section below “Generalize is_sufficiently_aligned
for all accessors?”. We plan further exploration of this topic
elsewhere.
is_sufficiently_aligned
is not constexpr
LEWG reviewed R1 of this proposal at the June 2024 St. Louis WG21
meeting, and polled 1/10/0/0/1 (SF/F/N/A/SA) to remove
constexpr
from is_sufficiently_aligned
. This
is because it is not clear how to implement the function in a way that
could ever be a constant expression. The straightforward cross-platform
way to implement this would bit_cast
the pointer to
uintptr_t
. However, bit_cast
is not
constexpr
when converting from a pointer to an integer, per
[bit.cast] 3. Any
reinterpret_cast
similarly could not be a core constant
expression, per
[expr.const] 5.15.
One LEWG reviewer pointed out that some compilers have a built-in
operation (e.g., Clang and GCC have __builtin_bit_cast
)
that might form a constant expression when bit_cast
does
not. On the other hand, the authors could not foresee a need for
is_sufficiently_aligned
to be constexpr
and
did not want to constrain implementations to use compiler-specific
functionality.
is_sufficiently_aligned
for all accessors?is_sufficiently_aligned
is specific to
aligned_accessor
The is_sufficiently_aligned
function exists so users can
check the pointer’s alignment precondition before constructing an
mdspan
with it. This precondition check is specific to
aligned_accessor
. Furthermore, the function has a
precondition that the pointer points to a valid element. Standard C++
offers no way for users to check that. More importantly for
mdspan
users, Standard C++ offers no way to check whether a
pointer and a layout mapping’s required_span_size()
form a
valid range.
detectably_invalid
: Generic validity check?During the June 2024 St. Louis WG21 meeting, one LEWG reviewer
(please see Acknowledgments below) pointed out that code that is generic
on the accessor type currently has no way to check whether a given data
handle is valid. Specifically, given a size_t
size
(e.g., the required_span_size()
of a
given layout mapping), there is no way to check whether [ 0, size
) forms an accessible range (see
[mdspan.accessor.general]
2) of a given data handle and accessor. The reviewer suggested
adding a new member function
bool detectably_invalid(data_handle_type handle, size_t size) const noexcept;
to all mdspan
accessors. This would return
true
if the implementation can show that [ 0, size
) is not an accessible range for
handle
and the accessor, and true
otherwise.
The word “detectably” in the name would remind users that this is a
“best effort” check. It might return false
even if the
handle is invalid or if [ 0,
size
) is not an
accessible range. Also, it might return different values on different
implementations, depending on their ability to check e.g., pointer range
validity. The function would have the following design features.
It must be a non-static
member function, because in
general, accessors may have state that determines validity of the data
handle.
It must be const
because precondition-checking code
should avoid observable side effects.
It must be noexcept
because precondition-checking
code should not throw.
With such a function, users could write generic checked
mdspan
creation code like the following.
template<class LayoutMapping, class Accessor>
auto create_mdspan_with_check(
typename Accessor::data_handle_type handle,
LayoutMapping mapping,)
Accessor accessor{
if (accessor.detectably_invalid(handle, mapping.required_span_size())) {
throw std::out_of_range("Invalid data handle and/or size");
}
return mdspan{handle, mapping, accessor};
}
detectably_invalid
We didn’t include this feature in the original mdspan
design because most data handle types have no way to say with full
accuracy whether a handle and size are valid. We didn’t want to give
users the false impression that a validity check was doing anything
meaningful. Standard C++ has no way to check a raw pointer
T*
and a size, though some implementations such as CHERI
C++ ([Davis 2019] and [Watson 2020]) and run-time profiling and
debugging systems such as Valgrind do have this feature. We designed
mdspan
accessors to be able to wrap libraries that
implement a partitioned global address space (PGAS) programming model
for accessing remote data over a network. (See
P0009R18,
Section 2.7, “Why custom accessors?”.) Such libraries include the
one-sided communication interface in MPI (the Message Passing Interface
for distributed-memory parallel programming) or NVSHMEM (NVIDIA’s
implementation of the SHMEM standard). Those libraries define their own
data handle to represent remote data. For example, MPI uses an
MPI_Win
“window” object. NVSHMEM uses a C++ pointer to
represent a “symmetric address” that points to an allocation from the
“symmetric heap” (that is accessible to all participating parallel
processes). Such libraries generally do not have validity checks for
their handles.
On the other hand, a detectably_invalid
function would
let happen any checks that could happen. For instance, a
hypothetical “GPU device memory accessor” (not proposed for the C++
Standard, but existing in projects like
RAPIDS RAFT) might permit
access to an allocation of GPU “device” memory from only GPU “device”
code, not from ordinary “host” code. A common use case for GPU
allocations is to allocate device memory in host code, then pass the
pointer to device code for use there. Thus, it would be reasonable to
create an mdspan
in host code with that accessor. The
accessor could use a CUDA run-time function like
cudaPointerGetAttributes
to check if the pointer points to valid GPU memory. Even
default_accessor
could have a simple check like this.
bool detectably_invalid(data_handle_type ptr, size_t size) const {
return ptr == nullptr && size != 0;
}
C++23 defines the generic interface of accessors through the accessor
policy requirements
[mdspan.accessor.reqmts].
Adding detectably_invalid
to these requirements would be a
breaking change to C++23. Thus, generic code that wanted to call this
function would need to fill in default behavior for both Standard
accessors defined in C++23, and user-defined accessors that comply with
the C++23 accessor requirements. The following
detectably_invalid
nonmember function (not proposed in this
paper) shows one way users could do that. Please see Appendix A below
for the full source code of a demonstration, along with a Compiler
Explorer link. This demonstration shows that breaking backwards
compatibility with C++23 is unnecessary, because users can
straightforwardly work around the lack of a
detectably_invalid
member function in C++23 - compliant
accessors. Not standardizing this nonmember function work-around would
also give users the freedom to fill in different default behavior. For
example, some users may prefer to consider every (data handle, size)
pair invalid unless proven otherwise, as a way to force use of custom
accessors that have the ability to make accurate checks.
template<class Accessor>
concept has_detectably_invalid = requires(Accessor acc) {
typename Accessor::data_handle_type;
{ std::as_const(acc).detectably_invalid(
::declval<typename Accessor::data_handle_type>(),
std::declval<std::size_t>()
std) } noexcept -> std::same_as<bool>;
};
template<class Accessor>
bool detectably_invalid(Accessor&& accessor,
typename std::remove_cvref_t<Accessor>::data_handle_type handle,
::size_t size)
std{
if constexpr (has_detectably_invalid<std::remove_cvref_t<Accessor>>) {
return std::as_const(accessor).detectably_invalid(handle, size);
}
else {
return false;
}
}
is_sufficiently_aligned
and
detectably_invalid
One could argue that if aligned_accessor
had
detectably_invalid
, that would make
is_sufficiently_aligned
unnecessary. We disagree; we think
is_sufficiently_aligned
is useful by itself, whether or not
detectably_invalid
exists, for the following reasons.
Users will often want to check alignment separately from pointer range validity.
Checking alignment may be much less expensive than checking pointer range validity.
Regarding (1), we think the most common use case for
aligned_accessor
’s explicit
converting
constructor from default_accessor
would be explicit
construction of an mdspan
with
aligned_accessor
from an mdspan
with
default_accessor
. The latter exists, so the user has
already asserted that the range formed by its data handle and
required_span_size()
is valid. Thus, the only thing the
user would need to check would be whether the data handle is
sufficiently aligned.
The same LEWG reviewer who suggested detectably_invalid
had originally thought it would make
is_sufficiently_aligned
unnecessary. However, after
reviewing R2 of this paper, that reviewer changed their mind. They now
agree with us that is_sufficiently_aligned
is useful by
itself. All their concerns would be addressed by making
is_sufficiently_aligned
a nonmember function, rather than a
member function of aligned_accessor
.
is_sufficiently_aligned
The reviewer responded to our argument above by suggesting that we
remove is_sufficiently_aligned
from
aligned_accessor
and make it a separate nonmember function.
This is a reasonable alternative suggestion. We think that LWG review of
aligned_accessor
can proceed concurrently while LEWG
considers this alternative.
Into which header should this new function go? We think it belongs in
<bit>
, because it is fundamentally a bit arithmetic
operation. There should be no reason why that could not be freestanding,
like everything else in the <bit>
header. Another
option would be to put it in <memory>
right after
assume_aligned
, and to mark it as freestanding (since
assume_aligned
is).
The only other thing an accessor’s user might want to check besides a
(data handle, size) pair would be converting construction from another
type of accessor. All mdspan
components –
extents
, layout mappings, and accessors – implement
conversions with preconditions via explicit
constructors.
(For more detail, please see the section below, “Explicit conversions as
the model for precondition-asserting conversions.”) Accessors do
not store their data handles, so the only reason to check
whether converting construction is valid would be if the input or result
accessor has separate run-time state. (Otherwise, the check could be a
constraint or static_assert
.) It’s rare for an accessor to
need run-time state, so we don’t expect to need this feature in generic
code. It would also be a separable addition from the feature of checking
a data handle and size. Nevertheless, one could consider a design. We
would favor just overloading detectably_invalid
for
accessors, as there would be no risk of ambiguity. Converting
constructors only take one argument, so there would be no ambiguity
between calling detectably_invalid
with an accessor and
calling it with a data handle and size.
The function describes a property: “this (data handle, size) pair
is not known to be invalid.” It’s an adjective (like
“valid
” or “is_valid
”), not a verb (like
“check” as in “check_valid
”).
The function does not promise perfect accuracy. In the common case, it says whether it can detect whether the handle and size are _in_valid. Whether or not they are valid might be harder to say.
As discussed above, users may also want to check converting constructors from other accessor types. However, there would be no risk of ambiguity between that and checking a data handle and size. Therefore, there’s no need for the function’s name to include the type of the thing being checked (e.g., “range”).
Specifically, the function should not contain the word “pointer,”
because a data handle is not necessarily a pointer. Even if
data_handle_type
is a pointer type, a data handle might not
necessarily be a pointer to the elements in the Standard C++ sense. For
example, it might be some opaque handle that a library represents as a
type alias of void*
.
These points together suggest the name
detectably_invalid
.
Adding detectably_invalid
to the accessor
requirements and existing Standard accessors in C++26 would be a
breaking change to C++23. Nevertheless, users could write code that
fills in reasonable behavior for C++23 accessors.
Few C++ implementations offer a way to check validity of a
pointer range. Thus, users would experience
detectably_invalid
as mostly not useful for the common case
of default_accessor
and other accessors that access a
pointer range.
Item (1) reduces the urgency of this suggestion for C++26. Item
(2) reduces its potential to improve the mdspan
user
experience in a practical way. Therefore, we do not suggest adding
detectably_invalid
to the accessor requirements in this
proposal. However, we do not discourage further work in separate
proposals.
We would like LEWG to consider the alternative design that
removes is_sufficiently_aligned
from
aligned_accessor
and adds it to the C++ Standard Library as
a separate nonmember function. We think LWG review of
aligned_accessor
can proceed concurrently.
During the June 2024 St. Louis WG21 meeting, one LEWG reviewer asked
about the explicit
constructor from
default_accessor
. This constructor lets users assert that a
pointer has sufficient alignment to be accessed by the
aligned_accessor
. The reviewer argued that this was an
“unsafe” conversion, and wanted these “unsafe” conversions to be even
more explicit than an explicit
constructor: e.g., a new
*_cast
function template. We do not agree with this idea;
this section explains why.
aligned_accessor
Suppose that some function that users can’t change returns an
mdspan
of float
with
default_accessor
, even though users know that the
mdspan
is overaligned to 8 * sizeof(float)
bytes. The function’s parameter(s) don’t matter for this example.
<float, dims<1>, layout_right, default_accessor<float>>
mdspan(SomeParameters params); overaligned_view
Suppose also that users want to call some other function that they
can’t change. This function takes an mdspan
of
float
with aligned_accessor<float, 8>
.
Its return type doesn’t matter for this example.
(
SomeReturnType use_overaligned_view<float, dims<1>, layout_right, aligned_accessor<float, 8>>); mdspan
How do users call use_overaligned_view
with the object
returned from overaligned_view
? The status quo offers two
ways. Both of them rely on
aligned_accessor<float, 8>
’s explicit
converting constructor from
default_accessor<float>
.
Use mdspan
’s explicit
converting
constructor.
Construct the new mdspan
explicitly from its data
handle, layout mapping, and accessor. (This is the ideal use case for
CTAD, as an mdspan
is nothing more than its data handle,
layout mapping, and accessor.)
Way (1) looks like this.
auto x = overaligned_view(params);
auto result = use_overaligned_view(
<float, dims<1>, layout_right,
mdspan<float, 8>>(x)
aligned_accessor);
Way (2) looks like this. Note use of CTAD.
auto x = overaligned_view(params);
auto result = use_overaligned_view(
{x.data_handle(), x.mapping(),
mdspan<float, 8>>(x.accessor())}
aligned_accessor);
Which way is less verbose depends on mdspan
’s template
arguments. Both ways, though, force the user to name the type
aligned_accessor<float, 8>
explicitly. Users know
that they have pulled out a sharp knife from the toolbox. It’s verbose,
it’s nondefault, and it’s a class with a short definition. Users can go
to the specification, see assume_aligned
, and know they are
dealing with a low-level function that has a precondition.
mdspan
uses
explicit conversions to assert preconditionsThe entire system of mdspan
components was designed so
that
conversions with preconditions happen through
explicit
conversions (mostly converting constructors);
while
conversions without preconditions happen through implicit conversions.
Changing this would break backwards compatibility with C++23. For example, one can see this with converting constructors for
extents
(for conversions from run-time to
compile-time extents, or conversions from wider to narrower index type):
[mdspan.extents.cons];
and
layout_left::mapping
, and all the other layout
mappings currently in the Standard that are not
layout_stride
or layout_transpose
(for
conversions from e.g., layout_stride::mapping
, which assert
that the strides are compatible): e.g.,
[mdspan.layout.left.cons].
This is consistent with C++ Standard Library class templates, in that
construction asserts any preconditions. For example, if users construct
a string_view
or span
from a pointer
ptr
and a size size
, this asserts that the
range [ ptr
,
ptr + size
) is
accessible.
naughty_cast
Everything we have described above is the status quo. What did the
one LEWG reviewer want to see? They wanted all conversions with
preconditions to use a “cast” function with an easily searchable name,
analogous to static_cast
. As a placeholder, we’ll call it
“naughty_cast
.” For the above
use_overaligned_view
example, the naughty_cast
analog of Way (2) would look like this.
auto x = overaligned_view(params);
auto result = use_overaligned_view(
{x.data_handle(), x.mapping(),
mdspan<aligned_accessor<float, 8>>>(x.accessor())}
naughty_cast);
One could imagine defining naughty_cast
of
mdspan
by naughty_cast
of its components. This
would enable an analog of Way (1).
auto x = overaligned_view(params);
auto result = use_overaligned_view(naughty_cast<
<float, dims<1>, layout_right,
mdspan<float, 8>>>(x)
aligned_accessor);
Another argument for naughty_cast
besides searchability
is to make conversions with preconditions “loud,” that is, easily seen
in the code by human developers. However, the original Way (1) and Way
(2) both are loud already in that they require a lot of extra code that
spells out the result’s accessor type explicitly. The status quo’s
difference in “volume” is implicit conversion
auto result = use_overaligned_view(x);
versus explicit construction.
auto result = use_overaligned_view(
{x.data_handle(), x.mapping(),
mdspan<float, 8>(x)});
aligned_accessor);
Adding naughty_cast
to the latter doesn’t make it much
louder.
auto result = use_overaligned_view(
{x.data_handle(), x.mapping(),
mdspan<aligned_accessor<float, 8>>(x)});
naughty_cast);
There are other disadvantages to a naughty_cast
design.
The point of that design would be to remove or make
non-public
all the explicit
constructors from
mdspan
’s components. That functionality would need to move
somewhere. A typical implementation technique for a custom cast function
is to rely on specializations of a struct with two template parameters,
one for the input type and one for the output type of the cast. The
naughty_caster
struct example below shows how one could do
that.
template<class Output, class Input>
struct naughty_caster {};
template<class Output, class Input>
(const Input& input) {
Output naughty_castreturn naughty_caster<Output, Input>::cast(input);
}
template<class OutputElementType, size_t ByteAlignment,
class InputElementType>
requires (is_convertible_v<InputElementType(*)[],
(*)[]>)
OutputElementTypestruct naughty_caster {
using output_type =
<OutputElementType, ByteAlignment>;
aligned_accessorusing input_type = default_accessor<InputElementType>;
static output_type cast(const input_type&) {
return {};
}
};
This technique takes a lot of effort and code, when by far the common
case is that cast
has a trivial body. For any accessors
with state, it would almost certainly call for breaks of encapsulation,
like making the naughty_caster
specialization a
friend
of the input and/or output.
We emphasize that users are meant to write custom accessors. The
intended typical author of a custom accessor is a performance expert who
is not necessarily a C++ expert. It takes quite a bit of C++ experience
to learn how to use encapsulation-breaking techniques safely; the lazy
approaches all just expose implementation details or defeat the “safety”
that naughty_cast
is supposed to introduce. Given that the
main motivation of naughty_cast
is safety, we shouldn’t
make it harder for users to write safe code.
More importantly, naughty_cast
would obfuscate
accessors. The architects of mdspan
meant accessors to have
to have a small number of “moving parts” and to define all those parts
in a single place. Contrast default_accessor
with the
contiguous iterator requirements, for instance. The
naughty_cast
design would force custom accessors (and
custom layouts) to define their different parts in different places,
rather than all in one class. WG21 has moved away from this scattered
design approach. For example,
P2855R1 (“Member customization
points for Senders and Receivers”) changes P2300 (std::execution) to use
member functions instead of tag_invoke
-based customization
points.
mdspan
’s current designFor all these reasons, we do not support replacing
mdspan
’s current “conversions with preconditions are
explicit conversions” design with a cast function design.
We have tested an implementation of this proposal with the reference mdspan implementation. Appendix B below lists the source code of a full implementation.
template<size_t byte_alignment>
using aligned_mdspan =
::mdspan<float, std::dims<1, int>, std::layout_right, std::aligned_accessor<float, byte_alignment>>;
std
// Interfaces that require 32-byte alignment,
// because they want to do 8-wide SIMD of float.
extern void vectorized_axpy(aligned_mdspan<32> y, float alpha, aligned_mdspan<32> x);
extern float vectorized_norm(aligned_mdspan<32> y);
// Interfaces that require 16-byte alignment,
// because they want to do 4-wide SIMD of float.
extern void fill_x(aligned_mdspan<16> x);
extern void fill_y(aligned_mdspan<16> y);
// Helper functions for making overaligned array allocations.
template<class ElementType>
struct delete_raw {
void operator()(ElementType* p) const {
::free(p);
std}
};
template<class ElementType>
using allocation = std::unique_ptr<ElementType[], delete_raw<ElementType>>;
template<class ElementType, std::size_t byte_alignment>
<ElementType> allocate_raw(const std::size_t num_elements)
allocation{
const std::size_t num_bytes = num_elements * sizeof(ElementType);
void* ptr = std::aligned_alloc(byte_alignment, num_bytes);
return {ptr, delete_raw<ElementType>{}};
}
float user_function(size_t num_elements, float alpha)
{
// Code using the above two interfaces needs to allocate to the max alignment.
// Users could also query aligned_accessor::byte_alignment
// for the various interfaces and take the max.
constexpr size_t max_byte_alignment = 32;
auto x_alloc = allocate_raw<float, max_byte_alignment>(num_elements);
auto y_alloc = allocate_raw<float, max_byte_alignment>(num_elements);
<max_byte_alignment> x(x_alloc.get());
aligned_mdspan<max_byte_alignment> y(y_alloc.get());
aligned_mdspan
(x); // automatic conversion from 32-byte aligned to 16-byte aligned
fill_x(y); // automatic conversion again
fill_y
(y, alpha, x);
vectorized_axpyreturn vectorized_norm(y);
}
Davis et al., “CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment,” ASPLOS ’19, April 2019, pp. 379 - 393. Available online [last accessed 2024-07-05]: https://dl.acm.org/doi/10.1145/3297858.3304042
Watson et al., “CHERI C/C++ Programming Guide,” Technical Report UCAM-CL-TR-947, University of Cambridge Computer Laboratory, June 2020. Available online [last accessed 2024-07-05]: https://doi.org/10.48456/tr-947
detectably_invalid
, credit (with permission) to
David Sankel (Adobe), dsankel@adobe.com
Text in blockquotes is not proposed wording, but rather instructions for generating proposed wording. The � character is used to denote a placeholder section number which the editor shall determine.
In [version.syn], add
#define __cpp_lib_aligned_accessor YYYYMML // also in <mdspan>
Adjust the placeholder value as needed so as to denote this proposal’s date of adoption.
To the Header
<mdspan>
synopsis [mdspan.syn], afterclass default_accessor
and beforeclass mdspan
, add the following.
// [mdspan.accessor.aligned], class template aligned_accessor
template<class ElementType, size_t byte_alignment>
class aligned_accessor;
At the end of [mdspan.accessor.default] and before [mdspan.mdspan], add all the material that follows.
� Class template aligned_accessor
[mdspan.accessor.aligned]
�.1 Overview [mdspan.accessor.aligned.overview]
template<class ElementType, size_t the_byte_alignment>
struct aligned_accessor {
using offset_policy = default_accessor<ElementType>;
using element_type = ElementType;
using reference = ElementType&;
using data_handle_type = ElementType*;
static constexpr size_t byte_alignment = the_byte_alignment;
constexpr aligned_accessor() noexcept = default;
template<class OtherElementType, size_t other_byte_alignment>
constexpr aligned_accessor(
<OtherElementType, other_byte_alignment>) noexcept;
aligned_accessor
template<class OtherElementType>
explicit constexpr aligned_accessor(
<OtherElementType>) noexcept;
default_accessor
constexpr operator default_accessor<element_type>() const {
return {};
}
constexpr reference access(data_handle_type p, size_t i) const noexcept;
constexpr typename offset_policy::data_handle_type
(data_handle_type p, size_t i) const noexcept;
offset
static bool is_sufficiently_aligned(data_handle_type p);
};
1 Mandates:
2
aligned_accessor
meets the accessor policy
requirements.
3
ElementType
is required to be a complete object type that
is neither an abstract class type nor an array type.
4
Each specialization of aligned_accessor
is a trivially
copyable type that models semiregular
.
5
[0, n) is an accessible range
for an object p
of type data_handle_type
and
an object of type aligned_accessor
if and only if [p
, p
+ n) is a valid range.
template<class OtherElementType, size_t other_byte_alignment>
constexpr aligned_accessor(
<OtherElementType, other_byte_alignment>) noexcept {} aligned_accessor
1
Constraints:
is_convertible_v<OtherElementType(*)[], element_type(*)[]>
is true
.
2
Mandates:
gcd(other_byte_alignment, byte_alignment) == byte_alignment
is true
.
template<class OtherElementType>
explicit constexpr aligned_accessor(
<OtherElementType>) noexcept {}; default_accessor
3
Constraints:
is_convertible_v<OtherElementType(*)[], element_type(*)[]>
is true
.
constexpr reference access(data_handle_type p, size_t i) const noexcept;
4
Preconditions: p
points to an object
X
of a type similar ([conv.qual]) to
element_type
, where X
has alignment
byte_alignment
([basic.align]).
5
Effects: Equivalent to:
return assume_aligned<byte_alignment>(p)[i];
constexpr typename offset_policy::data_handle_type
(data_handle_type p, size_t i) const noexcept; offset
6
Preconditions: p
points to an object
X
of a type similar ([conv.qual]) to
element_type
, where X
has alignment
byte_alignment
([basic.align]).
7
Effects: Equivalent to: return p + i;
static bool is_sufficiently_aligned(data_handle_type p);
8
Preconditions: p
points to an object
X
of a type similar ([conv.qual]) to
element_type
.
9
Returns: true
if X
has alignment at
least byte_alignment
, else false
.
[Example: The following function compute
uses
is_sufficiently_aligned
to check whether a given
mdspan
with default_accessor
has a data handle
with sufficient alignment to be used with
aligned_accessor<float, 4 * sizeof(float)>
. If so,
the function dispatches to a function
compute_using_fourfold_overalignment
that requires fourfold
overalignment of arrays, but can therefore use hardware-specific
instructions, such as four-wide SIMD (Single Instruction Multiple Data)
instructions. Otherwise, compute
dispatches to a possibly
less optimized function
compute_without_requiring_overalignment
that has no
overalignment requirement.
extern void
(
compute_using_fourfold_overalignment::mdspan<float, std::dims<1>, std::layout_right,
std::aligned_accessor<float, 4 * alignof(float)>> x);
std
extern void
(
compute_without_requiring_overalignment::mdspan<float, std::dims<1>> x);
std
void compute(std::mdspan<float, std::dims<1>> x)
{
auto accessor = aligned_accessor<float, 4 * sizeof(float)>{};
auto x_handle = x.data_handle();
if (accessor.is_sufficiently_aligned(x_handle)) {
(mdspan{x_handle, x.mapping(), accessor});
compute_using_fourfold_overalignment}
else {
(x);
compute_without_requiring_overalignment}
}
–end example]
[Example: The following example shows how users can fulfill
the preconditions of aligned_accessor
by using existing C++
Standard Library functionality to create overaligned allocations. First,
the allocate_overaligned
helper function uses
aligned_alloc
to create an overaligned allocation.
template<class ElementType>
struct delete_with_free {
void operator()(ElementType* p) const {
::free(p);
std}
};
template<class ElementType>
using allocation = std::unique_ptr<ElementType[], delete_with_free<ElementType>>;
template<class ElementType, size_t byte_alignment>
<ElementType> allocate_overaligned(const size_t num_elements)
allocation{
const size_t num_bytes = num_elements * sizeof(ElementType);
void* ptr = std::aligned_alloc(byte_alignment, num_bytes);
return {ptr, delete_with_free<ElementType>{}};
}
Second, this example presumes that some third-party library provides
functions requiring arrays of float
to have 32-byte
alignment.
template<size_t byte_alignment>
using aligned_mdspan = std::mdspan<float, std::dims<1, int>,
::layout_right, std::aligned_accessor<float, byte_alignment>>;
std
extern void vectorized_axpy(aligned_mdspan<32> y, float alpha, aligned_mdspan<32> x);
extern float vectorized_norm(aligned_mdspan<32> y);
Third and finally, the user’s function user_function
would begin by allocating “raw” overaligned arrays with
allocate_overaligned
. It would then create aligned
mdspan
with them, and pass the resulting
mdspan
into the third-party library’s functions.
float user_function(size_t num_elements, float alpha)
{
constexpr size_t max_byte_alignment = 32;
auto x_alloc = allocate_overaligned<float, max_byte_alignment>(num_elements);
auto y_alloc = allocate_overaligned<float, max_byte_alignment>(num_elements);
<max_byte_alignment> x(x_alloc.get());
aligned_mdspan<max_byte_alignment> y(y_alloc.get());
aligned_mdspan
// ... fill the elements of x and y ...
(y, alpha, x);
vectorized_axpyreturn vectorized_norm(y);
}
–end example]
is_sufficiently_aligned
designText in blockquotes is not proposed wording, but rather instructions for generating proposed wording. The instructions here are expressed as a “diff” atop P2897R2, that is, as if the wording proposed in P2897R2 had already been applied to the Working Draft. The � character is used to denote a placeholder section number which the editor shall determine.
From the
aligned_accessor
synopsis in [mdspan.accessor.aligned.overview], remove the member functionstatic bool is_sufficiently_aligned(data_handle_type p);
.
From the list of members of
aligned_accessor
in [mdspan.accessor.aligned.members], removestatic bool is_sufficiently_aligned(data_handle_type p);
from after line 7 (“Effects: Equivalent to:return p + i;
”), and remove line 8 (“Preconditions: …”) and line 9 (“Returns: …”). (Retain the Example usingis_sufficiently_aligned
that immediately follows the deleted lines.)
To the Header
<bit>
synopsis [bit.syn], afterenum class endian
and before the}
that closesnamespace std
, add the following.
// [bit.aligned], is_sufficiently_aligned
template<class ElementType, size_t byte_alignment>
bool is_sufficiently_aligned(ElementType*);
After [bit.endian], add a new section [bit.aligned] and put in it all the material that follows.
template<class ElementType, size_t byte_alignment>
bool is_sufficiently_aligned(ElementType* p);
1
Preconditions: p
points to an object
X
of a type similar ([conv.qual]) to
ElementType
.
2
Returns: true
if X
has alignment at
least byte_alignment
, else false
.
detectably_invalid
nonmember function exampleThis section is nonnormative. This is the full source code with tests
for the detectably_invalid
nonmember function example
above. Please see this
Compiler Explorer link for
a test with five different compilers: GCC 14.1, Clang 18.1.0, MSVC
v19.40 (VS17.10), and nvc++ 24.5.
#include <cassert>
#include <concepts>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <type_traits>
#include <utility>
template<class Accessor>
concept has_detectably_invalid = requires(Accessor acc) {
typename Accessor::data_handle_type;
{ std::as_const(acc).detectably_invalid(
::declval<typename Accessor::data_handle_type>(),
std::declval<std::size_t>()
std) } noexcept -> std::convertible_to<bool>;
};
template<class Accessor>
bool detectably_invalid(Accessor&& accessor,
typename std::remove_cvref_t<Accessor>::data_handle_type handle,
::size_t size)
std{
if constexpr (has_detectably_invalid<std::remove_cvref_t<Accessor>>) {
return std::as_const(accessor).detectably_invalid(handle, size);
}
else {
return false;
}
}
struct A {
using data_handle_type = float*;
static bool detectably_invalid(data_handle_type ptr, std::size_t size) noexcept {
return ptr == nullptr && size != 0;
}
};
struct B {
using data_handle_type = float*;
};
struct C {
using data_handle_type = float*;
// This is nonconst, so it's not actually called.
bool detectably_invalid(data_handle_type ptr, std::size_t size) {
throw std::runtime_error("C::detectably_invalid: uh oh");
}
};
struct D {
using data_handle_type = float*;
// This is const but not noexcept, so it's not actually called.
bool detectably_invalid(data_handle_type ptr, std::size_t size) const {
throw std::runtime_error("D::detectably_invalid: uh oh");
}
};
int main()
{
float* ptr = nullptr;
assert(not detectably_invalid(A{}, ptr, 0));
assert(detectably_invalid(A{}, ptr, 1));
{};
A aassert(not detectably_invalid(a, ptr, 0));
assert(detectably_invalid(a, ptr, 1));
const A a_c{};
assert(not detectably_invalid(a_c, ptr, 0));
assert(detectably_invalid(a_c, ptr, 1));
assert(not detectably_invalid(B{}, ptr, 0));
assert(not detectably_invalid(B{}, ptr, 1));
// B doesn't know how to check pointer validity.
assert(not detectably_invalid(B{}, ptr, 0));
assert(not detectably_invalid(B{}, ptr, 1));
{};
B bassert(not detectably_invalid(b, ptr, 0));
assert(not detectably_invalid(b, ptr, 1));
const B b_c{};
assert(not detectably_invalid(b_c, ptr, 0));
assert(not detectably_invalid(b_c, ptr, 1));
// If users make detectably_invalid nonconst or not noexcept,
// the nonmember function falls back to a default implementation.
try {
assert(not detectably_invalid(C{}, ptr, 0));
assert(not detectably_invalid(C{}, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "C{} threw runtime_error: " << e.what() << "\n";
std}
try {
const C c_c{};
assert(not detectably_invalid(c_c, ptr, 0));
assert(not detectably_invalid(c_c, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "const C threw runtime_error: " << e.what() << "\n";
std}
try {
{};
C cassert(not detectably_invalid(c, ptr, 0));
assert(not detectably_invalid(c, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "nonconst C threw runtime_error: " << e.what() << "\n";
std}
try {
assert(not detectably_invalid(D{}, ptr, 0));
assert(not detectably_invalid(D{}, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "D{} threw runtime_error: " << e.what() << "\n";
std}
try {
const D d_c{};
assert(not detectably_invalid(d_c, ptr, 0));
assert(not detectably_invalid(d_c, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "const D threw runtime_error: " << e.what() << "\n";
std}
try {
{};
D dassert(not detectably_invalid(d, ptr, 0));
assert(not detectably_invalid(d, ptr, 1));
}
catch (const std::runtime_error& e) {
::cerr << "nonconst D threw runtime_error: " << e.what() << "\n";
std}
::cerr << "Made it to the end\n";
stdreturn 0;
}
This Compiler Explorer
link gives a full implementation of aligned_accessor
and a demonstration. We show the full source code from that link here
below.
#include <https://raw.githubusercontent.com/kokkos/mdspan/single-header/mdspan.hpp>
#include <bit>
#include <cassert>
#include <cmath>
#if defined(_MSC_VER)
# include <cstdlib> // MSVC's _aligned_malloc
#endif
#include <exception>
#include <functional>
#include <memory>
#include <numeric>
#include <type_traits>
namespace stdex = std::experimental;
// P2389 (voted into C++ at June 2024 STL plenary)
namespace std {
template<size_t Rank, class IndexType = size_t>
using dims = dextents<IndexType, Rank>;
} // namespace std
namespace {
template<class ElementType, std::size_t byte_alignment>
class aligned_accessor {
public:
static_assert(std::has_single_bit(byte_alignment),
"byte_alignment must be a power of two.");
static_assert(byte_alignment >= alignof(ElementType),
"Insufficient byte alignment for ElementType");
using offset_policy = stdex::default_accessor<ElementType>;
using element_type = ElementType;
using reference = ElementType&;
using data_handle_type = ElementType*;
constexpr aligned_accessor() noexcept = default;
template<
class OtherElementType,
::size_t other_byte_alignment>
stdrequires(std::is_convertible_v<
(*)[], element_type(*)[]>)
OtherElementTypeconstexpr aligned_accessor(
<OtherElementType, other_byte_alignment>)
aligned_accessornoexcept
{
constexpr size_t the_gcd =
::gcd(other_byte_alignment, byte_alignment);
stdstatic_assert(the_gcd == byte_alignment);
}
template<class OtherElementType>
requires(std::is_convertible_v<
(*)[], element_type(*)[]>)
OtherElementTypeconstexpr explicit aligned_accessor(
::default_accessor<OtherElementType>) noexcept
stdex{}
constexpr
operator stdex::default_accessor<element_type>() const
{
return {};
}
constexpr reference
(data_handle_type p, size_t i) const noexcept
access{
return std::assume_aligned<byte_alignment>(p)[i];
}
constexpr typename offset_policy::data_handle_type
(data_handle_type p, size_t i) const noexcept {
offsetreturn p + i;
}
static bool
(data_handle_type p) noexcept
is_sufficiently_aligned{
return alignof(ElementType) >= byte_alignment &&
::bit_cast<uintptr_t>(p) % byte_alignment == 0;
std}
};
template<size_t byte_alignment>
using aligned_mdspan =
::mdspan<float, std::dims<1, int>, std::layout_right,
std<float, byte_alignment>>;
aligned_accessor
// Interfaces that require 32-byte alignment,
// because they want to do 8-wide SIMD of float.
void
(aligned_mdspan<32> y,
vectorized_axpbyfloat alpha, aligned_mdspan<32> x, float beta)
{
assert(x.extent(0) == y.extent(0));
for (int k = 0; k < x.extent(0); ++k) {
[k] = beta * y[k] + alpha * x[k];
y}
}
// 1-norm of the vector y
float vectorized_norm(aligned_mdspan<32> y)
{
float one_norm = 0.0f;
for (int k = 0; k < y.extent(0); ++k) {
+= std::fabs(y[k]);
one_norm }
return one_norm;
}
// Interfaces that require 16-byte alignment,
// because they want to do 4-wide SIMD of float.
void fill_x(aligned_mdspan<16> x) {
for (int k = 0; k < x.extent(0); ++k) {
[k] = static_cast<float>(k + 2);
x}
}
void fill_y(aligned_mdspan<16> y) {
for (int k = 0; k < y.extent(0); ++k) {
[k] = static_cast<float>(k - 1);
y}
}
// Helper functions for making overaligned array allocations.
template<class ElementType>
struct delete_raw {
void operator()(ElementType* p) const {
::free(p);
std}
};
template<class ElementType>
using allocation =
::unique_ptr<ElementType[], delete_raw<ElementType>>;
std
template<class ElementType, std::size_t byte_alignment>
<ElementType> allocate_raw(const std::size_t num_elements)
allocation{
const std::size_t num_bytes = num_elements * sizeof(ElementType);
float* ptr = reinterpret_cast<float*>(
#if defined(_MSC_VER)
(byte_alignment, num_bytes)
_aligned_malloc#else
::aligned_alloc(byte_alignment, num_bytes)
std#endif
);
return {ptr, delete_raw<ElementType>{}};
}
float user_function(size_t num_elements, float alpha, float beta)
{
constexpr size_t max_byte_alignment = 32;
// unique_ptr<float[], deleter_something>
auto x_alloc = allocate_raw<float, max_byte_alignment>(num_elements);
auto y_alloc = allocate_raw<float, max_byte_alignment>(num_elements);
<max_byte_alignment> x(x_alloc.get(), num_elements);
aligned_mdspan<max_byte_alignment> y(y_alloc.get(), num_elements);
aligned_mdspan
// expect aligned_mdspan<16>, get aligned_mdspan<32>
(x); // automatic conversion from 32-byte aligned to 16-byte aligned
fill_x(y); // automatic conversion again
fill_y
// expect aligned_mdspan<32>, get aligned_mdspan<32>
(y, alpha, x, beta);
vectorized_axpbyreturn vectorized_norm(y);
}
} // namespace (anonymous)
int main(int argc, char* argv[])
{
float result = user_function(10, 1.0f, -1.0f);
// 3 + 3 + ... + 3 = 30
assert(result == 30.0f);
return 0;
}