Document Number: | |
---|---|
Date: | |
Revises: | |
Editor: | NVIDIA Corporation |
Note: this is an early draft. It is known to be incomplete and incorrect, and it has formatting problems.
This Technical Specification describes requirements for implementations of an interface that computer programs written in the C++ programming language may use to invoke algorithms with parallel execution. The algorithms described by this Technical Specification are realizable across a broad class of computer architectures.
This Technical Specification is non-normative. Some of the functionality described by this Technical Specification may be considered for standardization in a future version of C++, but it is not currently part of any C++ standard. Some of the functionality in this Technical Specification may never be standardized, and other functionality may be standardized in a substantially changed form.
The goal of this Technical Specification is to build widespread existing practice for parallelism in the C++ standard algorithms library. It gives advice on extensions to those vendors who wish to provide them.
The following referenced document is indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14882:2017 is herein called the C++ Standard. The library described in ISO/IEC 14882:2017 clauses 20-33 is herein called the C++ Standard Library. The C++ Standard Library components described in ISO/IEC 14882:2017 clauses 28, 29.8 and 23.10.10 are herein called the C++ Standard Algorithms Library.
Unless otherwise specified, the whole of the C++ Standard Library's introduction applies to this Technical Specification by reference.

Since the extensions described in this Technical Specification are experimental and not part of the C++ Standard Library, they should not be declared directly within namespace std. Unless otherwise specified, all components described in this Technical Specification are declared in namespace std::experimental::parallelism_v2. [ Note: Once standardized, the components described by this Technical Specification are expected to be promoted to namespace std. — end note ]
Unless otherwise specified, references to such entities described in this Technical Specification are assumed to be qualified with std::experimental::parallelism_v2, and references to entities described in the C++ Standard Library are assumed to be qualified with std::.

Extensions that are expected to eventually be added to an existing header <meow> are provided inside the <experimental/meow> header, which shall include the standard contents of <meow> as if by

#include <meow>
An implementation that provides support for this Technical Specification shall define the feature test macro(s) in Table 1.
Doc. No. | Title | Primary Section | Macro Name | Value | Header |
---|---|---|---|---|---|
P0155R0 | Task Block R5 | | __cpp_lib_experimental_parallel_task_block | 201711 | <experimental/exception_list> <experimental/task_block> |
P0076R4 | Vector and Wavefront Policies | | __cpp_lib_experimental_execution_vector_policy | 201711 | <experimental/algorithm> <experimental/execution> |
P0075R2 | Template Library for Parallel For Loops | | __cpp_lib_experimental_parallel_for_loop | 201711 | <experimental/algorithm> |
P0214R9 | Data-Parallel Vector Types & Operations | | __cpp_lib_experimental_parallel_simd | 201803 | <experimental/simd> |
<experimental/exception_list> synopsis

namespace std::experimental {
inline namespace parallelism_v2 {

  class exception_list : public exception {
  public:
    using iterator = unspecified;

    size_t size() const noexcept;
    iterator begin() const noexcept;
    iterator end() const noexcept;

    const char* what() const noexcept override;
  };
}
}
The class exception_list owns a sequence of exception_ptr objects.

The type exception_list::iterator fulfills the requirements of ForwardIterator.

size_t size() const noexcept;

Returns: The number of exception_ptr objects contained within the exception_list.

iterator begin() const noexcept;

Returns: An iterator referring to the first exception_ptr object contained within the exception_list.

iterator end() const noexcept;

Returns: An iterator that is past the end of the owned sequence.

const char* what() const noexcept override;

Returns: An implementation-defined NTBS.
<experimental/execution> synopsis

#include <execution>

namespace std::experimental {
inline namespace parallelism_v2 {
namespace execution {
  // 6.2, Unsequenced execution policy
  class unsequenced_policy;

  // 6.3, Vector execution policy
  class vector_policy;

  // 6.4, Execution policy objects
  inline constexpr unsequenced_policy unseq{ unspecified };
  inline constexpr vector_policy vec{ unspecified };
}
}
}
class unsequenced_policy { unspecified };
The class unsequenced_policy
is an execution policy type used as a unique type to disambiguate
parallel algorithm overloading and indicate that a parallel algorithm's
execution may be vectorized, e.g., executed on a single thread using
instructions that operate on multiple data items.
The invocations of element access functions in parallel algorithms invoked with an execution policy of type unsequenced_policy
are permitted to execute in an unordered fashion in the calling thread,
unsequenced with respect to one another within the calling thread.
During the execution of a parallel algorithm with the experimental::execution::unsequenced_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
will be called.
class vector_policy { unspecified };
The class vector_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized. Additionally, such vectorization will result in an execution that respects the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]). [ Note: This means the implementation provides stronger guarantees than for unsequenced_policy, for example. — end note ]
The invocations of element access functions in parallel algorithms invoked with an execution policy of type vector_policy are permitted to execute in an unordered fashion in the calling thread, unsequenced with respect to one another within the calling thread, subject to the sequencing constraints of wavefront application ([parallel.alg.general.wavefront]) for the last argument to for_loop, for_loop_n, for_loop_strided, or for_loop_n_strided.
During the execution of a parallel algorithm with the experimental::execution::vector_policy
policy, if the invocation of an element access function exits via an uncaught exception, terminate()
will be called.
inline constexpr execution::unsequenced_policy unseq{ unspecified };
inline constexpr execution::vector_policy vec{ unspecified };
The header <experimental/execution>
declares a global object associated with each type of execution policy defined by this Technical Specification.
For the purposes of this section, an evaluation is a value computation or side effect of an expression, or an execution of a statement. Initialization of a temporary object is considered a subexpression of the expression that necessitates the temporary object.
An evaluation A contains an evaluation B if:
An evaluation A is ordered before an evaluation B if A is deterministically
sequenced before B.
For an evaluation A ordered before an evaluation B, both contained in the same invocation of an element access function, A is a vertical antecedent of B if there exists an evaluation S containing both A and B such that control reached B from A without executing any of the following:

- a goto statement or asm declaration that jumps to a statement outside of S, or
- a switch statement executed within S that transfers control into a substatement of a nested selection or iteration statement, or
- a throw, even if caught, or
- a longjmp.
In the following, Xi and Xj refer to evaluations of the same expression
or statement contained in the application of an element access function corresponding to the ith and
jth elements of the input sequence.
Horizontally matched is an equivalence relationship between two evaluations of the same expression. An evaluation Bi is horizontally matched with an evaluation Bj if:
Let f be a function called for each argument list in a sequence of argument lists. Wavefront application of f requires that evaluation Ai be sequenced before evaluation Bj if i < j and:
<experimental/algorithm> synopsis

#include <algorithm>

namespace std::experimental {
inline namespace parallelism_v2 {
namespace execution {
  // 7.2.5, No vec
  template<class F>
    auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)());

  // 7.2.6, Ordered update class
  template<class T> class ordered_update_t;

  // 7.2.7, Ordered update function template
  template<class T>
    ordered_update_t<T> ordered_update(T& ref) noexcept;
}

// Exposition only: Suppress template argument deduction.
template<class T> struct no_deduce { using type = T; };
template<class T> using no_deduce_t = typename no_deduce<T>::type;

// 7.2.2, Support for reductions
template<class T, class BinaryOperation>
  unspecified reduction(T& var, const T& identity, BinaryOperation combiner);
template<class T> unspecified reduction_plus(T& var);
template<class T> unspecified reduction_multiplies(T& var);
template<class T> unspecified reduction_bit_and(T& var);
template<class T> unspecified reduction_bit_or(T& var);
template<class T> unspecified reduction_bit_xor(T& var);
template<class T> unspecified reduction_min(T& var);
template<class T> unspecified reduction_max(T& var);

// 7.2.3, Support for inductions
template<class T> unspecified induction(T&& var);
template<class T, class S> unspecified induction(T&& var, S stride);

// 7.2.4, for_loop
template<class I, class... Rest>
  void for_loop(no_deduce_t<I> start, I finish, Rest&&... rest);
template<class ExecutionPolicy, class I, class... Rest>
  void for_loop(ExecutionPolicy&& exec, no_deduce_t<I> start, I finish, Rest&&... rest);
template<class I, class S, class... Rest>
  void for_loop_strided(no_deduce_t<I> start, I finish, S stride, Rest&&... rest);
template<class ExecutionPolicy, class I, class S, class... Rest>
  void for_loop_strided(ExecutionPolicy&& exec, no_deduce_t<I> start, I finish, S stride, Rest&&... rest);
template<class I, class Size, class... Rest>
  void for_loop_n(I start, Size n, Rest&&... rest);
template<class ExecutionPolicy, class I, class Size, class... Rest>
  void for_loop_n(ExecutionPolicy&& exec, I start, Size n, Rest&&... rest);
template<class I, class Size, class S, class... Rest>
  void for_loop_n_strided(I start, Size n, S stride, Rest&&... rest);
template<class ExecutionPolicy, class I, class Size, class S, class... Rest>
  void for_loop_n_strided(ExecutionPolicy&& exec, I start, Size n, S stride, Rest&&... rest);
}
}
Each of the function templates in this subclause ([parallel.alg.reductions]) returns a reduction object of unspecified type having a reduction value type and encapsulating a reduction identity value for the reduction, a combiner function object, and a live-out object from which the initial value is obtained and into which the final value is stored.
An algorithm uses reduction objects by allocating an unspecified number of instances, known as accumulators, of the reduction value type. Modifications to the accumulator by the application of element access functions accrue as partial results. At some point before the algorithm returns, the partial results are combined, two at a time, using the reduction object's combiner operation until a single value remains, which is then assigned back to the live-out object. [ Note: In order to produce useful results, modifications to the accumulator should be consistent with the combiner. For example, if the combiner is plus<T>, incrementing the accumulator would be consistent with the combiner but doubling it or assigning to it would not. — end note ]
template<class T, class BinaryOperation>
  unspecified reduction(T& var, const T& identity, BinaryOperation combiner);

Requires: T shall meet the requirements of CopyConstructible and MoveAssignable. The expression var = combiner(var, var) shall be well-formed.

Returns: A reduction object of reduction value type T, reduction identity identity, combiner function object combiner, and using the object referenced by var as its live-out object.
template<class T> unspecified reduction_plus(T& var);
template<class T> unspecified reduction_multiplies(T& var);
template<class T> unspecified reduction_bit_and(T& var);
template<class T> unspecified reduction_bit_or(T& var);
template<class T> unspecified reduction_bit_xor(T& var);
template<class T> unspecified reduction_min(T& var);
template<class T> unspecified reduction_max(T& var);

Requires: T shall meet the requirements of CopyConstructible and MoveAssignable.

Returns: A reduction object of reduction value type T, with reduction identity and combiner operation as specified in the table below, and using the object referenced by var as its live-out object.
as its live-out object.Function | Reduction Identity | Combiner Operation |
---|---|---|
reduction_plus |
T() |
x + y |
reduction_multiplies |
T(1) |
x * y |
reduction_bit_and |
(~T()) |
X & y |
reduction_bit_or |
T() |
x | y |
reduction_bit_xor |
T() |
x ^ y |
reduction_min |
var |
min(x, y) |
reduction_max |
var |
max(x, y) |
[ Example: The following code updates each element of y and sets s to the sum of the squares:

extern int n;
extern float x[], y[], a;
float s = 0;
for_loop(execution::vec, 0, n,
  reduction(s, 0.0f, plus<>()),
  [&](int i, float& accum) {
    y[i] += a*x[i];
    accum += y[i]*y[i];
  });

— end example ]
Each of the function templates in this section returns an induction object of unspecified type having an induction value type and encapsulating an initial value i of that type and, optionally, a stride.
For each element in the input range, an algorithm over input sequence S computes an induction value from an induction variable and ordinal position p within S by the formula i + p * stride if a stride was specified or i + p otherwise. This induction value is passed to the element access function.
An induction object may refer to a live-out object to hold the final value of the induction sequence. When the algorithm using the induction object completes, the live-out object is assigned the value i + n * stride, where n is the number of elements in the input range.
template<class T> unspecified induction(T&& var);
template<class T, class S> unspecified induction(T&& var, S stride);

Returns: An induction object with induction value type remove_cv_t<remove_reference_t<T>>, initial value var, and (if specified) stride stride. If T is an lvalue reference to non-const type, then the object referenced by var becomes the live-out object for the induction object; otherwise there is no live-out object.
template<class I, class... Rest>
  void for_loop(no_deduce_t<I> start, I finish, Rest&&... rest);
template<class ExecutionPolicy, class I, class... Rest>
  void for_loop(ExecutionPolicy&& exec, no_deduce_t<I> start, I finish, Rest&&... rest);

template<class I, class S, class... Rest>
  void for_loop_strided(no_deduce_t<I> start, I finish, S stride, Rest&&... rest);
template<class ExecutionPolicy, class I, class S, class... Rest>
  void for_loop_strided(ExecutionPolicy&& exec, no_deduce_t<I> start, I finish, S stride, Rest&&... rest);

template<class I, class Size, class... Rest>
  void for_loop_n(I start, Size n, Rest&&... rest);
template<class ExecutionPolicy, class I, class Size, class... Rest>
  void for_loop_n(ExecutionPolicy&& exec, I start, Size n, Rest&&... rest);

template<class I, class Size, class S, class... Rest>
  void for_loop_n_strided(I start, Size n, S stride, Rest&&... rest);
template<class ExecutionPolicy, class I, class Size, class S, class... Rest>
  void for_loop_n_strided(ExecutionPolicy&& exec, I start, Size n, S stride, Rest&&... rest);
Requires: For the overloads with an ExecutionPolicy, I shall be an integral type or meet the requirements of a forward iterator type; otherwise, I shall be an integral type or meet the requirements of an input iterator type. Size shall be an integral type and n shall be non-negative. S shall have integral type and stride shall have non-zero value. stride shall be negative only if I has integral type or meets the requirements of a bidirectional iterator. The rest parameter pack shall have at least one element, comprising objects returned by invocations of the reduction ([parallel.alg.reduction]) and/or induction ([parallel.alg.induction]) function templates followed by exactly one invocable element-access function, f. For the overloads with an ExecutionPolicy, f shall meet the requirements of CopyConstructible; otherwise, f shall meet the requirements of MoveConstructible.
Effects: Applies f to each element in the input sequence, as described below, with additional arguments corresponding to the reductions and inductions in the rest parameter pack. The length of the input sequence is:

- n, if specified,
- otherwise finish - start if neither n nor stride is specified,
- otherwise 1 + (finish-start-1)/stride if stride is positive,
- otherwise 1 + (start-finish-1)/-stride.

The first element in the input sequence is start. Each subsequent element is generated by adding stride to the previous element, if stride is specified, otherwise by incrementing the previous element. [ Note: If I is an iterator type, the arithmetic operations on elements of the input sequence are performed as if by advance and distance. — end note ]
[ Note: If I is an iterator type, the iterators in the input sequence are not dereferenced before being passed to f. — end note ]

For each member of the rest parameter pack excluding f, an additional argument is passed to each application of f as follows:

- If the pack member is an object returned by a call to reduction, then the additional argument is a reference to an accumulator of that reduction object.
- If the pack member is an object returned by a call to induction, then the additional argument is the induction value for that induction object corresponding to the position of the application of f in the input sequence.
template<class F>
  auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)());

Effects: Evaluates std::forward<F>(f)(). When invoked within an element access function in a parallel algorithm using vector_policy, if two calls to no_vec are horizontally matched within a wavefront application of an element access function over input sequence S, then the execution of f in the application for one element in S is sequenced before the execution of f in the application for a subsequent element in S; otherwise, there is no effect on sequencing.

Returns: The result of f.

[ Note: If f exits via an exception, then terminate will be called, consistent with all other potentially-throwing operations invoked with vector_policy execution. — end note ]
[ Example:

extern int* p;
for_loop(vec, 0, n,
  [&](int i) {
    y[i] += y[i+1];
    if (y[i] < 0) {
      no_vec([&]{ *p++ = i; });
    }
  });

The updates *p++ = i will occur in the same order as if the policy were seq.

— end example ]
template<class T>
class ordered_update_t {
  T& ref_;  // exposition only
public:
  ordered_update_t(T& loc) noexcept : ref_(loc) {}
  ordered_update_t(const ordered_update_t&) = delete;
  ordered_update_t& operator=(const ordered_update_t&) = delete;

  template<class U>
    auto operator=(U rhs) const noexcept { return no_vec([&]{ return ref_ = std::move(rhs); }); }
  template<class U>
    auto operator+=(U rhs) const noexcept { return no_vec([&]{ return ref_ += std::move(rhs); }); }
  template<class U>
    auto operator-=(U rhs) const noexcept { return no_vec([&]{ return ref_ -= std::move(rhs); }); }
  template<class U>
    auto operator*=(U rhs) const noexcept { return no_vec([&]{ return ref_ *= std::move(rhs); }); }
  template<class U>
    auto operator/=(U rhs) const noexcept { return no_vec([&]{ return ref_ /= std::move(rhs); }); }
  template<class U>
    auto operator%=(U rhs) const noexcept { return no_vec([&]{ return ref_ %= std::move(rhs); }); }
  template<class U>
    auto operator>>=(U rhs) const noexcept { return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
  template<class U>
    auto operator<<=(U rhs) const noexcept { return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
  template<class U>
    auto operator&=(U rhs) const noexcept { return no_vec([&]{ return ref_ &= std::move(rhs); }); }
  template<class U>
    auto operator^=(U rhs) const noexcept { return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
  template<class U>
    auto operator|=(U rhs) const noexcept { return no_vec([&]{ return ref_ |= std::move(rhs); }); }

  auto operator++() const noexcept { return no_vec([&]{ return ++ref_; }); }
  auto operator++(int) const noexcept { return no_vec([&]{ return ref_++; }); }
  auto operator--() const noexcept { return no_vec([&]{ return --ref_; }); }
  auto operator--(int) const noexcept { return no_vec([&]{ return ref_--; }); }
};
An object of type ordered_update_t<T>
is a proxy for an object of type T
intended to be used within a parallel application of an element access function using a
policy object of type vector_policy
. Simple increments, assignments, and compound
assignments to the object are forwarded to the proxied object, but are sequenced as though
executed within a no_vec
invocation.
template<class T>
  ordered_update_t<T> ordered_update(T& loc) noexcept;

Returns: ordered_update_t<T>{ loc }.
<experimental/task_block> synopsis

namespace std::experimental {
inline namespace parallelism_v2 {
  class task_cancelled_exception;

  class task_block;

  template<class F>
    void define_task_block(F&& f);

  template<class F>
    void define_task_block_restore_thread(F&& f);
}
}
Class task_cancelled_exception

namespace std::experimental {
inline namespace parallelism_v2 {
  class task_cancelled_exception : public exception {
  public:
    task_cancelled_exception() noexcept;
    virtual const char* what() const noexcept override;
  };
}
}
The class task_cancelled_exception defines the type of objects thrown by task_block::run or task_block::wait if they detect that an exception is pending within the current parallel block (see the exception handling rules below).

task_cancelled_exception member function what

virtual const char* what() const noexcept;

Returns: An implementation-defined NTBS.
Class task_block

namespace std::experimental {
inline namespace parallelism_v2 {
  class task_block {
  private:
    ~task_block();

  public:
    task_block(const task_block&) = delete;
    task_block& operator=(const task_block&) = delete;
    void operator&() const = delete;

    template<class F>
      void run(F&& f);

    void wait();
  };
}
}
The class task_block
defines an interface for forking and joining parallel tasks. The define_task_block
and define_task_block_restore_thread
function templates create an object of type task_block
and pass a reference to that object to a user-provided function object.
An object of class task_block
cannot be constructed,
destroyed, copied, or moved except by the implementation of the task
block library. Taking the address of a task_block
object via operator&
is ill-formed. Obtaining its address by any other means (including addressof
) results in a pointer with an unspecified value; dereferencing such a pointer results in undefined behavior.
A task_block is active if it was created by the nearest enclosing task block, where "task block" refers to an invocation of define_task_block or define_task_block_restore_thread and "nearest enclosing" means the most recent invocation that has not yet completed. Code designated for execution in another thread by means other than the facilities in this section (e.g., using thread or async) is not enclosed in the task block, and a task_block passed to (or captured by) such code is not active within that code. Performing any operation on a task_block that is not active results in undefined behavior.
When the argument to task_block::run
is called, no task_block
is active, not even the task_block
on which run
was called.
(The function object should not, therefore, capture a task_block
from the surrounding block.)
[ Example:

define_task_block([&](auto& tb) {
  tb.run([&]{
    tb.run([] { f(); });               // Error: tb is not active within run
    define_task_block([&](auto& tb2) { // Define new task block
      tb2.run(f);
      ...
    });
  });
  ...
});

— end example ]
task_block member function template run

template<class F> void run(F&& f);

Requires: F shall be MoveConstructible. DECAY_COPY(std::forward<F>(f))() shall be a valid expression. *this shall be the active task_block.

Effects: Evaluates DECAY_COPY(std::forward<F>(f))(), where DECAY_COPY(std::forward<F>(f)) is evaluated synchronously within the current thread. The call to the resulting copy of the function object is permitted to run on an unspecified thread created by the implementation in an unordered fashion relative to the sequence of operations following the call to run(f) (the continuation), or indeterminately sequenced within the same thread as the continuation. The call to run synchronizes with the call to the function object. The completion of the call to the function object synchronizes with the next invocation of wait on the same task_block or completion of the nearest enclosing task block (i.e., the define_task_block or define_task_block_restore_thread that created this task_block).

Throws: task_cancelled_exception, as described in the exception handling rules below.

The run function may return on a thread other than the one on which it was called; in such cases, completion of the call to run synchronizes with the continuation. [ Note: When the continuation runs on the same thread, run is ordered similarly to an ordinary function call in a single thread. — end note ]

[ Note: The invocation of the user-supplied function object f may be immediate or may be delayed until compute resources are available. run might or might not return before the invocation of f completes. — end note ]
task_block member function wait

void wait();

Requires: *this shall be the active task_block.

Effects: Blocks until the tasks spawned using this task_block have completed.

Throws: task_cancelled_exception, as described in the exception handling rules below.

The wait function may return on a thread other than the one on which it was called; in such cases, completion of the call to wait synchronizes with subsequent operations.
[ Example:

define_task_block([&](auto& tb) {
  tb.run([&]{ process(a, w, x); }); // Process a[w] through a[x]
  if (y < x) tb.wait();             // Wait if overlap between [w,x) and [y,z)
  process(a, y, z);                 // Process a[y] through a[z]
});

— end example ]
Function template define_task_block

template<class F>
  void define_task_block(F&& f);
template<class F>
  void define_task_block_restore_thread(F&& f);

Requires: Given an lvalue tb of type task_block, the expression f(tb) shall be well-formed.

Effects: Constructs a task_block tb and calls f(tb).

Throws: exception_list, as specified in the exception handling rules below.

Postconditions: All tasks spawned from f have finished execution.
The define_task_block function may return on a thread other than the one on which it was called unless there are no task blocks active on entry to define_task_block. When define_task_block returns on a different thread, it synchronizes with operations following the call. The define_task_block_restore_thread function always returns on the same thread as the one on which it was called.

[ Note: It is expected (but not mandated) that f will (directly or indirectly) call tb.run(function-object). — end note ]
Every task_block
has an associated exception list. When the task block starts, its associated exception list is empty.
When an exception is thrown from the user-provided function object passed to define_task_block
or
define_task_block_restore_thread
, it is added to the exception list for that task block. Similarly, when
an exception is thrown from the user-provided function object passed into task_block::run
, the exception
object is added to the exception list associated with the nearest enclosing task block. In both cases, an
implementation may discard any pending tasks that have not yet been invoked. Tasks that are already in
progress are not interrupted except at a call to task_block::run
or task_block::wait
as described below.
If the implementation is able to detect that an exception has been thrown by another task within the same nearest enclosing task block, then task_block::run or task_block::wait may throw task_cancelled_exception; these instances of task_cancelled_exception are not added to the exception list of the corresponding task block.
When a task block finishes with a non-empty exception list, the exceptions are aggregated into an exception_list
object, which is then thrown from the task block.
The order of the exceptions in the exception_list
object is unspecified.
The data-parallel library consists of data-parallel types and operations on these types. A data-parallel type consists of elements of an underlying arithmetic type, called the element type. The number of elements is a constant for each data-parallel type and called the width of that type.
Throughout this Clause, the term data-parallel type refers to all supported simd
and simd_mask
class templates. A data-parallel object is an object of data-parallel type.
An element-wise operation applies a specified operation to the elements of one or more data-parallel objects. Each such application is unsequenced with respect to the others. A unary element-wise operation is an element-wise operation that applies a unary operation to each element of a data-parallel object. A binary element-wise operation is an element-wise operation that applies a binary operation to corresponding elements of two data-parallel objects.
Throughout this Clause, the set of vectorizable types for a data-parallel type comprises all cv-unqualified arithmetic types other than bool
.
<experimental/simd>
synopsisnamespace std::experimental { inline namespace parallelism_v2 { namespace simd_abi { struct scalar {}; template<int N> struct fixed_size {}; template<class T> inline constexpr int max_fixed_size = implementation-defined; template<class T> using compatible = implementation-defined; template<class T> using native = implementation-defined; template<class T, size_t N> struct deduce { using type = see below; }; template<class T, size_t N> using deduce_t = typename deduce<T, N>::type; } struct element_aligned_tag {}; struct vector_aligned_tag {}; template<size_t> struct overaligned_tag {}; inline constexpr element_aligned_tag element_aligned{}; inline constexpr vector_aligned_tag vector_aligned{}; template<size_t N> inline constexpr overaligned_tag<N> overaligned{};// 9.2.2, simd type traits template<class T> struct is_abi_tag; template<class T> inline constexpr bool is_abi_tag_v = is_abi_tag<T>::value; template<class T> struct is_simd; template<class T> inline constexpr bool is_simd_v = is_simd<T>::value; template<class T> struct is_simd_mask; template<class T> inline constexpr bool is_simd_mask_v = is_simd_mask<T>::value; template<class T> struct is_simd_flag_type; template<class T> inline constexpr bool is_simd_flag_type_v = is_simd_flag_type<T>::value; template<class T, class Abi = simd_abi::compatible<T>> struct simd_size; template<class T, class Abi = simd_abi::compatible<T>> inline constexpr size_t simd_size_v = simd_size<T,Abi>::value; template<class T, class U = typename T::value_type> struct memory_alignment; template<class T, class U = typename T::value_type> inline constexpr size_t memory_alignment_v = memory_alignment<T,U>::value;// 9.3, Class template simd template<class T, class Abi = simd_abi::compatible<T>> class simd; template<class T> using native_simd = simd<T, simd_abi::native<T>>; template<class T, int N> using fixed_size_simd = simd<T, simd_abi::fixed_size<N>>;// 9.5, Class template simd_mask template<class T, class Abi = 
simd_abi::compatible<T>> class simd_mask; template<class T> using native_simd_mask = simd_mask<T, simd_abi::native<T>>; template<class T, int N> using fixed_size_simd_mask = simd_mask<T, simd_abi::fixed_size<N>>;// 9.4.5, Casts template<class T, class U, class Abi> see below simd_cast(const simd<U, Abi>&); template<class T, class U, class Abi> see below static_simd_cast(const simd<U, Abi>&); template<class T, class Abi> fixed_size_simd<T, simd_size_v<T, Abi>> to_fixed_size(const simd<T, Abi>&) noexcept; template<class T, class Abi> fixed_size_simd_mask<T, simd_size_v<T, Abi>> to_fixed_size(const simd_mask<T, Abi>&) noexcept; template<class T, int N> native_simd<T> to_native(const fixed_size_simd<T, N>&) noexcept; template<class T, int N> native_simd_mask<T> to_native(const fixed_size_simd_mask<T, N>&) noexcept; template<class T, int N> simd<T> to_compatible(const fixed_size_simd<T, N>&) noexcept; template<class T, int N> simd_mask<T> to_compatible(const fixed_size_simd_mask<T, N>&) noexcept; template<size_t... Sizes, class T, class Abi> tuple<simd<T, simd_abi::deduce_t<T, Sizes>>...> split(const simd<T, Abi>&); template<size_t... Sizes, class T, class Abi> tuple<simd_mask<T, simd_mask_abi::deduce_t<T, Sizes>>...> split(const simd_mask<T, Abi>&); template<class V, class Abi> array<V, simd_size_v<typename V::value_type, Abi> / V::size()> split(const simd<typename V::value_type, Abi>&); template<class V, class Abi> array<V, simd_size_v<typename V::value_type, Abi> / V::size()> split(const simd_mask<typename V::value_type, Abi>&); template<class T, class... Abis> simd<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd<T, Abis>&...); template<class T, class... 
Abis> simd_mask<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd_mask<T, Abis>&...);// 9.6.4, Reductions template<class T, class Abi> bool all_of(const simd_mask<T, Abi>&) noexcept; template<class T, class Abi> bool any_of(const simd_mask<T, Abi>&) noexcept; template<class T, class Abi> bool none_of(const simd_mask<T, Abi>&) noexcept; template<class T, class Abi> bool some_of(const simd_mask<T, Abi>&) noexcept; template<class T, class Abi> int popcount(const simd_mask<T, Abi>&) noexcept; template<class T, class Abi> int find_first_set(const simd_mask<T, Abi>&); template<class T, class Abi> int find_last_set(const simd_mask<T, Abi>&); bool all_of(see below) noexcept; bool any_of(see below) noexcept; bool none_of(see below) noexcept; bool some_of(see below) noexcept; int popcount(see below) noexcept; int find_first_set(see below) noexcept; int find_last_set(see below) noexcept;// 9.2.3, Class templates const_where_expression and where_expression template<class M, class T> class const_where_expression; template<class M, class T> class where_expression;// 9.6.5, Where functions template<class T> struct nodeduce { using type = T; }; // exposition only template<class T> using nodeduce_t = typename nodeduce<T>::type; // exposition only template<class T, class Abi> where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type&, simd<T, Abi>&) noexcept; template<class T, class Abi> const_where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type&, const simd<T, Abi>&) noexcept; template<class T, class Abi> where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abit>>&, simd_mask<T, Abi>&) noexcept; template<class T, class Abi> const_where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abit>>&, const simd_mask<T, Abi>&) noexcept; template<class T> where_expression<bool, T> where(see below k, T& d) noexcept; 
template<class T> const_where_expression<bool, T> where(see below k, const T& d) noexcept;// 9.4.4, Reductions template<class T, class Abi, class BinaryOperation = plus<>> T reduce(const simd<T, Abi>&, BinaryOperation = {}); template<class M, class V, class BinaryOperation> typename V::value_type reduce(const const_where_expression<M, V>& x, typename V::value_type identity_element, BinaryOperation binary_op); template<class M, class V> typename V::value_type reduce(const const_where_expression<M, V>& x, plus<> binary_op = {}); template<class M, class V> typename V::value_type reduce(const const_where_expression<M, V>& x, multiplies<> binary_op); template<class M, class V> typename V::value_type reduce(const const_where_expression<M, V>& x, bit_and<> binary_op); template<class M, class V> typename V::value_type reduce(const const_where_expression<M, V>& x, bit_or<> binary_op); template<class M, class V> typename V::value_type reduce(const const_where_expression<M, V>& x, bit_xor<> binary_op); template<class T, class Abi> T hmin(const simd<T, Abi>&); template<class M, class V> typename V::value_type hmin(const const_where_expression<M, V>&); template<class T, class Abi> T hmax(const simd<T, Abi>&); template<class M, class V> typename V::value_type hmax(const const_where_expression<M, V>&);// 9.4.6, Algorithms template<class T, class Abi> simd<T, Abi> min(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept; template<class T, class Abi> simd<T, Abi> max(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept; template<class T, class Abi> pair<simd<T, Abi>, simd<T, Abi>> minmax(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept; template<class T, class Abi> simd<T, Abi> clamp(const simd<T, Abi>& v, const simd<T, Abi>& lo, const simd<T, Abi>& hi) noexcept; } }
The header <experimental/simd>
defines class templates, tag types, trait types, and function templates for element-wise operations on data-parallel objects.
simd ABI tags

namespace simd_abi { struct scalar {}; template<int N> struct fixed_size {}; template<class T> inline constexpr int max_fixed_size = implementation-defined; template<class T> using compatible = implementation-defined; template<class T> using native = implementation-defined; }
An ABI tag is a type in the std::experimental::parallelism_v2::simd_abi namespace that indicates a choice of size and binary representation for objects of data-parallel type. It is used as the second template argument to simd and simd_mask.

Use of the scalar tag type requires data-parallel types to store a single element (i.e., simd<T, simd_abi::scalar>::size() returns 1). [ Note: scalar is not an alias for fixed_size<1>. — end note ]
The value of max_fixed_size<T>
is at least 32.
Use of the simd_abi::fixed_size<N> tag type requires data-parallel types to store N elements (i.e., simd<T, simd_abi::fixed_size<N>>::size() is N). simd<T, fixed_size<N>> and simd_mask<T, fixed_size<N>> with N > 0 and N <= max_fixed_size<T> shall be supported. Additionally, for every supported simd<T, Abi> (see below), where Abi is an ABI tag that is not a specialization of simd_abi::fixed_size, N == simd<T, Abi>::size() shall be supported.
[ Note: It is unspecified whether simd<T, fixed_size<N>> with N > max_fixed_size<T> is supported. The value of max_fixed_size<T> can depend on compiler flags and can change between different compiler versions. — end note ]
[ Note: The fixed_size<N> tag type is intended to support passing simd and simd_mask specializations using the same simd_abi::fixed_size<N> tag between translation units. Otherwise, the efficiency of simd<T, Abi> is likely to be better than for simd<T, fixed_size<simd_size_v<T, Abi>>> (with Abi not a specialization of simd_abi::fixed_size). — end note ]
An implementation may define additional extended ABI tag types in the std::experimental::parallelism_v2::simd_abi
namespace, to support other forms of data-parallel computation.
compatible<T> is an implementation-defined alias for an ABI tag. [ Note: The intent is to use the most efficient data-parallel execution for T that ensures ABI compatibility between translation units on the target architecture. — end note ]
[ Example: Consider a target architecture supporting the extended ABI tags __simd128
and __simd256
, where the __simd256
type requires an optional ISA extension on said architecture. Also, the target architecture does not support long double
with either ABI tag. The implementation therefore defines
compatible<T> as an alias for __simd128 for all vectorizable T except long double, and

compatible<long double> as an alias for scalar.

— end example ]
native<T> is an implementation-defined alias for an ABI tag. [ Note: The intent is to use the ABI tag that is most efficient for T on the currently targeted system. For target architectures without ISA extensions, the native<T> and compatible<T> aliases will likely be the same. For target architectures with ISA extensions, compiler flags may influence the native<T> alias while compatible<T> will be the same independent of such flags. — end note ]
[ Example: Consider a target architecture supporting the extended ABI tags __simd128
and __simd256
, where hardware support for __simd256
only exists for floating-point types. The implementation therefore defines native<T>
as an alias for

__simd256 if T is a floating-point type, and

__simd128 otherwise.

— end example ]
template<class T, size_t N> struct deduce { using type = see below; };
The member type shall be present if and only if

T is a vectorizable type, and

simd_abi::fixed_size<N> is supported (see above).
Where present, the member typedef type shall name an ABI tag type that satisfies

simd_size_v<T, type> == N, and

simd<T, type> is default constructible (see below).

If N is 1, the member typedef type is simd_abi::scalar. Otherwise, if there are multiple ABI tag types that satisfy the constraints, the member typedef type is implementation-defined. [ Note: It is not necessarily simd_abi::fixed_size<N>. — end note ]
The behavior of a program that adds specializations for deduce
is undefined.
simd type traits

template<class T> struct is_abi_tag { see below };
The type is_abi_tag<T>
is a UnaryTypeTrait
with a BaseCharacteristic
of true_type
if T
is a standard or extended ABI tag, and false_type
otherwise.
The behavior of a program that adds specializations for is_abi_tag
is undefined.
template<class T> struct is_simd { see below };
The type is_simd<T>
is a UnaryTypeTrait
with a BaseCharacteristic
of true_type
if T
is a specialization of the simd
class template, and false_type
otherwise.
The behavior of a program that adds specializations for is_simd
is undefined.
template<class T> struct is_simd_mask { see below };
The type is_simd_mask<T>
is a UnaryTypeTrait
with a BaseCharacteristic
of true_type
if T
is a specialization of the simd_mask
class template, and false_type
otherwise.
The behavior of a program that adds specializations for is_simd_mask
is undefined.
template<class T> struct is_simd_flag_type { see below };
The type is_simd_flag_type<T>
is a UnaryTypeTrait
with a BaseCharacteristic
of true_type
if T
is one of
element_aligned_tag
, or
vector_aligned_tag
, or
overaligned_tag<N>
with N > 0
and N
an integral power of two, and false_type otherwise.
The behavior of a program that adds specializations for is_simd_flag_type
is undefined.
template<class T, class Abi = simd_abi::compatible<T>> struct simd_size { see below };
simd_size<T, Abi>
shall have a member value
if and only if
T
is a vectorizable type, and
is_abi_tag_v<Abi>
is true
.
If value
is present, the type simd_size<T, Abi>
is a BinaryTypeTrait
with a BaseCharacteristic
of integral_constant<size_t, N>
with N
equal to the number of elements in a simd<T, Abi>
object. [ Note: If simd<T, Abi> is not supported for the currently targeted system, simd_size<T, Abi>::value produces the value simd<T, Abi>::size() would return if it were supported. — end note ]
The behavior of a program that adds specializations for simd_size
is undefined.
template<class T, class U = typename T::value_type> struct memory_alignment { see below };
memory_alignment<T, U>
shall have a member value
if and only if
is_simd_mask_v<T>
is true
and U
is bool
, or
is_simd_v<T>
is true
and U
is a vectorizable type.
If value
is present, the type memory_alignment<T, U>
is a BinaryTypeTrait
with a BaseCharacteristic
of integral_constant<size_t, N>
for some implementation-defined N
. [ Note: value identifies the alignment restrictions on pointers used for (converting) loads and stores for the given type T on arrays of type U. — end note ]
The behavior of a program that adds specializations for memory_alignment
is undefined.
Class templates const_where_expression and where_expression
template<class M, class T> class const_where_expression { const M mask; // exposition only T& data; // exposition only public: const_where_expression(const const_where_expression&) = delete; const_where_expression& operator=(const const_where_expression&) = delete; T operator-() const &&; T operator+() const &&; T operator~() const &&; template<class U, class Flags> void copy_to(U* mem, Flags f) const &&; }; template<class M, class T> class where_expression : public const_where_expression<M, T> { public: template<class U> void operator=(U&& x) &&; template<class U> void operator+=(U&& x) &&; template<class U> void operator-=(U&& x) &&; template<class U> void operator*=(U&& x) &&; template<class U> void operator/=(U&& x) &&; template<class U> void operator%=(U&& x) &&; template<class U> void operator&=(U&& x) &&; template<class U> void operator|=(U&& x) &&; template<class U> void operator^=(U&& x) &&; template<class U> void operator<<=(U&& x) &&; template<class U> void operator>>=(U&& x) &&; void operator++() &&; void operator++(int) &&; void operator--() &&; void operator--(int) &&; template<class U, class Flags> void copy_from(const U* mem, Flags) &&; };
The class templates const_where_expression
and where_expression
abstract the notion of selecting elements of a given object of arithmetic or data-parallel type.
The first template argument M
shall be cv-unqualified bool
or a cv-unqualified simd_mask
specialization.
If M
is bool
, T
shall be a cv-unqualified arithmetic type. Otherwise, T
shall either be M
or typename M::simd_type
.
In this subclause, if M is bool, data[0] is used interchangeably for data, mask[0] is used interchangeably for mask, and M::size() is used interchangeably for 1.
The selected indices signify the integers i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}. The selected elements signify the elements data[i]
for all selected indices i
.
In this subclause, the type value_type
is an alias for T
if M
is bool
, or an alias for typename T::value_type
if is_simd_mask_v<M>
is true
.
where functions

[ Note: The where functions initialize mask with the first argument to where, and data with the second argument to where. — end note ]
T operator-() const &&;
T operator+() const &&;
T operator~() const &&;
Returns: A copy of data with the indicated unary operator applied to all selected elements.
template<class U, class Flags> void copy_to(U* mem, Flags) const &&;
Requires: If the template parameter Flags
is vector_aligned_tag
, mem
shall point to storage aligned by memory_alignment_v<T, U>
. If the template parameter Flags
is overaligned_tag<N>
, mem
shall point to storage aligned by N
. If the template parameter Flags
is element_aligned_tag
, mem
shall point to storage aligned by alignof(U)
. If M
is not bool
, the largest i
∊ [0, M::size())
where mask[i]
is true
shall be less than the number of values pointed to by mem
.
Effects: mem[i] = static_cast<U>(data[i])
for all selected indices i
.
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags>
is true
, and
U
is bool
and value_type
is bool
, or
U
is a vectorizable type and value_type
is not bool
.
template<class U> void operator=(U&& x) &&;
Effects: Replaces data[i]
with static_cast<T>(std::forward<U>(x))[i]
for all selected indices i
.
U
is convertible to T
.
template<class U> void operator+=(U&& x) &&;
template<class U> void operator-=(U&& x) &&;
template<class U> void operator*=(U&& x) &&;
template<class U> void operator/=(U&& x) &&;
template<class U> void operator%=(U&& x) &&;
template<class U> void operator&=(U&& x) &&;
template<class U> void operator|=(U&& x) &&;
template<class U> void operator^=(U&& x) &&;
template<class U> void operator<<=(U&& x) &&;
template<class U> void operator>>=(U&& x) &&;
Effects: Replaces data[i]
with static_cast<T>(data @ std::forward<U>(x))[i]
(where @
denotes the indicated operator) for all selected indices i
.
data @ std::forward<U>(x)
is convertible to T
.
It is unspecified whether the binary operator, implied by the compound
assignment operator, is executed on all elements or only on the selected
elements.
void operator++() &&;
void operator++(int) &&;
void operator--() &&;
void operator--(int) &&;
T
.
template<class U, class Flags> void copy_from(const U* mem, Flags) &&;
Requires: If the template parameter Flags
is vector_aligned_tag
, mem
shall point to storage aligned by memory_alignment_v<T, U>
. If the template parameter Flags
is overaligned_tag<N>
, mem
shall point to storage aligned by N
. If the template parameter Flags
is element_aligned_tag
, mem
shall point to storage aligned by alignof(U)
. For all selected indices i, i shall be less than the number of values pointed to by mem.
Effects: data[i] = static_cast<value_type>(mem[i])
for all selected indices i
.
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags>
is true
, and
U
is bool
and value_type
is bool
, or
U
is a vectorizable type and value_type
is not bool
.
Class template simd

simd overview

template<class T, class Abi> class simd { public: using value_type = T; using reference = see below; using mask_type = simd_mask<T, Abi>; using abi_type = Abi; static constexpr size_t size() noexcept; simd() = default; // implicit conversion constructor template<class U> simd(const simd<U, simd_abi::fixed_size<size()>>&); // implicit broadcast constructor (see below for constraints) template<class U> simd(U&& value); // generator constructor (see below for constraints) template<class G> explicit simd(G&& gen); // load constructor template<class U, class Flags> simd(const U* mem, Flags f);// 9.3.4, Copy functions template<class U, class Flags> void copy_from(const U* mem, Flags f); template<class U, class Flags> void copy_to(U* mem, Flags f);// 9.3.5, Subscript operators reference operator[](size_t); value_type operator[](size_t) const;// 9.3.6, Unary operators simd& operator++(); simd operator++(int); simd& operator--(); simd operator--(int); mask_type operator!() const; simd operator~() const; simd operator+() const; simd operator-() const;// 9.4.1, Binary operators friend simd operator+(const simd&, const simd&); friend simd operator-(const simd&, const simd&); friend simd operator*(const simd&, const simd&); friend simd operator/(const simd&, const simd&); friend simd operator%(const simd&, const simd&); friend simd operator&(const simd&, const simd&); friend simd operator|(const simd&, const simd&); friend simd operator^(const simd&, const simd&); friend simd operator<<(const simd&, const simd&); friend simd operator>>(const simd&, const simd&); friend simd operator<<(const simd&, int); friend simd operator>>(const simd&, int);// 9.4.2, Compound assignment friend simd& operator+=(simd&, const simd&); friend simd& operator-=(simd&, const simd&); friend simd& operator*=(simd&, const simd&); friend simd& operator/=(simd&, const simd&); friend simd& operator%=(simd&, const simd&); friend simd& operator&=(simd&, const simd&); friend simd& operator|=(simd&, const
simd&); friend simd& operator^=(simd&, const simd&); friend simd& operator<<=(simd&, const simd&); friend simd& operator>>=(simd&, const simd&); friend simd& operator<<=(simd&, int); friend simd& operator>>=(simd&, int);// 9.4.3, Compare operators friend mask_type operator==(const simd&, const simd&); friend mask_type operator!=(const simd&, const simd&); friend mask_type operator>=(const simd&, const simd&); friend mask_type operator<=(const simd&, const simd&); friend mask_type operator>(const simd&, const simd&); friend mask_type operator<(const simd&, const simd&); };
The class template simd
is a data-parallel type. The width of a given simd
specialization is a constant expression, determined by the template parameters.
Every specialization of simd shall be a complete type. The specialization simd<T, Abi> is supported if T is a vectorizable type and

Abi is simd_abi::scalar, or

Abi is simd_abi::fixed_size<N>, with N constrained as defined above.

If Abi is an extended ABI tag, it is implementation-defined whether simd<T, Abi> is supported. If simd<T, Abi> is not supported, the specialization shall have a deleted default constructor, deleted destructor, deleted copy constructor, and deleted copy assignment.
[ Example: Consider an implementation that supports the extended ABI tags __simd_x and __gpu_y. When the compiler is invoked to translate to a machine that has support for the __simd_x ABI tag for all arithmetic types other than long double and no support for the __gpu_y ABI tag, then:

simd<T, simd_abi::__gpu_y> is not supported for any T and has a deleted constructor.

simd<long double, simd_abi::__simd_x> is not supported and has a deleted constructor.

simd<double, simd_abi::__simd_x> is supported.

simd<long double, simd_abi::scalar> is supported.

— end example ]
Default initialization performs no initialization of the elements; value-initialization initializes each element with T()
.
static constexpr size_t size() noexcept;
Returns: The width of simd<T, Abi>.
Implementations should enable explicit conversion from and to
implementation-defined types. This adds one or more of the following
declarations to class simd
:
explicit operator implementation-defined() const;
explicit simd(const implementation-defined& init);
[ Example:
Consider an implementation that supports the type __vec4f
and the function __vec4f _vec4f_addsub(__vec4f, __vec4f)
for the currently targeted system.
A user may require the use of _vec4f_addsub
for maximum performance and thus writes:
using V = simd<float, simd_abi::__simd128>; V addsub(V a, V b) { return static_cast<V>(_vec4f_addsub(static_cast<__vec4f>(a), static_cast<__vec4f>(b))); }— end example ]
A reference
is an object that refers to an element in a simd
or simd_mask
object. reference::value_type
is the same type as simd::value_type
or simd_mask::value_type
, respectively.
Class reference
is for exposition only. An
implementation is permitted to provide equivalent functionality without
providing a class with this name.
class reference // exposition only { public: reference() = delete; reference(const reference&) = delete; operator value_type() const noexcept; template<class U> reference operator=(U&& x) &&; template<class U> reference operator+=(U&& x) &&; template<class U> reference operator-=(U&& x) &&; template<class U> reference operator*=(U&& x) &&; template<class U> reference operator/=(U&& x) &&; template<class U> reference operator%=(U&& x) &&; template<class U> reference operator|=(U&& x) &&; template<class U> reference operator&=(U&& x) &&; template<class U> reference operator^=(U&& x) &&; template<class U> reference operator<<=(U&& x) &&; template<class U> reference operator>>=(U&& x) &&; reference operator++() &&; value_type operator++(int) &&; reference operator--() &&; value_type operator--(int) &&; friend void swap(reference&& a, reference&& b) noexcept; friend void swap(value_type& a, reference&& b) noexcept; friend void swap(reference&& a, value_type& b) noexcept; };
operator value_type() const noexcept;
Returns: The value of the element referred to by *this
.
template<class U> reference operator=(U&& x) &&;
Effects: Replaces the referenced element of the simd
or simd_mask
with static_cast<value_type>(std::forward<U>(x))
.
Returns: *this
.
Remarks: This operator shall not participate in overload resolution unless declval<value_type&>() = std::forward<U>(x) is well-formed.
template<class U> reference operator+=(U&& x) &&;
template<class U> reference operator-=(U&& x) &&;
template<class U> reference operator*=(U&& x) &&;
template<class U> reference operator/=(U&& x) &&;
template<class U> reference operator%=(U&& x) &&;
template<class U> reference operator|=(U&& x) &&;
template<class U> reference operator&=(U&& x) &&;
template<class U> reference operator^=(U&& x) &&;
template<class U> reference operator<<=(U&& x) &&;
template<class U> reference operator>>=(U&& x) &&;
Effects: Applies the indicated compound operator to the referenced element of the simd
or simd_mask
and std::forward<U>(x)
.
Returns: *this
.
Remarks: This operator shall not participate in overload resolution unless declval<value_type&>() @= std::forward<U>(x)
(where @=
denotes the indicated compound assignment operator) is well-formed.
reference operator++() &&;
reference operator--() &&;
simd
or simd_mask
.
*this
.
value_type
.
value_type operator++(int) &&;
value_type operator--(int) &&;
simd
or simd_mask
.
value_type
.
friend void swap(reference&& a, reference&& b) noexcept;
friend void swap(value_type& a, reference&& b) noexcept;
friend void swap(reference&& a, value_type& b) noexcept;
Effects: Exchanges the values a
and b
refer to.
template<class U> simd(U&&);
Effects: Constructs an object with each element initialized to the value of the argument after conversion to value_type.

Remarks: Let From identify the type remove_cv_t<remove_reference_t<U>>. This constructor shall not participate in overload resolution unless:

From is a vectorizable type and every possible value of From can be represented with type value_type, or

From is not an arithmetic type and is implicitly convertible to value_type, or

From is int, or

From is unsigned int and value_type is an unsigned integral type.
template<class U> simd(const simd<U, simd_abi::fixed_size<size()>>& x);
Effects: Constructs an object where the i-th element equals static_cast<T>(x[i]) for all i ∊ [0, size()).

Remarks: This constructor shall not participate in overload resolution unless:

abi_type is simd_abi::fixed_size<size()>, and

every possible value of U can be represented with type value_type, and,

if U and value_type are integral, the integer conversion rank [conv.rank] of value_type is greater than the integer conversion rank of U.
template<class G> simd(G&& gen);
Effects: Constructs an object where the i-th element is initialized to gen(integral_constant<size_t, i>()).

Remarks: This constructor shall not participate in overload resolution unless simd(gen(integral_constant<size_t, i>())) is well-formed for all i ∊ [0, size()). The calls to gen are unsequenced with respect to each other. Vectorization-unsafe standard library functions may not be invoked by gen ([algorithms.parallel.exec]).
template<class U, class Flags> simd(const U* mem, Flags);
Requires: If the template parameter Flags
is vector_aligned_tag
, mem
shall point to storage aligned by memory_alignment_v<simd, U>
. If the template parameter Flags
is overaligned_tag<N>
, mem
shall point to storage aligned by N
. If the template parameter Flags
is element_aligned_tag
, mem
shall point to storage aligned by alignof(U)
. [mem, mem + size())
is a valid range.
Effects: Constructs an object where the i-th element is initialized to static_cast<T>(mem[i])
for all i
∊ [0, size())
.
Remarks: This constructor shall not participate in overload resolution unless is_simd_flag_type_v<Flags>
is true
, and
U
is a vectorizable type.
template<class U, class Flags> void copy_from(const U* mem, Flags);
Requires: If the template parameter Flags
is vector_aligned_tag
, mem
shall point to storage aligned by memory_alignment_v<simd, U>
. If the template parameter Flags
is overaligned_tag<N>
, mem
shall point to storage aligned by N
. If the template parameter Flags
is element_aligned_tag
, mem
shall point to storage aligned by alignof(U)
. [mem, mem + size())
is a valid range.
Effects: Replaces the elements of the simd
object such that the i-th element is assigned with static_cast<T>(mem[i])
for all i
∊ [0, size())
.
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags>
is true
, and
U
is a vectorizable type.
template<class U, class Flags> void copy_to(U* mem, Flags) const;
Requires: If the template parameter Flags
is vector_aligned_tag
, mem
shall point to storage aligned by memory_alignment_v<simd, U>
. If the template parameter Flags
is overaligned_tag<N>
, mem
shall point to storage aligned by N
. If the template parameter Flags
is element_aligned_tag
, mem
shall point to storage aligned by alignof(U)
. [mem, mem + size())
is a valid range.
Effects: Copies all simd
elements as if mem[i] = static_cast<U>(operator[](i))
for all i
∊ [0, size())
.
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags>
is true
, and
U
is a vectorizable type.
reference operator[](size_t i);
Requires: i < size()
.
Returns: A reference (see above) referring to the i-th element.

value_type operator[](size_t i) const;

Requires: i < size().

Returns: The value of the i-th element.
Effects in this subclause are applied as unary element-wise operations.
simd& operator++();
Returns: *this
.
simd operator++(int);
Returns: A copy of *this
before incrementing.
simd& operator--();
Returns: *this
.
simd operator--(int);
Returns: A copy of *this
before decrementing.
mask_type operator!() const;
Returns: A simd_mask
object with the i-th element set to !operator[](i)
for all i
∊ [0, size())
.
simd operator~() const;
Returns: A simd
object where each bit is the inverse of the corresponding bit in *this
.
Remarks: This operator shall not participate in overload resolution unless T
is an integral type.
simd operator+() const;
Returns: A copy of *this
.
simd operator-() const;
Returns: A simd
object where the i-th element is initialized to -operator[](i)
for all i
∊ [0, size())
.
friend simd operator+(const simd& lhs, const simd& rhs);
friend simd operator-(const simd& lhs, const simd& rhs);
friend simd operator*(const simd& lhs, const simd& rhs);
friend simd operator/(const simd& lhs, const simd& rhs);
friend simd operator%(const simd& lhs, const simd& rhs);
friend simd operator&(const simd& lhs, const simd& rhs);
friend simd operator|(const simd& lhs, const simd& rhs);
friend simd operator^(const simd& lhs, const simd& rhs);
friend simd operator<<(const simd& lhs, const simd& rhs);
friend simd operator>>(const simd& lhs, const simd& rhs);
Returns: A simd
object initialized with the results of the element-wise application of the indicated operator.
Remarks: These operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.
friend simd operator<<(const simd& v, int n);
friend simd operator>>(const simd& v, int n);
Returns: A simd
object where the i-th element is initialized to the result of applying the indicated operator to v[i]
and n
for all i
∊ [0, size())
.
Remarks: These operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.
friend simd& operator+=(simd& lhs, const simd& rhs);
friend simd& operator-=(simd& lhs, const simd& rhs);
friend simd& operator*=(simd& lhs, const simd& rhs);
friend simd& operator/=(simd& lhs, const simd& rhs);
friend simd& operator%=(simd& lhs, const simd& rhs);
friend simd& operator&=(simd& lhs, const simd& rhs);
friend simd& operator|=(simd& lhs, const simd& rhs);
friend simd& operator^=(simd& lhs, const simd& rhs);
friend simd& operator<<=(simd& lhs, const simd& rhs);
friend simd& operator>>=(simd& lhs, const simd& rhs);
friend simd& operator<<=(simd& lhs, int n);
friend simd& operator>>=(simd& lhs, int n);
Returns: lhs.
Remarks: These operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.
friend mask_type operator==(const simd&, const simd&);
friend mask_type operator!=(const simd&, const simd&);
friend mask_type operator>=(const simd&, const simd&);
friend mask_type operator<=(const simd&, const simd&);
friend mask_type operator>(const simd&, const simd&);
friend mask_type operator<(const simd&, const simd&);
Returns: A simd_mask
object initialized with the results of the element-wise application of the indicated operator.
In this subclause, BinaryOperation
shall be a binary element-wise operation.
template<class T, class Abi, class BinaryOperation = plus<>>
T reduce(const simd<T, Abi>& x, BinaryOperation binary_op = {});
Requires: binary_op shall be callable with two arguments of type T returning T, or callable with two arguments of type simd<T, A1> returning simd<T, A1> for every A1 that is an ABI tag type.

Returns: GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ [0, size()).

Throws: Any exception thrown from binary_op.
template<class M, class V, class BinaryOperation>
typename V::value_type reduce(const const_where_expression<M, V>& x, typename V::value_type identity_element,
BinaryOperation binary_op = {});
Requires: binary_op
shall be callable with two arguments of type T
returning T
, or callable with two arguments of type simd<T, A1>
returning simd<T, A1>
for every A1
that is an ABI tag type. The results of binary_op(identity_element, x)
and binary_op(x, identity_element)
shall be equal to x
for all finite values x
representable by V::value_type
.
Returns: If none_of(x.mask), returns identity_element. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ0 ∣ j < M::size() ⋀ mask[j] }.

Throws: Any exception thrown from binary_op.
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, plus<> binary_op);
If none_of(x.mask)
, returns 0
. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...)
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, multiplies<> binary_op);
If none_of(x.mask)
, returns 1
. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...)
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_and<> binary_op);
is_integral_v<V::value_type>
is true
.
If none_of(x.mask)
, returns ~V::value_type()
. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...)
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_or<> binary_op);
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_xor<> binary_op);
is_integral_v<V::value_type>
is true
.
If none_of(x.mask)
, returns 0
. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...)
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class T, class Abi> T hmin(const simd<T, Abi>& x);
Returns: The value of an element x[j]
for which x[j] <= x[i]
for all i
∊ [0, size())
.
template<class M, class V> typename V::value_type hmin(const const_where_expression<M, V>& x);
If none_of(x.mask)
, the return value is numeric_limits<V::value_type>::max()
. Otherwise, returns the value of an element x.data[j]
for which x.mask[j] == true
and x.data[j] <= x.data[i]
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class T, class Abi> T hmax(const simd<T, Abi>& x);
Returns: The value of an element x[j]
for which x[j] >= x[i]
for all i
∊ [0, size())
.
template<class M, class V> typename V::value_type hmax(const const_where_expression<M, V>& x);
If none_of(x.mask)
, the return value is numeric_limits<V::value_type>::lowest()
. Otherwise, returns the value of an element x.data[j]
for which x.mask[j] == true
and x.data[j] >= x.data[i]
for all i
∊ {j ∊ ℕ0 ∣ j < M::size()
⋀ mask[
j]
}.
template<class T, class U, class Abi> see below simd_cast(const simd<U, Abi>& x);
Let To
identify T::value_type
if is_simd_v<T>
is true
, or T
otherwise.
Returns: A simd
object with the i-th element initialized to static_cast<To>(x[i])
for all i
∊ [0, size())
.
Remarks: This function shall not participate in overload resolution unless every possible value of U
can be represented with type To
, and
is_simd_v<T>
is false
, or
T::size() == simd<U, Abi>::size()
is true
.
The return type is
T
if is_simd_v<T>
is true
, otherwise
simd<T, Abi>
if U
is T
, otherwise
simd<T, simd_abi::fixed_size<simd<U, Abi>::size()>>.
template<class T, class U, class Abi> see below static_simd_cast(const simd<U, Abi>& x);
Let To
identify T::value_type
if is_simd_v<T>
is true
or T
otherwise.
Returns: A simd
object with the i-th element initialized to static_cast<To>(x[i])
for all i
∊ [0, size())
.
Remarks: This function shall not participate in overload resolution unless is_simd_v<T>
is false
, or
T::size() == simd<U, Abi>::size()
is true
.
The return type is
T
if is_simd_v<T>
is true
, otherwise
simd<T, Abi>
if either U
is T
or U
and T
are integral types that only differ in signedness, otherwise
simd<T, simd_abi::fixed_size<simd<U, Abi>::size()>>
.
template<class T, class Abi>
fixed_size_simd<T, simd_size_v<T, Abi>> to_fixed_size(const simd<T, Abi>& x) noexcept;
template<class T, class Abi>
fixed_size_simd_mask<T, simd_size_v<T, Abi>> to_fixed_size(const simd_mask<T, Abi>& x) noexcept;
Returns: An object with the i-th element initialized to x[i]
for all i
∊ [0, size())
.
template<class T, int N> native_simd<T> to_native(const fixed_size_simd<T, N>& x) noexcept;
template<class T, int N> native_simd_mask<T> to_native(const fixed_size_simd_mask<T, N>& x) noexcept;
Returns: An object with the i-th element initialized to x[i]
for all i
∊ [0, size())
.
Remarks: These functions shall not participate in overload resolution unless simd_size_v<T, simd_abi::native<T>> == N
is true
.
template<class T, int N> simd<T> to_compatible(const fixed_size_simd<T, N>& x) noexcept;
template<class T, int N> simd_mask<T> to_compatible(const fixed_size_simd_mask<T, N>& x) noexcept;
Returns: An object with the i-th element initialized to x[i]
for all i
∊ [0, size())
.
Remarks: These functions shall not participate in overload resolution unless simd_size_v<T, simd_abi::compatible<T>> == N
is true
.
template<size_t... Sizes, class T, class Abi>
tuple<simd<T, simd_abi::deduce_t<T, Sizes>>...>
split(const simd<T, Abi>& x);
template<size_t... Sizes, class T, class Abi>
tuple<simd_mask<T, simd_abi::deduce_t<T, Sizes>>...>
split(const simd_mask<T, Abi>& x);
Returns: A tuple
of data-parallel objects with the i-th simd
/simd_mask
element of the j-th tuple
element initialized to the value of the element x
with index i + sum of the first j values in the Sizes
pack.
Remarks: These functions shall not participate in overload resolution unless the sum of the Sizes pack is equal to simd_size_v<T, Abi>.
template<class V, class Abi>
array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
split(const simd<typename V::value_type, Abi>& x);
template<class V, class Abi>
array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
split(const simd_mask<typename V::value_type, Abi>& x);
Returns: An array
of data-parallel objects with the i-th simd
/simd_mask
element of the j-th array
element initialized to the value of the element in x
with index i + j * V::size()
.
Remarks: These functions shall not participate in overload resolution unless simd_size_v<typename V::value_type, Abi> is an integral multiple of V::size(), and, for the overload with a simd parameter, is_simd_v<V> is true; for the overload with a simd_mask parameter, is_simd_mask_v<V> is true.
template<class T, class... Abis>
simd<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd<T, Abis>&... xs);
template<class T, class... Abis>
simd_mask<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd_mask<T, Abis>&... xs);
Returns: A data-parallel object initialized with the concatenated values of the xs pack of data-parallel objects: the i-th simd/simd_mask element of the j-th parameter in the xs pack is copied to the return value's element with index i + the sum of the widths of the first j parameters in the xs pack.
template<class T, class Abi> simd<T, Abi> min(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
Returns: An object where the i-th element is initialized with std::min(a[i], b[i]) for all i ∊ [0, size()).
template<class T, class Abi> simd<T, Abi> max(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
Returns: An object where the i-th element is initialized with std::max(a[i], b[i]) for all i ∊ [0, size()).
template<class T, class Abi>
pair<simd<T, Abi>, simd<T, Abi>> minmax(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
Returns: A pair initialized with std::min(a[i], b[i]) for all i ∊ [0, size()) in the first member, and std::max(a[i], b[i]) for all i ∊ [0, size()) in the second member.
template<class T, class Abi> simd<T, Abi>
clamp(const simd<T, Abi>& v, const simd<T, Abi>& lo, const simd<T, Abi>& hi);
Requires: No element in lo shall be greater than the corresponding element in hi.
Returns: An object where the i-th element is initialized with std::clamp(v[i], lo[i], hi[i]) for all i ∊ [0, size()).
For each set of overloaded functions within <cmath>, there shall be additional overloads sufficient to ensure that if any argument corresponding to a double parameter has type simd<T, Abi>, where is_floating_point_v<T> is true, then:
All arguments corresponding to double parameters shall be convertible to simd<T, Abi>.
All arguments corresponding to double* parameters shall be of type simd<T, Abi>*.
All arguments corresponding to parameters of integral type U shall be convertible to fixed_size_simd<U, simd_size_v<T, Abi>>.
All arguments corresponding to U* parameters, where U is integral, shall be of type fixed_size_simd<U, simd_size_v<T, Abi>>*.
If the corresponding return type is double, the return type of the additional overloads is simd<T, Abi>. Otherwise, if the corresponding return type is bool, the return type of the additional overloads is simd_mask<T, Abi>. Otherwise, the return type is fixed_size_simd<R, simd_size_v<T, Abi>>, with R denoting the corresponding return type.
It is unspecified whether a call to these overloads with arguments that are all convertible to simd<T, Abi> but are not of type simd<T, Abi> is well-formed.
Each function overload produced by the above rules applies the indicated <cmath>
function element-wise. The results per element are not required to be
bitwise equal to the application of the function which is overloaded for
the element type.
The behavior is undefined if a domain, pole, or range error
occurs when the input argument(s) are applied to the indicated <cmath>
function.
If abs is called with an argument of type simd<X, Abi> for which is_unsigned_v<X> is true, the program is ill-formed.
simd_mask
simd_mask overview
template<class T, class Abi> class simd_mask {
public:
  using value_type = bool;
  using reference = see below;
  using simd_type = simd<T, Abi>;
  using abi_type = Abi;

  static constexpr size_t size() noexcept;

  simd_mask() = default;

  // broadcast constructor
  explicit simd_mask(value_type) noexcept;

  // implicit type conversion constructor
  template<class U> simd_mask(const simd_mask<U, simd_abi::fixed_size<size()>>&) noexcept;

  // load constructor
  template<class Flags> simd_mask(const value_type* mem, Flags);

  // 9.5.3, Copy functions
  template<class Flags> void copy_from(const value_type* mem, Flags);
  template<class Flags> void copy_to(value_type* mem, Flags);

  // 9.5.4, Subscript operators
  reference operator[](size_t);
  value_type operator[](size_t) const;

  // 9.5.5, Unary operators
  simd_mask operator!() const noexcept;

  // 9.6.1, Binary operators
  friend simd_mask operator&&(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator||(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator&(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator|(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator^(const simd_mask&, const simd_mask&) noexcept;

  // 9.6.2, Compound assignment
  friend simd_mask& operator&=(simd_mask&, const simd_mask&) noexcept;
  friend simd_mask& operator|=(simd_mask&, const simd_mask&) noexcept;
  friend simd_mask& operator^=(simd_mask&, const simd_mask&) noexcept;

  // 9.6.3, Comparisons
  friend simd_mask operator==(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator!=(const simd_mask&, const simd_mask&) noexcept;
};
The class template simd_mask is a data-parallel type with the element type bool. The width of a given simd_mask specialization is a constant expression, determined by the template parameters; specifically, simd_mask<T, Abi>::size() == simd<T, Abi>::size().
Every specialization of simd_mask shall be a complete type. The specialization simd_mask<T, Abi> is supported if T is a vectorizable type and
Abi is simd_abi::scalar, or
Abi is simd_abi::fixed_size<N>, with N constrained as defined above.
If Abi is an extended ABI tag, it is implementation-defined whether simd_mask<T, Abi> is supported. If simd_mask<T, Abi> is not supported, the specialization shall have a deleted default constructor, deleted destructor, deleted copy constructor, and deleted copy assignment.
Default initialization performs no initialization of the elements; value-initialization initializes each element with false.
static constexpr size_t size() noexcept;
Returns: The width of simd<T, Abi>.
Implementations should enable explicit conversion from and to
implementation-defined types. This adds one or more of the following
declarations to class simd_mask
:
explicit operator implementation-defined() const; explicit simd_mask(const implementation-defined& init);
The member type reference has the same interface as simd<T, Abi>::reference, except its value_type is bool.
explicit simd_mask(value_type x) noexcept;
Effects: Constructs an object with each element initialized to x.
template<class U> simd_mask(const simd_mask<U, simd_abi::fixed_size<size()>>& x) noexcept;
Effects: Constructs an object where the i-th element equals x[i] for all i ∊ [0, size()).
Remarks: This constructor shall not participate in overload resolution unless abi_type is simd_abi::fixed_size<size()>.
template<class Flags> simd_mask(const value_type* mem, Flags);
Requires: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(value_type). [mem, mem + size()) is a valid range.
Effects: Constructs an object where the i-th element is initialized to mem[i] for all i ∊ [0, size()).
Remarks: This constructor shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.
template<class Flags> void copy_from(const value_type* mem, Flags);
Requires: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(value_type). [mem, mem + size()) is a valid range.
Effects: Modifies the simd_mask object such that the i-th element is replaced with mem[i] for all i ∊ [0, size()).
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.
template<class Flags> void copy_to(value_type* mem, Flags);
Requires: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(value_type). [mem, mem + size()) is a valid range.
Effects: Copies all simd_mask elements as if mem[i] = operator[](i) for all i ∊ [0, size()).
Remarks: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.
reference operator[](size_t i);
Requires: i < size().
Returns: A reference (see below) referring to the i-th element.
value_type operator[](size_t i) const;
Requires: i < size().
Returns: The value of the i-th element.
simd_mask operator!() const noexcept;
Returns: The result of the element-wise application of operator!.
friend simd_mask operator&&(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator||(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator& (const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator| (const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator^ (const simd_mask&, const simd_mask&) noexcept;
Returns: A simd_mask object initialized with the results of the element-wise application of the indicated operator.
friend simd_mask& operator&=(simd_mask& lhs, const simd_mask& rhs) noexcept;
friend simd_mask& operator|=(simd_mask& lhs, const simd_mask& rhs) noexcept;
friend simd_mask& operator^=(simd_mask& lhs, const simd_mask& rhs) noexcept;
Effects: These operators apply the indicated operator to lhs and rhs element-wise.
Returns: lhs.
friend simd_mask operator==(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator!=(const simd_mask&, const simd_mask&) noexcept;
Returns: A simd_mask object initialized with the results of the element-wise application of the indicated operator.
template<class T, class Abi> bool all_of(const simd_mask<T, Abi>& k) noexcept;
Returns: true if all boolean elements in k are true, false otherwise.
template<class T, class Abi> bool any_of(const simd_mask<T, Abi>& k) noexcept;
Returns: true if at least one boolean element in k is true, false otherwise.
template<class T, class Abi> bool none_of(const simd_mask<T, Abi>& k) noexcept;
Returns: true if none of the boolean elements in k is true, false otherwise.
template<class T, class Abi> bool some_of(const simd_mask<T, Abi>& k) noexcept;
Returns: true if at least one of the boolean elements in k is true and at least one of the boolean elements in k is false, false otherwise.
template<class T, class Abi> int popcount(const simd_mask<T, Abi>& k) noexcept;
Returns: The number of boolean elements in k that are true.
template<class T, class Abi> int find_first_set(const simd_mask<T, Abi>& k);
Requires: any_of(k) returns true.
Returns: The lowest index i where k[i] is true.
template<class T, class Abi> int find_last_set(const simd_mask<T, Abi>& k);
Requires: any_of(k) returns true.
Returns: The greatest index i where k[i] is true.
bool all_of(see below) noexcept;
bool any_of(see below) noexcept;
bool none_of(see below) noexcept;
bool some_of(see below) noexcept;
int popcount(see below) noexcept;
Returns: all_of and any_of return their arguments; none_of returns the negation of its argument; some_of returns false; popcount returns the integral representation of its argument.
Remarks: These functions shall not participate in overload resolution unless the argument is of type bool.
int find_first_set(see below) noexcept;
int find_last_set(see below) noexcept;
Requires: The value of the argument is true.
Returns: 0.
Remarks: These functions shall not participate in overload resolution unless the argument is of type bool.
template<class T, class Abi>
where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type& k,
simd<T, Abi>& v) noexcept;
template<class T, class Abi>
const_where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type& k,
const simd<T, Abi>& v) noexcept;
template<class T, class Abi>
where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abi>>& k,
simd_mask<T, Abi>& v) noexcept;
template<class T, class Abi>
const_where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abi>>& k,
const simd_mask<T, Abi>& v) noexcept;
Returns: An object with the exposition-only data members mask and data initialized with k and v, respectively.
template<class T> where_expression<bool, T> where(see below k, T& v) noexcept;
template<class T>
const_where_expression<bool, T> where(see below k, const T& v) noexcept;
Remarks: These functions shall not participate in overload resolution unless T is neither a simd nor a simd_mask specialization, and the first argument is of type bool.
Returns: An object with the exposition-only data members mask and data initialized with k and v, respectively.