Document Number:	N4742
Date:	2018-04-02
Revises:	N4725
Editor:	Jared Hoberock NVIDIA Corporation jhoberock@nvidia.com

Parallel algorithms

7.1

Wavefront Application

[parallel.alg.wavefront]

For the purposes of this section, an evaluation is a value computation or side effect of an expression, or an execution of a statement. Initialization of a temporary object is considered a subexpression of the expression that necessitates the temporary object.

An evaluation A contains an evaluation B if:

A and B are not potentially concurrent ([intro.races]); and
the start of A is the start of B or the start of A is sequenced before the start of B; and
the completion of B is the completion of A or the completion of B is sequenced before the completion of A.

[ Note: This includes evaluations occurring in function invocations. — end note ]

An evaluation A is ordered before an evaluation B if A is deterministically sequenced before B. [ Note: If A is indeterminately sequenced with respect to B or A and B are unsequenced, then A is not ordered before B and B is not ordered before A. The ordered before relationship is transitive. — end note ]

For an evaluation A ordered before an evaluation B, both contained in the same invocation of an element access function, A is a vertical antecedent of B if:

there exists an evaluation S such that:
- S contains A, and
- S contains all evaluations C (if any) such that A is ordered before C and C is ordered before B,
- but S does not contain B, and
control reached B from A without executing any of the following:
- a goto statement or asm declaration that jumps to a statement outside of S, or
- a switch statement executed within S that transfers control into a substatement of a nested selection or iteration statement, or
- a throw [ Note: even if caught — end note ] , or
- a longjmp.

[ Note: Vertical antecedent is an irreflexive, antisymmetric, nontransitive relationship between two evaluations. Informally, A is a vertical antecedent of B if A is sequenced immediately before B or A is nested zero or more levels within a statement S that immediately precedes B. — end note ]

In the following, X_i and X_j refer to evaluations of the same expression or statement contained in the application of an element access function corresponding to the i^th and j^th elements of the input sequence. [ Note: There might be several evaluations X_k, Y_k, etc. of a single expression or statement in application k, for example, if the expression or statement appears in a loop within the element access function. — end note ]

Horizontally matched is an equivalence relationship between two evaluations of the same expression. An evaluation B_i is horizontally matched with an evaluation B_j if:

both are the first evaluations in their respective applications of the element access function, or
there exist horizontally matched evaluations A_i and A_j that are vertical antecedents of evaluations B_i and B_j, respectively.

[ Note: Horizontally matched establishes a theoretical lock-step relationship between evaluations in different applications of an element access function. — end note ]

Let f be a function called for each argument list in a sequence of argument lists. Wavefront application of f requires that evaluation A_i be sequenced before evaluation B_j if i < j and:

A_i is sequenced before some evaluation B_i and B_i is horizontally matched with B_j, or
A_i is horizontally matched with some evaluation A_j and A_j is sequenced before B_{j_.}

[ Note: Wavefront application guarantees that parallel applications i and j execute such that progress on application j never gets ahead of application i. — end note ]

[ Note: The relationships between A_i and B_i and between A_j and B_j are sequenced before, not vertical antecedent. — end note ]

7.2

Non-Numeric Parallel Algorithms

[parallel.alg.ops]

7.2.1

Header `<experimental/algorithm>` synopsis

[parallel.alg.ops.synopsis]

#include <algorithm>

namespace std::experimental {
inline namespace parallelism_v2 {
namespace execution {
  // 7.2.5, No vec
  template<class F>
    auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)());

  // 7.2.6, Ordered update class
  template<class T>
    class ordered_update_t;

  // 7.2.7, Ordered update function template
  template<class T>
    ordered_update_t<T> ordered_update(T& ref) noexcept;
}

// Exposition only: Suppress template argument deduction.
template<class T> struct no_deduce { using type = T; };
template<class T> using no_deduce_t = typename no_deduce<T>::type;

// 7.2.2, Reductions Support for reductions
template<class T, class BinaryOperation>
  unspecified reduction(T& var, const T& identity, BinaryOperation combiner);
template<class T>
  unspecified reduction_plus(T& var);
template<class T>
  unspecified reduction_multiplies(T& var);
template<class T>
  unspecified reduction_bit_and(T& var);
template<class T>
  unspecified reduction_bit_or(T& var);
template<class T>
  unspecified reduction_bit_xor(T& var);
template<class T>
  unspecified reduction_min(T& var);
template<class T>
  unspecified reduction_max(T& var);

// 7.2.3, Inductions Support for inductions
template<class T>
  unspecified induction(T&& var);
template<class T, class S>
  unspecified induction(T&& var, S stride);

// 7.2.4, For loop for_loop
template<class I, class... Rest>
  void for_loop(no_deduce_t<I> start, I finish, Rest&&... rest);
template<class ExecutionPolicy,
         class I, class... Rest>
  void for_loop(ExecutionPolicy&& exec,
                no_deduce_t<I> start, I finish, Rest&&... rest);
template<class I, class S, class... Rest>
  void for_loop_strided(no_deduce_t<I> start, I finish,
                        S stride, Rest&&... rest);
template<class ExecutionPolicy,
         class I, class S, class... Rest>
  void for_loop_strided(ExecutionPolicy&& exec,
                        no_deduce_t<I> start, I finish,
                        S stride, Rest&&... rest);
template<class I, class Size, class... Rest>
  void for_loop_n(I start, Size n, Rest&&... rest);
template<class ExecutionPolicy,
         class I, class Size, class... Rest>
  void for_loop_n(ExecutionPolicy&& exec,
                  I start, Size n, Rest&&... rest);
template<class I, class Size, class S, class... Rest>
  void for_loop_n_strided(I start, Size n, S stride, Rest&&... rest);
template<class ExecutionPolicy,
         class I, class Size, class S, class... Rest>
  void for_loop_n_strided(ExecutionPolicy&& exec,
                          I start, Size n, S stride, Rest&&... rest);
}
}

7.2.2

Reductions

[parallel.alg.reductions]

Each of the function templates in this subclause ([parallel.alg.reductions]) returns a reduction object of unspecified type having a reduction value type and encapsulating a reduction identity value for the reduction, a combiner function object, and a live-out object from which the initial value is obtained and into which the final value is stored.

An algorithm uses reduction objects by allocating an unspecified number of instances, known as accumulators, of the reduction value type. [ Note: An implementation might, for example, allocate an accumulator for each thread in its private thread pool. — end note ] Each accumulator is initialized with the object’s reduction identity, except that the live-out object (which was initialized by the caller) comprises one of the accumulators. The algorithm passes a reference to an accumulator to each application of an element-access function, ensuring that no two concurrently executing invocations share the same accumulator. An accumulator can be shared between two applications that do not execute concurrently, but initialization is performed only once per accumulator.

Modifications to the accumulator by the application of element access functions accrue as partial results. At some point before the algorithm returns, the partial results are combined, two at a time, using the reduction object’s combiner operation until a single value remains, which is then assigned back to the live-out object. [ Note: in order to produce useful results, modifications to the accumulator should be limited to commutative operations closely related to the combiner operation. For example if the combiner is plus<T>, incrementing the accumulator would be consistent with the combiner but doubling it or assigning to it would not. — end note ]

template<class T, class BinaryOperation>
unspecified reduction(T& var, const T& identity, BinaryOperation combiner);

Requires:: T shall meet the requirements of CopyConstructible and MoveAssignable. The expression var = combiner(var, var) shall be well-formed.
Returns:: Aa reduction object of unspecified type having reduction value type T, reduction identity identity, combiner function object combiner, and using the object referenced by var as its live-out object.

template<class T>
unspecified reduction_plus(T& var);template<class T>
unspecified reduction_multiplies(T& var);template<class T>
unspecified reduction_bit_and(T& var);template<class T>
unspecified reduction_bit_or(T& var);template<class T>
unspecified reduction_bit_xor(T& var);template<class T>
unspecified reduction_min(T& var);template<class T>
unspecified reduction_max(T& var);

Requires:: T shall meet the requirements of CopyConstructible and MoveAssignable.
Returns:: Aa reduction object of unspecified type having reduction value type T, reduction identity and combiner operation as specified in table Table 2 and using the object referenced by var as its live-out object.

7.2.3

Inductions

[parallel.alg.inductions]

Each of the function templates in this section return an induction object of unspecified type having an induction value type and encapsulating an initial value i of that type and, optionally, a stride.

For each element in the input range, an algorithm over input sequence S computes an induction value from an induction variable and ordinal position p within S by the formula i + p * stride if a stride was specified or i + p otherwise. This induction value is passed to the element access function.

An induction object may refer to a live-out object to hold the final value of the induction sequence. When the algorithm using the induction object completes, the live-out object is assigned the value i + n * stride, where n is the number of elements in the input range.

template<class T>
unspecified induction(T&& var);template<class T, class S>
unspecified induction(T&& var, S stride);

Returns:: Aan induction object with induction value type remove_cv_t<remove_reference_t<T>>, initial value var, and (if specified) stride stride. If T is an lvalue reference to non-const type, then the object referenced by var becomes the live-out object for the induction object; otherwise there is no live-out object.

7.2.4

For loop

[parallel.alg.forloop]

template<class I, class... Rest>
void for_loop(no_deduce_t<I> start, I finish, Rest&&... rest);template<class ExecutionPolicy,
      class I, class... Rest>
void for_loop(ExecutionPolicy&& exec,
              no_deduce_t<I> start, I finish, Rest&&... rest);

template<class I, class S, class... Rest>
void for_loop_strided(no_deduce_t<I> start, I finish,
                      S stride, Rest&&... rest);template<class ExecutionPolicy,
      class I, class S, class... Rest>
void for_loop_strided(ExecutionPolicy&& exec,
                      no_deduce_t<I> start, I finish,
                      S stride, Rest&&... rest);

template<class I, class Size, class... Rest>
void for_loop_n(I start, Size n, Rest&&... rest);template<class ExecutionPolicy,
      class I, class Size, class... Rest>
void for_loop_n(ExecutionPolicy&& exec,
                I start, Size n, Rest&&... rest);
          
template<class I, class Size, class S, class... Rest>
void for_loop_n_strided(I start, Size n, S stride, Rest&&... rest);template<class ExecutionPolicy, 
      class I, class Size, class S, class... Rest>
void for_loop_n_strided(ExecutionPolicy&& exec,
                        I start, Size n, S stride, Rest&&... rest);

Requires:: For the overloads with an ExecutionPolicy, I shall be an integral type or meet the requirements of a forward iterator type; otherwise, I shall be an integral type or meet the requirements of an input iterator type. Size shall be an integral type and n shall be non-negative. S shall have integral type and stride shall have non-zero value. stride shall be negative only if I has integral type or meets the requirements of a bidirectional iterator. The rest parameter pack shall have at least one element, comprising objects returned by invocations of reduction ([parallel.alg.reduction]) and/or induction ([parallel.alg.induction]) function templates followed by exactly one invocable element-access function, f. For the overloads with an ExecutionPolicy, f shall meet the requirements of CopyConstructible; otherwise, f shall meet the requirements of MoveConstructible.
Effects:: Applies f to each element in the input sequence, as described below, with additional arguments corresponding to the reductions and inductions in the rest parameter pack. The length of the input sequence is:

n, if specified,

otherwise finish - start if neither n nor stride is specified,

otherwise 1 + (finish-start-1)/stride if stride is positive,

otherwise 1 + (start-finish-1)/-stride.

The first element in the input sequence is start. Each subsequent element is generated by adding stride to the previous element, if stride is specified, otherwise by incrementing the previous element. [ Note: As described in the C++ standard, section [algorithms.general], arithmetic on non-random-access iterators is performed using advance and distance. — end note ] [ Note: The order of the elements of the input sequence is important for determining ordinal position of an application of f, even though the applications themselves may be unordered. — end note ]
The first argument to f is an element from the input sequence. [ Note: if I is an iterator type, the iterators in the input sequence are not dereferenced before being passed to f. — end note ] For each member of the rest parameter pack excluding f, an additional argument is passed to each application of f as follows:

If the pack member is an object returned by a call to a reduction function listed in section [parallel.alg.reductions], then the additional argument is a reference to an accumulator of that reduction object.

If the pack member is an object returned by a call to induction, then the additional argument is the induction value for that induction object corresponding to the position of the application of f in the input sequence.
Complexity:: Applies f exactly once for each element of the input sequence.
Remarks:: If f returns a result, the result is ignored.

7.2.5

No vec

[parallel.alg.novec]

template<class F>
auto no_vec(F&& f) noexcept -> decltype(std::forward<F>(f)());

Effects:: Evaluates std::forward<F>(f)(). When invoked within an element access function in a parallel algorithm using vector_policy, if two calls to no_vec are horizontally matched within a wavefront application of an element access function over input sequence S, then the execution of f in the application for one element in S is sequenced before the execution of f in the application for a subsequent element in S; otherwise, there is no effect on sequencing.
Returns:: Tthe result of f.
Notes:: If f exits via an exception, then terminate will be called, consistent with all other potentially-throwing operations invoked with vector_policy execution. [ Example:
extern int* p; for_loop(vec, 0, n[&](int i) { y[i] +=y[i+1]; if(y[i] < 0) { no_vec([]{ *p++ = i; }); } });
The updates *p++ = i will occur in the same order as if the policy were seq. — end example ]

7.2.6

Ordered update class

[parallel.alg.ordupdate.class]

template<class T>
class ordered_update_t {
  T& ref_; // exposition only
public:
  ordered_update_t(T& loc) noexcept
    : ref_(loc) {}
  ordered_update_t(const ordered_update_t&) = delete;
  ordered_update_t& operator=(const ordered_update_t&) = delete;

  template <class U>
    auto operator=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ = std::move(rhs); }); }
  template <class U>
    auto operator+=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ += std::move(rhs); }); }
  template <class U>
    auto operator-=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ -= std::move(rhs); }); }
  template <class U>
    auto operator*=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ *= std::move(rhs); }); }
  template <class U>
    auto operator/=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ /= std::move(rhs); }); }
  template <class U>
    auto operator%=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ %= std::move(rhs); }); }
  template <class U>
    auto operator>>=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ >>= std::move(rhs); }); }
  template <class U>
    auto operator<<=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ <<= std::move(rhs); }); }
  template <class U>
    auto operator&=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ &= std::move(rhs); }); }
  template <class U>
    auto operator^=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ ^= std::move(rhs); }); }
  template <class U>
    auto operator|=(U rhs) const noexcept
      { return no_vec([&]{ return ref_ |= std::move(rhs); }); }

  auto operator++() const noexcept
    { return no_vec([&]{ return ++ref_; }); }
  auto operator++(int) const noexcept
    { return no_vec([&]{ return ref_++; }); }
  auto operator--() const noexcept
    { return no_vec([&]{ return --ref_; }); }
  auto operator--(int) const noexcept
    { return no_vec([&]{ return ref_--; }); }
};

An object of type ordered_update_t<T> is a proxy for an object of type T intended to be used within a parallel application of an element access function using a policy object of type vector_policy. Simple increments, assignments, and compound assignments to the object are forwarded to the proxied object, but are sequenced as though executed within a no_vec invocation. [ Note: The return-value deduction of the forwarded operations results in these operations returning by value, not reference. This formulation prevents accidental collisions on accesses to the return value. — end note ]

7.2.7

Ordered update function template

[parallel.alg.ordupdate.func]

template<T>
ordered_update_t<T> ordered_update(T& loc) noexcept;

Returns:: { loc }.

Data-Parallel Types

[parallel.simd]

9.1

General

[parallel.simd.general]

The data-parallel library consists of data-parallel types and operations on these types. A data-parallel type consists of elements of an underlying arithmetic type, called the element type. The number of elements is a constant for each data-parallel type and called the width of that type.

Throughout this Clause, the term data-parallel type refers to all supported 9.3.1 specializations of the simd and simd_mask class templates. A data-parallel object is an object of data-parallel type.

An element-wise operation applies a specified operation to the elements of one or more data-parallel objects. Each such application is unsequenced with respect to the others. A unary element-wise operation is an element-wise operation that applies a unary operation to each element of a data-parallel object. A binary element-wise operation is an element-wise operation that applies a binary operation to corresponding elements of two data-parallel objects.

Throughout this Clause, the set of vectorizable types for a data-parallel type comprises all cv-unqualified arithmetic types other than bool.

[ Note: The intent is to support acceleration through data-parallel execution resources, such as SIMD registers and instructions or execution units driven by a common instruction decoder. If such execution resources are unavailable, the interfaces support a transparent fallback to sequential execution. — end note ]

9.2

Header `<experimental/simd>` synopsis

[parallel.simd.synopsis]

namespace std::experimental {
inline namespace parallelism_v2 {
  namespace simd_abi {
  
    struct scalar {};
    template<int N> struct fixed_size {};
    template<class T> inline constexpr int max_fixed_size = implementation-defined;
    template<class T> using compatible = implementation-defined;
    template<class T> using native = implementation-defined;
  
    template<class T, size_t N> struct deduce { using type = see below; };
    template<class T, size_t N> using deduce_t = typename deduce<T, N>::type;
  }

  struct element_aligned_tag {};
  struct vector_aligned_tag {};
  template<size_t> struct overaligned_tag {};
  inline constexpr element_aligned_tag element_aligned{};
  inline constexpr vector_aligned_tag vector_aligned{};
  template<size_t N> inline constexpr overaligned_tag<N> overaligned{};

  // 9.2.2, simd type traits
  template<class T> struct is_abi_tag;
  template<class T> inline constexpr bool is_abi_tag_v = is_abi_tag<T>::value;

  template<class T> struct is_simd;
  template<class T> inline constexpr bool is_simd_v = is_simd<T>::value;

  template<class T> struct is_simd_mask;
  template<class T> inline constexpr bool is_simd_mask_v = is_simd_mask<T>::value;

  template<class T> struct is_simd_flag_type;
  template<class T> inline constexpr bool is_simd_flag_type_v =
    is_simd_flag_type<T>::value;

  template<class T, class Abi = simd_abi::compatible<T>> struct simd_size;
  template<class T, class Abi = simd_abi::compatible<T>>
  inline constexpr size_t simd_size_v = simd_size<T,Abi>::value;

  template<class T, class U = typename T::value_type> struct memory_alignment;
  template<class T, class U = typename T::value_type>
  inline constexpr size_t memory_alignment_v = memory_alignment<T,U>::value;

  // 9.3, Class template simd
  template<class T, class Abi = simd_abi::compatible<T>> class simd;
  template<class T> using native_simd = simd<T, simd_abi::native<T>>;
  template<class T, int N> using fixed_size_simd = simd<T, simd_abi::fixed_size<N>>;

  // 9.5, Class template simd_mask
  template<class T, class Abi = simd_abi::compatible<T>> class simd_mask;
  template<class T> using native_simd_mask = simd_mask<T, simd_abi::native<T>>;
  template<class T, int N> using fixed_size_simd_mask =
    simd_mask<T, simd_abi::fixed_size<N>>;

  // 9.4.5, Casts
  template<class T, class U, class Abi> see below simd_cast(const simd<U, Abi>&);
  template<class T, class U, class Abi> see below static_simd_cast(const simd<U, Abi>&);

  template<class T, class Abi>
    fixed_size_simd<T, simd_size_v<T, Abi>>
      to_fixed_size(const simd<T, Abi>&) noexcept;
  template<class T, class Abi>
    fixed_size_simd_mask<T, simd_size_v<T, Abi>>
      to_fixed_size(const simd_mask<T, Abi>&) noexcept;
  template<class T, int N>
    native_simd<T> to_native(const fixed_size_simd<T, N>&) noexcept;
  template<class T, int N>
    native_simd_mask<T> to_native(const fixed_size_simd_mask<T, N>&) noexcept;
  template<class T, int N>
    simd<T> to_compatible(const fixed_size_simd<T, N>&) noexcept;
  template<class T, int N>
    simd_mask<T> to_compatible(const fixed_size_simd_mask<T, N>&) noexcept;

  template<size_t... Sizes, class T, class Abi>
    tuple<simd<T, simd_abi::deduce_t<T, Sizes>>...>
      split(const simd<T, Abi>&);
  template<size_t... Sizes, class T, class Abi>
    tuple<simd_mask<T, simd_mask_abi::deduce_t<T, Sizes>>...>
      split(const simd_mask<T, Abi>&);
  template<class V, class Abi>
    array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
      split(const simd<typename V::value_type, Abi>&);
  template<class V, class Abi>
    array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
      split(const simd_mask<typename V::value_type, Abi>&);

  template<class T, class... Abis>
    simd<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>>
      concat(const simd<T, Abis>&...);
  template<class T, class... Abis>
    simd_mask<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>>
      concat(const simd_mask<T, Abis>&...);

  // 9.6.4, Reductions
  template<class T, class Abi> bool all_of(const simd_mask<T, Abi>&) noexcept;
  template<class T, class Abi> bool any_of(const simd_mask<T, Abi>&) noexcept;
  template<class T, class Abi> bool none_of(const simd_mask<T, Abi>&) noexcept;
  template<class T, class Abi> bool some_of(const simd_mask<T, Abi>&) noexcept;
  template<class T, class Abi> int popcount(const simd_mask<T, Abi>&) noexcept;
  template<class T, class Abi> int find_first_set(const simd_mask<T, Abi>&);
  template<class T, class Abi> int find_last_set(const simd_mask<T, Abi>&);

  bool all_of(see below) noexcept;
  bool any_of(see below) noexcept;
  bool none_of(see below) noexcept;
  bool some_of(see below) noexcept;
  int popcount(see below) noexcept;
  int find_first_set(see below) noexcept;
  int find_last_set(see below) noexcept;

  // 9.2.3, Class templates const_where_expression and where_expression
  template<class M, class T> class const_where_expression;
  template<class M, class T> class where_expression;

  // 9.6.5, Where functions
  template<class T> struct nodeduce { using type = T; }; // exposition only
  template<class T> using nodeduce_t = typename nodeduce<T>::type; // exposition only

  template<class T, class Abi>
    where_expression<simd_mask<T, Abi>, simd<T, Abi>>
      where(const typename simd<T, Abi>::mask_type&, simd<T, Abi>&) noexcept;

  template<class T, class Abi>
    const_where_expression<simd_mask<T, Abi>, simd<T, Abi>>
      where(const typename simd<T, Abi>::mask_type&, const simd<T, Abi>&) noexcept;

  template<class T, class Abi>
    where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>>
      where(const nodeduce_t<simd_mask<T, Abit>>&, simd_mask<T, Abi>&) noexcept;

  template<class T, class Abi>
    const_where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>>
      where(const nodeduce_t<simd_mask<T, Abit>>&, const simd_mask<T, Abi>&) noexcept;

  template<class T>
    where_expression<bool, T>
      where(see below k, T& d) noexcept;

  template<class T>
    const_where_expression<bool, T>
      where(see below k, const T& d) noexcept;

  // 9.4.4, Reductions
  template<class T, class Abi, class BinaryOperation = plus<>>
    T reduce(const simd<T, Abi>&,
             BinaryOperation = {});

  template<class M, class V, class BinaryOperation>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  typename V::value_type identity_element,
                                  BinaryOperation binary_op);
  template<class M, class V>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  plus<>
                                  binary_op = {});
  template<class M, class V>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  multiplies<>
                                  binary_op);
  template<class M, class V>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  bit_and<>
                                  binary_op);
  template<class M, class V>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  bit_or<>
                                  binary_op);
  template<class M, class V>
    typename V::value_type reduce(const const_where_expression<M, V>& x,
                                  bit_xor<>
                                  binary_op);

  template<class T, class Abi>
    T hmin(const simd<T, abi>&);
  template<class T, class Abi>
    typename V::value_type hmin(const const_where_expression<M, V>&);
  template<class T, class Abi>
    T hmax(const simd<T, abi>&);
  template<class T, class Abi>
    typename V::value_type hmax(const const_where_expression<M, V>&);

  // 9.4.6, Algorithms
  template<class T, class Abi>
    simd<T, Abi>
      min(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
  template<class T, class Abi>
    simd<T, Abi>
      max(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
  template<class T, class Abi>
    pair<simd<T, Abi>, simd<T, Abi>>
      minmax(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;
  template<class T, class Abi>
    simd<T, Abi>
      clamp(const simd<T, Abi>& v,
            const simd<T, Abi>& lo,
            const simd<T, Abi>& hi) noexcept;
}
}

The header <experimental/simd> defines class templates, tag types, trait types, and function templates for element-wise operations on data-parallel objects.

9.2.1

`simd` ABI tags

[parallel.simd.abi]

namespace simd_abi {
  struct scalar {};
  template<int N> struct fixed_size {};
  template<class T> inline constexpr in max_fixed_size = implementation-defined;
  template<class T> using compatible = implementation-defined;
  template<class T> using native = implementation-defined;
}

An ABI tag is a type in the std::experimental::parallelism_v2::simd_abi namespace that indicates a choice of size and binary representation for objects of data-parallel type. [ Note: The intent is for the size and binary representation to depend on the target architecture. — end note ] The ABI tag, together with a given element type implies a number of elements. ABI tag types are used as the second template argument to simd and simd_mask.

[ Note: The ABI tag is orthogonal to selecting the machine instruction set. The selected machine instruction set limits the usable ABI tag types, though (see 9.3.1). The ABI tags enable users to safely pass objects of data-parallel type between translation unit boundaries (e.g. function calls or I/O). — end note ]

Use of the scalar tag type requires data-parallel types to store a single element (i.e., simd<T, simd_abi::scalar>::size() returns 1). [ Note: scalar is not an alias for fixed_size<1>. — end note ]

The value of max_fixed_size<T> is at least 32.

Use of the simd_abi::fixed_size<N> tag type requires data-parallel types to store N elements (i.e. simd<T, simd_abi::fixed_size<N>>::size() is N). simd<T, fixed_size<N>> and simd_mask<T, fixed_size<N>> with N > 0 and N <= max_fixed_size<T> shall be supported. Additionally, for every supported simd<T, Abi> (see 9.3.1), where Abi is an ABI tag that is not a specialization of simd_abi::fixed_size, N == simd<T, Abi>::size() shall be supported.

[ Note: It is unspecified whether simd<T, fixed_size<T, fixed_size<N>> with N > max_fixed_size<T> is supported. The value of max_fixed_size<T> can depend on compiler flags and can change between different compiler versions. — end note ]

[ Note: An implementation can forego ABI compatibility between differently compiled translation units for simd and simd_mask specializations using the same simd_abi::fixed_size<N> tag. Otherwise, the efficiency of simd<T, Abi> is likely to be better than for simd<T, fixed_size<simd_size_v<T, Abi>>> (with Abi not a specialization of simd_abi::fixed_size). — end note ]

An implementation may define additional extended ABI tag types in the std::experimental::parallelism_v2::simd_abi namespace, to support other forms of data-parallel computation.

compatible<T> is an implementation-defined alias for an ABI tag. [ Note: The intent is to use the ABI tag producing the most efficient data-parallel execution for the element type T that ensures ABI compatibility between translation units on the target architecture. — end note ]

[ Example: Consider a target architecture supporting the extended ABI tags __simd128 and __simd256, where the __simd256 type requires an optional ISA extension on said architecture. Also, the target architecture does not support long double with either ABI tag. The implementation therefore defines

compatible<T> is an alias for __simd128 for all vectorizable T, except long double, and
compatible<long double> as an alias for scalar.

— end example ]

native<T> is an implementation-defined alias for an ABI tag. [ Note: The intent is to use the ABI tag producing the most efficient data-parallel execution for the element type T that is supported on the currently targeted system. For target architectures without ISA extensions, the native<T> and compatible<T> aliases will likely be the same. For target architectures with ISA extensions, compiler flags may influence the native<T> alias while compatible<T> will be the same independent of such flags. — end note ]

[ Example: Consider a target architecture supporting the extended ABI tags __simd128 and __simd256, where hardware support for __simd256 only exists for floating-point types. The implementation therefore defines native<T> as an alias for

__simd256 if T is a floating-point type, and
__simd128 otherwise.

— end example ]

template<T, size_t N> struct deduce { using type = see below; };

The member type shall be present if and only if

T is a vectorizable type, and
simd_abi::fixed_size<N> is supported (see 9.2.1).

Where present, the member typedef type shall name an ABI tag type that satisfies

simd_size<T, type> == N, and
simd<T, type> is default constructible (see 9.3.1).

If N is 1, the member typedef type is simd_abi::scalar. Otherwise, if there are multiple ABI tag types that satisfy the constraints, the member typedef type is implementation-defined. [ Note: It is expected that extended ABI tags can produce better optimizations and thus are preferred over simd_abi::fixed_size<N>. — end note ]

The behavior of a program that adds specializations for deduce is undefined.

9.2.2

`simd` type traits

[parallel.simd.traits]

template<class T> struct is_abi_tag { see below };

The type is_abi_tag<T> is a UnaryTypeTrait with a BaseCharacteristic of true_type if T is a standard or extended ABI tag, and false_type otherwise.

The behavior of a program that adds specializations for is_abi_tag is undefined.

template<class T> struct is_simd { see below };

The type is_simd<T> is a UnaryTypeTrait with a BaseCharacteristic of true_type if T is a specialization of the simd class template, and false_type otherwise.

The behavior of a program that adds specializations for is_simd is undefined.

template<class T> struct is_simd_mask { see below };

The type is_simd_mask<T> is a UnaryTypeTrait with a BaseCharacteristic of true_type if T is a specialization of the simd_mask class template, and false_type otherwise.

The behavior of a program that adds specializations for is_simd_mask is undefined.

template<class T> struct is_simd_flag_type { see below };

The type is_simd_flag_type<class T> is a UnaryTypeTrait with a BaseCharacteristic of true_type if T is one of

element_aligned_tag, or
vector_aligned_tag, or
overaligned_tag<N> with N > 0 and N an integral power of two,

and false_type otherwise.

The behavior of a program that adds specializations for is_simd_flag_type is undefined.

template<class T, class Abi = simd_abi::compatible<T>> struct simd_size { see below };

simd_size<T, Abi> shall have a member value if and only if

T is a vectorizable type, and
is_abi_tag_v<Abi> is true.

[ Note: The rules are different from those in (9.3.1). — end note ]

If value is present, the type simd_size<T, Abi> is a BinaryTypeTrait with a BaseCharacteristic of integral_constant<size_t, N> with N equal to the number of elements in a simd<T, Abi> object. [ Note: If simd<T, Abi> is not supported for the currently targeted system, simd_size<T, Abi>::value produces the value simd<T, Abi>::size() would return if it were supported. — end note ]

The behavior of a program that adds specializations for simd_size is undefined.

template<class T, class U = typename T::value_type> struct memory_alignment { see below };

memory_alignment<T, U> shall have a member value if and only if

is_simd_mask_v<T> is true and U is bool, or
is_simd_v<T> is true and U is a vectorizable type.

If value is present, the type memory_alignment<T, U> is a BinaryTypeTrait with a BaseCharacteristic of integral_constant<size_t, N> for some implementation-defined N (see 9.3.4 and 9.5.3). [ Note: value identifies the alignment restrictions on pointers used for (converting) loads and stores for the give type T on arrays of type U. — end note ]

The behavior of a program that adds specializations for memory_alignment is undefined.

9.2.3

Class templates `const_where_expression` and `where_expression`

[parallel.simd.whereexpr]

template<class M, class T> class const_where_expression {
  const M mask; // exposition only
  T& data; // exposition only

public:
  const_where_expression(const const_where_expression&) = delete;
  const_where_expression& operator=(const const_where_expression&) = delete;

  T operator-() const &&;
  T operator+() const &&;
  T operator~() const &&;

  template<class U, class Flags> void copy_to(U* mem, Flags f) const &&;
};

template<class M, class T>
class where_expression : public const_where_expression<M, T> {
public:
  template<class U> void operator=(U&& x) &&;
  template<class U> void operator+=(U&& x) &&;
  template<class U> void operator-=(U&& x) &&;
  template<class U> void operator*=(U&& x) &&;
  template<class U> void operator/=(U&& x) &&;
  template<class U> void operator%=(U&& x) &&;
  template<class U> void operator&=(U&& x) &&;
  template<class U> void operator|=(U&& x) &&;
  template<class U> void operator^=(U&& x) &&;
  template<class U> void operator<<=(U&& x) &&;
  template<class U> void operator>>=(U&& x) &&;

  void operator++() &&
  void operator++(int) &&
  void operator--() &&
  void operator--(int) &&

  template<class U, class Flags> void copy_from(const U* mem, Flags) &&;
};

The class templates const_where_expression and where_expression abstract the notion of selecting elements of a given object of arithmetic or data-parallel type.

The first templates argument M shall be cv-unqualified bool or a cv-unqualified simd_mask specialization.

If M is bool, T shall be a cv-unqualified arithmetic type. Otherwise, T shall either be M or typename M::simd_type.

In this subclause, if M is bool, data[0] is used interchangably for data, mask[0] is used interchangably for mask, and M::size() is used interchangably for 1.

The selected indices signify the integers i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }. The selected elements signify the elements data[i] for all selected indices i.

In this subclause, the type value_type is an alias for T if M is bool, or an alias for typename T::value_type if is_simd_mask_v<M> is true.

[ Note: The where functions 9.6.5 initialize mask with the first argument to where and data with the second argument to where. — end note ]


  T operator-() const &&;
  T operator+() const &&;
  T operator~() const &&;

Returns:: A copy of data with the indicated unary operator applied to all selected elements.
Throws:: Nothing.


template<class U, class Flags> void copy_to(U* mem, Flags) const &&;

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<T, U>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). If M is not bool, the largest i ∊ [0, M::size()) where mask[i] is true is less than the number of values pointed to by mem.
Effects:: Copies the selected elements as if mem[i] = static_cast(data[i]) for all selected indices i.
Throws:: Nothing.
Remarks:: This function shall not participate in overload resolution unless

is_simd_flag_type_v<Flags> is true, and

either

U is bool and value_type is bool, or

U is a vectorizable type and value_type is not bool.


template<class U> void operator=(U&& x) &&;

Effects:: Replaces data[i] with static_cast<T>(std::forward(x))[i] for all selected indices i.
Remarks:: This operator shall not participate in overload resolution unless U is convertible to T.


template<class U> void operator+=(U&& x) &&;
template<class U> void operator-=(U&& x) &&;
template<class U> void operator*=(U&& x) &&;
template<class U> void operator/=(U&& x) &&;
template<class U> void operator%=(U&& x) &&;
template<class U> void operator&=(U&& x) &&;
template<class U> void operator|=(U&& x) &&;
template<class U> void operator^=(U&& x) &&;
template<class U> void operator<<=(U&& x) &&;
template<class U> void operator>>=(U&& x) &&;

Effects:: Replaces data[i] with static_cast<T>(data @ std::forward(x))[i] (where @ denotes the indicated operator) for all selected indices i.
Remarks:: Each of these operators shall not participate in overload resolution unless the return type of data @ std::forward(x) is convertible to T. It is unspecified whether the binary operator, implied by the compound assignment operator, is executed on all elements or only on the selected elements.


void operator++() &&;
void operator++(int) &&;
void operator--() &&;
void operator--(int) &&;

Effects:: Applies the indicated operator to the selected elements.
Remarks:: Each of these operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type T.


template<class U, class Flags> void copy_from(const U* mem, Flags) &&;

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<T, U>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). If is_simd_flag_type_v is true, for all selected indices i, i shall be less than the number of values pointed to by mem.
Effects:: Replaces the selected elements as if data[i] = static_cast<value_type>(mem[i]) for all selected indices i.
Remarks:: This function shall not participate in overload resolution unless

is_simd_flag_type_v<Flags> is true, and

either

U is bool and value_type is bool, or

U is a vectorizable type and value_type is not bool.

9.3

Class template `simd`

[parallel.simd.class]

9.3.1

Class template `simd` overview

[parallel.simd.overview]

template<class T, class Abi> class simd {
public:
  using value_type = T;
  using reference = see below;
  using mask_type = simd_mask<T, Abi>
  using abi_type = Abi;

  static constexpr size_t size() noexcept;

  simd() = default;

  // implicit conversion constructor
  template<class U> simd(const simd<U, simd_abi::fixed_size<size()>>&);

  // implicit broadcast constructor (see below for constraints)
  template<class U> simd(U&& value);

  // generator constructor (see below for constraints)
  template<class G> explicit simd(G&& gen);

  // load constructor
  template<class U, class Flags> simd(const U* mem, Flags f);

  // 9.3.4, Copy functions
  template<class U, class Flags> copy_from(const U* mem, Flags f);
  template<class U, class Flags> copy_to(U* mem, Flags f);

  // 9.3.5, Subscript operators
  reference operator[](size_t);
  value_type operator[](size_t) const;

  // 9.3.6, Unary operators
  simd& operator++();
  simd operator++(int);
  simd& operator--();
  simd operator--(int);
  mask_type operator!() const;
  simd operator~() const;
  simd operator+() const;
  simd operator-() const;

  // 9.4.1, Binary operators
  friend simd operator+(const simd&, const simd&);
  friend simd operator-(const simd&, const simd&);
  friend simd operator*(const simd&, const simd&);
  friend simd operator/(const simd&, const simd&);
  friend simd operator%(const simd&, const simd&);
  friend simd operator&(const simd&, const simd&);
  friend simd operator|(const simd&, const simd&);
  friend simd operator^(const simd&, const simd&);
  friend simd operator<<(const simd&, const simd&);
  friend simd operator>>(const simd&, const simd&);
  friend simd operator<<(const simd&, int);
  friend simd operator>>(const simd&, int);

  // 9.4.2, Compound assignment
  friend simd& operator+=(simd&, const simd&);
  friend simd& operator-=(simd&, const simd&);
  friend simd& operator*=(simd&, const simd&);
  friend simd& operator/=(simd&, const simd&);
  friend simd& operator%=(simd&, const simd&);
  friend simd& operator&=(simd&, const simd&);
  friend simd& operator|=(simd&, const simd&);
  friend simd& operator^=(simd&, const simd&);
  friend simd& operator<<=(simd&, const simd&);
  friend simd& operator>>=(simd&, const simd&);
  friend simd& operator<<=(simd&, int);
  friend simd& operator>>=(simd&, int);

  // 9.4.3, Compare operators
  friend mask_type operator==(const simd&, const simd&);
  friend mask_type operator!=(const simd&, const simd&);
  friend mask_type operator>=(const simd&, const simd&);
  friend mask_type operator<=(const simd&, const simd&);
  friend mask_type operator>(const simd&, const simd&);
  friend mask_type operator<(const simd&, const simd&);
};

The class template simd is a data-parallel type. The width of a given simd specialization is a constant expression, determined by the template parameters.

Every specialization of simd shall be a complete type. The specialization simd<T, Abi> is supported if T is a vectorizable type and

Abi is simd_abi::scalar, or
Abi is simd_abi::fixed_size<N>, with N is constrained as defined in 9.2.1.

If Abi is an extended ABI tag, it is implementation-defined whether simd<T, Abi> is supported. [ Note: The intent is for implementations to decide on the basis of the currently targeted system. — end note ]

If simd<T, Abi> is not supported, the specialization shall have a deleted default constructor, deleted destructor, deleted copy constructor, and deleted copy assignment.

[ Example: Consider an implementation that defines the extended ABI tags __simd_x and __gpu_y. When the compiler is invoked to translate to a machine that has support for the __simd_x ABI tag for all arithmetic types other than long double and no support for the __gpu_y ABI tag, then:

simd<T, simd_abi::__gpu_y> is not supported for any T and has a deleted constructor.
simd<long double, simd_abi::__simd_x> is not supported and has a deleted constructor.
simd<double, simd_abi::__simd_x> is supported.
simd<long double, simd_abi::scalar> is supported.

— end example ]

Default intialization performs no initialization of the elements; value-initialization initializes each element with T(). [ Note: Thus, default initialization leaves the elements in an indeterminate state. — end note ]


static constexpr size_t size() noexcept;

Returns:: The width of simd<T, Abi>.

Implementations should enable explicit conversion from and to implementation-defined types. This adds one or more of the following declarations to class simd:

explicit operator implementation-defined() const; explicit simd(const implementation-defined& init);

[ Example: Consider an implementation that supports the type __vec4f and the function __vec4f _vec4f_addsub(__vec4f, __vec4f) for the currently targeted system. A user may require the use of _vec4f_addsub for maximum performance and thus writes:


using V = simd<float, simd_abi::__simd128>;
V addsub(V a, V b) {
  return static_cast<V>(_vec4f_addsub(static_cast<__vec4f>(a), static_cast<__vec4f>(b)));
}

— end example ]

9.3.2

Element references

[parallel.simd.reference]

A reference is an object that refers to an element in a simd or simd_mask object. reference::value_type is the same type as simd::value_type or simd_mask::value_type, respectively.

Class reference is for exposition only. An implementation is permitted to provide equivalent functionality without providing a class with this name.

        
class reference // exposition only
{
public:
  reference() = delete;
  reference(const reference&) = delete;

  operator value_type() const noexcept;

  template<class U> reference operator=(U&& x) &&;

  template<class U> reference operator+=(U&& x) &&;
  template<class U> reference operator-=(U&& x) &&;
  template<class U> reference operator*=(U&& x) &&;
  template<class U> reference operator/=(U&& x) &&;
  template<class U> reference operator%=(U&& x) &&;
  template<class U> reference operator|=(U&& x) &&;
  template<class U> reference operator&=(U&& x) &&;
  template<class U> reference operator^=(U&& x) &&;
  template<class U> reference operator<<=(U&& x) &&;
  template<class U> reference operator>>=(U&& x) &&;

  reference operator++() &&;
  value_type operator++(int) &&;
  reference operator--() &&;
  value_type operator--(int) &&;

  friend void swap(reference&& a, reference&& b) noexcept;
  friend void swap(value_type&& a, reference&& b) noexcept;
  friend void swap(reference&& a, value_type&& b) noexcept;
};

operator value_type() const noexcept;

Returns:: The value of the element referred to by *this.

template<class U> reference operator=(U&& x) &&;

Effects:: Replaces the referred to element in simd or simd_mask with static_cast<value_type>(std::forward(x)).
Returns:: A copy of *this.
Remarks:: This function shall not participate in overload resolution unless declval<value_type &>() = std::forward>U>(x) is well-formed.


template<class U> reference operator+=(U&& x) &&;
template<class U> reference operator-=(U&& x) &&;
template<class U> reference operator*=(U&& x) &&;
template<class U> reference operator/=(U&& x) &&;
template<class U> reference operator%=(U&& x) &&;
template<class U> reference operator|=(U&& x) &&;
template<class U> reference operator&=(U&& x) &&;
template<class U> reference operator^=(U&& x) &&;
template<class U> reference operator<<=(U&& x) &&;
template<class U> reference operator>>=(U&& x) &&;

Effects:: Applies the indicated compound operator to the referred to element in simd or simd_mask and std::forward(x).
Returns:: A copy of *this.
Remarks:: This function shall not participate in overload resolution unless declval<value_type &>() @= std::forward(x) (where @= denotes the indicated compound assignment operator) is well-formed.


reference operator++() &&;
reference operator--() &&;

Effects:: Applies the indicated operator to the referred to element in simd or simd_mask.
Returns:: A copy of *this.
Remarks:: This function shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.


value_type operator++(int) &&;
value_type operator--(int) &&;

Effects:: Applies the indicated operator to the referred to element in simd or simd_mask.
Returns:: A copy of the referred to element before applying the indicated operator.
Remarks:: This function shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.


friend void swap(reference&& a, reference&& b) noexcept;
friend void swap(value_type& a, reference&& b) noexcept;
friend void swap(reference&& a, value_type& b) noexcept;

Effects:: Exchanges the values a and b refer to.

9.3.3

Constructors

[parallel.simd.ctor]

template<class U> simd(U&&);

Effects:: Constructs an object with each element initialized to the value of the argument after conversion to value_type.
Throws:: Any exception thrown while converting the argument to value_type.
Remarks:: Let From identify the type remove_cv_t<remove_reference_t>. This constructor shall not participate in overload resolution unless:

From is a vectorizable type and every possibly value of From can be represented with type value_type, or

From is not an arithmetic type and is implicitly convertible to value_type, or

From is int, or
From is unsigned int and value_type is an unsigned integral type.

template<class U> simd(const simd<U, simd_abi::fixed_size<size()>>& x);

Effects:: Constructs an object where the i-th element equals static_cast<T>(x[i]) for all i ∊ [0, size()).
Remarks:: This constructor shall not participate in overload resolution unless

abi_type is simd_abi::fixed_size<size()>, and

every possible value of U can be represented with type value_type, and

if both U and value_type are integral, the integer conversion rank [conv.rank] of value_type is greater than the integer conversion rank of U.

template<class G> simd(G&& gen);

Effects:: Constructs an object where the i-th element is initialized to gen(integral_constant<size_t, i>()).
Remarks:: This constructor shall not participate in overload resolution unless simd(gen(integral_constant<size_t, i>())) is well-formed for all i ∊ [0, size()).

template<class U, class Flags> simd(const U* mem, Flags);

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligend by memory_alignment_v<simd, U>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Constructs an object where the i-th element is initialized to static_cast<T>(mem[i]) for all i ∊ [0, size()).
Remarks:: This constructor shall not participate in overload resolution unless

is_simd_flag_type_v<Flags> is true, and

U is a vectorizable type.

9.3.4

Copy functions

[parallel.simd.copy]

template<class U, class Flags> void copy_from(const U* mem, Flags);

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd, U>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligend by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Replaces the elements of the simd object such that the i-th element is assigned with static_cast<T>(mem[i]) for all i ∊ [0, size()).
Remarks:: This function shall not participate in overload resolution unless

is_simd_flag_type_v<Flags> is true, and

U is a vectorizable type.

template<class U, class Flags> void copy_to(U* mem, Flags) const;

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd, U>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Copies all simd elements as if mem[i] = static_cast(operator[](i)) for all i ∊ [0, size()).
Remarks:: This function shall not participate in overload resolution unless

is_simd_flag_type_v<Flags> is true, and

U is a vectorizable type.

9.3.5

Subscript operators

[parallel.simd.subscr]

reference operator[](size_t i);

Requires:: i < size().
Returns:: A reference (see 9.3.2) referring to the i-th element.
Throws:: Nothing.

value_type operator[](size_t i) const;

Requires:: i < size().
Returns:: The value of the i-th element.
Throws:: Nothing.

9.3.6

Unary operators

[parallel.simd.unary]

Effects in this subclause are applied as unary element-wise operations.

simd& operator++();

Effects:: Increments every element by one.
Returns:: *this.
Throws:: Nothing.

simd operator++(int);

Effects:: Increments every element by one.
Returns:: A copy of *this before incrementing.
Throws:: Nothing.

simd& operator--();

Effects:: Decrements every element by one.
Returns:: *this.
Throws:: Nothing.

simd operator--(int);

Effects:: Decrements every element by one.
Returns:: A copy of *this before decrementing.
Throws:: Nothing.

mask_type operator!() const;

Returns:: A simd_mask object with the i-th element set to !operator[](i) for all i ∊ [0, size()).
Throws:: Nothing.

simd operator~() const;

Returns:: A simd object where each bit is the inverse of the corresponding bit in *this.
Throws:: Nothing.
Remarks:: This operator shall not participate in overload resolution unless T is an integral type.

simd operator+() const;

Returns:: *this.
Throws:: Nothing.

simd operator-() const;

Returns:: A simd object where the i-th element is initialized to -operator[](i) for all i ∊ [0, size()).
Throws:: Nothing.

9.4

Non-member operations

[parallel.simd.nonmembers]

9.4.1

Binary operators

[parallel.simd.binary]


friend simd operator+(const simd& lhs, const simd& rhs);
friend simd operator-(const simd& lhs, const simd& rhs);
friend simd operator*(const simd& lhs, const simd& rhs);
friend simd operator/(const simd& lhs, const simd& rhs);
friend simd operator%(const simd& lhs, const simd& rhs);
friend simd operator&(const simd& lhs, const simd& rhs);
friend simd operator|(const simd& lhs, const simd& rhs);
friend simd operator^(const simd& lhs, const simd& rhs);
friend simd operator<<(const simd& lhs, const simd& rhs);
friend simd operator>>(const simd& lhs, const simd& rhs);

Returns:: A simd object initialized with the results of the element-wise application of the indicated operator.
Throws:: Nothing.
Remarks:: Each of these operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.


friend simd operator<<(const simd& v, int n);
friend simd operator>>(const simd& v, int n);

Returns:: A simd object where the i-th element is initialized to the result of applying the indicated operator to v[i] and n for all i ∊ [0, size()).
Throws:: Nothing.
Remarks:: These operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.

9.4.2

Compound assignment

[parallel.simd.cassign]


friend simd& operator+=(simd& lhs, const simd& rhs);
friend simd& operator-=(simd& lhs, const simd& rhs);
friend simd& operator*=(simd& lhs, const simd& rhs);
friend simd& operator/=(simd& lhs, const simd& rhs);
friend simd& operator%=(simd& lhs, const simd& rhs);
friend simd& operator&=(simd& lhs, const simd& rhs);
friend simd& operator|=(simd& lhs, const simd& rhs);
friend simd& operator^=(simd& lhs, const simd& rhs);
friend simd& operator<<=(simd& lhs, const simd& rhs);
friend simd& operator>>=(simd& lhs, const simd& rhs);
friend simd& operator<<=(simd& lhs, int n);
friend simd& operator>>=(simd& lhs, int n);

Effects:: These operators perform the indicated binary element-wise operation.
Returns:: lhs.
Throws:: Nothing.
Remarks:: These operators shall not participate in overload resolution unless the indicated operator can be applied to objects of type value_type.

9.4.3

Compare operators

[parallel.simd.comparison]


friend mask_type operator==(const simd&, const simd&);
friend mask_type operator!=(const simd&, const simd&);
friend mask_type operator>=(const simd&, const simd&);
friend mask_type operator<=(const simd&, const simd&);
friend mask_type operator>(const simd&, const simd&);
friend mask_type operator<(const simd&, const simd&);

Returns:: A simd_mask object initialized with the results of the element-wise application of the indicated operator.
Throws:: Nothing.

9.4.4

Reductions

[parallel.simd.reductions]

In this subclause, BinaryOperation shall be a binary element-wise operation.


template<class T, class Abi, class BinaryOperation = plus<>>
T reduce(const simd<T, Abi>& x, BinaryOperation binary_op = {});

Requires:: binary_op shall be callable with two arguments of type T returning T, or callable with two arguments of type simd<T, A1> returning simd<T, A1> for every A1 that is an ABI tag type.
Returns:: GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ [0, size()).
Throws:: Any exception thrown from binary_op.


template<class M, class V, class BinaryOperation>
typename V::value_type reduce(const const_where_expression<M, V>& x, typename V::value_type identity_element,
                              BinaryOperation binary_op = {});

Requires:: binary_op shall be callable with two arguments of type T returning T, or callable with two arguments of type simd<T, A1> returning simd<T, A1> for every A1 that is an ABI tag type. The results of binary_op(identity_element, x) and binary_op(x, identity_element) shall be equal to x for all finite values x representable by V::value_type.
Returns:: If none_of(x.mask), returns identity_element. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Any exception thrown from binary_op.


template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, plus<> binary_op);

Returns:: If none_of(x.mask), returns 0. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.


template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, multiplies<> binary_op);

Returns:: If none_of(x.mask), returns 1. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.


template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_and<> binary_op);

Requires:: is_integral_v<V::value_type> is true.
Returns:: If none_of(x.mask), returns ~V::value_type(). Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.


template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_or<> binary_op);
template<class M, class V>
typename V::value_type reduce(const const_where_expression<M, V>& x, bit_xor<> binary_op);

Requires:: is_integral_v<V::value_type> is true.
Returns:: If none_of(x.mask), returns 0. Otherwise, returns GENERALIZED_SUM(binary_op, x.data[i], ...) for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.

template<class T, class Abi> T hmin(const simd<T, Abi>& x);

Returns:: The value of an element x[j] for which x[j] <= x[i] for all i ∊ [0, size()).
Throws:: Nothing.

template<class T, class V> typename V::value_type hmin(const const_where_expression<M, V>& x);

Returns:: If none_of(x.mask), the return value is numeric_limits<V::value_type>::max(). Otherwise, returns the value of an element x.data[j] for which x.mask[j] == true and x.data[j] <= x.data[i] for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.

template<class T, class Abi> T hmax(const simd<T, Abi>& x);

Returns:: The value of an element x[j] for which x[j] >= x[i] for all i ∊ [0, size()).
Throws:: Nothing.

template<class T, class V> typename V::value_type hmax(const const_where_expression<M, V>& x);

Returns:: If none_of(x.mask), the return value is numeric_limits<V::value_type>::lowest(). Otherwise, returns the value of an element x.data[j] for which x.mask[j] == true and x.data[j] >= x.data[i] for all i ∊ {j ∊ ℕ₀ ∣ j < M::size() ⋀ mask[j] }.
Throws:: Nothing.

9.4.5

Casts

[parallel.simd.casts]

template<class T, class U, class Abi> see below simd_cast(const simd<U, Abi>& x)

Returns:: A simd object with the i-th element initialized to static_cast<To>(x[i]) for all i ∊ [0, size()).
Throws:: Nothing.
Remarks:: The function shall not participate in overload resolution unless

every possible value of type U can be represented with type To, and

either

is_simd_v<T> is false, or

T::size() == simd<U, Abi>::size() is true.

template<class T, class U, class Abi> see below static_simd_cast(const simd<U, Abi>& x);

Returns:: A simd object with the i-th element initialized to static_cast<To>(x[i]) for all i ∊ [0, size()).
Throws:: Nothing.
Remarks:: The function shall not participate in overload resolution unless either

is_simd_v<T> is false, or

T::size() == simd<U, Abi>::size() is true.


template<class T, class Abi>
fixed_size_simd<T, simd_size_v<T, Abi>> to_fixed_size(const simd<T, Abi>& x) noexcept;
template<class T, class Abi>
fixed_size_simd_mask<T, simd_size_v<T, Abi>> to_fixed_size(const simd_mask<T, Abi>& x) noexcept;

Returns:: A data-parallel object with the i-th element initialized to x[i] for all i ∊ [0, size()).


template<class T, int N> native_simd<T> to_native(const fixed_size_simd<T, N>& x) noexcept;
template<class T, int N> native_simd_mask<T> to_native(const fixed_size_simd_mask<T, N>& x) noexcept;

Returns:: A data-parallel object with the i-th element initialized to x[i] for all i ∊ [0, size()).
Remarks:: These functions shall not participate in overload resolution unless simd_size_v<T, simd_abi::native<T>> == N is true.


template<class T, int N> simd<T> to_compatible(const fixed_size_simd<T, N>& x) noexcept;
template<class T, int N> simd_mask<T> to_compatible(const fixed_size_simd_mask<T, N>& x) noexcept;

Returns:: A data-parallel object with the i-th element initialized to x[i] for all i ∊ [0, size()).
Remarks:: These functions shall not participate in overload resolution unless simd_size_v<T, simd_abi::compatible<T>> == N is true.


template<size_t... Sizes, class T, class Abi>
  tuple<simd<T, simd_abi::deduce_t<T, Sizes>>...>
    split(const simd<T, Abi>& x);
template<size_t... Sizes, class T, class Abi>
  tuple<simd_mask<T, simd_abi::deduce_t<T, Sizes>>...>
    split(const simd_mask<T, Abi>& x);

Returns:: A tuple of data-parallel objects with the i-th simd/simd_mask element of the j-th tuple element initialized to the value of the element x with index i + sum of the first j values in the Sizes pack.
Remarks:: These functions shall not participate in overload resolution unless the sum of all values in the Sizes pack is equal to simd_size_v<T, Abi>.


template<class V, class Abi>
  array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
    split(const simd<typename V::value_type, Abi>& x);
template<class V, class Abi>
  array<V, simd_size_v<typename V::value_type, Abi> / V::size()>
    split(const simd_mask<typename V::value_type, Abi>& x);

Returns:: An array of data-parallel objects with the i-th simd/simd_mask element of the j-th array element initialized to the value of the element in x with index i + j * V::size().
Remarks:: These functions shall not participate in overload resolution unless

simd_size_v<typename V::value_type, Abi> is an integral multiple of V::size(), and

for the overload with a simd parameter is_simd_v<V> is true, for the overload with a simd_mask parameter is_simd_mask_v<V> is true.


template<class T, class... Abis>
simd<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd<T, Abis>&... xs);
template<class T, class... Abis>
simd_mask<T, simd_abi::deduce_t<T, (simd_size_v<T, Abis> + ...)>> concat(const simd_mask<T, Abis>&... xs);

Returns:: A data-parallel object initialized with the concatenated values in the xs pack of data-parallel objects: The i-th simd/simd_mask element of the j-th parameter in the xs pack is copied to the return value's element with index i + the sum of the width of the first j parameters in the xs pack.

9.4.6

Algorithms

[parallel.simd.alg]

template<class T, class Abi> simd<T, Abi> min(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;

Returns:: The result of the element-wise application of std::min(a[i], b[i]) for all i ∊ [0, size()).

template<class T, class Abi> simd<T, Abi> max(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;

Returns:: The result of the element-wise application of std::max(a[i], b[i]) for all i ∊ [0, size()).


template<class T, class Abi>
pair<simd<T, Abi>, simd<T, Abi>> minmax(const simd<T, Abi>& a, const simd<T, Abi>& b) noexcept;

Returns:: A pair initialized with

the result of element-wise application of std::min(a[i], b[i]) for all i ∊ [0, size()) in the first member, and

the result of element-wise application of std::max(a[i], b[i]) for all i ∊ [0, size()) in the second member.


template<class T, class Abi> simd<T, Abi>
clamp(const simd<T, Abi>& v, const simd<T, Abi>& lo, const simd<T, Abi>& hi);

Requires:: No element in lo shall be greater than the corresponding element in hi.
Returns:: The result of element-wise application of std::clamp(v[i], lo[i], hi[i]) for all i ∊ [0, size()).

9.4.7

Math library

[parallel.simd.math]

For each set of overloaded functions within <cmath>, there shall be additional overloads sufficient to ensure that if any argument corresponding to a double parameter has type simd<T, Abi>, where is_floating_point_v<T> is true, then:

All arguments corresponding to double parameters shall be convertible to simd<T, Abi>.
All arguments corresponding to double* parameters shall be of type simd<T, Abi>*.
All arguments corresponding to parameters of integral type U shall be convertible to fixed_size_simd<U, simd_size_v<T, Abi>>.
All arguments corresponding to U*, where U is integral, shall be of type fixed_size_simd<U, simd_size_v<T, Abi>>*.
If the corresponding return type is double, the return type of the additional overloads is simd<T, Abi>. Otherwise, if the corresponding return type is bool, the return type of the additional overload is simd_mask<T, Abi>. Otherwise, the return type is fixed_size_simd<R, simd_size_v<T, Abi>>, with R denoting the corresponding return type.

It is unspecified whether a call to these overloads with arguments that are all convertible to simd<T, Abi> but are not of type simd<T, Abi> is well-formed.

Each function overload produced by the above rules applies the indicated <cmath> function element-wise. The results per element are not required to be bitwise equal to the application of the function which is overloaded for the element type.

The behavior is undefined if a domain, pole, or range error occurs when the input argument(s) are applied to the indicated <cmath> function.

If abs is called with an argument of type simd<X, Abi> for which is_unsigned_v<X> is true, the program is ill-formed.

9.5

Class template `simd_mask`

[parallel.simd.mask.class]

9.5.1

Class template `simd_mask` overview

[parallel.simd.mask.overview]

      
template<class T, class Abi> class simd_mask {
public:
  using value_type = bool;
  using reference = see below;
  using simd_type = simd<T, Abi>;
  using abi_type = Abi;

  static constexpr size_t size() noexcept;

  simd_mask() = default;

  // broadcast constructor
  explicit simd_mask(value_type) noexcept;

  // implicit type conversion constructor
  template<class U>
    simd_mask(const simd_mask<U, simd_abi::fixed_size<size()>>&) noexcept;

  // load constructor
  template<class Flags> simd_mask(const value_Type* mem, Flags);

  // 9.5.3, Copy functions
  template<class Flags> void copy_from(const value_type* mem, Flags);
  template<class Flags> void copy_to(value_type* mem, Flags);

  // 9.5.4, Subscript operators
  reference operator[](size_t);
  value_type operator[](size_t) const;

  // 9.5.5, Unary operators
  simd_mask operator!() const noexcept;
  
  // 9.6.1, Binary operators
  friend simd_mask operator&&(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator||(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator&(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator|(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator^(const simd_mask&, const simd_mask&) noexcept;

  // 9.6.2, Compound assignment
  friend simd_mask& operator&=(simd_mask&, const simd_mask&) noexcept;
  friend simd_mask& operator|=(simd_mask&, const simd_mask&) noexcept;
  friend simd_mask& operator^=(simd_mask&, const simd_mask&) noexcept;

  // 9.6.3, Comparisons
  friend simd_mask operator==(const simd_mask&, const simd_mask&) noexcept;
  friend simd_mask operator!=(const simd_mask&, const simd_mask&) noexcept;
};

The class template simd_mask is a data-parallel type with the element type bool. The width of a given simd_mask specialization is a constant expression, determined by the template parameters. Specifically, simd_mask<T, Abi>::size() == simd<T, Abi>::size().

Every specialization of simd_mask shall be a complete type. The specialization simd_mask<T, Abi> is supported if T is a vectorizable type and

Abi is simd_abi::scalar, or
Abi is simd_abi::fixed_size<N>, with N constrained as defined in (9.2.1).

If Abi is an extended ABI tag, it is implementation-defined whether simd_mask<T, Abi> is supported. [ Note: The intent is for implementations to decide on the basis of the currently targeted system. — end note ] If simd_mask<T, Abi> is not supported, the specialization shall have a deleted default constructor, deleted destructor, deleted copy constructor, and deleted copy assignment.

Default initialization performs no intialization of the elements; value-initialization initializes each element with false. [ Note: Thus, default initialization leaves the elements in an indeterminate state. — end note ]

static constexpr size_t size() noexcept;

Returns:: The width of simd<T, Abi>.

Implementations should enable explicit conversion from and to implementation-defined types. This adds one or more of the following declarations to class simd_mask:

        
explicit operator implementation-defined() const;
explicit simd_mask(const implementation-defined& init) const;

The member type reference has the same interface as simd<T, Abi>::reference, except its value_type is bool. (9.3.2)

9.5.2

Constructors

[parallel.simd.mask.ctor]

explicit simd_mask(value_type x) noexcept

Effects:: Constructs an object with each element initialized to x.

template<class U> simd_mask(const simd_mask<U, simd_abi::fixed_size<size()>>& x) noexcept;

Effects:: Constructs an object of type simd_mask where the i-th element equals x[i] for all i ∊ [0, size()).
Remarks:: This constructor shall not participate in overload resolution unless abi_type is simd_abi::fixed_size<size()>.

template<class Flags> simd_mask(const value_type* mem, Flags);

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Constructs an object where the i-th element is initialized to mem[i] for all i ∊ [0, size()).
Remarks:: This constructor shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.

9.5.3

Copy functions

[parallel.simd.mask.copy]

template<class Flags> void copy_from(const value_type* mem, Flags);

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Replaces the elements of the simd_mask object such that the i-th element is replaced with mem[i] for all i ∊ [0, size()).
Remarks:: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.

template<class Flags> void copy_to(value_type* mem, Flags);

Requires:: If the template parameter Flags is vector_aligned_tag, mem shall point to storage aligned by memory_alignment_v<simd_mask>. If the template parameter Flags is overaligned_tag<N>, mem shall point to storage aligned by N. If the template parameter Flags is element_aligned_tag, mem shall point to storage aligned by alignof(U). [mem, mem + size()) is a valid range.
Effects:: Copies all simd_mask elements as if mem[i] = operator[](i) for all i ∊ [0, size()).
Remarks:: This function shall not participate in overload resolution unless is_simd_flag_type_v<Flags> is true.

9.5.4

Subscript operators

[parallel.simd.mask.subscr]

reference operator[](size_t i);

Requires:: i < size().
Returns:: A reference (see 9.3.2) referring to the i-th element.
Throws:: Nothing.

value_type operator[](size_t i) const;

Requires:: i < size().
Returns:: The value of the i-th element.
Throws:: Nothing.

9.5.5

Unary operators

[parallel.simd.mask.unary]

simd_mask operator!() const noexcept;

Returns:: The result of the element-wise appliation of operator!.

9.6

Non-member operations

[parallel.simd.mask.nonmembers]

9.6.1

Binary operators

[parallel.simd.mask.binary]


friend simd_mask operator&&(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator||(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator& (const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator| (const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator^ (const simd_mask&, const simd_mask&) noexcept;

Returns:: A simd_mask object initialized with the results of the element-wise appliation of the indicated operator.

9.6.2

Compound assignment

[parallel.simd.mask.cassign]


friend simd_mask& operator&=(simd_mask& lhs, const simd_mask& rhs) noexcept;
friend simd_mask& operator|=(simd_mask& lhs, const simd_mask& rhs) noexcept;
friend simd_mask& operator^=(simd_mask& lhs, const simd_mask& rhs) noexcept;

Effects:: These operators perform the indicated binary element-wise operation.
Returns:: lhs.

9.6.3

Comparisons

[parallel.simd.mask.comparison]


friend simd_mask operator==(const simd_mask&, const simd_mask&) noexcept;
friend simd_mask operator!=(const simd_mask&, const simd_mask&) noexcept;

Returns:: An object initialized with the results of the element-wise application of the indicated operator.

9.6.4

Reductions

[parallel.simd.mask.reductions]

template<class T, class Abi> bool all_of(const simd_mask<T, Abi>& k) noexcept;

Returns:: true if all boolean elements in k are true, false otherwise.

template<class T, class Abi> bool any_of(const simd_mask<T, Abi>& k) noexcept;

Returns:: true if at least one boolean element in k is true, false otherwise.

template<class T, class Abi> bool none_of(const simd_mask<T, Abi>& k) noexcept;

Returns:: true if none of the one boolean elements in k is true, false otherwise.

template<class T, class Abi> bool some_of(const simd_mask<T, Abi>& k) noexcept;

Returns:: true if at least one of the one boolean elements in k is true and at least one of the boolean elements in k is false, false otherwise.

template<class T, class Abi> int popcount(const simd_mask<T, Abi>& k) noexcept;

Returns:: The number of boolean elements in k that are true.

template<class T, class Abi> int find_first_set(const simd_mask<T, Abi>& k);

Requires:: any_of(k) returns true.
Returns:: The lowest element index i where k[i] is true.

template<class T, class Abi> int find_last_set(const simd_mask<T, Abi>& k);

Requires:: any_of(k) returns true.
Returns:: The greatest element index i where k[i] is true.


bool all_of(see below) noexcept;
bool any_of(see below) noexcept;
bool none_of(see below) noexcept;
bool some_of(see below) noexcept;
int popcount(see below) noexcept;

Returns:: all_of and any_of return their arguments; none_of returns the negation of its argument; some_of returns false; popcount returns the integral representation of its argument.
Remarks:: The functions shall not participate in overload resolution unless the argument is of type bool.


int find_first_set(see below) noexcept;
int find_last_set(see below) noexcept;

Requires:: The value of the argument is true.
Returns:: 0.
Remarks:: The functions shall not participate in overload resolution unless the argument is of type bool.

9.6.5

Where functions

[parallel.simd.mask.where]


template<class T, class Abi>
where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type& k,
                                                        simd<T, Abi>& v) noexcept;
template<class T, class Abi>
const_where_expression<simd_mask<T, Abi>, simd<T, Abi>> where(const typename simd<T, Abi>::mask_type& k,
                                                              const simd<T, Abi>& v) noexcept;
template<class T, class Abi>
where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abi>>& k,
                                                             simd_mask<T, Abi>& v) noexcept;
template<class T, class Abi>
const_where_expression<simd_mask<T, Abi>, simd_mask<T, Abi>> where(const nodeduce_t<simd_mask<T, Abi>>& k,
                                                                   const simd_mask<T, Abi>& v) noexcept;

Returns:: An object (9.2.3) with mask and data initialized with k and v respectively.


template<class T> where_expression<bool T> where(see below k, T& v) noexcept;
template<class T>
const_where_expression<bool, T> where(see below k, const T& v) noexcept;

Remarks:: The functions shall not participate in overload resolution unless

T is neither a simd nor a simd_mask specialization, and

the first argument is of type bool.
Returns:: An object (9.2.3) with mask and data initialized with k and v respectively.

Doc. No.	Title	Primary Section	Macro Name	Value	Header
P0155R0	Task Block R5	8	`__cpp_lib_experimental_-` `parallel_task_block`	201711	`<experimental/exception_list>` `<experimental/task_block>`
P0076R4	Vector and Wavefront Policies	6.2	`__cpp_lib_experimental_-` `execution_vector_policy`	201711	`<experimental/algorithm>` `<experimental/execution>`
P0075R2	Template Library for Parallel For Loops	7.2.2	`__cpp_lib_experimental_-` `parallel_for_loop`	201711	`<experimental/algorithm>`
P0214R9	Data-Parallel Vector Types & Operations	9	`__cpp_lib_experimental_-parallel_simd`	201803	`<experimental/simd>`

Function	Reduction Identity	Combiner Operation
`reduction_plus`	`T()`	`x + y`
`reduction_multiplies`	`T(1)`	`x * y`
`reduction_bit_and`	`(~T())`	`X & y`
`reduction_bit_or`	`T()`	`x \| y`
`reduction_bit_xor`	`T()`	`x ^ y`
`reduction_min`	`var`	`min(x, y)`
`reduction_max`	`var`	`max(x, y)`

Working Draft, Technical Specification for C++ Extensions for Parallelism Version 2