1. Revision History
R1 => R2
- Improved examples.
- Added more background on the naming choice and on the motivation for providing integer overloads.
- Added implementation experience.
- Improved wording to fix some errors and to provide more context for the required changes.
- Added revision history.
R0 => R1
- Renamed simd_chunk_n to simd_chunk.
- Wording changes to give each overload its own clauses.
2. Motivation
The simd_split function takes a basic_simd object and breaks it down into
a tuple of as many objects of the requested chunk type T
as it can, plus possibly one remainder
object. The following example illustrates a practical use of this, where a large simd
value is broken into many smaller pieces, each of which fits a specific hardware
register type (possibly with a hardware-specific ABI):
    using Avx2RegisterType = simd<float, 8>; // Or with a hardware ABI instead.
    simd<float, 19> x;
    auto t = simd_split<Avx2RegisterType>(x);
    // get<0>(t) will be of type simd<float, 8>
    // get<1>(t) will be of type simd<float, 8>
    // get<2>(t) will be of type simd<float, 3> - the remainder
    // Note that each element is now the right size to pass to an AVX2 intrinsic.
If the size of the original type is an exact multiple of the size of the chunk type T,
then an array is
returned instead of a tuple of differently sized simd
objects.
The behaviour of simd_split
is virtually identical to that of std::ranges::chunk_view
and std::views::chunk. They take a view and
a number n and produce a range of views (the chunks) of the original view, such
that each chunk, except maybe the last one, has the size n.
In contrast, simd_split
behaves differently from the similarly
named std::views::split. The ranges version of split
takes an input range and a
delimiter value, and generates a range of views split on the delimiter. For
example, the string "This,is,a,list", when split on the comma value, would
generate a range containing 4 views: "This", "is", "a", "list".
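The difference between the two behaviours can be sketched with plain strings. The helpers below, chunk_by_size and split_on, are hypothetical names written only for this illustration. They model the behaviour described above using std::string and are not the ranges adaptors themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cut s into consecutive pieces of n characters.
// Every piece has size n, except possibly the last one (the remainder).
inline std::vector<std::string> chunk_by_size(const std::string& s, std::size_t n) {
    std::vector<std::string> out;
    if (n == 0) return out;  // nothing sensible to do for a zero chunk size
    for (std::size_t pos = 0; pos < s.size(); pos += n) {
        out.push_back(s.substr(pos, n));  // substr clamps at the end of s
    }
    return out;
}

// Hypothetical helper: cut s at every occurrence of delim.
// The pieces can have any size; the delimiter itself is dropped.
inline std::vector<std::string> split_on(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = s.find(delim, start);
        if (end == std::string::npos) {
            out.push_back(s.substr(start));
            break;
        }
        out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

inline void example_chunk_vs_split() {
    const std::string s = "This,is,a,list";
    // Split: content-driven boundaries -> "This", "is", "a", "list"
    assert(split_on(s, ',').size() == 4);
    // Chunk: size-driven boundaries -> "This,", "is,a,", "list"
    assert(chunk_by_size(s, 5).size() == 3);
}
```

The simd function discussed in this paper cuts purely by size, with an optional remainder, which is the chunk-style behaviour rather than the split-style one.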
We propose that the simd_split
function is renamed to simd_chunk
to make its
behavior consistent with the existing range/view counterparts. We did not
consider any alternative names, since they would introduce yet another name for a
behavior that already exists, and the intent is to allow the behavior of the
existing term chunk to be reused in the context of simd.
A common use case for simd_split/simd_chunk
is to break a larger simd
object into smaller native-sized pieces in order to call target-specific
intrinsics, as illustrated in the first example in this paper. For that use case
the behaviour of simd_chunk
is sufficient. Another common use case, however, is
an algorithm that requires a simd
to be broken down into pieces of a
particular size, where only the size is of interest and not other details such as the
ABI. With the current behavior the user would have to encode the size in a
special type created just for that purpose; it would be more convenient to
provide overloads which take the size directly. The difference is illustrated
here:
As existing | With new overloads |
---|---|
|
|
The overloaded version lets the user pass the size directly, without having to construct a chunk type merely as a vehicle to convey the size indirectly. Using the size directly makes the code simpler and more obvious.
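As a minimal sketch of the two call styles, the toy model below mimics the chunking behaviour with ordinary arrays. Everything in it is an assumption made for illustration: toy_simd stands in for basic_simd, toy_resize_t for resize_simd_t, and toy_piece, toy_chunk_impl and toy_chunk are hypothetical names. It is not the proposed library code, only a model of how a size-taking overload can forward to a type-taking one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for basic_simd: a fixed-size bundle of values.
template <class T, std::size_t Size>
struct toy_simd {
    using value_type = T;
    std::array<T, Size> data{};
    static constexpr std::size_t size() { return Size; }
};

// Hypothetical stand-in for resize_simd_t: same element type, new size.
template <std::size_t N, class V>
using toy_resize_t = toy_simd<typename V::value_type, N>;

// Copy Len consecutive elements of x, starting at index first.
template <std::size_t Len, class T, std::size_t Size>
toy_simd<T, Len> toy_piece(const toy_simd<T, Size>& x, std::size_t first) {
    assert(first + Len <= Size);
    toy_simd<T, Len> out;
    for (std::size_t i = 0; i < Len; ++i) out.data[i] = x.data[first + i];
    return out;
}

// J... enumerates the full chunks 0 .. N-1.
template <class V, class T, std::size_t Size, std::size_t... J>
auto toy_chunk_impl(const toy_simd<T, Size>& x, std::index_sequence<J...>) {
    constexpr std::size_t N = sizeof...(J);        // number of full chunks
    constexpr std::size_t Rem = Size % V::size();  // size of the remainder
    if constexpr (Rem == 0) {
        // Exact fit: an array of N equally sized pieces.
        return std::array<V, N>{{toy_piece<V::size()>(x, J * V::size())...}};
    } else {
        // Otherwise: N full pieces followed by one smaller remainder piece.
        return std::make_tuple(toy_piece<V::size()>(x, J * V::size())...,
                               toy_piece<Rem>(x, N * V::size()));
    }
}

// "As existing": the chunk size is conveyed indirectly through a type V.
template <class V, class T, std::size_t Size>
auto toy_chunk(const toy_simd<T, Size>& x) {
    return toy_chunk_impl<V>(x, std::make_index_sequence<Size / V::size()>{});
}

// "With new overloads": the chunk size is given directly and forwarded.
template <std::size_t N, class T, std::size_t Size>
auto toy_chunk(const toy_simd<T, Size>& x) {
    return toy_chunk<toy_resize_t<N, toy_simd<T, Size>>>(x);
}

// Both spellings produce the same pieces; only the call site differs.
inline bool example_overloads() {
    toy_simd<float, 19> x;
    for (std::size_t i = 0; i < x.size(); ++i) x.data[i] = static_cast<float>(i);

    auto by_type = toy_chunk<toy_simd<float, 8>>(x);  // size conveyed via a type
    auto by_size = toy_chunk<8>(x);                   // size given directly

    static_assert(std::is_same_v<decltype(by_type), decltype(by_size)>);
    // Two full pieces of 8 elements plus a remainder piece of 3 elements.
    return std::tuple_size_v<decltype(by_size)> == 3 &&
           std::get<2>(by_size).size() == 3 &&
           std::get<0>(by_type).data[0] == std::get<0>(by_size).data[0] &&
           std::get<2>(by_size).data[0] == 16.0f;
}
```

In this model the size-taking overload contains no chunking logic of its own. It only builds the resized type and delegates, which is why the two call styles cannot drift apart.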
3. Implementation experience
The rename made no difference to the implementation, but does make the intent of those functions more obvious to those already familiar with the uses of the words "split" and "chunk" within the ranges libraries.
The extra overloads make places where chunking is performed for algorithmic reasons (rather than hardware-related type reasons) more obvious and readable. There is no need for the user’s code to introduce new sized-types for chunking.
The implementation of the new overloads is trivial and was done for Intel's internal implementation of simd.
As the examples above illustrate, the overloads convert the incoming type to a new size and call
the existing simd_chunk
overloads.
4. Wording
The wording diff is against the current C++ working draft.
4.1. Modify [simd.syn]
Add new simd_chunk
overloads immediately after the existing ones.
      template<class T, class Abi, contiguous_iterator I, sized_sentinel_for<I> S, class... Flags>
        requires indirectly_writable<I, T>
        constexpr void simd_partial_store(const basic_simd<T, Abi>& v, I first, S last,
                                          simd_flags<Flags...> f = {});
      template<class T, class Abi, contiguous_iterator I, sized_sentinel_for<I> S, class... Flags>
        requires indirectly_writable<I, T>
        constexpr void simd_partial_store(const basic_simd<T, Abi>& v, I first, S last,
                                          const typename basic_simd<T, Abi>::mask_type& mask,
                                          simd_flags<Flags...> f = {});

      // [simd.creation], basic_simd and basic_simd_mask creation
      template<class V, class Abi>
    -   constexpr auto simd_split(const basic_simd<typename V::value_type, Abi>& x) noexcept;
    +   constexpr auto simd_chunk(const basic_simd<typename V::value_type, Abi>& x) noexcept;
      template<class M, class Abi>
    -   constexpr auto simd_split(const basic_simd_mask<mask-element-size<M>, Abi>& x) noexcept;
    +   constexpr auto simd_chunk(const basic_simd_mask<mask-element-size<M>, Abi>& x) noexcept;
    + template<size_t N, class T, class Abi>
    +   constexpr auto simd_chunk(const basic_simd<T, Abi>& x) noexcept;
    + template<size_t N, size_t Bytes, class Abi>
    +   constexpr auto simd_chunk(const basic_simd_mask<Bytes, Abi>& x) noexcept;
4.2. Modify [simd.creation]
      template<class T, class Abi>
    -   constexpr auto simd_split(const basic_simd<typename T::value_type, Abi>& x) noexcept;
    +   constexpr auto simd_chunk(const basic_simd<typename T::value_type, Abi>& x) noexcept;
      template<class T, class Abi>
    -   constexpr auto simd_split(const basic_simd_mask<mask-element-size<T>, Abi>& x) noexcept;
    +   constexpr auto simd_chunk(const basic_simd_mask<mask-element-size<T>, Abi>& x) noexcept;

Constraints:
- For the first overload, T is an enabled specialization of basic_simd. If basic_simd<typename T::value_type, Abi>::size() % T::size() is not 0, then resize_simd_t<basic_simd<typename T::value_type, Abi>::size() % T::size(), T> is valid and denotes a type.
- For the second overload, T is an enabled specialization of basic_simd_mask. If basic_simd_mask<mask-element-size<T>, Abi>::size() % T::size() is not 0, then resize_simd_t<basic_simd_mask<mask-element-size<T>, Abi>::size() % T::size(), T> is valid and denotes a type.

Let N be x.size() / T::size().

Returns:
- If x.size() % T::size() == 0 is true, an array<T, N> with the i-th basic_simd or basic_simd_mask element of the j-th array element initialized to the value of the element in x with index i + j * T::size().
- Otherwise, a tuple of N objects of type T and one object of type resize_simd_t<x.size() % T::size(), T>. The i-th basic_simd or basic_simd_mask element of the j-th tuple element of type T is initialized to the value of the element in x with index i + j * T::size(). The i-th basic_simd or basic_simd_mask element of the N-th tuple element is initialized to the value of the element in x with index i + N * T::size().

    + template<size_t N, class T, class Abi>
    +   constexpr auto simd_chunk(const basic_simd<T, Abi>& x) noexcept;

Effects: Equivalent to:

    return simd_chunk<resize_simd_t<N, basic_simd<T, Abi>>>(x);

    + template<size_t N, size_t Bytes, class Abi>
    +   constexpr auto simd_chunk(const basic_simd_mask<Bytes, Abi>& x) noexcept;

Effects: Equivalent to:

    return simd_chunk<resize_simd_t<N, basic_simd_mask<Bytes, Abi>>>(x);
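
The index rule used in the Returns clause (element i of piece j is taken from source index i + j * chunk size, with the remainder piece starting at N * chunk size) can be checked with a small runtime sketch. The function name chunk_source_indices is hypothetical and exists only for this illustration; it computes indices and does not touch any simd type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for a source of `size` elements cut into pieces of
// `chunk` elements, list the source index that feeds element i of piece j.
inline std::vector<std::vector<std::size_t>> chunk_source_indices(std::size_t size,
                                                                  std::size_t chunk) {
    std::vector<std::vector<std::size_t>> pieces;
    if (chunk == 0) return pieces;          // no valid chunking for size 0
    const std::size_t n = size / chunk;     // N in the wording: number of full pieces
    const std::size_t rem = size % chunk;   // size of the remainder piece, if any
    for (std::size_t j = 0; j < n; ++j) {
        std::vector<std::size_t> piece;
        for (std::size_t i = 0; i < chunk; ++i) piece.push_back(i + j * chunk);
        pieces.push_back(piece);
    }
    if (rem != 0) {
        std::vector<std::size_t> piece;
        for (std::size_t i = 0; i < rem; ++i) piece.push_back(i + n * chunk);
        pieces.push_back(piece);
    }
    return pieces;
}

inline void example_indices() {
    // 19 elements in chunks of 8: two full pieces and a remainder of 3.
    const auto p = chunk_source_indices(19, 8);
    assert(p.size() == 3);
    assert(p[1].front() == 8 && p[1].back() == 15);
    assert(p[2].size() == 3 && p[2].front() == 16 && p[2].back() == 18);
    // 16 elements in chunks of 8: exact fit, no remainder piece.
    assert(chunk_source_indices(16, 8).size() == 2);
}
```

Read in order, the pieces cover every source index exactly once, so concatenating them reproduces the original sequence.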