1. Motivation
ISO/IEC 19570:2018 (1) introduced data-parallel types to the C++ Extensions
for Parallelism TS. [P1928R15] is proposing to make that a part of C++ IS.
Intel supports the concept of a standard interface to SIMD instruction
capabilities and we have made extra suggestions for other APIs and facilities for
in document [P2638R0]. One of the extra features we suggested was the
ability to work with interleaved complex values. It is now becoming common
for processor instruction sets to include native support for these operations
(e.g., Intel AVX512-FP16, ARM Neon) and we think that allowing
to
directly access these features is advantageous.
This document gives more details about the motivation for supporting complex-values, the different storage and implementation alternatives, and some suggestions for a suitable API.
2. Background
Complex-valued mathematics is widely used in scientific computing and many types
of engineering, with applications including physical simulations, wireless
signal processing, media processing, and much more besides. C++ already supports
complex-valued operations using the
template available in the
header file and C supports complex values by using the
keyword. C++ supports floating point elements, including
(C++23),
,
or
,
though in practice compilers permit integer complex values too. Such is the
importance of interleaved complex values that some mainstream processor vendors
now provide native hardware support for complex-value SIMD instructions (e.g.,
Intel AVX512-FP16, ARM Neon v8.3).
Any complex value is represented as a pair of values, one for the real component and one for the imaginary component. In C and C++ a complex value is a pair of two values of the appropriate data type, allocated to a single unit containing 2 elements which are stored contiguously. This is illustrated in the figure below. This storage layout is used in other languages too, allowing for easy data interchange between software written in different languages, or with comparable user-defined types. This format may also be used for hardware accelerators or interfaces too:
When many complex values are stored together, such as in a C-style array, a C++
or a C++
, then the real and imaginary elements are
interleaved in memory:
Many applications, libraries, programming languages and interfaces between diverse software and hardware components will store data in this interleaved format, and it is treated as the default layout for blocks of complex-valued data. To improve the compute efficiency of this data format both Intel and ARM have introduced native hardware support to allow efficient SIMD operations to be performed in this data format without having to resort to reformatting the data into a more SIMD-amenable form. Where the underlying target hardware does not have native support for interleaved complex values it is straight-forward to synthesize a sequence of operations which give the same effect.
Given the popularity of interleaved complex-values as a data exchange and
compute format we propose that
should be able to directly represent
this simd format in a form exemplified as:
std :: simd < std :: complex < float > , 5 > std :: simd < std :: complex < double > , 9 >
In all these cases the fundamental element type will be a suitable complex value, and the individual components of each complex element will be worked on as a single unit, according to the rules of complex arithmetic where appropriate. This includes:
-
All numeric operators (e.g.,
,+
,/
,-
) and their derivatives (e.g., assignment operators) will operate as per their standard complex type definition. For example, the multiply operator will invoke a complex multiply for each complex element.* -
Any numeric operation will apply the operation on a per-element basis, generating a simd object of the same size as the input, but modifying the element type to be the appropriate return type. For example, when applied to a
:simd < complex < T >> -
,exp
,sin
,log
,tan
and so on will generate acos simd < complex < T >> -
orabs
will generate anorm
(i.e., real-valued SIMD equivalent).simd < T >
Note that the mathematical behavior will also be applied appropriately (e.g., if a real or imaginary component is a NaN, then the complete operation for a complex operation will also be NaN).
-
-
The
associated with asimd_mask
will have each individual mask bit refer to a complete complex element, not to sub-components of the complex elements. When thesimd < complex < T >>
is backed by a compact representation (e.g., AVX-512), then each individual bit will refer to a complete complex element. When thesimd_mask
is backed by an element mask (i.e., multiple bits filling a container of the same size as the simd element) then the size of each bit element will be twice the size of the underlying complex value type (e.g., asimd_mask
would be equivalent to something likesimd_mask < complex < _Float16 >>
, sincesimd < uint32_t >
.sizeof ( uint32_t ) == 2 * sizeof ( _Float16 ) -
Operations or functions which are invalid for complex values will be removed. This includes relational operators (e.g.,
,<
,<=
,>
) and those derived from them such, as>=
ormin
.max -
Any operation which moves or reorders elements will move entire complex values. For example, any sort of permute, resize, split, concat, insert, extract or indexing operation works on complete complex elements as though they are atomic units.
-
Reduction operations apply at the level of complex values (this follows from the previous bullet), provided the mathematical operation is valid (e.g., reduce-max would not be available).
One of the aims of
is to make it possible to write a generic
code, which can use either a scalar type or a data-parallel type
interchangeably. To this end,
provides overloads of many of the
standard functions which operate in data-parallel mode. The same should apply to
a simd of complex values; it should be possible to write an algorithm which uses
either
or
with only a type replacement,
not with major API changes. To this end
should include some member
functions, such as real or imag, to mirror those found in std::complex.
Note that an alternative way to create parallel complex values is to allow
to use simd elements, such as
.
This storage layout is called separated storage and discussion of this format is
outside the scope of this paper.
3. Overview of updates required to support std :: complex
simd elements
There are three ways in which
needs to change to accommodate
elements:
-
Modify the definition of vectorizable types and the operations on them to allow for complex elements. In common with
itself, only specializations of floating-point element types are allowed. These modifications are small in nature and don’t change the fundamentalstd :: complex
API.std :: simd -
Add a small number of methods to
to mimic those found instd :: simd
. These modifications allow astd :: complex
value to be easily inserted into source code by text or template replacement in place of astd ::: simd < complex < T >>
value without requiring a rewrite.std :: complex -
Add a set of overloads and free functions to allow
to reflect the different behaviors of complex (e.g.,simd < complex < T >>
,sin
,cos
,exp
).log
Each of these points will now be discussed in more detail.
3.1. std :: simd
base modifications
In its current definition,
allows the element type to be any
cv-unqualified arithmetic type other than
or
. The first modification is to
extend the set of allowable element types to include any
specialisation for a cv-unqualified vectorizable floating-point type.
Any simd operation or function which relies only on an element being an atomic
container which can be arbitrarily moved or duplicated as a single unit will
continue to operate as expected. For example, all the following will work on
without modification:
-
Constructors or
/copy_to
functions for loading and storing values to and from contiguous iteratorscopy_from -
Broadcast constructor
-
Index operations
-
Reordering operations, such as
,simd_split
,simd_cat
,permute
,resize
andinsert
.extract -
Masking selection operations (using a mask to include or exclude values from an operation or copy).
’s of complex values do not need to have any special behaviour since the mask
is simply a collection of boolean values, and the underlying element type makes no difference.
Constructors, including generator functions, which work on atomic units (e.g., broadcasting constructor) will work without modification. Note that the existing broadcast constructor will already permit a real-valued object to be used in a broadcast, since an individual value may be converted to a complex value with a zero imaginary:
auto x = simd < std :: complex < float >> ( 3.14f ); // [3.14 + 0i, 3.14 + 0i, ...]
Numeric operations (binary, unary, and compound-assignment) are mostly covered by [P1928R15] remark:
A simd object initialized with the results of applying the indicated operator to
and
as a binary element-wise operation
In the case of complex values this has the practical effect of ensuring that
operators will perform their complex operations without changing the definition
of
(e.g.,
will perform complex-valued
multiplication). Furthermore, [P1928R15] notes that constraints will be applied
as necessary on operators and functions:
Constraints: Application of the indicated operator to objects of type
is well-formed.
This will ensure that operators which are not permitted for complex numbers
(e.g.,
,
,
) will be removed from
the overload set, as will functions which rely on these operators (e.g.,
,
,
).
3.2. Additional complex methods for std :: simd
It is desirable for
to be source- or template-replaceable with
respect to its underlying element type. For example, given a simple code
fragment which works with
:
auto my_fn ( auto cmplx_value ) { std :: complex < float > rotator ( 0 , 1 ); return ( cmplx_value * rotator ). real (); }
In this example the function will work with a
input. Ideally
it should also work with an input of type
. For this to happen,
firstly the
must work with a scalar value, and secondly it should be
possible for the
method to be invoked on a
value.
provides a number of member functions:
-
Constructors
-
(assignment)operator = -
,operator +=
,operator -=
,operator *= operator /= -
real -
imag
The assignment operator and the compound-assignment operators are already
provided by
so no further action is required. The remaining member
functions that are required are considered in the following sub-sections.
3.2.1. Constructors
Conversion and copy methods already exist in
and can be used for
complex SIMD values. An additional constructor is needed to match the
constructor which can build a complex value from separate
real and imaginary components: constructed from real and imaginary simd values.
template < simd - floating - point VC > constexpr explicit ( see below ) basic_simd ( const VC & reals , const VC & imags = {}) noexcept ;
This constructor is constrained so that it can only be used for SIMD values with
complex elements. Like its
counterpart it can be used to build a
complex value from real components only, with insertion of zero imaginaries.
This constructor can be used as in the following example:
simd < float , 8 > reals = ...; simd < float , 8 > imaginaries = ...; // Create a real-valued complex number (i.e., zero imaginaries) simd < std :: complex < float > , 8 > realAsComplex ( reals ); // Create a complex value from real and imaginary simd inputs simd < std :: complex < float > , 8 > cmplx ( reals , imaginaries );
3.2.2. Real/imag accessors
It should be possible to separately read or write the real and imaginary
components. For example, given a simd of
values, the real
components of the simd will be extracted as a value of type
instead.
Similarly, given a
value those values can be inserted into the simd
as the real components of each respective value.
The read accessors for
come in two variants: a free function,
and a member function. These would be defined in
as follows:
// Free functions for accessing real/imaginary components template < class T , class ABI > constexpr auto real ( const basic_simd < complex < T > , ABI >& ); template < class T , class ABI > constexpr auto imag ( const basic_simd < complex < T > , ABI >& ); // Member functions of std::simd for accessing real/imaginary components. constexpr simd < T , size () > std :: simd < std :: complex < T >>:: real () const ; constexpr simd < T , size () > std :: simd < std :: complex < T >>:: imag () const ;
The write accessors for
are provided as member functions
which allow the real and imaginary components of an existing complex value to be
overwritten.
// Member functions of std::simd for accessing real/imaginary components. template < class T , class ABI > constexpr void real ( const basic_simd < T , ABI >& reals ); template < class T , class ABI > constexpr void imag ( const basic_simd < T , ABI >& imags );
The advantage of making
and
members of
is an API consistency
between
and
for both getters and setters.
has a variety of overloads for these accessors for special cases. For example:
template < class FloatingPoint > constexpr FloatingPoint real ( FloatingPoint f ); template < class Integer > constexpr double real ( Integer i ); template < class FloatingPoint > constexpr FloatingPoint imag ( FloatingPoint f ); template < class Integer > constexpr double imag ( Integer i );
In some cases, mirroring these functions into
would result in an
unwanted performance degradation as, for example, the potentially small
would be turned into a much larger
result, which
is likely not what the user intended. For
we therefore propose
that only sufficient overloads are provided to operate on
values.
3.3. Additional free functions
The free functions for
include numeric operators, such as
,
,
and
. These are already proposed in [P1928R15] and need not be considered
further.
The remaining free functions which should be specialized for
include:
| returns the real part |
---|---|
| returns the imaginary part |
| returns the magnitude of a complex number |
| returns the phase angle |
| returns the squared magnitude |
| returns the complex conjugate |
| returns the projection onto the Riemann sphere |
| constructs a complex number from magnitude and phase angle |
| complex base e exponential |
| complex natural logarithm with the branch cuts along the negative real axis |
| complex common logarithm with the branch cuts along the negative real axis |
| complex power, one or both arguments may be a complex number |
| complex square root in the range of the right half-plane |
| computes sine of a complex number |
| computes cosine of a complex number |
| computes tangent of a complex numbe |
| computes arc sine of a complex number |
| computes arc cosine of a complex number |
| computes arc tangent of a complex number |
| computes hyperbolic sine of a complex number |
| computes hyperbolic cosine of a complex number |
| computes hyperbolic tangent of a complex number |
| computes area hyperbolic sine of a complex number |
| computes area hyperbolic cosine of a complex number |
| computes area hyperbolic tangent of a complex number |
All of these functions should operate element-wise in the same way as their
scalar counterparts. Note that although most of these functions take complex
inputs and generate complex outputs, some of these functions generate
real-valued outputs (e.g.,
returns the magnitude, which is real-valued), or
accept an input which is real-valued (e.g.,
can accept a real-valued value
and return a complex value).
Unlike other math functions in
, none of the
free-functions
equivalents of
are marked as
. This matches the
behaviour of
.
In
a variety of overloads exist for each of the functions listed above, such as with
:
template < simd - type V > constexpr V conj ( const V & v ); template < class FloatingPoint > constexpr FloatingPoint conj ( FloatingPoint f ); template < class Integer > constexpr double conj ( Integer i );
If these overloads were mirrored into
then they could result in
unexpected performance degradation. For example, accepting a
and
returning a
will typically result in a much larger SIMD register
being required, or allowing an argument to be a slightly different type to what
is typical using an overload may result in an implicit conversion with an
unwanted cost. It is the intent of
to support only complex-type
operations on
, not on
. if the user wants an
overload-like behaviour with
they should handle the
appropriate conversion themselves so that the performance expectations can be
managed.
4. Implementation experience
Intel have written an example implementation of
which includes
the extensions to support interleaved complex values. We have been able to
generate efficient code both on targets with native support (e.g., AVX512_FP16),
and also on targets without native support which require synthesized code
sequences.
5. Questions for LEWG from LWG
During LWG review two queries were raised concerning design intent which need to be reconfirmed by LEWG:
-
In common with
itself, thestd :: complex
equivalents of thesimd
functions are not marked asstd :: complex
. This design is different to the bulk ofnoexcept
where functions are typically marked asstd :: simd
.noexcept -
Only sufficient overloads for accessors and free-functions to support
. Overloads forsimd < complex < T >>
,simd < float >
, andsimd < Integer >
are not provided.simd < Other >
6. Wording
The wording relies on [P1928R15] being landed to the Working Draft. Below, substitute the � character with a number the editor finds appropriate for the table, paragraph, section or sub-section.
6.1. Modify [simd.general]
�: The set of vectorizable types comprises all standard integer types,
character types, the types
and
,
and
,
where
is a vectorizable floating-point type
.
([basic.fundamental]). In addition,
,
, and
are vectorizable types if defined ([basic.extended.fp]).
6.2. Modify [simd.expos]
Add a concept to detect a
type:
template < class V > concept simd - type = // exposition only same_as < V , basic_simd < typename V :: value_type , typename V :: abi_type >> && is_default_constructible_v < V > ;
Add new concepts to detect
types with specific element type requirements:
template < class V > concept simd - complex = // exposition only same_as < typename V :: value_type , complex < typename V :: value_type :: value_type >> template < class V > concept simd - integral = // exposition only simd - type < V > && integral < typename V :: value_type > ;
Modify existing type-element concepts to use the new refactored
concept:
template < class V > concept simd - floating - point = // exposition only same_as < V , basic_simd < typename V :: value_type , typename V :: abi_type >> && is_default_constructible_v < V > && simd - type < V > && floating_point < typename V :: value_type > ;
6.3. Modify [simd.syn]
In the header
synopsis - [simd.syn] - add at the end after the "Mathematical functions"
// [simd.complex], basic_simd complex functions template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > real ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > imag ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > abs ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > arg ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > norm ( const V & ); template < simd - complex V > constexpr V conj ( const V & ); template < simd - complex V > constexpr V proj ( const V & ); template < simd - complex V > constexpr V exp ( const V & v ); template < simd - complex V > constexpr V log ( const V & v ); template < simd - complex V > constexpr V log10 ( const V & v ); template < simd - complex V > constexpr V sqrt ( const V & v ); template < simd - complex V > constexpr V sin ( const V & v ); template < simd - complex V > constexpr V asin ( const V & v ); template < simd - complex V > constexpr V cos ( const V & v ); template < simd - complex V > constexpr V acos ( const V & v ); template < simd - complex V > constexpr V tan ( const V & v ); template < simd - complex V > constexpr V atan ( const V & v ); template < simd - complex V > constexpr V sinh ( const V & v ); template < simd - complex V > constexpr V asinh ( const V & v ); template < simd - complex V > constexpr V cosh ( const V & v ); template < simd - complex V > constexpr V acosh ( const V & v ); template < simd - complex V > constexpr V tanh ( const V & v ); template < simd - complex V > constexpr V atanh ( const V & v ); template < simd - floating - point V > rebind_simd_t < complex < typename V :: value_type > , V > polar ( const V & x , const V & y = {}); template < simd - complex V , simd - complex V > constexpr V pow ( const V & x , const V & y );
6.4. Modify [simd.overview]
In the header
overview - [simd.overview] - add at the end of the existing list of constructors.
template < simd - floating - point VC > constexpr explicit ( see below ) basic_simd ( const VC & reals , const VC & imags = {}) noexcept ;
Add after the operators:
// [simd.complex_accessors], basic_simd complex-value accessors constexpr rebind_simd_t < typename T :: value_type , basic_simd > real () const ; constexpr rebind_simd_t < typename T :: value_type , basic_simd > imag () const ; template < class CAbi > constexpr void real ( const basic_simd < typename T :: value_type , CAbi >& v ); template < class CAbi > constexpr void imag ( const basic_simd < typename T :: value_type , CAbi >& v );
6.5. Modify [simd.summary]
NOTE: This paragraph was removed from [P1928R12]. Should it be re-inserted to allow the behaviour of complex to be specified?
�: Default intialization performs no initialization of the elements
,
except when the value type is a complex
specialization in which case all elements will be value-initialized
;
value-initialization initializes each element with
. [Note:
Thus, default initialization leaves
non-complex
elements in an
indeterminate state. — end note ]
6.6. Modify [simd.ctor]
Provide a
-value constructor for real/imaginary component-wise initialization.
template < simd - floating - point VC > constexpr explicit ( see below ) basic_simd ( const VC & reals , const VC & imags = {}) noexcept ; Constraints:
is
simd - complex < basic_simd > true
.
is
simd - size - v < VC > == size () true
.Effects:
Initializes the ith element with
for all
value_type ( reals [ i ], imags [ i ]) in the range of
i .
[ 0 , size ()) Remarks:
The expression inside
evaluates to
explicit false
if and only if the floating-point conversion rank ofis greater than or equal to the floating-point conversion rank of
T :: value_type .
VC :: value_type
6.7. Add new section [simd.complex_accessors]
�complex accessors [simd.complex_accessors]
simd constexpr rebind_simd_t < typename T :: value_type , basic_simd > real () const ; constexpr rebind_simd_t < typename T :: value_type , basic_simd > imag () const ; Constraints:
is
simd - complex < basic_simd > true
.Returns:
A data-parallel object initialized with values equal to the results of applying cmplx-func element-wise to
, where cmplx-func is either
x or
real corresponding to the overload name.
imag template < class CAbi > constexpr void real ( const basic_simd < typename T :: value_type , CAbi >& v ); template < class CAbi > constexpr void imag ( const basic_simd < typename T :: value_type , CAbi >& v ); Constraints:
is
simd - complex < basic_simd > true
.
is
basic_simd < typename T :: value_type , CAbi >:: size () == basic_simd :: size () true
.Effects:
Replaces each element of the
object such that the ith element is replaced with
basic_simd or
operator []( i ). real ( v [ i ]) , respectively, for all
operator []( i ). imag ( v [ i ]) in the range of
i .
[ 0 , size ())
6.8. Add new section [simd.complex_math]
�complex math [simd.complex_math]
simd Unless otherwise specified, each
function declared in
complex yields a
< simd > result, where the value of each element approximates the result obtained by applying the function with the same name from
basic_simd to the corresponding elements of the arguments. The value of an element of the result is unspecified if a domain, pole, or range error occurs for the corresponding elements of the arguments.
< complex > is never modified.
errno template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > real ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > imag ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > abs ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > arg ( const V & ); template < simd - complex V > constexpr rebind_simd_t < typename V :: value_type :: value_type , V > norm ( const V & ); template < simd - complex V > constexpr V conj ( const V & ); template < simd - complex V > constexpr V proj ( const V & ); template < simd - complex V > constexpr V exp ( const V & v ); template < simd - complex V > constexpr V log ( const V & v ); template < simd - complex V > constexpr V log10 ( const V & v ); template < simd - complex V > constexpr V sqrt ( const V & v ); template < simd - complex V > constexpr V sin ( const V & v ); template < simd - complex V > constexpr V asin ( const V & v ); template < simd - complex V > constexpr V cos ( const V & v ); template < simd - complex V > constexpr V acos ( const V & v ); template < simd - complex V > constexpr V tan ( const V & v ); template < simd - complex V > constexpr V atan ( const V & v ); template < simd - complex V > constexpr V sinh ( const V & v ); template < simd - complex V > constexpr V asinh ( const V & v ); template < simd - complex V > constexpr V cosh ( const V & v ); template < simd - complex V > constexpr V acosh ( const V & v ); template < simd - complex V > constexpr V tanh ( const V & v ); template < simd - complex V > constexpr V atanh ( const V & v ); Returns:
A data-parallel object initialized with values equal to the results of applying cmplx-func element-wise to
, where cmplx-func is the corresponding scalar function from
x .
< complex > template < simd - floating - point V > rebind_simd_t < complex < typename V :: value_type > , V > polar ( const V & x , const V & y = {}); template < simd - complex V > constexpr V pow ( const V & x , const V & y ); Returns:
A data-parallel object initialized with values equal to the results of applying the cmplx-func element-wise to
and
x , where cmplx-func is the corresponding scalar binary function from
y .
< complex >
7. Polls
7.1. Kona 2023 LEWG polls
Poll: Forward P2663R4 (Interleaved complex support in std::simd) with
modified wording for 5.1 to constrain
to floating point to LWG for
C++26 (to be confirmed with a Library Evolution electronic poll).
SF | F | N | A | SA |
---|---|---|---|---|
8 | 8 | 0 | 0 | 0 |
7.2. Kona 2022 SG1 polls
Poll: After significant experience with the TS, we recommend that the next version (the TS version with
improvements) of
target the IS (C++26)
SF | F | N | A | SA |
---|---|---|---|---|
10 | 8 | 0 | 0 | 0 |
Poll: Future papers and future revisions of existing papers that target
should go directly to LEWG. (We do not believe
there are SG1 issues with
today.)
SF | F | N | A | SA |
---|---|---|---|---|
9 | 8 | 0 | 0 | 0 |
8. Revision History
R5 => R6
-
Updated wording section to reflect the changes and style now used in [P1928R13].
-
Addressed LWG review comments
-
Raised some LEWG design issues that need to be reconfirmed.
R4 => R5
-
Modified wording to clarify that only floating-point specializations of
should be allowed.std :: complex <> -
Removed design option for making
/real
setter free-functions. The member function API fromimag
will be used.std :: complex
R3 => R4
-
Updated wording to reflect changes made to
in [P1928R6] (e.g., removal ofstd :: simd
, addition offixed_size_simd
).basic_simd
R2 => R3
-
Added wording
R1 => R2
-
Add more content for proposal overview
-
Rewrite to bikeshed
R0 => R1
-
Add polls from Kona SG1 2022