1. Introduction
The Working Draft makes `basic_vec` types trivially copyable, which allows `std::bit_cast` operations. However, the object representation is unspecified, making the results implementation-defined.
```cpp
vec<float> v = {...};
auto as_int = std::bit_cast<vec<int32_t>>(v); // Legal C++26, but what are the integer values?
```
This contrasts with `std::array`, which has a well-specified contiguous layout that makes `bit_cast` operations portable and predictable. Since `basic_vec` is conceptually similar to `std::array` (both are fixed-size containers of homogeneous elements), users naturally expect similar bit-casting guarantees. Without specifying the layout, such code is not portable across implementations.
Furthermore, all target-specific intrinsics (e.g., Intel’s `_mm256_castps_si256`, ARM’s `vreinterpretq_s32_f32`, etc.) provide well-defined bit-reinterpretation. Users migrating from intrinsics to `std::simd` lose this capability because the Working Draft does not specify an object representation that gives portable semantics for such reinterpretation.
This paper proposes mandating array-like object representation specifically for `native-abi`, where the guarantee is clean, unambiguous, and aligned with universal hardware practice. For other ABIs — including `fixed_size` and implementation-defined ABIs — the object representation remains implementation-defined. The paper also offers a fallback solution based on query traits, and notes that the two approaches are complementary.
2. Motivation
Bit-level operations on SIMD vectors are pervasive in performance-critical code. Common patterns include clearing sign bits for fast absolute value, inspecting IEEE 754 exponent bits, and type-punning between float and integer vectors. For example:
```cpp
template <typename T, typename Abi>
auto abs_via_bitwise(basic_vec<T, Abi> v) -> basic_vec<T, Abi> {
    auto bits = std::bit_cast<basic_vec<uint_t, Abi>>(v);
    bits &= ~sign_bit_mask;
    return std::bit_cast<basic_vec<T, Abi>>(bits);
}
```
Similar bit-casting patterns appear across a wide range of domains, including scientific computing, signal processing, game engines, and numeric libraries. Every target vendor provides these operations with well-defined semantics (e.g., Intel’s `_mm256_castps_si256`, ARM’s `vreinterpretq_s32_f32`). Furthermore, the bit pattern meanings are consistent across mainstream intrinsic APIs, so vendor intrinsic code already works portably. However, because the object representation of `basic_vec` is not specified, these idioms do not have portable semantics when expressed in terms of `basic_vec`.
The case for specifying layout is strongest for `native-abi`. This is the default ABI, which maps directly to hardware registers, and it is the ABI that corresponds to vendor intrinsic types. When developers use `native-abi`, they are explicitly requesting the hardware’s natural representation, and they expect the bit-level semantics that come with it.
In production code for high-performance signal processing, predictable bit-level behavior is essential. When a developer `bit_cast`s a SIMD vector, the expectation is the same semantics provided by vendor intrinsics or by working with arrays. This expectation is well-founded: it reflects decades of consistent hardware design across all major CPU architectures.
If an implementation were to choose a non-standard layout, several problems would arise:

- The ability to reason about bit-level operations would be lost
- Optimization techniques that are standard practice with intrinsics would become unusable
- Defensive code with fallback paths would be required to accommodate one implementation’s unusual choice
The problem is not with flexibility per se, but rather that the exceptional non-standard case penalises the common case of writing portable code. If an implementation genuinely needs a different layout, that is what custom ABI tags are for. Such exceptional choices should be explicit and discoverable, not hidden behind implementation-defined behavior that forces every user to write defensive code.
We now examine two possible solutions in more detail.
3. Proposed Solution 1: Mandate Array-Like Layout for native-abi
This solution requires that `basic_vec<T, native-abi<T>>` stores its values of type `T` contiguously and in index order, with no inter-element padding and no trailing padding. The object representation is identical to that of `std::array<T, N>`, where `N` is `basic_vec<T, native-abi<T>>::size()`. This layout matches the behavior of all mainstream SIMD targets we are aware of, and is well-suited to SIMD processing.
```cpp
template <typename T>
auto abs_via_bitwise(basic_vec<T, native-abi<T>> v)
    -> basic_vec<T, native-abi<T>> {
    auto bits = std::bit_cast<basic_vec<uint_t, native-abi<uint_t>>>(v);
    bits &= ~sign_bit_mask;
    return std::bit_cast<basic_vec<T, native-abi<T>>>(bits);
}
// Works everywhere with native-abi.
```
This approach provides guaranteed portability for native-width vectors, straightforward user code, and matches the behavior of all mainstream SIMD targets we are aware of.
This paper is specifically about object representation in support of portable `std::bit_cast` (and `memcpy`-style copying). It does not propose pointer-interconvertibility with `std::array`, nor does it propose any change to existing aliasing or lifetime rules.
3.1. Why This Matches Hardware Reality
Across all mainstream SIMD targets we are aware of (including Intel/AMD x86, Arm NEON/SVE, RISC-V V, and PowerPC/VSX), vector data is naturally treated as a contiguous sequence of elements when transferred to and from memory, and the mapping between element index and increasing memory offset is consistent with array-like layout. This is the layout that existing vendor intrinsics and idioms assume.
For `native-abi` specifically, the vector maps to a single hardware register, and the elements fill that register completely with no trailing padding. This makes the array-like guarantee clean and unambiguous.
3.2. The Standard Already Strongly Implies Array-Like Layout
The existing Working Draft contains multiple indications that array-like layout is expected, and these indications are particularly strong for `native-abi`:
The meaning of "native": The specification says `native-abi<T>` should provide "the most efficient data-parallel execution for the element type T on the currently targeted system" and that its representation should "depend on the target architecture." The word "native" means the hardware’s natural representation. Scoping the layout mandate to `native-abi` is therefore directly aligned with the specification’s own intent: if the ABI is the native one, the layout should be the native one too.
Recommended practice for conversions: The draft states "Implementations should support implicit conversions between specializations of basic_vec and appropriate implementation-defined types" (see [simd.overview], https://eel.is/c++draft/simd#overview). These implementation-defined types are vendor intrinsics like `__m256`. This creates an inconsistency:
```cpp
// This is legal and well-defined at every step
basic_vec<float, native-abi<float>> v = /* ... */;
__m256 native = v;                                // Recommended conversion
__m256i as_int = _mm256_castps_si256(native);     // Well-defined intrinsic
basic_vec<int, native-abi<int>> result = as_int;  // Recommended conversion

// But this equivalent direct path is implementation-defined
auto direct = std::bit_cast<basic_vec<int, native-abi<int>>>(v);
```
The indirect path through intrinsics is legal and portable because intrinsics have well-defined bit-reinterpretation semantics. But the direct path is not portable. If intrinsic interop is recommended, then the layout implications of that interop should also be normative; otherwise the recommendation is misleading.
ABI tags handle variation: The ABI parameter mechanism exists precisely to handle platform-specific variations. Scoping the mandate to `native-abi` respects this design: the native ABI gets a firm guarantee, while custom and implementation-defined ABIs retain full freedom.
3.3. Why fixed_size Is Excluded
The `fixed_size` ABI presents additional challenges that make it unsuitable for the same guarantee. When the requested number of elements does not match a single hardware register width, implementations must decompose the logical vector into multiple hardware registers. Different implementations make legitimately different choices about how to perform this decomposition:
- Some implementations round up to the next power of two (e.g., GCC)
- Others pack into as many full-width registers as possible, with a smaller trailing register (e.g., Clang)
These choices affect `sizeof`, internal alignment boundaries, and trailing padding. Mandating a specific layout for `fixed_size` would either constrain implementations unnecessarily or require a trailing-padding escape hatch that undermines the portability guarantee.
However, we observe that on a given implementation, `fixed_size` specializations whose element data requires the same total number of bits will typically use the same register decomposition. For example, `basic_vec<uint32_t, fixed_size<22>>` and `basic_vec<uint64_t, fixed_size<11>>` both require 704 bits of element data. An implementation targeting a platform with 512-bit registers might represent both as one 512-bit register plus one 256-bit register (with 64 bits of padding). Because the decomposition is the same, the element data occupies the same byte offsets within the object representation, and `bit_cast` between the two types would produce meaningful results.
Notably, users can already test whether two types are `bit_cast`-compatible at compile time, since `std::bit_cast` requires equal `sizeof`:
```cpp
static_assert(sizeof(basic_vec<uint32_t, fixed_size<22>>) ==
              sizeof(basic_vec<uint64_t, fixed_size<11>>));
// If this passes, bit_cast compiles.
```
This size equality is a necessary condition for `bit_cast`, but it is not sufficient for portable element-wise reinterpretation — that additionally requires knowing that elements are laid out contiguously and in index order. For `native-abi`, this paper mandates that property. For `fixed_size`, the layout is implementation-defined, so while `bit_cast` between same-sized specializations is likely to work on any given platform, it is not portably guaranteed.
This also highlights an important distinction: for `fixed_size` vectors, the most useful guarantee may not be "is this type array-like?" (a per-type property) but rather "are these two types layout-compatible with each other?" (a pairwise property). Two vectors might both have internal padding that prevents either from being array-like, yet share identical internal structure, making `bit_cast` between them well-defined.
A robust specification for `fixed_size` layout, including the question of pairwise compatibility, is a more complex problem that deserves its own treatment. Although the layout guarantee proposed here applies only to `native-abi`, users working with `fixed_size` vectors can use the `chunk` function to decompose into native-sized pieces, each of which carries the array-like guarantee. A complete solution for `fixed_size` bit-casting, including handling of remainder chunks, is left for future work.
3.4. Handling Missing Hardware Support
Implementations already handle missing hardware support without altering layouts. Even modern Intel processors do not provide a completely uniform instruction set for all possible data types. For example, AVX-512 lacks 8-bit integer multiplication, shift, and rotate instructions. Rather than adopting a special layout for 8-bit element types, implementations synthesize the specific operations that are missing using wider operations (e.g., 16-bit multiplication with masking) or scalar fallbacks, while maintaining the same array-like layout.
When hardware lacks native support for a type, `basic_vec` may fall back to software emulation that still uses array-like layout. The guarantee holds for whatever `native-abi<T>` is defined to be: implementations maintain the standard layout and emulate the missing operations; they do not change the memory representation.
4. Proposed Solution 2: Query Traits
As a fallback to mandating array-like layout, implementations could instead be required to document their layout and provide a compile-time query:
```cpp
template <typename T, typename Abi>
inline constexpr bool is_simd_array_like_v = /* implementation-defined */;
```
This trait indicates whether `basic_vec<T, Abi>` has contiguous, index-ordered layout with no inter-element or trailing padding. Users could then write defensive code with `static_assert`, or adaptive code with fast and slow paths for different layouts. The trait takes both `T` and `Abi` because layout may depend on the element type as well as the ABI.
Just as the standard provides `std::endian` to query byte order, this solution would provide a way to query the layout of `basic_vec` or `basic_mask` types, enabling users to determine at compile time whether the layout is array-like.
This solution preserves maximum implementation freedom but does not guarantee portability. Users must write more complex code, and generic libraries need conditional compilation with potential performance cliffs.
5. Complementary Use of Both Solutions
Solutions 1 and 2 are not mutually exclusive. One outcome may be to adopt both:
- Solution 1 mandates array-like layout for `native-abi`, giving users a firm, unconditional guarantee for the most common case.
- Solution 2 provides `is_simd_array_like_v` as a query trait for all other ABIs (including `fixed_size` and implementation-defined ABIs), enabling generic code to adapt to whatever layout the implementation provides.
Under this combined approach, `is_simd_array_like_v<T, native-abi<T>>` would be unconditionally true (as a consequence of the mandate), while for other ABIs it would be implementation-defined but discoverable. This gives users the best of both worlds: zero-overhead portable code for native-width vectors, and a principled way to handle other ABIs without sacrificing generality.
6. Comparison
| Aspect | Solution 1 (Mandate for `native-abi`) | Solution 2 (Query) | Combined |
|---|---|---|---|
| Scope | `native-abi` only | All ABIs via query | `native-abi` guaranteed; others queryable |
| Portability | Guaranteed for `native-abi` | Conditional | Guaranteed for `native-abi`; conditional for others |
| Trailing Padding | None (for `native-abi`) | Implementation-defined | None for `native-abi`; implementation-defined for others |
| User Code | Simple for `native-abi` | Complex (conditionals) | Simple for `native-abi`; conditionals for others |
| Implementation Freedom | Constrained for `native-abi`; full freedom for other ABIs | Maximum | Constrained for `native-abi`; documented for others |
| Zero-Overhead for Portable Code | Yes (for `native-abi`) | No | Yes (for `native-abi`) |
7. Our Recommendation
We recommend adopting both solutions in combination: mandating array-like layout for `native-abi` (Solution 1) and providing query traits for all ABIs (Solution 2).
The historical evidence is compelling: essentially every major CPU architecture in widespread use over the last 25 years exposes SIMD facilities whose interaction with memory is consistent with array-like element layout, from phones to servers. Scoping the mandate to `native-abi` makes the guarantee precise and defensible: we are mandating layout for the ABI that represents the hardware’s native representation.
This scoping eliminates the trailing padding concern entirely. A `native-abi` vector maps to a single hardware register, and the elements fill it completely. The object representation is identical to `std::array<T, N>`, with no caveats or escape hatches.
The standard already assumes array-like layout for `native-abi` through multiple mechanisms: the meaning of "native," the recommended practice for conversions to intrinsics, and the existence of ABI tags for handling variations. Leaving the layout unspecified creates an internal inconsistency: the indirect path through intrinsics is well-defined, but the direct `bit_cast` is not.
Providing the `is_simd_array_like_v` trait alongside the mandate extends the utility to generic code that must work across multiple ABIs. For `native-abi`, the trait is unconditionally true, while for other ABIs it provides the discoverability that enables adaptive algorithms without sacrificing correctness.
We recognize that the `fixed_size` problem remains open. Different implementations make legitimately different decomposition choices, and the question of pairwise layout compatibility between `fixed_size` specializations may prove more useful than per-type array-likeness. However, `native-abi` covers the vast majority of the motivating use cases — bit-manipulation idioms, type-punning between float and integer vectors of the same width — and provides a solid foundation on which to build.
We also recognize that this issue might be regarded as an evolutionary change rather than a defect fix. `bit_cast` of `basic_vec` is already legal, and an implementation could be reverse-engineered to determine its layout, but this adds overhead and complexity to the experience of programming with `basic_vec`. The current state represents a usability regression compared to existing practice with vendor intrinsics, which have always had well-defined bit-casting semantics.
If the committee prefers not to mandate layout for `native-abi`, we offer Solution 2 alone as a fallback.
8. Additional Discussion
8.1. Implementation Experience
In high-performance software development, such as signal processing, it is extremely common to manipulate data at the bit level to achieve greater speed. At Intel we have large intrinsic-based software code bases where bit-casts are used frequently, and we have found that well-defined bit-casting semantics are essential for writing portable, high-performance code. While this software is written to perform well on Intel processors, customers increasingly demand portable code that runs well on multiple vendors' hardware and across a wide range of compilers. Using `std::simd` is an attractive way to achieve this portability, but not if it becomes impractical due to non-portable (implementation-defined) bit-casting behavior across targets and implementations.
8.2. Scalar Type Consistency
If `uint8_t` is 8 bits in scalar code, then a vector of `uint8_t` should store 8-bit elements. Users reason about memory consumption, cache behavior, and bandwidth based on element size. If an implementation silently widened each element so that, say, 16 elements consumed 512 bits instead of 128 bits, it would defeat the purpose of using smaller types.
8.3. Endianness
Mandating array-like layout for `native-abi` does not introduce any endianness concerns beyond those that already exist for arrays. For a given element type `T`, each element’s byte order in memory follows the platform’s representation of `T`, exactly as it does for `array<T, N>`. Separately, the mapping from element index to increasing memory offsets is defined by the proposed array-like layout (increasing index order), and this lane ordering is independent of the platform’s endianness.
8.4. Library Interoperability
BLAS, LAPACK, FFTW, Eigen, and game engines all assume array-like layout. Without a specified layout, `basic_vec` cannot reliably interoperate with these libraries. Since library interop most commonly involves native-width vectors, the `native-abi` scope addresses the primary use case for such interoperability.
8.5. ABI Stability
Since `std::simd` is new in C++26, there are no existing deployed standard-library implementations with established ABI contracts. Mandating array-like layout for `native-abi` therefore does not constitute an ABI break for any existing implementation. Furthermore, every implementation we are aware of already uses array-like layout for native-width vectors, so the mandate codifies existing practice rather than requiring changes.
8.6. Complex Numbers
There has been historical debate about whether complex values should be stored in interleaved (real, imag, real, imag) or separated (all real, then all imag) format for SIMD processing. Both formats have advantages and disadvantages, leading to different choices for different problem domains or data sizes. However, the debate was ultimately resolved by hardware realities: modern hardware that supports complex values (e.g., AVX-512 and ARM SVE) does so in interleaved form. This led even long-term proponents of separated storage, such as MATLAB, to switch to interleaved storage for their SIMD implementations. This historical precedent demonstrates that hardware realities tend to dominate in the long term, and that software must adapt to those realities for performance and interoperability.
8.7. Masks
The type presents the same general problem as : the object representation is unspecified. However, masks differ from vectors because hardware genuinely diverges on mask representation. AVX2 uses full-element representation, where each boolean occupies the full element width (e.g., 32 bits for a mask corresponding to 32-bit elements). AVX-512 and ARM SVE use compact bitmask representation, where each boolean is a single bit stored in a dedicated predicate register.
This divergence is not merely an implementation choice; it reflects fundamentally different hardware designs. Mandating a single representation for masks would disadvantage targets whose hardware uses the other format, unlike the vector case where all mainstream hardware agrees on array-like layout.
For this reason, the proposed `is_mask_array_like_v` trait allows users to query which representation a given mask specialization uses. This does not by itself make `bit_cast` to/from masks portable — that would additionally require specifying the exact mapping for each representation — but it provides the foundation for future work in this area.
9. Proposed Wording
Wording is provided for the recommended combined approach and for Solution 2 alone as a fallback.
9.1. Wording for Recommended Approach (Combined)
Modify [simd.overview] as follows:
The value representation of `basic_vec<T, native-abi<T>>` consists of `basic_vec<T, native-abi<T>>::size()` contiguously allocated values of type `T`, in increasing index order. Let `N` be `basic_vec<T, native-abi<T>>::size()`. The object representation of `basic_vec<T, native-abi<T>>` shall be identical to the object representation of `array<T, N>`. There shall be no padding between elements and no trailing padding.

[Note: When `std::bit_cast` between `basic_vec<T, native-abi<T>>` and `basic_vec<U, native-abi<U>>` is well-formed (i.e., both have the same size), it behaves equivalently to `std::bit_cast` between corresponding arrays. — end note]

For other ABIs, the object representation of `basic_vec<T, Abi>` is implementation-defined.
Modify [simd.mask.overview] to specify mask representation:
The value representation of `basic_mask<Bytes, Abi>` consists of `basic_mask<Bytes, Abi>::size()` boolean values stored either as full elements (each occupying `Bytes` bytes) or as compact bits (one bit per boolean), as determined by the implementation. The representation is consistent for a given `Abi`. There shall be no padding between elements in full-element representation.

[Note: Different hardware uses different mask representations. AVX2 uses full-element representation where each boolean occupies the full element size. AVX-512 and ARM SVE use compact bitmask representation where each boolean is a single bit. — end note]
Add to [simd.traits]:
```cpp
template <typename T, typename Abi>
inline constexpr bool is_simd_array_like_v = /* see below */;

template <size_t Bytes, typename Abi>
inline constexpr bool is_mask_array_like_v = /* see below */;
```

Returns: For `is_simd_array_like_v<T, Abi>`: `true` if the object representation of `basic_vec<T, Abi>` is identical to the object representation of `array<T, N>` (where `N` is `basic_vec<T, Abi>::size()`), with no padding between elements and no trailing padding, and elements are stored in increasing index order. Otherwise, `false`.

Remarks: `is_simd_array_like_v<T, native-abi<T>>` is `true` for all `T` for which `basic_vec<T, native-abi<T>>` is a valid specialization.

Returns: For `is_mask_array_like_v<Bytes, Abi>`: `true` if `basic_mask<Bytes, Abi>` uses full-element representation where each boolean element occupies `Bytes` contiguous bytes (with no padding between elements, and in increasing index order). Otherwise, `false`, indicating a compact bitmask representation.

[Note: When `is_simd_array_like_v<T, Abi>` is `true`, `bit_cast` between `basic_vec` specializations behaves equivalently to `bit_cast` between corresponding arrays. The mask trait distinguishes between full-element and compact representations. — end note]
Add to [simd.expos.abi]:
For ABIs other than `native-abi`, the object representation of `basic_vec<T, Abi>` is implementation-defined. Implementations shall document whether `is_simd_array_like_v<T, Abi>` is `true` or `false` for each supported combination. Implementations shall document whether `is_mask_array_like_v<Bytes, Abi>` is `true` or `false` for each supported combination.
9.2. Wording for Solution 2 Alone (Fallback)
If the committee prefers not to mandate layout for `native-abi`, the following provides query traits without any layout mandate.
Add to [simd.traits]:
```cpp
template <typename T, typename Abi>
inline constexpr bool is_simd_array_like_v = /* see below */;

template <size_t Bytes, typename Abi>
inline constexpr bool is_mask_array_like_v = /* see below */;
```

Returns: For `is_simd_array_like_v<T, Abi>`: `true` if the object representation of `basic_vec<T, Abi>` is identical to the object representation of `array<T, N>` (where `N` is `basic_vec<T, Abi>::size()`), with no padding between elements and no trailing padding, and elements are stored in increasing index order. Otherwise, `false`.

Returns: For `is_mask_array_like_v<Bytes, Abi>`: `true` if `basic_mask<Bytes, Abi>` uses full-element representation. Otherwise, `false`, indicating a compact bitmask representation.

[Note: When `is_simd_array_like_v<T, Abi>` is `true`, `bit_cast` between `basic_vec` specializations behaves equivalently to `bit_cast` between corresponding arrays. The mask trait distinguishes between full-element and compact representations. — end note]
Add to [simd.expos.abi]:
The object representation of `basic_vec<T, Abi>` and `basic_mask<Bytes, Abi>` is implementation-defined. Implementations shall document whether `is_simd_array_like_v<T, Abi>` and `is_mask_array_like_v<Bytes, Abi>` are `true` or `false` for each supported combination.
10. Impact on Existing Code
For users, the recommended approach means existing code relying on array-like layout for `native-abi` vectors continues to work, and code that avoided `bit_cast` can now use it portably with `native-abi`. The `is_simd_array_like_v` trait additionally enables generic code to adapt to other ABIs. There are no breaking changes.
For implementations, the mandate applies only to `native-abi`, which is already what every implementation we are aware of provides. Implementations using array-like layout for native-width vectors (Intel, GCC, Clang, on major platforms) require no changes. Implementations of `fixed_size` and custom ABIs are entirely unaffected, needing only to provide the query traits.
The fallback (Solution 2 alone) requires all implementations to document layout properties and provide query traits, with minimal code changes.
11. Future Work
This paper deliberately limits its scope to layout and basic mask representation queries. Several related topics merit further investigation in future papers:
- **`fixed_size` layout specification.** The `fixed_size` ABI presents challenges due to differing register decomposition strategies across implementations. A future paper should explore whether a useful layout guarantee can be provided, potentially through constraints on how implementations decompose multi-register vectors.
- **Pairwise layout compatibility for `fixed_size`.** As discussed in § 3.3, the most useful guarantee for `fixed_size` may be pairwise compatibility ("are these two types layout-compatible?") rather than per-type array-likeness. A future paper could propose a trait such as `is_simd_layout_compatible_v<T1, Abi1, T2, Abi2>` to express this relationship.
- **Mask representation guarantees for `native-abi`.** This paper provides `is_mask_array_like_v` as a query trait but does not mandate a specific mask representation, because hardware genuinely diverges. A future paper could specify the exact bit-level mapping for each representation, enabling portable `bit_cast` to and from masks.
- **Explicit reinterpretation functions.** Rather than relying solely on `std::bit_cast`, a future paper could propose dedicated functions such as `simd_reinterpret<To>(from)` that express type-punning intent directly, potentially with relaxed constraints (e.g., handling size mismatches by truncation or zero-extension).