The latest draft, reference header, and links to past discussions on Github:
This proposal adds support for a few mnemonics that are useful in low-level code that has to manually align blocks of memory. It also adds two new alignment casts, which allow safe conversions, aligning up or down, between differently typed pointers.
This proposal was originally a small part of [N3864] but has been broken out as it is somewhat unrelated to the core purpose of that paper.
This proposal is intended for a C++ Technical Specification.
This proposal is a pure library extension. It does not require any changes in the core language and does not depend on any other library extensions. The proposal is composed entirely of free functions. The proposed functions are added to the <memory> header. No new headers are introduced, and the implementations of these functions on common modern platforms are trivial.
Manually aligning blocks of memory is an operation often required in low-level applications such as memory allocators, SIMD code, device drivers, compression routines, encryption, and binary I/O.
The operations is_aligned(), align_up(), and align_down() are commonly re-implemented over and over again as macros in C and/or inline template functions in C++.
We propose to standardize these 3 simple mnemonics for the following reasons:
We currently have std::align in the standard for doing alignment calculations. The function std::align has one specific use case: carving out an aligned buffer of a known size within a larger buffer. In order to use std::align, the user must know a priori the size of the aligned buffer they require. Unfortunately, in some use cases, even calculating the size of this buffer as an input to std::align itself requires doing alignment calculations.
Consider the following example of using aligned SIMD registers to process a memory buffer. The alignment calculations here cannot be done with std::align.
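void process(char* b, char* e) {
    // pb: first aligned address at or after b, clamped to e for short ranges
    char* pb = std::min((char*)std::align_up(b, sizeof(simd16)), e);
    // pe: last aligned address at or before e, never before pb
    char* pe = std::max((char*)std::align_down(e, sizeof(simd16)), pb);
    // unaligned head, one byte at a time
    for(char* p = b; p < pb; ++p) {
        process1(p);
    }
    // aligned body, one SIMD register at a time
    for(char* p = pb; p < pe; p += sizeof(simd16)) {
        simd16 x = simd16_aligned_load(p);
        process16(x);
        simd16_aligned_store(x, p);
    }
    // unaligned tail, one byte at a time
    for(char* p = pe; p < e; ++p) {
        process1(p);
    }
}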
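The head/body/tail split used in this example can be checked in isolation. The following is a minimal sketch, not part of the proposal: the `sketch` namespace, the `Split` struct, and the `split` function are hypothetical names, and the two helpers are stand-ins for the proposed functions that assume a flat address space and a power-of-two block width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for the proposed align_up / align_down on char*.
// Assumption: flat address space, a is a power of 2.
inline char* align_up(char* p, std::size_t a) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<char*>((v + (a - 1)) & ~(std::uintptr_t(a) - 1));
}
inline char* align_down(char* p, std::size_t a) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<char*>(v & ~(std::uintptr_t(a) - 1));
}

// Byte counts of the unaligned head, aligned body, and unaligned tail.
struct Split { std::size_t head, body, tail; };

// Splits [b, e) into head/body/tail for blocks of the given width.
inline Split split(char* b, char* e, std::size_t width) {
    char* pb = align_up(b, width);
    if (pb > e) pb = e;   // range ends before the first aligned address
    char* pe = align_down(e, width);
    if (pe < pb) pe = pb; // range holds no complete aligned block
    return Split{std::size_t(pb - b), std::size_t(pe - pb), std::size_t(e - pe)};
}

inline void demo() {
    alignas(16) static char buf[64];
    const Split s = split(buf + 3, buf + 50, 16);
    assert(s.head + s.body + s.tail == 47); // every byte is counted exactly once
    assert(s.body % 16 == 0);               // the body is a whole number of blocks
}

} // namespace sketch
```

The two clamps matter only for ranges too short to contain an aligned block; in that case the whole range is treated as head.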
We conclude that std::align is much too specific for general alignment calculations. It has a narrow use case and should only be considered as a helper function for when that use case is needed. std::align could also be implemented using this proposal.
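As an illustration of that last claim, here is one way std::align might be layered on top of an integral align_up. This is a sketch under stated assumptions, not the proposal's wording: the `sketch` namespace and both function bodies are hypothetical, and a flat address space with a power-of-two alignment is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

namespace sketch {

// Hypothetical stand-in for the proposed integral align_up.
// Assumption: a is a power of 2.
inline std::uintptr_t align_up(std::uintptr_t x, std::size_t a) {
    return (x + (a - 1)) & ~(std::uintptr_t(a) - 1);
}

// Same contract as std::align: on success, advances ptr to the aligned
// address, shrinks space by the padding consumed, and returns ptr.
// On failure, returns nullptr and leaves ptr and space untouched.
inline void* align(std::size_t alignment, std::size_t size,
                   void*& ptr, std::size_t& space) {
    if (size > space) return nullptr;
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    const std::size_t padding =
        static_cast<std::size_t>(align_up(addr, alignment) - addr);
    if (padding > space - size) return nullptr;
    ptr = static_cast<char*>(ptr) + padding;
    space -= padding;
    return ptr;
}

inline void demo() {
    alignas(16) static char buf[64];
    void* p = buf + 1;
    std::size_t space = 63;
    void* r = align(16, 32, p, space);
    assert(r == p);                            // the returned pointer is the updated ptr
    assert(static_cast<char*>(p) == buf + 16); // 15 bytes of padding were skipped
    assert(space == 48);
}

} // namespace sketch
```

The padding is computed on integers rather than by pointer subtraction, so no pointer outside the buffer is ever formed.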
Consider the following code fragment:
void process(char* ptr, char* end) {
    char* simd_ptr = std::align_down(ptr, sizeof(int128_t));
    char* simd_end = std::align_down(end, sizeof(int128_t));
    int128_t simd;
    // first block: mask off the bytes that precede ptr
    simd = simd_load_aligned(simd_ptr);
    simd &= mask_leading_bits(ptr - simd_ptr);
    process16(simd);
    simd_ptr += sizeof(int128_t);
    // full blocks
    for(; simd_ptr < simd_end; simd_ptr += sizeof(int128_t)) {
        simd = simd_load_aligned(simd_ptr);
        process16(simd);
    }
    // last block: mask off the bytes at or after end
    simd = simd_load_aligned(simd_ptr);
    simd &= mask_trailing_bits(end - simd_ptr);
    process16(simd);
}
If implemented by hand today in C++, this code would trigger undefined behavior if simd_ptr and/or simd_end were located outside of the memory block containing ptr.
In general, undefined behavior is a reasonable expectation because simd_ptr could now point to an invalid or restricted memory address.
On most modern machines, one can read and write to any memory address within the same memory page. Each page is aligned to, and is of size, PAGE_SIZE, which is commonly 4096 bytes. Since SIMD registers are typically nowhere near this size, on these implementations one can safely align_down or align_up an arbitrary pointer.
When we say a memory block, we mean a block of valid memory allocated to the application, whether on the stack, on the heap, or elsewhere. As described in §5.7/5 [IsoCpp], the valid set of addresses in this block spans from the first address of the block up to and including 1 + the last address.
We will now describe the additions to the <memory> header. This is a procedural library implemented entirely using templated and overloaded free functions. Each pre-condition has either undefined or implementation-defined results, and each case is documented below.
For all of the following, std::is_integral<integral>::value && !std::is_same<integral, bool>::value must be true.
template<class integral>
constexpr bool is_aligned(integral x, size_t a) noexcept;
Returns: true if x == 0 or x is a multiple of a.
template<class integral>
constexpr integral align_up(integral x, size_t a) noexcept;
Returns: n, where n is the least number >= x and is_aligned(n, a).
template<class integral>
constexpr integral align_down(integral x, size_t a) noexcept;
Returns: n, where n is the greatest number <= x and is_aligned(n, a).
The result is undefined if any of:
- a == 0
- a is not a power of 2
- x < 0
- integral is a signed type and the result causes an overflow in either integral or its promoted arithmetic type.
The result is 0 if the result is not undefined and any of:
- integral is an unsigned type and the result causes an overflow in either integral or its promoted arithmetic type.
All of these implementations are trivial, efficient, and portable.
template <class integral>
constexpr bool is_aligned(integral x, size_t a) noexcept {
    return (x & (integral(a) - 1)) == 0;
}
template <class integral>
constexpr integral align_up(integral x, size_t a) noexcept {
    return integral((x + (integral(a) - 1)) & ~integral(a-1));
}
template <class integral>
constexpr integral align_down(integral x, size_t a) noexcept {
    return integral(x & ~integral(a-1));
}
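The implementations above can be exercised at compile time. The following sketch copies them into a hypothetical `sketch` namespace (so they do not collide with anything in std) and checks a few values, including the unsigned-overflow case that is specified to yield 0. The specific test values are illustrative choices, not taken from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

template <class integral>
constexpr bool is_aligned(integral x, std::size_t a) noexcept {
    return (x & (integral(a) - 1)) == 0;
}
template <class integral>
constexpr integral align_up(integral x, std::size_t a) noexcept {
    return integral((x + (integral(a) - 1)) & ~integral(a - 1));
}
template <class integral>
constexpr integral align_down(integral x, std::size_t a) noexcept {
    return integral(x & ~integral(a - 1));
}

// Zero and exact multiples are aligned; other values are not.
static_assert(is_aligned(0u, 8), "0 is aligned to everything");
static_assert(is_aligned(24u, 8), "24 is a multiple of 8");
static_assert(!is_aligned(13u, 8), "13 is not a multiple of 8");

// Rounding moves to the nearest multiple in the requested direction
// and leaves already-aligned values unchanged.
static_assert(align_up(13u, 8) == 16u, "13 rounds up to 16");
static_assert(align_up(16u, 8) == 16u, "16 is already aligned");
static_assert(align_down(13u, 8) == 8u, "13 rounds down to 8");

// Unsigned overflow: 250 rounded up to a multiple of 16 would be 256,
// which does not fit in 8 bits, so the result is 0.
static_assert(align_up(std::uint8_t(250), 16) == 0, "overflow yields 0");

inline void demo() {
    // The same checks also hold for run-time values.
    unsigned x = 13;
    assert(align_up(x, 8) == 16u && align_down(x, 8) == 8u);
}

} // namespace sketch
```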
~integral(a-1) can be optimized to -integral(a).
bool is_aligned(const volatile void* p, size_t a);
Returns: true if p == nullptr or p is aligned to a, otherwise returns false.
The result is undefined if any of:
- a == 0
- a is not a power of 2
- a > std::numeric_limits<uintptr_t>::max() [Note: this would require a platform where sizeof(size_t) > sizeof(uintptr_t) --end note].
The result is implementation-defined if the result is not undefined and any of:
- a > PTRDIFF_MAX
- p != nullptr and p does not point to an existing memory block
For all of the following, std::is_pointer<pointer>::value == true
template <class pointer>
pointer align_up(pointer p, size_t a);
Returns: the least pointer t such that t >= p and is_aligned(t, a) == true, or nullptr if p == nullptr.
template <class pointer>
pointer align_down(pointer p, size_t a);
Returns: the greatest pointer t such that t <= p and is_aligned(t, a) == true, or nullptr if p == nullptr.
nullptr_t align_up(nullptr_t, size_t) { return nullptr; }
nullptr_t align_down(nullptr_t, size_t) { return nullptr; }
We also add special overloads for nullptr_t because align_up(nullptr, a) and align_down(nullptr, a) would otherwise not compile.
The result is undefined if any of:
- a == 0
- a is not a power of 2
- a > std::numeric_limits<uintptr_t>::max() [Note: this would require a platform where sizeof(size_t) > sizeof(uintptr_t) --end note].
The result is implementation-defined if the result is not undefined and any of:
- a > PTRDIFF_MAX
- p != nullptr and p does not point to an existing memory block
bool is_aligned(const volatile void* p, size_t a) {
    return is_aligned(reinterpret_cast<uintptr_t>(p), a);
}
template <class pointer>
pointer align_up(pointer p, size_t a) {
    return reinterpret_cast<pointer>(align_up(reinterpret_cast<uintptr_t>(p), a));
}
template <class pointer>
pointer align_down(pointer p, size_t a) {
    return reinterpret_cast<pointer>(align_down(reinterpret_cast<uintptr_t>(p), a));
}
Note that the above implementations assume a flat address space and that arithmetic on uintptr_t is equivalent to arithmetic on char*. While these conditions prevail on the majority of modern platforms, neither is required by the standard. It is entirely possible for an implementation to perform any transformation when casting void* to uintptr_t, as long as the transformation can be reversed when casting back from uintptr_t to void*.
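On a platform where those assumptions do hold, the pointer overloads behave as one would expect on an ordinary buffer. The following is a minimal sketch: the `sketch` namespace is hypothetical, and the templates are written over `T*` rather than an unconstrained `pointer` parameter purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Assumptions: flat address space, uintptr_t arithmetic matches char*
// arithmetic, and a is a power of 2.
inline bool is_aligned(const volatile void* p, std::size_t a) {
    return (reinterpret_cast<std::uintptr_t>(p) & (std::uintptr_t(a) - 1)) == 0;
}

template <class T>
T* align_up(T* p, std::size_t a) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<T*>((v + (a - 1)) & ~(std::uintptr_t(a) - 1));
}

template <class T>
T* align_down(T* p, std::size_t a) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<T*>(v & ~(std::uintptr_t(a) - 1));
}

inline void demo() {
    // The buffer itself is 16-byte aligned, so buf + 16 is the only
    // 16-byte boundary strictly between buf and buf + 32.
    alignas(16) static unsigned char buf[64];
    unsigned char* up = align_up(buf + 1, 16);
    unsigned char* down = align_down(buf + 17, 16);
    assert(up == buf + 16);
    assert(down == buf + 16);
    assert(is_aligned(up, 16));
    assert(!is_aligned(buf + 1, 16));
}

} // namespace sketch
```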
For all of the following, std::is_pointer<pointer>::value == true
These functions are designed to become the standard way of doing a reinterpret_cast and an alignment adjustment all in one operation, which can optionally be checked by the implementation for correctness.
template <class pointer, class U>
pointer align_up_cast(U* p, size_t a=alignof(typename std::remove_pointer<pointer>::type));
Returns: the least pointer t where reinterpret_cast<void*>(t) >= reinterpret_cast<void*>(p) and t is aligned to a, or nullptr if p == nullptr.
template <class pointer, class U>
pointer align_down_cast(U* p, size_t a=alignof(typename std::remove_pointer<pointer>::type));
Returns: the greatest pointer t where reinterpret_cast<void*>(t) <= reinterpret_cast<void*>(p) and t is aligned to a, or nullptr if p == nullptr.
template <class pointer>
nullptr_t align_up_cast(nullptr_t, size_t a=1) { (void)a; return nullptr; }
template <class pointer>
nullptr_t align_down_cast(nullptr_t, size_t a=1) { (void)a; return nullptr; }
Again, we add nullptr_t overloads.
The result is undefined if any of:
- a == 0
- a is not a power of 2
- a > std::numeric_limits<uintptr_t>::max() [Note: this would require a platform where sizeof(size_t) > sizeof(uintptr_t) --end note].
The result is implementation-defined if the result is not undefined and any of:
- a > PTRDIFF_MAX
- p != nullptr and p does not point to an existing memory block
template <class pointer, class U>
inline pointer align_up_cast(U* p, size_t a=alignof(typename std::remove_pointer<pointer>::type)) {
    return reinterpret_cast<pointer>(align_up(p, a));
}
template <class pointer, class U>
inline pointer align_down_cast(U* p, size_t a=alignof(typename std::remove_pointer<pointer>::type)) {
    return reinterpret_cast<pointer>(align_down(p, a));
}
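To show how the default alignment argument is picked up from the target type, here is a small self-contained sketch. The `sketch` namespace and the `Block16` type are hypothetical, and the casts are implemented directly on uintptr_t under the same flat-address-space assumption as the pointer overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical 16-byte-aligned block type, standing in for a SIMD register.
struct alignas(16) Block16 { unsigned char bytes[16]; };

// Assumptions: flat address space, a is a power of 2.
template <class pointer, class U>
pointer align_up_cast(
    U* p, std::size_t a = alignof(typename std::remove_pointer<pointer>::type)) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<pointer>((v + (a - 1)) & ~(std::uintptr_t(a) - 1));
}

template <class pointer, class U>
pointer align_down_cast(
    U* p, std::size_t a = alignof(typename std::remove_pointer<pointer>::type)) {
    const auto v = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<pointer>(v & ~(std::uintptr_t(a) - 1));
}

inline void demo() {
    alignas(16) static unsigned char buf[64];
    // With no explicit alignment, alignof(Block16) == 16 is used.
    Block16* up = align_up_cast<Block16*>(buf + 1);
    Block16* down = align_down_cast<Block16*>(buf + 31);
    assert(reinterpret_cast<unsigned char*>(up) == buf + 16);
    assert(reinterpret_cast<unsigned char*>(down) == buf + 16);
    // An explicit alignment overrides the default.
    std::uint32_t* w = align_up_cast<std::uint32_t*>(buf + 1, 8);
    assert(reinterpret_cast<unsigned char*>(w) == buf + 8);
}

} // namespace sketch
```

Only the pointer value is adjusted here; whether the result may be dereferenced still depends on what actually lives at that address.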
Some compilers emit a warning if you cast a pointer to a type with lesser alignment to a pointer to a type with greater alignment. For example, on clang 3.4 [clang] with -Wcast-align enabled, the following code produces the warning: cast from 'char*' to 'int*' increases required alignment from 1 to 4 [-Wcast-align]
int foo(void* f) {
    return *(int*)(char*)f;
}
Implementations should not fire such warnings for align_up_cast or align_down_cast.
uintptr_t
Compute page boundaries using integers:
uintptr_t addr_in_page = /* something */;
auto page_begin = align_down(addr_in_page, PAGE_SIZE);
auto page_end = align_up(addr_in_page, PAGE_SIZE);
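With concrete numbers, the page calculation works out as follows. This is a sketch only: the `sketch` namespace, the constant `kPageSize` (standing in for PAGE_SIZE and assumed to be 4096), and the sample address are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr std::size_t kPageSize = 4096; // assumed page size, 0x1000

// Hypothetical stand-ins for the proposed integral overloads.
// Assumption: a is a power of 2.
constexpr std::uintptr_t align_down(std::uintptr_t x, std::size_t a) {
    return x & ~(std::uintptr_t(a) - 1);
}
constexpr std::uintptr_t align_up(std::uintptr_t x, std::size_t a) {
    return (x + (std::uintptr_t(a) - 1)) & ~(std::uintptr_t(a) - 1);
}

// 0x12345 lies inside the page [0x12000, 0x13000).
static_assert(align_down(0x12345u, kPageSize) == 0x12000u, "page begin");
static_assert(align_up(0x12345u, kPageSize) == 0x13000u, "page end");

// An address that is already page aligned is returned unchanged by both.
static_assert(align_down(0x12000u, kPageSize) == 0x12000u, "aligned stays put");
static_assert(align_up(0x12000u, kPageSize) == 0x12000u, "aligned stays put");

inline void demo() {
    const std::uintptr_t addr = 0x12345u;
    const std::uintptr_t begin = align_down(addr, kPageSize);
    const std::uintptr_t end = align_up(addr, kPageSize);
    assert(begin <= addr && addr < end);
    assert(end - begin == kPageSize);
}

} // namespace sketch
```

Because align_up returns its argument unchanged when it is already aligned, page_end equals page_begin for a page-aligned address; code that needs an exclusive end in that case can use page_begin + PAGE_SIZE instead.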
In a disk device driver, optimize reads and writes from buffers which are already block-size aligned.
void* user_buffer = /* something */;
if(is_aligned(user_buffer, BLOCK_SIZE)) {
    fast_direct_write(user_buffer, size, block_device);
} else {
    slow_buffered_write(user_buffer, size, block_device);
}
Cast an array of ints to an array of SIMD ints.
int32_t* x = /* something */;
auto* s = align_up_cast<int128_t*>(x);
Get a pointer of type float* which is aligned for 16-byte floating-point operations.
float* f = /* something */;
f = align_up(f, 16);
IS_ALIGNED, ALIGN_MASK, and ALIGN [LXR].