P0401R3: Providing size feedback in the Allocator interface

1. Introduction

This is a library paper to describe how size feedback can be used with allocators:

In the case of std::allocator, the language feature proposed in [P0901R5] could be used.
In the case of custom allocators, the availability of the traits interface allows standard library containers to leverage this functionality.

2. Motivation

Consider code adding elements to vector:

std::vector<int> v = {1, 2, 3};
// Expected: v.capacity() == 3

// Add an additional element, triggering a reallocation.
v.push_back(4);

Many allocators only allocate in fixed-sized chunks of memory, rounding up requests. Our underlying heap allocator received a request for 12 bytes (3 * sizeof(int)) while constructing v. For several implementations, this request is turned into a 16 byte region.

2.1. Why not `realloc`

realloc poses several problems problems:

We have unclear strategies for growing a container. Consider the two options (a and b) outlined in the comments of the example below. Assume, as with the §2 Motivation example, that there is a 16 byte size class for our underlying allocator.
```
std::vector<int> v = {1, 2, 3};
// Expected: v.capacity() == 3

// Add an additional element.  Does the vector:
// a) realloc(sizeof(int) * 4), adding a single element
// b) realloc(sizeof(int) * 6), attempt to double, needing reallocation
v.push_back(4);

// Add another element.
// a) realloc(sizeof(int) * 5), fails in-place, requires reallocation
// b) size + 1 <= capacity, no reallocation required
v.push_back(5);
```
Option a requires every append (after the initial allocation) try to realloc, requiring an interaction with the allocator each time.

Option b requires guesswork to probe the allocator’s true size. By doubling (the typical growth pattern for std::vector in libc++ and libstdc++), we skip past the existing allocation’s size boundary. While we can use a reduced growth rate, doing so requires more interactions with the allocator (calls to realloc), as well as copies (when we cannot resize in-place).

Both of these options add extra branching on the control flow for growing the buffer.
When the allocator cannot resize the buffer in-place, it allocates new memory and memcpy's the contents. For non-trivially copyable types, this is disastrous. The language lacks support for reallocating in-place without moving. While this function could be implemented as magic library function, it could not interact with many underlying malloc implementations (libc, in the case of libc++ and libstdc++) used to build operator new, nor could it interact with user-provided replacement functions.

In-place reallocation has been considered in [N3495] (throwing std::bad_alloc when unable) and [P0894R1] (returning false when unable). This has sometimes been referred to as an "extend" or an "expand" interface. Extending an existing allocation requires a round-trip to the allocator (and its requisite branch/call overheads) to determine whether the allocation could be expanded. As discussed previously, we need to probe for the true size of the backing object (until we find extend fails) until we give up and apply a multiplicative expansion factor.

For fixed-size allocators it makes more sense, and is much simpler for containers to adapt to, if the allocator is able to over-allocate on the initial request and inform the caller how much memory was made available.

2.2. Why not ask the allocator for the size

We could also explore APIs that answer the question: "If I ask for N bytes, how many do I actually get?" [jemalloc] and [TCMalloc] call this nallocx.

nallocx requires an extra call / size calculation. This requires we duplicate work, as the size calculation is performed as part of allocation itself.
nallocx impairs telemetry for monitoring allocation request sizes. If a nallocx return value is cached, the user appears to be asking for exactly as many bytes as nallocx said they would get.

See also [P0901R5]'s discussion "nallocx: not as awesome as it looks."

3. Proposal

Wording relative to [N4800].

We propose a free function to return the allocation size simultaneously with an allocation.

Amend [allocator.requirements]:

template<typename Pointer> struct allocation_result { Pointer ptr; size_t count; };

The class template allocation_result has the template parameters, data members, and special members specified above. It has no base classes or members other than those specified.

template<typename Allocator> constexpr allocation_result<Allocator::pointer> allocate_at_least(Allocator& a, size_t n);

Effects: Let s be the result of allocate_at_least(a, n). Memory is allocated for at least n objects of type T but objects are not constructed.
Returns: allocation_result{ptr, m}, where ptr is that memory and m is the number of objects for which memory has been allocated, such that m>=n.
[ Note: If s.count == 0, the value of s.ptr is unspecified. -- end note]

Table 34 - Cpp17Allocator Requirements

a.allocate_at_least(n) allocation_result<X::pointer> Returns allocation_result{ptr, m}. Memory is allocated for at least n objects of type T but objects are not constructed and returned as ptr. The actual number of objects m, such that m>=n, that memory has been allocated for is returned in the second value. allocate_at_least may throw an appropriate exception. [ Note: If n == 0, the value of ptr is unspecified. — end note ] {a.allocate(n), n}
a.deallocate(p,n) (not used) Preconditions: p shall be a value returned by an earlier call to allocate or allocate_at_least that has not been invalidated by an intervening call to deallocate. If this memory was obtained by a call to allocate, n shall match the value passed to allocate to obtain this memory. Otherwise, n shall satisfy capacity >= n >= requested where [p, capacity] = allocate_at_least(requested) was used to obtain this memory.
Throws: Nothing.

Amend [allocator.members]

Except for the destructor, member functions of the default allocator shall not introduce data races as a result of concurrent calls to those member functions from different threads. Calls to these functions that allocate or deallocate a particular unit of storage shall occur in a single total order, and each such deallocation call shall happen before the next allocation (if any) in this order.
[[nodiscard]] constexpr T* allocate(size_t n);
Returns: A pointer to the initial element of an array of storage of size n * sizeof(T), aligned appropriately for objects of type T.
Remarks: the storage is obtained by calling ::operator new, but it is unspecified when or how often this function is called.
Throws: bad_array_new_length if SIZE_MAX / sizeof(T) < n, or bad_alloc if the storage cannot be obtained.
[[nodiscard]] constexpr allocation_result<T*> allocate_at_least(size_t n);
Returns: allocation_result{ptr, count}, where ptr is a pointer to the initial element of an array of storage of size count * sizeof(T), aligned appropriately for objects of type T, and count >= n.
Remarks: the storage is obtained by calling ::operator new, but it is unspecified when or how often this function is called.
Throws: bad_array_new_length if SIZE_MAX / sizeof(T) < n, or bad_alloc if the storage cannot be obtained.
constexpr void deallocate(T* p, size_t n);
Preconditions: p shall be a pointer value obtained from allocate() or allocate_at_least . If this memory was obtained by a call to allocate, n shall equal the value passed as the first argument to the invocation of allocate which returned p. Otherwise, n shall satisfy count >= n >= requested where [p, count] = allocate_at_least(requested) was used to obtain this memory.
Effects: Deallocates the storage referenced by p.
Remarks: Uses ::operator delete, but it is unspecified when this function is called.

4. Design Considerations

4.1. `allocate` selection

There are multiple approaches here:

Return a pointer-size pair, as presented.
Overload allocate and return via a reference parameter. This potentially hampers optimization depending on ABI, as the value is returned via memory rather than a register.

4.2. Size Return Value

In [Prague], LEWG discussed the return value. For compatibility with the existing allocator APIs, which work in units of objects rather than bytes, this proposal chooses to continue to return an integer number of objects.

Additionally, for types with non-trivial alignment requirements, we must allocate storage for objects, rather than bytes, as raw bytes do not convey the appropriate alignment needs to the allocator.

For example: In the std::vector<T> case, many implementations use 3 pointers for representing the state of the container (begin, end, and capacity). If we preserved the precise value returned by the underlying allocator, we may not be able to legally form the capacity pointer. For these implementations, replacing the capacity pointer with a capacity in bytes would be an ABI break.

4.3. `deallocate` changes

We now require flexibility on the size we pass to deallocate. For container types using this allocator interface, they are faced with the choice of storing both the original size request as well as the provided size (requiring additional auxillary storage), or being unable to reproduce one during the call to deallocate.

As the true size is expected to be useful for the capacity of a string or vector instance, the returned size is available without additional storage requirements. The original (requested) size is unavailable, so we relax the deallocate size parameter requirements for these allocations.

4.4. Interaction with `polymorphic_allocator`

std::pmr::memory_resource is implemented using virtual functions. Adding new methods, such as the proposed allocate API would require taking an ABI break.

4.5. Zero-Sized Requests

In Prague, LEWG discussed the behavior of allocate_at_least(allocator, 0). This maximizes implementation freedom.

The return value may not be dereferenceable (even if not nullptr).
Other implementations, such as a stack allocator, could just return the current offset.

5. Revision History

5.1. R2 → R3

Applied LEWG feedback from [Prague].

Renamed sized_ptr_t to allocation_result, based on discussion in Prague that this library proposal (which returns a count of storage for objects) have a distinct type from the underlying language proposal ([P0901R5], which returns bytes). This was one of the names bikeshedded during the review of P0901R5 ([P0901R5-Review]).
Renamed n (the returned size) to count, based on discussion in Prague to clarify that is measured in objects, rather than bytes. Additionally, augmented discussion of the deallocate changes for why implementations may not be able to preserve the precise size returned by the underlying allocator.

Poll: We prefer to return number of bytes.

SF F N A SA
0 0 6 6 5

SF	F	N	A	SA
0	0	6	6	5

Changed the template parameter to pointer type to allow for fancy pointers.

POLL: Change the template parameter to sized_ptr_t to pointer type (to support fancy ptr).
Unanimous consent

Added design intent and wording that allocation_result be compatible with structured bindings. This wording was adapted from the proposed resolution of [LWG3373].
Added constexpr
Removed default template parameter from allocation_result
Based on LEWG feedback, changed allocate_at_least to deduce return type from the allocator. Since the pointer type is chosen by the allocator, the underlying object type is less relevant. This also removed the default template parameter.

POLL: We want to deduce the object type from the allocator (alloc_least_n [sic]).

SF F N A SA
4 8 1 0 0

SF	F	N	A	SA
4	8	1	0	0

POLL: We like the design, please revise and do initial wording review (with Pablo, and David Stone) send to LWG (tentatively Ready).

SF F N A SA
6 12 0 0 0

SF	F	N	A	SA
6	12	0	0	0

5.2. R1 → R2

Applied LEWG feedback from [Cologne].

Poll: We want a feature like this.

SF F N A SA
2 9 2 0 0

SF	F	N	A	SA
2	9	2	0	0

As users may specialize std::allocator_traits, this revision moves to a free function std::allocate_at_least as suggested via a LEWG poll.

Poll: Name choice

5 allocate_with_size
4 overallocate
2 allocate_with_oversubscribe
6 allocate_oversize
14 allocate_at_least

Poll: Prefer a struct rather than a requirement of structured bindings.

SF F N A SA
2 8 0 2 1

P0401R3
Providing size feedback in the Allocator interface

Published Proposal, 2020-03-01

Abstract

1. Introduction

2. Motivation

2.1. Why not `realloc`

2.2. Why not ask the allocator for the size

3. Proposal

4. Design Considerations

4.1. `allocate` selection

4.2. Size Return Value

4.3. `deallocate` changes

4.4. Interaction with `polymorphic_allocator`

4.5. Zero-Sized Requests

5. Revision History

5.1. R2 → R3

5.2. R1 → R2

References

Informative References

`a.allocate_at_least(n)`	`allocation_result<X::pointer>`	Returns `allocation_result{ptr, m}`. Memory is allocated for at least `n` objects of type `T` but objects are not constructed and returned as `ptr`. The actual number of objects `m`, such that `m>=n`, that memory has been allocated for is returned in the second value. `allocate_at_least` may throw an appropriate exception. [ Note: If `n == 0`, the value of `ptr` is unspecified. — end note ]	`{a.allocate(n), n}`
`a.deallocate(p,n)`	(not used)	Preconditions: `p` shall be a value returned by an earlier call to `allocate` or `allocate_at_least` that has not been invalidated by an intervening call to `deallocate`. If this memory was obtained by a call to `allocate`, `n` shall match the value passed to `allocate` to obtain this memory. Otherwise, `n` shall satisfy `capacity >= n >= requested` where `[p, capacity] = allocate_at_least(requested)` was used to obtain this memory. Throws: Nothing.

5	`allocate_with_size`
4	`overallocate`
2	`allocate_with_oversubscribe`
6	`allocate_oversize`
14	`allocate_at_least`

P0401R3Providing size feedback in the Allocator interface

Published Proposal, 2020-03-01

Abstract

1. Introduction

2. Motivation

2.1. Why not realloc

2.2. Why not ask the allocator for the size

3. Proposal

4. Design Considerations

4.1. allocate selection

4.2. Size Return Value

4.3. deallocate changes

4.4. Interaction with polymorphic_allocator

4.5. Zero-Sized Requests

5. Revision History

5.1. R2 → R3

5.2. R1 → R2

References

Informative References

P0401R3
Providing size feedback in the Allocator interface

2.1. Why not `realloc`

4.1. `allocate` selection

4.3. `deallocate` changes

4.4. Interaction with `polymorphic_allocator`