P0901R8
Size feedback in operator new

Published Proposal,

This version:
http://wg21.link/P0901R8
Authors:
(Google)
Audience:
CWG
Project:
ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++

Abstract

Provide access to actual malloc buffer sizes for users.

1. Motivation

Throughout this document "malloc" refers to the implementation of ::operator new both as fairly standard practice for implementers, and to make clear the distinction between the interface and the implementation.

Everyone’s favorite dynamic data structure, std::vector, allocates memory with code that looks something like this (with many details, like Allocator, templating for non char, and exception safety, elided):

void vector::reserve(size_t new_cap) {
  if (capacity_ >= new_cap) return;
  const size_t bytes = new_cap;
  void *newp = ::operator new(new_cap);
  memcpy(newp, ptr_, capacity_);
  ptr_ = newp;
  capacity_ = bytes;
}

Consider the sequence of calls:

std::vector<char> v;
v.reserve(37);
// ...
v.reserve(38);

All reasonable implementations of malloc round sizes, both for alignment requirements and improved performance. It is extremely unlikely that malloc provided us exactly 37 bytes. We do not need to invoke the allocator here...except that we don’t know that for sure, and to use the 38th byte would be undefined behavior. We would like that 38th byte to be usable without a roundtrip through the allocator.

This paper proposes an API making it safe to use that byte, and explores many of the design choices (not all of which are obvious without implementation experience.)

1.1. nallocx: not as awesome as it looks

The simplest way to help here is to provide an informative API answering the question "If I ask for N bytes, how many do I actually get?" [jemalloc] and [TCMalloc] call this nallocx. We can then use that hint as a smarter parameter for operator new:

void vector::reserve(size_t new_cap) {
  if (capacity_ >= new_cap) return;
  const size_t bytes = nallocx(new_cap, 0);
  void *newp = ::operator new(bytes);
  memcpy(newp, ptr_, capacity_);
  ptr_ = newp;
  capacity_ = bytes;
}

This is a good start, and does in fact work to allow vector and friends to use the true extent of returned objects. But there are three significant problems with this approach.

1.1.1. nallocx must give a conservative answer

While many allocators have a deterministic map from requested size to allocated size, it is by no means guaranteed that all do. Presumably they can make a reasonably good guess, but if two calls to ::operator new(37) might return 64 and 128 bytes, we’d definitely rather know the right answer, not a conservative approximation.

1.1.2. nallocx duplicates work

Allocation is often a crucial limit on performance. Most allocators compute the returned size of an object as part of fulfilling that allocation...but if we make a second call to nallocx, we duplicate all that communication, and also the overhead of the function call.

1.1.3. nallocx hides information from malloc

The biggest problem (for the authors) is that nallocx discards information malloc finds valuable (the user’s intended allocation size.) That is: in our running example, malloc normally knows that the user wants 37 bytes (then 38), but with nallocx, we will only ever be told that they want 40 (or 48, or whatever nallocx(37) returns.)

Google’s malloc implementation ([TCMalloc]) rounds requests to one of a small (<100) number of sizeclasses: we maintain local caches of appropriately sized objects, and cannot do this for every possible size of object. Originally, these sizeclasses were just reasonably evenly spaced among the range they cover. Since then, we have used extensive telemetry on allocator use in the wild to tune these choices. In particular, as we know (approximately) how many objects of any given size are requested, we can solve a fairly simple optimization problem to minimize the total internal fragmentation for any choice of N sizeclasses.

Widespread use of nallocx breaks this. By the time TCMalloc’s telemetry sees a request that was hinted by nallocx, to the best of our knowledge the user wants exactly as many bytes as we currently provide them. If a huge number of callers wanted 40 bytes but were currently getting 48, we’d lose the ability to know that and optimize for it.

Note that we can’t take the same telemetry from nallocx calls: we have no idea how many times the resulting hint will be used (we might not allocate at all, or we might cache the result and make a million allocations guided by it.) We would also lose important information in the stack traces we collect from allocation sites.

Optimization guided by malloc telemetry has been one of our most effective tools in improving allocator performance. It is important that we fix this issue without losing the ground truth of what a caller of ::operator new wants.

These three issues explain why we don’t believe nallocx is a sufficient solution here.

1.2. after allocation is too late

Another obvious suggestion is to add a way to inspect the size of an object returned by ::operator new. Most mallocs provide a way to do this; [jemalloc] calls it sallocx. Vector would look like:

void vector::reserve(size_t new_cap) {
  if (capacity_ >= new_cap) return;
  void *newp = ::operator new(new_cap);
  const size_t bytes = sallocx(newp);
  memcpy(newp, ptr_, capacity_);
  ptr_ = newp;
  capacity_ = bytes;
}

This is worse than nallocx. It fixes the non-constant size problem, and avoids a feedback loop, but the performance issue is worse (this is the major issue fixed by [SizedDelete]!), and what’s worse, the above code invokes UB as soon as we touch byte new_cap+1. We could in principle change the standard, but this would be an implementation nightmare.

1.3. realloc’s day has passed

We should also quickly examine why the classic C API realloc is insufficient.

void vector::reserve(size_t new_cap) {
  if (capacity_ >= new_cap) return;
  ptr_ = realloc(ptr_, new_cap);
  capacity_ = new_cap;
}

In principle a realloc from 37 to 38 bytes wouldn’t carry the full cost of allocation. But it’s dramatically more expensive than making no call at all. What’s more, there are a number of more complicated dynamic data structures that store variable-sized chunks of data but are never actually resized. These data structures still deserve the right to use all the memory they’re paying for.

Furthermore, realloc's original purpose was not to allow the use of more bytes the caller already had, but to (hopefully) extend an allocation in place to adjacent free space. In a classic malloc implementation this would actually be possible...but most modern allocators use variants of slab allocation. Even if the 65th byte in a 64-byte allocation isn’t in use, they cannot be combined into a single object; it’s almost certainly required to be used for the next 64-byte allocation. In the modern world, realloc serves little purpose.

2. Proposal

We propose adding new overloads of ::operator new (size-returning allocation functions) that directly inform the user of the size available to them. C++ makes ::operator new replaceable (15.5.4.6), allowing a program to provide its own version different from the standard library’s implementation.

We propose wording, relative to [N4868]:

An allocation function shall be a class member function or a global function; a program is ill-formed if an allocation function is declared in a namespace scope other than global scope or declared static in global scope. An allocation function is a size-returning allocation function if it has a second parameter of type std::return_size_t, or it has a second parameter of type std::align_val_t and a third parameter of type std::return_size_t. The return type shall be std::sized_allocation_t<void> if the allocation function is a size-returning allocation function and void* otherwise . The first parameter shall have type std::size_t ([support.types]). The first parameter shall not have an associated default argument ([dcl.fct.default]). The value of the first parameter is interpreted as the requested size of the allocation. [...]
Objects created by a new-expression have dynamic storage duration ([basic.stc.dynamic]). [ Note: The lifetime of such an object is not necessarily restricted to the scope in which it is created. — end note ]

When the allocated object is not an array, the result of the new-expression resulting pointer value is a pointer to the object created.

When the allocated object is an array (that is, the noptr-new-declarator syntax is used or the new-type-id or type-id denotes an array type), the new-expression yields resulting pointer value is a pointer to the initial element (if any) of the array.
The result of a size-returning placement new-expression is an object of type std::sized_allocation_t<T> ([new.syn]) whose ptr member has the resulting pointer value and whose bytes member is the bytes member of the value returned by the size-returning allocation function less any array allocation overhead. The result of any other new-expression is the resulting pointer value.
Overload resolution is performed on a function call created by assembling an argument list. The first argument is the amount of space requested, and has type std::size_t. If the type of the allocated object has new-extended alignment, the next argument is the type’s alignment, and has type std::align_val_t. If the new-placement syntax is used, the initializer-clauses in its expression-list are the succeeding arguments. If no matching function is found and the allocated object type has new-extended alignment, the alignment argument is removed from the argument list, and overload resolution is performed again. A size-returning placement new expression is one whose selected allocation function is a size-returning allocation function.
A declaration of a placement deallocation function matches the declaration of a placement allocation function if it has the same number of parameters and, after parameter transformations ([dcl.fct]), all parameter types except the first are identical. If no matching function is found and one of the arguments to the new-placement syntax has type std::return_size_t, that argument is removed from the argument list, and lookup and template argument deduction is performed again. If the lookup finds a single matching deallocation function, that function will be called; otherwise, no deallocation function will be called. If the lookup finds a usual deallocation function and that function, considered as a placement deallocation function, would have been selected as a match for the allocation function, the program is ill-formed. For a non-placement allocation function, the normal deallocation function lookup is used to find the matching deallocation function ([expr.delete]).
If a new-expression calls a deallocation function, it passes the value returned from the allocation function call , or the ptr member of that value for a size-returning allocation function, as the first argument of type void*. If a placement deallocation function is called, it is passed the same additional arguments as were passed to the placement allocation function, that is, the same arguments as those specified with the new-placement syntax. If the implementation is allowed to introduce a temporary object or make a copy of any argument as part of the call to the allocation function, it is unspecified whether the same object is used in the call to both the allocation and deallocation functions.
In an array delete expression, the value of the operand of delete may be a null pointer value or a pointer value that resulted from the resulting pointer value of a previous array new-expression. If not, the behavior is undefined.
The value returned from the resulting pointer value of the allocation call of the new-expression shall be passed as the first argument to the deallocation function.
operator new(std::size_t)
operator new(std::size_t, std::align_val_t)
operator new(std::size_t, const std::nothrow_t&)
operator new(std::size_t, std::align_val_t, const std::nothrow_t&)


operator new(std::size_t size, std::return_size_t)
operator new(std::size_t size, std::align_val_t al, std::return_size_t)
operator new(std::size_t size, std::return_size_t, std::nothrow_t)
operator new(std::size_t size, std::align_val_t al, std::return_size_t, std::nothrow_t)


[...]

operator new[](std::size_t)
operator new[](std::size_t, std::align_val_t)
operator new[](std::size_t, const std::nothrow_t&)
operator new[](std::size_t, std::align_val_t, const std::nothrow_t&)


operator new[](std::size_t size, std::return_size_t)
operator new[](std::size_t size, std::align_val_t al, std::return_size_t)
operator new[](std::size_t size, std::return_size_t, std::nothrow_t)
operator new[](std::size_t size, std::align_val_t al, std::return_size_t, std::nothrow_t)


[...]
namespace std {
  class bad_alloc;
  class bad_array_new_length;

  struct destroying_delete_t {
    explicit destroying_delete_t() = default;
  };
  inline constexpr destroying_delete_t destroying_delete{};


  // global operator new control
  struct return_size_t {
    explicit return_size_t() = default;
  };

  inline constexpr return_size_t return_size{};

  template<typename T = void>
  struct sized_allocation_t {
    T *ptr;
    size_t bytes;
  };


  enum class align_val_t : size_t {};

  [...]
}

[[nodiscard]] void* operator new(std::size_t size);
[[nodiscard]] void* operator new(std::size_t size, std::align_val_t alignment);
[[nodiscard]] void* operator new(std::size_t size, const std::nothrow_t&) noexcept;
[[nodiscard]] void* operator new(std::size_t size, std::align_val_t alignment,
                                 const std::nothrow_t&) noexcept;


[[nodiscard]] std::sized_allocation_t<void> ::operator new(
        std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t<void> ::operator new(
  std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t<void> ::operator new(
  std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t<void> ::operator new(
  std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;

[...]

[[nodiscard]] void* operator new[](std::size_t size);
[[nodiscard]] void* operator new[](std::size_t size, std::align_val_t alignment);
[[nodiscard]] void* operator new[](std::size_t size, const std::nothrow_t&) noexcept;
[[nodiscard]] void* operator new[](std::size_t size, std::align_val_t alignment,
                                   const std::nothrow_t&) noexcept;


[[nodiscard]] std::sized_allocation_t<void> ::operator new[](
        std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t<void> ::operator new[](
  std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t<void> ::operator new[](
  std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t<void> ::operator new[](
  std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;

[...]

[[nodiscard]] void* operator new  (std::size_t size, void* ptr) noexcept;
[[nodiscard]] void* operator new[](std::size_t size, void* ptr) noexcept;
void  operator delete  (void* ptr, void*) noexcept;
void  operator delete[](void* ptr, void*) noexcept;
[[nodiscard]] std::sized_allocation_t<void> ::operator new(
        std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t<void> ::operator new(
  std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t ::operator new(
  std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t ::operator new(
  std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;
void operator delete(void* ptr) noexcept;
void operator delete(void* ptr, std::size_t size) noexcept;
void operator delete(void* ptr, std::align_val_t alignment) noexcept;
void operator delete(void* ptr, std::size_t size, std::align_val_t alignment) noexcept;

[...]

[...]

[[nodiscard]] std::sized_allocation_t ::operator new[](
        std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t ::operator new[](
  std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t ::operator new[](
  std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t ::operator new[](
  std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;
void operator delete[](void* ptr) noexcept;
void operator delete[](void* ptr, std::size_t size) noexcept;
void operator delete[](void* ptr, std::align_val_t alignment) noexcept;
void operator delete[](void* ptr, std::size_t size, std::align_val_t alignment) noexcept;

[...]

[...]

Name Value
__cpp_size_returning_new PLACEHOLDER DATE

3. Alternative Designs Considered

Another signature we could use would be:

enum class return_size_t : std::size_t {};
void* ::operator new(std::size_t size, std::return_size_t&);

(and so on.) This is slightly simpler to read as a signature, but arguably worse in usage:

std::tie(obj.ptr, obj.size) = ::operator new(37, std::return_size_t{});

// ...vs...

// Presumably the object implementation wants to contain a size_t,
// not a return_size_t.
std::return_size_t rs;
obj.ptr = ::operator new(37, rs);
obj.size = rs;

More importantly, this form is less efficient. In practice, underlying malloc implementations provide actual definitions of ::operator new symbols which are called like any other function. Passing a reference parameter requires us to actually return the size via memory.

Whether we use a reference parameter or a second returned value, the interpretation is the same.

3.1. How many ::operator new's?

It is unfortunate that we have so many permutations of ::operator new--eight seems like far more than we should really need! But there really isn’t any significant runtime cost for having them. Use of raw calls to ::operator new is relatively rare: It’s a building block for low-level libraries, allocators ([P0401R4]), and so on, so the cognitive burden on C++ users is low.

The authors have considered other alternatives to the additional overloads. At the Jacksonville meeting, EWG suggested looking at parameter packs.

The authors have also considered APIs where all parameters are passed, thereby requiring a single new overload. This adds further overhead for implementations, as it moves compile-time decisions (is the alignment at or below the minimum guaranteed by operator new) into runtime ones.

The alternative to modifying the handling of new-expressions invoking deallocation functions (when an exception is thrown) would require additional overloads for operator delete / operator delete[] whose sole purpose would be to accept and discard the std::return_size_t.

3.2. Implementation difficulty

It’s worth reiterating that there’s a perfectly good trivial implementation of these functions:

std::sized_allocation_t<void> ::operator new(
    std::size_t n, std::return_size_t) {
  return {::operator new(n), n};
}

Malloc implementations are free to properly override this with a more impactful definition, but this paper poses no significant difficulty for toolchain implementers.

Implementation Experience:

3.3. Interaction with Sized Delete

For allocations made with sized_allocation_t-returning ::operator new, we need to relax ::operator delete's size argument (16.6.2.1 and 16.6.2.2). For allocations of T, the size quanta used by the allocator may not be a multiple of sizeof(T), leading to both the original and returned sizes being unrecoverable at the time of deletion.

Consider the memory allocated by:

using T = std::aligned_storage<16, 8>::type;

std::vector<T> v(4);

The underlying heap allocation is made with ::operator new(64, std::return_size_t).

For allocations made with

std::tie(p, m) = ::operator new(n, std::return_size_t{});

we permit ::operator delete(p, s) where n <= s <= m .

This behavior is consistent with [jemalloc]'s sdallocx, where the deallocation size must fall between the request (n) and the actual allocated size (m) inclusive.

3.4. Advantages

It’s easy to see that this approach nicely solves the problems posed by other methods:

4. New Expressions

Additionally, we propose expanding this functionality to new expressions by returning:

We considered alternatives for returning the size.

For new[] expressions, we considered alternatively initializing the returned (sz / sizeof(T)) number of elements.

5. Naming

The library support type is named sized_allocation_t, based on LEWG’s naming suggestions and for consistency (in choice of _t) with the other allocation library support types (size_t, align_val_t, nothrow_t, etc.). We expect this to be spelled rarely.

Its members are ptr and bytes. ptr for the memory returned is an intuitive name. bytes conveys units in its name (bytes). Since new expressions can return additional storage under this proposal, this distinguishes it from returning the amount of storage available for whole objects. This contrasts with [P0401R4], which does return storage for whole objects and whose support type’s field is named count.

6. Related work

[P0401R4] considers this problem at the level of the Allocator concept. Ironically, the lack of the above API was one significant problem in its original revision: how could an implementation of std::allocator provide the requested feedback in a way that would work with any underlying malloc implementation?

7. History

7.1. R7 → R8

LEWG reviewed [P0901R6] via telecon.

Poll: Send P0901R6, after changing p to ptr and n to bytes in sized_allocation_t, to electronic ballot to be forwarded to CWG.
SF F N A SA
4 11 0 0 0

7.2. R6 → R7

7.3. R5 → R6

LEWG reviewed [P0901R5] library support types at [Prague].

LEWG took an approval poll of different names

5 sized_ptr_t
2 sized_ptr
3 memblock_t
7 memblock
0 memspan_t
2 memspan
8 alloc_size_t
4 alloc_size
5 alloc_t
0 alloc
13 sized_allocation_t
17 sized_allocation
0 alloc_span_t
2 alloc_span
9 allocation_t
9 allocation
16 alloc_result_t
14 alloc_result
12 allocation_result_t
16 allocation_result

Based on this poll, LEWG directed the paper authors to consider sized_allocation, alloc_result_t, and allocation_result.

LEWG additionally polled on whether the type should have a _t suffix.

SF F N A SA
0 6 7 8 1

Based on this feedback, the authors have chosen sized_allocation_t, based on LEWG’s naming suggestions and for consistency (in choice of _t) with the other allocation library support types (size_t, align_val_t, nothrow_t, etc.). We expect this to be spelled rarely.

7.4. R4 → R5

EWG reviewed P0901R4 at [Cologne].

Poll: P0901R4 as presented, forward to LEWG for C++23, not C++20.
SF F N A SA
2 11 14 2 0

7.5. R3 → R4

7.6. R2 → R3

7.7. R1 → R2

Applied feedback from San Diego Mailing

7.8. R0 → R1

Applied feedback from [JacksonvilleMinutes].

Additionally, a discussion of § 3.3 Interaction with Sized Delete has been added.

References

Informative References

[Cologne]
Size feedback in operator new. 2019-09-17. URL: http://wiki.edg.com/bin/view/Wg21cologne2019/P0901R4-EWG
[JacksonvilleMinutes]
Jacksonville 2018 minutes. 2018-03-15. URL: http://wiki.edg.com/bin/view/Wg21jacksonville2018/P0901R0-Jax18
[JEMALLOC]
jemalloc(3) - Linux man page. URL: http://jemalloc.net/jemalloc.3.html
[MallocExtension]
TCMalloc Malloc Extensions. URL: https://github.com/google/tcmalloc/blob/master/tcmalloc/malloc_extension.h
[MicrosoftABI]
Return Values. 2016-11-03. URL: https://docs.microsoft.com/en-us/cpp/build/return-values-cpp
[N4868]
Working Draft, Standard for Programming Language C++. 2020-10-18. URL: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4868.pdf
[P0401R1]
Jonathan Wakely; Chris Kennelly. Providing size feedback in the Allocator interface. 2019-06-11. URL: http://wg21.link/P0401R1
[P0401R4]
Jonathan Wakely; Chris Kennelly. Providing size feedback in the Allocator interface. 2020-11-14. URL: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0401r4.html
[P0901R5]
Size feedback in operator new. 2019-10-06. URL: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0901r5.html
[P0901R6]
Size feedback in operator new. 2020-03-01. URL: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0901r6.html
[Prague]
LEWG Prague Minutes for P901R5. 2020-02-10. URL: http://wiki.edg.com/bin/view/Wg21prague/P0901
[SizedDelete]
Lawrence Crowl. C++ Sized Deallocation. URL: http://wg21.link/n3536
[SMALLOCX]
Add experimental API to support P0901r0. URL: https://github.com/jemalloc/jemalloc/pull/1270
[TCMalloc]
TCMalloc. URL: https://github.com/google/tcmalloc