1. Motivation
Throughout this document, "malloc" refers to the implementation of
both as fairly standard practice for implementers, and to
make clear the distinction between the interface and the implementation.
Everyone’s favorite dynamic data structure,
, allocates memory with
code that looks something like this (with many details, like
,
templating for non
, and exception safety, elided):
void vector::reserve ( size_t new_cap ) { if ( capacity_ >= new_cap ) return ; const size_t bytes = new_cap ; void * newp = :: operator new ( new_cap ); memcpy ( newp , ptr_ , capacity_ ); ptr_ = newp ; capacity_ = bytes ; }
Consider the sequence of calls:
std :: vector < char > v ; v . reserve ( 37 ); // ... v . reserve ( 38 );
All reasonable implementations of malloc round sizes, both for alignment requirements and improved performance. It is extremely unlikely that malloc provided us exactly 37 bytes. We do not need to invoke the allocator here...except that we don’t know that for sure, and to use the 38th byte would be undefined behavior. We would like that 38th byte to be usable without a roundtrip through the allocator.
This paper proposes an API making it safe to use that byte, and explores many of the design choices (not all of which are obvious without implementation experience.)
1.1. nallocx: not as awesome as it looks
The simplest way to help here is to provide an informative API answering the
question "If I ask for N bytes, how many do I actually get?" [jemalloc] and [TCMalloc] call this
. We can then use that hint as a smarter
parameter for operator new:
void vector::reserve ( size_t new_cap ) { if ( capacity_ >= new_cap ) return ; const size_t bytes = nallocx ( new_cap , 0 ); void * newp = :: operator new ( bytes ); memcpy ( newp , ptr_ , capacity_ ); ptr_ = newp ; capacity_ = bytes ; }
This is a good start, and does in fact work to allow vector and friends to use the true extent of returned objects. But there are three significant problems with this approach.
1.1.1. nallocx must give a conservative answer
While many allocators have a deterministic map from requested size to allocated
size, it is by no means guaranteed that all do. Presumably they can make a
reasonably good guess, but if two calls to
might return 64
and 128 bytes, we’d definitely rather know the right answer, not a conservative
approximation.
1.1.2. nallocx duplicates work
Allocation is often a crucial limit on performance. Most allocators compute
the returned size of an object as part of fulfilling that allocation...but if
we make a second call to
, we duplicate all that communication, and
also the overhead of the function call.
1.1.3. nallocx hides information from malloc
The biggest problem (for the authors) is that
discards information
malloc finds valuable (the user’s intended allocation size.) That is: in our
running example, malloc normally knows that the user wants 37 bytes (then 38),
but with
, we will only ever be told that they want 40 (or 48, or
whatever
returns.)
Google’s malloc implementation ([TCMalloc]) rounds requests to one of a small (<100) number of sizeclasses: we maintain local caches of appropriately sized objects, and cannot do this for every possible size of object. Originally, these sizeclasses were just reasonably evenly spaced among the range they cover. Since then, we have used extensive telemetry on allocator use in the wild to tune these choices. In particular, as we know (approximately) how many objects of any given size are requested, we can solve a fairly simple optimization problem to minimize the total internal fragmentation for any choice of N sizeclasses.
Widespread use of
breaks this. By the time TCMalloc’s telemetry sees
a request that was hinted by nallocx, to the best of our knowledge the user wants exactly as many bytes as we currently provide them. If a huge number
of callers wanted 40 bytes but were currently getting 48, we’d lose the ability
to know that and optimize for it.
Note that we can’t take the same telemetry from
calls: we have no
idea how many times the resulting hint will be used (we might not allocate at
all, or we might cache the result and make a million allocations guided by it.)
We would also lose important information in the stack traces we collect from
allocation sites.
Optimization guided by malloc telemetry has been one of our most effective
tools in improving allocator performance. It is important that we fix this
issue without losing the ground truth of what a caller of
wants.
These three issues explain why we don’t believe
is a sufficient
solution here.
1.2. after allocation is too late
Another obvious suggestion is to add a way to inspect the size of an object
returned by
. Most mallocs provide a way to do this; [jemalloc] calls it
. Vector would look like:
void vector::reserve ( size_t new_cap ) { if ( capacity_ >= new_cap ) return ; void * newp = :: operator new ( new_cap ); const size_t bytes = sallocx ( newp ); memcpy ( newp , ptr_ , capacity_ ); ptr_ = newp ; capacity_ = bytes ; }
This is worse than nallocx. It fixes the non-constant size problem, and avoids
a feedback loop, but the performance issue is worse (this is the major issue fixed by [SizedDelete]!), and what’s worse, the above code invokes UB as
soon as we touch byte
. We could in principle change the standard,
but this would be an implementation nightmare.
1.3. realloc’s day has passed
We should also quickly examine why the classic C API
is insufficient.
void vector::reserve ( size_t new_cap ) { if ( capacity_ >= new_cap ) return ; ptr_ = realloc ( ptr_ , new_cap ); capacity_ = new_cap ; }
In principle a realloc from 37 to 38 bytes wouldn’t carry the full cost of allocation. But it’s dramatically more expensive than making no call at all. What’s more, there are a number of more complicated dynamic data structures that store variable-sized chunks of data but are never actually resized. These data structures still deserve the right to use all the memory they’re paying for.
Furthermore,
's original purpose was not to allow the use of more bytes
the caller already had, but to (hopefully) extend an allocation in place to
adjacent free space. In a classic malloc implementation this would actually be
possible...but most modern allocators use variants of slab allocation. Even if
the 65th byte in a 64-byte allocation isn’t in use, they cannot be combined into
a single object; it’s almost certainly required to be used for the next 64-byte
allocation. In the modern world,
serves little purpose.
2. Proposal
2.1. Allocation functions
We propose adding new overloads of
(so-called "size-returning allocation
functions") that directly inform the user of the size available to them. C++
makes
replaceable (15.5.4.6), allowing a program to provide
its own version different from the standard library’s implementation.
We note immediately that we are deliberately not providing this facility as an unrelated
free function, nor leave it as an implementation detail of
; see § 5.2 Why operator new? for a detailed discussion.
The new overloads are selected by a tag argument of type
,
usually provided as the value
. (This is analogous to
/
and to
.)
The new overloads return a new type,
, which
stores a void pointer that corresponds to the pointer returned by the
existing allocation functions, as well as the new size feedback
information.
2.2. New-expressions
In the previous revision, R10, we had proposed to make new-expressions be able to use the new
allocation functions, so that a placement new-expression of the form
would have a novel return type that includes the size feedback. However, during EWG discussion
it became clear that that is not a useful facility; see § 5.3 New expressions for details.
Consequently, we are not proposing to expose size-returning allocation functions to new-expressions. We propose that a hypothetical (placement) new-expression that would find a size-returning allocation function during lookup is ill-formed. (No such expressions can currently exist.)
3. Proposed Wording
We propose wording, relative to [N4917]:
-
Amend [basic.stc.dynamic.general], paragraph 2:
The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable ([new.delete]). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library ([replacement.functions]). The following allocation and deallocation functions ([support.dynamic]) are implicitly declared in global scope in each translation unit of a program.[[ nodiscard ]] void * operator new ( std :: size_t ); [[ nodiscard ]] void * operator new ( std :: size_t , std :: align_val_t ); [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t , std :: align_val_t , std :: return_size_t ); void operator delete ( void * ) noexcept ; void operator delete ( void * , std :: size_t ) noexcept ; void operator delete ( void * , std :: align_val_t ) noexcept ; void operator delete ( void * , std :: size_t , std :: align_val_t ) noexcept ; [[ nodiscard ]] void * operator new []( std :: size_t ); [[ nodiscard ]] void * operator new []( std :: size_t , std :: align_val_t ); [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t , std :: align_val_t , std :: return_size_t ); void operator delete []( void * ) noexcept ; void operator delete []( void * , std :: size_t ) noexcept ; void operator delete []( void * , std :: align_val_t ) noexcept ; void operator delete []( void * , std :: size_t , std :: align_val_t ) noexcept ; These implicit declarations introduce only the function names
,
operator new ,
operator new [] , and
operator delete .
operator delete [] [Note 2: The implicit declarations do not introduce the names
,
std ,
std :: size_t ,
std :: align_val_t , or any other names that the library uses to declare these names. Thus, a new-expression, delete-expression, or function call that refers to one of these functions without importing or including the header
std :: return_size_t ([new.syn]) or importing a C++ library module ([std.modules]) is well-formed. However, referring to
< new > or
std or
std :: size_t or
std :: align_val_t is ill-formed unless a standard library declaration ([...]) of that name precedes ([basic.lookup.general]) the use of that name. — end note]
std :: return_size_t Allocation and/or deallocation functions may also be declared and defined for any class ([class.free]).
-
Amend [basic.stc.dynamic.allocation], paragraph 1:
An allocation function that is not a class member function shall belong to the global scope and not have a name with internal linkage. The return type shall be([new.syn]) if the allocation function is a size-returning allocation function and
std :: sized_allocation_t otherwise . The first parameter shall have type
void * ([support.types]). The first parameter shall not have an associated default argument ([dcl.fct.default]). The value of the first parameter is interpreted as the requested size of the allocation. An allocation function is a size-returning allocation function if it has a second parameter of type
std :: size_t , or it has a second parameter of type
std :: return_size_t and a third parameter of type
std :: align_val_t . An allocation function can be a function template. Such a template shall declare its return type and first parameter as specified above (that is, template parameter types shall not be used in the return type and first parameter type). Allocation function templates shall have two or more parameters as specified above .
std :: return_size_t
-
Insert [basic.stc.dynamic.allocation], after paragraph 1:
A size-returning allocation function returns a value with a member subobject that is a pointer suitable to hold the address of potentially allocated storage, and therefore all allocation functions are said to return a pointer value.
-
Amend [replacement.functions], paragraph 2:
operator new ( std :: size_t ) operator new ( std :: size_t , std :: align_val_t ) operator new ( std :: size_t , const std :: nothrow_t & ) operator new ( std :: size_t , std :: align_val_t , const std :: nothrow_t & ) operator new ( std :: size_t size , std :: return_size_t ) operator new ( std :: size_t size , std :: align_val_t al , std :: return_size_t ) operator new ( std :: size_t size , std :: return_size_t , std :: nothrow_t ) operator new ( std :: size_t size , std :: align_val_t al , std :: return_size_t , std :: nothrow_t ) [...] operator new []( std :: size_t ) operator new []( std :: size_t , std :: align_val_t ) operator new []( std :: size_t , const std :: nothrow_t & ) operator new []( std :: size_t , std :: align_val_t , const std :: nothrow_t & ) operator new []( std :: size_t size , std :: return_size_t ) operator new []( std :: size_t size , std :: align_val_t al , std :: return_size_t ) operator new []( std :: size_t size , std :: return_size_t , std :: nothrow_t ) operator new []( std :: size_t size , std :: align_val_t al , std :: return_size_t , std :: nothrow_t ) [...]
-
Amend header
synopsis in [new.syn]:< new >
namespace std { class bad_alloc ; class bad_array_new_length ; struct destroying_delete_t { explicit destroying_delete_t () = default ; }; inline constexpr destroying_delete_t destroying_delete {}; // global operator new control struct return_size_t { explicit return_size_t () = default ; }; inline constexpr return_size_t return_size {}; struct sized_allocation_t { void * ptr ; size_t bytes ; }; enum class align_val_t : size_t {}; [...] } [[ nodiscard ]] void * operator new ( std :: size_t size ); [[ nodiscard ]] void * operator new ( std :: size_t size , std :: align_val_t alignment ); [[ nodiscard ]] void * operator new ( std :: size_t size , const std :: nothrow_t & ) noexcept ; [[ nodiscard ]] void * operator new ( std :: size_t size , std :: align_val_t alignment , const std :: nothrow_t & ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t size , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t size , std :: align_val_t alignment , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t size , std :: return_size_t , std :: nothrow_t ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t operator new ( std :: size_t size , std :: align_val_t alignment , std :: return_size_t , std :: nothrow_t ) noexcept ; [...] [[ nodiscard ]] void * operator new []( std :: size_t size ); [[ nodiscard ]] void * operator new []( std :: size_t size , std :: align_val_t alignment ); [[ nodiscard ]] void * operator new []( std :: size_t size , const std :: nothrow_t & ) noexcept ; [[ nodiscard ]] void * operator new []( std :: size_t size , std :: align_val_t alignment , const std :: nothrow_t & ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t size , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t size , std :: align_val_t alignment , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t size , std :: return_size_t , std :: nothrow_t ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t operator new []( std :: size_t size , std :: align_val_t alignment , std :: return_size_t , std :: nothrow_t ) noexcept ; [...] [[ nodiscard ]] void * operator new ( std :: size_t size , void * ptr ) noexcept ; [[ nodiscard ]] void * operator new []( std :: size_t size , void * ptr ) noexcept ; void operator delete ( void * ptr , void * ) noexcept ; void operator delete []( void * ptr , void * ) noexcept ;
-
Insert [new.delete.single], after paragraph 4:
[[ nodiscard ]] std :: sized_allocation_t :: operator new ( std :: size_t size , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t :: operator new ( std :: size_t size , std :: align_val_t alignment , std :: return_size_t );
- Effects: Same as above, except these are called by a placement version of a new-expression when a C++ program prefers the size-returning allocation function.
- Replaceable: A C++ program may define functions with either of these function signatures, and thereby displace the default versions defined by the C++ standard library.
- Required behavior: Return a
whose
sized_allocation_t member represents the address of a region of N bytes of suitably aligned storage ([basic.stc.dynamic]) for some , and whose
ptr member is N, or else throw a
bytes exception. This requirement is binding on any replacement versions of these functions.
bad_alloc - Default behavior: Returns
and
std :: sized_allocation_t { operator new ( size ), N } respectively. If a user-provided operator new is invoked directly or indirectly, N is
std :: sized_allocation_t { operator new ( size , alignment ), N } .
size
-
Insert [new.delete.single], after paragraph 9:
[[ nodiscard ]] std :: sized_allocation_t :: operator new ( std :: size_t size , std :: return_size_t , std :: nothrow_t ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t :: operator new ( std :: size_t size , std :: align_val_t alignment , std :: return_size_t , std :: nothrow_t ) noexcept ;
- Effects: Same as above, except these are called by a placement version of a new-expression when a C++ program prefers the size-returning allocation function and a null pointer result as an error indication.
- Replaceable: A C++ program may define functions with either of these function signatures, and thereby displace the default versions defined by the C++ standard library.
- Required behavior: Return a
whose
sized_allocation_t member represents the address of a region of N bytes of suitably aligned storage ([basic.stc.dynamic]) for some , and whose
ptr member is N, or else return
bytes . Each of these nothrow versions of
std :: sized_allocation_t { nullptr , 0 } returns a pointer obtained as if acquired from the (possibly replaced) corresponding non-placement function. This requirement is binding on any replacement versions of these functions.
operator new - Default behavior: Returns
and
std :: sized_allocation_t { operator new ( size ), N } respectively. If a user-provided operator new is invoked directly or indirectly, N is
std :: sized_allocation_t { operator new ( size , alignment ), N } . If the call to
size throws, returns
operator new .
std :: sized_allocation_t { nullptr , 0 }
-
Amend [new.delete.single], paragraph 11:
void operator delete ( void * ptr ) noexcept ; void operator delete ( void * ptr , std :: size_t size ) noexcept ; void operator delete ( void * ptr , std :: align_val_t alignment ) noexcept ; void operator delete ( void * ptr , std :: size_t size , std :: align_val_t alignment ) noexcept ;
[...]
-
If the
parameter is not present,alignment
shall have been returned by an allocation function without an alignment parameter. If present, theptr
argument shall equal the alignment argument passed to the allocation function that returnedalignment
. If present, theptr
argument shall equal thesize
argument passed to the allocation function that returnedsize
, ifptr
was not allocated by a size-returning allocation function. If present, theptr
argument shall satisfy ifsize
was allocated by a size-returning allocation function, whereptr
is the size returned inbytes
andstd :: sized_allocation_t
is the size argument passed to the allocation function .requested
[...]
-
Insert [new.delete.array], after paragraph 4:
[[ nodiscard ]] std :: sized_allocation_t :: operator new []( std :: size_t size , std :: return_size_t ); [[ nodiscard ]] std :: sized_allocation_t :: operator new []( std :: size_t size , std :: align_val_t alignment , std :: return_size_t );
- Effects: Same as above, except these are called by a placement version of a new-expression when a C++ program prefers the size-returning allocation function.
- Replaceable: A C++ program may define functions with either of these function signatures, and thereby displace the default versions defined by the C++ standard library.
- Required behavior: Return a
whose
sized_allocation_t member represents the address of a region of N bytes of suitably aligned storage ([basic.stc.dynamic]) for some , and whose
ptr member is N, or else throw a
bytes exception. This requirement is binding on any replacement versions of these functions.
bad_alloc - Default behavior: Returns
and
std :: sized_allocation_t { operator new []( size ), N } respectively. If a user-provided operator new is invoked directly or indirectly, N is
std :: sized_allocation_t { operator new []( size , alignment ), N } .
size
-
Insert [new.delete.array], after paragraph 8:
[[ nodiscard ]] std :: sized_allocation_t :: operator new []( std :: size_t size , std :: return_size_t , std :: nothrow_t ) noexcept ; [[ nodiscard ]] std :: sized_allocation_t :: operator new []( std :: size_t size , std :: align_val_t alignment , std :: return_size_t , std :: nothrow_t ) noexcept ;
- Effects: Same as above, except these are called by a placement version of a new-expression when a C++ program prefers the size-returning allocation function and a null pointer result as an error indication.
- Replaceable: A C++ program may define functions with either of these function signatures, and thereby displace the default versions defined by the C++ standard library.
- Required behavior: Return a
whose
sized_allocation_t member represents the address of a region of N bytes of suitably aligned storage ([basic.stc.dynamic]) for some , and whose
ptr member is N, or else return
bytes . This requirement is binding on any replacement versions of these functions.
std :: sized_allocation_t { nullptr , 0 } - Default behavior: Returns
and
std :: sized_allocation_t { operator new []( size ), N } respectively. If a user-provided operator new is invoked directly or indirectly, N is
std :: sized_allocation_t { operator new []( size , alignment ), N } . If the call to
size throws, returns
operator new [] .
std :: sized_allocation_t { nullptr , 0 }
-
Amend [new.delete.array], paragraph 10:
void operator delete []( void * ptr ) noexcept ; void operator delete []( void * ptr , std :: size_t size ) noexcept ; void operator delete []( void * ptr , std :: align_val_t alignment ) noexcept ; void operator delete []( void * ptr , std :: size_t size , std :: align_val_t alignment ) noexcept ;
[...]
-
If the
parameter is not present,alignment
shall have been returned by an allocation function without anptr
parameter. If present, thealignment
argument shall equal thealignment
argument passed to the allocation function that returnedalignment
. If present, theptr
argument shall equal thesize
argument passed to the allocation function that returnedsize
, ifptr
was not allocated by a size-returning allocation function. If present, theptr
argument shall satisfy ifsize
was allocated by a size-returning allocation function, whereptr
is the size returned inbytes
andstd :: sized_allocation_t
is the size argument passed to the allocation function .requested
[...]
-
Amend Table 19 ("Feature-test macros") [cpp.predefined]
Name Value
__cpp_size_returning_new PLACEHOLDER DATE
4. Implementation, Deployment and Performance Experience
4.1. Implementability
The fundamental size feedback logic is a shipping part of TCMalloc ([MallocExtension]), as a set of functions such as:
tcmalloc :: sized_ptr_t tcmalloc_size_returning_operator_new ( std :: size_t size );
We implemented the new
overload in Clang. This
presented no noteworthy complication or obstacle. An
whose return type is not
is novel, but entirely implementable.
Building a few large applications from Google’s codebase with the modified version of Clang showed nothing unexpected and continued working.
4.2. Deployment
We modified the libc++ standard library implementation with minimal effort to use that function. (The implementation has a convenient central point at which the underlying allocation function is called.) We were able to build a number of significant production binaries of the Google codebase without problems.
As above, building a few large applications from Google’s codebase succeeded without incident.
4.3. Performance
At Google, we modified libc++'s
to use TCMalloc’s size-returning allocation
function. This resulted in savings on heap allocations that we will describe below. In summary,
the proposed feature allows vectors and strings to grow more efficiently. While it is hard to
spot these effects in real-world benchmarks, where they are within the noise, we performed
micro-benchmarks and traced allocations to show that string allocations are reduced by about
4%. For the application deployment in Google’s production systems, we extrapolate fleet-wide
memory savings from strings of about 0.2%.
One source of inefficiencies is the interplay between malloc and the implementation of
. The libc++ implementation of
always allocates memory in multiples
of 16 bytes. The glibc malloc (ptmalloc2) allocates blocks of memory of size
24 + n × 16. For example, a call
would make
allocate a rounded-up value of 48, but malloc would allocate 56 bytes. With size
feedback,
would ask for "at least 35 bytes" and receive 40 bytes from malloc.
With TCMalloc the situation is different, but savings are still possible: its size classes are
8, 16, 32, 48, 64, followed by 8-byte increments up to 128. In this case, size feedback would
allow
to hit every size bucket instead of only every even one for strings of
length 64‒128.
TCMalloc’s size feedback is used in Google’s implementation of
and of
.
5. Alternative Designs Considered
5.1. Parameters and return types
Another signature we could use would be:
enum class return_size_t : std :: size_t {}; void * :: operator new ( std :: size_t size , std :: return_size_t & );
(and so on.) This is slightly simpler to read as a signature, but arguably worse in usage:
std :: tie ( obj . ptr , obj . size ) = :: operator new ( 37 , std :: return_size_t {}); // ...vs... // Presumably the object implementation wants to contain a size_t, // not a return_size_t. std :: return_size_t rs ; obj . ptr = :: operator new ( 37 , rs ); obj . size = rs ;
More importantly, this form is less efficient. In practice, underlying malloc
implementations provide actual definitions of
symbols which
are called like any other function. Passing a reference parameter requires us
to actually return the size via memory.
-
Linux ABIs support returning at least two scalar values in registers (even if they’re members of a trivially copyable struct) which can be dramatically more efficient.
-
The [MicrosoftABI] returns large types by pointer, but this is no worse than making the reference parameter an inherent part of the API.
Whether we use a reference parameter or a second returned value, the interpretation is the same.
5.2. Why operator new
?
Given the added complexity of the
overload set, one might ask why we want to
expose this facility as such an overload, rather than something else. Concretely, we could
provide a completely unrelated free function, or we could ask that the new behavior be exposed
via
.
Those alternatives fall short, however, since the
allocation functions have a
special place in C++ that the alternatives do not have: They are replaceable and can be chosen
by the application owner:
-
Compared to an unrelated free function: an unrelated free function would not be used by existing code, in particular, code that may be outside the user’s control. The replaceable allocation function, by contrast, can affect and benefit existing code. Moreover, it would require additional integration to make the free function interoperate with
, whereas we have an existing framework for matchingoperator delete
andoperator new
. Finally,operator delete
enjoys various special properties regarding implicit lifetimes which would need to be added explicitly to an unrelated free function.operator new -
Compared to
: The implementation ofstd :: allocator
is generally under the control of the standard library vendor, not the user. The vendor is not going to select any one particular malloc implementation in theirstd :: allocator
. In fact, the user may want to use different malloc implementations in different projects. By contrast, a replaceable allocation function opens the choice of malloc implementation to the user on a program-by-program basis. (Note thatstd :: allocator
is specified to usestd :: allocator
, and while no further details (such as choice of overload) are specified, it is reasonable to assume that a size-returning:: operator new
would be picked up byoperator new
as a matter of quality-of-implementation.)std :: allocator :: allocate_at_least
Finally, a suggestion that came up during the review of R10 was that we could aim at a lower level and propose a new
-like facility to both C and C++. While this would undoubtedly
be useful, we believe that it does not replace
. That is because
is
generally how one "obtains memory in C++"; it enjoys special properties in the language,
e.g. regarding implicit lifetimes, and also with regards to elision or merging of allocations.
As a data point, the codebase at Google uses TCMalloc and its
interface, not its
interface, which are entirely different. A significant performance gain comes from the
use of
and sized-
, compared to the
/
interface.
(The vast majority of memory allocations in that codebase come from
and not from
.) Exposing the size feedback as part of this established framework makes the performance
benefits available in ways that a less integrated facility would not easily be able to do.
5.3. New expressions
We considered previously to have "size-returning new-expressions that expose the excess
allocation space resulting from the allocation that’s done as part of the new-expression. However, it is exceedingly awkward to make use of the excess allocation space,
since the lifetime of the object(s) created by the
would be distinct from
whatever objects one later creates manually in the excess space.
The following example was given in the previous revision:
auto [ p , sz ] = new ( std :: return_size ) T [ 5 ]; for ( int i = 5 ; i < sz / sizeof ( T ); i ++ ) { new ( p [ i ]) T ; } for ( int i = 0 ; i < sz / sizeof ( T ); i ++ ) { p [ i ]. DoStuff (); } for ( int i = 5 ; i < sz / sizeof ( T ); i ++ ) { p [ i ]. ~ T (); } delete [] p ;
The example shows how the lifetime of the "requested" objects is managed by the delete expression, but the lifetime of the manually constructed excess objects has to be handled manually. This is awkward, easy to misuse, and we do not see any need for such a facility.
Non-array new-expressions would be similarly useless.
One conceivable use case, which we ultimately will not pursue, is perhaps worth describing: for
trivial types, such as
, one could imagine a chain of expressions
. These could successively return excess storage, but in addition, the compiler would be
allowed to merge the allocations, because of the special allowances made to how new-expressions (don’t) call allocation functions. This behavior is not achievable with
function calls alone.
We also note in passing that in R10
was a template, whose template
parameter would take the type used in the new-expression. There were implementer objections to
having a core language facility having to instantiate templates.
5.3.1. Which kind of "size"
We considered alternatives for returning the size.
-
We could return two pointers, the initial object and one past the end of the array (minus the array allocation overhead).
auto [ start , end ] = new ( std :: return_size ) T [ 5 ]; for ( T * p = start + 5 ; p != end ; p ++ ) { new ( p ) T ; } for ( T * p = start ; p != end ; p ++ ) { p -> DoStuff (); } for ( T * p = start + 5 ; p != end ; p ++ ) { p ->~ T (); } delete [] start ; The pair of pointers provides convenience for use with iterator-oriented algorithms. The problem we foresee is that a size-returning allocation function may not provide a size that is an appropriate multiple of
(or fail to be a multiple ofsizeof ( T )
, a possible, but unlikely scenario). This imposes a need for more extensive logic around the new-expression in handling the returned allocation size.alignof ( T ) -
We could return the size in units of
, this leads to an inconsistency between the expected usage forT
andnew
:new [] -
For
, we may only end up fitting a singlenew
into an allocator size quanta, so the extra space remains unusable. If we can fit multipleT
into a single allocator size quanta, we now have an array from what was a scalar allocation site. This cannot be foreseen by the compiler asT
is a replaceable function.:: operator new -
For
, the size in units ofnew []
can easily be derived from the returned size in bytes.T
-
-
We could pass the size in units of
or bytes to the constructor ofT
:T -
For
, this is especially useful for tail-padded arrays, but neglects default-initializednew
.T -
For
, a common use case is expected to be the allocation of arrays ofnew []
,char
, etc. The size of the overall array is irrelevant for the individual elements.int
-
-
We could return the size via a reference parameter:
std :: return_end < T > end ; T * p = new ( end ) T [ 5 ]; for ( T * p = start + 5 ; p != end ; p ++ ) { new ( p ) T ; } for ( T * p = start ; p != end ; p ++ ) { p -> DoStuff (); } for ( T * p = start + 5 ; p != end ; p ++ ) { p ->~ T (); } or, demonstrated with bytes:
std :: return_size_t size ; T * p = new ( size ) T [ 5 ]; for ( int i = 5 ; i < size / sizeof ( T ); i ++ ) { new ( p [ i ]) T ; } for ( int i = 0 ; i < size / sizeof ( T ); i ++ ) { p [ i ]. DoStuff (); } for ( int i = 5 ; i < size / sizeof ( T ); i ++ ) { p [ i ]. ~ T (); } delete [] p ; (Casts omitted for clarity.)
As discussed for
in § 3 Proposed Wording, a reference parameter poses difficulties for optimizers and involves returning the size via memory (depending on ABI).:: operator new
For
expressions, we considered alternatively initializing the returned
(
) number of elements.
-
This would avoid the need to explicitly construct / destruct the elements with the additional returned space (if any).
The new-initializer is invoked for the returned number of elements, rather than the requested number of elements. This allows
to destroy the correct number of elements (by storingdelete []
in the array allocation overhead).sz / sizeof ( T ) -
The presented proposal (leaving this space uninitialized) was chosen for consistency with
.new
6. Discussion
6.1. How many :: operator new
's?
It is unfortunate that we have so many permutations of
--eight
seems like far more than we should really need! But there really isn’t any
significant runtime cost for having them. Use of raw calls to
is relatively rare: It’s a building block for low-level libraries, allocators
([P0401R4]), and so on, so the cognitive burden on C++ users is low.
The authors have considered other alternatives to the additional overloads. At the Jacksonville meeting, EWG suggested looking at parameter packs.
-
Parameter packs do not reduce the number of symbols introduced. Implementers still need to provide implementations each of the n overloads.
-
Retrofitting parameter packs leaves us with more mangled variants. Implementers need to provide both the legacy symbols as well as the parameter pack-mangled symbols.
The authors have also considered APIs where all parameters are passed, thereby
requiring a single new overload. This adds further overhead for
implementations, as it moves compile-time decisions (is the alignment at or
below the minimum guaranteed by
) into runtime ones.
The alternative to modifying the handling of new-expressions invoking
deallocation functions (when an exception is thrown) would require additional
overloads for
/
whose sole purpose would
be to accept and discard the
.
6.2. Implementation difficulty
It’s worth reiterating that there’s a perfectly good trivial implementation of these functions:
std :: sized_allocation_t :: operator new ( std :: size_t n , std :: return_size_t ) { return { :: operator new ( n ), n }; }
Malloc implementations are free to properly override this with a more impactful definition, but this paper poses no significant difficulty for toolchain implementers.
Implementation Experience:
-
TCMalloc has developed an implementation opensourced on GitHub ([MallocExtension]). While this requires mapping from an integer size class to the true number of bytes, combining this lookup with the allocation is more efficient as we avoid recomputing the sizeclass itself (given a request) or deriving it from the object’s address.
-
jemalloc is prototyping a
function providing a C API for this functionality [smallocx].smallocx
6.3. Interaction with Sized Delete
For allocations made with
-returning
, we need to
relax
's size argument (16.6.2.1 and 16.6.2.2). For
allocations of
, the size quanta used by the allocator may not be a multiple
of
, leading to both the original and returned sizes being
unrecoverable at the time of deletion.
Consider the memory allocated by:
using T = std :: aligned_storage < 16 , 8 >:: type ; std :: vector < T > v ( 4 );
The underlying heap allocation is made with
.
-
The memory allocator may return a 72 byte object: Since there is no
such thatk
, we can’t provide that value tosizeof ( T ) * k = 72
. The only option would be storing 72 explicitly, which would be wasteful.:: operator delete ( void * , size_t ) -
The memory allocator may instead return an 80 byte object (5
's): We now cannot represent the original request when deallocating without additional storage.T
For allocations made with
std :: tie ( p , m ) = :: operator new ( n , std :: return_size_t {});
we permit
where
.
This behavior is consistent with [jemalloc]'s
, where the
deallocation size must fall between the request (
) and the actual allocated
size (
) inclusive. It is also consistent with the standard library’s
allocator interface (as amended by [P0401R4]).
6.4. Advantages
It’s easy to see that this approach nicely solves the problems posed by other methods:
-
We pay almost nothing in speed to return an actual-size parameter. For TCMalloc and jemalloc, this is typically a load from to map from sizeclass to size. This cost is strictly smaller than with
or the like, as that same translation must be performed in addition to the duplicative work previously discussed.nallocx -
We are told exactly the size we have, without risk of UB. We can avoid subsequent reallocations when growing to a buffer to an already-allocated size.
-
Allocator telemetry knows actual request sizes exactly.
6.5. Naming
The library support type is named
, based on
LEWG’s naming suggestions and for consistency (in choice of
) with the
other allocation library support types (
,
,
,
etc.). We expect this to be spelled rarely.
Its members are
and
.
for the memory returned is an
intuitive name.
conveys units in its name (
). Since new
expressions can return additional storage under this proposal, this
distinguishes it from returning the amount of storage available for whole objects. This contrasts with [P0401R4], which does return storage for whole
objects and whose support type’s field is named
.
7. Related work
[P0401R4] considers this problem at the level of the
concept.
Ironically, the lack of the above API was one significant problem in its
original revision: how could an implementation of
provide the
requested feedback in a way that would work with any underlying malloc
implementation? See also § 5.2 Why operator new?.
8. History
8.1. R10 → R11
Support for new-expressions is removed, and the class template
is changed to a non-template class with a
member.
Rationale added why the facility needs to be provided as an allocation
function, rather than some unrelated free function or as a detail of
.
Reports on implementation and deployment experience and impact on runtime allocations have been added.
8.2. R9 → R10
A detailed design of new-expressions has been added. Previously, changes to new-expressions had only been outlined at a high level. As a result, previously suggested changes to overload resolution rules for finding the allocation function (which was allowed to fall back to a non-size-returning allocation functon, with unclear consequences) have been removed. The relation between new-expressions and allocation functions is now described explicitly.
Sections are reorganized: alternative designs are now separate from a discussion of the chosen design, and an expanded section on new-expressions has been moved up before the proposed wording.
8.3. R8 → R9
CWG reviewed [P0901R8] via [CWG2021Telecon].
Wording changes have been made to the proposal based on CWG feedback and assistance from Ryan McDougall.
8.4. R7 → R8
LEWG reviewed [P0901R6] via telecon.
Poll: Send P0901R6, after changingto
p and
ptr to
n in
bytes , to electronic ballot to be forwarded to CWG.
sized_allocation_t
SF F N A SA 4 11 0 0 0
-
Applied member name feedback from LEWG to rename
's members fromsized_allocation_t
top
andptr
ton
. Added naming rationale section.bytes -
Update reference to revised [P0401R4].
-
Refined wording in proposal.
8.5. R6 → R7
-
Removed extraneous
typostd ::
8.6. R5 → R6
LEWG reviewed [P0901R5] library support types at [Prague].
-
LEWG discussed the similarity to
. Participants in the discussion were concerned about the potential library layering issue (std :: span
layers below practically everything) and that ownership is conveyed in the return value. In contrast,< new >
is non-owning.std :: span -
LEWG discussed whether the type needed a name. It needs to be named to allow users to replace
and spell the type.operator new
LEWG took an approval poll of different names
5
sized_ptr_t 2
sized_ptr 3
memblock_t 7
memblock 0
memspan_t 2
memspan 8
alloc_size_t 4
alloc_size 5
alloc_t 0
alloc 13
sized_allocation_t 17
sized_allocation 0
alloc_span_t 2
alloc_span 9
allocation_t 9
allocation 16
alloc_result_t 14
alloc_result 12
allocation_result_t 16
allocation_result
Based on this poll, LEWG directed the paper authors to consider
,
, and
.
LEWG additionally polled on whether the type should have a
suffix.
SF F N A SA 0 6 7 8 1
Based on this feedback, the authors have chosen
, based on
LEWG’s naming suggestions and for consistency (in choice of
) with the
other allocation library support types (
,
,
,
etc.). We expect this to be spelled rarely.
8.7. R4 → R5
EWG reviewed P0901R4 at [Cologne].
Poll: P0901R4 as presented, forward to LEWG for C++23, not C++20.
SF F N A SA 2 11 14 2 0
-
Fixed typos in examples.
-
Added proposed feature test macro.
-
Added LEWG audience for library support type names.
8.8. R3 → R4
-
Update reference to revised [P0401R1].
8.9. R2 → R3
-
Added proposed wording.
-
For newly added allocation functions,
is passed by value, rather than reference.std :: nothrow_t
8.10. R1 → R2
Applied feedback from San Diego Mailing
-
Moved from passing
parameter by reference to by value. For many ABIs, this is more optimizable and to the authors' knowledge, no worse on any other.std :: return_size_t -
Added rationale for not using parameter packs for this functionality.
8.11. R0 → R1
Applied feedback from [JacksonvilleMinutes].
-
Clarified in § 3 Proposed Wording the desire to leverage the existing "replacement functions" wording of the IS, particularly given the close interoperation with the existing
/:: operator new
implementations.:: operator delete -
Added a discussion of the Microsoft ABI in § 3 Proposed Wording.
-
Noted in § 6.1 How many ::operator new's? the possibility of using a parameter pack.
-
Added a proposal for § 2.2 New-expressions, as requested by EWG.
Additionally, a discussion of § 6.3 Interaction with Sized Delete has been added.