Document number | P0401R0 |
Date | 2015-07-08 |
Project | Programming Language C++, Library Evolution Working Group |
Reply-to | Jonathan Wakely <cxx@kayari.org> |
Despite all the changes to allocators in each revision of the C++ standard, they still have a very limited interface for their main function: allocating memory.
This proposal offers some ideas for new allocate()
overloads to help
specific use cases.
Many allocators will only allocate fixed-size chunks of memory, rounding up
requests for smaller sizes to the nearest chunk size. Because the Allocator
interface doesn't provide any means for an allocator to expose this information
to code using it, this can lead to very inefficient memory usage. For example a
vector
that allocates 36 bytes might result in a 64-byte chunk being
allocated, but the vector only uses the 36 bytes it asked for, wasting 43% of
the memory. If the vector then grows such that it needs 40 bytes it will ask
the allocator for it and be given a second 64-byte chunk, even though the
existing one was large enough already.
This could be addressed by adding a new optional overload of allocate
that
returns the size that was actually allocated e.g.
pointer allocate(size_type n, size_type& result, const_void_pointer hint = {});
This would allow allocators to inform the container when they have
over-allocated and for containers to use the additional capacity.
std::vector
and std::string
would use the result
parameter as their new
capacity and so could avoid a re-allocation if the size grows beyond the
originaly requested capacity but would still fit in the actually-allocated
capacity.
The proposal in Respect vector::reserve(request) Relative to Reallocation (N4524)
would not forbid std::vector::reserve(size_type)
from taking advantage of
this new interface.
All that would be required is for the allocator to guarantee that it won't
over-allocate optimistically, i.e. it will only return a larger size than
requested when doing so is an unavoidable consequence of how it obtains
memory.
If the allocator is incapable of reserving fewer than N bytes when requested
to allocate n (where N >= n) then it can't meet the "guarantee" that N4524
asks for anyway. Informing the vector
of that additional amount has no
downside (except possibly users being confused).
So to meet the aim of N4524, vector::reserve()
could guarantee to only ask
for exactly that number of bytes, and std::allocator
could guarantee not to
over-allocate on the off-chance it might be used (which still allows returning
a larger size if that's what really gets allocated anyay). Custom allocators
should be allowed to over-allocate as much as they like though. If users want a
guarantee that vector::reserve()
doesn't waste memory they should be sure to
use an allocator that won't waste memory. That should provide a way to please
everyone, because the decision of whether to over-allocate aggressively will be
made by the allocator (where such decisions belong), and vector's behaviour can
be customized by using a different allocator.
An alternative that would address the same problem, at least partially,
would be to add a realloc
-like function as proposed by N3495.
As proposed that is a very different API to std::realloc
which is also
responsible for "moving" data from the old storage to the new, but that
should be done by the container not the allocator. The N3495 proposal uses
an exception of type std::bad_alloc
to report when the existing memory
region cannot be resized, which might be the common case for many allocators
and for many allocations (e.g. even if the allocator supports the resize
operation, doubling the size of a vector might require a new memory chunk
anyway).
That would also require containers to add extra code to try and resize, and
then if that throws fall back to allocating and moving elements as done today.
That adds exceptions and extra branching to the normal control-flow path.
For fixed-size allocators it makes more sense, and is much simpler for
containers to adapt to, if the allocator is able to over-allocate on the
initial request and inform the caller how much memory was made available.
Note, however, that my proposal doesn't help the shrink-to-fit case where
resizing would be used to return unused memory (which not all allocators would
be able to support anyway). It also doesn't help for allocators which don't
over-allocate initially but might be able to grow the memory region without
re-allocating (e.g. because the next page in the virtual address space is
not mapped yet and so could be made available).
The default implementation in allocator_traits
would be trivial:
pointer
allocate(allocator_type& a, size_type n, size_type& result, const_void_pointer hint = {})
{
pointer p = allocate(a, n, hint);
result = n;
return p;
}
Whether std::allocator
does anything different would be unspecified and
depend on the implementation of std::allocator
. An allocator that uses
jemalloc could implement the function like so:
pointer
jemallocator<T>::allocate(size_type n, size_type& result, const void* = nullptr)
{
void* ptr = mallocx(n * sizeof(T), MALLOCX_ALIGN(alignof(T)));
result = sallocx(ptr, 0) / sizeof(T);
return static_cast<T*>(ptr);
}
I modified the libstdc++ vector
and allocator_traits
to use this new
overload, implemented the jemallocator
function shown above, and ran this
code in a loop:
{
std::vector<int, jemallocator<int>> v;
v.resize(7);
for (int i = 0; i < 1000; ++i)
v.push_back(1);
}
The test was run twice, once with allocate
calling sallocx
to obtain the
actual size that was allocated, and again with the traditional behaviour
(i.e. doing result = n
instead so that extra space was not available to the
vector).
When the allocator returned the actual size it was 15% faster and the final
capacity was 1024 after eight allocations, compared to 1792 after nine
allocations. (The difference was due to whether the first allocation
returned result=8
or result=7
, and so whether subsequent doubling in size
went up in powers of two until reaching 1024, or as multiples of 7, 14, ...).
Without the initial v.resize(7)
the capacity in both tests grew as powers of
two, and jemalloc always returned exactly what was requested, so there was no
wasted space that the vector wasn't told about. However the test that returned
the actual size was 3% slower (probably due to the overhead of calling
sallocx
). Clearly the benefits will depend on workload, but there is
potential for significant speedup with a suitable allocator. An allocator which
doesn't require a separate call to obtain the real size of the allocation might
avoid that 3% overhead.
Other microbenchmarks showed no significant difference in performance with
or without the sallocx
call. More numbers would be useful.
Although they're not standard C++ there are exception-free environments
(i.e. where everything is compiled with an option like -fno-exceptions
)
that range from the smallest systems to the largest.
shrink_to_fit()
is hard to use in such codebases, because it's supposed
to be a non-binding request, but if re-allocating fails then a naïve
implementation will terminate rather than leaving the container unchanged.
If containers obtained memory using operator new
then shrink_to_fit
could be made to use new(std::nothrow)
when exceptions are disabled, but
the Allocator API has no equivalent.
It's also hard to use allocators in custom containers which don't use
exceptions for error handling, since allocators can only report failure
via exceptions. Such containers can in theory catch bad_alloc
and report
an error some other way, but that's impossible if -fno-exceptions
is
used for compilation.
This could be addressed by adding a new optional overload of allocate
:
pointer allocate(nothrow_t, size_type n, const_void_pointer hint = {}) noexcept;
and the corresponding function in allocator_traits
, which would simply
call a.allocate(n, hint)
if the allocator doesn't support the overload,
but on failure would swallow any exception and return a null pointer.
Implementations wishing to support exception-free codebases would ensure
that std::allocator
provides an implementation with an exception-free code
path, and users could use their own allocators that define the overload.
Implementations that don't care about supporting exception-free environments don't need to do any extra work, just define the overload with the default behaviour. Avoiding exceptions would be a quality of implementation issue, not a conformance one.
For the same reasons, the over-allocating overload should also get a nothrow variant.
None at this time, the aim of this paper is only to gauge interest in these extensions, and consider the interaction with N4524.
Any new overloads are likely to involve changes to:
std::allocator
std::allocator_traits
std::memory_resource
std::polymorphic_allocator