... there is a family of types for which programmers[P0593R2]
assume they do not need to explicitly create objects ...
1. Summary
[P0593R2] Gives us defined behavior for creating objects from raw, application initialized, memory. These implicit lifetime types include:
- Scalar types
- Array types (with any element type)
- Class types with a trivial destructor and a trivial constructor (of any kind)
When working with implicit lifetime types applications may be able to bulk
initialize objects without value initialization (zeroing) or explicit calls to
constructors. For example, the application may read objects from the network, or
it may stamp out a pattern of objects, etc.. This proposal enables these
optimizations for allocator aware containers, and extends
to allow
them. Prior work in this area has relied on trivial default construction.
However, [P0593R2] allows class types with non-trivial default constructors
that have trivial move or copy constructors to be implicit lifetime types. This
proposal includes this alternate set of types. Default construction typically
leaves indeterminate values in elements controlled by the
, exposing it
to undefined behavior until the application initializes all such elements. This
proposal avoids this "trap state" for the container by not incorporating
elements into the container until they have been initialized.
We have proposed related changes to
in a separate paper [P1072R1].
1.1. Approach
Allocator aware containers must cooperate with their allocator for object construction and destruction. Working with the broader set of implicit lifetime types requires that the container use a two step interaction with the application. First, the container exposes memory that the application initializes. Second, the application tells the container how many objects were initialized. The container can then tell the allocator about the newly created objects.
References and wording are relative to [N4762].
Starting with [allocator.requirements] (Table 33), we add:
-
- This expression informs the allocator post facto of an object of implicit lifetime type that has been initialized and implicitly created by the application. This member function, if provided, does not participate in overload resolution unlessa . implicit_construct ( c )
is an implicit lifetime type. By default it does nothing.C
Then in [allocator.traits] we add a new optional member:
-
- This member function:implicit_construct ( Alloc & a , T * p ) - Calls
if it is well-formed, otherwise ...a . implicit_construct ( p ) - Does nothing if T is an implicit lifetime type and
is not well-formed, otherwise ...a . construct ( p ) - Does not participate in overload resolution.
(The intent is to leave the meaning of allocators which define
unchanged, but to allow those that don’t, including the default allocator, to supportconstruct ( T * )
implicitly.)implicit_construct - Calls
Finally, in [vector] we add two member functions:
-
- Returns a pointer to storage that would back elements [T * uninitialized_data ()
,size ()
). Note that this storage may contain indeterminate values ([dcl.init]p12). The application may initialize the memory by casting tocapacity ()
and then tovoid *
,char *
, orunsigned char *
. ([basic.life]p6.4).std :: byte * Does not participate in overload resolution if
is not well-formed.allocator_traits < Allocator >:: rebind_traits < U >:: implicit_construct ( U * ) -
- Appendsinsert_from_capacity ( size_type n )
elements from capacity. The application must have initialized the storage backing these elements otherwise the behavior is undefined.n Does not participate in overload resolution if
is not well-formed.allocator_traits < Allocator >:: rebind_traits < U >:: implicit_construct ( U * )
2. Revision History
2.1. R0 → R1
- Rebased on [N4762].
- Updated reference to [P1072].
3. Motivating examples
The extra overhead described in the examples below is often small, yet the optimization can be significant in performance critical execution paths.
Casts, using declarations, and other details have been elided to keep the examples simple.
3.1. Example: reading from the network
The current
interface forces a copy when reading objects from the
network, or a file, etc.. (
keeps the example simple, but the
principle applies to user defined implicit lifetime types as well.):
using ByteVec = vector < byte > ; class Socket { public : size_t Read ( byte * buf , size_t size ); ... }; unsigned ReadSome ( ByteVec * out , Socket & socket ) { byte buf [ kBufferSize ]; auto size = socket . Read ( & buf [ 0 ], kBufferSize ); out -> insert ( out . end (), & buff [ 0 ], & buff [ 0 ] + size ); // BAD: Copies. return size ; }
With the changes proposed in this paper, the above example would be optimized as:
unsigned ReadSome ( ByteVec * out , Socket & socket ) { out . reserve ( out . size () + kBufferSize ); auto size = socket . Read ( out -> uninitialized_data (), // GOOD: No copies. data . capacity () - data . size ()); out -> insert_from_capacity ( size ); // GOOD: No-op. return size ; }
Note:
bypasses
's normal growth algorithm which can
lead to unnecessary allocations. Resolving that issue is out of scope for
this proposal.
3.2. Example: stamping a pattern
For a second example, consider stamping a repeating pattern of elements.
's interface offers two options, neither optimal:
- Call
and write directly into the container. However, this value initializes elements, typically writing zeros:resize using IntVec = vector < int > ; void AppendPattern ( IntVec & out , span < const int > pattern , unsigned count ) { auto start = out . size (); auto step = pattern . size (); out . resize ( start + step * count ); // BAD: Zeros. for ( auto cur = out . begin () + start ; cur < out . end (); cur += step ) { memcpy ( &* cur , pattern . data (), // GOOD: No bookkeeping. step * sizeof ( int ))); } } - Call
and thenreserve
in a loop. However, this incurs bookkeeping overhead in each insert:insert void AppendPattern ( IntVec & out , span < const int > pattern , unsigned count ) { out . reserve ( out . size () + pattern . size () * count ); // GOOD: No zeros. for ( unsigned i = 0 ; i < count ; ++ i ) { out . insert ( out . end (), pattern . begin (), // BAD: Bookkeeping. pattern . end ()); } }
With the changes proposed in this paper the above example would be optimized as:
void AppendPattern ( IntVec & out , span < const int > pattern , unsigned count ) { auto step = pattern . size (); auto total = step * count ; out . reserve ( out . size () + total ); // GOOD: No zeros. int * cur = out . uninitialized_data (); int * end = cur + total ; for (; cur < end ; cur += step ) { memcpy ( cur , pattern . data (), // GOOD: No bookkeeping. step * sizeof ( int ))); } out . insert_from_capacity ( total ); // GOOD: No-op. }
4. Related work
As mentioned above, all related work to date has used default initialization. This is the first proposal that uses implicit lifetime types.
4.1. Google basic_string
Google has hacked their internal
implementation to
provide a related
API. They have measured performance
improvements (that are not public) that justify maintaining this extension.
Google’s Abseil open source library provides hooks for other users that want to independently apply the same hack. See: https://github.com/abseil/abseil-cpp/blob/master/absl/strings/internal/resize_uninitialized.h
Google’s Protocol Buffers open source library takes advantage of Abseil’s hooks to improve performance. See: https://github.com/google/protobuf/blob/master/src/google/protobuf/stubs/stl_util.h#L61
The Snappy compression library has hooks for a similar hack:
https://github.com/google/snappy/search?q=STLStringResizeUninitialized
Tensor Flow does too:
https://github.com/tensorflow/tensorflow/search?q=STLStringResizeUninitialized
4.2. Boost containers
Boost provides a related optimization for vector-like containers, introduced in [SVN r85964] by Ion Gaztañaga.
Default initialization for vector-like containers Complexity guarantees for associative container constructors and ordered input ranges Added benchmark for associative containers Fixes #9166
E.g.: boost/container/vector.hpp:
//! <b>Effects</b>: Constructs a vector that will use a copy of allocator a //! and inserts n default initialized values. //! //! <b>Throws</b>: If allocator_type’s allocation //! throws or T’s default initialization throws. //! //! <b>Complexity</b>: Linear to n. //! //! <b>Note</b>: Non-standard extension vector ( size_type n , default_init_t ); vector ( size_type n , default_init_t , const allocator_type & a ) ... void resize ( size_type new_size , default_init_t ); ...
These optimizations are also supported in Boost Container’s
,
,
,
, and
.
4.3. MongoDB
MongoDB has a string builder that could have been implemented in terms of
as a return value. However, as explained by Mathias Stearn, zero
initialization was measured and was too costly. Instead a custom string builder
type is used:
E.g.: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/unicode/string.h
/** * Strips diacritics and case-folds the utf8 input string, as needed to support * options. * * The options field specifies what operations to *skip*, so kCaseSensitive * means to skip case folding and kDiacriticSensitive means to skip diacritic * striping. If both flags are specified, the input utf8 StringData is returned * directly without any processing or copying. * * If processing is performed, the returned StringData will be placed in buffer. * buffer’s contents (if any) will be replaced. Since we may return the input * unmodified the returned StringData’s lifetime is the shorter of the input * utf8 and the next modification to buffer. The input utf8 must not point into * buffer. */ static StringData caseFoldAndStripDiacritics ( StackBufBuilder * buffer , StringData utf8 , SubstrMatchOptions options , CaseFoldMode mode );
(Comments re-wrapped.)
4.4. VMware string builders
VMware has an internal string builder implementation that avoids
due, in part, to
’s zero-writing
behavior. This is similar in spirit to the MongoDB example above.
4.5. Discussion on std-proposals
This topic was discussed in 2013 on std-proposals in a thread titled "Add
basic_string::resize_uninitialized (or a similar mechanism)":
https://groups.google.com/a/isocpp.org/forum/#!topic/std-proposals/XIO4KbBTxl0
4.6. P1020R0
See also [P1020R0] "Smart pointer creation functions for default initialization".5. Alternatives Considered
5.1. Free function std :: implicit_construct
This proposal adds an optional member to
. In [P0653R2], which added the optional member
,
the author also added a free function
. The free function
provides uniform access to user specializations of
that do not
define
. A free function is not necessary or applicable here because
containers need to directly test for well-formedness of
to
correctly enable / disable methods in their interface.
Also consider that
is designed to provide uniform operations
over custom allocators. [allocator.requirements]p2 does allow for user
specialization of
, but do users need this additional
customization point? Should
have been named
? Should we discourage or deprecate user specialization?
5.2. insert_from_capacity
What name and return type should we use? Return type | Name | Comments |
---|---|---|
|
| as proposed |
|
| like . does not have .
|
|
| but the elements are initialized |
|
| implies / is associated with reallocation, but this operation
cannot reallocate
|
| , , etc.
| some new term... |
5.3. implicit_construct
What name should we use? Return type | Name | Comments |
---|---|---|
|
| as proposed |
|
| From [P0593R2]. |
5.4. uninitialized_data
What name and return type should we use?
Return type | Name | Comments |
---|---|---|
|
| as proposed |
|
| Nevin commented: the current proposal makes a salient property
of and that that may be unsirable from the LEWG point of view.
Mathias suggested returning .
|
6. Open Issues
-
Q: Can
throw?implicit_construct A: Yes.
- It would be useful to add a discussion of the tradeoffs between implicit lifetime types and default initialization.
7. Wording
7.1. [allocator.requirements]
In [allocator.requirements] Table 33 add the
expression:
...
Expression Return type Assertion/note
pre-/post-conditionDefault ... a.construct(c, args)
(not used) Effects: Constructs an object of type C
atc
::new((void*)c) C(forward
(args)...) a.implicit_construct(c)
(not used) Effects: Informs the allocator that an object of type C
has been implicitly created atc
. Only participates in overload resolution ifC
is an implicit lifetime type.Does nothing a.destroy(c)
(not used) Effects: Destroys the object at c
c->~C()
And then in [allocator.requirements]p9 add references to
:
- An allocator may constrain the types on which it can be instantiated and the arguments for which its
construct
,implicit_construct
, ordestroy
members may be called. If a type cannot be used with a particular allocator, the allocator class or the call toconstruct
,implicit_construct
, ordestroy
may fail to instantiate.
7.2. [allocator.traits]
Note: Following the pattern of
we do not add
to the synopsis in [allocator.traits]. Placing it there
would make it a required member of
which would break existing
user specializations.
Add a new § 19.10.9.3:
19.10.9.3 Allocator traits optional static member functions [allocator.traits.optmem]
template<class T, class... Args> static void implicit_construct(Alloc& a, T* p);
- Effects: Calls
a.implicit_construct(p)
if that call is well-formed; otherwise, does nothing ifT
is an implicit lifetime type anda.construct(p)
is not well-formed; otherwise, does not participate in overload resolution.
7.3. [container.requirements.general]
In [container.requirements.general]p3 add references to
:
- For the components affected by this subclause that declare an
allocator_type
, objects stored in these components shall be constructed using the functionallocator_traits<allocator_type>::rebind_traits<U>::construct
orallocator_traits<allocator_type>::rebind_traits<U>::implicit_construct
and destroyed using the functionallocator_traits<allocator_type>::rebind_traits<U>::destroy
(19.10.9.2), whereU
is eitherallocator_type::value_type
or an internal type used by the container. These functions are called only for the container’s element type, not for internal types used by the container. [Note: This means, for example, that a node-based container might need to construct nodes containing aligned buffers and call construct to place the element into the buffer. — end note ]
7.4. [vector]
In [vector.overview] add the declaration for
:
namespace std { template<class T, class Allocator = allocator<T>> class vector { ... // 21.3.11.4, data access T* data() noexcept; const T* data() const noexcept; T* uninitialized_data() noexcept; // 21.3.11.5, modifiers template<class... Args> reference emplace_back(Args&&... args); void push_back(const T& x); void push_back(T&& x); void pop_back(); template<class... Args> iterator emplace(const_iterator position, Args&&... args); iterator insert(const_iterator position, const T& x); iterator insert(const_iterator position, T&& x); iterator insert(const_iterator position, size_type n, const T& x); template<class InputIterator> iterator insert(const_iterator position, InputIterator first, InputIterator last); iterator insert(const_iterator position, initializer_list<T> il); iterator insert_from_capacity(size_type n); iterator erase(const_iterator position); iterator erase(const_iterator first, const_iterator last); ...
In [vector.data] add new p3-5:
T* data() noexcept; const T* data() const noexcept;
- Returns: A pointer such that
[data(), data() + size())
is a valid range. For a non-empty vector,data() == addressof(front())
.- Complexity: Constant time.
T* uninitialized_data() noexcept;
- Returns: A pointer to uninitialized storage that would hold elements in the range
[size(), capacity())
. [Note: This storage may be initialized through a pointer obtained by castingT*
tovoid*
and then tochar*
,unsigned char*
, orstd::byte*
. ([basic.life]p6.4). - end note ]- Remarks: This member function does not participate in overload resolution if
allocator_traits<Allocator>::rebind_traits<U>::implicit_construct(U *)
is not well-formed.- Complexity: Constant time.
In [vector.modifiers] add new p3-6:
- Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.
iterator insert_from_capacity(size_type n);
- Requires: -
n <= capacity() - size()
.- Remarks: - Appends
elements by implicitly creating them from capacity. The application must have initialized the storage backing these elements otherwise the behavior is undefined. This member function does not participate in overload resolution if
n allocator_traits<Allocator>::rebind_traits<U>::implicit_construct(U *)
is not well-formed.- Returns: - an iterator to the first element inserted, otherwise
end()
.- Complexity: - The complexity is linear in the number of elements inserted. [Note: For some allocators, including the default allocator, actual complexity is constant time. - end note ]
iterator erase(const_iterator position); iterator erase(const_iterator first, const_iterator last); void pop_back();
- Effects: Invalidates iterators and references at or after the point of the erase.
- ...
8. Acknowledgments
Special thanks go to Glen Fernandes and Mathias Stearn for early comments and help with wording. Thanks also for early comments from Nevin Liber, Agustín Bergé, and Arthur O’Dwyer provided guidance on object lifetime and allocator interactions.