1. Changelog
-
R0:
-
Initial revision.
-
2. Motivation and proposal
We have two largely similar active proposals for "trivial relocation" in C++: Arthur’s [P1144], and Bloomberg’s [P2786] / [P2959] / [P2967]. We also have two recent proposals for "non-trivial relocation" as a new fundamental operation: Bloomberg’s [P2839] (rejected by EWG in Varna) and Bini and Catmur’s [P2785] (still active).
A major motivation for "(trivial) relocation" is that it allows library authors to choose "fast paths"
for (trivially) relocatable types. For example,
for non-relocatable
must
use
in a loop (represented here by
), but for trivially relocatable
it can avoid
and use the moral equivalent of memcpy
(represented here by P1144
).
void erase ( iterator it ) { if constexpr ( std :: is_trivially_relocatable_v < value_type > ) { std :: destroy_at ( it ); std :: uninitialized_relocate ( it + 1 , end_ , it ); } else { std :: move ( it + 1 , end_ , it ); // operator= std :: destroy_at ( end_ - 1 ); } -- end_ ; }
The exact details of that snippet are up for debate: should the library provide a trait
? should
be guaranteed, or merely encouraged,
to use memcpy instead of move-and-destroy-in-a-loop? and so on. This paper P3055 considers all
those little details to be "out of scope"; they don’t affect the gist of this paper.
This paper concerns itself with one giant problem — anything like the above implementation
is, formally, forbidden by the current Standard! To permit the above implementation, we must
loosen the specification of
([vector.modifiers]/3)
along these lines:
constexpr iterator erase ( const_iterator position ); constexpr iterator erase ( const_iterator first , const_iterator last ); constexpr void pop_back (); 3․ Effects: Invalidates iterators and references at or after the point of the erase.
4․ Throws: Nothing unless an exception is thrown by the assignment operator or move assignment operator of
.
T 5․ Complexity:
The destructor ofLinear in the number of elements following the first erased element in the original vector.is called the number of times equal to the number of the elements erased, but the assignment operator of
T is called the number of times equal to the number of elements in the vector after the erased elements.
T
This change would be consistent with LWG’s wording choices in the modern era: it specifies the complexity only in terms of big-O, and does not directly mandate any particular implementation strategy. So for example it would become legal for the implementation to do just this:
void erase ( iterator it ) { std :: rotate ( it , it + 1 , end_ ); std :: destroy_at ( end_ -- ); }
according to
’s current wording. But furthermore, we propose to loosen
’s wording
([alg.rotate]) too:
1․ Preconditions:
and
[ first , middle ) are valid ranges. For the overloads in namespace
[ middle , last ) ,
std meets the Cpp17ValueSwappable requirements ([swappable.requirements]), and the type of
ForwardIterator meets the Cpp17MoveConstructible (Table 31) and Cpp17MoveAssignable (Table 33) requirements.
* first 2․ Effects: For each non-negative integer
, places the element from the position
i < ( last - first ) into position
first + i . [Note: This is a left rotate. — end note]
first + ( i + ( last - middle )) % ( last - first ) 3․ Returns:
for the overloads in namespace
first + ( last - middle ) .
std
for the overload in namespace
{ first + ( last - middle ), last } .
ranges 4․ Complexity:
At mostLinear inswaps.
last - first .
last - first
’s previous wording was defined in terms of "swaps." Look at the specification for
([utility.swap]):
template < class T > constexpr void swap ( T & a , T & b ) noexcept ( see below ); 1․ Constraints:
is
is_move_constructible_v < T > true
andis
is_move_assignable_v < T > true
.2․ Preconditions: Type
meets the Cpp17MoveConstructible (Table 31) and Cpp17MoveAssignable (Table 33) requirements.
T 3․ Effects: Exchanges values stored in two locations.
4․ Remarks: The exception specification is equivalent to:
.
is_nothrow_move_constructible_v < T > && is_nothrow_move_assignable_v < T >
Here the status quo is sufficiently loose to permit an efficient implementation by means of relocation,
and in fact Arthur’s libc++ fork does exactly that
(source, Godbolt). The following code omits details such as
which are present in the actual library.
void swap ( T & a , T & b ) noexcept ( is_nothrow_move_constructible_v < T > && is_nothrow_move_assignable_v < T > ) { if constexpr ( std :: is_trivially_relocatable_v < T > ) { __builtin_memswap ( & a , & b , __datasizeof ( T )); } else { T temp = std :: move ( a ); a = std :: move ( b ); b = std :: move ( temp ); } }
In other words, this paper P3055 is needed, even after P1144/P2786. If one of those papers
is adopted without P3055, then conforming implementations will still be technically forbidden to do
the optimizations we want to enable (which are already done by
,
, etc).
Vice versa, as soon as P3055 is adopted (even without P1144/P2786), a conforming implementation will
be able to use these optimizations on types it knows to be trivially relocatable
(e.g.
), even if we continue to lack a standardized vocabulary
for relocatable user-defined types.
2.1. Benchmark
Trivial relocation is an "infinitely valuable optimization"
in the same sense as C++11 move semantics. For the following type
,
mainline libc++ compiles
into 74 lines of assembly.
Arthur’s P1144 fork of libc++ compiles it into 18 lines. (Godbolt.)
struct S { S (); std :: unique_ptr < int > p ; std :: shared_ptr < int > q ; bool b ; }; void test ( S & a , S & b ) { std :: swap ( a , b ); }
This propagates back up the call-stack as high as we’re willing to let it propagate.
Arthur’s libc++ effectively applies this paper P3055’s proposed wording already,
permitting
to be implemented in terms of
and
to be implemented
in terms of
.
Operation | Mainline libc++ LOC | P1144 libc++ LOC |
---|---|---|
| 74 | 18 |
| 145 | 122 |
| 108 | 39 |
3. Breakage of existing code
Since the proposed wording is looser than the existing wording, all vendors already conform to it by definition;
we’re not asking any vendor to change their implementation for C++26. However, a vendor who takes advantage of
the new freedom may change the behavior of certain algorithms and containers for non-regular types.
Following [P2959], we’ll use
as our canonical example.
assigns-through on
assignment and swap; it never rebinds (except on initial construction). This is the polar opposite of
, which rebinds on assignment and swap, and never assigns-through. In P1144’s
terminology,
is trivially relocatable and
is not trivially relocatable.
(Godbolt.)
Recall that
is already loosely specified — it "exchanges the values" of its arguments — so
our proposal leaves the following example untouched:
int i = 1 , j = 2 ; std :: tuple < int &> a = { i }, b = { j }; std :: swap ( a , b ); assert ( i == 2 && j == 1 );
’s Effects are specified via the phrase "places the element from position x into position y";
its semantics are coupled to
only through the phrase "At most n swaps" in the Complexity element,
which we propose to remove. After that change, a vendor might reasonably construe that this old behavior...
int a [ 3 ] = { 1 , 2 , 3 }; std :: tuple < int &> ts [ 3 ] = {{ a [ 0 ]}, { a [ 1 ]}, { a [ 2 ]}}; std :: rotate ( ts , ts + 1 , ts + 3 ); assert ( a [ 0 ] == 2 && a [ 1 ] == 3 && a [ 2 ] == 1 );
...was no longer strictly mandated. They might choose to "place" the
s as-if-by relocation, rebinding
each
and leaving the array
untouched. (However, Arthur’s libc++ doesn’t change this behavior,
because Arthur’s libc++ optimizes only trivially relocatable types, and
is not trivially relocatable.)
Consider
, whose semantics are coupled to
only through wording in its Complexity element
which we propose to remove. After that change, a vendor might reasonably construe that this old behavior...
int a [ 3 ] = { 1 , 2 , 3 }; std :: vector < std :: tuple < int &>> ts = {{ a [ 0 ]}, { a [ 1 ]}, { a [ 2 ]}}; ts . erase ( ts . begin ()); assert ( a [ 0 ] == 2 && a [ 1 ] == 3 && a [ 2 ] == 3 );
...was no longer strictly mandated. They might choose to "erase the element pointed to"
([sequence.reqmts]/47) as-if-by relocation,
rebinding each
and leaving the array
untouched. As [P2959] points out, this is
exactly what happens anyway if you switch out
for
. (Again, Arthur’s libc++ doesn’t change
this behavior, because
is not trivially relocatable; but we certainly have no desire to
continue mandating the old behavior.)
4. Implementation experience
Since the proposed wording is looser than the existing wording, all vendors already conform to it by definition; we’re not asking any vendor to change their implementation for C++26.
Arthur has implemented trivial-relocation optimizations in his fork of libc++, and used it to compile both LLVM/Clang/libc++ and another large C++17 codebase. No problems were found (naturally).
5. Proposed wording
Note: We’re trying to eliminate places where the Effects and Complexity elements specifically mention
assignment. We don’t mind e.g. when [deque.modifiers] specifies that
"causes a single call to a constructor of
,"
because that’s still correct even if we’re optimizing trivially relocatable types. We don’t even mind when
[vector.modifiers] specifies that
calls
’s destructor "the number of times equal to the number of the elements erased,"
because of course it does; but we propose to remove that sentence anyway because it is redundant. We also don’t
mind when an operation says "Throws: Nothing unless an exception is thrown from the assignment operator
of
," because our new trivial-relocation "happy path" will never throw. Such a Throws element continues
to describe the circumstances under which the operation might throw. We never propose to loosen any Throws element.
5.1. [vector.modifiers]
Modify [vector.modifiers] as follows:
constexpr iterator insert ( const_iterator position , const T & x ); constexpr iterator insert ( const_iterator position , T && x ); constexpr iterator insert ( const_iterator position , size_type n , const T & x ); template < class InputIterator > constexpr iterator insert ( const_iterator position , InputIterator first , InputIterator last ); template < container - compatible - range < T > R > constexpr iterator insert_range ( const_iterator position , R && rg ); constexpr iterator insert ( const_iterator position , initializer_list < T > ); template < class ... Args > constexpr reference emplace_back ( Args && ... args ); template < class ... Args > constexpr iterator emplace ( const_iterator position , Args && ... args ); constexpr void push_back ( const T & x ); constexpr void push_back ( T && x ); template < container - compatible - range < T > R > constexpr void append_range ( R && rg ); 1. Complexity: If reallocation happens, linear in the number of elements of the resulting vector; otherwise, linear in the number of elements inserted plus the distance to the end of the vector.
2. Remarks: Causes reallocation if the new size is greater than the old capacity. Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence, as well as the past-the-end iterator. If no reallocation happens, then references, pointers, and iterators before the insertion point remain valid but those at or after the insertion point, including the past-the-end iterator, are invalidated. If an exception is thrown other than by the copy constructor, move constructor, assignment operator, or move assignment operator of
or by any
T operation there are no effects. If an exception is thrown while inserting a single element at the end and
InputIterator is Cpp17CopyInsertable or
T is
is_nothrow_move_constructible_v < T > true
, there are no effects. Otherwise, if an exception is thrown by the move constructor of a non-Cpp17CopyInsertable, the effects are unspecified.
T constexpr iterator erase ( const_iterator position ); constexpr iterator erase ( const_iterator first , const_iterator last ); constexpr void pop_back (); 3. Effects: Invalidates iterators and references at or after the point of the erase.
4. Throws: Nothing unless an exception is thrown by the assignment operator or move assignment operator of
.
T 5. Complexity:
The destructor ofLinear in the number of elements after the first erased element in the original vector.is called the number of times equal to the number of the elements erased, but the assignment operator of
T is called the number of times equal to the number of elements in the vector after the erased elements.
T
5.2. [deque.modifiers]
Modify [deque.modifiers] as follows:
iterator insert ( const_iterator position , const T & x ); iterator insert ( const_iterator position , T && x ); iterator insert ( const_iterator position , size_type n , const T & x ); template < class InputIterator > iterator insert ( const_iterator position , InputIterator first , InputIterator last ); template < container - compatible - range < T > R > iterator insert_range ( const_iterator position , R && rg ); iterator insert ( const_iterator position , initializer_list < T > ); template < class ... Args > reference emplace_front ( Args && ... args ); template < class ... Args > reference emplace_back ( Args && ... args ); template < class ... Args > iterator emplace ( const_iterator position , Args && ... args ); void push_front ( const T & x ); void push_front ( T && x ); template < container - compatible - range < T > R > void prepend_range ( R && rg ); void push_back ( const T & x ); void push_back ( T && x ); template < container - compatible - range < T > R > void append_range ( R && rg ); 1. Effects: An insertion in the middle of the deque invalidates all the iterators and references to elements of the deque. An insertion at either end of the deque invalidates all the iterators to the deque, but has no effect on the validity of references to elements of the deque.
2. Complexity:
The complexity is linearLinear in the number of elements inserted plus the lesser of the distances to the beginning and end of the deque. Inserting a single element at either the beginning or end of a deque always takes constant time and causes a single call to a constructor of.
T 3. Remarks: If an exception is thrown other than by the copy constructor, move constructor, assignment operator, or move assignment operator of
there are no effects. If an exception is thrown while inserting a single element at either end, there are no effects. Otherwise, if an exception is thrown by the move constructor of a non-Cpp17CopyInsertable
T , the effects are unspecified.
T iterator erase ( const_iterator position ); iterator erase ( const_iterator first , const_iterator last ); void pop_front (); void pop_back (); 4. Effects: An erase operation that erases the last element of a deque invalidates only the past-the-end iterator and all iterators and references to the erased elements. An erase operation that erases the first element of a deque but not the last element invalidates only iterators and references to the erased elements. An erase operation that erases neither the first element nor the last element of a deque invalidates the past-the-end iterator and all iterators and references to all the elements of the deque. [Note:
and
pop_front are erase operations. — end note]
pop_back 5. Throws: Nothing unless an exception is thrown by the assignment operator of
.
T 6. Complexity:
The number of calls to the destructor ofLinear in the lesser of the number of elements after the first erased element and the number of elements before the last erased element in the original deque.is the same as the number of elements erased, but the number of calls to the assignment operator of
T is no more than the lesser of the number of elements before the erased elements and the number of elements after the erased elements.
T
5.3. [alg.rotate]
Modify [alg.rotate] as follows:
template < class ForwardIterator > constexpr ForwardIterator rotate ( ForwardIterator first , ForwardIterator middle , ForwardIterator last ); template < class ExecutionPolicy , class ForwardIterator > ForwardIterator rotate ( ExecutionPolicy && exec , ForwardIterator first , ForwardIterator middle , ForwardIterator last ); template < permutable I , sentinel_for < I > S > constexpr subrange < I > ranges :: rotate ( I first , I middle , S last ); 1. Preconditions:
and
[ first , middle ) are valid ranges. For the overloads in namespace
[ middle , last ) ,
std meets the Cpp17ValueSwappable requirements ([swappable.requirements]), and the type of
ForwardIterator meets the Cpp17MoveConstructible (Table 31) and Cpp17MoveAssignable (Table 33) requirements.
* first 2. Effects: For each non-negative integer
, places the element from the position
i < ( last - first ) into position
first + i . [Note: This is a left rotate. — end note]
first + ( i + ( last - middle )) % ( last - first ) 3. Returns:
for the overloads in namespace
first + ( last - middle ) .
std
for the overload in namespace
{ first + ( last - middle ), last } .
ranges 4. Complexity:
At mostLinear inswaps.
last - first .
last - first
5.4. [alg.swap]
Modify [alg.swap] as follows:
template < class ForwardIterator1 , class ForwardIterator2 > constexpr ForwardIterator2 swap_ranges ( ForwardIterator1 first1 , ForwardIterator1 last1 , ForwardIterator2 first2 ); template < class ExecutionPolicy , class ForwardIterator1 , class ForwardIterator2 > ForwardIterator2 swap_ranges ( ExecutionPolicy && exec , ForwardIterator1 first1 , ForwardIterator1 last1 , ForwardIterator2 first2 ); template < input_iterator I1 , sentinel_for < I1 > S1 , input_iterator I2 , sentinel_for < I2 > S2 > requires indirectly_swappable < I1 , I2 > constexpr ranges :: swap_ranges_result < I1 , I2 > ranges :: swap_ranges ( I1 first1 , S1 last1 , I2 first2 , S2 last2 ); template < input_range R1 , input_range R2 > requires indirectly_swappable < iterator_t < R1 > , iterator_t < R2 >> constexpr ranges :: swap_ranges_result < borrowed_iterator_t < R1 > , borrowed_iterator_t < R2 >> ranges :: swap_ranges ( R1 && r1 , R2 && r2 ); 1. Let:
be
last2 for the overloads with no parameter named
first2 + ( last1 - first1 ) ;
last2 M be
.
min ( last1 - first1 , last2 - first2 ) 2. Preconditions: The two ranges
and
[ first1 , last1 ) do not overlap. For the overloads in namespace
[ first2 , last2 ) ,
std is swappable with ([swappable.requirements])
* ( first1 + n ) for all
* ( first2 + n ) in the range
n .
[ 0 , M ) 3. Effects: For each non-negative integer n < M
performs:
exchanges the values of
swap ( * ( first1 + n ), * ( first2 + n )) and
* ( first1 + n ) for the overloads in namespace
* ( first2 + n ) ;
std - performs
for the overloads in namespace
ranges :: iter_swap ( first1 + n , first2 + n ) .
ranges 4. Returns:
for the overloads in namespace
last2 .
std
for the overloads in namespace
{ first1 + M , first2 + M } .
ranges 5. Complexity: Exactly M swaps.
template < class ForwardIterator1 , class ForwardIterator2 > constexpr void iter_swap ( ForwardIterator1 a , ForwardIterator2 b ); 6. Preconditions:
and
a are dereferenceable.
b is swappable with ([swappable.requirements])
* a .
* b 7. Effects: As if by
.
swap ( * a , * b )
5.5. [utility.swap] (unchanged)
[utility.swap] does not seem to require any changes:
template < class T > constexpr void swap ( T & a , T & b ) noexcept ( see below ); 1. Constraints:
is
is_move_constructible_v < T > true
andis
is_move_assignable_v < T > true
.2. Preconditions: Type
meets the Cpp17MoveConstructible (Table 31) and Cpp17MoveAssignable (Table 33) requirements.
T 3. Effects: Exchanges values stored in two locations.
4. Remarks: The exception specification is equivalent to:
is_nothrow_move_constructible_v < T > && is_nothrow_move_assignable_v < T > template < class T , size_t N > constexpr void swap ( T ( & a )[ N ], T ( & b )[ N ]) noexcept ( is_nothrow_swappable_v < T > ); 5. Constraints:
is
is_swappable_v < T > true
.6. Preconditions:
is swappable with ([swappable.requirements])
a [ i ] for all
b [ i ] in the range
i .
[ 0 , N ) 7. Effects: As if by
.
swap_ranges ( a , a + N , b )
5.6. [iterator.cust.swap] (unchanged)
Note: Trivial types may ignore the first bullet below, because even if a user-defined ADL
is
available, it must "exchange the values denoted by
and
" — that is, it must not be observably different
from an ordinary (possibly trivial) swap. The second and third bullets explicitly describe ways of performing
an ordinary (possibly trivial) swap by hand; for any P1144 trivially relocatable type, these are guaranteed to
be tantamount to swapping the bytes. Therefore vendors can already optimize
today,
without any change in this section’s wording.
[iterator.cust.swap] does not seem to require any changes:
1. The name
denotes a customization point object ([customization.point.object]) that exchanges the values ([concept.swappable]) denoted by its arguments.
ranges :: iter_swap 2. Let
be the exposition-only function template:
iter - exchange - move template < class X , class Y > constexpr iter_value_t < X > iter - exchange - move ( X && x , Y && y ) noexcept ( noexcept ( iter_value_t < X > ( iter_move ( x ))) && noexcept ( * x = iter_move ( y ))); 3. Effects: Equivalent to:
iter_value_t < X > old_value ( iter_move ( x )); * x = iter_move ( y ); return old_value ; 4. The expression
for subexpressions
ranges :: iter_swap ( E1 , E2 ) and
E1 is expression-equivalent to:
E2
, if either
( void ) iter_swap ( E1 , E2 ) or
E1 has class or enumeration type and
E2 is a well-formed expression with overload resolution performed in a context that includes the declaration
iter_swap ( E1 , E2 ) template < class I1 , class I2 > void iter_swap ( I1 , I2 ) = delete ; and does not include a declaration of
. If the function selected by overload resolution does not exchange the values denoted by
ranges :: iter_swap and
E1 , the program is ill-formed, no diagnostic required. [Note: This precludes calling unconstrained
E2 . When the deleted overload is viable, program-defined overloads need to be more specialized ([temp.func.order]) to be selected. —end note]
std :: iter_swap Otherwise, if the types of
and
E1 each model
E2 , and if the reference types of
indirectly_readable and
E1 model
E2 ([concept.swappable]), then
swappable_with .
ranges :: swap ( * E1 , * E2 ) Otherwise, if the types
and
T1 of
T2 and
E1 model
E2 and
indirectly_movable_storable < T1 , T2 > , then
indirectly_movable_storable < T2 , T1 > , except that
( void )( * E1 = iter - exchange - move ( E2 , E1 )) is evaluated only once.
E1 Otherwise,
is ill-formed. [Note: This case can result in substitution failure when
ranges :: iter_swap ( E1 , E2 ) appears in the immediate context of a template instantiation. —end note]
ranges :: iter_swap ( E1 , E2 )