1. Changelog
-
R8 (pre-Varna 2023):
-
dcl.attr.trivreloc: Removed "If a type
is declaredT
, andtrivially_relocatable
is either not move-constructible or not destructible, the program is ill-formed." This should have been removed in R7 along with "move-constructible, destructible."T -
meta.unary.prop: Adjusted the precondition of
to matchis_trivially_relocatable_v
. Removedis_trivially_copyable_v
,is_relocatable_v
for insufficient motivation (but keptis_nothrow_relocatable_v
because it expresses semantic requirements).concept relocatable -
Marked
asstd :: relocate
; changed its return type to[[ nodiscard ]]
.remove_cv_t < T > -
Resolved the Appendix E question about potentially overlapping subobjects; we have implementation experience now that proves this is not a problem. Expanded discussion of
types.pmr -
Replaced §3 "Design goals" with a shorter summary.
-
-
R7 (post-Issaquah 2023):
-
[P2786R0] debuted at Issaquah; it raised several open questions (see Appendix E: Open questions). The authors of P1144 and P2786 plan a joint paper to resolve some of these design issues.
-
Added
, the building block ofstd :: uninitialized_relocate_backward
.vector :: insert -
Removed "move-constructible, destructible" from the requirements in basic.types.general. Now trivially relocatable types, like trivially copyable types, can be non-relocatable.
-
Added table §2.1 directly comparing this proposal to Folly, BSL, and Qt.
-
Added straw poll results from Issaquah; removed much background and discussion (leaving only notes for the interested reader to consult [P1144R6]).
-
Removed mentions of EASTL.
-
Added Appendix E.
-
-
R6:
-
Added
.T std :: relocate ( T * source ) -
Changed the exception guarantee of
anduninitialized_relocate
in accord with feedback from David Stone.uninitialized_relocate_n -
Removed
(but keptstd :: ranges :: relocate_at
).std :: relocate_at
-
-
R5 (post-Prague 2020):
-
Polymorphic types (those with any vptr at all) inhibit "natural" trivial relocatability. David Stone gave the motivating example. See Polymorphic downcast effectively relies on offset-into-self.
-
Updated
. Added semantic requirements on assignment and some discussion of the issue withconcept relocatable
.vector :: insert
-
-
R3 (post-Kona 2019):
-
User-provided copy constructors and copy/move assignment operators inhibit "natural" trivial relocatability. This permits optimizing
andvector :: insert
.vector :: erase -
Adopted
.[[ trivially_relocatable ( bool )]] -
Added
.std :: relocate_at ( source , dest ) -
Removed the core-language blanket wording that permitted "the implementation" to eliminate moves and destroys of trivially relocatable types. Instead, the existence of
and handwaviness of e.g. [vector.capacity] combine to strongly imply (but still not to mandate) that library vendors will usestd :: relocate_at
wherever it helps with performance. We no longer propose to permit eliminating moves and destroys in e.g.std :: relocate_at
statements, except as already permitted under the as-if rule.return
-
2. Introduction and motivation
Given an object type
and memory addresses
and
,
the phrase "relocate a
from
to
" means
"move-construct
from
, and then immediately destroy the object at
."
Any type which is both move-constructible and destructible is relocatable.
A type is nothrow relocatable if its relocation operation is noexcept, and a type
is trivially relocatable if its relocation operation is trivial (which,
just like trivial move-construction and trivial copy-construction, means
"tantamount to
").
In practice, almost all relocatable types are trivially relocatable:
,
,
(on libc++),
(on MSVC).
Examples of non-trivially relocatable types include
(on libstdc++ and libc++),
(on libstdc++ and MSVC),
(everywhere). See Appendix C: Examples of non-trivially relocatable class types.
P1144 allows the library programmer to warrant to the compiler that a resource-management type is trivially relocatable. Explicit warrants are rarely needed because the compiler can infer trivial relocatability for Rule-of-Zero types. See § 4.5 Trivially relocatable type [basic.types.general].
The most important thing about P1144 relocation is that it is backward compatible and does not break either API or ABI. My intention is simply to legalize the well-understood tricks that many industry codebases ([BSL], [Folly], [Deque]) are already doing in practice. P1144 is not intended to change the behavior of any existing source code (except to speed it up), nor does it require major work from standard library vendors.
2.1. Optimizations enabled by trivial relocatability
The following optimizations are possible according to P1144R8’s notion of trivial relocatability. Here’s who does these optimizations in practice:
vector realloc | type erasure | fixed-cap move | vector
| vector
|
|
| |
Arthur’s libc++
|
| no | N/A |
(Godbolt) |
(Godbolt) | yes, uses
(Godbolt) |
(Godbolt) |
[BSL]
|
| no | ? |
|
| , unused by
| no |
[Folly]
|
| no | proposed for
|
|
| N/A | N/A |
[Qt]
|
|
|
|
|
|
| N/A |
2.1.1. Contiguous reallocation
Trivial relocation can be used anywhere we do the moral equivalent of
,
such as in
.
[Bench] (presented at C++Now 2018) shows a 3x speedup on
. This Reddit thread demonstrates a similar 3x speedup using the online tool Quick-Bench.
2.1.2. Moving in-place/SBO type-erased types like any
and function
Trivial relocation can be used to de-duplicate the code generated by type-erasing wrappers
like
,
, and
.
For these types, a move of the wrapper object is implemented in terms of a relocation of the contained object. (See for example libc++'s
.)
In general, the relocate operation must have a different instantiation for each
different contained type
, leading to code bloat. But every trivially relocatable
of a given size can share the same instantiation.
2.1.3. Moving fixed-capacity containers like static_vector
and small_vector
The move-constructor of
can be implemented naïvely as an element-by-element move (leaving the source vector’s elements in their moved-from state),
or efficiently as an element-by-element relocate (leaving the source vector empty).
For a detailed analysis of this case, see [FixedCapacityVector].
currently implements the
naïve element-by-element-move strategy, but after LEWG feedback, [P0843R5]
does permit the faster relocation strategy.
2.1.4. Contiguous insert and erase
Trivial relocation can be used anywhere we shift a contiguous range left or right,
such as in
(i.e., destroy the erased elements and "close the window"
by memmoving left) and
(i.e., "make a window" by memmoving right and
then construct the new elements in place). [Folly]'s
does
this optimization; see
.
Bloomberg’s [BSL] also does this optimization.
But § 5.2 Confusing interactions with std::pmr determines whether this optimization will be possible for certain types.
2.1.5. Swap
Given a reliable way of detecting trivial relocatability,
we can optimize any routine that uses the moral equivalent of
, such as
,
, or
.
Optimizing
produces massive code-size improvements for all swap-based
algorithms, including
and
.
See @19:56–21:22 in my C++Now 2019 talk,
and see this Godbolt. This Quick-Bench shows a 25% speedup
in
when it is allowed to use bitwise swap on a Rule-of-Zero type.
But § 5.2 Confusing interactions with std::pmr determines whether this optimization will be possible for certain types.
2.2. Assertions, not assumptions
Some data structures might reasonably assert the trivial relocatability of their elements, just as they sometimes assert the stronger property of trivial copyability today. This might help them guarantee reasonable performance, or guarantee exception-safety.
2.3. Eliminate manual warrants, improve safety
Many real-world codebases contain templates which require
trivial relocatability of their template parameters, but cannot today verify trivial relocatability. For example, [Folly] requires the programmer to warrant the trivial
relocatability of any type stored in a
:
class Widget { std :: vector < int > lst_ ; }; folly :: fbvector < Widget > vec ; // FAILS AT COMPILE TIME for lack of warrant
This merely encourages the programmer to add the warrant and continue. An incorrect warrant will be discovered only at runtime, via undefined behavior. See Allocated memory contains pointer to self, [FollyIssue889], and (most importantly) @27:26–31:47 in my C++Now 2019 talk.
class Gadget { std :: list < int > lst_ ; }; // sigh, add the warrant on autopilot template <> struct folly :: IsRelocatable < Gadget > : std :: true_type {}; folly :: fbvector < Gadget > vec ; // CRASHES AT RUNTIME due to fraudulent warrant
If this proposal is adopted, then Folly can start using
in the implementation of
, and the programmer can stop writing explicit warrants.
Finally, the programmer can start writing assertions of correctness, which aids maintainability and
can even find real bugs. Example:
class Widget { std :: vector < int > lst_ ; }; static_assert ( std :: is_trivially_relocatable_v < Widget > ); // correctly SUCCEEDS class Gadget { std :: list < int > lst_ ; }; static_assert ( std :: is_trivially_relocatable_v < Gadget > ); // correctly ERRORS
The improvement in user experience and safety in real-world codebases ([Folly], [BSL], [Qt], etc.) is the most important benefit to be gained by this proposal.
3. Design goals
For the former, longer, contents of this section, see P1144R7 §3.
Briefly: We want to support all five of the following use-cases (Godbolt).
static_assert ( std :: is_trivially_relocatable_v < std :: unique_ptr < int >> ); // #1 struct RuleOfZero { std :: unique_ptr < int > p_ ; }; static_assert ( std :: is_trivially_relocatable_v < RuleOfZero > ); // #2 struct [[ trivially_relocatable ]] RuleOf3 { RuleOf3 ( RuleOf3 && ); RuleOf3 & operator = ( RuleOf3 && ); ~ RuleOf3 (); }; static_assert ( std :: is_trivially_relocatable_v < RuleOf3 > ); // #3 struct [[ trivially_relocatable ]] Wrap0 { boost :: movelib :: unique_ptr < int > p_ ; static_assert ( ! std :: is_trivially_relocatable_v < decltype ( p_ ) > ); // it’s not annotated, but we know it’s actually trivial }; static_assert ( std :: is_trivially_relocatable_v < Wrap0 > ); // #4 struct [[ trivially_relocatable ]] Wrap3 { Wrap3 ( Wrap3 && ); Wrap3 & operator = ( Wrap3 && ); ~ Wrap3 (); int i_ ; boost :: interprocess :: offset_ptr < int > p_ = & i_ ; static_assert ( ! std :: is_trivially_relocatable_v < decltype ( p_ ) > ); // it’s not trivial, but we preserve our invariant correctly }; static_assert ( std :: is_trivially_relocatable_v < Wrap3 > ); // #5
4. Proposed wording for C++2b
The wording in this section is relative to the current working draft ([N4944]).
Note: There is no difficulty in changing the attribute syntax to a contextual-keyword syntax; the only downsides are aesthetic. We can defer that decision to the last minute, according to CWG’s feedback on the final wording.
4.1. Nothrow bidirectional iterator [algorithms.requirements]
Modify [algorithms.requirements] as follows:
If an algorithm’s template parameter is named
,
ForwardIterator ,
ForwardIterator1 , or
ForwardIterator2 , the template argument shall meet the Cpp17ForwardIterator requirements ([forward.iterators]) if it is required to be a mutable iterator, or model
NoThrowForwardIterator ([iterator.concept.forward]) otherwise.
forward_iterator If an algorithm’s template parameter is named
, the template argument is also required to have the property that no exceptions are thrown from increment, assignment, or comparison of, or indirection through, valid iterators.
NoThrowForwardIterator If an algorithm’s template parameter is named
,
BidirectionalIterator ,
BidirectionalIterator1 or, or
BidirectionalIterator2 , the template argument shall meet the Cpp17BidirectionalIterator requirements ([bidirectional.iterators]) if it is required to be a mutable iterator, or model
NoThrowBidirectionalIterator ([iterator.concept.bidir]) otherwise.
bidirectional_iterator - If an algorithm’s template parameter is named
, the template argument is also required to have the property that no exceptions are thrown from increment, decrement, assignment, or comparison of, or indirection through, valid iterators.
NoThrowBidirectionalIterator
4.2. Relocation operation [defns.relocation]
Add a new section in [intro.defs]:
- relocation operation
the homogeneous binary operation performed by
, consisting of a move construction immediately followed by a destruction of the source object
std :: relocate_at
4.3. relocate_at
and relocate
[specialized.relocate]
Add a new section after [specialized.destroy]:
template < class T > T * relocate_at ( T * source , T * dest ); Mandates:
is a complete non-array object type.
T Effects: Equivalent to:
struct guard { T * t ; ~ guard () { destroy_at ( t ); } } g ( source ); return :: new ( voidify ( * dest )) T ( std :: move ( * source )); except that if
is trivially relocatable [basic.types], side effects associated with the relocation of the value of
T might not happen.
* source template < class T > [[ nodiscard ]] remove_cv_t < T > relocate ( T * source ); Mandates:
is a complete non-array object type.
T Effects: Equivalent to:
remove_cv_t < T > t = std :: move ( source ); destroy_at ( source ); return t ; except that if
is trivially relocatable [basic.types], side effects associated with the relocation of the object’s value might not happen.
T
Note: These functions have both been implemented in my libc++ fork; for
, see godbolt.org/z/cqPP4oeE9 and [StdRelocateIsCute]. My implementation of their "as-if-by-memcpy" codepaths relies on
Clang’s
; vendors can use any vendor-specific means to implement them.
The wording also deliberately permits a low-quality implementation with no such codepath at all.
See @45:23–48:39 in my C++Now 2019 talk.
4.4. uninitialized_relocate
, uninitialized_relocate_n
, uninitialized_relocate_backward
[uninitialized.relocate]
Add a new section after [uninitialized.move]:
template < class InputIterator , class NoThrowForwardIterator > NoThrowForwardIterator uninitialized_relocate ( InputIterator first , InputIterator last , NoThrowForwardIterator result ); Effects: Equivalent to:
try { for (; first != last ; ++ result , ( void ) ++ first ) { :: new ( voidify ( * result )) typename iterator_traits < NoThrowForwardIterator >:: value_type ( std :: move ( * first )); destroy_at ( addressof ( * first )); } return result ; } catch (...) { destroy ( ++ first , last ); throw ; } except that if the iterators' common value type is trivially relocatable, side effects associated with the relocation of the object’s value might not happen.
Remarks: If an exception is thrown, all objects in both the source and destination ranges are destroyed.
template < class InputIterator , class Size , class NoThrowForwardIterator > pair < InputIterator , NoThrowForwardIterator > uninitialized_relocate_n ( InputIterator first , Size n , NoThrowForwardIterator result ); Effects: Equivalent to:
try { for (; n > 0 ; ++ result , ( void ) ++ first , -- n ) { :: new ( voidify ( * result )) typename iterator_traits < NoThrowForwardIterator >:: value_type ( std :: move ( * first )); destroy_at ( addressof ( * first )); } return { first , result }; } catch (...) { destroy_n ( ++ first , -- n ); throw ; } except that if the iterators' common value type is trivially relocatable, side effects associated with the relocation of the object’s value might not happen.
Remarks: If an exception is thrown, all objects in both the source and destination ranges are destroyed.
template < class BidirectionalIterator , class NoThrowBidirectionalIterator > NoThrowBidirectionalIterator uninitialized_relocate_backward ( BidirectionalIterator first , BidirectionalIterator last , NoThrowBidirectionalIterator result ); Effects: Equivalent to:
try { for (; last != first ; ) { -- last ; -- result ; :: new ( voidify ( * result )) typename iterator_traits < NoThrowBidirectionalIterator >:: value_type ( std :: move ( * last )); destroy_at ( addressof ( * last )); } return result ; } catch (...) { destroy ( first , ++ last ); throw ; } except that if the iterators' common value type is trivially relocatable, side effects associated with the relocation of the object’s value might not happen.
Remarks: If an exception is thrown, all objects in both the source and destination ranges are destroyed.
Note: The Remarks allude to blanket wording in [specialized.algorithms.general]/2.
4.5. Trivially relocatable type [basic.types.general]
Add a new section in [basic.types.general]:
An object typeis a trivially relocatable type if it is:
T
a trivially copyable type, or
an array of trivially relocatable type, or
a (possibly cv-qualified) class type declared with a
attribute with value
trivially_relocatable true
[dcl.attr.trivreloc], ora (possibly cv-qualified) class type which:
has no user-provided move constructors or move assignment operators,
has no user-provided copy constructors or copy assignment operators,
has no user-provided destructors,
has no virtual member functions,
has no virtual base classes,
all of whose members are either of reference type or of trivially relocatable type, and
all of whose base classes are of trivially relocatable type.
[Note: For a trivially relocatable type, the relocation operation (as performed by, for example,
or
std :: swap_ranges ) is tantamount to a simple copy of the underlying bytes. —end note]
std :: vector :: reserve [Note: It is intended that most standard library types be trivially relocatable types. —end note]
Note: Polymorphic types are disallowed from "natural" trivial relocatability. See Appendix C, example 5. Volatile members are not disallowed. See [Subobjects].
4.6. [[ trivially_relocatable ]]
attribute [dcl.attr.trivreloc]
Add a new section after [dcl.attr.nouniqueaddr]:
The attribute-tokenspecifies that a class type’s relocation operation has no visible side-effects other than a copy of the underlying bytes, as if by the library function
trivially_relocatable . It may be applied to the definition of a class. It shall appear at most once in each attribute-list. An attribute-argument-clause may be present and, if present, shall have the form
std :: memcpy The constant-expression shall be an integral constant expression of type( constant - expression ) . If no attribute-argument-clause is present, it has the same effect as an attribute-argument-clause of
bool .
( true) If any definition of a class type has a
attribute with value V, then each definition of the same class type shall have a
trivially_relocatable attribute with value V. No diagnostic is required if definitions in different translation units have mismatched
trivially_relocatable attributes.
trivially_relocatable If a class type is declared with the
attribute, and the program relies on observable side-effects of its relocation other than a copy of the underlying bytes, the behavior is undefined.
trivially_relocatable
4.7. Type trait is_trivially_relocatable
[meta.unary.prop]
Add a new entry to Table 47 in [meta.unary.prop]:
Template Condition Preconditions
template < class T > struct is_trivially_relocatable ; is a trivially relocatable type.
T shall be a complete type or cv
remove_all_extents_t < T > .
void
4.8. relocatable
concept [concept.relocatable]
Add a new section after [concept.copyconstructible]:
template < class T > concept relocatable = move_constructible < T > ; If
is an object type, then let
T be an rvalue of type
rv ,
T an lvalue of type
lv equal to
T , and
rv a distinct object of type
u2 equal to
T .
rv models
T only if
relocatable
After the definition
,
T u = rv ; is equal to
u .
u2
is equal to
T ( rv ) .
u2 If the expression
is well-formed, then the expression has the same semantics as
u2 = rv
u2 . ~ T (); :: new (( void * ) std :: addressof ( u2 )) T ( rv ); If the definition
is well-formed, then after the definition
T u = lv ; is equal to
u .
u2 If the expression
is well-formed, then the expression’s result is equal to
T ( lv ) .
u2 If the expression
is well-formed, then the expression has the same semantics as
u2 = lv
u2 . ~ T (); :: new (( void * ) std :: addressof ( u2 )) T ( lv );
Note: We intend that a type may be relocatable
regardless of whether it is copy-constructible; but, if it is copy-constructible then copy-and-destroy
must have the same semantics as move-and-destroy. We intend that a type may be relocatable regardless of
whether it is assignable; but, if it is assignable then assignment must have the same semantics as
destroy-and-copy or destroy-and-move.
The semantic requirements on assignment help us optimize
and
.
satisfies
, but it models
only when all relevant objects have equal allocators.
5. Rationale and alternatives
5.1. Attribute [[ maybe_trivially_relocatable ]]
My Clang implementation, currently available on Godbolt, supports both
and another attribute called
,
which John McCall requested that I explore.
In Issaquah (February 2023), [P2786R0] suggested a very similar design to
.
EWGI took a three-way straw poll on
versus
,
with an inconclusive 7–5–6 vote (the author of P1144 voting "For" and the three representatives of P2786
presumably voting "Against").
For discussion of
and why I don’t think it’s the right thing
to standardize, see
-
by itself cannot support § 3 Design goals examples #4 and #5[[ maybe_trivially_relocatable ]] -
"Should the compiler sometimes reject a
warrant?" (2023-03-10)[[ trivially_relocatable ]]
5.2. Confusing interactions with std :: pmr
Note: This section was added in P1144R5, revised in P1144R7, and again in P1144R8.
Here I assume libc++, where
is trivially relocatable.
Feel free to substitute your favorite trivially relocatable container type;
I’m sticking with
for these examples because it is short
and easy to spell.
P1144 proposes a "trivial relocatability" very similar to "trivial copyability": if the user provides any special member, we assume that the user wants control over this type’s value semantics, and we won’t implicitly infer triviality. This includes the assignment operator.
struct A { A ( A && ) = default ; A & operator = ( A && ); ~ A () = default ; }; static_assert ( ! std :: is_trivially_relocatable_v < A > ); // P2786 would say the opposite here
P1144 treats move as an optimization of copy; thus we assume that it always ought to be
safe to replace "copy and destroy" with "move and destroy" (and thus with "relocate").
This is informed by Arthur’s experience standardizing "implicit move" in P1155 (C++20) and P2266 (C++23).
"Implicit move" affects
types because their copy differs from their move. Example:
struct ByValueSink { ByValueSink ( std :: pmr :: vector < int > v ) : v_ ( std :: move ( v )) {} std :: pmr :: vector < int > v_ ; }; ByValueSink f ( std :: pmr :: memory_resource * mr ) { auto v = std :: pmr :: vector < int > ( mr ); return v ; // C++17, Clang 12-: Copies (changes allocator) // C++20, GCC, Clang 13+: Moves (preserves allocator) }
We didn’t care about this when fiddling with implicit move; P1144 implicitly believes we shouldn’t care about it now. (This position might be debatable, but it’s definitely part of the worldview that led to P1144’s specific proposed semantics, so it’s good to have this background in mind.)
P1144 wants to be able to optimize not just vector reserve, but also vector insert and erase
(see table §2.1). Today, vector insert and erase use move-assignment, not relocation.
(In fact, [vector.modifiers]/5 specifies
the exact number of times "the assignment operator of
is called." We might want to fix that,
but that’s out of scope for this paper.)
We can optimize vector insert to use (trivial) relocation for element types that model
([concept.relocatable]).
Some types invariably model
; that is, their assignment is naturally tantamount to construct-and-destroy.
But for
types and polymorphic types, assignment is not tantamount to construct-and-destroy, except under certain conditions. For
types, the condition is "both objects have the same
allocator." For polymorphic types, the condition is "both objects have the same dynamic type."
std :: vector < Derived > dv = { ~~~ }; dv . erase ( dv . begin ()); // Every object in the vector certainly has the same dynamic type. std :: pmr :: vector < std :: pmr :: string > pv = { ~~~ }; pv . erase ( pv . begin ()); // Every string in the vector certainly has the same allocator. Derived da [ 5 ] = { ~~~ }; std :: rotate ( da , da + 2 , da + 5 ); // Every object in the contiguous range certainly has the same dynamic type. std :: pmr :: string pa [ 5 ] = { ~~~ }; std :: rotate ( pa , pa + 2 , pa + 5 ); // Objects in the contiguous range might have different allocators?
P1144 aims to permit efficient insertion and erasure in
. Example:
std :: vector < std :: string > vs ( 501 ); vs . erase ( vs . begin ());
This
requires us to shift down 500
objects, which can be done either
by 500 calls to
followed by one call to
, or by
one call to
followed by a
(as seen for example in [Folly] and BSL;
see table §2.1). We want to permit BSL’s implementation.
That’s why since P1144R5, the definition of
in §4.8 places semantic requirements on
's assignment operators (if they exist) as well as on
's constructors and destructors.
If
is considered trivially relocatable, this will trickle down into all Rule-of-Zero
types with a
member. (Godbolt.)
static_assert ( std :: is_trivially_relocatable_v < std :: pmr :: string > ); // Before: not true, of course // After: suppose this is true struct A { std :: pmr :: string ps ; }; static_assert ( std :: is_trivially_relocatable_v < A > ); // After: then this will also be true, by the Rule of Zero std :: vector < A > v ; v . push_back ({ std :: pmr :: string ( "one" , & mr1 )}); v . push_back ({ std :: pmr :: string ( "two" , & mr2 )}); v . erase ( v . begin ()); // Before: Well-defined behavior, acts as-if by assignment // After: Do we somehow make this a precondition violation? assert ( v [ 0 ]. ps . get_allocator (). resource () == & mr1 ); // Before: Succeeds // After: Might fail (and on a quality implementation, *will* fail)
If we (1) advertise
as
; (2) propagate
trivial relocatability in the core language as both P1144 and P2786 propose to do; and
(3) optimize vector insert and erase for trivially relocatable types; then we inevitably
arrive here. Arthur’s solution is to impose preconditions on user-defined types
used within certain parts of the STL: Just as their destructors are forbidden to (dynamically)
throw exceptions, and their copy constructors are forbidden to (dynamically) make things that aren’t copies,
their assignment operators ought to be forbidden to (dynamically) violate
's semantic requirements.
What new wording is needed to achieve this?
5.3. Non-issue: Overlapping base-class subobjects
Note: This section was added in P1144R7 and resolved in P1144R8.
In the following snippet,
is trivially copyable, so we
might think that
should be allowed to swap the bytes of
with the bytes of
.
A naïve approach is to swap
bytes, but that’s unsafe,
because the tail padding of
is allowed to contain the value of
. If
is to do "trivial swapping," it must account
for potentially overlapping subobjects.
struct Dangerous { Dangerous ( int i , int j ) : i ( i ), j ( j ) {} int i ; short j ; }; static_assert ( std :: is_standard_layout_v < Dangerous > ); static_assert ( std :: is_trivially_copyable_v < Dangerous > ); static_assert ( std :: is_trivially_relocatable_v < Dangerous > ); struct Derived : Dangerous { Derived ( int i , int j , int k ) : Dangerous ( i , j ), k ( k ) {} short k ; }; static_assert ( ! std :: is_standard_layout_v < Derived > ); static_assert ( std :: is_trivially_copyable_v < Derived > ); static_assert ( std :: is_trivially_relocatable_v < Derived > ); int main () { Derived a = Derived ( 1 , 2 , 3 ); Derived b = Derived ( 4 , 5 , 6 ); Dangerous & ra = a , & rb = b ; std :: swap ( ra , rb ); // must swap only 6 bytes, // not sizeof(Dangerous) = 8 bytes assert ( a . k == 3 && b . k == 6 ); }
Arthur’s libc++ fork implements this solution (P1144R7’s "possible solution #4"):
is allowed to bytewise-swap trivially copyable and/or trivially relocatable types
only when they are definitely not potentially overlapping subobjects. In practice, this means:
-
can swap bytes as long as the type has no occupied padding. On the Itanium ABI, this means we can swapstd :: swap
bytes: non-tail padding bytes are never occupied. On an unknown ABI, we can still swapdatasizeof ( T )
bytes if we know thatsizeof ( T )
, because such ahas_unique_object_representations_v < T >
cannot have any padding at all. Worst case, we simply treat the type as non-trivial.T -
can swap bytes as long asstd :: swap_ranges
. A range of length 1 might be a single object, which might have padding that cannot be swapped. Implementors already must do this logic indistance ( first , last ) >= 2
etc. (see libstdc++ bug #108846) so doing it also instd :: copy
is no hardship.swap_ranges
6. Acknowledgements
Thanks to Pablo Halpern for [N4158], to which this paper bears a striking resemblance —
Significantly different approaches to this problem have previously appeared in Rodrigo Castro Campos’s [N2754], Denis Bider’s [P0023R0] (introducing a core-language "relocation" operator), and Niall Douglas’s [P1029R3] (treating trivial relocatability as an aspect of move-construction in isolation, rather than an aspect of the class type as a whole). A less different approach is taken in Mungo Gill & Alisdair Meredith’s [P2786R0].
Thanks to Elias Kosunen, Niall Douglas, John Bandela, and Nicolas Lesser for their feedback on early drafts of P1144R0.
Thanks to Nicolas Lesser and John McCall for their review comments on the Clang implementation [D50119].
Many thanks to Matt Godbolt for allowing me to install my Clang fork on Compiler Explorer (godbolt.org). See also [Announcing].
Thanks to Howard Hinnant for appearing with me on [CppChat], and to Jon Kalb and Phil Nash for hosting us.
Thanks to Marc Glisse for his work integrating a "trivially relocatable" trait into GNU libstdc++ and for answering my questions on GCC bug 87106.
Thanks to Jens Maurer for his feedback on P1144R3 at Kona 2019, and to Corentin Jabot for championing P1144R4 at Prague 2020.
Thanks to Dana Jansens for her contributions re overlapping and non-standard-layout types (see [Subspace]), to Alisdair Meredith for our extensive discussions during the February 2023 drafting of [P2786R0], to Giuseppe D’Angelo and Thiago Maceira for contributing the Qt entries in table §2.1, and to Giuseppe D’Angelo for extensive review comments and discussion.
Thanks to Mungo Gill and Alisdair Meredith for [P2786R0] and [P2814R0].
Appendix A: Straw polls
Polls taken at EWGI at Issaquah on 2023-02-10
Arthur O’Dwyer presented [P1144R6]. Alisdair Meredith presented [P2786R0] (which proposed a
-style facility, and expressed it as a contextual keyword instead
of an attribute). EWGI took the following straw polls (as well as polls on attribute syntax and on both papers' readiness for EWG).
SF | F | N | A | SA | |
---|---|---|---|---|---|
The problem presented in P1144/P2786 is worth solving. | 10 | 8 | 0 | 0 | 0 |
The problem being introduced in P1144/P2786 should be solved in a more general way instead of as proposed. | 3 | 0 | 5 | 6 | 4 |
The annotation should "trust the user" as in P1144R6’s ("sharp knife"),
instead of diagnosing as in P1144R6’s and P2786R0’s ("dull knife"). Three-way poll.
| — | 7 | 5 | 6 | — |
Polls taken at EWGI at Prague on 2020-02-13
Corentin Jabot championed P1144R4. EWGI discussed P1144R4 and Niall Douglas’s [P1029R3] consecutively, then took the following straw polls (as well as a poll on the attribute syntax).
SF | F | N | A | SA | |
---|---|---|---|---|---|
We believe that P1029 and P1144 are sufficiently different that they should be advanced separately. | 7 | 3 | 2 | 0 | 0 |
EWGI is ok to have the spelling as an attribute with an expression argument. | 3 | 5 | 1 | 1 | 0 |
EWGI thinks the author should explore P1144 as a customizable type trait. | 0 | 0 | 0 | 9 | 2 |
Forward P1144 to EWG. | 1 | 3 | 4 | 1 | 0 |
For polls taken September–November 2018, see [P1144R6].
Appendix B: Sample code
See [P1144R6]'s Appendix B for reference implementations of
,
, and
P1144R6’s version of the
library algorithm, plus
a conditionally trivially relocatable
.
I have implemented the entire Standard Library using the
attribute; you can find the source code on my GitHub and explore the resulting codegen on Godbolt Compiler Explorer.
Appendix C: Examples of non-trivially relocatable class types
Class contains pointer to self
This fictional
illustrates a mechanism that can apply
to any small-buffer-optimized class. libc++'s
uses this mechanism and is thus not trivially relocatable.
However, different mechanisms for small-buffer optimization exist. libc++'s
achieves small-buffer optimization
on a 24-byte buffer, without (necessarily) sacrificing trivial relocatability.
struct short_string { char * data_ = buffer_ ; size_t size_ = 0 ; char buffer_ [ 8 ] = {}; const char * data () const { return data_ ; } short_string () = default ; short_string ( const char * s ) : size_ ( strlen ( s )) { if ( size_ < sizeof buffer_ ) strcpy ( buffer_ , s ); else data_ = strdup ( s ); } short_string ( short_string && s ) { memcpy ( this , & s , sizeof ( * this )); if ( s . data_ == s . buffer_ ) data_ = buffer_ ; else s . data_ = nullptr ; } ~ short_string () { if ( data_ != buffer_ ) free ( data_ ); } };
Allocated memory contains pointer to self
needs somewhere to store its "past-the-end" node, commonly referred to
as the "sentinel node," whose
pointer points to the list’s last node.
If the sentinel node is allocated on the heap, then
can be trivially
relocatable; but if the sentinel node is placed within the
object itself
(as happens on libc++ and libstdc++), then relocating the
object requires
fixing up the list’s last node’s
pointer so that it points to the
new sentinel node inside the destination
object. This fixup of an arbitrary
heap object cannot be simulated by
.
Traditional implementations of
and
also store a "past-the-end"
node inside themselves and thus also fall into this category.
struct node { node * prev_ = nullptr ; node * next_ = nullptr ; }; struct list { node n_ ; iterator begin () { return iterator ( n_ . next_ ); } iterator end () { return iterator ( & n_ ); } list ( list && l ) { if ( l . n_ . next_ ) l . n_ . next_ -> prev_ = & n_ ; // fixup if ( l . n_ . prev_ ) l . n_ . prev_ -> next_ = & n_ ; // fixup n_ = l . n_ ; l . n_ = node {}; } // ... };
Class invariant depends on this
The
provided by [Boost.Interprocess] is an example of this category.
struct offset_ptr { uintptr_t value_ ; uintptr_t here () const { return uintptr_t ( this ); } uintptr_t distance_to ( void * p ) const { return uintptr_t ( p ) - here (); } void * get () const { return ( void * )( here () + value_ ); } offset_ptr () : value_ ( distance_to ( nullptr )) {} offset_ptr ( void * p ) : value_ ( distance_to ( p )) {} offset_ptr ( const offset_ptr & rhs ) : value_ ( distance_to ( rhs . get ())) {} offset_ptr & operator = ( const offset_ptr & rhs ) { value_ = distance_to ( rhs . get ()); return * this ; } ~ offset_ptr () = default ; };
Program invariant depends on this
In the following snippet,
is relocatable, but not
trivially relocatable, because the relocation operation of destroying a
at point A
and constructing a new
at point B has behavior that is observably different
from a simple
.
std :: set < void *> registry ; struct registered_object { registered_object () { registry . insert ( this ); } registered_object ( registered_object && ) = default ; registered_object ( const registered_object & ) = default ; registered_object & operator = ( registered_object && ) = default ; registered_object & operator = ( const registered_object & ) = default ; ~ registered_object () { registry . erase ( this ); } }; struct Widget : registered_object {};
Polymorphic downcast effectively relies on offset-into-self
Thanks to David Stone for this example.
In the following snippet,
is relocatable, but not trivially relocatable,
because its copy constructor and assignment operator do not copy the entire state of the
right-hand object. (Notice that
is initialized with
, not with a copy of
.)
struct Base { static int f ( Base * ) { return 21 ; } int ( * pf )( Base * ); Base ( int ( * pf )( Base * ) = f ) : pf ( pf ) {} Base ( const Base & o ) : pf ( f ) {} Base & operator = ( const Base & ) { return * this ; } }; struct Derived : Base { static int f ( Base * self ) { return (( Derived * ) self ) -> x ; } Derived () : Base ( f ) {} Derived ( const Derived & ) = default ; Derived & operator = ( const Derived & o ) { x = o . x ; return * this ; } int x = 42 ; }; int main () { Base && d = Derived (); Base && b = Base (); std :: swap ( b , d ); printf ( "%d \n " , b . pf ( & b )); }
The above snippet is isomorphic to a classically polymorphic hierarchy
with virtual methods. Here is the same snippet using
:
struct Base { virtual int f () { return 21 ; } }; struct Derived : Base { int f () override { return x ; } int x = 42 ; }; int main () { Base && b = Base (); Base && d = Derived (); std :: swap ( b , d ); printf ( "%d \n " , b . f ()); }
This is why (since P1144R5) the compiler will not consider types with virtual methods to be "naturally" trivially relocatable.
Appendix D: Implementation experience
A prototype Clang/libc++ implementation is at
-
github.com/Quuxplusone/llvm-project/tree/trivially-relocatable
-
godbolt.org, under the name "x86-64 clang (experimental P1144)"
Side-by-side case studies of
,
,
and
are given in [P1144R6].
As of November 2018, libstdc++ performs the
optimization
for any type which has manually specialized
.
(See this Godbolt.)
Manual specialization is also the approach used by [Qt], [Folly], and [BSL].
As of May 2023, the only libstdc++ library type for which
has
been specialized is
; see [Deque].
Clang trunk provides a compiler builtin type trait
(see [D114732]), which is largely the same as the trait proposed in this paper. (There are slight
differences; e.g., Clang reports polymorphic types, reference types, and incomplete types
as trivially relocatable, whereas P1144 does not. I’m not aware that any of Clang’s differences were intentional.)
Clang trunk has no equivalent of the
attribute, so
is true only for
trivially copyable types and types marked with the
attribute. As of May 2023, Clang trunk has no conception of a type which is non-trivial for purposes of calls
and yet is trivially relocatable. But Clang’s current status is compatible with P1144 (modulo the
few unintentional differences in
mentioned above).
Appendix E: Open questions
In P1144R8, types with vptrs are never naturally trivially relocatable. Arthur claims that types with vptrs should be used only with inheritance, which means they never do value semantics (they slice instead), which means relocating one is always a bug, just like assigning or copying one. But [P2786R0] proposes to make polymorphic types naturally trivially relocatable. Is there any use-case for relocating polymorphic types? Is P1144 being too conservative here?
In P1144R8,
can use trivial relocation to
"close up the window" in the vector; this is also done in [Qt], [Folly], and [BSL]. The difference between
closing-the-window-by-move-assignment and closing-the-window-by-memmove is detectable in a conforming C++ program.
Therefore [P2786R0] proposes that relocation should not be allowed here.
Is there any implementation experience of P2786R0’s conservative position here?
([BSL]'s
follows P1144R8’s liberal position already.)
In P1144R8,
on an array
can use trivial relocation to
swap the elements. The difference between swap-by-move-assignment and swap-by-trivial-relocation is not detectable
in a conforming C++ program, because swapping
s with unequal allocators is UB. [P2786R0] conservatively proposes that relocation should not be allowed here, anyway.
Is there any usage experience in [Qt], [BSL], [Folly], or elsewhere that bears on this question?
(Arthur’s libc++ fork does relocation here, but doesn’t really count as usage experience since he’s the only one using it.)
If
is considered trivially relocatable, then so would be a Rule-of-Zero type
with a
data member. This means that we optimize
(Godbolt),
which is a huge performance win but again is observably different behavior
from the old way — and in this particular case the old behavior is not undefined!
Still, it would be sad to give up this massive performance improvement for
types.
Can we modify the preconditions of {
,
,
,
} to make this optimization conforming?