Document #: | P2839R0 |
Date: | 2023-05-12 |
Project: | Programming Language C++ |
Audience: | EWGI |
Reply-to: | Brian Bi <bbi10@bloomberg.net>, Joshua Berne <jberne4@bloomberg.net> |
A new type of reference, known as the owning reference, is proposed with the spelling `T~`. An owning reference is responsible for destroying the object it refers to and may be used to initialize the parameter of a constructor of the form `T::T(T~)`, which is known as a relocation constructor and performs the responsibilities of both a constructor and destructor. An owning reference that has been moved from is disengaged, does not refer to an object, and is ill formed when named by an expression. Further extensions that build on top of the basic concept of owning references are proposed to facilitate the implementation of user-defined relocation constructors.
Numerous proposals have attempted to introduce the ability to tie together the construction of one object with the destruction of a source object, migrating the value of that source object in the process. We discuss some such proposals in Section 7. Normal C++ move-initialization accomplishes migration of an object’s value but fails to address the source object’s lifetime, thus requiring that the source object support being in a valueless state (i.e., a state having a logically valid value that must be accounted for by any function having a wide contract, while semantically representing the absence of a value).
When an object’s lifetime can also be ended as part of moving its value to a new object, performance can be improved in various ways:
On top of that, although not all move-initializations immediately precede the destruction of the source object, many of them do:
- `std::vector` often involves chains of move-construct and destroy operations.
Surprisingly, types that do not include references to themselves tend to not only be relocatable, but to be relocatable in a trivial fashion; i.e., the relocation can be accomplished by simply invoking `memcpy` on the source object and then not invoking that object’s destructor but nonetheless ending its lifetime. This case is so prevalent and the performance benefits of taking advantage of it within containers are so significant that numerous historical proposals have been written to add just trivial relocation to the library or language: [P1029R1], [P1144R6], and [P2786R0].
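As a rough illustration in current C++ (no proposed syntax), the "copy the bytes, then forget the source" idea can be sketched for trivially copyable types, for which `std::memcpy` is already well defined. The helper `relocate_trivially` and the type `Point` are hypothetical names chosen for this sketch, and the restriction to trivially copyable, trivially default-constructible types is an assumption made to stay within today's rules; it is not part of any of the cited proposals.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical example type: trivially copyable, so memcpy is well defined today.
struct Point {
    int x;
    int y;
};

// Sketch: give raw storage at `dst` the value of `*src` by copying bytes.
// No destructor is run for the source; the caller simply stops using it.
template <class T>
T* relocate_trivially(void* dst, const T* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only covers trivially copyable types");
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "this sketch default-initializes the destination first");
    assert(dst != nullptr && src != nullptr);
    T* result = ::new (dst) T;            // start an object's lifetime in dst
    std::memcpy(result, src, sizeof(T));  // give it the source's value
    return result;
}

// Relocates a Point into a fresh buffer and returns the sum of its members.
inline int relocate_trivially_demo() {
    Point source{3, 4};
    alignas(Point) unsigned char storage[sizeof(Point)];
    Point* moved = relocate_trivially(storage, &source);
    // In a real relocation, `source` would not be used after this point.
    return moved->x + moved->y;
}
```

Calling `relocate_trivially_demo()` yields the sum of the two copied members. The proposals cited above aim to make the same byte-copy strategy available for a much wider set of types than this sketch permits.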
A common initial response to these proposals for trivial relocation is confusion about proposing a trivial version of an operation where we do not have a nontrivial option. Certainly no other fundamental C++ operation is supported only when trivial and provides no mechanism to insert a user-defined version of that operation. This proposal aims to provide the complete context in which to understand how trivial relocation could fit into a larger picture. In particular, should a trivial relocation proposal such as [P2786R0] move forward, it would be completely compatible with extending to arbitrary user-defined relocation through the owning references that we propose here.
By expressing relocation through owning references, rather than simply providing library functions that allow such relocation, we also extend the ability to leverage relocation in more places as well as to safely prevent one of the most common issues with move operations, use after move.
Our proposal is layered in three parts, with each part dependent on only the previous parts. Later parts could easily be delayed for future Standards while reaping a subset of the benefits and expressivity with a smaller initial feature.
Part I introduces owning references and the core language rules governing their behavior that are needed to support defaulted relocation constructors, which the authors believe will suffice for the vast majority of use cases that can benefit from nontrivial relocation.
Part II is a minimal extension to enable users to write their own relocation constructors. A new syntax is proposed to allow the compiler to track which subobjects have been relocated from within the ctor-initializer of a user-defined relocation constructor.
Part III builds on top of Part II to provide further usability benefits in the implementation of relocation constructors and to enable destructors to relocate subobjects.
Appendix A discusses extensions that depend only on Part I but are not included in any of the three main parts because of their potential impact upon existing code.
For every object type `T`, we propose the introduction of a type called owning reference to `T`, which is denoted by `T~`. An owning reference binds only to a value category known as the rlvalue. An rlvalue denotes an object that can be relocated from, just as an xvalue denotes an object that can be moved from. (See Appendix C for a discussion of alternative names.)
A value of owning reference type may be either “engaged” or “disengaged.” An engaged owning reference owns an object: When the owning reference’s lifetime ends, the object it owns is destroyed. (If the owning reference is a function parameter, the implicit destructor call at the end of the reference’s lifetime is performed in callee context because the caller cannot know whether the callee has disengaged its owning reference parameter.) Whether a particular owning reference is engaged or disengaged at a particular program point is always known statically according to the rules that we will describe later in this section, and no runtime flags need to be maintained to track that status.
The name of a variable of type `T~` is an lvalue, just like the name of any other reference variable. The lvalue can be converted to an rlvalue using the `reloc` operator, which will be discussed in more detail later in this section. The resulting rlvalue is then engaged, and the original variable is disengaged. An id-expression that names a disengaged owning reference is ill formed. If a variable of owning reference type is disengaged along some paths of control flow, it is implicitly disengaged at the end of all other branches (i.e., immediately before they rejoin the branch containing the explicit disengagement), as necessary, to ensure that it is known to be disengaged when the branches rejoin.
struct T {
    int m;
};
void g(T& x);
void f(T~ ref) {  // `ref` is engaged and owns some object.
    g(ref);       // OK; `ref` is an lvalue.

    T~ ref2 = reloc ref;
    // `ref` is disengaged; `ref2` is engaged and owns the object.
    g(ref);   // ill formed; `ref` is disengaged
    ++ref.m;  // ditto
    g(ref2);  // OK

    if (rand() % 2) {
        {
            T~ ref3 = reloc ref2;
            // `ref3` is engaged and `ref2` is disengaged.
            // `ref3`'s lifetime ends here; `ref3.~T()` is called.
        }
        g(ref2);  // error
    } else {
        g(ref2);  // OK
        // `ref2` is implicitly disengaged here; `ref2.~T()` is called.
    }
    g(ref2);  // error
}
See also Section 4.8 below.
A relocation constructor is a nontemplate constructor of a class `T` whose first parameter is of type `T~` and all of whose remaining parameters (if any) have default arguments. From the previous paragraph, it is apparent that when the constructor’s parameter is initialized, the parameter has unique ownership of the object it refers to, and any other rlvalue referring to the same object will have become disengaged. A relocation constructor, like any other constructor, creates an object. Because the `T~` parameter’s lifetime ends at the closing brace of the constructor, the source object’s lifetime will end by the time the relocation constructor returns.
Certain types will have implicitly declared relocation constructors that are declared and defined in a similar manner to other special member functions but are unconditionally `noexcept`.
- For a class type `C` such that overload resolution for direct-initializing `C` from an xvalue of `C` succeeds and finds a nondeleted constructor and `C` has a nondeleted destructor, the declared relocation constructor will have the more restrictive of the access levels of those two functions.
Throwing relocation constructors raise difficult specification problems. When a relocation constructor throws, some of the source object’s subobjects will have been destroyed already, and destroying the remaining subobjects might not be safe because the order of destruction in a relocation constructor is opposite to the usual order of destruction (i.e., in a destructor). Like throwing move constructors, throwing relocation constructors are likely to cause problems for authors of generic code. For these reasons, we do not propose to allow throwing relocation constructors at this time.
The behavior of an (implicitly or explicitly) defaulted relocation constructor is described in the following list.
- A trivial relocation constructor performs a bitwise copy of the source object, as if by `std::memcpy`. As with any other relocation constructor, a trivial relocation constructor ends the lifetime of the source object. However, a trivial relocation constructor does not actually call the destructor for the source object, because the notion of trivial relocatability is that performing a bitwise copy followed by forgetting to destroy the old object has the desired semantics.
- `C` is a class type described by the second bullet in the previous list:
  - If both the constructor selected for direct-initialization of `C` from an xvalue of `C` and the destructor of `C` are defaulted at their first declaration and if all direct subobjects and virtual base class subobjects of `C` are relocatable, the default relocation constructor for `C` performs a memberwise relocation of `C`’s direct subobjects and virtual base class subobjects. (The rationale is that we can assume that such a class does not need to be patched up after a memberwise relocation; any such required patchups would have to be performed by the constructor selected for move-construction, which would therefore have to be user provided.)
  - Otherwise, the relocation constructor for `C` behaves as if it delegates to the constructor of `C` that would be selected to perform a move, which results in the subsequent implicit destruction of the source object when the `T~` parameter’s lifetime ends:
T(T~ source) : T(static_cast<T&&>(source)) {}
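The "move, then destroy the source" behavior described in the last bullet can be approximated in current C++ with placement new followed by an explicit destructor call. The following is only a sketch of that fallback under today's rules; `relocate_by_move` and `Tracked` are hypothetical names, and nothing here reproduces the static ownership tracking that owning references would provide.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical type that counts destructor calls so the effect is observable.
struct Tracked {
    static int destroyed;
    std::string value;
    explicit Tracked(std::string v) : value(std::move(v)) {}
    Tracked(Tracked&& other) noexcept : value(std::move(other.value)) {}
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Sketch of the fallback: move-construct the destination, then end the
// lifetime of the source by calling its destructor explicitly.
template <class T>
T* relocate_by_move(void* dst, T* src) {
    assert(dst != nullptr && src != nullptr);
    T* result = ::new (dst) T(std::move(*src));  // construct the destination
    src->~T();                                   // destroy the source
    return result;
}

inline std::string relocate_by_move_demo() {
    alignas(Tracked) unsigned char a[sizeof(Tracked)];
    alignas(Tracked) unsigned char b[sizeof(Tracked)];
    Tracked* first = ::new (static_cast<void*>(a)) Tracked("payload");
    Tracked* second = relocate_by_move(b, first);  // `first` is now dead
    std::string out = second->value;
    second->~Tracked();
    return out;
}
```

In this sketch nothing stops the caller from touching `first` after the call; making such a use ill formed is exactly what the engaged/disengaged rules are meant to add.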
An rlvalue of type cv1 `T` can be implicitly converted to an rlvalue of type cv2 `T` if cv2 is more cv-qualified than cv1.
An rlvalue of type `T` can be implicitly converted to `T&&`. This conversion occurs automatically when the rlvalue expression is the left operand of the `.` or `.*` operator.
A prvalue of object type `T` can be implicitly converted to an rlvalue of type `T`, which has the effect of materializing a temporary that is owned by the resulting rlvalue. During overload resolution, this conversion is considered better than binding to an rvalue reference or const lvalue reference. For example:
void foo(T~ r);   // 1
void foo(T&& r);  // 2
int main() {
    foo(T{});
}
A temporary of type `T` is materialized and converted to an rlvalue. Then `// 1` is called, and `r` is bound to the resulting rlvalue. When the owning reference `r` is destroyed at the end of its lifetime, it implicitly destroys the temporary object (unless `r` was disengaged prior to the end of its lifetime). The binding of the rlvalue to the temporary object extends the storage lifetime for the object in the same manner as the binding of any other reference and suppresses the implicit destruction of the temporary object at the end of the full-expression in which it was created; the rlvalue has ownership and is responsible for destroying the object.
There is no implicit conversion from `D~` to `B~`, where `B` is a base class of `D`. Allowing such a conversion would allow the referenced object to be passed to `B`’s relocation constructor, leaving the complete `D` object in a partially destroyed state. Since such a conversion is not permitted, the implicit destructor call at the end of the lifetime of an engaged owning reference does not perform dynamic dispatch.
A glvalue of type `T` can be explicitly converted to `T~` by `static_cast`. This generates an rlvalue referring to the object that the glvalue refers to, which implies that this rlvalue will be responsible for destroying the object, and the caller must ensure that they do not otherwise destroy the object. Such casts should therefore generally be used only with objects that have dynamic storage duration.
An rlvalue of type `T` decays to simply `T` when deduced by value. An owning reference behaves like any other reference when named by a simple-capture. As when capturing a variable of object type, the programmer must ensure that the lambda closure object does not outlive the captured entity, lest the reference become dangling. A lambda closure object cannot have an owning reference member, for reasons that are discussed in Section 4.7.
struct T {
    int m;
};
template <class U>
void g(U u);
void f1(T~ ref) {
    g(reloc ref);
    // Calls `g<T>`, not `g<T~>`;
    // the parameter `u` is relocated from the object that `ref` refers to.
    // `ref` is disengaged and the lifetime of the object `ref` refers to ends.
    g(reloc ref);  // ill formed
}
auto f2(T~ ref) {
    auto result = [ref] { return ref.m; };
    // The closure type has a member of type `T`, which is *copied* from the
    // object that `ref` refers to.
    g(reloc ref);  // OK; `ref` was not previously disengaged.

    return result;
}
auto f3(T~ ref) {
    auto result = [&ref] { return ref.m; };
    g(result);  // OK

    g(reloc ref);  // OK; `ref` was not previously disengaged.

    return result;  // UB; the reference is now dangling.
}
We propose to change the current model of automatic variables to allow them to be relocated using the `reloc` operator. Automatic variables that are not passed to the `reloc` operator will continue to be implicitly destroyed upon scope exit, just as they always have been, though that mechanism now becomes defined in terms of owning references.
To accomplish these objectives, we propose that for each automatic variable, `x`, an implicit owning reference (call it `__x~`) is considered to be declared immediately after the locus of `x`’s declaration in the same scope. Immediately after its declaration, `__x~` is engaged and owns `x`. The variable `x` is no longer inherently implicitly destroyed when it goes out of scope, but since `__x~` owns `x`, it will destroy `x` upon scope exit, unless some other owning reference takes over ownership of `x` first or an rlvalue referring to `x` has been passed to a relocation constructor or otherwise disengaged. The implicit reference `__x~` cannot be named directly but is needed to define the `reloc` operator (see below). An id-expression naming `x` is ill formed if `__x~` is disengaged.
Although one might occasionally want to construct a new object in the storage location designated by `x` and re-engage `__x~` to that object, we do not currently propose to allow `x` to be named for such purposes, nor do we propose any method by which a disengaged owning reference can be re-engaged, because of the complexity of specifying such a feature and because the safety of doing so isn’t clear. See Appendix A for more discussion.
struct T {
    int m;
};
int main() {
    T x = {0};
    T y;
    T~ r = reloc x;  // `__x~` is disengaged; `r` owns `x`.
    ++x.m;           // ill formed; `__x~` is disengaged.
    ++r.m;           // OK; `r` is an lvalue.
    // `r` goes out of scope and destroys `x`.
    // `__y~` goes out of scope and destroys `y`.
    // `y` goes out of scope; `~T()` is not implicitly called.
    // `__x~` goes out of scope and does nothing since already disengaged.
    // `x` goes out of scope; `~T()` is not implicitly called.
}
`reloc` operator
The `reloc` operator is used to obtain an rlvalue expression that owns a given entity and to disengage the previous owner. For these purposes, it may be applied to the following categories of id-expressions.
- When applied to the name of a variable `x` of type `T~` belonging to a block scope or function parameter scope associated with the immediately enclosing function definition, the result is an rlvalue referring to the object that `x` referred to; `x` is thereby disengaged.
- When applied to the name of a variable `x` of object type belonging to a block scope associated with the immediately enclosing function definition, the result is `reloc __x~`.
We do not propose allowing `reloc` to be applied to a reference variable that is extending the lifetime of the temporary object it is bound to, because the reference might be to a subobject of the temporary object. See Section 4.3.
Because some ABIs require function parameters of object type to be destroyed on the caller side, applying `reloc` to the names of such parameters is not permitted in general; if the programmer wishes to relocate from a function parameter, they should ensure that the function parameter is declared with type `T~` rather than `T`. However, as an optional add-on to Part I, we propose adopting an idea from [D2785]: If `T` is a relocate-only type (i.e., a type that has no eligible copy constructor and no eligible move constructor but does have an eligible relocation constructor), then it is permitted to relocate from a `T` function parameter (implying that callee-destroy is required for such types, which currently do not exist).
Because `T~` is a reference type, there shall be no pointers to `T~`, references to `T~`, or arrays of `T~`. Writing out a type such as `T~&` directly is ill formed. However, owning references participate in reference collapsing.
These reference collapsing rules follow the “principle of lesser privilege” that currently governs the collapsing of lvalue and rvalue references. Owning references give the most privileges (the holder is permitted to destroy the object it refers to, possibly relocating its value to another object), followed by rvalue references (the holder is permitted to take ownership of the held resources, leaving the object in a moved-from state, but is not permitted to destroy the object), and lvalue references.
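For reference, the existing collapsing behavior for lvalue and rvalue references, which the principle above generalizes, can be checked in current C++ as follows. The alias templates `add_lref` and `add_rref` and the helper `collapses_to_lvalue_ref` are hypothetical names used only for this sketch; the corresponding rules for `T~` are not expressible in today's language and are therefore not shown.

```cpp
#include <cassert>
#include <type_traits>

// Alias templates let a reference be applied to a type that may itself
// already be a reference, which triggers reference collapsing.
template <class T> using add_lref = T&;
template <class T> using add_rref = T&&;

// The less privileged reference kind (lvalue reference) always wins.
static_assert(std::is_same<add_lref<int&>, int&>::value, "& then & gives &");
static_assert(std::is_same<add_lref<int&&>, int&>::value, "&& then & gives &");
static_assert(std::is_same<add_rref<int&>, int&>::value, "& then && gives &");
static_assert(std::is_same<add_rref<int&&>, int&&>::value, "&& then && gives &&");

// Small helper so the result of collapsing `T&&` can also be queried in
// ordinary code.
template <class T>
constexpr bool collapses_to_lvalue_ref() {
    return std::is_lvalue_reference<add_rref<T>>::value;
}

inline void check_collapsing() {
    assert(collapses_to_lvalue_ref<int&>());
    assert(!collapses_to_lvalue_ref<int&&>());
}
```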
It follows that in the presence of owning references, forwarding references should be spelled “`T~`”, where `T` is a template parameter of a function that has a parameter of type `T~`. The template argument for `T` is then deduced as an lvalue reference, rvalue reference, or nonreference when the function argument is, respectively, an lvalue, xvalue, or rlvalue. (Since, as discussed in Section 4.3, a prvalue of type `U` prefers to be bound to `U~` rather than `U&&`, using such a prvalue as the function argument will also result in `T` being deduced as `U`.) A forwarding reference that is spelled `T&&` can bind to an rlvalue of type `U` but cannot forward it as an rlvalue; the function parameter type will be `U&&`, not `U~`.
The issue of how to actually perform forwarding (which is typically done using an expression of the form `std::forward<T>(r)`, `static_cast<T&&>(r)`, or `static_cast<decltype(r)>(r)` in current C++) is thorny. When `r` is an owning reference, `reloc` must be used so that the disengagement of `r` that must be performed at the call site is visible to the compiler. However, it is essential to support a single syntax that perfectly forwards `r` regardless of whether it is an lvalue reference, rvalue reference, or owning reference; any alternative that would force users to implement a compile-time switch to call `reloc` on forwarding references of owning reference type (and an ordinary `static_cast` or call to `std::forward` in other cases) is not workable. For this reason, we propose to resurrect the proposal for a unary `>>` forwarding operator, which was described in [P0644R1] and rejected in Albuquerque (November 2017). When applied to a forwarding reference that is an owning reference, `>>` would be equivalent to `reloc`, and when applied to any other entity, `>>` would be equivalent to a `static_cast` as originally proposed. A function template that needs to perfectly forward one or more arguments would then take this form:
template <class T, class... Args>
foo<T> make_foo(Args~... args) {
    return foo<T>(>> args...);
}
As an alternative to the `>>` forwarding operator, we propose to adopt an idea from [D2785], wherein `reloc` can also be applied to lvalue references and rvalue references, not only to the entities described in the previous section. Using `reloc` as the forwarding operator, the above function template could be written:
template <class T, class... Args>
foo<T> make_foo(Args~... args) {
    return foo<T>(reloc args...);
}
The main disadvantage of `reloc` as the forwarding operator is that it would use the same keyword for two essentially distinct operators: an operator that disengages its operand to allow the compiler to track who owns a particular object and an operator that simply casts to lvalue reference or rvalue reference to facilitate perfect forwarding. To mitigate this disadvantage, we propose that when `reloc` is applied to an lvalue or rvalue reference, that operand shall be a forwarding reference. This restriction would not completely eliminate the inelegance and possible confusion arising from the use of `reloc` as the forwarding operator. For this reason, we believe that the unary `>>` operator would provide a better solution for perfect forwarding in the presence of owning references.
A third option is to specify that `static_cast<T~>(r)` implicitly applies `reloc` to `r` when `r` is a forwarding reference with declared type `T~`. The above function template could then be written:
template <class T, class... Args>
foo<T> make_foo(Args~... args) {
    return foo<T>(static_cast<Args~>(args)...);
}
This syntax is much more verbose than the `>>` and `reloc` syntaxes, and would likely increase the popularity of `FWD` macros. We consider this outcome undesirable. We also believe that it is dangerous to allow the `static_cast` operator, which can accept any expression as an operand, to implicitly disengage its operand only when that operand has a very specific form. We do not propose this syntax for forwarding, but include it only for completeness.
We discuss some alternative specifications for forwarding references in Appendix B.
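For comparison, the current-C++ idiom that all three spellings above are meant to subsume can be sketched as follows: a pack of `Args&&...` forwarded with `std::forward`. The type `Widget` is a hypothetical payload that records which constructor was selected, and `make_foo` here returns `T` directly rather than the `foo<T>` used in the examples above; both are assumptions made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical payload type that records which constructor was selected.
struct Widget {
    std::string how;
    explicit Widget(const std::string&) : how("copied") {}
    explicit Widget(std::string&&) : how("moved") {}
};

// Current-C++ counterpart of `make_foo`: `Args&&...` plus `std::forward`
// preserves the value category of each argument.
template <class T, class... Args>
T make_foo(Args&&... args) {
    return T(std::forward<Args>(args)...);
}

inline void forwarding_demo() {
    std::string name = "abc";
    // An lvalue argument is forwarded as an lvalue.
    assert(make_foo<Widget>(name).how == "copied");
    // An rvalue argument is forwarded as an rvalue.
    assert(make_foo<Widget>(std::string("abc")).how == "moved");
}
```

With today's `Args&&...`, an argument can be forwarded only as an lvalue or an xvalue; forwarding ownership of the argument itself is what the `Args~...` spelling would add.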
A variable of owning reference type must have automatic storage duration. The purpose of this rule is to make it harder to accidentally create an owning reference that later becomes dangling. The rules we propose make it ill formed to reference an owning reference of automatic storage duration after it has become disengaged; there does not seem to be a similar strategy to prevent such unsafe accesses to owning references of static and dynamic storage duration. In the particular case of dynamic storage duration, there is a considerable risk that an owning reference attempts to destroy an object whose storage has already been released or reused (e.g., a variable of automatic storage duration whose block has already been exited).
Because owning reference variables are required to have automatic storage duration, they are not permitted as nonstatic data members. (The alternative — namely to make classes containing nonstatic data members of owning reference type ineligible to have any storage duration other than automatic storage duration — would create more problems than it solves.)
In addition, `~` is not permitted as a ref-qualifier in a function declarator; no variable could take ownership in such a case (considering that `this` is a pointer).
Explicit object parameters are permitted to have owning reference type. Note that calling a function with such an explicit object parameter will usually result in the implicit destruction of the object argument:
struct S {
    /* ... */
    void self_destruct(this S~ self);
};
S s;
(reloc s).self_destruct();
We discuss an application for explicit object parameters of owning reference type in Part III.
Structured binding declarations are not permitted to have owning reference type; they suffer from the same issue as `~` on an implicit object member function: you can’t actually name the entity to which the ref-qualifier applies (known as e in 9.6 [dcl.struct.bind]).
If all flow-of-control paths through a particular branch result in a jump that exits the scope to which an owning reference belongs, the other branches do not implicitly disengage the owning reference. The reason for this exception to the usual implicit disengagement rules is that the branch containing the jump cannot rejoin the other branches, so implicit disengagement is not required in the other branches to prevent a situation in which the owning reference may or may not be disengaged after such rejoining.
struct T {
    void method();
};
void g(T);
T f(bool b) {
    T t;
    if (b) {
        return reloc t;
    } else {
        t.method();  // OK
    }
    return reloc t;  // OK
    // `__t~` is disengaged and goes out of scope.
    // `t` goes out of scope.
}
A jump construct is not permitted to jump from a point where an owning reference is disengaged to a point that follows the definition of the owning reference but precedes an id-expression naming the owning reference. An implicit jump from the end of a loop back to its beginning is considered to occur.
int g(int~ r);
int f1(int~ r) {
    while (true) {
        g(reloc r);  // ill formed
    }
}
int f2(int~ r) {
    while (true) {
        int x = 0;
        g(reloc x);  // OK
        if (rand() % 8 == 0) {
            g(reloc r);
            return 0;  // OK
        }
    }
}
When an rlvalue expression is evaluated and is not otherwise disengaged by the end of the containing full-expression, the rlvalue is implicitly disengaged as part of the last step in evaluating the full-expression; the timing of this implicit disengagement is the same as the timing of the implicit destructor call for a hypothetical temporary object that was created at the point at which the rlvalue expression was evaluated. This implicit disengagement can occur, for example, when the rlvalue expression is a discarded-value expression or when it is converted to an xvalue instead of being used to initialize an owning reference variable.
void f(T~ ref) {
    U{}, (reloc ref), V{};
    // `V` object destroyed, then `ref.~T()` called, and then `U` object destroyed.
}
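The ordering in this example mirrors how temporaries already behave in current C++: they are destroyed in reverse order of creation at the end of the full-expression. The following sketch shows that existing behavior with three ordinary temporaries standing in for `U{}`, `reloc ref`, and `V{}`; the `Tracer` type and its tags are hypothetical and exist only to make the order observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical tracer that appends its tag to a log when destroyed.
struct Tracer {
    std::string* log;
    char tag;
    Tracer(std::string& l, char t) : log(&l), tag(t) {}
    ~Tracer() { log->push_back(tag); }
};

inline std::string temporary_destruction_order() {
    std::string log;
    // The comma operator creates the three temporaries left to right...
    static_cast<void>((Tracer(log, 'U'), Tracer(log, 'T'), Tracer(log, 'V')));
    // ...and they are destroyed in reverse order when the full-expression ends.
    assert(log.size() == 3);
    return log;
}
```

The returned log lists the tags in destruction order, so the temporary standing in for `reloc ref` is destroyed between the other two, as in the comment above.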
If an evaluation that disengages an owning reference variable is indeterminately sequenced or unsequenced relative to another evaluation in the same full-expression that names the owning reference variable (where an id-expression naming an automatic variable is considered to name its implicit owning reference for the purpose of this rule), the program is ill formed because we have no guarantee that the latter occurs before the former.
struct S {
    S(int x, int~ r);
};
void bar(int~ r) {
    S s1(r, reloc r);  // Ill formed; `reloc r` may occur before the copy.
    S s2{r, reloc r};  // OK; `x` is copied from `r`, and then `reloc r` is evaluated.
}
See Part II for an example in which this rule must be carefully understood.
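The distinction between the two initializations of `S` rests on an existing rule: the elements of a braced-init-list are evaluated strictly left to right, whereas function arguments in parentheses are indeterminately sequenced. A minimal sketch of that rule in current C++ follows; `Pair` and `next_ticket` are hypothetical names introduced for the illustration.

```cpp
#include <cassert>

struct Pair {
    int first;
    int second;
    Pair(int a, int b) : first(a), second(b) {}
};

// Each call has a visible side effect, so evaluation order is observable.
inline int next_ticket(int& counter) { return ++counter; }

// With braces, the initializer-clauses are evaluated left to right.
inline Pair make_braced() {
    int counter = 0;
    Pair p{next_ticket(counter), next_ticket(counter)};
    assert(p.first == 1 && p.second == 2);
    return p;
}

// With parentheses, the two calls are indeterminately sequenced, so either
// (1, 2) or (2, 1) is a conforming result; only the sum is reliable.
inline int sum_parenthesized() {
    int counter = 0;
    Pair p(next_ticket(counter), next_ticket(counter));
    return p.first + p.second;
}
```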
Because the evaluation of a ternary conditional expression entails control flow, it performs implicit disengagement in the same manner as an `if` statement:
int bar(int~);
void foo(bool b) {
    int x = 0;
    int y = b ? bar(reloc x) : x;
    // OK; if `b` is false, `__x~` is implicitly disengaged after the third
    // operand is evaluated.
    int z = x;  // Ill formed; `__x~` is disengaged.
}
Some of this section’s subsections propose new library facilities. The list is not exhaustive; should Part I move forward, a future revision of this paper will propose the remaining library facilities.
`return` statements
The `return x;` statements that currently implicitly move will behave as if by `return reloc x;` instead. Note that if no relocation constructor is available, the prvalue of `T~` will implicitly convert to an rvalue of `T`, so the move constructor will be selected. The behavior of returning an object whose type does not have a relocation constructor (or whose type has a defaulted relocation constructor that is defined as deleted) will therefore be unchanged by this rule.
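The existing behavior that this rule builds on can be observed in current C++ by returning a by-value function parameter, a case in which copy elision is not permitted and the implicit move is therefore visible. The type `Counted` and the functions below are hypothetical names for this sketch, which assumes C++17 guaranteed elision for the prvalue argument and for the initialization of the result.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Returning a by-value parameter cannot be elided, so `return x;` selects
// the move constructor under today's implicit-move rule.
inline Counted pass_through(Counted x) {
    return x;
}

// Returns the number of move constructions needed for one round trip.
inline int moves_for_return() {
    Counted::copies = 0;
    Counted::moves = 0;
    Counted result = pass_through(Counted{});
    static_cast<void>(result);
    assert(Counted::copies == 0);
    return Counted::moves;
}
```

Under the proposed rule, a type with a relocation constructor would have that constructor selected at the same point instead, while a type such as `Counted`, which has none, would behave as it does today.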
If `x` is an id-expression (possibly parenthesized) naming an automatic variable of type `T~` belonging to a block scope or function parameter scope associated with the immediately enclosing function definition, the expression `x.~T()` destroys the referenced object and disengages `x`. (This effect can also be achieved by evaluating `reloc x` in a discarded-value expression context, but the pseudo-destructor syntax is more evocative.)
`std::force_relocate` function
We propose that a library function, `std::force_relocate`, shall be provided by `<utility>`:
template <class T>
constexpr T~ force_relocate(T&& r) {
    return static_cast<T~>(r);
}
The `std::force_relocate` function can be used by, e.g., a `std::vector`-like container when reallocating. The function does not suppress any implicit destructor call that would occur for its argument; the caller must remember not to destroy the source object separately. Let’s look at an example of how such reallocation can be performed.
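template <class T>
void my_vector<T>::reallocate(size_type new_capacity) {
    T* new_buf = std::allocator_traits<Alloc>::allocate(alloc_, new_capacity);
    for (size_type i = 0; i < size_; i++) {
        ::new (static_cast<void*>(new_buf + i)) T(std::force_relocate(buf_[i]));
    }
    // Release the old storage; its elements have already been relocated
    // and must not be destroyed again.
    std::allocator_traits<Alloc>::deallocate(alloc_, buf_, capacity_);
    capacity_ = new_capacity;
    buf_ = new_buf;
}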
(Factory functions such as `std::allocator_traits<Alloc>::construct` should be updated to accept a pack of the new forwarding reference, `Args~...`. We have not yet enumerated all Standard Library function templates to which this change should be made. After the Standard Library function templates are updated with this change, the above placement-new expression should be replaced by a call to `std::allocator_traits<Alloc>::construct`.)
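For contrast, the same reallocation written in current C++ needs a separate move construction and destructor call per element, which is the pair of operations that relocation would fuse. The sketch below is illustrative only: `grow_buffer` is a hypothetical free function rather than a member of `my_vector`, it uses `std::allocator` directly, and it assumes a nonthrowing move constructor (no exception safety is attempted).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical helper: moves `size` live elements from `old_buf` into a new
// allocation of `new_capacity` slots, destroying each source element.
template <class T>
T* grow_buffer(std::allocator<T>& alloc, T* old_buf, std::size_t size,
               std::size_t old_capacity, std::size_t new_capacity) {
    assert(new_capacity >= size);
    T* new_buf = alloc.allocate(new_capacity);
    for (std::size_t i = 0; i < size; ++i) {
        ::new (static_cast<void*>(new_buf + i)) T(std::move(old_buf[i]));
        old_buf[i].~T();  // the source element's lifetime ends here
    }
    alloc.deallocate(old_buf, old_capacity);
    return new_buf;
}

inline std::string grow_buffer_demo() {
    using Str = std::string;
    std::allocator<Str> alloc;
    Str* buf = alloc.allocate(2);
    ::new (static_cast<void*>(buf + 0)) Str("hello ");
    ::new (static_cast<void*>(buf + 1)) Str("world");
    buf = grow_buffer(alloc, buf, 2, 2, 8);
    Str joined = buf[0] + buf[1];
    buf[0].~Str();
    buf[1].~Str();
    alloc.deallocate(buf, 8);
    return joined;
}
```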
`std::relocate_ptr` smart pointer
We propose a smart pointer type that is similar to `std::unique_ptr` but can only be relocated (not moved). Like `std::unique_ptr`, the smart pointer type guarantees that the deleter it holds will eventually be called to release the resources owned by the raw pointer it owns. However, while a `std::unique_ptr` can be accidentally dereferenced after it has been moved from (and become null), a `std::relocate_ptr` cannot be accessed in any way after it has been relocated from and omits the `release` and `reset` functions that can be used to change its value to null. We expect that `std::relocate_ptr` can be used in place of `std::unique_ptr` in most situations where `std::unique_ptr` is currently used, leading to safer code.
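The hazard being addressed is easy to reproduce in current C++: a moved-from `std::unique_ptr` remains nameable and is guaranteed to be null, so a later dereference compiles but has undefined behavior. The function name below is hypothetical; the sketch only inspects the null state and never dereferences the moved-from pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true if `source` is null after being moved from, which the
// Standard guarantees for std::unique_ptr.
inline bool moved_from_unique_ptr_is_null() {
    std::unique_ptr<int> source = std::make_unique<int>(42);
    std::unique_ptr<int> sink = std::move(source);
    assert(sink && *sink == 42);
    // `source` can still be named here; writing `*source` would compile
    // but would be undefined behavior.
    return source == nullptr;
}
```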
`std::disengage` function
We propose a library function, `std::disengage`:
template <class T>
constexpr void disengage(T~) requires is_object_v<T>;
Calling `disengage` (unsurprisingly) disengages the rlvalue argument and ends the lifetime of the object to which it refers, without calling any destructors or relocation constructors. (Therefore, the effect of calling `disengage` is different from that of an implicit disengagement that occurs when `reloc` is applied to an owning reference or when the compiler inserts a disengagement along some branches of control flow; such implicit disengagements always call the destructor.)
#include <utility>
struct T {
    int m;
};
int main() {
    T x{1};
    T& r = x;
    std::disengage(reloc x);  // OK; `x.~T()` not called.
    int y = x.m;              // Ill formed; `x` is disengaged.
    int z = r.m;              // UB; dangling reference
}
The effect of `std::disengage` is purely to end the lifetime of the object to which the rlvalue refers, which is not particularly useful in Part I. This effect can be easily misused to subvert RAII but may be useful in user-provided relocation constructors; see Part II.
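The closest analogue in current C++ to ending an object's lifetime without running its destructor is to hold the object in a union, whose members are never destroyed implicitly. The following is only a loose sketch of that analogue and is not how `std::disengage` would be specified; `ManualLifetime`, `Noisy`, and `forget_without_destroying` are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts destructor calls.
struct Noisy {
    static int destructor_calls;
    int m;
    explicit Noisy(int v) : m(v) {}
    ~Noisy() { ++destructor_calls; }
};
int Noisy::destructor_calls = 0;

// A union member is never destroyed implicitly, so the owner of the wrapper
// decides whether ~T() runs at all.
template <class T>
union ManualLifetime {
    T value;
    template <class... Args>
    explicit ManualLifetime(Args&&... args)
        : value(std::forward<Args>(args)...) {}
    ~ManualLifetime() {}  // intentionally does not destroy `value`
};

// Returns the number of ~Noisy() calls observed after the slot goes away.
inline int forget_without_destroying() {
    {
        ManualLifetime<Noisy> slot(7);
        assert(slot.value.m == 7);
    }  // the storage is released; ~Noisy() is not called
    return Noisy::destructor_calls;
}
```

As with `std::disengage`, skipping the destructor in this way subverts RAII and is appropriate only when some other code has taken over responsibility for the object's resources.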
A user cannot implement `std::disengage` because it behaves as if it stashes away an owning reference in some place where the latter can live until the program terminates, which is not possible in user code since all owning references have automatic storage duration.
If Part I of this proposal is adopted, we expect that the vast majority of relocatable types will be trivially relocatable, and for the vast majority of nontrivially relocatable types, the defaulted relocation constructor (which will move then destroy) will do the right thing, because the necessary nontrivial work will already have been done when writing the move constructor and destructor. However, as relocate-only types become more common, so will class types that cannot be moved because they contain relocate-only subobjects. In some cases, patch-ups will need to be performed after memberwise relocation of these types, and since such types cannot be given a move constructor that performs the patch-ups, users must be able to write their own relocation constructor. In other words, if Part I is adopted without provisions to enable users to provide their own relocation constructors, relocation in C++ will become a victim of its own success. However, we propose Part II separately from Part I because specifying the semantics of user-provided relocation constructors involves additional complexities with less clear-cut solutions.
When users are allowed to write their own relocation constructors, the source object must not be implicitly destroyed, since the relocation operation takes the place of destruction. Therefore, the relocation constructor must ensure that each subobject of the source object is either relocated from or destroyed to avoid leaks. For usability and safety, we must ensure that destruction occurs automatically for each source subobject that is not relocated (i.e., the burden should not be on the user to remember to destroy them). (A relocation constructor thus offers the same guarantee with respect to its source object as a destructor, except it destroys the subobjects in the opposite order.) If the implicit destruction of subobjects that were not relocated does not occur in the ctor-initializer, then the body of the relocation constructor will see a source object that is partially alive. This situation is likely to result in unsafe code. The desire to prevent this situation leads to the conclusion that implicit destruction should occur in the ctor-initializer.
For the compiler to know which source subobjects to implicitly
destroy, there must be a mechanism for the compiler to know which
destination subobjects will be constructed by relocation from the
corresponding source subobjects. The [D2785] approach in this area is for
relocation constructors to implicitly relocate each destination
subobject from the corresponding source subobject unless the subobject
is explicitly named in the ctor-initializer. However, since we
have owning references in our proposal, we can support more general
constructors that do not have the exact signature
T::T(T~)
, and such constructors
can have additional parameters as well. This raises the question of
which such constructors should receive this implicit relocation
treatment. We explain a use case for such extended relocation
constructors below.
We propose the reloc
specifier (distinct from the
reloc
operator that was
introduced in Part I) that may be applied only to a parameter of a
constructor for type T
, where
the parameter must have type T~
and at most one parameter may have this specifier. The use of this
specifier marks the corresponding parameter to have its subobjects
implicitly relocated to the destination subobject unless overridden by
the ctor-initializer. The
reloc
specifier is an
implementation detail of the definition of the constructor and is not
part of the constructor’s signature. We recommend that it be omitted on
nondefining declarations of a constructor. The
reloc
specifier also tells the
compiler to implicitly call
std::disengage
on the owning
reference parameter when the ctor-initializer is left (either
because it has completed or because it was interrupted by an exception),
unless the constructor is a delegating constructor. When the
ctor-initializer completes normally, this implicit
disengagement is necessary because after the ctor-initializer
runs, each subobject of the T
object owned by the owning reference parameter will be either relocated
or destroyed; for the owning reference to remain engaged and then to
destroy its referent at the end of the constructor, thus double-destroying the
object it would otherwise continue to own, would make no sense. When the
ctor-initializer is interrupted by an exception, implicit
disengagement is needed to ensure that the subobjects of the source
object that have already been relocated from or implicitly destroyed are
not destroyed a second time, since the entire source object’s destructor
would be called if the owning reference were not disengaged first.
The reloc
specifier is not
permitted in a delegating constructor because its semantics of
performing memberwise relocation and destruction do not make sense for a
constructor that does not itself initialize any subobjects.
Note that if the programmer does not mark a parameter
reloc
and instead attempts to
manually relocate from one of its subobjects in the
ctor-initializer, the compiler will tell the programmer that
the reloc
operator can be
applied to only an id-expression, not to the class member
access or cast expression that they would need to write to reference the
subobject they are trying to relocate from. We recommend that
implementations try to provide a helpful diagnostic in such cases:
struct S {
    T d_foo;
    S(S~ other) : d_foo(reloc other.d_foo) {
        //              ^^^^^^^^^^^^^^^^^
        // Possible error message:
        // "The `reloc` operator may only be applied to the name
        // of a variable; to relocate from subobjects of `other`,
        // declare `other` with the `reloc` specifier".
        std::cout << "S(S~)\n";
    }
};
A possible extension (that we are not currently proposing) is to
extend the reloc
specifier
(possibly spelled differently) to other kinds of parameters (typically
for copy and move constructors), with the same meaning of “use this
parameter as the default source for initialization of subobjects” (i.e.,
by copy or move depending on the parameter type). The implicit
disengagement would still apply to only owning references.
For the same reasons that we propose that relocation constructors be
implicitly noexcept
when
implicitly declared or when explicitly defaulted on their first
declaration (see Part I), we also propose that every constructor having
a parameter that is declared
reloc
be implicitly
noexcept
and that it be a
diagnosable error if such a constructor is declared
noexcept(false)
.
A small vector is a class template that provides inline storage for up to
N objects of type T, where N is a template parameter. When the user
attempts to store more than N objects, it either employs dynamic memory
allocation or causes the operation to fail (e.g., by throwing an exception
or terminating the program).
If Part I of this proposal is accepted, library authors might
implement a small_vector
template that supports relocate-only types. Such a
small_vector
would itself be
relocate-only. Because a
small_vector
stores its elements
inline, the relocation of a
small_vector
object invalidates
all iterators into that object.
Consider now a struct S
that
holds both a
small_vector<T>
(where
T
is a relocate-only type) and
an iterator into that
small_vector<T>
.
S
will not be trivially
relocatable, since the iterator member must be patched up during
relocation, nor will S
have a
usable defaulted relocation constructor, since it is not movable (see
Section 4.2). The author of S
must implement a relocation constructor:
struct S {
    small_vector<T> d_v;
    small_vector<T>::iterator d_it;

    S(const S&) = delete;
    S& operator=(const S&) = delete;

    S(S~ src);
};
To relocate both d_v
and
d_it
correctly to the
destination object, the relocation constructor must compute the value
d_it - d_v.begin()
(call it
idx
) for the source object, and
then initialize the destination’s
d_it
member with
idx + d_v.begin()
. Because
d_v
will be implicitly relocated
by the ctor-initializer, the computation of
idx
cannot be deferred to the
compound-statement of the relocation constructor. It follows
that S::S(S~)
needs to compute
idx
and then immediately
delegate to another constructor that actually relocates
d_v
:
struct S {
    // other members previously described...
    S(size_t idx, reloc S~ src)
    : d_it(d_v.begin() + idx) {}
    S(S~ src) : S{src.d_it - src.d_v.begin(), reloc src} {}
};
A number of features of the above implementation are noteworthy:
- [...] src from being accessed after relocation (and would therefore be ill formed).
- If the reloc specifier is omitted for the two-parameter constructor, the compiler will attempt to default-initialize d_v and then destroy src.d_v at the end of the ctor-initializer, which is unlikely to be the semantics the programmer desired.
- Because of the reloc specifier, the construction of the d_v member by relocation from src.d_v is implicit. The explicit mem-initializer for d_it overrides the implicit relocation that would otherwise occur for d_it.
We believe that such subtleties make user-provided relocation
constructors an expert-only feature, and even experts are likely to err.
The additional features that we propose in Part III will simplify the
implementation of S
but at the
cost of further complexity in the language specification.
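The compute-the-index-then-delegate pattern can be tried out in current C++ with a move constructor standing in for the relocation constructor. This is a minimal sketch under stated assumptions: `InlineBuf` and `Holder` are hypothetical stand-ins for small_vector<T> and S, a raw pointer plays the role of the iterator, and `std::move` replaces the proposed reloc operator (so the source is left alive rather than destroyed).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a small vector: elements live inside the object,
// so a pointer into one InlineBuf is never valid for another.
struct InlineBuf {
    std::array<int, 4> d_data{};
    int* begin() { return d_data.data(); }
};

struct Holder {
    InlineBuf d_v;
    int* d_it;  // points into d_v

    explicit Holder(std::size_t idx) : d_v{}, d_it(d_v.begin() + idx) {}

    // Step 1: compute the index from the source, then delegate.
    Holder(Holder&& src) noexcept
        : Holder{static_cast<std::size_t>(src.d_it - src.d_v.begin()),
                 std::move(src)} {}

private:
    // Step 2: transfer the storage, then rebuild the iterator from the index.
    Holder(std::size_t idx, Holder&& src) noexcept
        : d_v(std::move(src.d_v)), d_it(d_v.begin() + idx) {}
};

// Checks that the destination's iterator points into its own storage.
bool iterator_survives_move() {
    Holder a(2);
    a.d_v.d_data = {10, 20, 30, 40};
    Holder b(std::move(a));
    return b.d_it == b.d_v.begin() + 2 && *b.d_it == 30;
}
```

In this analogy `std::move` is only a cast, so the index would be computed correctly under either evaluation order; with the proposed reloc operator the source becomes unusable once relocated, which is why the ordering matters there.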
The example given in Part II for a class containing a small vector of
a relocate-only type can be rewritten much more simply if we introduce a
feature that allows the construction of bases and members of the
destination object to be deferred until some point in the
compound-statement of its constructor. We propose that such
deferred construction be performed by a new kind of statement called a
delayed-ctor-initializer, consisting of this : followed by a list of
mem-initializers and terminated by a semicolon:
struct S {
    small_vector<T> d_v;
    small_vector<T>::iterator d_it;

    S(const S&) = delete;
    S& operator=(const S&) = delete;

    S(reloc S~ src) {
        const size_t idx = src.d_it - src.d_v.begin();
        this : d_it(d_v.begin() + idx);
    }
};
When the definition of a constructor contains a delayed-ctor-initializer, it shall not contain a ctor-initializer and shall not implicitly initialize bases and members prior to the constructor’s compound-statement.
Since C++11, no compelling need for delayed-ctor-initializers in the language has arisen because delegating constructors can be employed as an alternative. We believe that the example from Part II, with its various gotchas, demonstrates a case in which delegation is particularly difficult to use correctly and difficult to read when used correctly due to the interaction of delegation with owning references. Thus, we propose delayed-ctor-initializers as part of this paper. Note that we propose to allow delayed-ctor-initializers in all constructors, not just relocation constructors. We believe that judicious use of delayed-ctor-initializers can result in less error-prone implementation of move constructors. (When programmers introduce a bug related to evaluation order in delegating move constructors, they will not receive a compile-time diagnostic as they would for a relocation constructor like the one discussed in Part II, so although delayed-ctor-initializers are important in supporting user-defined relocation constructors, delayed-ctor-initializers could end up being used more widely in move constructors than relocation constructors.)
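The evaluation-order hazard in delegating constructors can be made visible in current C++. The sketch below relies on one guarantee of the existing language: the initializer-clauses of a braced-init-list are evaluated in the order in which they appear, whereas the arguments of a parenthesized initializer are evaluated in an unspecified order. The names `order_log`, `record`, and `Pair` are hypothetical and exist only to record that order.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of the order in which arguments are evaluated.
inline std::string& order_log() {
    static std::string log;
    return log;
}

// Appends a tag to the log and passes the value through unchanged.
inline int record(char tag, int value) {
    order_log() += tag;
    return value;
}

struct Pair {
    int first;
    int second;
    Pair(int a, int b) : first(a), second(b) {}
    // Delegation with braces: the clauses are evaluated left to right.
    explicit Pair(int seed)
        : Pair{record('a', seed), record('b', seed + 1)} {}
};

// Returns the recorded evaluation order for one braced delegation.
std::string braced_delegation_order() {
    order_log().clear();
    Pair p(1);
    (void)p;
    return order_log();
}
```

A delegating move constructor that reads from its source in one argument and moves from it in another depends silently on this order when parentheses are used, and no diagnostic is issued if the order turns out to be the wrong one.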
It shall be a diagnosable error to define a constructor in which control
flow can pass through more than one delayed-ctor-initializer.
To be more specific, suppose a hypothetical owning reference variable
named __r
were declared at the
very beginning of the constructor’s compound-statement, and
each delayed-ctor-initializer were replaced by
reloc __r;
. The constructor is
ill formed if the transformed version would be ill formed due to
potentially referencing __r
when
__r
is disengaged. Any implicit
disengagement of __r
and
destruction of its referent that would occur due to the rules about
owning references will instead result in the implicit execution of a
delayed-ctor-initializer of the form
this : ;
.
The implicit relocation and disengagement semantics provided by the
reloc
specifier might not always
be desired. In some cases, the programmer might wish for more explicit
control. Consider, for example, an allocator-extended
relocation constructor that does not use the allocator from the source
object but instead uses an allocator supplied by the caller. That
allocator must then be passed down to the allocator-extended relocation
constructors of any subobjects that use allocators:
using allocator_type = ...;
struct S {
    allocator_type d_alloc;
    S(S~ src);
    S(allocator_type alloc, S~ src);
};
struct T {
    allocator_type d_alloc;
    S d_s;
    T(T~ src);
    T(allocator_type alloc, T~ src);
};
Implementing T
’s
allocator-extended relocation constructor using the tools provided by
Part II is not possible because that constructor needs some way to
execute a mem-initializer resembling
d_s(alloc, reloc src.d_s)
, but
src.d_s
isn’t an
id-expression, so under the rules in Parts I and II,
reloc src.d_s
is ill formed. As
we explained in Part II, such constructs are not permitted because they
make it impossible, in general, for the compiler to know which
subobjects of the source object must be prevented from being destroyed a
second time.
To allow this allocator-extended relocation constructor to be
implemented, we need to specify the meaning of
reloc src.d_s
. The intuition
behind the semantics of such an expression is that
reloc
acts upon an owning
reference with a known declaration, so
src.d_s
must behave as if it
names an owning reference variable, even though
src.d_s
is not one (nor could it
be, since owning references are not permitted as members). Furthermore,
if owning references are to exist to the subobjects of
src
, then at the point where
such owning references exist, there must not be an owning reference to
the complete object that is still planning on destroying it, lest the
subobjects of src
be destroyed
twice.
We must therefore have an operation that acts upon the owning
reference src
, such that after
this operation has been executed,
src
still refers to the same
object as it did before, and referring to
src
is still well formed, but
src
no longer intends to destroy
the object to which it refers. Such an owning reference is said to be
under destruction. We chose this name because the state of an
owning reference that is under destruction parallels the state of an
object whose destructor has begun execution (namely, its base and member
subobjects remain to be destroyed). The current “placeholder” syntax for
this operation is
reloc_begin_destruction src
.
(The identifier reloc_begin_destruction seems unlikely to have been used
in real code but is unappealing; we hope to propose a better syntax
eventually.)
The allocator-extended relocation constructor described at the beginning of this section can be implemented easily:
struct T {
    allocator_type d_alloc;
    S d_s;
    T(T~ src);
    T(allocator_type alloc, T~ src) {
        reloc_begin_destruction src;
        this : d_alloc(alloc),
               d_s(alloc, reloc src.d_s);
        // `src.d_s` goes out of scope and was already disengaged.
        // `src.d_alloc` goes out of scope and is destroyed.
    }
};
The small vector example from the previous section can be rewritten
so that it employs explicit relocation instead of the
reloc
specifier:
struct S {
    small_vector<T> d_v;
    small_vector<T>::iterator d_it;

    S(const S&) = delete;
    S& operator=(const S&) = delete;

    S(S~ src) {
        const size_t idx = src.d_it - src.d_v.begin();
        reloc_begin_destruction src;
        this : d_v(reloc src.d_v),
               d_it(d_v.begin() + idx);
        // `src.d_it` goes out of scope and is destroyed.
        // `src.d_v` goes out of scope and was already disengaged.
    }
};
Control flow that potentially calls
reloc_begin_destruction
twice on
the same owning reference is ill formed. Essentially,
reloc_begin_destruction r
is
permitted only if replacing all such evaluations for a given
r
with
reloc r
would not result in any
such invented reloc r
violating
the rules on use of owning references after disengagement. Implicit
calls to reloc_begin_destruction
are inserted as necessary (in a manner similar to implicit
disengagements) to ensure that whether an owning reference is under
destruction at a particular point is statically known.
reloc_begin_destruction
is
permitted outside of a constructor but only if every nonstatic data
member, direct base class, and virtual base class of the operand would
be accessible at the point where
reloc_begin_destruction
occurs.
reloc_begin_destruction src must not only place src under
destruction but must also initialize the subordinate owning
references src.d_alloc
and
src.d_s
. In general, there is
one such subordinate owning reference for each direct nonstatic data
member of object type and direct base class and one for each virtual
base class if src
is not itself
a subordinate owning reference (i.e., it refers to a complete object).
These subordinate owning references are declared in the same order in
which the subobjects would be constructed so that when the subordinate
owning references go out of scope, they destroy the corresponding
subobjects in the same order in which the destructor of the complete
object would destroy them.
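The ordering argument rests on two rules of the existing language: members are destroyed in the reverse of their declaration order, and automatic variables are destroyed in the reverse of their declaration order. The following sketch checks that the two orders coincide when locals are declared in construction order; `dtor_log`, `Noisy`, and `Whole` are hypothetical names used only for this illustration, and plain local objects stand in for subordinate owning references.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of destruction order.
inline std::string& dtor_log() {
    static std::string log;
    return log;
}

// Hypothetical member type that records its tag when destroyed.
struct Noisy {
    char tag;
    explicit Noisy(char t) : tag(t) {}
    ~Noisy() { dtor_log() += tag; }
};

struct Whole {
    Noisy a{'a'};
    Noisy b{'b'};
    Noisy c{'c'};
};

// Order in which the destructor of a complete object destroys its members.
std::string member_destruction_order() {
    dtor_log().clear();
    {
        Whole w;
        (void)w;
    }
    return dtor_log();
}

// Order in which locals declared in construction order leave scope.
std::string local_destruction_order() {
    dtor_log().clear();
    {
        Noisy a('a');
        Noisy b('b');
        Noisy c('c');
    }
    return dtor_log();
}
```

Both functions report the same reverse-of-declaration order, which is the property the subordinate owning references are meant to preserve.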
At a particular point in the constructor, if
src
is under destruction, then a
member access expression naming a member of
src
, whose left operand is the
id-expression src
(possibly parenthesized), is instead considered to name the subordinate
owning reference corresponding to the named member. If
src
has a direct base class of
type B
, the syntax
static_cast<B&>(src)
names the subordinate owning reference corresponding to that base class
subobject. (The syntax is not
static_cast<B~>(src)
because the result is an lvalue; this is consistent with the idea that
the expression names an owning reference variable and that the
name of an owning reference variable is always an lvalue referring to
the owned object.)
struct S2 {
    std::string d_s1;
    std::string d_s2;
    std::string d_s3 = "not used for this object yet";

    S2(S2~ src) {
        reloc_begin_destruction src;
        // declares subordinate references to `src.d_s1`, `src.d_s2`, and
        // `src.d_s3`, in that order;
        // `src` is now under destruction
        this : d_s1(reloc src.d_s1)
               // disengages subordinate reference to `src.d_s1`
             , d_s2(reloc src.d_s2)
               // disengages subordinate reference to `src.d_s2`
             ;
        // initializes `d_s3` using its default member initializer
        std::cout << src.d_s1.size();
        // Ill formed: src.d_s1 names subordinate reference,
        // which is disengaged.
        std::cout << src.d_s3.size();
        // OK
        // Subordinate reference to `src.d_s3` goes out of scope and
        // destroys `src.d_s3`.
        // Subordinate reference to `src.d_s2` goes out of scope and is
        // already disengaged.
        // Subordinate reference to `src.d_s1` goes out of scope and is
        // already disengaged.
        // `src` goes out of scope, but it is under destruction so it does
        // not call `src.~S2()`.
    }
};
The meaning of src.d_s
depends on whether src
is under
destruction, so performing a member access through
src
at a point where control
flow is ambiguous as to whether
reloc_begin_destruction src
has
been evaluated is ill formed.
When a constructor containing a delayed-ctor-initializer
also has a parameter that bears the
reloc
specifier, the implicit
relocation and destruction semantics of the
reloc
specifier do not go into
effect until the delayed-ctor-initializer is executed.
struct S1 {
    T d_foo;
    T d_bar;
    S1(S1~ source) {
        if (rand() % 2) {
            reloc_begin_destruction source;
            this : d_foo(reloc source.d_foo),
                   d_bar(0);
            // `source.d_foo` may no longer be referenced;
            // `source.d_bar` may still be referenced.
        }
        // implicitly:
        // else {
        //     reloc_begin_destruction source;
        //     this : ;
        // }
    }
};
struct S2 {
    T d_foo;
    T d_bar;
    S2(reloc S2~ source) {
        if (rand() % 2) {
            this : d_bar(0);
            // `d_foo` is implicitly initialized by relocation;
            // `source.d_foo` is destroyed by `T`'s relocation constructor
            // while `source.d_bar` is implicitly destroyed;
            // `std::disengage(reloc source)` is called implicitly.
        }
        // implicitly:
        // else {
        //     this : ;
        // }
    }
};
Evaluating
reloc_begin_destruction
for an
owning reference parameter that is declared with the
reloc
specifier is ill formed.
(The reloc
specifier, described
in Part II, implicitly performs a function that is very similar to
reloc_begin_destruction
prior to
entering the ctor-initializer. However, note that the implicit
destruction semantics afforded by the
reloc
specifier will destroy
subobjects of the source object in the opposite order from the source
object’s destructor, while the subordinate references declared by
reloc_begin_destruction
will go
out of scope in the same order as their corresponding subobjects would
be destroyed by the source object’s destructor.)
We propose that the syntax
~T(this T~ self)
be permitted
for declaring a destructor. Note that destructors are currently not
permitted to use explicit object parameter syntax; e.g.,
~T(this T& self)
is not
permitted. We propose to permit a destructor to use explicit object
parameter syntax solely in the case where the parameter is an owning
reference to the class type to which the destructor belongs.
In a destructor so declared,
reloc_begin_destruction self
is
implicitly executed at the beginning of the destructor’s
compound-statement, and an id-expression naming a
direct member of the destructor’s class implicitly names a subordinate
owning reference. That is, if m
is a direct member of the destructor’s class, either
m
or
self.m
can be used to name the
subordinate owning reference. The syntax static_cast<Base&>(self)
must be used to name the subordinate owning reference corresponding to a
direct base class subobject of type
Base
.
The motivation for such destructor declarations is to permit a destructor to return a resource to a pool by relocation, where that resource is owned by a member of the destructor’s class:
struct S {
    relocate_ptr<Resource> d_resource;
    ResourcePool* d_pool;
    std::string d_name;

    ~S(this S~ self) {
        // `reloc_begin_destruction self` is executed implicitly.
        d_pool->return(reloc d_resource);
        // `d_resource` is equivalent to `self.d_resource`.
        // `d_pool` takes ownership of subordinate owning reference.
        // Subordinate owning reference corresponding to `d_name` goes out of scope
        // and destroys `d_name`.
        // Subordinate owning reference corresponding to `d_pool` goes out of scope.
        // Subordinate owning reference corresponding to `d_resource` is already disengaged.
        // `self` is under destruction, so it does not attempt to re-destroy the
        // object to which it refers.
    }
};
Our reason for proposing to make this feature available only in the
presence of an explicit object parameter is to avoid giving special
meaning to the expression *this
,
which would need to be evaluated to name a subordinate owning reference
to a base class subobject — i.e., as part of the expression static_cast<Base&>(*this)
.
Since *this
is not an
id-expression, for
*this
(but not a more complex
expression) to be usable for naming subordinate owning references would
be counterintuitive.
All destructors that have such an explicit object parameter of owning
reference type are considered prospective (just like destructors with no
parameters) until the end of the class definition. Overload resolution
is then performed among all destructors that have an explicit object
parameter to select the one that is the most constrained. If both a
selected destructor with no parameters and a selected destructor with an
explicit owning reference parameter are present, the class definition is
ill formed. If the selected destructor has an explicit owning reference
parameter, any (explicit or implicit) call to that destructor implicitly
applies reloc
, as necessary, to
initialize the destructor’s parameter.
WG21 members have created many proposals for relocation in C++. We will first discuss the other known proposals for nontrivial relocation and explain why we are proposing the introduction of owning references while the other nontrivial relocation proposals make do without them. Afterward, we will discuss the interaction of our proposal with the two most recent trivial relocation proposals, which are known to be actively pursued by their authors.
[D2785] is most similar to our proposal
and introduces no additional types but does introduce a new kind of
prvalue obtained from relocating a glvalue: a prvalue that already has
storage backing it (as opposed to one that will construct an object into
the storage determined by the context). In effect, D2785 also proposes a
new value category but does not propose a generalized vocabulary for
manipulating expressions of this value category. The type
T~
in our proposal is a
reification of the fourth value category, just as
T&&
is a reification of
the xvalue category that was introduced in C++11.
We believe that the introduction of the rlvalue category — and of owning reference types that may bind to them — results in a conceptually simpler model than the model of D2785.
Owning references also provide practical benefits. Because an owning reference can be perfectly forwarded with only the runtime cost of copying a pointer and the actual relocation only occurs at the end of this process, our approach never requires intermediate relocations when multiple function calls intervene between the scope in which a source object is declared and the scope in which a destination object is constructed by relocation from that source object:
void consume(T t);
void logAndConsume(T~ r) {
    std::cout << "Eating: " << &r << std::endl;
    consume(reloc r);  // calls relocation constructor
}
void f() {
    T src;
    logAndConsume(reloc src);
}
Functions with T~
parameters
in our approach are expected to be declared with parameters of type
T
in the D2785 approach:
void consume(T t);
void logAndConsume(T r) {
    std::cout << "Eating: " << &r << std::endl;
    consume(reloc r);  // calls relocation constructor
}
void f() {
    T src;
    logAndConsume(reloc src);  // may call relocation constructor
}
In the above snippet, the creation of a new
T
object named
r
when calling
logAndConsume
can be elided if
that function is inlined or if that function is given an ABI in which
the T
parameter is implicitly
passed by reference, which is not possible in general. (The
implementation decision to use such an ABI would necessarily affect
all functions with the same signature as
logAndConsume
.) Giving users the
ability to explicitly declare parameters to have type
T~
gives them a way to select
which ABI they want and also avoids the issue in the D2785 approach
wherein the value of &r
depends on whether r
has been
elided by the implementation.
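The cost difference between the two parameter styles has a close analogue in current C++, where an rvalue reference parameter plays the role of T~ and a move construction plays the role of a relocation. The sketch below counts move constructions along each path; `Counted`, `logAndConsumeByRef`, and `logAndConsumeByValue` are hypothetical names, and the counts describe moves in standard C++, not relocations under either proposal.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts move constructions.
struct Counted {
    static inline int moves = 0;
    Counted() = default;
    Counted(Counted&&) noexcept { ++moves; }
};

// Final consumer: takes its argument by value.
inline void consume(Counted) {}

// Reference parameter: forwarding passes an address; one move at the end.
inline void logAndConsumeByRef(Counted&& r) { consume(std::move(r)); }

// Value parameter: an intermediate object is created on the way in.
inline void logAndConsumeByValue(Counted r) { consume(std::move(r)); }

int moves_through_reference() {
    Counted::moves = 0;
    Counted src;
    logAndConsumeByRef(std::move(src));
    return Counted::moves;
}

int moves_through_value() {
    Counted::moves = 0;
    Counted src;
    logAndConsumeByValue(std::move(src));
    return Counted::moves;
}
```

The reference path performs a single move, while the value path performs two because the intermediate parameter object must itself be constructed.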
The fundamental relocation operation in [N4158] is a call to a customization
point called
uninitialized_destructive_move
,
which takes two pointer arguments called
from
and
to
and constructs an object at
to
having the value held by
*from
while also ending the
lifetime of *from
.
A pure library facility such as that proposed by N4158 cannot be used
to relocate automatic variables because the call to
uninitialized_destructive_move
does not suppress the implicit destructor call when the variable goes
out of scope. Therefore, the N4158 approach is necessarily
pointer-based, while our approach is value-based and enables a natural
coding style where objects that are to be relocated can be declared as
local variables of object type.
In N4158, a programmer can avoid heap allocation for objects that are to be relocated by constructing them into a stack buffer but must then ensure that if the relocation actually occurs, the object is not thereafter accessed, and that if the relocation does not occur, the object’s destructor is eventually called to release any resources owned by the object. In our approach, by declaring the object as an automatic variable, the necessary guarantees are provided by the compiler. Our approach is therefore safer than N4158 because, by making most such accesses ill formed, it prevents accidental access to objects that might have been relocated from. (However, we are not proposing a borrow checker for C++; an lvalue reference to a local variable that is then relocated from can still be used to attempt to perform an access of that variable, resulting in undefined behavior.)
The N4158 approach encourages the programmer to pass the source
object by pointer until the point at which the relocation will actually
occur; doing so avoids unnecessary intermediate relocations, but the
intent to relocate cannot be perfectly forwarded when the
source pointer is passed, since it will appear to a factory function as
just a pointer, and the factory function will pass that pointer to a
constructor rather than calling
uninitialized_destructive_move
.
In our approach, rlvalues can be perfectly forwarded by functions that
have a forwarding reference parameter spelled
T~
, and no additional machinery
is required.
The
uninitialized_destructive_move
function proposed by N4158 could be implemented as follows under our
proposal:
template <class T>
void uninitialized_destructive_move(T* from, T* to) {
    ::new (static_cast<void*>(to)) T(static_cast<T~>(*from));
}
Note that customization of the functionality of the implementation
shown above would be accomplished by customizing the relocation
constructor of T
, not by
declaring an overload. Also note that because our proposal specifies a
move-and-destroy fallback behavior for defaulted relocation constructors, we
need not explicitly specify such fallback behavior for the std::uninitialized_destructive_move
function template.
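For comparison, the move-and-destroy fallback itself can be written today with standard facilities only. This is a minimal sketch, assuming C++17; the name `uninitialized_destructive_move_fallback` is hypothetical, and the function is not the N4158 customization point, only an illustration of the two steps it falls back to.

```cpp
#include <cassert>
#include <memory>   // std::destroy_at
#include <new>      // placement new, std::launder
#include <string>
#include <utility>  // std::move

// Move-constructs *to from *from, then ends the lifetime of *from.
// `to` must point to suitably aligned, uninitialized storage.
template <class T>
void uninitialized_destructive_move_fallback(T* from, T* to) {
    ::new (static_cast<void*>(to)) T(std::move(*from));  // build destination
    std::destroy_at(from);                                // destroy source
}

// Relocates a string between two raw buffers and returns its final value.
std::string relocate_string_demo() {
    alignas(std::string) unsigned char src_buf[sizeof(std::string)];
    alignas(std::string) unsigned char dst_buf[sizeof(std::string)];

    std::string* src =
        ::new (static_cast<void*>(src_buf)) std::string("relocated value");
    uninitialized_destructive_move_fallback(
        src, reinterpret_cast<std::string*>(dst_buf));

    // The source buffer no longer holds an object; only dst is alive.
    std::string* dst = std::launder(reinterpret_cast<std::string*>(dst_buf));
    std::string result = *dst;
    std::destroy_at(dst);
    return result;
}
```

Because the objects live in raw buffers, nothing destroys the source a second time, which is exactly the guarantee a pointer-based facility cannot give for an automatic variable.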
[P0023R0] uses the syntax
new (dest) >>T(*src)
to
construct a T
object at
dest
having the value held by
*src
. The actual relocation is
performed by a function called a relocator, introduced in the
scope of T
by a declarator of
the form >>T(T& src)
.
Despite looking very different from N4158, P0023 has similar
limitations; it cannot be used to safely relocate objects with automatic
storage duration, does not prevent use-after-relocation, and does not
provide a facility for perfectly forwarding the intent to evaluate
new (dest) >>T(*src)
instead of new (dest) T(src)
,
where src
is an argument of
pointer type.
[P1144R7] and [P2786R0] are similar proposals.
[...] std::relocate and std::relocate_at, that take a pair of pointers to T and either perform trivial relocation (when T is trivially relocatable) or destroy the source object after move-constructing the destination object. Such library facilities can be used to speed up operations that are currently expressed exclusively in terms of move plus destroy, e.g., the operation of relocating elements of std::vector<T> when the vector’s capacity is increased.
The definition of the category of implicitly trivially relocatable types differs slightly between P1144R7 and P2786R0. We do not express an opinion on which definition should be chosen; debate in EWG should resolve this question, and the authors of P1144R7 and P2786R0 are well positioned to argue their respective cases. The same is true for the syntax and semantics of explicitly declaring a class type to be trivially relocatable, which also differ between P1144R7 and P2786R0. Our proposal can build upon the trivial relocatability machinery of either P1144R7 or P2786R0. (Clearly, a class with a user-provided relocation constructor, as described in Part II, will not be trivially relocatable, and a diagnostic should be required if the programmer attempts to declare the class to be trivially relocatable.)
Because P1144R7 and P2786R0 both employ pointer-based approaches to
performing relocation, they suffer from the same limitations as N4158
and P0023 with respect to automatic variables. Nevertheless, the use of
such pointer-based interfaces for relocating objects of dynamic storage
duration is compatible with our proposal. For example, if P1144R7 is
accepted, our proposal will be to modify the specification of
std::relocate_at
so that when
T
is not trivially relocatable
but has a usable relocation constructor, that constructor will be called
in preference to performing a move-and-destroy operation.
[P1029R3] proposed a pure core language
extension to define a move constructor as performing a bitwise copy from
source to destination, followed by resetting the source object by
copying into it the bit pattern that would be produced by its default
constructor (which is required to be
constexpr
).
P1029R3 was made deliberately minimal to have the best possible chance of being adopted into C++23, and its author is no longer pursuing it. In particular, P1029R3 offers no form of nontrivial relocation.
Our proposal offers the semantics of a P1029R3-style relocation:
template <class T, int = (T(), 0)>
void P1029R3_relocate(T* from, T* to)
requires std::is_trivially_relocatable_v<T> {
    static constexpr T zero;
    std::memcpy(to, from, sizeof(T));
    std::memcpy(from, &zero, sizeof(T));
}
x.~T(), where x is an automatic variable of type T, can also disengage
__x~. This rule would make naming x after this line ill formed, unless
and until a placement new expression is used to recreate x. This rule
would also break
some existing code but potentially increase safety by preventing access
to a variable whose lifetime has ended. Specifying this rule would be
painful: We would need to define the class of expressions we consider to
be placement new expressions for the purposes of this rule, and we would
also need to introduce another set of control flow rules. (The control
flow rules for Part I assume that owning references can be disengaged
along some paths of control flow; here, we would need rules that account
for owning references being able to become re-engaged.)
reloc to be applied to function parameters.
We considered three possible approaches to perfectly forwarding owning references. We propose the first approach below but are open to polling to determine the best choice.
T~ syntax
The approach we propose in this paper is that a function parameter
whose declared type is T~, where T is the name of a template parameter
of the function, is a forwarding reference. The main advantage of this
syntax is that it interacts with the reference collapsing rules in the
same way as the current forwarding reference syntax, T&&. Like T&&, the
syntax T~ requires only the addition of a special template argument
deduction rule to ensure that T is deduced as a type that will give the
appropriate reference type after the collapsing rules are applied to
T~.
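For comparison, the existing T&& mechanism that the T~ syntax would parallel can be observed in current C++. The sketch below uses only standard facilities; add_rref, deduced_as_lvalue_ref, and demo_deduction are hypothetical names chosen for illustration, and the analogous T~ rules are not expressible in any compiler today.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Reference collapsing in current C++: an lvalue reference always wins.
template <class T>
using add_rref = T&&;
static_assert(std::is_same_v<add_rref<int&>, int&>);
static_assert(std::is_same_v<add_rref<int&&>, int&&>);
static_assert(std::is_same_v<add_rref<int>, int&&>);

// Forwarding reference: the special deduction rule deduces T as U& for an
// lvalue argument of type U and as U for an rvalue argument.
template <class T>
constexpr bool deduced_as_lvalue_ref(T&&) {
    return std::is_lvalue_reference_v<T>;
}

inline bool demo_deduction() {
    int x = 0;
    return deduced_as_lvalue_ref(x)               // T = int&
        && !deduced_as_lvalue_ref(std::move(x))   // T = int
        && !deduced_as_lvalue_ref(42);            // T = int
}
```

The deduction rule chooses T, and collapsing then yields the parameter type; the proposal describes T~ as needing only one more deduction rule of this kind.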
The T~ syntax has two disadvantages. The first is that all function
templates that currently perform perfect forwarding using T&&,
including Standard Library function templates, would need to be updated
to accept T~; otherwise, they would forward rlvalues as xvalues, not as
rlvalues. The second is that when a programmer wants to write a
function template that accepts rlvalues, not glvalues, of any type and
deduces that type, a constraint must be introduced into the
declaration, as we have done in our proposed declaration of
std::disengage.
This annoyance very rarely arises in the context of T&& forwarding
references, because few situations arise where a function template must
accept only rvalues but doesn’t care about the types of those rvalues.
We anticipate that this annoyance will occur much more frequently if
the T~ syntax for forwarding references is adopted.
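The corresponding constraint for T&& in current C++ gives a sense of what such a declaration looks like. The sketch below is an analogy only: RvalueOnlySink and accepts_only_rvalues are hypothetical names, and the constraint in the proposed declaration of std::disengage may be written differently.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical function object whose call operator uses a forwarding
// reference but is constrained to reject lvalue arguments.
struct RvalueOnlySink {
    template <class T,
              std::enable_if_t<!std::is_lvalue_reference_v<T>, int> = 0>
    void operator()(T&&) const {}
};

// Reports whether the constraint admits rvalues and rejects lvalues.
constexpr bool accepts_only_rvalues() {
    constexpr bool prvalue_ok = std::is_invocable_v<RvalueOnlySink, int>;
    constexpr bool xvalue_ok = std::is_invocable_v<RvalueOnlySink, int&&>;
    constexpr bool lvalue_ok = std::is_invocable_v<RvalueOnlySink, int&>;
    return prvalue_ok && xvalue_ok && !lvalue_ok;
}
static_assert(accepts_only_rvalues());
```

Without the enable_if_t constraint, the same declaration would also accept lvalues, which is the behavior a T~ forwarding reference would have by default.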
T&& syntax
To enable all function templates that currently accept forwarding
references to perfectly forward rlvalues without any changes to their
signatures, EWG could adopt an approach in which T&& gains the ability
to perfectly forward rlvalues. However, all such approaches known to
the authors have considerable disadvantages. Furthermore, adopting such
an approach will not enable such function templates to automatically
begin supporting rlvalue forwarding; while the signatures of such
functions would not need to change, the function implementations could
not effect rlvalue forwarding using the syntax std::forward<T>(t),
since such a function call would never be able to disengage t and
transfer ownership to the result of the function call.
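The limitation can be seen in current C++, where std::forward<T>(t) is only a cast and cannot end the lifetime of the object to which t refers. In the sketch below, consume, forward_then_reuse, and demo_forward_keeps_source_alive are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Takes its argument by value, so an rvalue argument is moved from.
inline std::size_t consume(std::string s) { return s.size(); }

// Forwards t, then shows that t can still be named and assigned to:
// the forwarding call cast t but did not end the referent's lifetime.
template <class T>
std::size_t forward_then_reuse(T&& t, bool& reusable) {
    std::size_t n = consume(std::forward<T>(t));
    t = "reused";  // the moved-from object is still alive
    reusable = (t == "reused");
    return n;
}

inline bool demo_forward_keeps_source_alive() {
    std::string s = "hello";
    bool reusable = false;
    std::size_t n = forward_then_reuse(std::move(s), reusable);
    return n == 5 && reusable && s == "reused";
}
```

Because the call expression std::forward<T>(t) is an ordinary function call, the variable t remains engaged afterward, which is why a library function alone cannot provide rlvalue forwarding.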
One possible approach (subapproach 1) that would allow the T&& syntax
to perfectly forward rlvalues is to specify that T~&& collapses to T~,
not T&&. Unfortunately, this rule violates the principle of lesser
privilege discussed in the Summary of Part I. We do not fully
understand the practical implications of such a counterintuitive
reference collapsing rule. Standardizing this rule might not be
catastrophic for the safety of the language, because a variable whose
type is spelled T&& and turns out to be an owning reference is unlikely
to be destroyed unintentionally; the reloc operator must be used to
transfer ownership to another owning reference. However, having T~&&
collapse to T~ would interfere with the declarations of the std::get
function templates for std::tuple. When std::get<T~> is called on an
rvalue of type std::tuple, one of whose element types is T~, the
declared return type is U&&, where U is T~. If T~&& is T~, then the
result of the call is an rlvalue, which it should not be, since the
only way to return an rlvalue would be to leave the tuple in a
partially relocated state. This issue with std::get was not immediately
obvious to the authors, and other unanticipated issues are likely if
this reference collapsing rule is adopted.
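The way the declared return type U&& of std::get interacts with reference collapsing can be checked in current C++ for tuples whose elements are ordinary references. The sketch below uses only standard facilities; get_return_types_collapse is a hypothetical name, and the T~ case discussed above cannot be compiled today.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// For an rvalue tuple, std::get is declared to return U&&, where U is the
// element type; the actual result type follows from reference collapsing.
constexpr bool get_return_types_collapse() {
    using LTuple = std::tuple<int&>;   // U = int&,  U&& collapses to int&
    using RTuple = std::tuple<int&&>;  // U = int&&, U&& collapses to int&&
    using VTuple = std::tuple<int>;    // U = int,   U&& is int&&
    using L = decltype(std::get<0>(std::declval<LTuple&&>()));
    using R = decltype(std::get<0>(std::declval<RTuple&&>()));
    using V = decltype(std::get<0>(std::declval<VTuple&&>()));
    return std::is_same_v<L, int&> && std::is_same_v<R, int&&> &&
           std::is_same_v<V, int&&>;
}
static_assert(get_return_types_collapse());
```

By the same mechanism, a rule collapsing T~&& to T~ would make the result type for a T~ element an owning reference without any change to the declaration of std::get.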
Subapproach 2, a variant of subapproach 1, is to retain the natural
reference collapsing rules in which T~&& collapses to T&& but add a
special exemption solely for forwarding references: When U&& is a
forwarding reference and U is T~, the result is T~, while in all other
contexts, the result would be T&&. The main disadvantage of this
approach is that the meaning of code would depend too much on whether a
reference is a forwarding reference.
Subapproach 3, a variant of subapproach 2, is to make T~&& collapse to
T~ in forwarding reference context and be ill formed in every other
context. Such an approach avoids the main disadvantage of subapproach 1
but suffers from the same issue as subapproach 1 concerning std::get
and forces writers of generic code to guard against the creation of the
T~&& type.
Another subapproach (subapproach 4) for avoiding counterintuitive
reference collapsing outside of forwarding references is to specify
that T~&& is neither T~ nor T&& but is adjusted to T~ in a function
declaration that uses a forwarding reference. In all other contexts,
T~&& would be an abominable type, and attempting to declare a variable
or evaluate an expression whose type would be T~&& would be ill formed.
This subapproach suffers from the same issue with std::get as
subapproaches 1 and 3 but might avoid the disadvantages of subapproach
3 in other contexts. Unfortunately, if WG21 adopts this subapproach and
it later turns out to be untenable, removing the abominable types from
the language and specifying a different behavior will be difficult.
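Current C++ already contains types with similar restrictions, namely cv-qualified function types, which are informally called abominable function types. The sketch below is offered only as an analogy for how such a type can be named yet not used to form references or pointers; Abominable and abominable_type_is_restricted are hypothetical names, and nothing here models T~&& itself.

```cpp
#include <cassert>
#include <type_traits>

// A cv-qualified function type: it can be named, but no object, reference,
// or pointer of this type can be declared.
using Abominable = void() const;

constexpr bool abominable_type_is_restricted() {
    // The type is not referenceable, so the transformation traits return
    // it unchanged instead of forming Abominable& or Abominable*.
    using Ref = std::add_lvalue_reference_t<Abominable>;
    using Ptr = std::add_pointer_t<Abominable>;
    return std::is_function_v<Abominable> &&
           std::is_same_v<Ref, Abominable> &&
           std::is_same_v<Ptr, Abominable>;
}
static_assert(abominable_type_is_restricted());
```

Generic code must special-case such types, which suggests the kind of burden that an abominable T~&& would place on library authors.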
We could invent a new syntax for forwarding references that would
perfectly forward rlvalues, such as T&&& or T~~. This approach would
avoid all the disadvantages associated with the T&& syntax and the
second disadvantage associated with the T~ syntax but suffers from a
severe disadvantage of its own: It removes design space for more
general improvements to perfect forwarding, such as a syntax that would
enable forwarding of overload sets or braced-init-lists.
We propose the T~ syntax because, although it is imperfect, its
problems are less severe than those introduced by all known
alternatives. The problems with the T~ syntax parallel the problems
with the existing T&& syntax in current C++, which have proven to be
tractable.
A possible alternative name for rlvalue is dvalue, the d of which connotes permission to destroy the referent. The name dvalue is analogous to xvalue, whereas using the term rlvalue is more comparable to referring to xvalues as mvalues, i.e., connoting the likely (but not certain) fate of the object rather than the permission granted to the holder of the reference.