1. Changelog
- R2: Added discussion of return ++x.
- R1: Added discussion of return x += y.
2. Background
Each version of C++ has improved the efficiency of returning objects by value. By the middle of the last decade, copy elision was reliable (if not technically guaranteed) in situations like this:
    Widget one() {
        return Widget();  // copy elision
    }
    Widget two() {
        Widget result;
        return result;  // copy elision
    }
In C++11, a completely new feature was added: a change to overload resolution which I will call implicit move. Even when copy elision is impossible, the compiler is sometimes required to implicitly move the return statement’s operand into the result object:
    std::shared_ptr<Base> three() {
        std::shared_ptr<Base> result;
        return result;  // copy elision
    }
    std::shared_ptr<Base> four() {
        std::shared_ptr<Derived> result;
        return result;  // no copy elision, but implicitly moved (not copied)
    }
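The difference between copying and implicitly moving can be made observable with an instrumented type. The following is a minimal sketch rather than code from any real library: Tracker, pass_through, and check_implicit_move are hypothetical names introduced only for illustration. It returns a by-value parameter, a case where copy elision is never permitted, so whatever construction happens is chosen by overload resolution alone.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// A by-value parameter is never eligible for copy elision on return,
// so any construction observed here comes from overload resolution.
Tracker pass_through(Tracker t) {
    return t;  // resolved as if `t` were an rvalue: Tracker(Tracker&&)
}

// Resets the counters, runs the experiment, and checks the outcome.
void check_implicit_move() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker source;
    Tracker result = pass_through(std::move(source));
    (void)result;
    assert(Tracker::copies == 0);  // nothing was copied
    assert(Tracker::moves >= 2);   // parameter init + implicit move on return
}
```

Calling check_implicit_move() from a test driver should pass on any compiler that implements the C++11 rule; the lower bound on moves allows for a pre-C++17 build that does not elide the final temporary.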
The wording for this optimization was amended by [CWG1579]. The current wording in [class.copy.elision]/3 says:
In the following copy-initialization contexts, a move operation might be used instead of a copy operation:
- If the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, or
- if the operand of a throw-expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) whose scope does not extend beyond the end of the innermost enclosing try-block (if there is one),
overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue.
The highlighted phrases above indicate places where the wording diverges from a naïve programmer’s intuition. Consider the following examples...
2.1. Throwing is pessimized
Throwing is pessimized because of the highlighted word "function" (in "other than a function or catch-clause parameter"):
    void five() {
        Widget w;
        throw w;  // non-guaranteed copy elision, but implicitly moved (never copied)
    }
    Widget six(Widget w) {
        return w;  // no copy elision, but implicitly moved (never copied)
    }
    void seven(Widget w) {
        throw w;  // no copy elision, and no implicit move (the object is copied)
    }
Note: The comment in seven matches the current Standard wording, and matches the behavior of GCC. Most compilers (Clang 4.0.1+, MSVC 2015+, ICC 16.0.3+) already do this implicit move.
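Until every compiler agrees, code that must not copy when throwing a parameter can spell the move explicitly. The sketch below assumes a hypothetical instrumented Widget (the paper leaves Widget undefined) and a hypothetical function seven_explicit; it is an illustration of the workaround, not a recommendation from the Standard.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for Widget that counts its copies and moves.
struct Widget {
    static int copies;
    static int moves;
    Widget() = default;
    Widget(const Widget&) { ++copies; }
    Widget(Widget&&) noexcept { ++moves; }
};
int Widget::copies = 0;
int Widget::moves = 0;

// Same shape as `seven`, but the operand is explicitly an rvalue, so the
// exception object is move-constructed under every implicit-move rule.
void seven_explicit(Widget w) {
    throw std::move(w);
}

// Throws, catches by reference (no further copy), and checks the counters.
void check_throw_moves() {
    Widget::copies = 0;
    Widget::moves = 0;
    try {
        seven_explicit(Widget());
    } catch (const Widget&) {
        // Caught by reference: the exception object itself is inspected.
    }
    assert(Widget::copies == 0);
    assert(Widget::moves >= 1);
}
```

The explicit std::move becomes unnecessary if the word "function" is struck from the wording as proposed.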
2.2. Non-constructor conversion is pessimized
Non-constructor conversion is pessimized because of the highlighted word "constructor":
    struct From {
        From(Widget const &);
        From(Widget &&);
    };
    struct To {
        operator Widget() const &;
        operator Widget() &&;
    };
    From eight() {
        Widget w;
        return w;  // no copy elision, but implicitly moved (never copied)
    }
    Widget nine() {
        To t;
        return t;  // no copy elision, and no implicit move (the object is copied)
    }
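Which conversion operator is selected can be checked directly by counting calls. This is a sketch under stated assumptions: Widget is an empty placeholder, the two counters on To are hypothetical additions, and nine_explicit is a hypothetical variant of nine that names the rvalue explicitly.

```cpp
#include <cassert>
#include <utility>

struct Widget {};  // placeholder; the paper leaves Widget undefined

// Same shape as the paper's `To`, with hypothetical call counters added.
struct To {
    static int lvalue_conversions;
    static int rvalue_conversions;
    operator Widget() const& { ++lvalue_conversions; return Widget(); }
    operator Widget() && { ++rvalue_conversions; return Widget(); }
};
int To::lvalue_conversions = 0;
int To::rvalue_conversions = 0;

// With an explicit std::move the operand is an xvalue of type To, so the
// &&-qualified conversion operator is the better match.
Widget nine_explicit() {
    To t;
    return std::move(t);
}

void check_rvalue_conversion() {
    To::lvalue_conversions = 0;
    To::rvalue_conversions = 0;
    Widget w = nine_explicit();
    (void)w;
    assert(To::rvalue_conversions == 1);
    assert(To::lvalue_conversions == 0);
}
```

Without the std::move, the current wording selects the const&-qualified operator, because the rule speaks only of selecting a constructor.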
2.3. By-value sinks are pessimized
By-value sinks are pessimized because of the highlighted phrase "rvalue reference":
    struct Fish {
        Fish(Widget const &);
        Fish(Widget &&);
    };
    struct Fowl {
        Fowl(Widget);
    };
    Fish ten() {
        Widget w;
        return w;  // no copy elision, but implicitly moved (never copied)
    }
    Fowl eleven() {
        Widget w;
        return w;  // no copy elision, and no implicit move (the Widget object is copied)
    }
Note: The comment in eleven matches the current Standard wording, and matches the behavior of Clang, ICC, and MSVC. One compiler (GCC 5.1+) already does this implicit move.
2.4. Slicing is pessimized
Slicing is pessimized because of the highlighted phrase "the object’s type":
    std::shared_ptr<Base> twelve() {
        std::shared_ptr<Derived> result;
        return result;  // no copy elision, but implicitly moved (never copied)
    }
    Base thirteen() {
        Derived result;
        return result;  // no copy elision, and no implicit move (the object is copied)
    }
Note: The comment in thirteen matches the current Standard wording, and matches the behavior of Clang and MSVC. Some compilers (GCC 8.1+, ICC 18.0.0+) already do this implicit move.
We propose to remove all four of these unnecessary limitations.
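Because the compilers disagree on these four cases, it can be useful to probe what a given toolchain actually does. The following sketch is one way to do that; everything in namespace probe (the counters, Target, moved_by, report, and the probe functions) is hypothetical scaffolding, and the output depends on the compiler and language mode, so no particular result is claimed here.

```cpp
#include <cassert>
#include <cstdio>

namespace probe {

int copies = 0;  // copy constructions / lvalue conversions observed
int moves = 0;   // move constructions / rvalue conversions observed

struct Widget {
    Widget() = default;
    Widget(const Widget&) { ++copies; }
    Widget(Widget&&) noexcept { ++moves; }
};

// 2.1: throwing a function parameter.
void throw_parameter(Widget w) {
    copies = moves = 0;  // discount the construction of the parameter itself
    throw w;
}
void throw_and_catch() {
    try { throw_parameter(Widget()); } catch (const Widget&) {}
}

// 2.2: conversion operator rather than converting constructor.
struct Target {};
struct To {
    operator Target() const& { ++copies; return Target(); }
    operator Target() && { ++moves; return Target(); }
};
Target convert_local() { To t; return t; }

// 2.3: by-value sink.
struct Fowl { Fowl(Widget) {} };
Fowl sink_local() { Widget w; return w; }

// 2.4: slicing.
struct Base {
    Base() = default;
    Base(const Base&) { ++copies; }
    Base(Base&&) noexcept { ++moves; }
};
struct Derived : Base {};
Base slice_local() { Derived d; return d; }

// Runs one probe and reports whether it moved (true) or copied (false).
template <class F>
bool moved_by(F run) {
    copies = moves = 0;
    run();
    assert(copies + moves == 1);  // exactly one copy-like or move-like step
    return moves == 1;
}

void report() {
    std::printf("2.1 throw parameter:     %s\n", moved_by(throw_and_catch) ? "move" : "copy");
    std::printf("2.2 conversion operator: %s\n", moved_by(convert_local) ? "move" : "copy");
    std::printf("2.3 by-value sink:       %s\n", moved_by(sink_local) ? "move" : "copy");
    std::printf("2.4 slicing:             %s\n", moved_by(slice_local) ? "move" : "copy");
}

}  // namespace probe
```

Calling probe::report() from main prints one line per limitation; under the proposed wording all four lines would read "move".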
3. Proposed wording relative to N4762
Modify [class.copy.elision]/3 as follows (deleted text is marked [-like this-]):
In the following copy-initialization contexts, a move operation might be used instead of a copy operation:
- If the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, or
- if the operand of a throw-expression is the name of a non-volatile automatic object (other than a [-function or-] catch-clause parameter) whose scope does not extend beyond the end of the innermost enclosing try-block (if there is one),
overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, [-or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified),-] overload resolution is performed again, considering the object as an lvalue. [Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note]
Note: I believe that the two instances of the word "constructor" in the quoted note remain correct. They refer to the constructor selected to initialize the result object, as the very last step of the conversion sequence. This proposed change merely permits the conversion sequence to be longer than a single step; for example, it might involve a derived-to-base conversion followed by a move-constructor, or a user-defined conversion operator followed by a move-constructor. In either case, as far as the quoted note is concerned, that ultimate move-constructor is the "constructor to be called," and indeed it must be accessible even if elision is performed.
4. Proposed wording relative to P0527r1
David Stone’s [P0527] "Implicitly move from rvalue references in return statements" proposes to alter the current rules "references are never implicitly moved-from" and "catch-clause parameters are never implicitly moved-from." It accomplishes this by significantly refactoring clause [class.copy.elision]/3.
In the case that [P0527]'s changes are adopted into C++2a, we propose to modify the new [class.copy.elision]/3 as follows (deleted text is marked [-like this-]):
A movable entity is a non-volatile object or an rvalue reference to a non-volatile type, in either case with automatic storage duration. [-The underlying type of a movable entity is the type of the object or the referenced type, respectively.-] In the following copy-initialization contexts, a move operation might be used instead of a copy operation:
- If the expression in a return statement is a (possibly parenthesized) id-expression that names a movable entity declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, or
- if the operand of a throw-expression is a (possibly parenthesized) id-expression that names a movable entity whose scope does not extend beyond the end of the innermost enclosing try-block (if there is one),
overload resolution to select the constructor for the copy is first performed as if the entity were designated by an rvalue. If the first overload resolution fails or was not performed, [-or if the type of the first parameter of the selected constructor is not an rvalue reference to the (possibly cv-qualified) underlying type of the movable entity,-] overload resolution is performed again, considering the entity as an lvalue. [Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note]
5. Implementation experience
This feature has effectively already been implemented in Clang since February 2018; see [D43322].
Under the diagnostic option -Wreturn-std-move (which is enabled as part of -Wmove, -Wmost, and -Wall), the compiler performs overload resolution according to both rules: the standard rule and also a rule similar to the one proposed in this proposal. If the two resolutions produce different results, then Clang emits a warning diagnostic explaining that the return value will not be implicitly moved and suggesting that the programmer add an explicit std::move.
However, Clang does not diagnose the examples from §2.3 By-value sinks.
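To make the diagnostic's suggestion concrete, here is a sketch of the slicing case from §2.4 before and after the rewrite. The counters on Base and the names thirteen_explicit and check_explicit_move are hypothetical, added only so that the effect can be checked; the diagnostic text itself is not reproduced here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented base class: counts copies and moves.
struct Base {
    static int copies;
    static int moves;
    Base() = default;
    Base(const Base&) { ++copies; }
    Base(Base&&) noexcept { ++moves; }
};
int Base::copies = 0;
int Base::moves = 0;
struct Derived : Base {};

// The shape of code the two-rule comparison flags: Base(Base&&) takes an
// rvalue reference to Base, not to Derived, so the C++17 rule falls back
// to treating `result` as an lvalue.
Base thirteen() {
    Derived result;
    return result;
}

// The suggested rewrite: the operand is explicitly an rvalue, so
// Base(Base&&) is selected under either rule.
Base thirteen_explicit() {
    Derived result;
    return std::move(result);
}

void check_explicit_move() {
    Base::copies = 0;
    Base::moves = 0;
    Base b = thirteen_explicit();
    (void)b;
    assert(Base::copies == 0);
    assert(Base::moves >= 1);
}
```

Whether thirteen copies or moves depends on the compiler and language mode, which is why only the explicit form is checked.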
5.1. Plenitude of true positives
These warning diagnostics have proven helpful on real code. Many instances have been reported of code that is currently accidentally pessimized, and which would become optimized (with no loss of correctness) if this proposal were adopted:
- [SG14]: a clever trick to reduce code duplication by using conversion operators, rather than converting constructors, turned out to cause unnecessary copying in a common use-case.
- [Chromium]: a non-standard container library used iterator::operator const_iterator() && instead of const_iterator::const_iterator(iterator&&). (The actual committed diff is here.)
- [LibreOffice]: "An explicit std::move would be needed in the return statements, as there’s a conversion from VclPtrInstance to base class VclPtr involved."
However, we must note that about half of the true positives from the diagnostic are on code like the following example, which is not affected by this proposal:
    std::string fourteen(std::string&& s) {
        s += "foo";
        return s;  // no copy elision, and no implicit move (the object is copied)
    }
See [Khronos], [Folly], and three of the four diffs in [Chromium]. [AWS] is a particularly egregious variation. (The committed diff is here.)
    std::string fifteen() {
        std::string&& s = "hello world";
        return s;  // no copy elision, and no implicit move (the object is copied)
    }
Some number of programmers certainly expect a move here, and in fact [P0527] proposes to implicitly move in both of these cases. This paper does not conflict with [P0527], and we provide an alternative wording for the case that [P0527] is adopted.
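For the rvalue-reference case, the explicit form is currently the only portable way to avoid the copy. The sketch below mirrors fourteen with a hypothetical CountedString wrapper (not a real library type) so that the absence of a copy can be checked; fourteen_explicit and check_rvalue_reference_parameter are likewise illustrative names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper around std::string that counts copy constructions.
struct CountedString {
    static int copies;
    std::string text;
    explicit CountedString(std::string t) : text(std::move(t)) {}
    CountedString(const CountedString& rhs) : text(rhs.text) { ++copies; }
    CountedString(CountedString&& rhs) noexcept : text(std::move(rhs.text)) {}
};
int CountedString::copies = 0;

// Same shape as `fourteen`, with the move spelled out: `s` names an
// rvalue reference, which the current rules never implicitly move from.
CountedString fourteen_explicit(CountedString&& s) {
    s.text += "foo";
    return std::move(s);
}

void check_rvalue_reference_parameter() {
    CountedString::copies = 0;
    CountedString result = fourteen_explicit(CountedString("bar"));
    assert(result.text == "barfoo");
    assert(CountedString::copies == 0);
}
```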
5.2. Lack of false positives
In eleven months we have received a single "false positive" report ([Mozilla]), which complained that the move-constructor suggested
by Clang was not significantly more efficient than the actually selected copy-constructor. The programmer preferred not
to add the suggested std::move because the code ugliness was not worth the minor performance gain.
This proposal would give Mozilla that minor performance gain without the ugliness — the best of both worlds!
We have never received any report that Clang’s suggested move would have been incorrect.
6. Further proposal to handle assignment operators specially
Besides the cases handled by this proposal, and the cases handled by David Stone’s [P0527], there is one more extremely frequent case where a copy is done instead of an implicit move or copy elision.
    std::string sixteen(std::string lhs, const std::string& rhs) {
        return lhs += rhs;  // no copy elision, and no implicit move (the object is copied)
    }
    std::string seventeen(const std::string& lhs, const std::string& rhs) {
        std::string result = lhs;
        return result += rhs;  // no copy elision, and no implicit move (the object is copied)
    }
For a real-world example of this kind of code, see GNU libstdc++'s [PR85671], where even a standard library implementor fell into the trap of writing
    path operator/(const path& lhs, const path& rhs) {
        path result(lhs);
        return result /= rhs;  // no copy elision, and no implicit move (the object is copied)
    }
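The extra copy, and the usual way to avoid it today, can be measured with a small experiment. This is a sketch with hypothetical names throughout (CountedString, concat_pessimized, concat_split, copies_made); it stands in for std::string or path so that copies can be counted. Splitting the compound assignment from the return statement makes the operand a plain name again, which is eligible for copy elision or implicit move.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical string-like type that counts copy constructions.
struct CountedString {
    static int copies;
    std::string text;
    explicit CountedString(std::string t) : text(std::move(t)) {}
    CountedString(const CountedString& rhs) : text(rhs.text) { ++copies; }
    CountedString(CountedString&& rhs) noexcept : text(std::move(rhs.text)) {}
    CountedString& operator+=(const CountedString& rhs) {
        text += rhs.text;
        return *this;
    }
};
int CountedString::copies = 0;

// Same shape as `seventeen`: the operand of return is an lvalue
// expression of reference type, not the name of a local variable.
CountedString concat_pessimized(const CountedString& lhs, const CountedString& rhs) {
    CountedString result = lhs;  // first copy (necessary)
    return result += rhs;        // second copy (avoidable)
}

// Workaround: return the local by name in a separate statement.
CountedString concat_split(const CountedString& lhs, const CountedString& rhs) {
    CountedString result = lhs;  // first copy (necessary)
    result += rhs;
    return result;               // copy elision, or else implicit move
}

// Counts the copy constructions performed by one call to `fn`.
int copies_made(CountedString (*fn)(const CountedString&, const CountedString&)) {
    CountedString::copies = 0;
    CountedString a("ab");
    CountedString b("cd");
    CountedString r = fn(a, b);
    assert(r.text == "abcd");
    return CountedString::copies;
}
```

Here copies_made(concat_pessimized) yields 2 and copies_made(concat_split) yields 1, since the return statement in the first form must copy-construct from an lvalue.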
We propose, in order to make simple code such as "return result /= rhs;" produce optimal codegen, rewording the copy elision rules along the following lines. Where existing text is replaced, the change is marked [-old-]{+new+}.

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the constructor selected for the copy/move operation and/or the destructor for the object have side effects. Each such case involves an expression, called the candidate expression, and a source object, called the copy elision candidate.

The copy elision candidate is computed from the candidate expression as follows:
- In a return statement with an expression, the candidate expression is the expression.
- In a throw-expression, the candidate expression is the operand of throw.
- If the candidate expression is the (possibly parenthesized) name of a non-volatile automatic object, then the copy elision candidate is that object.
- If the candidate expression is an assignment-expression, and the logical-or-expression on the left-hand side of the assignment-operator is the (possibly parenthesized) name of a non-volatile automatic object, and the type of the assignment-expression is a non-cv-qualified lvalue reference to the type of the automatic object, then the copy elision candidate is the automatic object. [Note: This happens regardless of the actual behavior of the assignment operator selected by overload resolution. The implementation essentially assumes that the return value of any (possibly compound) assignment operator is a reference to its left-hand operand. —end note]
- If the candidate expression is a unary-expression involving the operator ++ or --, and the operand cast-expression is the (possibly parenthesized) name of a non-volatile automatic object, and the type of the unary-expression is a non-cv-qualified lvalue reference to the type of the automatic object, then the copy elision candidate is the automatic object.

The elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

When copy elision occurs, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object. If the first parameter of the selected constructor is an rvalue reference to the object’s type, the destruction of that object occurs when the target would have been destroyed; otherwise, the destruction occurs at the later of the times when the two objects would have been destroyed without the optimization.

- in a return statement in a function with a class return type, when [-the expression is the name of-]{+the copy elision candidate is+} a non-volatile automatic object (other than a function parameter or a variable introduced by the exception-declaration of a handler (13.3)) with the same type (ignoring cv-qualification) as the function return type, the copy/move operation can be omitted by constructing [-the automatic object-]{+the copy elision candidate object+} directly into the function call’s return object
- in a throw-expression, when the [-operand-]{+copy elision candidate+} is the name of a non-volatile automatic object (other than a function or catch-clause parameter) whose scope does not extend beyond the end of the innermost enclosing try-block (if there is one), the copy/move operation from [-the operand-]{+the copy elision candidate object+} to the exception object (13.1) can be omitted by constructing the automatic object directly into the exception object
This would be a novel special case; as the "Note" says, this would essentially permit the core language to assume that every overloaded assignment operator (simple or compound) which returns an lvalue reference at all returns an lvalue reference to *this. It would be possible for pathological code to observe the optimization happening:
    #include <cstdio>

    struct Observer {
        static int k;            // defined below; counts constructions by copy or move
        static Observer global;  // defined below, once Observer is complete
        int i;
        explicit Observer(int i) : i(i) {}
        Observer(const Observer& rhs) : i(++k) {
            std::printf("observed a copy from %d to %d\n", rhs.i, i);
        }
        Observer(Observer&& rhs) : i(++k) {
            std::printf("observed a move from %d to %d\n", rhs.i, i);
        }
        Observer& operator=(const Observer& rhs) {
            i = rhs.i + 1;
            std::printf("observed a copy-assign from %d to %d\n", rhs.i, i);
            return global;  // pathological! (not *this)
        }
    };
    int Observer::k = 0;
    Observer Observer::global{10};

    Observer foo() {
        Observer x{20};
        Observer y{30};
        return x = y;
    }

    int main() {
        Observer o = foo();
        std::printf("o.i is %d\n", o.i);
    }
In C++17, the above code has this behavior:
- observed a copy-assign from 30 to 31, then observed a copy from 10 to 1, then o.i is 1 (the behavior required by C++17, forbidden under the proposal)
Under the "further proposal" sketched above, the code would instead have one of the following behaviors:
- observed a copy-assign from 30 to 31, then observed a move from 10 to 1, then o.i is 1 (implicit move, permitted under the proposal)
- observed a copy-assign from 30 to 31, then o.i is 31 (copy elision, permitted and encouraged under the proposal)