Author: | H. Carter Edwards |
---|---|
Contact: | hcedwar@sandia.gov |
Author: | Christian Trott |
Contact: | crtrott@sandia.gov |
Author: | Hal Finkel |
Contact: | hfinkel@anl.gov |
Author: | Jim Reus |
Contact: | reus1@llnl.gov |
Author: | Robin Maffeo |
Contact: | robin.maffeo@amd.com |
Author: | Ben Sander |
Contact: | ben.sander@amd.com |
Number: | P0018 |
Version: | 1 |
Date: | 2015-10-23 |
URL: | https://github.com/kokkos/ISO-CPP-Papers/blob/master/P0018.rst |
WG21: | Evolution Working Group (EWG) |
WG21: | Concurrency and Parallelism Study Group (SG1) |
Lambda expressions declared within a non-static member function explicitly or implicitly capture the this pointer to access member variables of this. Both the capture-by-reference [&] and capture-by-value [=] capture-defaults implicitly capture the this pointer; therefore member variables are always accessed by reference via this. Thus the capture-default has no effect on the capture of this.

    #include <cassert>

    struct S {
      int x ;
      void f() {
        // The following lambda captures are currently identical
        auto a = [&]() { x = 42 ; }; // OK: transformed to (*this).x
        auto b = [=]() { x = 43 ; }; // OK: transformed to (*this).x
        a();
        assert( x == 42 );
        b();
        assert( x == 43 );
      }
    };

Truly capturing the *this object by value allows an implicitly declared closure to be copied before invoking the closure's function.
Asynchronous dispatch of closures is a cornerstone of parallelism and concurrency.
When a lambda is asynchronously dispatched from within a non-static member function, via std::async or another concurrency / parallelism dispatch mechanism, the *this object cannot be captured by value. Thus, when the future (or other handle) to the dispatched lambda outlives the originating class, the lambda's captured this pointer is invalid.

    #include <cassert>
    #include <future>

    class Work {
    private:
      int value ;
    public:
      Work() : value(42) {}
      std::future<int> spawn()
        { return std::async( [=]()->int{ return value ; }); }
    };

    std::future<int> foo()
    {
      Work tmp ;
      return tmp.spawn();
      // The closure associated with the returned future
      // has an implicit this pointer that is invalid.
    }

    int main()
    {
      std::future<int> f = foo();
      f.wait();
      // The following fails due to the
      // originating class having been destroyed
      assert( 42 == f.get() );
      return 0 ;
    }

Current and future hardware architectures specifically targeting parallelism and concurrency have heterogeneous memory systems, for example NUMA regions, attached accelerator memory, and processing-in-memory (PIM) stacks. In these architectures, copying the closure to the data upon which it operates often yields significantly better performance than moving the data to and from the closure.
For example, parallel execution of a closure on large data spanning NUMA regions will be more performant if a copy of that closure residing in the same NUMA region acts upon that data. If a true (self-contained) capture-by-value lambda closure were given to a parallel dispatch, such as in the parallelism technical specification, then the library could create copies of that closure within each NUMA region to improve data locality for the parallel computation. For another example, a closure dispatched to an attached accelerator with separate memory must be copied to the accelerator's memory before execution can occur. Thus current and future architectures require the capability to copy closures to data.
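The closure-to-data idea can be sketched as follows. This is only an illustration under stated assumptions: `for_each_region` and `scale_all` are hypothetical names, not part of any library or of the parallelism technical specification, and the "regions" are modeled as contiguous slices of an index range that run sequentially. The point is that the helper may freely duplicate the closure only because the closure holds all of its state by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: makes one copy of the closure per "region" and
// applies each copy to that region's slice of the index range [0, n).
template <class Closure>
void for_each_region(std::size_t regions, std::size_t n, const Closure& closure) {
    assert(regions > 0);
    // One independent copy per region; a real runtime would place each
    // copy in memory local to the region it serves.
    std::vector<Closure> copies(regions, closure);
    const std::size_t chunk = (n + regions - 1) / regions;
    for (std::size_t r = 0; r < regions; ++r) {
        const std::size_t begin = std::min(n, r * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        for (std::size_t i = begin; i < end; ++i) copies[r](i);
    }
}

// Multiplies every element by 'factor'. The closure captures the factor
// and the data pointer by value, so every copy is self-contained.
std::vector<int> scale_all(std::vector<int> data, int factor) {
    int* out = data.data();
    for_each_region(4, data.size(), [out, factor](std::size_t i) {
        out[i] *= factor;
    });
    return data;
}
```

A closure that reached its state through a captured this pointer could be copied in the same way, but every copy would still refer back to the single originating object rather than to region-local state.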
A potential work-around for this deficiency is to explicitly capture a copy of the originating class.

    class Work {
    private:
      int value ;
    public:
      Work() : value(42) {}
      std::future<int> spawn()
        { return std::async( [=,tmp=*this]()->int{ return tmp.value ; }); }
    };

This work-around has two liabilities. First, the this pointer is also captured, which provides a significant opportunity to erroneously reference a this-> member instead of a tmp. member, as the closure can reach two distinct objects that have distinct members of the same name. Second, it is onerous and counter-productive to the introduction of asynchronously dispatched lambda expressions within existing code. Consider the case of replacing a for loop within a non-static member function with a parallel for each construct as in the parallelism technical specification.

    class Work {
    public:
      void do_something() const {
        // for ( int i = 0 ; i < N ; ++i )
        foreach( Parallel , 0 , N , [=,tmp=*this]( int i )
        {
          // A modestly long loop body where
          // every reference to a member must be modified
          // for qualification with 'tmp.'
          // Any mistaken omissions will silently fail
          // as references via 'this->'.
        } );
      }
    };

In this example every reference to a member in the pre-existing code must be modified to add the tmp. qualification. This onerous process must be repeated throughout an existing code base. A true lambda capture of *this would eliminate such an onerous and silent-error-prone process of injecting parallelism and concurrency into a large, existing code base.
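The silent failure mode of the work-around can be made concrete with a small sketch. The type `Counter` and the functions `make_reader_buggy`, `make_reader_fixed`, and `demo` are hypothetical names introduced only for illustration. Both closures use the `[=,tmp=*this]` form, but one of them omits the `tmp.` qualification and therefore still reads through the captured this pointer.

```cpp
#include <functional>
#include <utility>

struct Counter {
    int value = 1;

    // Work-around with a forgotten qualification: the closure holds a
    // copy named 'tmp', but [=] also captures the 'this' pointer.
    std::function<int()> make_reader_buggy() {
        return [=, tmp = *this]() {
            return value;      // reads this->value, not tmp.value
        };
    }

    // Work-around applied correctly: every member access is qualified.
    std::function<int()> make_reader_fixed() {
        return [=, tmp = *this]() {
            return tmp.value;  // reads the copy stored in the closure
        };
    }
};

// Returns {result of buggy closure, result of fixed closure} after the
// original object has been modified.
std::pair<int, int> demo() {
    Counter c;
    auto buggy = c.make_reader_buggy();
    auto fixed = c.make_reader_fixed();
    c.value = 2;               // modify the original after capture
    return {buggy(), fixed()};
}
```

Here `demo()` yields 2 from the unqualified closure and 1 from the qualified one. Both versions compile without error, which is why the omission is easy to miss; if the original object had already been destroyed, the unqualified access would instead go through an invalid pointer.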
As currently specified, integration of lambda and concurrency capabilities is perilous, as demonstrated by the previous Work example. A lambda generated within a non-static member function cannot be a true (self-contained) closure and therefore cannot reliably be used with an asynchronous dispatch.
Lambda capability is a significant boon to productivity, especially when parallel or concurrent closures can be defined with lambdas as opposed to manually generated functors. If the capability to capture *this by value is not enabled then the productivity benefits of lambdas cannot be fully realized in the parallelism and concurrency domain.
Lambda capture of *this by value within a non-static member function is as if:
Requires: The type of *this shall be copy constructible.
Requires: Lambda capture of *this by value cannot occur within a copy constructor or a function invoked by a copy constructor. Such a circumstance would result in infinite recursion of the copy constructor. Note that this requirement is also applicable to the onerous work-around of [tmp=*this].
The semantically consistent solution is for the capture-default [=] to capture *this by value for lambda expressions within a non-static member function. The capture-default [&] within a non-static member function conforms to the current capture specification for this.

    struct S {
      int x ;
      void f() {
        auto a = [&]() { x = 42 ; };         // OK: transformed to (*this).x
        auto b = [=]() mutable { x = 42 ; }; // Modifying copy of x
        auto c = [=]() { x = 42 ; };         // Error: captured copy of '*this'
                                             // and lambda function is 'const'
      }
    };

This solution corrects lambda capture semantics; however, it is likely to break existing code conforming to the C++11 standard that depends on the copying of the this pointer rather than the copying of the object to which the this pointer refers. As such, we currently consider this solution to be impractical and propose the following pragmatic solution.
Given that the semantically consistent preferred solution would break current standard behavior, a new capture mechanism is necessary to provide semantically consistent capture-by-value semantics for lambda expressions within non-static member functions.
Feature test macro: __cpp_lambda_capture_this_object_by_value
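A sketch of how code might select between the new capture and the work-around using a feature test macro. The first macro tested is the name proposed in this paper; the second, `__cpp_capture_star_this`, is the name that C++17 implementations define for this feature. `WORK_CAPTURE_STAR_THIS`, `reader`, and `read_after_destruction` are hypothetical names used only for this illustration.

```cpp
#include <functional>

// Either macro indicates that the [=,*this] capture is available.
#if defined(__cpp_lambda_capture_this_object_by_value) || \
    defined(__cpp_capture_star_this)
#  define WORK_CAPTURE_STAR_THIS 1
#else
#  define WORK_CAPTURE_STAR_THIS 0
#endif

class Work {
    int value = 42;
public:
    std::function<int()> reader() const {
#if WORK_CAPTURE_STAR_THIS
        // The closure owns a copy of *this; 'value' refers to that copy.
        return [=, *this]() { return value; };
#else
        // Explicit work-around: members must be qualified with 'tmp.'.
        return [tmp = *this]() { return tmp.value; };
#endif
    }
};

// The closure is invoked after the originating object is gone.
int read_after_destruction() {
    std::function<int()> f;
    {
        Work w;
        f = w.reader();
    }   // 'w' is destroyed here; the closure keeps its own copy
    return f();
}
```

In both branches the closure is self-contained, so `read_after_destruction()` returns 42; the branches differ only in whether member references need the `tmp.` qualification.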
A new capture mechanism introduces a new capture interaction. For non-*this captures the interactions remain unchanged. When *this is captured by value via [*this], nested captures of this refer to the enclosing copy of *this.

    void Work::foo()
    {
      auto x = [=,*this]() {
        // this, *this, and member variables of Work refer to
        // the copy *this contained in closure 'x'
        // does not refer to the original enclosing 'this'
        auto y = [&]() {
          // this, *this, and member variables of Work refer to
          // the copy *this contained in closure 'x'
          // does not refer to the original enclosing 'this'
        };
        auto z = [=]() {
          // this, *this, and member variables of Work refer to
          // the copy *this contained in closure 'x'
          // does not refer to the original enclosing 'this'
        };
        auto zz = [=,*this]() {
          // this, *this, and member variables of Work refer to
          // a new copy *this contained in closure 'zz'
          // that is copied from the copy of *this contained
          // in the closure 'x'
        };
      };
    }

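The nesting rules can be observed with a small sketch, assuming a compiler that accepts the `[=,*this]` capture described here (C++17 or later). The public member `value`, the function name `nested`, and the arithmetic are illustrative only. The lambdas are declared `mutable` so that they may modify their own copies.

```cpp
struct Work {
    int value = 1;

    // Encodes the final state of both copies in one integer:
    // (value in x's copy) * 1000 + (value in zz's copy).
    int nested() {
        auto x = [=, *this]() mutable {
            // 'value' here is a member of x's copy of *this.
            auto y = [&]() { value += 10; };   // modifies x's copy: 1 -> 11
            y();
            auto zz = [=, *this]() mutable {   // new copy, taken from x's copy
                value += 100;                  // modifies zz's copy: 11 -> 111
                return value;
            };
            int inner = zz();
            return value * 1000 + inner;       // x's copy is still 11
        };
        return x();
    }
};
```

Calling `nested()` on a fresh `Work` returns 11111, and the original object's `value` remains 1: the write through `y` lands in the copy held by `x`, and the write inside `zz` lands in a second copy made from it.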
With proper lambda capture-by-value the earlier examples can have the correct behavior by generating a complete closure.

    class Work {
    private:
      int value ;
    public:
      Work() : value(42) {}

      // Capture-by-value is correct and the asynchronously
      // dispatched closure may outlive the originating class,
      // and may be freely copied without losing correctness.
      std::future<int> do_something() const
      {
        // Trivial change to replace 'for' with asynchronously
        // dispatched parallel foreach.
        // for ( int i = 0 ; i < N ; ++i )
        std::future<int> todo = foreach( Parallel , 0 , N , [=,*this]( int i )
        {
          // A non-trivial loop body where
          // every reference to a member is
          // safely accessed from the
          // captured-by-value *this
        });
        return todo ;
      }
    };

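For comparison with the earlier failing `spawn` example, the following sketch shows the same dispatch with `*this` captured by value, again assuming a compiler that accepts `[=,*this]` (C++17 or later). The function `run` is a hypothetical driver. `std::launch::deferred` is chosen here only to keep the sketch independent of thread support; the lifetime argument is the same under the default launch policy.

```cpp
#include <future>

class Work {
private:
    int value;
public:
    Work() : value(42) {}

    std::future<int> spawn() const {
        // The closure owns a copy of *this, so it remains valid after
        // the originating object has been destroyed.
        return std::async(std::launch::deferred,
                          [=, *this]() -> int { return value; });
    }
};

std::future<int> foo() {
    Work tmp;
    return tmp.spawn();   // 'tmp' is destroyed on return; the copy is not
}

// Hypothetical driver: evaluates the closure after 'tmp' is gone.
int run() {
    std::future<int> f = foo();
    return f.get();
}
```

Here `run()` returns 42 because `value` inside the lambda names the member of the closure's own copy, with no change to the lambda body beyond the capture list.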
Assuming that correcting the capture-default behavior of [=] is impractical, we seek a capture-default expression to simplify the true capture-by-value expression [=,*this]. A possible capture-default expression is to let [=*] be equivalent to [=] outside of a non-static member function and equivalent to [=,*this] within a non-static member function.