ISO/IEC JTC1 SC22 WG21 N2998 = 09-0188 - 2009-10-22
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
This paper revises N2957 = 09-0147 - 2009-09-25.
Problem
Importance
Solution
Wording
Acknowledgments
References
The adoption of [N2927] restricts the referents of a lambda expression to the immediately-enclosing function or lambda expression. This restriction is severe and does damage to the utility of the lambda feature.
One principle of lambda design is that programmers should be able to replace all well-structured control constructs with function calls taking lambda expressions as arguments. [N2927] prevents that replacement, and is essentially equivalent to saying that the body of a nested loop cannot access function parameters.
Consider the following function to multiply square matricies.
matrix operator*( const matrix& a, const matrix& b ) { int n = a.numrows(); matrix c( n, n ); for ( int i = 0; i < n; i++ ) { for ( int j = 0; j < n; j++ ) { double t = 0.0; for ( int k = 0; k < n; k++ ) { t += a[i][k] * b[k][j]; } c[i][j] = t; } } return c; }
Its rewrite into functions and lambdas, as intended by the adopted [N2550], is as follows. Note that the rewrite is approximately the same size.
matrix operator*( const matrix& a, const matrix& b ) { int n = a.numrows(); matrix c( n, n ); for_range( 0, n-1, [&]( int i ){ for_range( 0, n-1, [&]( int j ){ double t = 0.0; for_range( 0, n-1, [&]( int k ){ t += a[i][k] * b[k][j]; } ); c[i][j] = t; } ); } ); return c; }
However this straightforward reformulation is ill-formed as per [N2927] wording for 5.1.1 [expr.prim.lambda] paragraphs 9–12. To make the code well-formed, the function must be rewritten as follows, where inserted text shows new code, which sometimes alter variable references.
matrix operator*( const matrix& a, const matrix& b ) { int n = a.numrows( ); matrix c( n, n ); for_range( 0, n-1, [&]( int i ){ const matrix& a1 = a; const matrix& b1 = b; int& n1 = n; matrix& c1 = c; for_range( 0, n1-1, [&]( int j ){ const matrix& a2 = a1; const matrix& b2 = b1; int& n2 = n1; int& i2 = i; double t = 0.0; for_range( 0, n1-1, [&]( int k ){ t += a2[i2][k] * b2[k][j]; } ); c1[i][j] = t; } ); } ); return c; }
Such extensive mechanical editing will inhibit use, make errors more likely, and complicate code maintenance. Such editing is properly the domain of the compiler.
One view of lambdas is that
they are primarily useful for very short inline function arguments
to standard algorithms,
and hence do not need general scoping support.
This view considerably underestimates the power of control abstraction.
Indeed, the algorithms in the standard library
are the beginning of control abstraction, not the end.
In a highly concurrent world,
programmers need more control constructs
than can be reasonably provided by the language.
Programmers will need to define and use their own constructs,
and they will be using those constructs
with the same frequency that
current programmers use for
and if
.
Consider this following example, adapted from [CrowlLeBlanc94], for Gaussian elimination. It contains two constrol structures, one for the triangulation and one for the row reduction. Only the latter is a common control construct.
int size = system.size(); triangulate( size, [&]( int pivot, int reduce ) { auto fraction = system[reduce][pivot] / system[pivot][pivot]; forall( pivot, size-1, [&]( int variable ) { system[reduce][variable] -= system[pivot][variable] * fraction; } ); } );
In particular, the triangulate
control construct
encapsulates all the available parallelism in the triangulation.
Just as importantly, it also captures all the necessary synchronization.
So,
this highly parallel code contains no threads and no synchronization,
which are the source of most effort and bugs in parallel programming.
More imporantly,
there is no particular commitment
to any sequential or parallel execution of its control constructs.
A vector machine might choose to execute
triangulate
serially and forall
concurrently.
On the other hand, a NUMA machine would likely
execute triangulate
with coarse parallelism
and forall
sequentially.
Good control abstraction enables more effective use of machine resources. But to be effectively programmed, lambda expressions must be able to easily substitute for ordinary well-structured control statements. For that easy substitution, we must restore the intent of [N2550].
In outline, the solution is as follows.
Define reaching scope for a lambda expression as the enclosing lambda expressions out to and including the innermost function. That is, equivalent to replacing the lambda expressions by blocks.
Define explicit and implicit captures with reference to the reaching scope.
Define a capture of an entity to be a use (formally) of that entity within the scope containing the lambda expression.
Define a lambda expression to be ill-formed if it uses (formally) anything from its reaching scope that it does not capture.
Define a lambda expression to be ill formed if it captures an entity that is not either defined or captured in the immediately enclosing lambda expression or function.
For entities captured by an outer lambda expression, define its nested lambda expression to capture those entities from the closure object of the outer lambda expression. That is, all captures are one level deep. This definition ensures that a closure object for a lambda expression contains all the information necessary to form a closure object for nested lambda expressions.
All edits fall in section 5.1.2 [expr.prim.lambda] and are based on N2960 (which includes the changes of N2927).
Replace paragraph 9, including the example, by (a counterpart of the original paragraph 9 will be inserted in paragraph 12 below):
A lambda-expression whose smallest enclosing scope is a block scope ([basic.scope.local] 3.3.3) is a local lambda expression; any other lambda-expression shall not have a capture-list in its lambda-introducer. The reaching scope of a local lambda expression is the set of enclosing scopes up to and including the innermost enclosing function and its parameters. [Note: This reaching scope includes any intervening lambda-expressions. —end note]
Edit paragraph 10 as follows.
The identifiers in a capture-list are looked up using the usual rules for unqualified name lookup ([basic.lookup.unqual] 3.4.1); each such lookup shall find a variable or reference with automatic storage duration declared in the reaching scope of the local lambda expression. An entity (i.e., a variable, a reference, or
this
) is said to be explicitly captured if it appears in the lambda-expression's capture-list.An explicitly captured entity is used ([basic.def.odr] 3.2).
Edit paragraph 11 as follows.
If a lambda-expression has an associated capture-default and its compound-statement uses ([basic.def.odr] 3.2)
this
or a variable or reference with automatic storage durationdeclared in an enclosing function or lambda-expressionand the used entity is not explicitly captured, then the used entity is said to be implicitly captured; such entities shall be declared within the reaching scope of the lambda expression. [Note: The implicit capture of an entity by a nested lambda-expression can cause its implicit capture by the containing lambda-expression (see below). Implicit uses ofthis
can result in implicit capture. —end note]
Edit paragraph 12 as follows.
An entity is captured if it is captured explicitly or implicitly. An entity captured by a lambda-expression is used ([basic.def.odr] 3.2) in the scope containing the lambda-expression. If
this
is captured, either explicitly or implicitly, the lambda-expression shall appear directly in within the definition of a non-static member function, i.e., not in another lambda-expressionby a local lambda expression, its nearest enclosing function shall be a non-static member function .[Note: This rule prevents access from a nested lambda-expression to the members of the enclosing lambda-expression's closure object. —end note]If a lambda-expression uses ([basic.def.odr] 3.2)this
or a variable or reference with automatic storage duration from its reaching scope, that entity shall be captured by the lambda-expression. If a lambda-expression captures an entity, and that entity is not defined or captured in the immediately enclosing lamba expression or function, the program is ill-formed. [Example:void f1(int i) { int const N = 20; auto m1 = [=]{ int const M = 30; auto m2 = [i]{ int x[N][M]; // OK: N and M are not "used" x[0][0] = i; // OK: i is explicitly captured by m2 // and implicitly captured by m1 }; }; struct s1 { int f; int work(int n) { int m = n*n; int j = 40; auto m3 = [this,m]{ auto m4 = [&,j]{ // error: j not captured by m3 int x = n; // error: n implicitly captured by m4 // but not captured by m3 x += m; // OK: m implicitly captured by m4 // and explicitly captured by m3 x += i; // error: i is outside of the reaching scope x += f; // OK: this captured implicitly by m4 // and explicitly by m3 }; }; } }; }
—end example]
After paragraph 15, add a new paragraph as follows.
If a lambda-expression
m1
captures an entity, and that entity is captured by an immediately enclosing lambda-expressionm2
, thenm1
's capture is transformed as follows:
if
m2
captures the entity by copy,m1
captures the corresponding non-static data member ofm2
's closure type;if
m2
captures the entity by reference,m1
captures the same entity captured bym2
.[Example: The nested lambda expressions and invocations below will output
123234
.int a = 1, b = 1, c = 1; auto m1 = [a, &b, &c]() mutable { auto m2 = [a, b, &c]() mutable { std::cout << a << b << c; a = 4; b = 4; c = 4; }; a = 3; b = 3; c = 3; m2(); }; a = 2; b = 2; c = 2; m1(); std::cout << a << b << c;
—end example]
William M. Miller provided many helpful comments and alternate wording suggestions.