std::reference_closure
ISO/IEC JTC1 SC22 WG21 N2845 = 09-0035 - 2009-03-05
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Douglas Gregor, doug.gregor@gmail.com
David Abrahams, dave@boostpro.com
Introduction
Issues
Benchmark
std::function Optimization
Small Function Object
Direct Copy Call
Move Semantics
Use LLVM
Results
Sources
Future Optimization
Conclusion
Proposal
5.1.1 Lambda expressions [expr.prim.lambda]
20.6.18 Class template reference_closure [func.referenceclosure]
Introduction
The specification of lambda expressions
adopted with
N2550 Lambda Expressions and Closures:
Wording for Monomorphic Lambdas (Revision 4)
included a specification that closures consisting only of references
be implemented as
a class derived from std::reference_closure.
The intent of this specification was to
enable improved performance for a class of closures across binary interfaces.
N2830 Problems with reference_closure
proposed that std::reference_closure
be removed from the language
and provided some evidence for that position.
N2839 Response to "Problems with reference_closure"
disputed some of that evidence
and argued for keeping std::reference_closure.
This paper provides new
techniques for aggressive optimization of std::function
and corresponding benchmark results
that show that the relative cost of std::function
to std::reference_closure
can be much lower than previous evidence suggested.
This new evidence enables a consensus agreement to remove
std::reference_closure.
This paper summarizes the issues,
describes the new std::function
optimization techniques,
presents the benchmark results,
and proposes changes to the working draft.
Issues
Closures have anonymous types
and are hence not suitable for binary interfaces.
The expected development model for binary interfaces using closures
is to first represent the closures with std::function.
When there is evidence of a need for additional performance,
an additional overloaded interface
uses std::reference_closure
to handle the appropriate subset more efficiently.
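The first step of that development model can be sketched as follows. This is a minimal illustration built on a hypothetical library function, `for_each_index`, which does not come from this paper. Its definition would sit behind a binary interface. Because a closure's type is anonymous, the function cannot name it in its signature, so std::function erases the type instead.

```cpp
#include <cassert>
#include <functional>

// Hypothetical library function exported across a binary interface: its
// declaration would live in a header and its definition in a separately
// compiled library. It cannot name a closure type, so it takes std::function.
void for_each_index(int n, const std::function<void(int)>& body) {
    for (int i = 0; i < n; ++i) {
        body(i);  // indirect call through the type-erased wrapper
    }
}

// Client code: the reference-capturing lambda is converted to a
// std::function at the call site.
int sum_below(int n) {
    int total = 0;
    for_each_index(n, [&total](int i) { total += i; });
    return total;
}
```

Under the model described above, a faster overload would later be added alongside this one. The sketch shows only the std::function form.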
There are problems with taking this approach.
Closures that capture any variable by copy, or that are mutable,
are not derived from std::reference_closure.
This lack of support
could be ameliorated by changing the lambda,
but the workaround is not generally applicable.
Reference-only closures must be derived from std::reference_closure,
which requires the closure type to contain a function pointer
that it might not otherwise require.
This unused space can be ameliorated by
a compiler that does function cloning and parameter propagation.
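The second problem depends on the representation that std::reference_closure requires: a function pointer paired with a pointer to the closure's scope. The sketch below illustrates that representation under stated assumptions. `ref_closure` is a hypothetical stand-in, not the removed std::reference_closure. Unlike the draft, it is bound to an existing closure object instead of serving as the closure type's base class.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical illustration only, NOT the removed std::reference_closure:
// it holds the "function pointer plus scope pointer" pair, bound by
// reference to an existing closure object.
template <typename Sig>
class ref_closure;

template <typename R, typename... Args>
class ref_closure<R(Args...)> {
public:
    // Binds to an lvalue callable without copying it; the callable must
    // outlive this object. Disabled for ref_closure itself so that copies
    // use the implicit copy constructor.
    template <typename F,
              typename = std::enable_if_t<
                  !std::is_same_v<std::remove_cv_t<F>, ref_closure>>>
    ref_closure(F& f) noexcept
        : scope_(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
          call_(&call_through<F>) {}

    // Invocation is a single indirect call through the stored pointer.
    R operator()(Args... args) const {
        return call_(scope_, std::forward<Args>(args)...);
    }

private:
    // Trampoline that restores the closure's static type.
    template <typename F>
    static R call_through(void* scope, Args... args) {
        return (*static_cast<F*>(scope))(std::forward<Args>(args)...);
    }

    void* scope_;                // points at the closure object
    R (*call_)(void*, Args...);  // function pointer specialized for F
};
```

Copying a `ref_closure` copies only the two pointers. Both copies still refer to the same closure object, which is why this kind of representation is safe only while that object is alive.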
There are problems with not taking this approach.
Nothing in a std::function parameter type indicates
that the closure will not be used past completion of the function.
That is, there is no obvious guarantee
that the closure object will be used
only during its lifetime.
So, there is a risk of use after destruction.
This risk can be ameliorated by
passing the std::function
by reference.
Implementations of std::function
are slower than implementations of std::reference_closure.
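The lifetime concern can be illustrated with two hypothetical callees, `run_now` and `task_queue::run_later`, which are invented for this sketch. Both accept a std::function, but only `run_now` is finished with the closure by the time it returns. Nothing in either parameter type tells the caller which kind of callee it has.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callee: invokes the callback before returning.
void run_now(const std::function<void()>& cb) { cb(); }

// Hypothetical callee: keeps the callback for later. Its parameter type
// gives the caller no hint that the closure outlives the call.
struct task_queue {
    std::vector<std::function<void()>> pending;
    void run_later(std::function<void()> cb) { pending.push_back(std::move(cb)); }
    void drain() {
        for (auto& cb : pending) cb();
        pending.clear();
    }
};

// Safe only because drain() runs while `count` is still alive; draining
// the queue after this function returns would use a destroyed `count`.
int count_calls(task_queue& q) {
    int count = 0;
    run_now([&count] { ++count; });
    q.run_later([&count] { ++count; });
    q.drain();
    return count;
}
```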
Benchmark
Since the purpose of std::reference_closure
is performance,
a benchmark is appropriate.
The benchmark measures the penalties of using lambdas
as a control abstraction,
and early results for that benchmark
influenced the decision to adopt std::reference_closure.
The basis of the benchmark is that:
[&]() { do_some_work(); }.
The benchmark itself consists of many repetitions of the following steps.
Logical Action                                              Representation Operation
Build the closure object.                                   n/a
Pass the closure to the task scheduler as a "callback".     construct
Copy the callback to the execution engine.                  copy
Invoke the callback to reach the original closure object.   indirect call
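The four steps in the table can be sketched with std::function as follows. This is a hypothetical reconstruction, not the benchmark's actual source, which is in the Boost sandbox. `schedule`, `execute`, and `run_benchmark` are invented names, the counter increment stands in for do_some_work(), and all timing code is omitted.

```cpp
#include <cassert>
#include <functional>

// Hypothetical execution engine: copies the callback, then invokes it.
void execute(const std::function<void()>& callback) {
    std::function<void()> engine_copy(callback);  // "copy"
    engine_copy();                                // "indirect call"
}

// Hypothetical task scheduler entry point.
void schedule(const std::function<void()>& callback) { execute(callback); }

long run_benchmark(long repetitions) {
    long work_done = 0;
    for (long i = 0; i < repetitions; ++i) {
        // Build the closure object (no wrapper operation: "n/a").
        auto closure = [&work_done]() { ++work_done; };
        // Implicit conversion constructs a temporary std::function: "construct".
        schedule(closure);
    }
    return work_done;
}
```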
The benchmark environment consists of:
The initial results are
similar to those obtained at the adoption
of std::reference_closure.
Those results show std::function
with 23.5 times the overhead of std::reference_closure.
std::function Optimization
The methodology of the optimization work is:
std::function
without a loss of generality.
Small Function Object
The implementation of std::function
has a "small function object" optimization.
This optimization eliminates a malloc and free pair
on each copy.
Unfortunately, this optimization was not enabled for the benchmark's closure type.
Specializing the trait that enables it corrected the problem.
We anticipate that in C++0x implementations, the problem will not arise.
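The sketch below shows the kind of decision such a trait drives. It assumes a hypothetical trait, `stored_locally`, and a two-pointer buffer. The trait in the implementation the paper measured has a different name and may use different criteria. A closure that captures only references is typically small and trivially copyable, so it qualifies for in-place storage.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical in-object buffer size: room for two pointers.
constexpr std::size_t small_buffer_size = 2 * sizeof(void*);

// Hypothetical trait: true when a callable may be stored inside the wrapper
// itself (no malloc/free pair on copy) rather than on the heap.
template <typename F>
struct stored_locally
    : std::integral_constant<bool,
                             sizeof(F) <= small_buffer_size &&
                             alignof(F) <= alignof(void*) &&
                             std::is_trivially_copyable<F>::value> {};

// A callable too large for the buffer, which would need heap storage.
struct large_callable {
    char state[8 * sizeof(void*)];
    void operator()() const {}
};

static_assert(!stored_locally<large_callable>::value,
              "a large callable does not fit the small buffer");
```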
Direct Copy Call
The implementation of std::function's copy constructor
uses an indirect call.
This call is needed for the general case.
When the "small function object" has a trivial copy constructor,
the implementation can simply copy the bits
and avoid that call.
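The following is a minimal sketch of the direct copy. It assumes a hypothetical wrapper, `small_fn`, for callables of type int(), with inline storage only and no heap fallback or assignment. The wrapper records at construction whether the stored callable is trivially copyable. Its copy constructor can then copy the bytes directly instead of calling the per-type copy routine through a function pointer.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical type-erased wrapper for int() callables, inline storage only.
class small_fn {
public:
    template <typename F>
    explicit small_fn(F f) : trivial_(std::is_trivially_copyable<F>::value) {
        static_assert(sizeof(F) <= sizeof(buf_), "callable too large for this sketch");
        static_assert(alignof(F) <= alignof(void*), "over-aligned callable");
        ::new (static_cast<void*>(buf_)) F(std::move(f));
        invoke_ = [](const void* p) -> int {
            return (*std::launder(static_cast<const F*>(p)))();
        };
        copy_ = [](void* dst, const void* src) {
            ::new (dst) F(*std::launder(static_cast<const F*>(src)));
        };
        destroy_ = [](void* p) { std::launder(static_cast<F*>(p))->~F(); };
    }

    small_fn(const small_fn& other)
        : invoke_(other.invoke_), copy_(other.copy_),
          destroy_(other.destroy_), trivial_(other.trivial_) {
        if (trivial_) {
            std::memcpy(buf_, other.buf_, sizeof buf_);  // direct copy: no indirect call
        } else {
            copy_(buf_, other.buf_);                     // general case: indirect call
        }
    }

    small_fn& operator=(const small_fn&) = delete;  // left out of this sketch

    ~small_fn() {
        if (!trivial_) destroy_(buf_);  // trivially copyable implies trivial destructor
    }

    int operator()() const { return invoke_(buf_); }

private:
    alignas(void*) unsigned char buf_[2 * sizeof(void*)];
    int (*invoke_)(const void*) = nullptr;
    void (*copy_)(void*, const void*) = nullptr;
    void (*destroy_)(void*) = nullptr;
    bool trivial_;
};
```

The branch on `trivial_` replaces an indirect call with a predictable test and a fixed-size memcpy. That is the trade this optimization relies on for closures that capture only references.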
Move Semantics
The benchmark copies the callback. In C++0x, we would move from it instead, which eliminates a single branch in the copy (move) operation.
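The sketch below shows that change using the hypothetical functions `execute_owned` and `schedule_and_run`. The callback is handed to the engine with std::move, so the engine's std::function takes over the stored callable instead of duplicating it. The moved-from `callback` is not used again, because a moved-from std::function is left in a valid but unspecified state.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical engine step that takes ownership of its callback.
int execute_owned(std::function<int()>&& callback) {
    std::function<int()> engine_copy(std::move(callback));  // move, not copy
    return engine_copy();
}

int schedule_and_run(int value) {
    auto closure = [&value]() { return value * 2; };
    std::function<int()> callback(closure);     // construct
    return execute_owned(std::move(callback));  // move into the engine
}
```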
Use LLVM
The LLVM compiler generated somewhat better code than the GCC compiler.
Results
Relative to std::reference_closure, the optimized std::function showed an overhead factor ranging from 1.6 on the 32-bit architecture to 2.2 on the 64-bit architecture. (The difference is probably due mostly to the 64-bit architecture passing small structs in registers.)
Sources
The benchmark is in Boost Subversion at http://svn.boost.org/svn/boost/sandbox/reference_closure.
The optimized std::tr1::function
is on the committee's Wiki (functional/functional_iterate.h).
This version can be dropped into Apple GCC 4.0.1.
An unencumbered version of the optimized std::tr1::function
will be available in the Boost repository.
Future Optimization
The code compiled for std::reference_closure
is generally fairly good.
However, it has some unnecessary memory operations,
and further optimizer attention could yield performance improvements.
The implementation of std::function
throws an exception when invoked while its function pointer is null.
This implies testing that pointer for null on every call,
which is expensive.
The implementation could instead use a pointer to a function that throws,
rather than a null pointer,
thus saving the branch.
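The following is a minimal sketch of that idea. It assumes a hypothetical wrapper, `int_fn`, that holds a plain function pointer for int(int). An empty wrapper points at a stub that throws std::bad_function_call, so the call operator needs no null test. The only check happens once, at construction.

```cpp
#include <cassert>
#include <functional>  // std::bad_function_call

// Hypothetical wrapper that never stores a null function pointer.
class int_fn {
public:
    int_fn() noexcept : fp_(&throw_empty) {}
    explicit int_fn(int (*f)(int)) noexcept : fp_(f ? f : &throw_empty) {}

    // No branch on the call path: an empty wrapper's stub throws by itself.
    int operator()(int x) const { return fp_(x); }

private:
    static int throw_empty(int) { throw std::bad_function_call(); }

    int (*fp_)(int);
};
```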
Conclusion
We conclude that std::function
has, and will likely continue to have, roughly double the overhead
of std::reference_closure.
However, there are significant compiler implementation
and user programmability costs
associated with a second, logically equivalent,
binary representation for closures.
On balance, we recommend removing std::reference_closure.
Proposal
We propose to remove std::reference_closure
from the standard.
5.1.1 Lambda expressions [expr.prim.lambda]
Remove paragraph 12.
If every name in the effective capture set is preceded by &
and the lambda expression is not mutable, F is publicly derived from std::reference_closure<R(P)> (20.6.18),
where R is the return type and P is the parameter-type-list of the lambda expression.
Converting an object of type F to type std::reference_closure<R(P)>
and invoking its function call operator shall have the same effect as invoking the function call operator of F.
[Note: This requirement effectively means that such F’s must be implemented using a pair of a function pointer and a static scope pointer. —end note]
20.6.18 Class template reference_closure [func.referenceclosure]
Remove the entire section
20.6.18 (from N2800) Class template reference_closure
[func.referenceclosure],
including