ISO/IEC JTC1 SC22 WG21 N2974 = 09-0164 - 2009-09-27
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Introduction
Action Analysis Async/Future
Actions at the Send
Actions at the Receive
Action Combinations
Other Issues
Limiting Threads
Thread Pools
Speculative Concurrency and Execution
Design and Implementation Choices
Current Operations
Alternate Operations
Various Uses
Code for Query-Like Methods
The intent of the `async` and `future` combination is to potentially initiate a task concurrently, and then retrieve its return value or exception at a later time. The semantics of `async` and `future` have come under considerable debate. Part of the debate arises from the expected use, part arises from implicit assumptions about implementation, and part arises from a lack of clarity in the implications of certain operations. This paper is intended to clarify the latter two parts of the debate.
The implementation of `async` and `future` requires actions at two points, the 'send' and the 'receive'. The 'send' is just the call to `async`. The 'receive' is the call to `get` or any other call that might be used to decide whether or not to call `get`. For the purposes of this discussion, we assume a 'unique' future.
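For concreteness, a minimal sketch of the two points, using the spellings that appear later in this paper (the task `g` is a placeholder):

```cpp
#include <future>

int g(); // some task

void example() {
    std::unique_future<int> f = std::async( g ); // the 'send'
    // ... other work ...
    int v = f.get();                             // the 'receive'
}
```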
Clarifying the actions and their consequences really requires a table, but that table is too big to understand at once, so we first describe the attributes and then present the table.
The available actions are as follows. With the 'enqueue' action, a call to `get` may remove the task from the queue and execute it sequentially. Thus, this action enables speculative concurrency. With an operation to simply remove the task from the queue, this technique enables speculative execution as well. With the 'defer' action, execution waits for the receiver; if `get` is never called, the task is never executed.
The states indicate the communication between asynchronous threads that might affect the future's storage. They are otherwise self-explanatory.
The use of `async` necessarily imposes some restrictions on the type of synchronization that the tasks may employ. A 'get-free' task must not wait for any action that follows the receiver's call to `get`; because `get` does not return until the task completes, such waiting would deadlock. Most sender actions have this constraint. An 'async-free' task must not wait for any action that follows the call to `async`; this is the stronger constraint. A 'delayable' task is one on whose timely start the program does not depend.
The following table provides the various attributes of the various actions.

| Action | Benefits | States | Restrictions |
|---|---|---|---|
| Fork | responsive | running, complete | get-free |
| Delay | thread limiting | running, complete | get-free, delayable |
| Enqueue | thread limiting, speculative concurrency, speculative execution | enqueued, running, complete | get-free, delayable |
| Invoke | indirect sequential | | async-free |
| Defer | direct sequential, thread limiting, speculative concurrency, speculative execution | | get-free, delayable |
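For reference in the code fragments later in this paper, the vocabulary of this table can be rendered as plain enumerations; this is a sketch only, not proposed interface:

```cpp
// The sender actions and task states named in the table above.
enum action { fork, delay, enqueue, invoke, defer };
enum state  { enqueued, running, complete };
```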
Again, clarifying the actions and their consequences requires first describing the attributes and then presenting the table.
Implementing the receive actions requires varying amounts of synchronization.
The following table shows the possible receive actions for each send action, and lists the synchronization required for that combination. An empty entry indicates that the combination is not possible.
| Sender Action | Retrieve | Wait | Check | Dequeue | Call | Morph |
|---|---|---|---|---|---|---|
| Fork | acquire | wait | test | | | |
| Delay | acquire | wait | test | | | |
| Enqueue | acquire | wait | test | lock | lock | |
| Invoke | reuse | | | | | |
| Defer | reuse | | | | reuse | reuse |
There are some combinations of sender and receiver actions that are at best marginally useful. We highlight these combinations with the intent that they may not be supported.
The 'invoke' action has only one advantage over the 'defer' action, which is to shift the task restrictions from get-free to async-free and to shift the use of thread-local variables from the `get` to the `async`. However, these advantages are actually disadvantages, because no other action has those properties. The 'invoke' action also prevents speculative execution, which is a desirable capability. Furthermore, the 'invoke/retrieve' combination has higher overhead than the 'defer/call' combination, which does support speculative execution. That is, the 'defer' action is strictly more useful than the 'invoke' action.
Delayed forking of a task seems less useful than enqueuing the task. However, it has a lower synchronization, storage, and implementation burden. Furthermore, for those situations in which it is desirable that receivers not execute tasks sequentially, delayed forking provides assurance that the task will execute on another thread.
The 'defer' sender action leaves significant choices to the receiver. A receiver 'fork' followed by 'wait' seems rarely desirable in comparison to 'call'; it appears to offer only a more controlled environment for the use of thread-local storage, at the high cost of increased thread overhead for no additional concurrency. A 'fork' or 'delay' sender action seems more appropriate to those circumstances.
There are two approaches to limiting the number of threads.
When applications desire to use speculative concurrency, tasks that are speculative should be assigned lower priority than those that are not speculative. Tasks that may never be executed at all should have an even lower priority. It may be difficult to determine when a task may never be executed, so one possible approach is to execute such tasks only after some demand has been made for the future's result.
It is often difficult to determine a priori when potential concurrency in an application is valuable. One mechanism to obtain decent performance in this environment is to make heavy use of nested tasks with 'enqueue/call', relying on the majority of tasks to be executed sequentially, while still leaving opportunity for tasks to be executed concurrently. However, in this environment, first-come-first-served scheduling tends to produce a pathologically large number of concurrent tasks. Better is to have each processor maintain its own double-ended queue. A processor usually pushes and pops from the front of that queue. However, when the queue is empty, the processor can steal from the rear of another processor's queue. With this approach, processors tend to steal large tasks that other processors are not working on, and the number of tasks executed concurrently tends to be much more manageable.
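As a sketch of that structure (the class and its locking are illustrative only; a production scheduler would likely use a lock-free deque), each processor owns one of these queues:

```cpp
#include <deque>
#include <functional>
#include <mutex>

// Illustrative work-stealing queue: the owner pushes and pops at the
// front; idle processors steal from the rear, tending to take the
// large, old tasks the owner is not working on.
class work_queue {
    std::deque< std::function<void()> > tasks;
    std::mutex guard;
public:
    void push( std::function<void()> t ) {      // owner only
        std::lock_guard<std::mutex> hold( guard );
        tasks.push_front( t );
    }
    bool pop( std::function<void()>& t ) {      // owner only
        std::lock_guard<std::mutex> hold( guard );
        if ( tasks.empty() ) return false;
        t = tasks.front();
        tasks.pop_front();
        return true;
    }
    bool steal( std::function<void()>& t ) {    // other processors
        std::lock_guard<std::mutex> hold( guard );
        if ( tasks.empty() ) return false;
        t = tasks.back();
        tasks.pop_back();
        return true;
    }
};
```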
Whenever multiple sender actions exist, the implementation needs policies for choosing which action to take. The simplest policy is to execute a user-specified action. However, more useful policies offer the implementation a range of actions within which it can choose to best use system resources.
We anticipate policies that differ in the set of actions they are permitted to perform.
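One plausible encoding (the names are illustrative, not proposed) represents a policy as the set of sender actions from which the implementation may choose:

```cpp
// Sketch: a policy is a bitmask of permitted sender actions.
enum policy {
    may_fork    = 1 << 0,
    may_delay   = 1 << 1,
    may_enqueue = 1 << 2,
    may_defer   = 1 << 3,
    any_action  = may_fork | may_delay | may_enqueue | may_defer
};
```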
Surprisingly, thread pools have no impact on the discussion above. However, the implementation of both `async` and `future` may be significantly affected. In particular, the need to establish a happens-before relationship between the last thread-local-storage destruction and returning the future value requires an intrusive implementation of `async`, a modification of futures to store a `std::thread` for joining, or new language facilities.
Existing future designs and implementations are based on a set of actions that the future supports. As the set of actions expands, the representation may also increase. In this section, we look at some of the representations.

The representation and storage required by a future depend on the actions supported by that future. The representation is the fields allocated within the class itself. The storage is the dynamic data.
| Future Design | Sender Actions | Receiver Actions | Representation | Storage |
|---|---|---|---|---|
| N2671 `unique_future` (working draft) | Fork, Delay | Retrieve, Wait, Check | storage pointer | value/exception, mutex&condvar |
| N2889 `joining_future` | Fork, Delay, Defer | Wait&Retrieve, Call | storage pointer*, `std::thread`, `std::function` | value/exception, mutex&condvar |
| N2901 `unique_future` | Fork, Delay, Defer | Retrieve, Wait, Check, Call, Dequeue, Morph | storage pointer*, `std::function` | value/exception, mutex&condvar |
| this paper, excepting invoke | Fork, Delay, Enqueue, Defer | Retrieve, Wait, Check, Call | storage pointer*, `std::function` | `std::thread`, `std::function`, value/exception, mutex&condvar, queue handling |
FIX: We don't need all fields in all copies of the future.
The draft future design from N2671 is underspecified with respect to `async` and thus cannot distinguish between the fork, delay, and invoke actions. Paper N2889 added an additional future that supported the 'fork/delay' and 'defer&call' actions. Paper N2901 implicitly extended futures to support 'fork/call' and 'defer&call'.
In this section we look at operations on `unique_future` as defined in the working draft.

The use of `unique_future` in data structures is hampered by the lack of a coordinated set of operations: a default constructor, a move assignment operator, and a test for the default. The `std::thread` class has such operations so as to enable storing threads in data structures.
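By analogy, such a coordinated set would permit code like the following sketch (the member `is_default` is hypothetical, as are `g` and `use`):

```cpp
#include <future>
#include <vector>

int g();         // some task
void use( int ); // some consumer

void sketch() {
    std::vector< std::unique_future<int> > table( 10 ); // default construction
    table[3] = std::async( g );                         // move assignment
    if ( ! table[4].is_default() )                      // test for the default
        use( table[4].get() );
}
```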
Furthermore, an 'empty' value associated with default constructors can lead to defined exceptional behavior rather than the current undefined behavior for double `get` operations.

```cpp
std::unique_future<moveable> f = std::async( g );
moveable v = f.get(); // fine
moveable w = f.get(); // 'cannot get from empty' exception
```
These problems are addressed by N2888 and successors.
Anthony Williams points out a problem with the return type of `get`:

> ... given "`T&& get();`", "`T& v = f.get();`" requires that the reference is valid beyond the call to `get()`; specifically, it requires that it is valid long enough for the constructor of `v` to use it.
>
> This means that `get()` cannot free the internal storage for the value, even if it marks the future as "empty".
>
> Consequently, you may have "empty" futures that actually have a very large amount of internal storage associated with them, and even a valid object of some resource-owning type if that type does not have a move constructor. In this case, I would rather that the future remained "ready", and a subsequent call to `get()` was permitted.
He also points out that the freedom to "implement the `get()` as a call to a stored `std::function` and never store the result" requires returning by value rather than rvalue reference.
The `is_ready` operation is ambiguous with respect to the 'call' action. Does `f.is_ready()` returning true mean that the value has been computed and stored, or that obtaining the value does not require waiting on another thread? The former definition could lead to problematic behavior because, for 'deferred' tasks, `f.is_ready()` will never become true and hence the call to `get` may be indefinitely delayed.
On the other hand, the latter definition could lead to unexpected delays due to unintended sequential evaluation. Furthermore, the latter definition could lead to retractions. Consider an 'enqueued' future. Its `is_ready()` returns true, then the scheduler assigns that future to a thread, and some subsequent `is_ready()` calls would return false. However, the caller may have already decided to call `get` and thus be subject to synchronization delays.
Assuming the latter definition for `is_ready`, sequential tasks will not have been executed at the return from a call to `is_ready`. In that case, what should `has_value` and `has_exception` return? The information necessary for a precise answer is not available. We must either provide a convention that `!f.has_value() && !f.has_exception()` means that the value has not been computed, or we must extend the interface. Since exceptions are by definition rare, these operations seem to be of little value, and given their semantic difficulties they may well warrant removal from the interface.
The `wait` operations were designed to reflect internal condition variable waits to the user of the future. Unfortunately, the timed waits have no corresponding operations on `std::thread`. To enable timed waits, one must implement waiting on a condition variable, even when timed waits are not actually used and a `join` on the `std::thread` would be sufficient. Even using a condition variable may yield unexpected waiting when the currently-still-necessary `join` on the controlling thread waits for destruction of thread-local variables.
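A sketch of that burden, assuming a shared state holding a mutex and condition variable (the names are illustrative):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Sketch: state carried solely to support timed waits, even when a
// join on the underlying std::thread would otherwise suffice.
struct shared_state {
    std::mutex m;
    std::condition_variable cv;
    bool ready;

    shared_state() : ready( false ) {}

    void set_ready() {                       // called by the task when done
        std::lock_guard<std::mutex> hold( m );
        ready = true;
        cv.notify_all();
    }

    template< class Rep, class Period >
    bool timed_wait( const std::chrono::duration<Rep, Period>& d ) {
        std::unique_lock<std::mutex> lock( m );
        return cv.wait_for( lock, d, [this]{ return ready; } );
    }
};
```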
Anthony Williams comments:

> This suggests to me that `std::thread` is lacking a timed-join function. Such a function is implementable on both POSIX and Windows platforms. On Windows there is no overhead, as the platform supports it natively. On POSIX the cost is larger. It does mean that `std::thread` can't be a simple wrapper around `pthread_t` (e.g. so you can store a semaphore for `sem_timedwait`, or use a condition variable), but you generally need more than that anyway since `pthread_t` doesn't support all the operations required of `std::thread::id`.
>
> Anyway, you could add this overhead exclusively to those threads that were tied to async-launched futures, even if it wasn't added to `std::thread`.
Reformulating the future operations may well provide interactions that are more robust to varying implementations. In order to outline these operations, we concentrate on the various ways futures might be used.
Another important fact is whether or not the computation has completed, that is, whether or not the value or exception is stored. That operation is `is_done`.
Ideally, though, user-level operations should not be calling functions that are specific to particular sender actions. We want operations that mean roughly the same thing regardless of the actions used to create the future. So, we try to recast the operations in those terms.
Before we can simplify uses, we must define some tools.
Central to how a future might be used is whether or not the future 'owns' the task. A future that is 'deferred', and hence may be 'called', owns the task. On the other hand, a future cannot own a task that is currently executing on a thread. More subtle is an enqueued task, which is neither owned by the future nor committed to another thread. In this case, we need an operation to resolve the situation, that is, to make the future take the task from the queue and thus make it callable. That operation is `be_owner`, which returns true if either the future already owned the task or it was successful in obtaining ownership.

```cpp
bool is_owner() { return sender_action == defer; }

bool be_owner() {
    if ( sender_action == defer )
        return true;       // already own the task
    else if ( sender_action != enqueue )
        return false;      // task committed to another thread
    else
        return dequeue();  // whether or not it worked
}
```
Likewise, whether or not a task has already started, and is thus committed to concurrency, is important.

```cpp
bool is_started() { return sender_action != defer && state != enqueued; }
```
Likewise, whether or not a task has already completed, and thus its value is ready for copy/move, is important.

```cpp
bool is_done() { return sender_action != defer && state == complete; }
```
The various use cases are coded in terms of the operations above, with sender and receiver actions encoded as methods.
Of course, the most prevalent operation will be simply to obtain the value as soon as possible. The code is:

```cpp
if ( f.be_owner() )
    notional_work += f.call();
else {
    f.wait();
    notional_work += f.retrieve();
}
```
In the case of 'enqueue' futures, this operation provides the mechanism for speculative concurrency. There does seem to be advantage to folding these operations into a single method.

```cpp
notional_work += f.value();
```
An alternate form returns a reference to an object stored within the future. This approach may be more efficient in some cases, less in others.

```cpp
notional_work += f.refer();
```
For speculative execution, simply not asking for the value will serve in the case of 'deferred' futures. However, for 'enqueued' futures, efficiency is improved with an explicit action.

```cpp
f.be_owner(); // proceed to future destruction
```

This code would be clearer as a distinct method.

```cpp
f.abandon();
```
For algorithms that can profitably use any result computed, but need not use the result, the following code serves.

```cpp
if ( f.be_owner() )
    ; // ignore task not yet computed
else {
    f.wait();
    notional_work += f.retrieve();
}
```

The else clause is effectively a `value()`. Again, there does seem to be advantage to folding these operations into a single method.

```cpp
if ( f.started_or_never() )
    notional_work += f.value();
```
For algorithms that can profitably use any result available immediately, but need not use the result, the following code serves. Note that there will be some wasted computation with this approach.

```cpp
if ( f.is_done() )
    notional_work += f.retrieve();
else
    f.be_owner(); // abandon the task
```
There does seem to be advantage to folding the `is_done` and `be_owner` operations into a single method.

```cpp
if ( f.done_or_never() )
    notional_work += f.value();
```
For algorithms that can profitably use a result that is otherwise being computed, but need not use the result, the following code serves.

```cpp
if ( f.is_started() ) {
    f.wait();
    notional_work += f.retrieve();
}
```
Which can be simplified to:

```cpp
if ( f.is_started() )
    notional_work += f.value();
```
For algorithms that can profitably use a result that has otherwise been computed, but need not use the result, the following code serves.

```cpp
if ( f.is_done() )
    notional_work += f.retrieve();
```

Or:

```cpp
if ( f.is_done() )
    notional_work += f.value();
```
If a GUI management thread is to be responsive, it needs to avoid doing any extensive computation. That is, the task must execute on another thread. We assume that the management thread will periodically inspect the future and, if the computation is done, act on it. The following code will serve.

```cpp
if ( f.is_owner() )
    f.fork(); // or f.delay()
else if ( f.is_done() )
    notional_work += f.retrieve();
```
The first condition handles the case where a future has been 'deferred', and morphs it into a future with asynchronous execution. We avoid the 'enqueue' action because its advantage over 'fork' is that it can call tasks synchronously, which is not wanted here.
There does seem to be advantage to folding the `is_owner`, `fork`, and `is_done` operations into a single method. However, there is one complication. We may want to apply a policy, not an action, in scheduling the task for execution. This single method would thus need a policy parameter.

```cpp
if ( f.be_done( immediately ) )
    notional_work += f.retrieve();
```
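A sketch of that folded method, in terms of the operations defined above and the illustrative policy encoding sketched earlier (the `schedule` helper and the `immediately` policy value are hypothetical):

```cpp
// Sketch: fold is_owner, fork/delay, and is_done into one method.
// 'schedule' starts an owned task using an action the policy permits,
// e.g. fork under the hypothetical 'immediately' policy.
bool be_done( policy p ) {
    if ( is_owner() )
        schedule( p );
    return is_done(); // true once the value or exception is stored
}
```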