Additional std::async Launch Policies

This paper proposes the addition of two new launch policies to std::async, one synchronous (launch::sync) and one asynchronous (launch::task). It also suggests changes to the default launch policy.

launch::task

launch::task is an asynchronous execution policy that is similar to the existing launch::async, except that it doesn't require the creation of a new thread for each task.

Motivation and Rationale

The current asynchronous policy, launch::async, specifies that execution occurs "as if in a new thread". The implementation is thus required to create a new thread for each task. This is expensive.

The motivation for this imposed cost is that the task is guaranteed to start with fresh, default-constructed, thread local variables, and that those thread local variables are guaranteed to be destroyed immediately after completion.

A common use of thread local variables is to locally cache objects that are expensive to recreate. For such uses, destroying and reinitializing the thread local variables imposes an additional source of inefficiency on top of the mandated thread creation. Reuse of such thread locals is actually desirable.

In most other cases, reusing thread local variables across tasks is harmless.

Therefore, a launch policy that would allow the implementation to reuse a thread for more than one task execution would be a significant performance enhancement.
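The cost described above can be observed with a small sketch. ExpensiveCache, cached_lookup and constructions_for are hypothetical names introduced only for illustration; the counter records how often the thread local cache is built when tasks run one after another under the existing launch::async policy.

```cpp
#include <atomic>
#include <future>

// Counts how many times the thread-local cache below is constructed.
std::atomic<int> constructions{0};

// Stand-in for an object that is expensive to build (hypothetical).
struct ExpensiveCache {
    ExpensiveCache() { ++constructions; }
    int lookup(int key) const { return key * 2; }
};

int cached_lookup(int key) {
    thread_local ExpensiveCache cache;  // constructed on first use in each thread
    return cache.lookup(key);
}

// Runs n tasks sequentially under launch::async and returns how many
// times the cache was constructed along the way.
int constructions_for(int n) {
    constructions = 0;
    for (int i = 0; i < n; ++i) {
        std::future<int> f = std::async(std::launch::async, cached_lookup, i);
        f.get();
    }
    return constructions.load();
}
```

On an implementation that follows the "as if in a new thread" rule, constructions_for(4) is expected to return 4, one construction per task. Under the proposed launch::task, an implementation that reuses threads would be allowed to build the cache fewer times.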

The common concerns about such thread reuse are:

Does this grant the implementation the license to introduce deadlocks when an earlier task waits for a later one and a thread is not available to run the later task?
Does this introduce a lifetime problem at program termination?

The answers, in this proposal, are no and no.

Implementation-induced deadlocks are specifically disallowed, by introducing a requirement that a task launched with the launch::task (or launch::async) policy shall be assigned a thread no later than the first call to a wait function. The implementation may avoid spawning too many threads and oversubscribing the CPU by taking advantage of its freedom to use deferred or synchronous execution, if the user has included launch::deferred or launch::sync as an allowed policy for the std::async call.
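The pattern this requirement protects looks roughly like the sketch below, written with the existing launch::async because launch::task is not available today. The names outer_task and run_nested are hypothetical.

```cpp
#include <future>

// An "earlier" task that launches a "later" task and then waits for it.
int outer_task() {
    std::future<int> inner = std::async(std::launch::async, [] { return 21; });
    // get() is a wait function: by this point the inner task must have
    // been given a thread, otherwise the outer task could never finish.
    return 2 * inner.get();
}

int run_nested() {
    std::future<int> outer = std::async(std::launch::async, outer_task);
    return outer.get();
}
```

With launch::async this always completes, since each task gets its own thread. A naive fixed-size thread pool with a single worker could hang here, which is the behavior the proposed wording rules out for launch::task as well.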

At program termination, completed or running tasks using the proposed launch::task policy have the thread local variables of their corresponding threads destroyed before static destruction takes place. This implies that exit may need to wait for the currently running tasks to complete. Tasks that are launched after static destruction starts behave as if launch::async had been used.

launch::sync

launch::sync is a synchronous policy that executes the task directly in the std::async call.

Motivation and Rationale

On its surface, a policy that executes the task immediately may seem superfluous; the user could have just executed the task instead of going through the trouble of using std::async. Its advantages become more apparent if we consider that a routine may take a launch policy as a parameter, as in the following pseudocode:

void routine( std::launch policy, args...)
{
    /* ... */

    std::future<X> fx = std::async( policy, ... );

    /* ... */
}

Such parameterization is desirable, for example, if we want to be able to experiment with different launch policies and pick the one that delivers the best performance.
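As an illustration only, a concrete routine of this shape might look like the following sketch. The summing workload is an assumption chosen for brevity; only the existing policies can be passed today.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical concrete version of `routine`: sums a vector in two halves,
// running the first half under the policy chosen by the caller.
long routine(std::launch policy, const std::vector<int>& data) {
    auto mid = data.begin() + data.size() / 2;
    std::future<long> first = std::async(policy, [&data, mid] {
        return std::accumulate(data.begin(), mid, 0L);
    });
    long second = std::accumulate(mid, data.end(), 0L);
    return first.get() + second;  // data outlives the task: get() is called here
}
```

Calling routine(std::launch::async, v) and routine(std::launch::deferred, v) gives the same sum; only the way the first half is scheduled changes.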

In such cases, it is very convenient to be able to tell routine to execute everything synchronously, for the following reasons:

Debugging: If routine does not work as intended, the problem may have something to do with the asynchronous execution, or it may not. Switching to launch::sync allows us to quickly determine which of these two is the case.
Performance assessment: Measuring the performance of routine with launch::sync can be very useful both as a sanity check (is it by chance faster than the supposedly parallel version?) and as a baseline (how well does it scale?).
Control over parallelism: In a recursive parallel algorithm, passing launch::sync for some of the recursive calls allows us finer control over which branches are executed in parallel and which aren't.

In addition, launch::sync can be combined with other policies, to grant the implementation the option to execute in the calling thread. This allows the implementation to better balance the load if, for example, it detects that the task queue has grown too big.
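The "control over parallelism" point can be sketched with a recursive sum. Since launch::sync does not exist yet, the sketch uses launch::deferred as a stand-in for the branches that should not fork; parallel_sum and parallel_depth are hypothetical names.

```cpp
#include <cstddef>
#include <future>
#include <numeric>

// Recursive sum: the top `parallel_depth` levels fork with launch::async;
// deeper levels use launch::deferred, which runs the branch in the calling
// thread when get() is called. (The proposed launch::sync would instead run
// it inside the std::async call itself.)
long parallel_sum(const int* first, const int* last, int parallel_depth) {
    std::ptrdiff_t n = last - first;
    if (n < 4) return std::accumulate(first, last, 0L);
    const int* mid = first + n / 2;
    std::launch policy = parallel_depth > 0 ? std::launch::async
                                            : std::launch::deferred;
    std::future<long> left =
        std::async(policy, parallel_sum, first, mid, parallel_depth - 1);
    long right = parallel_sum(mid, last, parallel_depth - 1);
    return left.get() + right;
}
```

Passing a parallel_depth of 0 runs the whole recursion in the calling thread, which is also a convenient baseline for the debugging and measurement uses listed above.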

Half-seriously, the policy also allows one to obtain a ready future holding a specific value or exception:

std::future<int> x = std::async( std::launch::sync, []{ return 42; } );
std::future<int> y = std::async( std::launch::sync, []() -> int { throw std::runtime_error( "Hello exceptional world!" ); } );
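Until such a policy exists, a similar effect can be approximated with std::promise. The helper run_sync below is hypothetical and is only a rough sketch of the launch::sync behavior: it runs the callable in the calling thread and returns an already-ready future, but it does not decay-copy its argument and it handles only callables that return a non-void value.

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <utility>

// Hypothetical helper approximating std::async(std::launch::sync, f).
// The result or the thrown exception is stored in the shared state
// before the future is handed back to the caller.
template <class F>
auto run_sync(F&& f) -> std::future<decltype(f())> {
    std::promise<decltype(f())> p;
    try {
        p.set_value(std::forward<F>(f)());
    } catch (...) {
        p.set_exception(std::current_exception());
    }
    return p.get_future();
}
```

With this helper, run_sync([] { return 42; }) yields a ready future holding 42, and a callable that throws yields a ready future whose get() rethrows the exception.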

The Default Launch Policy

The default launch policy is currently launch::async | launch::deferred and is unnamed. This proposal suggests two changes. First, the default policy should be given a name, launch::default_. Second, the default should be launch::sync | launch::async | launch::task | launch::deferred.
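Because the default leaves the choice to the implementation, code that polls a future has to be ready for a task that has not started at all. A minimal sketch of such a check, using the hypothetical helper name is_deferred:

```cpp
#include <chrono>
#include <future>

// Reports whether the task behind a future was deferred, that is, it has not
// started and will only run when get() or wait() is called on the future.
template <class T>
bool is_deferred(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}
```

For a future obtained with an explicit launch::deferred this returns true, and with an explicit launch::async it returns false. With the default policy either answer is possible, and widening the default as proposed adds further possibilities rather than removing this one.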

Motivation and Rationale

The default policy should be given a name both to simplify the specification and isolate any eventual changes to a single place, and to allow users to name it without spelling it out.

The plain std::async call, which implicitly uses the default policy, is, for many programmers, their first encounter with parallelism in C++. It should make a good first impression, and good performance is essential. The default policy should afford the implementation maximum flexibility in meeting the performance expectations of a C++ programmer. That is why this paper suggests that the implementation should be free to choose among all of the available policies.

Currently, there is still not much code that depends on the default, so the change will be relatively painless. As more and more programmers take advantage of std::async, the default policy will progressively become more entrenched and harder to change. The time for a change is now.

Proposed Text

(All edits are relative to ISO/IEC 14882-2011.)

Change enum class launch in the synopsis of <future> in 30.6.1 [futures.overview] p1 as follows:

enum class launch : unspecified {
    async = unspecified,
    deferred = unspecified,
    task = unspecified,
    sync = unspecified,
    default_ = sync | async | task | deferred,
    implementation-defined
};
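For readers who want to see how such a bitmask behaves, the following sketch mirrors the synopsis in a hypothetical namespace. The underlying type and the bit values are assumptions, since the proposed text leaves them unspecified, and allows is a hypothetical helper.

```cpp
namespace sketch {  // hypothetical namespace; not part of the proposed text

enum class launch : unsigned {
    async    = 1u << 0,  // bit values are assumptions for illustration
    deferred = 1u << 1,
    task     = 1u << 2,
    sync     = 1u << 3,
    default_ = sync | async | task | deferred
};

constexpr launch operator|(launch a, launch b) {
    return static_cast<launch>(static_cast<unsigned>(a) |
                               static_cast<unsigned>(b));
}

constexpr launch operator&(launch a, launch b) {
    return static_cast<launch>(static_cast<unsigned>(a) &
                               static_cast<unsigned>(b));
}

// True if `policy` permits the single-bit policy `bit`.
constexpr bool allows(launch policy, launch bit) {
    return static_cast<unsigned>(policy & bit) != 0;
}

}  // namespace sketch
```

In this sketch, sketch::allows(sketch::launch::default_, sketch::launch::task) is true, while the current default of async | deferred does not allow sync.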

Change the first sentence of 30.6.1 [futures.overview] p2 as follows:

The enum type launch is an implementation-defined bitmask type (17.5.2.1.3) with launch::async, ~~and~~ launch::deferred, launch::task, and launch::sync denoting individual bits.

Change the first sentence of 30.6.8 [futures.async] p3 as follows:

Effects: The first function behaves the same as a call to the second function with a policy argument of ~~launch::async | launch::deferred~~ launch::default_ and the same arguments for F and Args.

Add the following two bullets to 30.6.8 [futures.async] p3:

if policy & launch::task is non-zero — equivalent to the policy & launch::async case, except that the task may inherit the thread_local variables from a previous completed task execution, and the thread_local variables of the current execution are not necessarily destroyed immediately after its completion. If the async call happens before a call to exit or return from main, destructors for thread_local variables corresponding to the task's thread will run before those for static duration objects. The call to exit or the return from main may implicitly wait for currently running tasks using the launch::task policy to complete. If the exit call or return from main happens before an std::async call with launch::task policy then that call behaves as though it had used launch::async policy. [Note: in a long-lived program, implementations are encouraged to eventually destroy the thread_local variables of completed executions. — end note.]
if policy & launch::sync is non-zero — calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...). Any return value is stored as the result in the shared state. Any exception propagated from the execution of INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) is stored as the exceptional result in the shared state.

Add the following paragraph to 30.6.8 [futures.async] p3, after the bullets:

Tasks using the launch::async and launch::task policies shall be assigned a thread and begin execution no later than the first call to a wait function (30.6.4). [Note: In other words, the implementation is not allowed to deadlock if an earlier task waits for a later one. — end note.]

Change 30.6.8 [futures.async] p6 as follows:

Throws: system_error if policy is launch::async or launch::task and the implementation is unable to start a new thread.

Change 30.6.8 [futures.async] p7 as follows:

Error conditions:

resource_unavailable_try_again — if policy is launch::async or launch::task and the system is unable to start a new thread.

Thanks to Hans Boehm, Herb Sutter, Niklas Gustafsson and Anthony Williams.

— end