| Document Number: | |
|---|---|
| Date: | |
| Revises: | |
| Editor: | Microsoft Corp. |
Note: this is an early draft. It is known to be incomplete and incorrect, and it has many formatting problems.
This technical specification describes a number of concurrency extensions to the C++ Standard Library.
This technical specification is non-normative. Some of the library components in this technical specification may be considered for standardization in a future version of C++, but they are not currently part of any C++ standard. Some of the components in this technical specification may never be standardized, and others may be standardized in a substantially changed form.
The goal of this technical specification is to build widespread existing practice for an expanded C++ standard library. It gives advice on extensions to those vendors who wish to provide them.
The following referenced document is indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14882:— is herein called the C++ Standard. References to clauses within the C++ Standard are written as "C++14 §3.2". The library described in ISO/IEC 14882:— clauses 17–30 is herein called the C++ Standard Library.
Some of the extensions described in this Technical Specification represent types and functions that are currently not part of the C++ Standard Library, and because these extensions are experimental, they should not be declared directly within namespace std. Instead, such extensions are declared in namespace std::experimental.

[ Note: Once standardized, the components described here are expected to be promoted to namespace std. — end note ]

Unless otherwise specified, references to entities described in this Technical Specification are assumed to be qualified with std::experimental, and references to entities described in the C++ Standard Library are assumed to be qualified with std::.
This proposal includes two abstract base classes, executor and scheduled_executor (the latter of which inherits from the former); several concrete classes that inherit from executor or scheduled_executor; and several utility functions.
| Subclause | Header(s) |
|---|---|
| [executors.base] | `<executor>` |
| [executors.classes] | |
| [executors.classes.thread_pool] | `<thread_pool>` |
| [executors.classes.serial] | `<serial_executor>` |
| [executors.classes.loop] | `<loop_executor>` |
| [executors.classes.inline] | `<inline_executor>` |
| [executors.classes.thread] | `<thread_per_task_executor>` |
The `<executor>` header defines abstract base classes for executors, as well as non-member functions that operate at the level of those abstract base classes.

`<experimental/executor>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class executor;
  class scheduled_executor;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```
executor

Class executor is an abstract base class defining an abstract interface of objects that are capable of scheduling and coordinating work submitted by clients. Work units submitted to an executor may be executed in one or more separate threads. Implementations are required to avoid data races when work units are submitted concurrently.
All closures are defined to execute on some thread, but which thread is largely unspecified. As such, accessing a thread_local variable is defined behavior, though it is unspecified which thread's thread_local will be accessed.

The initiation of a work unit is not necessarily ordered with respect to other initiations. [ Note: Ordering, where required, can be obtained by wrapping closures in a serial_executor. — end note ]

There is no defined ordering of the execution or completion of closures added to the executor.

```cpp
class executor {
public:
  virtual ~executor();
  virtual void add(function<void()> closure) = 0;
  virtual size_t uninitiated_task_count() const = 0;
};
```
executor::~executor()

Effects: Destroys the executor.

void executor::add(function<void()> closure);

Effects: The specified closure will be scheduled for execution by the executor at some point in the future. If add cannot complete (due to shutdown or other conditions), the closure may never be executed.

size_t executor::uninitiated_task_count() const;

Returns: the number of closures added to the executor that have not yet been initiated.
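The abstract interface above can be illustrated with a minimal conforming implementation. The sketch below is standard C++14 only; the executor base class is re-declared locally because the `<experimental/executor>` header is a proposal, and immediate_executor is a name invented here for illustration:

```cpp
#include <cstddef>
#include <functional>

// Local re-declaration of the proposed abstract interface.
class executor {
public:
  virtual ~executor() {}
  virtual void add(std::function<void()> closure) = 0;
  virtual std::size_t uninitiated_task_count() const = 0;
};

// A trivial conforming executor: it keeps no queue, so every closure is
// initiated (and completed) inside add(), and the uninitiated count is 0.
class immediate_executor : public executor {
public:
  void add(std::function<void()> closure) override { closure(); }
  std::size_t uninitiated_task_count() const override { return 0; }
};
```

Because add() runs the closure before returning, concurrent calls are as race-free as the closures themselves, which is one way such an executor can meet the data-race requirement.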
scheduled_executor

Class scheduled_executor is an abstract base class that extends the executor interface by allowing clients to pass in work items that will be executed some time in the future.

```cpp
class scheduled_executor : public executor {
public:
  virtual void add_at(const chrono::system_clock::time_point& abs_time,
                      function<void()> closure) = 0;
  virtual void add_after(const chrono::steady_clock::duration& rel_time,
                         function<void()> closure) = 0;
};
```
void add_at(const chrono::system_clock::time_point& abs_time, function<void()> closure);

Effects: Schedules closure for execution at or after the time point abs_time.

void add_after(const chrono::steady_clock::duration& rel_time, function<void()> closure);

Effects: Schedules closure for execution at or after the duration rel_time from now.

thread_pool
`<experimental/thread_pool>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class thread_pool;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```

Class thread_pool is a simple thread pool class that creates a fixed number of threads in its constructor and multiplexes closures onto them.

```cpp
class thread_pool : public scheduled_executor {
public:
  explicit thread_pool(int num_threads);
  ~thread_pool();
  // [executor methods omitted]
};
```
thread_pool::thread_pool(int num_threads)

Effects: Creates an executor that runs closures on num_threads threads.

Throws: system_error if the threads can't be created and started.

Error conditions: resource_unavailable_try_again — the system lacked the necessary resources to create another thread, or the system-imposed limit on the number of threads in a process would be exceeded.

thread_pool::~thread_pool()
serial_executor

`<experimental/serial_executor>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class serial_executor;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```
Class serial_executor is an adaptor that runs its closures by scheduling them on another (not necessarily single-threaded) executor. It runs added closures inside a series of closures added to an underlying executor in such a way that the closures execute serially. For any two closures c1 and c2 added to a serial_executor e, either the completion of c1 happens before the execution of c2 begins, or vice versa. If e.add(c1) happens before e.add(c2), then c1 is executed before c2.

```cpp
class serial_executor : public executor {
public:
  explicit serial_executor(executor& underlying_executor);
  virtual ~serial_executor();
  executor& underlying_executor();
  // [executor methods omitted]
};
```
serial_executor::serial_executor(executor& underlying_executor)

Effects: Creates a serial_executor that executes closures in FIFO order by passing them to underlying_executor.

[ Note: Several serial_executor objects may share a single underlying executor. — end note ]

serial_executor::~serial_executor()

If a serial_executor is destroyed inside a closure running on that serial_executor object, the behavior is undefined.

executor& serial_executor::underlying_executor()

Returns: the underlying executor that was passed to the constructor.
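One way to realize the FIFO guarantee is to keep a private queue and hand the underlying executor at most one closure at a time, resubmitting until the queue drains. Everything below (serial_adaptor, run_now_executor, the local executor stand-in) is illustrative naming, not part of the proposal:

```cpp
#include <functional>
#include <mutex>
#include <queue>

// Local stand-in for the proposed abstract base class.
class executor {
public:
  virtual ~executor() {}
  virtual void add(std::function<void()> closure) = 0;
};

// Immediate executor used below for demonstration.
class run_now_executor : public executor {
public:
  void add(std::function<void()> closure) override { closure(); }
};

// Runs closures serially on a possibly-concurrent executor by scheduling
// at most one wrapper closure at a time.
class serial_adaptor : public executor {
public:
  explicit serial_adaptor(executor& underlying) : underlying_(underlying) {}

  void add(std::function<void()> closure) override {
    bool idle;
    {
      std::lock_guard<std::mutex> lk(m_);
      queue_.push(std::move(closure));
      idle = !running_;
      running_ = true;
    }
    if (idle) schedule_next();
  }

  executor& underlying_executor() { return underlying_; }

private:
  void schedule_next() {
    underlying_.add([this] {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lk(m_);
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // completion happens-before the next closure starts
      bool more;
      {
        std::lock_guard<std::mutex> lk(m_);
        more = !queue_.empty();
        running_ = more;
      }
      if (more) schedule_next();
    });
  }

  executor& underlying_;
  std::mutex m_;
  std::queue<std::function<void()>> queue_;
  bool running_ = false;
};
```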
loop_executor

`<experimental/loop_executor>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class loop_executor;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```

Class loop_executor is a single-threaded executor that executes closures by taking control of a host thread. Closures are executed via one of three closure-executing methods: loop(), run_queued_closures(), and try_run_one_closure(). Closures are executed in FIFO order. Closure-executing methods may not be called concurrently with each other, but may be called concurrently with other member functions.

```cpp
class loop_executor : public executor {
public:
  loop_executor();
  virtual ~loop_executor();

  void loop();
  void run_queued_closures();
  bool try_run_one_closure();
  void make_loop_exit();
  // [executor methods omitted]
};
```
loop_executor::loop_executor()

Effects: Creates a loop_executor object. Does not spawn any threads.

loop_executor::~loop_executor()

Effects: Destroys the loop_executor object. Any closures that haven't been executed by a closure-executing method when the destructor runs will never be executed.

void loop_executor::loop()

Effects: Runs closures on the calling thread until make_loop_exit() is called.

void loop_executor::run_queued_closures()

Effects: Runs the closures that were already queued when this function was called until make_loop_exit() is called. Does not execute any additional closures that have been added after this function is called. Invoking make_loop_exit() from within a closure run by run_queued_closures() does not affect the behavior of subsequent closure-executing methods.

[ Note: This disallows the implementation

```cpp
void run_queued_closures() {
  add([]{ make_loop_exit(); });
  loop();
}
```

because that would cause early exit from a subsequent invocation of loop(). — end note ]

bool loop_executor::try_run_one_closure()

Effects: If a closure is queued, runs the next closure on the calling thread.

Returns: true if a closure was run, otherwise false.

void loop_executor::make_loop_exit()

Effects: Causes loop() or run_queued_closures() to finish executing closures and return as soon as the current closure has finished. There is no effect if loop() or run_queued_closures() isn't currently executing.

[ Note: make_loop_exit() is typically called from a closure. After a closure-executing method has returned, it is legal to call another closure-executing function. — end note ]
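The closure-executing methods can be sketched as follows. simple_loop_executor is an illustrative stand-in; unlike the specified loop(), this version stops when the queue drains rather than blocking for more work:

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

// Single-threaded illustration of the loop_executor methods; the mutex
// allows add() to be called concurrently with the executing methods.
class simple_loop_executor {
public:
  void add(std::function<void()> closure) {
    std::lock_guard<std::mutex> lk(m_);
    queue_.push_back(std::move(closure));
  }

  // Runs closures until make_loop_exit() is called.
  // (Illustration only: stops when drained instead of blocking for work.)
  void loop() {
    exit_ = false;
    while (!exit_) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lk(m_);
        if (queue_.empty()) break;
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }

  // Runs only the closures that were queued when this was called.
  void run_queued_closures() {
    std::size_t n;
    {
      std::lock_guard<std::mutex> lk(m_);
      n = queue_.size();
    }
    exit_ = false;
    while (n-- > 0 && !exit_) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lk(m_);
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }

  bool try_run_one_closure() {
    std::function<void()> task;
    {
      std::lock_guard<std::mutex> lk(m_);
      if (queue_.empty()) return false;
      task = std::move(queue_.front());
      queue_.pop_front();
    }
    task();
    return true;
  }

  void make_loop_exit() { exit_ = true; }

private:
  std::mutex m_;
  std::deque<std::function<void()>> queue_;
  bool exit_ = false;
};
```

Note how run_queued_closures() snapshots the queue length first, so closures added during the run are left for a later closure-executing call.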
inline_executor

`<experimental/inline_executor>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class inline_executor;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```

Class inline_executor is a simple executor which intrinsically provides only the add() interface: it performs no queuing and instead immediately executes work on the calling thread. It is effectively an adapter over the executor interface that keeps everything in the caller's context.

```cpp
class inline_executor : public executor {
public:
  explicit inline_executor();
  // [executor methods omitted]
};
```

inline_executor::inline_executor()

Effects: Creates an inline_executor object. Each add() call is serviced by immediately executing the provided function in the caller's thread.
thread_per_task_executor

`<experimental/thread_per_task_executor>` synopsis

```cpp
namespace std {
namespace experimental {
inline namespace concurrency_v1 {

  class thread_per_task_executor;

} // namespace concurrency_v1
} // namespace experimental
} // namespace std
```

Class thread_per_task_executor is a simple executor that executes each task (closure) on its own std::thread instance.

```cpp
class thread_per_task_executor : public executor {
public:
  explicit thread_per_task_executor();
  ~thread_per_task_executor();
  // [executor methods omitted]
};
```

thread_per_task_executor::thread_per_task_executor()

thread_per_task_executor::~thread_per_task_executor()
std::future<T> and Related APIs

The extensions proposed here are an evolution of the functionality of std::future and std::shared_future. The extensions enable wait-free composition of asynchronous operations.

future

To the class declaration of future found in the C++ Standard Library, add the following to the public interface:

```cpp
bool is_ready() const;

future(future<future<R>>&& rhs) noexcept;

template<typename F>
see below then(F&& func);

template<typename F>
see below then(executor& ex, F&& func);

template<typename F>
see below then(launch policy, F&& func);
```
future(future<future<R>>&& rhs) noexcept;

Effects: Constructs a future object by moving the instance referred to by rhs and unwrapping the inner future.

Postconditions: valid() returns the same value as rhs.valid() prior to the constructor invocation. After the constructor returns, rhs.valid() == false.

```cpp
template<typename F>
see below then(F&& func);

template<typename F>
see below then(executor& ex, F&& func);

template<typename F>
see below then(launch policy, F&& func);
```
Notes: The first function takes a callable object which accepts a future object as a parameter. The second function takes an executor as the first parameter and a callable object as the second parameter. The third function takes a launch policy as the first parameter and a callable object as the second parameter.

Effects:

- The continuation INVOKE(DECAY_COPY(std::forward<F>(func))) is called when the object's shared state is ready (has a value or exception stored).
- The continuation launches according to the specified launch policy or executor.
- When an executor or launch policy is not provided, the continuation inherits the parent's launch policy or executor.
- Any value returned from the continuation is stored as the result in the shared state of the resulting future. Any exception propagated from the execution of the continuation is stored as the exceptional result in the shared state of the resulting future.
- If the parent was created with std::promise or with a packaged_task (and thus has no associated launch policy), the continuation behaves the same as the third overload with a policy argument of launch::async | launch::deferred and the same argument for func.
- If the parent has a policy of launch::deferred, then it is filled by calling wait() or get() on the resulting future.

[ Example:

```cpp
auto f1 = async(launch::deferred, [] { return 1; });
auto f2 = f1.then([](future<int> f) { return 2; });
f2.wait(); // execution of f1 starts here, followed by f2
```

— end example ]
The return type of then depends on the return type of the closure func as defined below:

- When result_of_t<decay_t<F>()> is future<R>, the function returns future<R>.
- Otherwise, the function returns future<result_of_t<decay_t<F>()>>.

[ Note: The first rule above is called implicit unwrapping. Without this rule, the return type of then taking a closure returning a future<R> would have been future<future<R>>. This rule avoids such nested future objects. The type of f2 below is future<int> and not future<future<int>>:

```cpp
future<int> f1 = g();
future<int> f2 = f1.then([](future<int> f) {
  future<int> f3 = h();
  return f3;
});
```

— end note ]
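The proposed then() cannot be written on top of today's std::future without blocking, but its chaining shape can be approximated with a deferred std::async that waits on the antecedent. then_approx is an illustrative name; unlike the proposal, the wait happens lazily when the returned future is waited upon, and no implicit unwrapping is performed:

```cpp
#include <future>
#include <utility>

// Blocking approximation of future::then: the continuation receives the
// antecedent future once it is ready. C++14 init-capture moves the
// antecedent into the deferred task.
template <class T, class F>
auto then_approx(std::future<T> f, F func) {
  return std::async(std::launch::deferred,
      [f = std::move(f), func]() mutable {
        f.wait();                   // make the antecedent ready
        return func(std::move(f));  // hand it to the continuation
      });
}
```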
Postconditions: The future object is moved to the parameter of the continuation function. valid() == false on the original future object immediately after it returns.

bool is_ready() const;

Returns: true if the shared state is ready, false if it isn't.

shared_future
To the class declaration of shared_future found in the C++ Standard Library, add the following to the public interface:

```cpp
bool is_ready() const;

template<typename F>
see below then(F&& func);

template<typename F>
see below then(executor& ex, F&& func);

template<typename F>
see below then(launch policy, F&& func);
```
```cpp
template<typename F>
see below shared_future::then(F&& func);

template<typename F>
see below shared_future::then(executor& ex, F&& func);

template<typename F>
see below shared_future::then(launch policy, F&& func);
```

Notes: The first function takes a callable object which accepts a shared_future object as a parameter. The second function takes an executor as the first parameter and a callable object as the second parameter. The third function takes a launch policy as the first parameter and a callable object as the second parameter.

Effects:

- The continuation INVOKE(DECAY_COPY(std::forward<F>(func))) is called when the object's shared state is ready (has a value or exception stored).
- Any value returned from the continuation is stored as the result in the shared state of the resulting future. Any exception propagated from the execution of the continuation is stored as the exceptional result in the shared state of the resulting future.
- If the parent was created with std::promise (and thus has no associated launch policy), the continuation behaves the same as the third function with a policy argument of launch::async | launch::deferred and the same argument for func.
- If the parent has a policy of launch::deferred, then it is filled by calling wait() or get() on the resulting shared_future.
Returns: A future that refers to the shared state created by the continuation. See the example in future::then.

The return type of then depends on the return type of the closure func as defined below:

- When result_of_t<decay_t<F>()> is future<R>, the function returns future<R>.
- Otherwise, the function returns future<result_of_t<decay_t<F>()>>.

This parallels future::then; see the notes on the future::then return type.

Postconditions: The shared_future passed to the continuation function is a copy of the original shared_future. valid() == true on the original shared_future object.

bool is_ready() const;

Returns: true if the shared state is ready, false if it isn't.

when_all
A new section 30.6.10 shall be inserted at the end of the clause:

```cpp
template <class InputIterator>
see below when_all(InputIterator first, InputIterator last);

template <typename... T>
see below when_all(T&&... futures);
```
Requires: T is of type future<R> or shared_future<R>.

Notes: There are two variations of when_all. The first version takes a pair of InputIterators. The second takes any arbitrary number of future<R0> and shared_future<R1> objects, where R0 and R1 need not be the same type.

Calling a variation of when_all where InputIterator first equals last returns a future with an empty vector that is immediately ready. Calling when_all with no arguments returns a future<tuple<>> that is immediately ready.

Effects: Each future and shared_future is waited upon and then copied into the collection of the output (returned) future, maintaining the order of the futures in the input collection. The future returned by when_all will not throw an exception, but the futures held in the output collection may.

Returns:

- future<tuple<>> if when_all is called with zero arguments.
- future<vector<future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields future<R>. R may be void. The order of the futures in the output vector will be the same as given by the input iterator.
- future<vector<shared_future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields shared_future<R>. R may be void. The order of the futures in the output vector will be the same as given by the input iterator.
- future<tuple<future<R0>, future<R1>, future<R2>...>> if inputs are fixed in number. The inputs can be any arbitrary number of future and shared_future objects. The type of the element at each position of the tuple corresponds to the type of the argument at the same position. Any of R0, R1, R2, etc. may be void.

Postconditions:

- All input future<T>s valid() == false.
- All input shared_future<T> valid() == true.
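The iterator-pair overload's observable behavior (wait for every input, move them into a vector, order preserved, inputs invalidated) can be approximated in C++14. when_all_approx is an illustrative name, and unlike the proposal this sketch performs its waiting lazily in a deferred task:

```cpp
#include <future>
#include <utility>
#include <vector>

// Blocking approximation of when_all(first, last) for future<R> inputs:
// waits for every input, then hands them all back, order preserved.
template <class InputIterator>
auto when_all_approx(InputIterator first, InputIterator last)
    -> std::future<std::vector<typename InputIterator::value_type>> {
  using Future = typename InputIterator::value_type;
  std::vector<Future> inputs;
  for (; first != last; ++first)
    inputs.push_back(std::move(*first));  // invalidates the originals
  return std::async(std::launch::deferred,
      [inputs = std::move(inputs)]() mutable {
        for (auto& f : inputs) f.wait();  // all inputs become ready
        return std::move(inputs);
      });
}
```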
when_any

A new section 30.6.11 shall be inserted at the end of the clause:

```cpp
template <class InputIterator>
see below when_any(InputIterator first, InputIterator last);

template <typename... T>
see below when_any(T&&... futures);
```
Requires: T is of type future<R> or shared_future<R>.

Notes: There are two variations of when_any. The first version takes a pair of InputIterators. The second takes any arbitrary number of future<R> and shared_future<R> objects, where R need not be the same type.

Calling a variation of when_any where InputIterator first equals last returns a future with an empty vector that is immediately ready. Calling when_any with no arguments returns a future<tuple<>> that is immediately ready.

Effects: Each future and shared_future is waited upon. When at least one is ready, all the futures are copied into the collection of the output (returned) future, maintaining the order of the futures in the input collection. The future returned by when_any will not throw an exception, but the futures held in the output collection may.

Returns:

- future<tuple<>> if when_any is called with zero arguments.
- future<vector<future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields future<R>. R may be void. The order of the futures in the output vector will be the same as given by the input iterator.
- future<vector<shared_future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields shared_future<R>. R may be void. The order of the futures in the output vector will be the same as given by the input iterator.
- future<tuple<future<R0>, future<R1>, future<R2>...>> if inputs are fixed in number. The inputs can be any arbitrary number of future and shared_future objects. The type of the element at each position of the tuple corresponds to the type of the argument at the same position. Any of R0, R1, R2, etc. may be void.

Postconditions:

- All input future<T>s valid() == false.
- All input shared_future<T> valid() == true.
when_any_back

A new section 30.6.12 shall be inserted at the end of the clause:

```cpp
template <class InputIterator>
see below when_any_back(InputIterator first, InputIterator last);
```

Requires: InputIterator's value type shall be convertible to future<R> or shared_future<R>. All R types must be the same.
Notes: when_any_back takes a pair of InputIterators. Calling when_any_back where InputIterator first equals last returns a future with an empty vector that is immediately ready.

Effects: Each future and shared_future is waited upon. When at least one is ready, all the futures are copied into the collection of the output (returned) future. The future or shared_future that was first detected as being ready swaps its position with that of the last element of the result collection, so that the ready future or shared_future may be identified in constant time. Only one future or shared_future is thus moved. The future returned by when_any_back will not throw an exception, but the futures held in the output collection may.

Returns:

- future<vector<future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields future<R>. R may be void.
- future<vector<shared_future<R>>> if the input cardinality is unknown at compile time and the iterator pair yields shared_future<R>. R may be void.

Postconditions:

- All input future<T>s valid() == false.
- All input shared_future valid() == true.

make_ready_future
A new section 30.6.13 shall be inserted at the end of the clause:

```cpp
template <typename T>
future<decay_t<T>> make_ready_future(T&& value);

future<void> make_ready_future();
```

Effects: The value that is passed in to the function is moved to the shared state of the returned future if it is an rvalue. Otherwise the value is copied to the shared state of the returned future.

Returns:

- future<decay_t<T>>, if the function is given a value of type T.
- future<void>, if the function is not given any input.

Postconditions:

- For the returned future<decay_t<T>>, valid() == true.
- For the returned future<decay_t<T>>, is_ready() == true.
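The specified copy/move and readiness behavior falls out directly of std::promise, whose shared state is made ready by set_value before the future is handed back. make_ready_future_approx is an illustrative stand-in:

```cpp
#include <future>
#include <type_traits>
#include <utility>

// Emulation of make_ready_future: publish the value through a promise so
// the returned future is valid and already ready.
template <class T>
std::future<typename std::decay<T>::type> make_ready_future_approx(T&& value) {
  std::promise<typename std::decay<T>::type> p;
  p.set_value(std::forward<T>(value));  // moves rvalues, copies lvalues
  return p.get_future();
}

// Zero-argument form mirrors the future<void> overload.
std::future<void> make_ready_future_approx() {
  std::promise<void> p;
  p.set_value();
  return p.get_future();
}
```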
.async
Change the introductory paragraph as follows:

The function template async provides a mechanism to launch a function potentially in a new thread and provides the result of the function in a future object with which it shares a shared state.

```cpp
template <class F, class... Args>
future<result_of_t<decay_t<F>(decay_t<Args>...)>>
async(F&& f, Args&&... args);

template <class F, class... Args>
future<result_of_t<decay_t<F>(decay_t<Args>...)>>
async(launch policy, F&& f, Args&&... args);

template <class F, class... Args>
future<result_of_t<decay_t<F>(decay_t<Args>...)>>
async(executor& ex, F&& f, Args&&... args);
```

Change the Effects paragraph as follows:
Effects: The first function behaves the same as a call to the second function with a policy argument of launch::async | launch::deferred and the same arguments for F and Args. The second and third functions create a shared state that is associated with the returned future object. The further behavior of the second function depends on the policy argument as follows:

- If policy & launch::async is non-zero — calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) (20.8.2, 30.3.1.2) as if in a new thread of execution represented by a thread object, with the calls to DECAY_COPY() being evaluated in the thread that called async. Any return value is stored as the result in the shared state. Any exception propagated from the execution of INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...) is stored as the exceptional result in the shared state. The thread object is stored in the shared state and affects the behavior of any asynchronous return objects that reference that state.
- If policy & launch::deferred is non-zero — stores DECAY_COPY(std::forward<F>(f)) and DECAY_COPY(std::forward<Args>(args))... in the shared state. These copies of f and args constitute a deferred function. Invocation of the deferred function evaluates INVOKE(std::move(g), std::move(xyz)) where g is the stored value of DECAY_COPY(std::forward<F>(f)) and xyz is the stored copy of DECAY_COPY(std::forward<Args>(args)).... The shared state is not made ready until the function has completed. The first call to a non-timed waiting function (30.6.4) on an asynchronous return object referring to this shared state shall invoke the deferred function in the thread that called the waiting function. Once evaluation of INVOKE(std::move(g), std::move(xyz)) begins, the function is no longer considered deferred.

[ Note: When the launch policy is launch::async | launch::deferred, implementations should defer invocation or the selection of the policy when no more concurrency can be effectively exploited. — end note ]
For the third function, the executor::add() function is given a function<void()> which calls INVOKE(DECAY_COPY(std::forward<F>(f)), DECAY_COPY(std::forward<Args>(args))...). The implementation of the executor is decided by the programmer.