Document Number:	N3113=10-0103
Date:	2010-08-18
Project:	Programming Language C++

Peter Sommerlad <peter.sommerlad@hsr.ch>

N3113: Async Launch Policies (CH 36)

Introduction

Limiting async's launch strategies to two concrete strategies (sync, async) plus a third value meaning "either one" makes it hard for vendors to provide better strategies in the future, and hard for users to write code that is portable across the available strategies.

A bitmask type for the async launch strategy seems more suitable than a 3-way enum. However, adoption of the bitmask requirements (GB53) could not be voted on in Rapperswil, although it came close, and it is expected to be voted in at Batavia. Further discussion provided the insight that an enum with corresponding overloaded bit operators should be chosen.

Problem

Providing only three possible values for the enum launch and saying that launch::any means either launch::sync or launch::async is very restrictive. It hinders future implementers from providing clever infrastructures that can be used simply by calling async(launch::any,...). There is also no hook for an implementation to add alternatives to the launch enumeration, and no useful means to combine them (i.e. to interpret them like flags). We believe something like async(launch::sync | launch::async, ...) should be allowed, and it becomes especially useful if one can also say something like async(launch::any & ~launch::sync, ...). This flexibility might limit the features usable in the function called through async(), but it opens a path to profiting effortlessly from improved hardware and software without complicating the programming model for code that just uses async(launch::any,...).

launch::sync and launch::async are hard to distinguish visually. In addition, launch::sync is not about synchronous execution but about deferring the function execution until its result is really wanted, which may never happen. Therefore this document also suggests renaming the launch policy launch::sync to launch::deferred.
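The "may never happen" case can be made concrete with a minimal sketch. It assumes a library that already provides the policy under the proposed name launch::deferred; the function names ran_without_wait, ran_after_get and check_deferred are purely illustrative.

```cpp
#include <cassert>
#include <future>

// Illustrative helper: launches a deferred task and drops the future
// without ever asking for the result.
bool ran_without_wait() {
    bool ran = false;
    {
        // Nothing is executed here; the callable is only stored.
        std::future<void> f =
            std::async(std::launch::deferred, [&ran] { ran = true; });
        // f goes out of scope without get() or wait().
    }
    return ran;
}

// Illustrative helper: the same task, but the result is requested.
bool ran_after_get() {
    bool ran = false;
    std::future<void> f =
        std::async(std::launch::deferred, [&ran] { ran = true; });
    f.get();  // the deferred function is invoked here, in this thread
    return ran;
}

void check_deferred() {
    assert(!ran_without_wait());  // the task was never run
    assert(ran_after_get());      // the task ran inside get()
}
```

Nothing in the first helper is synchronous in the usual sense, which is the reason the name sync was considered misleading.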

Discussion

CH 36 provided the following proposal:

Change in 30.6.1 'enum class launch' to allow further implementation defined values and provide the following bit-operators on the launch values (operator|, operator&, operator~ delivering a launch value). Note: a possible implementation might use an unsigned value to represent the launch enums, but we shouldn't limit the standard to just 32 or 64 available bits in that case and also should keep the launch enums in their own enum namespace.

Change [future.async] p3 according to the changes to enum launch. Change the bullet for launch::any to "the implementation may choose any of the policies it provides." Note: this can mean that an implementation may restrict the called function to take all required information by copy in case it will be called in a different address space or even on a different processor type. To ensure that a call is performed either like launch::async or like launch::sync as described, one should call async(launch::sync|launch::async,...).

Discussion in Rapperswil:

The discussion discovered that the launch enum served two aspects. On the one hand, there is an implementation view, where an "enum bit" can denote a specific async launch strategy, e.g., run the function in a thread pool or on a GPU. Such a specific launch mechanism imposes specific requirements on the underlying function to be run asynchronously, such as copying all input values, or only referring to read-only data to avoid races. On the other hand, there is a user's view, where the enum should specify the requirements the user can guarantee for the asynchronously called function, and the implementation should be able to select an appropriate strategy, possibly even dynamically for very clever implementations (see below).

To allow this dual nature the enum should provide a hook for implementers to extend it and for users to combine enum values in a useful way, e.g., with a meaning "anything but sync", or "I don't care, because the function is a pure function and would not give any data races or undefined behavior".

The discussion also covered whether launch::any is a good name for (launch::sync|launch::async) or for "everything implementers think is safe". However, the name "default" is a keyword and thus unavailable. Nevertheless, the "default" used by the async() function overload without a launch strategy should at least be (launch::sync|launch::async).

Minutes from Discussion in Rapperswil:

existing launch enums: sync, async
    
vendor and future launch enums: separate_process, other_endian, gpu

possible launch enum sets:
   nothing_outside_standard = async | sync
   no_restrictions_beyond_the_standard = 
   what_implementers_think_is_safe =
   everything_implementers_have = 
    
launch::default <= launch::no_restrictions_beyond_the_standard
   
launch::any = ?
   
std::async( task );
   
std::async( std::launch::async | std::launch::gpu, task );

Proposed text: The value launch::default is at least sync|async. Any vendor extensions shall place no additional restrictions on task interaction.

Further discussion on the reflector and by email provided input and changes.

Acknowledgements

Thanks to Detlef Vollmann, Lawrence Crowl, Pete Becker, Alberto Ganesh Barbati, Anthony Williams, Daniel Krügler, Hans Boehm, Michael Wang, Bjarne Stroustrup and the Concurrency subgroup for their comments and contributions to this paper.

Resolved Issues

This paper addresses and details the proposed resolution of FDIS NB comment CH 36. It uses terms of art to be introduced by a paper by Lawrence Crowl that is as yet unnumbered at the time of this writing.

Proposed Changes

Make launch a bitmask type according to N3110:

In 30.6.1 p1 replace

~~enum class launch { any, async, sync };~~

with

enum class launch : unspecified {
  async = unspecified power of 2,
  deferred = unspecified power of 2,
  implementation defined
};
    
launch  operator|( launch, launch );
launch  operator&( launch, launch );
launch  operator^( launch, launch );
launch  operator~( launch ); 
launch& operator|=( launch&, launch );
launch& operator&=( launch&, launch );
launch& operator^=( launch&, launch );
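One way an implementation might satisfy this synopsis is sketched below. The namespace sketch, the choice of unsigned as underlying type, the concrete bit values, the extension enumerator gpu (a name borrowed from the Rapperswil minutes) and the helper has() are all assumptions made for illustration. The proposal deliberately leaves the underlying type and values unspecified.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not part of the proposal

enum class launch : unsigned {  // underlying type is an assumption
    async    = 1u << 0,
    deferred = 1u << 1,
    gpu      = 1u << 2          // hypothetical vendor extension
};

using bits = std::underlying_type<launch>::type;

constexpr launch operator|(launch a, launch b) {
    return static_cast<launch>(static_cast<bits>(a) | static_cast<bits>(b));
}
constexpr launch operator&(launch a, launch b) {
    return static_cast<launch>(static_cast<bits>(a) & static_cast<bits>(b));
}
constexpr launch operator^(launch a, launch b) {
    return static_cast<launch>(static_cast<bits>(a) ^ static_cast<bits>(b));
}
constexpr launch operator~(launch a) {
    return static_cast<launch>(~static_cast<bits>(a));
}
inline launch& operator|=(launch& a, launch b) { return a = a | b; }
inline launch& operator&=(launch& a, launch b) { return a = a & b; }
inline launch& operator^=(launch& a, launch b) { return a = a ^ b; }

// Hypothetical helper: true if every bit of `wanted` is set in `policy`,
// i.e. the test (policy & wanted) == wanted.
constexpr bool has(launch policy, launch wanted) {
    return (policy & wanted) == wanted;
}

inline void demo() {
    launch standard = launch::async | launch::deferred;
    assert(has(standard, launch::async));
    assert(!has(standard, launch::gpu));
    // "anything but deferred", as motivated in the Problem section
    assert((standard & ~launch::deferred) == launch::async);
}

}  // namespace sketch
```

Because the enum is scoped, the enumerators stay in their own namespace and the operators have to be supplied explicitly, which is why the synopsis lists them.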

At the end of 30.6.1 add

The enum type launch is an implementation-defined bitmask type (17.5.2.1.3). [ Note: implementations are encouraged to use bits for individual launch policies. For example, policy launch::deferred has a value of a power of 2. Furthermore, implementations can provide bitmasks to specify restrictions on task interaction by functions launched by async() applicable to a corresponding subset of available launch policies. end note ]

Change 30.6.4 paragraph 2 as follows:

[Note: The result can be any kind of object including a function to compute that result, as used by async when policy is launch::deferred~~sync~~. — end note ]

Specify semantics in a clearer way using terminology from 1.10 and an updated 30.4 (according to the paper by Lawrence Crowl)

Change 30.6.9 p3 as follows:

Effects: The first function behaves the same as a call to the second function with a policy argument of ~~launch::any~~ (launch::async|launch::deferred) and the same arguments for F and Args. Implementations that would like to extend the behavior of the first overload are free to do so by adding their extensions to the launch policy under the "as if" rule. The second function creates an associated asynchronous state that is associated with the returned future object. The further behavior of the second function depends on the policy argument as follows. If more than one bullet applies the implementation may choose any applicable policy.

— ( policy & launch::async ) == launch::async — executes INVOKE(decay_copy(std::forward<F>(f)), decay_copy(std::forward<Args>(args))...) (20.8.2, 30.3.1.2) as if in a new thread of execution represented by a thread object with the calls to decay_copy() being evaluated in the thread that called async. Any return value is stored as the result in the associated asynchronous state. Any exception propagated from the execution of INVOKE(decay_copy(std::forward<F>(f)), decay_copy(std::forward<Args>(args))...) is stored as the exceptional result in the associated asynchronous state. The thread object is stored in the associated asynchronous state and affects the behavior of any ~~future~~ asynchronous return objects that reference that state.

—( policy & launch::deferred ) == launch::~~sync~~deferred — Stores decay_copy(std::forward<F>(f)) and decay_copy(std::forward<Args>(args))... in the associated asynchronous state. These copies of f and args constitute a deferred function. Invocation of the deferred function evaluates INVOKE(g, xyz) where g is the stored value of decay_copy(std::forward<F>(f)) and xyz is the stored copy of decay_copy(std::forward<Args>(args)).... The associated asynchronous state is not made ready until the function has completed. The first call to a waiting function ~~waiting~~ on an asynchronous return object referring to ~~for~~ the associated asynchronous state created by this async call ~~to become ready~~ shall invoke the deferred function in the thread that called the waiting function; all other calls to waiting functions on asynchronous return objects sharing ~~for~~ the ~~same~~ associated asynchronous state created by this async call ~~to become ready~~ shall block until the deferred function has completed. [ Note: If this policy is specified together with other policies, such as when using a policy value of launch::async|launch::deferred, implementations should defer invocation or the selection of the policy when no more concurrency can be effectively exploited. —end note]

~~—launch::any — the implementation may choose either policy at any call to async. [ Note: implementations should defer invocation when no more concurrency can be effectively exploited. – end note ]~~
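The difference between the two standard policies can be observed from the thread that ends up running the task. The following is a minimal sketch, assuming a library that provides launch::async and launch::deferred as specified above; run_and_report and check_threads are illustrative names.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Illustrative helper: runs a task under the given policy and returns
// the id of the thread that actually executed it.
std::thread::id run_and_report(std::launch policy) {
    std::future<std::thread::id> f =
        std::async(policy, [] { return std::this_thread::get_id(); });
    return f.get();  // the first waiting function on this state
}

void check_threads() {
    const std::thread::id caller = std::this_thread::get_id();

    // deferred: invoked in the thread that called the waiting function
    assert(run_and_report(std::launch::deferred) == caller);

    // async: invoked as if in a new thread of execution
    assert(run_and_report(std::launch::async) != caller);
}
```

With a combined policy such as launch::async|launch::deferred either outcome is permitted, so portable code should not depend on which thread runs the task.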

Change 30.6.9 p5 as follows:

Synchronization: Regardless of provided policy

the invocation of async happens before (1.10) the invocation of f. [ Note: this statement applies even when the corresponding future object is moved to another thread. —end note ]

the completion of the function f happens-before (1.10) the calling thread makes ready the associated asynchronous state. [ Note: f might not be called at all, so its completion might never happen. – end note]

the return from the invocation of async happens-before (1.10) the return from the last function that releases the associated asynchronous state.

If (policy & launch::async) == launch::async ~~the invocation is not deferred,~~

a call to a waiting function on an asynchronous return object that shares the associated asynchronous state created by this async call shall block until the associated thread has completed.
the join() on the created thread object happens-before (1.10) the first function that successfully detects the ready status of the associated asynchronous state returns or happens-before (1.10) the return from the last function that ~~that gives up the last reference to~~ releases the associated asynchronous state, whichever happens first. ~~If the invocation is deferred, the completion of the invocation of the deferred function happens-before the calls to the waiting functions return.~~
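A practical consequence of these happens-before guarantees is that ordinary, non-atomic data can be handed to the task and read back after a waiting function returns, regardless of policy. The sketch below assumes a conforming library; sum_in_task and check_sync are illustrative names.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Illustrative helper: the task reads `input` and writes `result`,
// both plain objects without any atomics or mutexes.
int sum_in_task(std::launch policy) {
    std::vector<int> input{1, 2, 3, 4};  // written before the call to async
    int result = 0;

    std::future<void> f = std::async(policy, [&input, &result] {
        // The invocation of async happens before this invocation,
        // so the contents of input are visible here.
        result = std::accumulate(input.begin(), input.end(), 0);
    });

    f.wait();       // returns only after the task has completed
    return result;  // the task's write is visible after wait()
}

void check_sync() {
    assert(sum_in_task(std::launch::async) == 10);
    assert(sum_in_task(std::launch::deferred) == 10);
    assert(sum_in_task(std::launch::async | std::launch::deferred) == 10);
}
```

The references captured by the lambda stay valid only because the function waits before returning; a task that outlives its captured objects would need to take them by copy.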

Change the note in 30.6.9 p 9 as follows:

[ Note: line #1 might not result in concurrency because the async call uses the default ~~launch::any~~ policy, which may use launch::deferred~~sync~~, in which case the lambda might not be invoked until the get() call; in that case, work1 and work2 are called on the same thread and there is no concurrency. – end note ] —end example ]
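Code that needs to know which choice the implementation made for the default policy can ask the future. The following sketch assumes a library in which wait_for reports std::future_status::deferred for a deferred function that has not been started; that facility is not specified by this paper. The names is_deferred, answer and check_policies are illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Illustrative helper: true if the task behind f is a deferred function
// that has not been started and will run inside the first get() or wait().
template <class T>
bool is_deferred(const std::future<T>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

int answer() { return 42; }

void check_policies() {
    std::future<int> d = std::async(std::launch::deferred, answer);
    assert(is_deferred(d));   // explicitly deferred

    std::future<int> a = std::async(std::launch::async, answer);
    assert(!is_deferred(a));  // runs as if in a new thread

    std::future<int> either = std::async(answer);  // default policy
    bool lazy = is_deferred(either);  // the implementation's choice
    (void)lazy;                       // both outcomes are conforming

    assert(d.get() == 42);
    assert(a.get() == 42);
    assert(either.get() == 42);
}
```

If concurrency is required rather than merely hoped for, requesting launch::async explicitly avoids the question altogether.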