Document Number: | P0054R00 |
---|---|
Date: | 2015-09-12 |
Project: | Programming Language C++, Evolution |
Revises: | none |
Reply to: | gorn@microsoft.com |
An experimental version of the compiler supporting coroutines (aka resumable functions N4134, N4286, N4402) has been out in the wild for nearly a year now. This paper proposes changes based on the feedback received from customers experimenting with it, the feedback from WG21 committee members, and lessons learned from converting a large shipping application to use coroutines for asynchronous operations. The updated wording is provided in a separate paper P0057R00.
In revision 4 of the resumable function proposal (N4402), the requirements on the return type of the initial_suspend, final_suspend and yield_value member functions were changed from having to return an Awaitable type to a type contextually convertible to bool.
Before N4402:
struct coro {
    struct promise_type {
        std::experimental::suspend_never initial_suspend() { return {}; }
        std::experimental::suspend_always final_suspend() { return {}; }
        ...
    };
};
After N4402:
struct coro {
    struct promise_type {
        bool initial_suspend() { return false; }
        bool final_suspend() { return false; }
        ...
    };
};
While this made simple code slightly simpler, it also made it impossible to write correct code for less trivial coroutine scenarios. Consider the following cases.
Imagine a case where an invocation of a coroutine immediately schedules it to execute on a thread pool and yields control back to the caller. Murphy's law guarantees that a scheduler will execute the coroutine to completion and deallocate all the memory associated with the coroutine state even before the initial_suspend call returns.
Prior to N4402, the initial suspend point was defined in terms of operator await, i.e. await $promise.initial_suspend(). The await operator mechanics took care of the race in which the resumption of the coroutine happens before await_suspend completes, by preparing the coroutine for resumption prior to the invocation of await_suspend. A check via await_ready was used to avoid this potentially expensive preparation if the result of the computation was already available. (Expensive here means a few extra store operations, such as saving non-volatile registers in use and storing an address or an index of the resume point.)
To handle the race, N4402's initial_suspend() would need to add logic similar to that of await. Hence we propose to go back to defining the initial suspend point via await $promise.initial_suspend().
Imagine a case where a library developer would like to combine the allocation of a future shared state (N4527/[futures.state]) with the allocation of the coroutine state in the case when the coroutine returns the future.
future<int> deep_thought() {
    await 7'500'000'000h;
    return 42;
}
This will require an atomic reference count in the shared state / promise. One reference will be held by the future, to make sure it can examine the shared state even when the coroutine has completed execution, and another reference will be from the coroutine itself, since it does not want its memory deallocated by the future destructor while in the middle of the execution.
The future destructor will decrement the reference count in the shared state and, if the count goes to zero, will invoke the destroy() member of the coroutine_handle to free the state. Similarly, when the coroutine reaches the final suspend point, it decrements the reference count and, if it happens to be zero, meaning the future is gone and no longer requires the shared state, the coroutine should not suspend at the final point but proceed straight to the end and destroy its state.
However, it is possible that the future's destructor decrements the reference count immediately after final_suspend() has checked that the reference count is not zero, but before final_suspend() has returned. This is very similar to the race described in the previous section and the solution is the same: we need to rely on the await operator to resolve it. Here is what a correct final_suspend would look like.
struct promise_type : shared_state<T> { // refcount is in the shared state
    auto final_suspend() {
        struct awaiter {
            promise_type * me;
            bool await_ready() { return false; }
            void await_resume() {}
            bool await_suspend(coroutine_handle<>) {
                auto need_suspending = (me->decrement_refcount() > 0);
                return need_suspending;
            }
        };
        return awaiter{this};
    }
    ...
};
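The awaiter above is only correct if dropping a reference and learning the resulting count happen as one atomic step. The following is a minimal sketch of such a count using std::atomic. The names shared_count and need_suspending are hypothetical and are not part of the proposal; decrement_refcount mirrors the call used in the example above, under the assumption that it returns the number of references remaining.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, not part of the proposal: a reference count shared by
// the future and the coroutine. It starts at 2, one reference for each owner.
class shared_count {
    std::atomic<int> refs_{2};
public:
    // Drops one reference and returns how many remain. fetch_sub returns the
    // value held *before* the subtraction, hence the "- 1".
    int decrement_refcount() noexcept {
        return refs_.fetch_sub(1, std::memory_order_acq_rel) - 1;
    }
};

// Mirrors the decision made in await_suspend: suspend only while the other
// owner still holds a reference.
inline bool need_suspending(shared_count& c) noexcept {
    return c.decrement_refcount() > 0;
}
```

Because the decrement and the read are a single operation, exactly one of the two owners observes zero remaining references, whichever order they run in, and only that owner is responsible for freeing the state.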
Consider an asynchronous generator:
async_generator<int> quick_thinker() {
    for (;;) {
        await 1ms;
        yield 42;
    }
}
We need to coordinate between a producer, i.e. the coroutine shown above, and a consumer, that is, whoever is holding on to the async_generator. When yielding a value, the producer needs to make a determination: if the consumer is alive, give it the value and resume it; otherwise, the producer coroutine needs to cancel itself by invoking coroutine_handle::destroy() on itself. This could be implemented correctly with the pre-N4402 version. Again, the fix is to revert to pre-N4402 behavior and define yield expr in terms of await $promise.yield_value(expr), as await_suspend allows for concurrent resumption of the coroutine via either resume() or destroy(). With the fix, an implementation of yield_value would look like:
template <typename T>
struct async_generator {
    struct promise_type {
        T const * yielded_value;
        coroutine_handle<> consumer;
        ...
        auto yield_value(T const& v) {
            struct awaiter {
                promise_type * me;
                bool await_ready() { return false; }
                T const & await_resume() { return *me->yielded_value; }
                void await_suspend(coroutine_handle<> myself) {
                    ... if consumer is gone => myself.destroy();
                    ... otherwise => consumer.resume();
                }
            };
            yielded_value = &v;
            return awaiter{this};
        }
    };
    ...
};
Currently, an await expression uses a range-based-for-like lookup for three member or non-member functions called await_suspend, await_ready and await_resume. This has not always been the case. An earlier iteration of the resumable functions proposal, which never became an N-numbered paper, defined an operator await that had to return an awaitable object with member functions await_suspend, await_ready and await_resume.
Let's compare how await adapters used to look with how they look in N4134 and beyond.
auto sleep_for(chrono::system_clock::duration d) {
    struct result_t {
        chrono::system_clock::duration d;
        auto operator await() {
            struct awaiter {
                chrono::system_clock::duration duration;
                ...
                awaiter(chrono::system_clock::duration d) : duration(d){}
                bool await_ready() const { return duration.count() <= 0; }
                void await_resume() {}
                void await_suspend(std::experimental::coroutine_handle<> resume_cb){...}
            };
            return awaiter{d};
        }
    };
    return result_t{d};
}
The authors felt that this was too much boilerplate code and one more local class than desired, hence N4134 offered a range-based-for-like lookup instead of operator await. Indeed, under N4134 rules the code is simpler.
auto sleep_for(chrono::system_clock::duration d) {
    struct awaiter {
        chrono::system_clock::duration duration;
        ...
        awaiter(chrono::system_clock::duration d) : duration(d){}
        bool await_ready() const { return duration.count() <= 0; }
        void await_resume() {}
        void await_suspend(std::experimental::coroutine_handle<> resume_cb){...}
    };
    return awaiter{d};
}
However, this simplification removed one powerful ability that was enabled by operator await. It was no longer possible for a library author to rely on a temporary object on the coroutine frame that persists for the duration of the await expression and can be used to carry state between the await_ready, await_suspend and await_resume functions. The only remaining form that allowed this was an awaitable returned from a function, as in the sleep_for example above.
Consider this straightforward but incorrect awaitable adapter for boost::future.
template <typename T> bool await_ready(boost::future<T>& f) { return f.ready(); }
template <typename T> T await_resume(boost::future<T>& f) { return f.get(); }
template <typename T> void await_suspend(boost::future<T>& f, coroutine_handle<> cb) {
    f.then([cb](auto&&){ cb(); });
}
The problem is that, as of version 1.59, future.then returns a future that blocks in its destructor. Thus the coroutine, after subscribing to the completion of f with .then, will block at the closing curly brace of await_suspend, waiting for the destructor of the returned future, which blocks until that future is ready, preventing the coroutine from suspending. Though in the case of Boost we can fix .then, in the case of other libraries it may not be possible to change them to accommodate await within time constraints. Having operator await would have addressed this problem.
To make sure that the future returned from .then won't block the suspend, we need to extend its life for the duration of the await expression. With operator await we can do it easily:
template <typename T>
auto operator await(boost::future<T> & f) {
    struct awaiter {
        boost::future<T>* me;
        boost::future<void> keep_this; // holds the future returned by .then
        bool await_ready() { return me->ready(); }
        T await_resume() { return me->get(); }
        void await_suspend(coroutine_handle<> cb) {
            // a local class cannot refer to the parameter f, so go through me
            keep_this = me->then([cb](auto&&){ cb(); });
        }
    };
    return awaiter{&f, {}};
}
Another case for operator await is adapter efficiency. Imagine that we want to write a lean future that allows multiple coroutines to subscribe their awaits via lean_future's .then, and we want the subscription operation via .then to be noexcept and not perform any memory allocations. operator await makes this possible:
template <typename T>
auto operator await(lean_future<T> & f) {
    struct awaiter {
        lean_future<T>* me;
        typename lean_future<T>::intrusive_link link;
        bool await_ready() { return me->ready(); }
        T await_resume() { return me->get(); }
        void await_suspend(coroutine_handle<> cb) {
            // link the awaiter's own node into the future's list
            // (how cb is recorded in the link is elided here)
            me->then(&link);
        }
    };
    return awaiter{&f, {}};
}
Since operator await enables a library to control the temporary that lives for the duration of the await-expression, the library writer can include the intrusive_list::link in the temporary so that it can be linked directly into the intrusive list in the lean_future. That removes an allocation and a failure mode. In operating system kernel mode and in game development, those are important properties.
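To make the allocation-free subscription concrete, here is a minimal sketch of an intrusive waiter list in plain C++, without any coroutine machinery. All names (intrusive_link, waiter_list, counting_waiter) are hypothetical stand-ins chosen for illustration; they are not lean_future's actual interface, which this paper does not specify.

```cpp
#include <cassert>

// Hypothetical list node. The waiter owns the node, so the list itself
// never has to allocate anything.
struct intrusive_link {
    intrusive_link* next = nullptr;
    void (*on_ready)(intrusive_link*) = nullptr;  // invoked on completion
};

class waiter_list {
    intrusive_link* head_ = nullptr;
public:
    // Linking a caller-owned node performs no allocation and cannot fail.
    void subscribe(intrusive_link* link) noexcept {
        link->next = head_;
        head_ = link;
    }
    // Detaches every waiter, invokes its callback, and returns how many ran.
    int notify_all() {
        int count = 0;
        intrusive_link* node = head_;
        head_ = nullptr;
        while (node) {
            intrusive_link* next = node->next;  // read before the callback runs
            node->on_ready(node);
            node = next;
            ++count;
        }
        return count;
    }
};

// Example waiter that embeds the link by deriving from it. In the operator
// await setting, the awaiter temporary would play this role.
struct counting_waiter : intrusive_link {
    int hits = 0;
    counting_waiter() {
        on_ready = [](intrusive_link* l) {
            ++static_cast<counting_waiter*>(l)->hits;
        };
    }
};
```

Since the node lives inside the waiter, subscribe only rewrites two pointers, which is why it can be noexcept. The trade-off is that the waiter must outlive its membership in the list, which is exactly the lifetime guarantee that a temporary living for the duration of the await-expression provides.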
Now, some of these techniques are possible today with N4134, but only with awaitables that are temporaries returned from a function, as with sleep_for earlier in this section. Having operator await fixes the existing asymmetry whereby different awaitables have different expressive power.
We would like to bring back operator await with an improvement that will result in less boilerplate code. The proposed change is to have operator await implicitly defined for a class that has await_suspend, await_resume and await_ready; it is defined as an identity function, returning the object itself. With this approach, we can address the problems described above and retain the concise style available today.
As a bonus, it is now possible to write an await adapter for chrono::duration, which we have been sneakily using throughout this paper and which allows us to write await 10ms. Behold:
template <class Rep, class Period>
auto operator await(chrono::duration<Rep, Period> d) {
    struct awaiter {
        chrono::system_clock::duration duration;
        ...
        awaiter(chrono::system_clock::duration d) : duration(d){}
        bool await_ready() const { return duration.count() <= 0; }
        void await_resume() {}
        void await_suspend(std::experimental::coroutine_handle<> resume_cb){...}
    };
    return awaiter{d};
}
There is no good reason for yield to be a statement rather than an expression. Pre-N4402 it was an expression. N4402 made a lot of "simplifications" that are now being undone, and making yield a statement was one of them.
The suggested change here is to let yield expr and yield {expr} be expressions, not statements, with the same precedence as throw expr.
assignment-expression:
    conditional-expression
    logical-or-expression assignment-operator initializer-clause
    throw-expression
    yield-expression
This precedence allows yield to be used with the comma operator and, at the same time, allows yield 1 + 2 to be written without the surprising parse (yield 1) + 2.
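Since yield would share the grammar position of throw, the existing behavior of throw gives a preview of how operands would group. The sketch below uses throw only as an analogy; thrown_value is a hypothetical name used for illustration.

```cpp
#include <cassert>

// A throw-expression sits at the assignment-expression level, so its operand
// is the whole sum 1 + 2 rather than just 1.
inline int thrown_value() {
    try {
        throw 1 + 2;   // parsed as throw (1 + 2), not (throw 1) + 2
    } catch (int v) {
        return v;      // v == 3
    }
}
```

Under the proposed grammar, yield 1 + 2 would group the same way and hand the value 3 to yield_value.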
A side effect of this change, together with making yield_value return an awaitable as required to fix the defect described in the previous section, is that it opens the possibility for library writers to invent and implement semantics for a yield-expression returning something back into the coroutine, enabling two-way communication between the generator and the consumer.
The authors have received strongly worded feedback that it is highly undesirable to make a language feature dependent on std::allocator. Other language features rely on allocating via operator new and use overloading of operator new as a way to customize allocations for classes that require specialized allocation strategies.
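For readers less familiar with this customization point, the following minimal sketch shows class-specific operator new and operator delete on an ordinary class. The type tracked is hypothetical, and the counters reuse the names bytes_allocated and bytes_freed from the examples in this paper; their definitions here are an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Counters assumed for illustration only.
static std::size_t bytes_allocated = 0;
static std::size_t bytes_freed = 0;

struct tracked {
    int payload[4];

    // Class-specific allocation function: selected by `new tracked`.
    static void* operator new(std::size_t size) {
        bytes_allocated += size;
        return ::operator new(size);
    }
    // Sized class-specific deallocation function: selected by `delete p`.
    static void operator delete(void* p, std::size_t size) noexcept {
        bytes_freed += size;
        ::operator delete(p);
    }
};
```

No allocator type, traits specialization or rebinding is involved: the class itself states how its storage is obtained and released.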
To address this concern and bring coroutines more in line with other language features, if a coroutine requires dynamic memory allocation for its state, it will call operator new, and allocations can be customized by overloading operator new. We implemented this change and discovered that most of the user code that customized coroutine allocations with stateless allocators shrank significantly.
Before:
template <typename T, typename... Ts>
struct coroutine_traits<generator<T>, use_counting_allocator_t, Ts...> {
    template <typename U> // must not reuse the name T of the enclosing template
    struct counting_allocator {
        std::allocator<U> inner;
        using value_type = U;
        U* allocate(std::size_t n) {
            bytes_allocated += n * sizeof(U);
            return inner.allocate(n);
        }
        void deallocate(U* p, std::size_t n) {
            bytes_freed += n * sizeof(U);
            inner.deallocate(p, n);
        }
    };
    template <typename... Us>
    static auto get_allocator(Us&&...) {
        return counting_allocator<char>{};
    }
    using promise_type = typename generator<T>::promise_type;
};
After:
template <typename T, typename... Ts>
struct coroutine_traits<generator<T>, use_counting_allocator_t, Ts...> {
    struct promise_type : generator<T>::promise_type {
        void* operator new(size_t size) {
            bytes_allocated += size; // size is already a byte count
            return ::operator new(size);
        }
        void operator delete(void* p, size_t size) {
            bytes_freed += size;
            ::operator delete(p, size);
        }
    };
};
Note that in the get_allocator example, get_allocator receives all of the coroutine arguments so that, if the allocator is stateful, it can pick up the required information from the arguments. The suggested change preserves the ability to pass information to an allocation routine, but keeps the simple (non-stateful) case simple by using the following rule: if the coroutine promise defines an operator new that takes just size_t, it will be used to allocate the memory for the coroutine; otherwise, the compiler will use the new-expression of the form promise_type::operator new(required-size, all of the arguments passed to a coroutine). The latter form allows an overloaded operator new to extract the required allocator parameters.
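The second form is ordinary placement-style overloading of operator new. The sketch below shows the mechanism on a plain class, under the assumption that the extra argument is a reference to an allocation source. The types arena and frame are hypothetical and only stand in for a coroutine argument and a coroutine state.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena standing in for a stateful allocator that
// is passed to the coroutine as an argument.
struct arena {
    alignas(std::max_align_t) unsigned char buffer[256];
    std::size_t used = 0;

    void* allocate(std::size_t size) {
        // Round up so that every block stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        const std::size_t rounded = (size + a - 1) / a * a;
        if (used + rounded > sizeof(buffer)) throw std::bad_alloc();
        void* p = buffer + used;
        used += rounded;
        return p;
    }
};

struct frame {
    int data[8];

    // The extra argument after the size selects this overload.
    static void* operator new(std::size_t size, arena& a) { return a.allocate(size); }
    // Matching placement delete, used only if construction throws.
    static void operator delete(void*, arena&) noexcept {}
    // Usual delete: memory is reclaimed together with the arena.
    static void operator delete(void*) noexcept {}
};
```

With these declarations, new (a) frame obtains its storage from the arena a. The proposed rule would have the compiler supply the coroutine arguments in the same position.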
Finally, to preserve parity with N4402 with respect to allocators, we need to address coroutine operation in environments where allocation functions cannot throw. N4402 determined the need for special handling of allocations by checking whether a get_return_object_on_allocation_failure static member function was present in coroutine_traits. We suggest moving it to the coroutine promise and using the std::nothrow_t& form of operator new in this case.
Before:
struct coro {
    struct promise_type {
        coro get_return_object();
        ...
    };
};
template <typename... Args> struct coroutine_traits<coro, Args...> {
    static coro get_return_object_on_allocation_failure();
    using promise_type = coro::promise_type;
};
After:
struct coro {
    struct promise_type {
        static coro get_return_object_on_allocation_failure();
        coro get_return_object();
        ...
    };
};
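The non-throwing form behaves the same way it does for ordinary classes: the allocation function reports failure by returning a null pointer, and the caller takes a fallback path. The sketch below illustrates this outside of coroutines. The names state, simulate_out_of_memory and make_or_fallback are hypothetical, and the value -1 merely stands in for whatever get_return_object_on_allocation_failure would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical switch that simulates an exhausted heap.
static bool simulate_out_of_memory = false;

struct state {
    int value = 42;

    // Non-throwing allocation function: failure is reported as nullptr.
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
        if (simulate_out_of_memory) return nullptr;
        return ::operator new(size, std::nothrow);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
};

// Plays the role of the code emitted at the start of a coroutine: try to
// allocate, and fall back if the allocation function returns nullptr.
inline int make_or_fallback() {
    state* s = new (std::nothrow) state;
    if (!s) return -1;  // stands in for get_return_object_on_allocation_failure()
    const int v = s->value;
    delete s;
    return v;
}
```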
With these changes, not only do we remove the dependency of coroutines on std::allocator and friends, we also move most of the functionality in coroutine_traits that deals with allocation concerns into the coroutine promise, making specializing coroutine_traits unnecessary in the majority of cases. The only remaining case for using coroutine_traits is when one defines a coroutine promise for a type that belongs to some pre-existing library that cannot be altered.
This section describes some changes we are exploring at the moment but did not have time to implement and experiment with. We plan to propose them at the next meeting. They are listed here as an opportunity for early feedback.
One of the patterns in use with frameworks using .then is to pass a cancellation flag / token to a function and furnish it to every .then to facilitate cancellation.
When porting the code to use await, every await expression was wrapped with an awaitable adapter that would take an existing awaitable and augment it to check the cancellation flag and cancel the coroutine if required.
auto bytesRead = await conn.Read(buf, len);
would become
auto bytesRead = await CheckCancel(cancelToken, conn.Read(buf, len));
Adding CheckCancel at every await site is cumbersome and error prone.
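For reference, an adapter of this kind might look roughly as follows. This is a sketch under stated assumptions, not the adapter used in the ported application: the token is assumed to be a std::atomic<bool>, cancellation is assumed to be reported by throwing a hypothetical cancelled exception from await_resume, and ready_value is a trivial awaitable invented for the demonstration.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical exception type used to report cancellation.
struct cancelled : std::runtime_error {
    cancelled() : std::runtime_error("operation cancelled") {}
};

// Wraps an existing awaitable and consults the flag before and after waiting.
template <typename Awaitable>
struct check_cancel_awaiter {
    const std::atomic<bool>& cancel_requested;
    Awaitable inner;

    // If cancellation was already requested, skip the suspension entirely so
    // that await_resume can report it right away.
    bool await_ready() { return cancel_requested.load() || inner.await_ready(); }

    template <typename Handle>
    auto await_suspend(Handle h) -> decltype(inner.await_suspend(h)) {
        return inner.await_suspend(h);
    }

    auto await_resume() -> decltype(inner.await_resume()) {
        if (cancel_requested.load()) throw cancelled();
        return inner.await_resume();
    }
};

template <typename Awaitable>
check_cancel_awaiter<Awaitable> CheckCancel(const std::atomic<bool>& token, Awaitable inner) {
    return check_cancel_awaiter<Awaitable>{token, std::move(inner)};
}

// Trivial awaitable used only to exercise the adapter.
struct ready_value {
    int value;
    bool await_ready() { return true; }
    template <typename Handle> void await_suspend(Handle) {}
    int await_resume() { return value; }
};
```

The adapter itself is small. The cost lies in having to remember to apply it at every await site.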
We would like to provide the ability for the coroutine type author to specify an await_transform member in the promise_type of the coroutine. If present, every await expr in that coroutine would be treated as if it were await $promise.await_transform(expr).
Besides helping with cancellation, await_transform has other uses:
With an appropriate await_transform, a coroutine can trace/log when it is suspended, when it is resumed, whether suspension was avoided due to await_ready being true, etc. This allows debugging tools to accumulate information for asynchronous activity visualization. It can also be used for capturing traces for problem or performance analysis.
In N4402, whether await is allowed in a coroutine is tied to whether the coroutine promise defines return_value/return_void, with the argument that coroutines that await on something have an eventual return value, but generators do not. This restriction was introduced in N4402 to help detect mistakes at compile time when await is used in coroutines that don't support it.
await_transform allows a library author to trivially specify a compile-time check of whether a coroutine is allowed to use await, so the limitation introduced in N4402 is no longer required.
The Resumable Expressions paper (N4453) has a compelling example of magically transforming a function template into a coroutine depending on the OutputIterator supplied.
template <class OutputIterator> void fib(int n, OutputIterator out) {
    int a = 0;
    int b = 1;
    while (n-- > 0) {
        *out++ = a;
        auto next = a + b;
        a = b;
        b = next;
    }
}
This section sketches an idea of how coroutines could evolve to support the scenario above. The idea is simple: if a function returns an object of a type that is marked with auto await, an await is injected into the calling function. For the example above, dereferencing the iterator would return a proxy that has an overloaded operator= that returns an automatically awaited awaitable.
auto MyProxy::operator=(int output) {
    struct Awaitable auto await { ... };
    return Awaitable{...};
}
Thus the expression *out++ = a will become await (*out++ = a). The Awaitable will transfer the supplied value a to the consumer and suspend the function fib until the next value is requested.
Note that this has not been designed or implemented, and there is no immediate plan to pursue this approach.
One concern with this approach is that it interferes with the composability of awaitable expressions. If f() and g() return awaitables, we would like to be able to transform the awaitables in question prior to applying await to them. For example, evaluating await f() + await g() reduces concurrency; it would be more beneficial to execute it as await (f() + g()), where the result of + is a composite awaitable that waits until the results of both f() and g() are ready and then resumes the coroutine, providing the sum of the eventual results of f() and g().
Another concern is that it becomes nearly impossible for the reader to figure out whether a function is a coroutine or not without auditing every function call, implicit conversion and overloaded operator in the body of the function to determine whether it can return an automatically awaited awaitable.
Moreover, even though coroutines allow asynchronous code to be written nearly as simply as synchronous code, they do not eliminate the need to think about and properly design the lifetime of the asynchronous activity. Const-ref (const&) parameters that are perfectly fine to consume in a normal function may result in a crash, information disclosure and more if the function is a coroutine whose lifetime extends beyond the lifetime of the object bound to that const& parameter.
Kavya Kotacherry, Daveed Vandevoorde, Richard Smith, Jens Maurer, Lewis Baker, Kirk Shoop, Hartmut Kaiser, Kenny Kerr, Artur Laksberg, Jim Radigan, Chandler Carruth, Gabriel Dos Reis, Deon Brewis, Jonathan Caves, James McNellis, Stephan T. Lavavej, Herb Sutter, Pablo Halpern, Robert Schumacher, Viktor Tong, Michael Wong, Niklas Gustafsson, Nick Maliwacki, Vladimir Petter, Shahms King, Slava Kuznetsov, Tongari J, Lawrence Crowl, Valentin Isac and many more who contributed.
P0055R00: On Interactions Between Coroutines and Networking (http://wg21.link/P0055R00)
P0057R00: Wording for Coroutines, Revision 3 (http://wg21.link/P0057R00)
N4527: Working Draft, Standard for Programming Language C++ (http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4527.pdf)
N4402: Resumable Functions (revision 4) (https://isocpp.org/files/papers/N4402.pdf)
N4286: Resumable Functions (revision 3) (http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4286.pdf)
N4134: Resumable Functions v2 (http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n4134.pdf)
N4453: Resumable Expressions (http://open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4453.pdf)