Doc. no: P0286R0
Date: 2016-02-14
Audience: Library Evolution
Reply-To: Christopher Kohlhoff <chris@kohlhoff.com>
This paper outlines a pure extension to the draft Networking Technical Specification to add support for co_await-based coroutines. This extension allows us to leverage coroutines to write asynchronous code in a synchronous style, as in:
awaitable<void> echo(tcp::socket socket, await_context ctx)
{
  try
  {
    for (;;)
    {
      char data[128];
      std::size_t n = co_await socket.async_read_some(net::buffer(data), ctx);
      co_await async_write(socket, net::buffer(data, n), ctx);
    }
  }
  catch (std::exception& e)
  {
    std::cerr << "echo Exception: " << e.what() << std::endl;
  }
}
The design presented in this paper reflects the view that, when using coroutines to compose asynchronous operations, coroutines must be considered in conjunction with executors. Typical networking programs consist of multiple threads of execution (whether implemented using coroutines or as simple chains of callbacks). Indeed, one of the motivations for using coroutines and asynchronous operations is greater control over scheduling than that provided by the OS's thread scheduler. This control allows for both better performance and simplified programming.
Consequently, the design presented below has the following features:
- Each coroutine-based thread of execution has an associated await_context object. This object is a completion token, and when passed to an asynchronous operation causes the operation to "block" the current coroutine in a synchronous-like manner.
- New coroutine-based threads of execution are launched using the spawn function. This function also allows the user to specify the execution properties of the new thread of execution.
An implementation of this proposal may be found in a branch of the variant of Asio that stands alone from Boost. This branch is available at https://github.com/chriskohlhoff/asio/tree/co_await. It has been tested with Microsoft Visual Studio 2015 Update 1, and depends specifically on the version of the proposed coroutine functionality delivered with that compiler.
To begin, we will examine a simple TCP server that echoes back any data it receives. The main function is as follows:
int main()
{
  try
  {
    net::io_context io_context;
    spawn(io_context, listener, detached);
    io_context.run();
  }
  catch (std::exception& e)
  {
    std::cerr << "Exception: " << e.what() << std::endl;
  }
}
Here, the call to the function spawn:

spawn(io_context, listener, detached);

launches a coroutine as a new thread of execution. The first argument specifies that this new thread of execution will be scheduled by the io_context. The entry point for this new thread of execution is the function listener, which we will see below. The final argument, detached, is a special completion token that tells spawn that we are not interested in the result of the coroutine.
The listener is a free function:
awaitable<void> listener(await_context ctx)
{
  tcp::acceptor acceptor(ctx.get_executor().context(), {tcp::v4(), 55555});
  for (;;)
  {
    spawn(acceptor.get_executor(), echo, co_await acceptor.async_accept(ctx), detached);
  }
}
The listener function returns an awaitable<void>. This indicates that it must either be the entry point of a new thread of execution, or itself be co_await-ed.

The listener function also accepts an await_context as its parameter. This parameter represents the context in which the coroutine is executing, and is passed as a completion token to any asynchronous operations called by the coroutine, such as:
co_await acceptor.async_accept(ctx)
When the ctx completion token is passed to an asynchronous operation, that operation's initiating function returns an awaitable<T>. We must apply the co_await keyword to this return value to suspend the coroutine.
In this listener, private state (such as acceptor) may simply be declared as a stack-based variable. As each new connection is accepted, the listener spawns a new, detached thread of execution to handle the incoming client:
spawn(acceptor.get_executor(), echo, co_await acceptor.async_accept(ctx), detached);
The first argument to spawn specifies that the new thread of execution will be scheduled using the acceptor's io_context. This is the io_context object that we created in main. In the case where multiple threads are running the io_context, this would allow the new thread of execution to execute concurrently. This is a safe choice only if the new thread of execution is truly independent and does not access shared data structures. (Note that, in this example, only the main thread runs the io_context and so all coroutines will be scheduled in a single thread in any case.)
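For exposition only, the following non-normative sketch shows the multi-threaded variation described above; it is not part of the example program, and the thread count is arbitrary:

net::io_context io_context;
spawn(io_context, listener, detached);

// Run the io_context from several threads. Detached connection handlers may
// now resume on any of these threads, so they must not touch shared,
// unsynchronized state.
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i)
  threads.emplace_back([&io_context]{ io_context.run(); });
for (auto& t : threads)
  t.join();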
The entry point for the new thread of execution is the echo function, and this time we are passing it the result of the async_accept operation. The echo function accepts this result in its parameter list:
awaitable<void> echo(tcp::socket socket, await_context ctx)
{
  try
  {
    for (;;)
    {
      char data[128];
      std::size_t n = co_await socket.async_read_some(net::buffer(data), ctx);
      co_await async_write(socket, net::buffer(data, n), ctx);
    }
  }
  catch (std::exception& e)
  {
    std::cerr << "echo Exception: " << e.what() << std::endl;
  }
}
As with the listener, private state such as the data buffer may simply be declared as a stack variable in the coroutine. We pass the ctx completion token to the asynchronous operations, and co_await the awaitable<T> objects that they return. Any errors are reported as exceptions, so we catch these within the coroutine to prevent them from escaping to the main function.
Just as with normal, synchronous function calls, when using coroutines we wish to be able to refactor a sequence of code into its own function. When doing so, it is vital for ensuring program correctness that the refactored code execute in the same thread of execution, and have the same executor properties as its caller.
For example, let us say we wish to refactor the echo function above so that a single async_read_some/async_write pair is in its own echo_once function:
awaitable<void> echo_once(tcp::socket& socket, await_context ctx)
{
  char data[128];
  std::size_t n = co_await socket.async_read_some(net::buffer(data), ctx);
  co_await net::async_write(socket, net::buffer(data, n), ctx);
}
This function is then called from echo as follows:
awaitable<void> echo(tcp::socket socket, await_context ctx)
{
  try
  {
    for (;;)
    {
      co_await echo_once(socket, ctx);
    }
  }
  catch (std::exception& e)
  {
    std::cerr << "echo Exception: " << e.what() << std::endl;
  }
}
By passing the ctx variable to echo_once we ensure that it is scheduled using the same executor. Furthermore, the caller applies co_await to the awaitable<T> produced by echo_once, guaranteeing that the echo function does not resume until the callee is complete. These two attributes combine to ensure that the echo_once function behaves as though part of the same thread of execution as echo.
The echo server shown above is a trivially asynchronous program in that:
More typically, connection handling involves a number of concurrent threads of execution, such as:
As an example, consider a simple chat server where multiple connections share a chat room. Any message sent by a participant to the room is relayed by the server to all participants.
class chat_session :
  public chat_participant,
  public std::enable_shared_from_this<chat_session>
The chat_session class comprises multiple coroutine-based threads of execution. We want the session to exist for as long as there is client activity, so we use std::enable_shared_from_this to keep the chat_session object alive for as long as its constituent coroutines.
{
  tcp::socket socket_;
  net::steady_timer timer_;
  chat_room& room_;
  std::deque<std::string> write_msgs_;
  net::strand<net::io_context::executor_type> strand_;
The chat_session class uses a strand to coordinate the threads of execution and ensure that they do not execute concurrently.
public:
  chat_session(tcp::socket socket, chat_room& room)
    : socket_(std::move(socket)),
      timer_(socket_.get_executor().context()),
      room_(room),
      strand_(socket_.get_executor())
  {
    timer_.expires_at(std::chrono::steady_clock::time_point::max());
  }

  void start()
  {
    room_.join(shared_from_this());
    spawn(strand_, &chat_session::reader, shared_from_this(), detached);
    spawn(strand_, &chat_session::writer, shared_from_this(), detached);
  }
The strand is specified as the executor when launching the two threads of execution using spawn.
  void deliver(const std::string& msg)
  {
    strand_.dispatch(
        [this, self=shared_from_this(), msg]
        {
          write_msgs_.push_back(msg);
          timer_.cancel_one();
        });
  }
The deliver function uses a short-lived, non-coroutine-based thread of execution to add new messages to the outbound write queue.
private:
  awaitable<void> reader(await_context ctx)
  {
    try
    {
      for (std::string read_msg;;)
      {
        std::size_t n = co_await net::async_read_until(socket_,
            net::dynamic_buffer(read_msg, 1024), "\n", ctx);
        room_.deliver(read_msg.substr(0, n));
        read_msg.erase(0, n);
      }
    }
    catch (std::exception&)
    {
      stop();
    }
  }

  awaitable<void> writer(await_context ctx)
  {
    try
    {
      while (socket_.is_open())
      {
        if (write_msgs_.empty())
        {
          std::error_code ec;
          co_await timer_.async_wait(redirect_error(ctx, ec));
By default, passing an await_context to an asynchronous operation will cause errors to be reported via exception. In this case we handle the error as an expected case, so we use the redirect_error completion token to capture the error into an error_code.
        }
        else
        {
          co_await net::async_write(socket_,
              net::buffer(write_msgs_.front()), ctx);
          write_msgs_.pop_front();
        }
      }
    }
    catch (std::exception&)
    {
      stop();
    }
  }

  void stop()
  {
    room_.leave(shared_from_this());
    socket_.close();
    timer_.cancel();
  }
};
This paper proposes the following extensions to the Networking Technical Specification to add support for co_await-based coroutines.
template<class T> class awaitable;
Class template awaitable represents the return type of an asynchronous operation when used with coroutines, or of a coroutine function that composes asynchronous operations. The awaitable<T> class satisfies the Awaitable type requirements.

An awaitable<T> can be consumed by at most one co_await expression.
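For exposition only, the following non-normative sketch illustrates this constraint; it assumes awaitable<T> is movable, as in the reference implementation linked above, and the function name is hypothetical:

awaitable<void> wait_once(net::steady_timer& timer, await_context ctx)
{
  awaitable<void> a = timer.async_wait(ctx); // obtain the awaitable from the initiating function
  co_await std::move(a);                     // the one permitted co_await
  // A second co_await on 'a' would violate the single-consumption requirement.
}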
template<class Executor> class basic_unsynchronized_await_context;
Class template basic_unsynchronized_await_context is a completion token type that causes asynchronous operations to produce an awaitable<T> as their initiating function return type.

The basic_unsynchronized_await_context<Executor> class introduces no synchronization on top of the underlying Executor object. It requires an executor that provides mutual exclusion semantics. This minimizes the overhead of coroutines when executing on a single-threaded io_context, since it is implicitly a mutual exclusion executor.
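As a non-normative sketch of this use case (the coroutine name is hypothetical, and it assumes that spawn, when given an io_context executor, supplies a matching basic_unsynchronized_await_context<io_context::executor_type> to the entry point), a program that runs its io_context from a single thread may declare its coroutines to take the unsynchronized context directly, avoiding both the strand and the polymorphic executor wrapper:

using fast_context =
  basic_unsynchronized_await_context<net::io_context::executor_type>;

awaitable<void> fast_echo(tcp::socket socket, fast_context ctx)
{
  try
  {
    for (;;)
    {
      char data[128];
      std::size_t n = co_await socket.async_read_some(net::buffer(data), ctx);
      co_await net::async_write(socket, net::buffer(data, n), ctx);
    }
  }
  catch (std::exception&)
  {
  }
}

// ...

// io_context.run() is called from exactly one thread, so its executor
// implicitly provides the required mutual exclusion semantics.
spawn(io_context.get_executor(), fast_echo, std::move(socket), detached);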
template<class Executor>
  using basic_await_context = basic_unsynchronized_await_context<strand<Executor>>;
basic_await_context is a template alias that addresses the common use case of coordinating coroutine execution in a multithreaded context (such as a thread pool). It uses a strand<> to provide the requisite mutual exclusion semantics.
typedef basic_await_context<executor> await_context;
This typedef uses the basic_await_context template with the polymorphic executor wrapper. This maximizes ease of use, particularly when calling coroutine functions across module boundaries, with some runtime cost.
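For exposition only (the function names below are hypothetical), a library interface can declare its coroutines in terms of await_context, allowing the definitions to live in a separate translation unit while callers simply forward their own context:

// my_protocol.h: the concrete executor type is erased by await_context.
awaitable<void> send_heartbeat(tcp::socket& socket, await_context ctx);

// A caller forwards its context, as in the "Refactoring" example above.
awaitable<void> session_loop(tcp::socket& socket, await_context ctx)
{
  for (;;)
    co_await send_heartbeat(socket, ctx);
}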
template<class Executor, class F, class Arg1, ..., class ArgN, class CompletionToken>
  DEDUCED spawn(const Executor& ex, F&& f, Arg1&& arg1, ..., ArgN&& argN,
    CompletionToken&& token);

template<class ExecutionContext, class F, class Arg1, ..., class ArgN, class CompletionToken>
  DEDUCED spawn(ExecutionContext& ctx, F&& f, Arg1&& arg1, ..., ArgN&& argN,
    CompletionToken&& token);

template<class Executor, class F, class Arg1, ..., class ArgN, class CompletionToken>
  DEDUCED spawn(const basic_unsynchronized_await_context<Executor>& ctx,
    F&& f, Arg1&& arg1, ..., ArgN&& argN, CompletionToken&& token);
The function template spawn is used to launch a new coroutine-based thread of execution. The first argument determines the executor to be used for scheduling the coroutine. In the case of the final overload, the new coroutine inherits the executor of the specified basic_unsynchronized_await_context. (This final overload is provided as a convenience for launching related coroutines that should not be scheduled concurrently.)
These overloads shall not participate in overload resolution unless the return type of f(arg1, ..., argN, basic_unsynchronized_await_context<Executor>) is an awaitable<T> for some type T.
Note that the function spawn meets the requirements of an asynchronous operation, which means that we can pass any completion token type to it. In the examples above we use the detached completion token, which is defined in this proposal, but other options include plain callbacks:
awaitable<int> my_coroutine(await_context ctx);

// ...

spawn(my_executor, my_coroutine,
    [](int result) { ... });
or the use_future completion token:
awaitable<int> my_coroutine(await_context ctx);

// ...

std::future<int> f = spawn(my_executor,
    my_coroutine, std::experimental::use_future);
class detached_t { };
constexpr detached_t detached;
The class detached_t is a completion token that is used to indicate that an asynchronous operation is detached. That is, there is no completion handler waiting to receive the operation's result. It is typically used by passing the detached object as the completion token argument.
This class is independent of the coroutine facility and may have some utility in other use cases.
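For example (a non-normative sketch; log_socket and log_buffer are hypothetical), a fire-and-forget write can pass detached directly to an ordinary asynchronous operation, provided the buffer remains valid until the operation completes:

// No completion handler will observe the result; errors are silently discarded.
net::async_write(log_socket, net::buffer(log_buffer), detached);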
template<class CompletionToken>
class redirect_error_t;

template<class CompletionToken>
redirect_error_t<decay_t<CompletionToken>>
redirect_error(CompletionToken&& completion_token, error_code& ec);
The class template redirect_error_t is a completion token that is used to specify that the error produced by an asynchronous operation is captured into an error_code variable. By intercepting the error code before it is passed to the coroutine, we may prevent the coroutine from throwing an exception on resumption. For example:
char data[1024];
std::error_code ec;
std::size_t n = co_await my_socket.async_read_some(
    net::buffer(data), redirect_error(ctx, ec));
if (ec == net::stream_errc::eof)
{
  ...
}
This class is independent of the coroutine facility and may have some utility in other use cases.
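For example (a non-normative sketch; my_timer is a hypothetical net::steady_timer), redirect_error can wrap the use_future completion token so that the resulting future does not throw for an operation error:

std::error_code ec;
std::future<void> f = my_timer.async_wait(
    redirect_error(std::experimental::use_future, ec));
f.get();        // does not throw for a timer error...
if (ec) { ... } // ...because the error was captured into ec instead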
Whether an application uses coroutines or callbacks, a chain of asynchronous operations conceptually behaves as though it is a thread of execution. Furthermore, all but the most trivial networking programs will consist of multiple threads of execution interacting and operating on shared data.
Consequently, it is essential that coroutine facilities intended for networking support executors. This allows us to manage the scheduling of related coroutines that operate on shared data. Indeed, we should allow the scheduling of both coroutine- and non-coroutine-based threads of execution in a single program.
This proposal addresses this by encoding the executor properties of a thread of execution into the basic_unsynchronized_await_context completion token. When passed to an asynchronous operation, the operation will utilize the associated executor when resuming the coroutine.
Similarly, the await context completion token may be passed to child coroutine functions to ensure that these callees observe the same executor properties as the caller, as illustrated in the "Refactoring" example above.
As mentioned above, coordinating multiple threads of execution is a requirement of all but the most trivial applications. Even if a networking application is single-threaded, there still exists concurrency in the scheduling and execution of these threads of execution. Therefore, to reduce the risk of programmer error, the introduction of new threads of execution should be explicit.
In this proposal, new coroutine-based threads of execution are initiated using the spawn function. In addition to launching a new thread of execution, this function requires the programmer to specify the executor that will be used for it.
Unlike the approach proposed in P0055R0, this proposal does not encode the implementation of an asynchronous operation into an initiating function's return type. Specifically, all asynchronous operations that participate in a coroutine return an awaitable<T>. This allows us to perform simple, non-coroutine-based composition of coroutine-aware functions, as in:
awaitable<void> throttled_post(await_context ctx)
{
  if (throttle_required())
    return my_simple_timer.async_wait(ctx);
  else
    return post(ctx);
}
Indeed, this proposal's awaitable<T> return type mirrors (most of) the regular behaviour of "normal" function return types. (The main exception is a lack of convertibility between types.) This allows end users to compose asynchronous operations and coroutines alike, as shown in the "Refactoring" example above.
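For exposition only (the function names are hypothetical, and this assumes that an awaitable<T> coroutine produces its result via co_return, consistent with the awaitable<int> example above), the following sketch contrasts returning an operation's awaitable directly with adapting its result:

// When the result types match exactly, the operation's awaitable may be
// returned directly from a non-coroutine function.
awaitable<std::size_t> read_chunk(tcp::socket& socket,
    net::mutable_buffer buf, await_context ctx)
{
  return socket.async_read_some(buf, ctx);
}

// There is no conversion between awaitable types, so adapting the result
// requires a coroutine.
awaitable<bool> read_nonempty_chunk(tcp::socket& socket,
    net::mutable_buffer buf, await_context ctx)
{
  std::size_t n = co_await socket.async_read_some(buf, ctx);
  co_return n > 0;
}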
In this proposal, the await_context is passed as the final argument to a thread of execution's entry point. In early prototypes it was passed as the initial argument, but this interfered with the ability to implement spawn using std::invoke (necessary to support spawn-ing member functions).
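To illustrate (a non-normative sketch; the session type and my_executor are hypothetical), placing the context last keeps the user-supplied arguments in the positions that std::invoke expects:

struct session : std::enable_shared_from_this<session>
{
  awaitable<void> reader(await_context ctx);
};

// ...

auto s = std::make_shared<session>();

// spawn can forward to the equivalent of std::invoke(&session::reader, s, ctx),
// because the context is appended after the user-supplied arguments.
spawn(my_executor, &session::reader, s, detached);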
This library proposal should have minimal performance overhead on top of that already imposed by the co_await-based coroutine mechanism.
First, the P0055R0 approach of encoding the implementation into the initiating function return type appears to be unnecessary. Instead, asynchronous operations can encapsulate "allocated" state into a temporary coroutine that is then returned by the initiating function inside an awaitable<T> object. The compiler's allocation/deallocation elision optimization should then eliminate the allocation. (Unfortunately, at the time of writing this could not be verified, due to lack of access to a compiler with this optimization.)
Second, in low-latency scenarios where single-threaded execution is employed, use of basic_unsynchronized_await_context ensures that coroutines introduce no additional synchronization overhead.
What is less certain, however, is the performance impact of refactoring code into child coroutines within a thread of execution (as shown in the "Refactoring" example above). There is significant machinery required to transport a return value from a callee to the caller. It is not clear whether compiler heroics can reduce this cost to something approaching a normal function return, let alone the coroutine equivalent of inlining the callee.
This is a pure extension to the draft Networking Technical Specification. It does not require changes to that specification nor to any other part of the standard.
This paper proposes an extension to the draft Networking Technical Specification to add support for co_await-based coroutines. These coroutines are specified in P0057R1.
This paper provides an alternative design for integrating the coroutines to that proposed in P0055R0 On Interactions Between Coroutines and Networking Library. In particular, this proposal requires no modification to the design of the draft Networking Technical Specification, and it addresses the design issues raised in section 5 of P0162R0 A response to P0055R0.