| Document #: | P3941R0 |
| Date: | 2025-12-14 |
| Project: | Programming Language C++ |
| Audience: | Concurrency Working Group (SG1), Library Evolution Working Group (LEWG), Library Working Group (LWG) |
| Reply-to: | Dietmar Kühl (Bloomberg) <dkuhl@bloomberg.net> |
One important design aspect of std::execution::task is that a coroutine resumes after a co_await on the same scheduler as the one it was executing on prior to the co_await. To achieve this, task transforms the awaited object obj using affine_on(obj, sched) where sched is the corresponding scheduler. Multiple concerns were raised against the specification of affine_on and discussed as part of P3796R1.
This proposal is intended to specifically address the concerns raised relating to task’s scheduler affinity and in particular affine_on. The gist of this proposal is to impose constraints on affine_on to guarantee it can meet its objective at run-time.
There are a few NB comments raised about the way affine_on works:
- affine_on when the scheduler doesn’t change.
- affine_on vs. continues_on.
- affine_on.
- affine_on should not forward the stop token to the scheduling operation.
- affine_on.
The discussion on affine_on revealed some aspects which were not quite clear previously. Taking these into account points towards a better design than was previously specified:
- get_scheduler(get_env(rcvr)) when an algorithm is started. This requirement is more general than just affine_on and is introduced by P3718R0: with this guarantee in place, affine_on only needs one parameter, i.e., the sender for the work to be executed.
- sched on which the work needs to resume has to guarantee that it is possible to resume on the correct execution agent. The implication is that scheduling work needs to be infallible, i.e., the completion signatures of schedule(sched) cannot contain a set_error_t(E) completion signature. This requirement should be checked statically.
- connected to a receiver whose environment’s get_stop_token query yields an unstoppable_token. In addition, the schedule operation shall not have a set_stopped_t() completion signature if the environment’s get_stop_token query yields an unstoppable_token. This requirement should also be checked statically.
- affine_on algorithm to avoid rescheduling. This customisation can be achieved by connecting to the result of an affine_on member function called on the child sender, if such a member function is present, when connecting an affine_on sender.
None of these changes really contradicts any earlier design: the shape and behaviour of the affine_on algorithm weren’t fully fleshed out. Tightening the behaviour of scheduler affinity and the affine_on algorithm has implications for some other components:
- If affine_on requires an infallible scheduler, at least inline_scheduler, task_scheduler, and run_loop::scheduler should be infallible (i.e., they always complete successfully with set_value()). parallel_scheduler can probably not be made infallible.
- task’s scheduler using co_await change_coroutine_scheduler(sch) become somewhat unclear and this functionality should be removed. Similar semantics are better modelled using co_await on(sch, nested-task).
- The name affine_on isn’t particularly good and wasn’t designed. It may be worth renaming the algorithm to something different.
affine_on Shape
The original proposal for task used continues_on to schedule the work back on the original scheduler. This algorithm takes the work to be executed and the scheduler on which to continue as arguments. When SG1 requested that a similar but different algorithm be used to implement scheduler affinity, continues_on was just replaced by affine_on with the same shape but the potential to get customised differently.
The scheduler used for affinity is the scheduler communicated via the get_scheduler query on the receiver’s environment: the scheduler argument passed to the affine_on algorithm would need to match the scheduler obtained from the get_scheduler query. In the context of the task coroutine this scheduler can be obtained via the promise type, but in general it is not straightforward to get hold of this scheduler because the receiver, and hence its associated scheduler, is only provided by connect. It is much more reasonable to have affine_on only take the work, i.e., a sender, as argument and determine the scheduler to resume on from the receiver’s environment in connect.
Thus, instead of using
affine_on(sndr, sch)
the algorithm is used with just the sender:
affine_on(sndr)
Note that this change implies that an operation state resulting from connecting affine_on to a receiver rcvr is started on the execution agent associated with the scheduler obtained from get_scheduler(get_env(rcvr)). The same requirement is also assumed to be met when starting the operation state resulting from connecting a task. While it is possible to statically detect whether the query is valid and provides a scheduler, it cannot be detected whether the scheduler matches the execution agent on which start was called. P3718R0 proposes to add this exact requirement to [exec.get.scheduler].
This change addresses US 234-364 (LWG4331).
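The change of shape can be illustrated with a small toy model. This is only a sketch: the names in namespace toy (scheduler, env, receiver, work, affine_sender, affine_state) are hypothetical stand-ins and not the std::execution entities. It shows affine_on(sndr) storing only the work, while the scheduler to resume on becomes known in connect from the receiver’s environment.

```cpp
#include <utility>

namespace toy {

// Stand-in for a scheduler; agent_id identifies an execution agent.
struct scheduler { int agent_id; };

// Stand-in for a receiver environment answering the get_scheduler query.
struct env { scheduler sched; };
constexpr scheduler get_scheduler(const env& e) { return e.sched; }

// Stand-in for a receiver exposing its environment.
struct receiver { env environment; };
constexpr env get_env(const receiver& r) { return r.environment; }

// Stand-in for the work (a sender in the real design).
struct work { int value; };

// affine_on(sndr) only stores the work; no scheduler is involved yet.
template <class Sender>
struct affine_sender { Sender child; };

template <class Sender>
constexpr affine_sender<Sender> affine_on(Sender sndr) {
    return affine_sender<Sender>{std::move(sndr)};
}

// The operation state remembers where it has to resume.
template <class Sender>
struct affine_state {
    Sender child;
    scheduler resume_on;
};

// The scheduler only becomes known here, from the receiver's environment.
template <class Sender>
constexpr affine_state<Sender> connect(affine_sender<Sender> sndr, const receiver& rcvr) {
    return affine_state<Sender>{std::move(sndr.child), get_scheduler(get_env(rcvr))};
}

} // namespace toy
```

In this model the caller never names a scheduler when building the work; whoever connects the work supplies it through the environment.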
The objective of affine_on(sndr) is to execute sndr and to complete on the execution agent on which the operation was started. Let sch be the scheduler obtained from get_scheduler(get_env(rcvr)) where rcvr is the receiver used when connecting affine_on(sndr) (the discussion in this section also applies if the scheduler is taken as a parameter, i.e., if the previous change isn’t applied). If connecting the result of schedule(sch) fails (i.e., connect(schedule(sch), rcvr) throws where rcvr is a suitable receiver), affine_on can avoid starting the main work and fail on the execution agent where it was started. Otherwise, if it obtained an operation state os from connect(schedule(sch), rcvr), affine_on would start its main work and would start(os) on the execution agent where the main work completed. If start(os) is always successful, affine_on can achieve its objective. However, if this scheduling operation fails, i.e., it completes with set_error(e), or if it gets cancelled, i.e., it completes with set_stopped(), the execution agent on which the scheduling operation resumes is unclear and affine_on cannot keep its promise. Thus, it seems reasonable to require that a scheduler used with affine_on is infallible, at least when used appropriately (i.e., when providing a receiver whose associated stop token is an unstoppable_token).
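A static check for this requirement could look roughly like the following sketch. It does not use std::execution: the tags, completion_signatures, the stop token types, env, the three example schedulers and the schedule_completions member alias are all hypothetical stand-ins in namespace toy. The trait asks which completions a scheduler’s schedule operation reports when the environment’s stop token is never_stop_token and accepts only set_value_t().

```cpp
#include <exception>
#include <type_traits>

namespace toy {

// Stand-ins for the completion tags and the signature list.
struct set_value_t {};
struct set_error_t {};
struct set_stopped_t {};
template <class... Signatures> struct completion_signatures {};

// Stand-ins for stop token types and an environment exposing one.
struct never_stop_token {};
struct stoppable_token {};
template <class StopToken> struct env { using stop_token_type = StopToken; };

// Always completes with set_value(), like an inline scheduler.
struct inline_like_scheduler {
    template <class Env>
    using schedule_completions = completion_signatures<set_value_t()>;
};

// Reports set_stopped() only if the environment's token can be stopped.
struct loop_like_scheduler {
    template <class Env>
    using schedule_completions = std::conditional_t<
        std::is_same<typename Env::stop_token_type, never_stop_token>::value,
        completion_signatures<set_value_t()>,
        completion_signatures<set_value_t(), set_stopped_t()>>;
};

// May fail or be cancelled regardless of the environment.
struct pool_like_scheduler {
    template <class Env>
    using schedule_completions = completion_signatures<
        set_value_t(), set_error_t(std::exception_ptr), set_stopped_t()>;
};

// Infallible: with an unstoppable token only set_value_t() remains.
template <class Scheduler>
constexpr bool is_infallible_scheduler_v = std::is_same<
    typename Scheduler::template schedule_completions<env<never_stop_token>>,
    completion_signatures<set_value_t()>>::value;

} // namespace toy
```

Under these assumptions the loop-like scheduler passes the check only because its completion signatures take the environment into account.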
The current working draft specifies four schedulers:
- inline_scheduler which just completes with set_value() when start()ed, i.e., this scheduler is already infallible.
- task_scheduler is a type-erased scheduler delegating to another scheduler. If the underlying scheduler is infallible, the only error case for task_scheduler is potential memory allocation during connect of its ts-sender. If affine_on creates an operation state for the scheduling operation during connect, it can guarantee that any necessary scheduling operation succeeds. Thus, this scheduler can be made infallible.
- run_loop::run-loop-scheduler is used by run_loop. The current specification allows the scheduling operation to fail with set_error_t(std::exception_ptr). This permission allows an implementation to use std::mutex and std::condition_variable whose operations may throw. It is possible to implement the logic using atomic operations which can’t throw. The set_stopped() completion is only used when the receiver’s stop token, i.e. the result of get_stop_token(get_env(rcvr)), was stopped. This receiver is controlled by affine_on, i.e., it can provide a never_stop_token and this scheduler won’t complete with set_stopped(). If the get_completion_signatures for the corresponding sender takes the environment into account, this scheduler can also be made infallible.
- parallel_scheduler provides an interface to a replaceable implementation of a thread pool. The current interface allows parallel_scheduler to complete with set_error_t(std::exception_ptr) as well as with set_stopped_t(). It seems unlikely that this interface can be constrained to make it infallible.
In general it seems unlikely that all schedulers can be constrained to be infallible. As a result affine_on and, by extension, task won’t be usable with all schedulers if affine_on insists on using only infallible schedulers. If there are fallible schedulers, there aren’t any good options for using them with a task. Note that affine_on can fail and get cancelled (due to the main work failing or getting cancelled) but affine_on can still guarantee that execution resumes on the expected execution agent when it uses an infallible scheduler.
This change addresses US 235-363 (LWG4332). This change goes beyond the actual issue and clarifies that the scheduling operation used by affine_on always needs to be successful.
affine_on
If affine_on promises in all cases that it resumes on the original scheduler it can only work with infallible schedulers. If a user wants to use a fallible scheduler with affine_on or task the scheduler will need to be adapted. The adapted scheduler can define what it means when the underlying scheduler fails. There are conceptually only two options (the exact details may vary) on how to deal with a failed scheduling operation:
std::terminate.
The standard library doesn’t provide a way to adapt schedulers easily. However, it can certainly be done.
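One way such an adaptation could be structured is sketched below for the std::terminate option. This is a toy model rather than a std::execution scheduler: schedule here is a hypothetical member function taking a receiver directly, and terminate_on_error, terminating_scheduler, sometimes_failing_scheduler and counting_receiver are illustrative names. The adapter wraps the receiver so that a failed scheduling operation never reaches the caller as an error.

```cpp
#include <exception>
#include <utility>

namespace toy {

// Receiver wrapper: forwards success, ends the program on a scheduling error.
template <class Receiver>
struct terminate_on_error {
    Receiver inner;
    constexpr void set_value() { inner.set_value(); }
    template <class Error>
    [[noreturn]] void set_error(Error&&) noexcept { std::terminate(); }
};

// Scheduler adapter: seen from outside, scheduling can only succeed.
template <class Scheduler>
struct terminating_scheduler {
    Scheduler underlying;
    template <class Receiver>
    constexpr void schedule(Receiver rcvr) {
        underlying.schedule(terminate_on_error<Receiver>{std::move(rcvr)});
    }
};

// Hypothetical underlying scheduler which can report an error.
struct sometimes_failing_scheduler {
    bool fail = false;
    template <class Receiver>
    constexpr void schedule(Receiver rcvr) {
        if (fail) rcvr.set_error(1);
        else rcvr.set_value();
    }
};

// Receiver which only needs to handle success once the adapter is used.
struct counting_receiver {
    int* completions;
    constexpr void set_value() { ++*completions; }
};

// Exercises the success path and returns the number of completions seen.
constexpr int demo_success_path() {
    int completions = 0;
    terminating_scheduler<sometimes_failing_scheduler> sched{sometimes_failing_scheduler{false}};
    sched.schedule(counting_receiver{&completions});
    return completions;
}

} // namespace toy
```

Note that counting_receiver has no set_error at all: with the adapter in place the error path ends in std::terminate instead of reaching the receiver.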
affine_on
If the scheduler used with affine_on
is allowed to fail, affine_on can’t
guarantee that it completes on the correct scheduler in case of an error
completion. It could be specified that
affine_on completes with set_error(rcvr, scheduling_error{e})
when the scheduling operation completes with set_error(r, e)
to make it detectable that it didn’t complete on the correct scheduler.
This situation is certainly not ideal but, at least, only affects the
error completion and it can be made detectable.
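The detectability argument can be sketched with a toy receiver chain. Only the name scheduling_error is taken from the text above; making it a class template, as well as recording_receiver, reschedule_receiver and the two demo functions, are assumptions for illustration. The receiver handed to the scheduling operation wraps any error, so the outer receiver can tell a failed rescheduling apart from a failure of the main work.

```cpp
namespace toy {

// Wrapper marking an error as coming from the scheduling operation.
template <class Error>
struct scheduling_error { Error error; };

// Outer receiver: records whether it may have completed on the wrong agent.
struct recording_receiver {
    bool completed = false;
    bool maybe_wrong_agent = false;
    constexpr void set_value() { completed = true; }
    // Ordinary error, e.g. from the main work.
    template <class Error>
    constexpr void set_error(Error) { completed = true; }
    // Error from rescheduling: the current execution agent is unknown.
    template <class Error>
    constexpr void set_error(scheduling_error<Error>) {
        completed = true;
        maybe_wrong_agent = true;
    }
};

// Receiver given to the scheduling operation by the affine_on-like algorithm.
template <class Receiver>
struct reschedule_receiver {
    Receiver* outer;
    constexpr void set_value() { outer->set_value(); }
    template <class Error>
    constexpr void set_error(Error e) { outer->set_error(scheduling_error<Error>{e}); }
};

// The scheduling operation fails: the error arrives wrapped.
constexpr recording_receiver failed_reschedule() {
    recording_receiver outer{};
    reschedule_receiver<recording_receiver> inner{&outer};
    inner.set_error(42);
    return outer;
}

// The main work fails: the error arrives unwrapped.
constexpr recording_receiver failed_child() {
    recording_receiver outer{};
    outer.set_error(42);
    return outer;
}

} // namespace toy
```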
A use of affine_on which always
needs to complete on a specific scheduler is still possible: in that
case the user will need to make sure that the used scheduler is
infallible. The main issue here is that there is no automatic static
checking whether that is the case.
In an ideal world, all schedulers would be infallible. It is unclear if that is achievable. If schedulers need to be allowed to be fallible, it may be viable to require that all standard library schedulers are infallible. As outlined above that should be doable for all current schedulers except, possibly, parallel_scheduler. So, the proposed change is to require schedulers to be infallible when being used with affine_on (and, thus, being used by task) and to change as many of the standard C++ library’s schedulers to be infallible as possible.
If constraining affine_on to only infallible schedulers turns out to be too strong, the constraint can be relaxed in a future revision of the standard by explicitly opting out of that constraint, e.g., using an additional argument. For task to make use of it, it too would need an explicit mechanism to indicate that its affine_on use should opt out of the constraint, e.g., by adding a suitable static member to the environment template argument.
affine_on Customisation
Senders which don’t cause the execution agent to be changed like just or the various queries should be able to customise affine_on to avoid unnecessary scheduling. Sadly, a proposal (P3206) to standardise properties which could be used to determine how a sender completes didn’t make much progress, yet. An implementation can make use of similar techniques using an implementation-specific protocol. If a future standard defines a standard approach to determine the necessary properties the implementation can pick up on those.
The idea is to have affine_on define a transform_sender(s) member function which determines what sender should be returned. By default the argument is returned but if the child sender indicates that it doesn’t actually change the execution agent the function would return the child sender. There are a number of senders for which this can be done:
- just, just_error, and just_stopped
- read_env and write_env
- then, upon_error, and upon_stopped if the child sender doesn’t change the execution agent
The proposal is to define a transform_sender member which uses an implementation-specific property to determine that a sender completes on the same execution agent as the one it was started on. In addition, it is recommended that this property gets defined by the various standard library senders where it can make a difference.
This change addresses US 232-366 (LWG4329), although not in a way allowing application code to plug into this mechanism. Such an approach can be designed in a future revision of the standard.
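A possible shape for such an implementation-specific protocol is sketched below. Everything here is hypothetical: the static member stays_on_agent, the trait completes_where_started, and the toy senders just_like, hop_like and then_like are illustrative only, and the transform_sender member is simplified to take no arguments. The sketch returns the child sender when it advertises that it completes where it was started and otherwise keeps the affine_on sender.

```cpp
#include <type_traits>
#include <utility>

namespace toy {

// By default a sender is assumed to possibly change the execution agent.
template <class Sender, class = void>
struct completes_where_started : std::false_type {};

// Senders opt in through a (hypothetical) static member.
template <class Sender>
struct completes_where_started<Sender, std::enable_if_t<Sender::stays_on_agent>>
    : std::true_type {};

// Like just: completes immediately where it was started.
struct just_like {
    int value;
    static constexpr bool stays_on_agent = true;
};

// Like a sender hopping to another context: no opt-in.
struct hop_like { int value; };

// Like then: stays on the agent exactly if its child does.
template <class Child>
struct then_like {
    Child child;
    static constexpr bool stays_on_agent = completes_where_started<Child>::value;
};

// Toy affine_on sender with a simplified transform_sender member.
template <class Child>
struct affine_sender {
    Child child;
    constexpr auto transform_sender() const {
        if constexpr (completes_where_started<Child>::value)
            return child;   // no rescheduling needed
        else
            return *this;   // keep the rescheduling wrapper
    }
};

} // namespace toy
```

With this arrangement then_like<just_like> is unwrapped while then_like<hop_like> keeps its affine_sender wrapper.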
change_coroutine_scheduler
The current working paper specifies change_coroutine_scheduler to change the scheduler used by the coroutine for scheduler affinity. It turns out that this use is somewhat problematic in two ways:
- change_coroutine_scheduler is co_awaited again. It doesn’t automatically reset. Thus, local variables constructed before change_coroutine_scheduler(s) was co_awaited were constructed on the original scheduler and are destroyed on the replaced scheduler.
- task’s execution may finish on a different scheduler than the original one. To allow symmetric transfer between two tasks each task needs to complete on the correct scheduler. Thus, the task needs to be prepared to change to the original scheduler before actually completing. To do so, it is necessary to know the original scheduler and also to have storage for the state needed to change to a different scheduler. It can’t be statically detected whether change_coroutine_scheduler(s) is co_awaited in the body of a coroutine and, thus, the necessary storage and checks are needed even for tasks which don’t use change_coroutine_scheduler.
If there were no way to change the scheduler it would still be possible to execute using a different scheduler, although not as directly: instead of using co_await change_coroutine_scheduler(s) to change the scheduler used for affinity to s, a nested task executing on s could be co_awaited:
co_await ex::starts_on(s, [](parameters)->task<T, E> { logic }(arguments));
Using this approach the use of the scheduler s is clearly limited to the nested coroutine. The scheduler affinity is fully taken care of by the use of affine_on when co_awaiting work. There is no need to provide the storage or checks which would otherwise be required for a task to return to the original scheduler, because a task never actually changes its scheduler.
The proposal is to remove change_coroutine_scheduler and the possibility of changing the scheduler within a task. The alternative to controlling the scheduler used for affinity from within a task is a bit verbose. However, the need to control the scheduler from within the coroutine is likely relatively rare. Replacing the used scheduler for an existing task by nesting it within on(s, t) or starts_on(s, t) is fairly straightforward.
This functionality was originally included because it is present in at least one of the existing libraries, although in a form which was recommended against. The existing use changes the scheduler of a coroutine when co_awaiting the result of schedule(s); this exact approach was found to be fragile and surprising and the recommendation was to provide the functionality more explicitly.
This change is not associated with any national body comment. However, it is still important to do! It isn’t adding any new functionality but removes a problematic way to achieve something which can be better achieved differently. If this change is not made, the inherent cost of supporting change_coroutine_scheduler can’t be removed later without breaking existing code.
affine_on Default Implementation
Using the previous discussion leads to a definition of affine_on which is quite different from effectively just using continues_on:
- affine_on should define a transform_sender member function which returns the child sender if this child sender indicates in an implementation-specific way that it doesn’t change the execution agent. It should be recommended that some of the standard library sender algorithms (see above) indicate that they don’t change the execution agent.
- The affine_on algorithm should only allow being connected to a receiver r whose scheduler sched obtained by get_scheduler(get_env(r)) is infallible, i.e., get_completion_signatures(schedule(sched), e) with an environment e where get_stop_token(e) yields never_stop_token returns completion_signatures<set_value_t()>.
- When affine_on gets connected, the scheduling operation state needs to be created by connecting the scheduler’s sender to a suitable receiver to guarantee that the completion can be scheduled on the execution agent. The stop token get_stop_token(get_env(r)) for the receiver r used for this connect shall be an unstoppable_token. The child sender also needs to be connected with a receiver which will capture the respective result upon completion and start the scheduling operation.
- started it starts the operation state from the child operation.
- set_error(current_exception) were called. Once the parameters are stored, the scheduling operation is started.
This behaviour is similar to continues_on but is subtly different with respect to when the scheduling operation state needs to be created and in that any stop token from the receiver doesn’t get forwarded. In addition affine_on is more constrained with respect to the schedulers it supports and the shape of the algorithm is different: affine_on gets the scheduler to execute on from the receiver it gets connected to.
This change addresses US 233-365 (LWG4330) and US 236-362 (LWG; the proposed resolution in this issue is incomplete).
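The order of operations described in this section can be sketched with a small synchronous simulation. This is not an implementation of affine_on: execution agents are modelled as plain integers in a hypothetical world object, and scheduler, schedule_op, hopping_child, recording_receiver, affine_state and connect_affine are illustrative names only. The sketch creates the scheduling operation during connect, starts the child, stores its result, and only then starts the scheduling operation before completing the receiver.

```cpp
namespace toy {

// The simulated "current execution agent".
struct world { int current_agent; };

// Scheduler identified by the agent it schedules onto.
struct scheduler { int agent; };

// Infallible scheduling operation: moves execution to the target agent.
struct schedule_op {
    scheduler target;
    constexpr void start(world& w) const { w.current_agent = target.agent; }
};

// Child work which completes with a value on some other agent.
struct hopping_child {
    int value;
    int finishes_on;
    constexpr int run(world& w) const { w.current_agent = finishes_on; return value; }
};

// Receiver whose environment provides the scheduler; records the completion.
struct recording_receiver {
    scheduler sched;
    int value = 0;
    int completed_on = -1;
    constexpr void set_value(const world& w, int v) {
        value = v;
        completed_on = w.current_agent;
    }
};

template <class Child>
struct affine_state {
    Child child;
    recording_receiver* rcvr;
    schedule_op reschedule;   // created during connect, before any work starts
    int stored = 0;           // storage for the child's result
    constexpr void start(world& w) {
        stored = child.run(w);        // start the child and store its result
        reschedule.start(w);          // then start the scheduling operation
        rcvr->set_value(w, stored);   // complete on the original agent
    }
};

// connect obtains the scheduler from the receiver, not from an argument.
template <class Child>
constexpr affine_state<Child> connect_affine(Child child, recording_receiver& rcvr) {
    return affine_state<Child>{child, &rcvr, schedule_op{rcvr.sched}};
}

// Started on agent 1, the child finishes on agent 2, completion is back on agent 1.
constexpr recording_receiver demo() {
    world w{1};
    recording_receiver rcvr{scheduler{1}};
    affine_state<hopping_child> op = connect_affine(hopping_child{42, 2}, rcvr);
    op.start(w);
    return rcvr;
}

} // namespace toy
```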
The name affine_on isn’t great.
It may be worth giving the algorithm a better name.
To be done.