1. Motivation
C++17 parallel algorithms, together with execution policies, were a good start for parallel computation in the C++ standard. However, they provide no explicit way to specify which hardware an algorithm should be executed on.
In the C++ standard we have execution policies that represent "how" a particular algorithm should be executed; in other words, they provide the semantic guarantees for the user-provided callable objects passed to parallel algorithms. Without other facilities in the C++ standard library, execution policies tend to be used to combine the semantics of both "how" and "where" the code should be executed.
[P2300R5] introduces the scheduler concept that represents the execution context. It is a more flexible abstraction than execution policies for answering "where" the code should be executed, because a scheduler is tightly connected to the hardware it sends work to.
Since [P2300R5] is targeted at C++26, we should also answer the question of how the rest of the C++ standard library will interoperate with the scheduler/sender/receiver mechanism.
P2500R0 is targeted at C++26 and is intended to answer the question of how C++17 parallel algorithms support the [P2300R5] facilities.
2. Proposed API
2.1. Parallel algorithms CPO
for_each is used as a reference. Once the design reaches consensus, it can be applied to all parallel algorithms.
The proposed API for C++17 parallel algorithms is a customization point object with the following for_each signature:
struct __for_each {
    template <std::policy_aware_scheduler Scheduler, typename It, typename Callable>
    void operator()(Scheduler s, It b, It e, Callable c) const {
        if constexpr (std::tag_invocable<__for_each, Scheduler, It, It, Callable>) {
            std::tag_invoke(*this, s, b, e, c);
        } else {
            // default implementation
        }
    }
};

inline constexpr __for_each for_each;
See § 2.3 policy_aware_scheduler for more details about the policy_aware_scheduler concept.
The implementation should invoke the customization, if one exists. Otherwise, the default generic implementation is called. That allows a scheduler vendor to customize every particular algorithm, if necessary.
Note: P2500 is supposed to use the same customization point mechanism as [P2300R5] does (currently tag_invoke).
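For illustration, here is a minimal sketch of what such a vendor customization could look like, assuming the tag_invoke/tag_t facilities used by [P2300R5]. The names vendor and vendor_scheduler are hypothetical, and for_each denotes the CPO proposed above:

namespace vendor {
    // Hypothetical scheduler type satisfying policy_aware_scheduler.
    struct vendor_scheduler { /* scheduler requirements */ };

    // Found by the CPO via tag_invoke when the algorithm is called with
    // vendor_scheduler; it overrides the default generic implementation.
    template <typename It, typename Callable>
    void tag_invoke(std::tag_t<for_each>, vendor_scheduler s,
                    It b, It e, Callable c) {
        // vendor-optimized implementation goes here
    }
}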
Eventually, the API above should be combined with § 2.2 execute_on and the call would look like:
for_each(execute_on(scheduler, std::execution::par), begin, end, callable);
2.1.1. Why scheduler?
The algorithms should accept the scheduler to be able to get as many senders as they need to build whatever dependency graph they want.
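As an illustration, a default implementation could obtain a sender from the scheduler and attach the per-element work to it. The following is a minimal sketch in terms of the [P2300R5] primitives schedule, bulk and sync_wait; for_each_default is a hypothetical name and the sketch assumes random access iterators. A real implementation is free to call schedule on the same scheduler repeatedly to build a more elaborate graph:

template <typename Scheduler, typename It, typename Callable>
void for_each_default(Scheduler sched, It b, It e, Callable c) {
    namespace ex = std::execution;
    // Get a sender from the scheduler and attach one unit of work per element.
    auto work = ex::schedule(sched)
              | ex::bulk(std::distance(b, e),
                         [=](std::iter_difference_t<It> i) { c(b[i]); });
    // Block until the whole dependency graph completes.
    std::this_thread::sync_wait(std::move(work));
}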
2.1.2. Alternative API
An alternative API might look like having both a scheduler and an execution policy as for_each parameters.
struct __for_each {
    template <std::policy_aware_scheduler Scheduler, std::execution_policy ExecutionPolicy,
              typename It, typename Callable>
    void operator()(Scheduler s, ExecutionPolicy policy, It b, It e, Callable c) const;
};

inline constexpr __for_each for_each;
However (IMHO), it complicates the API and still requires the scheduler to check whether it can work with the passed execution policy object, but at a later stage (after the algorithm call is resolved), and it requires either something like execute_on (see § 2.2 execute_on) underneath or a direct API on the scheduler (or the execution policy) for that kind of checking.
2.2. execute_on
execute_on is the customization point that serves the purpose of tying a scheduler and an execution policy together.
It is up to the execute_on customization to check whether it can work with the passed execution policy.
struct __execute_on {
    policy_aware_scheduler auto operator()(scheduler auto sched,
                                           execution_policy auto policy) const {
        return std::tag_invoke(*this, sched, policy);
    }
};

inline constexpr __execute_on execute_on;
execute_on might have a default implementation, but it is an open question what behavior it should implement. See § 2.5 Open questions for more details.
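As an illustrative sketch, a scheduler author might customize execute_on so that only the policies the scheduler supports are accepted; anything else would fail to find a customization. The names my_scheduler and my_policy_aware_scheduler below are hypothetical:

struct my_scheduler { /* scheduler requirements */ };

struct my_policy_aware_scheduler {
    using base_scheduler_type = my_scheduler;
    using policy_type = std::execution::parallel_policy;
    policy_type get_policy() const { return std::execution::par; }
    /* scheduler requirements, forwarding work to my_scheduler */
};

// Accept only parallel_policy; calls with other policies do not find
// this customization and fall back to the default behavior, if any.
my_policy_aware_scheduler tag_invoke(std::tag_t<execute_on>, my_scheduler,
                                     std::execution::parallel_policy) {
    return {};
}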
2.3. policy_aware_scheduler
policy_aware_scheduler is a concept for parallel algorithms that represents a combined scheduler and execution policy entity. It allows getting both the execution policy type and the execution policy object the parallel algorithm is called with.
template <typename S>
concept policy_aware_scheduler = scheduler<S> && requires (S s) {
    typename S::base_scheduler_type;
    typename S::policy_type;
    { s.get_policy() } -> execution_policy; // requires allowing specialization of
                                            // execution_policy on the user side.
                                            // Also might require making the policy
                                            // copy constructible
};
See § 2.4 execution_policy concept for more details about the execution_policy concept.
Customizations of the parallel algorithms can reuse the existing implementation of parallel algorithms with an ExecutionPolicy template parameter for a "known" scheduler type.
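A minimal sketch of such reuse, assuming a pre-existing policy-based implementation named legacy_for_each_impl and a known_scheduler type (both names are hypothetical):

template <std::policy_aware_scheduler Scheduler, typename It, typename Callable>
    requires std::same_as<typename Scheduler::base_scheduler_type, known_scheduler>
void tag_invoke(std::tag_t<for_each>, Scheduler s, It b, It e, Callable c) {
    // The combined entity carries the policy the algorithm was called with,
    // so it can be forwarded to the existing ExecutionPolicy-based code path.
    legacy_for_each_impl(s.get_policy(), b, e, c);
}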
2.4. execution_policy concept
The execution_policy concept is optional if we want to constrain the return type of the (some kind of) get_policy method of a policy_aware_scheduler.
template <typename ExecutionPolicy>
concept execution_policy =
    std::is_execution_policy_v<std::remove_cvref_t<ExecutionPolicy>>;
The potential problem with that concept, though, is support for user-defined policies. We might need to allow is_execution_policy specialization.
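An illustrative sketch of what user-side specialization would look like if it were permitted (today, adding specializations of is_execution_policy is undefined behavior); my_offload_policy is a hypothetical policy type:

struct my_offload_policy { /* user-defined policy */ };
inline constexpr my_offload_policy offload{};

// This specialization would have to be explicitly allowed:
template <>
struct std::is_execution_policy<my_offload_policy> : std::true_type {};

// With it, my_offload_policy satisfies the execution_policy concept above.
static_assert(execution_policy<my_offload_policy>);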
2.5. Open questions
- Should execute_on have a default implementation?
    - If yes, should it advise sequential execution using the passed scheduler's execution resources or the calling thread?
    - If no, what default behavior should it advise for the scheduler to implement?
- What if the scheduler is used at the entry point to the binary as a polymorphic (or type-erased) scheduler? How would it know that a customization exists?
- If the execution_policy concept is necessary, should specialization of is_execution_policy be allowed?
3. Further exploration
The author is planning to explore how to specify the set of basic functions (named a "parallel backend") with which the rest of the parallel algorithms can be expressed. It might become a separate paper based on that analysis.