1. Motivation
static_thread_pool is a convenient way to create an execution context in place, where you need it.
However, including it as the sole, standard way to obtain execution resources on the host may promote bad practice
that will inadvertently lead to oversubscription and poor composability. In this section, we outline some of the
weaknesses of static_thread_pool and describe characteristics that an alternative execution context should possess to
avoid these weaknesses. We are not arguing for the removal of static_thread_pool from [P0443R13]; instead, we propose
to complement it with at least one additional choice.
The issue with static_thread_pool is that it can easily lead to oversubscription.
Real-world applications are complicated and, in general, link with many third-party libraries. Any shared object (.so),
as well as the application itself, may create its own static_thread_pool. Without alternatives in the standard, this might
in fact seem like the only portable choice. However, when there are many static_thread_pool instances, the end application
will likely, and inadvertently, request more threads than the number of physical cores available, oversubscribing the hardware.
For compute-intensive workloads, oversubscription often leads to poor performance.
static_thread_pool is not suitable for parallel algorithms due to oversubscription and composability issues.
The parallel algorithms (the overloads taking an ExecutionPolicy) are likely to be extended with additional overloads taking an Executor.
But the C++17 overloads (without an Executor) will remain. Since creating a separate static_thread_pool instance
in each algorithm is not suitable, there needs to be a way to specify where those overloads can obtain an appropriate executor for the computation.
2. Proposed Direction
2.1. Parallel Executor
To solve the problems described above, we propose to introduce parallel_executor. The API is:
    namespace std::execution {
      executor auto parallel_executor();
    }
- Returns the instance of parallel_executor.
- Lazily initializes the execution context (thread pool), if it was not initialized previously, only when execute is called.
- The underlying thread pool has a capacity equal to std::thread::hardware_concurrency by default.
- All instances returned from the parallel_executor function share the same thread pool, which is semantically a singleton.
- Each parallel_executor instance has its own arena where work is shared among threads.
- Once the thread pool is initialized, it remains alive as long as the process exists.
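The behavior listed above (lazy initialization on the first execute call, one shared pool behind every returned handle, and a pool that outlives all handles) can be sketched as follows. This is only an illustrative model under stated assumptions, not the proposed implementation: the names shared_pool, pool_executor, pools_created, and demo are hypothetical, the arena and property machinery is omitted, and the fallback to one worker when hardware_concurrency() reports 0 is our own choice.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

namespace sketch {

// Counts pool constructions so that lazy, one-time initialization is observable.
inline std::atomic<int>& pools_created() {
    static std::atomic<int> count{0};
    return count;
}

// Hypothetical shared pool; the proposal does not prescribe this implementation.
class shared_pool {
public:
    static shared_pool& instance() {
        // Intentionally leaked: the pool stays alive as long as the process exists.
        static shared_pool* pool = new shared_pool();
        return *pool;
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }
private:
    shared_pool() {
        ++pools_created();
        // Assumption: use one worker if hardware_concurrency() reports 0.
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
};

// Lightweight handle: creating it does not touch the pool.
struct pool_executor {
    template <class F>
    void execute(F&& f) const {
        // The pool is created here, on the first call to execute.
        shared_pool::instance().submit(std::forward<F>(f));
    }
};

inline pool_executor parallel_executor() { return pool_executor{}; }

// Usage: two handles submit work that lands in the same pool.
inline int demo() {
    pool_executor a = parallel_executor();
    pool_executor b = parallel_executor();
    auto first = std::make_shared<std::promise<int>>();
    auto second = std::make_shared<std::promise<int>>();
    std::future<int> f1 = first->get_future();
    std::future<int> f2 = second->get_future();
    a.execute([first] { first->set_value(40); });
    b.execute([second] { second->set_value(2); });
    return f1.get() + f2.get();
}

}  // namespace sketch
```

Leaking the pool object is one simple way to model "remains alive as long as the process exists"; a real implementation may well choose a different shutdown strategy.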
2.2. Properties
This section introduces the properties a user can require, prefer, or query from a parallel_executor object.
2.2.1. arena_t
    namespace std::execution {
      struct arena_t {
        template <class T>
        static constexpr bool is_applicable_property_v = executor<T>;

        static constexpr bool is_requirable = true;
        static constexpr bool is_preferable = true;

        using polymorphic_query_result_type = arena_t;

        constexpr unsigned int concurrency_capacity() const noexcept;

        arena_t() = default;
        arena_t(const arena_t&) = default;
        constexpr arena_t(unsigned int concurrency_capacity);

        constexpr bool operator==(const arena_t&) const;
        constexpr bool operator!=(const arena_t&) const;
      };
    }
Represents the arena of the parallel_executor instance where work is shared between threads. Two different
parallel_executor instances may share the same arena, meaning that the work is shared between them. On the other hand,
parallel_executor instances may be created with different arenas.
In that case, the work is not shared between those instances. The work belongs to the arena instance associated with
the parallel_executor the work is executed by.
concurrency_capacity controls the maximum number of threads that can share the work inside the instance of arena_t.
Note: There is no guarantee that the number of threads sharing the work is exactly the same as the value of
concurrency_capacity.
By default, concurrency_capacity is equal to std::thread::hardware_concurrency.
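One possible reading of the capacity semantics is that an arena admits at most concurrency_capacity threads at a time, and that separate arenas are accounted for independently. The sketch below models only that bookkeeping. The class arena_model and its try_join and leave members are hypothetical names introduced for illustration; they are not part of the proposed arena_t interface, and the capacity is always passed explicitly here.

```cpp
#include <cassert>
#include <mutex>

namespace sketch {

// Hypothetical model of the capacity bookkeeping behind an arena.
class arena_model {
public:
    explicit arena_model(unsigned capacity) : capacity_(capacity) {}

    unsigned concurrency_capacity() const noexcept { return capacity_; }

    // A thread asks to share the arena's work; refused once the cap is reached.
    bool try_join() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ == capacity_) return false;
        ++active_;
        return true;
    }

    // A thread stops sharing the arena's work and frees its slot.
    void leave() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (active_ > 0) --active_;
    }

private:
    std::mutex mutex_;
    unsigned capacity_;
    unsigned active_ = 0;
};

}  // namespace sketch
```

The cap is an upper bound only: fewer threads than the capacity may be active at any moment, which matches the note that the number of threads sharing the work is not guaranteed to equal the capacity.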
2.2.2. wait_context_t
    namespace std::execution {
      struct wait_context_t {
        template <typename T>
        static constexpr bool is_applicable_property_v = executor<T>;

        static constexpr bool is_requirable = true;
        static constexpr bool is_preferable = true;

        using polymorphic_query_result_type = wait_context_t;

        wait_context_t();
        wait_context_t(const wait_context_t& wc);

        void wait();

        constexpr bool operator==(const wait_context_t&) const;
        constexpr bool operator!=(const wait_context_t&) const;
      };
    }
Represents the object you can wait on.
wait_context_t tracks all the work executed and not yet completed by the set of
parallel_executor instances that share the same
wait_context_t object and additionally tracks the
parallel_executor instances in that set with the
property established.
One default-constructed wait_context_t tracks work independently from another default-constructed
wait_context_t. Two default-constructed wait_context_t instances are never equal.
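The equality and tracking rules above can be modeled with a reference-counted shared state: copies of a context refer to the same state and compare equal, while each default-constructed context gets fresh state. The following is a minimal sketch under those assumptions. The names wait_context_model, work_started, work_finished, and run_tracked are hypothetical; in the proposal the executor would do the tracking, whereas here the calls are made by hand.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

// Hypothetical model: copies share one state, default construction makes a new one.
class wait_context_model {
public:
    wait_context_model() : state_(std::make_shared<state>()) {}

    void work_started() const {
        std::lock_guard<std::mutex> lock(state_->mutex);
        ++state_->pending;
    }
    void work_finished() const {
        std::lock_guard<std::mutex> lock(state_->mutex);
        if (--state_->pending == 0) state_->idle.notify_all();
    }
    // Blocks until every tracked piece of work has finished.
    void wait() const {
        std::unique_lock<std::mutex> lock(state_->mutex);
        state_->idle.wait(lock, [this] { return state_->pending == 0; });
    }

    friend bool operator==(const wait_context_model& a, const wait_context_model& b) {
        return a.state_ == b.state_;  // equal only when the state is shared
    }
    friend bool operator!=(const wait_context_model& a, const wait_context_model& b) {
        return !(a == b);
    }

private:
    struct state {
        std::mutex mutex;
        std::condition_variable idle;
        unsigned pending = 0;
    };
    std::shared_ptr<state> state_;
};

// Usage: track `tasks` pieces of work and wait for all of them.
inline int run_tracked(int tasks) {
    wait_context_model wc;
    std::atomic<int> sum{0};
    std::vector<std::thread> threads;
    for (int i = 1; i <= tasks; ++i) {
        wc.work_started();
        threads.emplace_back([wc, &sum, i] {
            sum += i;
            wc.work_finished();
        });
    }
    wc.wait();         // returns once all tracked work has completed
    int result = sum;  // all increments are visible after wait()
    for (auto& t : threads) t.join();
    return result;
}

}  // namespace sketch
```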
2.3. Parallel executor controls
    namespace std::execution {
      struct parallel_executor_control {
        void max_concurrency(unsigned int);
      };
    }
max_concurrency limits the capacity of parallel_executor's underlying thread pool. This is the upper limit on the number of active threads in the pool.
A user may have more than one parallel_executor_control object at the same time. For the sake of composability, the upper limit is the minimum
max_concurrency value stored in all constructed but not yet destructed parallel_executor_control objects. When one of the objects is destroyed,
the limit becomes the minimum max_concurrency value among the remaining, not yet destructed parallel_executor_control objects.
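The "minimum over all live control objects" rule can be sketched with a small registry that each control object registers with and removes itself from on destruction. This is an illustrative model only: limit_registry, control_model, and effective_limit are hypothetical names, and falling back to a caller-supplied pool default when no control object has set a limit is an assumption, not something the text above specifies.

```cpp
#include <cassert>
#include <mutex>
#include <set>

namespace sketch {

// Hypothetical registry of the limits set by all live control objects.
class limit_registry {
public:
    static limit_registry& instance() {
        static limit_registry registry;
        return registry;
    }
    void add(unsigned value) {
        std::lock_guard<std::mutex> lock(mutex_);
        limits_.insert(value);
    }
    void remove(unsigned value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = limits_.find(value);
        if (it != limits_.end()) limits_.erase(it);  // erase one occurrence only
    }
    // Minimum over live limits, or `fallback` if none is registered (assumption).
    unsigned minimum_or(unsigned fallback) {
        std::lock_guard<std::mutex> lock(mutex_);
        return limits_.empty() ? fallback : *limits_.begin();
    }
private:
    std::mutex mutex_;
    std::multiset<unsigned> limits_;  // sorted, so begin() is the minimum
};

// Hypothetical stand-in for parallel_executor_control.
class control_model {
public:
    control_model() = default;
    control_model(const control_model&) = delete;
    control_model& operator=(const control_model&) = delete;

    // On destruction the limit is withdrawn and the minimum is recomputed.
    ~control_model() {
        if (set_) limit_registry::instance().remove(value_);
    }

    void max_concurrency(unsigned value) {
        if (set_) limit_registry::instance().remove(value_);
        limit_registry::instance().add(value);
        value_ = value;
        set_ = true;
    }

private:
    bool set_ = false;
    unsigned value_ = 0;
};

inline unsigned effective_limit(unsigned pool_default) {
    return limit_registry::instance().minimum_or(pool_default);
}

}  // namespace sketch
```

A multiset is used so that two control objects holding the same value are counted separately; destroying one of them leaves the other's limit in force.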