1. Revision History
1.1. D1348R0
- Initial release, prepared in situ at the 2018-11 San Diego meeting.
1.2. P1348R0
- Changed maximum_occupancy_shape_t to occupancy_t at the request of SG1 at the San Diego meeting.
2. Summary and Motivation
We propose the addition of an optional query-only property, occupancy_t, with a polymorphic_query_result_type of size_t. The intention is that the result of querying this property should be used to drive the decomposition of work into parts and passed to bulk_execute to express the number of agents needed.
Previous discussion by authors of [P0443r9] had indicated an understanding that such a property would be added at some future time. Discussion at the 2018-11 San Diego meeting served to increase the acuteness of the need and demonstrate just how cross-cutting the concern is. At least dating back to earliest discussions of
parallel implementations using [P0443r9], the authors recognized this need:
// XXX ideally, we’d partition the input into a number of tiles
// proportional to the "unit_shape" of the executor
// the idea behind this property is somewhat analogous to what
// std::thread::hardware_concurrency() reports
// for example, a thread pool executor would probably return
// the number of threads in the pool
// since we don’t have such a property, arbitrarily choose 16
size_t desired_num_tiles = 16;
(For context, see https://gist.github.com/jaredhoberock/7888469864b45bf471e686243e8a83c7).
Implementation reports at the 2018-11 San Diego meeting further demonstrated how ubiquitous the need is for parallel algorithms to decompose their work into tiles, and that the choice of the number of tiles is a potentially important performance concern. The value of this property gives a parallel algorithm calling bulk_execute the guidance it needs to make an informed choice of how many tiles to use.
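As a rough illustration of the decomposition described above, the following C++17 sketch replaces the hard-coded tile count with a value queried from the executor. Everything here is an assumption made for the example: `occupancy_t` is a local stand-in rather than the property from [P0443r9], and `pool_executor_sketch` and `make_tiles` are hypothetical names that appear in no proposal.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in for the proposed property (not the [P0443r9] definition).
struct occupancy_t {};
inline constexpr occupancy_t occupancy{};

// Hypothetical executor that reports one agent per pool thread.
struct pool_executor_sketch {
  std::size_t num_threads = 4;
  std::size_t query(occupancy_t) const { return num_threads; }
};

// Splits [0, n) into at most num_tiles contiguous half-open ranges whose
// lengths differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>>
make_tiles(std::size_t n, std::size_t num_tiles) {
  std::vector<std::pair<std::size_t, std::size_t>> tiles;
  if (n == 0 || num_tiles == 0) return tiles;
  num_tiles = std::min(n, num_tiles);  // never create empty tiles
  const std::size_t base = n / num_tiles;
  const std::size_t extra = n % num_tiles;  // first `extra` tiles get one more
  std::size_t begin = 0;
  for (std::size_t i = 0; i < num_tiles; ++i) {
    const std::size_t len = base + (i < extra ? 1 : 0);
    tiles.emplace_back(begin, begin + len);
    begin += len;
  }
  return tiles;
}

// Chooses the tile count from the executor instead of a fixed constant.
template <class Executor>
std::vector<std::pair<std::size_t, std::size_t>>
tiles_for(const Executor& ex, std::size_t n) {
  const std::size_t desired_num_tiles = ex.query(occupancy);
  return make_tiles(n, desired_num_tiles);
}
```

With the four-agent executor above and 10 elements, `tiles_for` yields the ranges [0,3), [3,6), [6,8), and [8,10). The number of tiles returned is the shape a caller would then hand to the bulk execution function.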
3. Wording
Add the following property to the section enumerating the query-only properties in [P0443r9]:
struct occupancy_t
{
  static constexpr bool is_requirable = false;
  static constexpr bool is_preferable = false;

  using polymorphic_query_result_type = size_t;

  template<class Executor>
    static constexpr decltype(auto) static_query_v
      = Executor::query(occupancy_t());
};

constexpr occupancy_t occupancy;
Provides a nonzero estimate for the number of execution agents that should occupy associated execution contexts (if any). [Note: For example, a thread pool executor might return the number of threads in a pool; a SIMD executor might return the number of vector lanes; a GPU executor might return the total number of hardware thread contexts; the inline executor should return 1. Unlike std::thread::hardware_concurrency(), if this value is not well defined or not computable for a given executor type E, then can_query_v<E, occupancy_t> should be false. —end note]
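The cases listed in the note can be sketched with a few toy executor types. This is an illustration under stated assumptions, not proposed wording: the executor types, the namespace-scope `static_occupancy_v`, and the detection trait `can_query_occupancy_v` are all hypothetical names invented for this example, and the property struct is a simplified local copy. The sketch assumes C++17.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified local copy of the property, for illustration only.
struct occupancy_t {
  static constexpr bool is_requirable = false;
  static constexpr bool is_preferable = false;
  using polymorphic_query_result_type = std::size_t;
};
inline constexpr occupancy_t occupancy{};

// Hypothetical inline executor: work runs on the calling agent only.
struct inline_executor_sketch {
  static constexpr std::size_t query(occupancy_t) { return 1; }
};

// Hypothetical SIMD executor: reports its number of vector lanes.
template <std::size_t Lanes>
struct simd_executor_sketch {
  static constexpr std::size_t query(occupancy_t) { return Lanes; }
};

// Hypothetical executor with no well-defined occupancy: it omits the query
// entirely instead of returning a sentinel such as 0.
struct unknown_executor_sketch {};

// Hypothetical detection trait: true only if E::query(occupancy) is well-formed.
template <class E, class = void>
inline constexpr bool can_query_occupancy_v = false;
template <class E>
inline constexpr bool
    can_query_occupancy_v<E, std::void_t<decltype(E::query(occupancy))>> = true;

// Compile-time query, written at namespace scope to keep the sketch simple.
template <class E>
inline constexpr std::size_t static_occupancy_v = E::query(occupancy);

static_assert(static_occupancy_v<inline_executor_sketch> == 1);
static_assert(static_occupancy_v<simd_executor_sketch<8>> == 8);
static_assert(can_query_occupancy_v<inline_executor_sketch>);
static_assert(!can_query_occupancy_v<unknown_executor_sketch>);
```

The design point the sketch tries to capture is that absence of the property is expressed in the type system, so a caller can detect it at compile time and pick its own default, whereas a zero return value has to be checked at run time.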