Document Number: | P0574R0 |
Date: | 2017-02-04 |
Author: | Anthony
Williams Just Software Solutions Ltd |
Audience: | SG1, LWG |
The C++17 draft features overloads of a lot of the standard library
algorithms which take an ExecutionPolicy
argument, in order to
allow parallel implementations. However, the requirements of many of the
algorithms are written in such a way as to require a sequential
implementation, or a sub-optimal parallel implementation, thus eliminating
some or all of the benefit in the new overloads.
P0523r0: Wording for CH 10: Complexity
of parallel algorithms attempts to address this problem by a blanket
relaxation of the constraints for overloads with
an ExecutionPolicy
. I think this is an incorrect solution. It
simultaneously grants too much leeway for some algorithms, while not fixing
the problems with others. Instead, I think it is better to address each
algorithm individually.
In this paper I have gathered those algorithms where the requirements are specified in such a way as to rule out an efficient parallel implementation, and proposed alternative specifications for those algorithms.
adjacent_find
[alg.adjacent.find]Change paragraph 2 as follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, fFor a nonempty range, exactlymin((i - first) + 1, (last - first) - 1)
applications of the corresponding predicate, wherei
isadjacent_find
’s return value. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, at most(last-first)-1
applications of the corresponding predicate.
Modify the requirements for copy_if
in paragraph 12 as
follows:
Requires: The ranges[first, last)
and[result, result + (last - first))
shall not overlap. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
,std::iterator_traits<InputIterator>::value_type
shall beMoveConstructible
.
Modify the requirements for remove_copy
and remove_copy_if
in paragraph 7 as follows:
Requires: The ranges[first, last)
and[result, result + (last - first))
shall not overlap. The expression*result = *first
shall be valid. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, the type of*first
shall satisfy theCopyConstructible
requirements.
Modify the requirements for unique_copy
and unique_copy_if
in paragraph 5 as follows:
Requires: The comparison function shall be an equivalence relation. The ranges[first, last)
and[result, result+(last-first))
shall not overlap. The expression*result = *first
shall be valid. LetT
be the value type ofInputIterator
. For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, the following shall hold:— If
InputIterator
meets the forward iterator requirements, then there are no additional requirements forT
. Otherwise, ifOutputIterator
meets the forward iterator requirements and its value type is the same asT
, thenT
shall beCopyAssignable
(Table 26). Otherwise,T
shall be bothCopyConstructible
(Table 24) andCopyAssignable.
For the overloads with an
ExecutionPolicy
other thanstd::execution::sequential_policy
,T
shall be bothCopyConstructible
(Table 24) andCopyAssignable.
Modify the complexity for partition
in paragraph 7 as
follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, iIfForwardIterator
meets the requirements for aBidirectionalIterator
, at most(last - first) / 2
swaps are done; otherwise at mostlast - first
swaps are done. Exactlylast - first
applications of the predicate are done. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, no worse than O(N log N) swaps, where N =last - first
. Exactlylast - first
applications of the predicate are done.
Modify the complexity for stable_partition
in paragraph 11 as
follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, aAt most N log(N ) swaps, where N =last - first
, but only O(N ) swaps if there is enough extra memory. Exactlylast - first
applications of the predicate. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, O(N log N) swaps, where N =last - first
. Exactlylast - first
applications of the predicate are done.
Modify the requirements for partition_copy
in paragraph 12
as follows:
Requires:InputIterator
’s value type shall beCopyAssignable
, and shall be writable (24.2.1) to theout_true
andout_false
OutputIterator
s, and shall be convertible toPredicate
’s argument type. The input range shall not overlap with either of the output ranges. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
,InputIterator
’s value type shall also beCopyConstructible
.
Modify the complexity for nth_element
in paragraph 3 as
follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, lLinear on average. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, O(N) executions of the predicate, and no worse than O(N log N) swaps, where N =last - first
.
Modify the complexity for merge
in paragraph 4 as
follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, aAt most(last1 - first1) + (last2 - first2) - 1
comparisons. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, O((last1 - first1) + (last2 - first2) - 1
) comparisons.
Modify the complexity for inplace_merge
in paragraph 8 as
follows:
Complexity: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, ifWhenenough additional memory is available,(last - first) - 1
comparisons.If no additional memory is availableOtherwise, an algorithm with complexity at most O(N
log(N
)) may be used, whereN = last - first
.
Modify paragraph 5 as follows:
Requires:binary_op
shall neither invalidate iterators or subranges, nor modify elements in the range[first, last)
. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, bothT
andBinaryOperation
shall beCopyConstructible
. All ofbinary_op(init,*first)
,binary_op(*first,init)
,binary_op(init,init)
, andbinary_op(*first,*first)
shall be convertible toT
.
Modify paragraph 1 as follows:
Requires: Neitherunary_op
norbinary_op
shall invalidate subranges, or modify elements in the range[first, last)
. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, all ofT
,UnaryOperation
andBinaryOperation
shall beCopyConstructible
. All ofbinary_op(init,unary_op(*first)
,binary_op(unary_op(*first),init)
,binary_op(init,init)
, andbinary_op(unary_op(*first),unary_op(*first))
shall be convertible toT
.
Change paragraph 2 as follows:
Effects: For the overloads with noExecutionPolicy
, or anExecutionPolicy
ofstd::execution::sequential_policy
, fFor a non-empty range, the function creates an accumulatoracc
whose type isInputIterator
’s value type, initializes it with*first
, and assigns the result to*result
. For every iteratori
in [first + 1
,last
) in order, creates an objectval
whose type isInputIterator
’s value type, initializes it with*i
, computesval - acc
orbinary_op(val, acc)
, assigns the result to*(result + (i - first))
, and move assigns fromval
toacc
. For the overloads with anExecutionPolicy
other thanstd::execution::sequential_policy
, for a non-empty range, for every valued
in [1
,last-first-1
) the function computes*(first+d)-*(first+d-1)
orbinary_op(*(first+d),*(first+d-1))
and assigns the result to*(result + d)