Document number: P2643R1
Date: 2023-05-18
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers
Audience: Concurrency
Improving C++ concurrency features
Revisions
r0 - (Kona Drafts)
- Added discussion of pros/cons of returning pair<T. bool> vs. optional<T>.
- Added discussion of timed waits and freestanding, given that <chrono> is not part of freestanding.
- Added wording for
barrier::try_wait_for
and barrier::try_wait_until
.
- Added motivating examples.
- Fixed
barrier::try_wait_for/_until
signatures; they were incorrectly accepting an arrival_token&&
, but since these can be called in a loop consuming the arrival_token
is incorrect.
- Removed discussion of ‘hinted’ wait mechanism. The design surface area of this proposal is such that it should be a separate paper, brought forward an interested party.
- Removed fallible untimed
try_wait
.
r1 - (Varna Draft)
- Removed timed waiting for freestanding, to reflect Kona guidance, and added discussion.
- Removed pros/cons discussion of returning
optional<T>
vs pair<bool, T>
vs T
, reflecting Kona guidance.
- Re-added fallible untimed
try_wait
with rationale, reflecting Kona guidance.
Introduction
P1135R6 introduced serval new concurrency primitives to the C++20 concurrency library:
<atomic>
: added the class atomic_flag
, the wait
and notify_one/_all
to class template atomic<>
, and free function versions of these.
<semaphore>
: added class template counting_semaphore<>
and class binary_semaphore
.
<barrier>
,<latch>
: added class template barrier<>
and class latch
.
Though each element included was long coming, and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made:
- Return last observed value from
atomic::wait
; this value is lost otherwise.
- Add timed versions of
atomic::wait
and other concurrency primitves like barrier
and latch
, to make it easier to implement concurrency primitives on top of atomic
that expose timed waiting facilities themselves, e.g., like <semaphore>
.
- Avoid spurious polling in
atomic::wait
by accepting a predicate.
This proposal proposes extensions to address these shortcomings. This branch demonstrates its implementability in libstdc++.
The design of the features above is mostly orthogonal, and this section explores them independently.
Return last observed value on success
Desing: ::wait
APIs change the return type signature from returnvoid
T
wait(...)
;
Example 0:
Before |
After |
atomic<int> a(42);
a.wait(42);
auto o = a.load();
assert(o != 42); // MAY FAIL! |
atomic<int> a(42);
auto o = a.wait(42);
assert(o != 42); // OK! |
The atomic<T>::wait
method guarantees that the thread is unblocked only if the value changed.
Before this paper, the new atomic<T>
value is not returned to the caller. This has the following two shortcomings:
- That value is lost forever: after the thread is unblocked, the value might change before the unblocked thread calls
atomic<T>::load
(ABA Problem).
- Applications need the new value often: which forces them to call
atomic<T>::load
, even though atomic::wait<T>
most likely already loaded the value, to test that it did change.
After this paper, the value returned by wait
is returned to the caller, eliminating the need for the subsequent load.
This is an ABI breaking change that could be resolved by picking a different to-be-determined name for these APIs.
Fallible and timed versions of wait APIs
The design of the fallible timed versions of wait APIs adds three new APIs. For atomic
these are
template <class T>
optional<T> atomic<T>::try_wait(T, memory_order);
template <class T, class Rep, class Period>
optional<T> atomic<T>::try_wait_for(T, duration<Rep, Period> const&, memory_order);
template <class T, class Clock, class Duration>
optional<T> atomic<T>::try_wait_until(T, time_point<Clock, Duration> const&, memory_order);
They are non-blocking, i.e., they eventually return to the caller in a finite-set of steps, even if the value did not change. This enables the application to “do something else” before attempting to wait again.
On failure, i.e., if the value did not change, they return nullopt
and the operation has no effects (it does not synchronize). On success, they return an optional<T>
containing the last observed value, which is guaranteed to be different from the one the call site waited on.
The untimed try_wait
overload attempts to wait for a finite unspecified duration. The implementation may pick a different duration every time, enabling it decide for how long the operation should attempt to wait based on dynamic information, like the load of the system.
Since <chrono> and <optional> are not freestanding, these APIs will not be available in freestanding implementations. We could attempt to improve this by:
- Supporting the subset of
optional
APIs that do not throw exceptions in freestanding.
- Supporting the
<chrono>
durations in freestanding.
Example 1: the atomic variable t
tracks how many tasks need to still be processed, and the application prints every second - if it is still waiting - how many tasks remain:
Before |
After |
atomic<int> t;
int old = 1;
auto b = clock::now();
while (old != 0) {
old = t.load();
if (auto e = clock::now(); (e - b) > 1s) {
cout << old;
b = e;
}
} |
atomic<int> t;
auto old = 1;
while (old != 0) { auto o = t.try_wait_for(old, 1s);
old = o.has_value()? o.value() : old;
cout << old;
}
|
Before this proposal, applications need to re-implement atomic<T>::wait
logic, and like in this example, may accidentally call atomic<T>::load
in a loop without any back-off.
After this proposal, try_wait_for
expresses the application’s intent to wait for a certain duration of time, enabling the implementation to wait efficiently.
For barrier
and latch
they look like:
template <class CF>
bool barrier<CF>::try_wait(arrival_token& tok) const;
template <class CF, class Rep, class Period>
bool barrier<CF>::try_wait_for(arrival_token& tok, duration<Rep, Period> const& rel_time) const;
template <class CF, class Clock, class Duration>
bool barrier<CF>::try_wait_until(arrival_token& tok, time_point<Clock, Duration> const& abs_time) const;
template <class Rep, class Period>
bool latch::try_wait_for(duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool latch::try_wait_until(time_point<Clock, Duration> const& abs_time) const;
Example 2: using a barrier
which tracks the completion of task_count
tasks:
std::barrier b(task_count);
auto n = process_thread_local_tasks();
auto t = b.arrive(n);
while (!b.try_wait_for(t, 1ms)) {
auto s = steal_and_process_tasks();
b.arrive(s);
}
Predicate versions
There is no wording for the predicate versions proposed in this revision of this paper.
When the program is waiting for a condition different from “not equal to”, there is an added re-try loop around the wait
operation in the program. This loop causes each call to wait
to be performed as if it were the first call to wait
, oblivious to the fact that the program has already been waiting for some time. This leads to re-executing the short-term polling strategy.
Taking a predicate instead of a value allows us to push the program-defined condition inside of atomic::wait
, delete the outer loop, and allows the implementation to track time spent.
At least two implementations currently implement atomic::wait
in terms of a wait taking a predicate.
The authors see two APIs that are both equvialent in terms of implementation complexity and usability, but have an opposite API depending on how the semantics of the predicate are defined.
Both options have this signature T atomic<T>::wait_with_predicate(Predicate&& p)
and the predicate takes a T
and returns a bool
, but the predicate returns:
- Option 1:
true
if we synchronizes
- Understood as the wait succeds
- Option 2:
false
if we synchronizes
- Understood as “the value changed” (value compared not-equal - like
atomic::wait(T)
waits until T
changes)
Feedback needed: We should pick an option before working on wording.
Example 3: same example as Example 1 using Option 1 above (true
if we synchronize):
Before |
After |
atomic<int> t;
int old = 1;
auto b = clock::now();
while (old != 0) {
old = t.load();
auto e = clock::now();
if ((e - b) > 1s) {
cout << old;
b = e;
}
} |
atomic<int> t;
int old = t.load();
auto p = [&](int v) {
old = v;
return v == 0;
};
while(!t.try_wait_with_predicate_for(p, 1s).has_value()) {
cout << old;
}
|
In Example 1, the “After this paper” implementation has the issue that try_wait_for(old, 1s)
returns as soon as old
changes. That is, while the goal of the application is to wait for the value becoming zero, and it could wait up to 1s for that, if after 0.1s the value changes from 10 to 9 this API will return.
The “After this paper” implementation in Example 2 fixes that using the fallibe timed _with_predicate
version. The predicate is called every time the value changes, but it doesn’t “succeed” until the value is 0.
Wording
Return last observed value from atomic::wait
To [atomics.ref.generic.general]:
namespace std {
template<class T> struct atomic_ref { // [atomics.ref.generic.general]
voidT wait(T, memory_order = memory_order::seq_cst) const noexcept;
};
}
UNRESOLVED QUESTION: all atomic_ref
types are missing volatile
wait overloads. Should that be fixed as part of this work?
To [atomics.ref.ops]:
voidT wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions:
order
is neither memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares its value representation for equality against that of old
.
- If they compare unequal, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object
*ptr
.
To [atomics.ref.int]:
namespace std {
template<> struct atomic_ref<integral> {
voidT wait(integral, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.ref.float]:
namespace std {
template<> struct atomic_ref<floating-point> {
voidT wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.ref.pointer]:
namespace std {
template<class T> struct atomic_ref<T*> {
voidT* wait(T*, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.types.generic.general]:
namespace std {
template<class T> struct atomic {
voidT wait(T, memory_order = memory_order::seq_cst) const volatile noexcept;
voidT wait(T, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.types.operations]:
voidT wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;
voidT wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: order is neither
memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares its value representation for equality against that of old
.
- If they compare unequal, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
- Remarks: This function is an atomic waiting operation (atomics.wait).
To [atomics.types.int]:
namespace std {
template<> struct atomic<integral> {
voidT wait(integral, memory_order = memory_order::seq_cst) const volatile noexcept;
voidT wait(integral, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.types.float]:
namespace std {
template<> struct atomic<floating-point> {
voidT wait(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;
voidT wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [atomics.types.pointer]:
namespace std {
template<class T> struct atomic<T*> {
voidT* wait(T*, memory_order = memory_order::seq_cst) const volatile noexcept;
voidT* wait(T*, memory_order = memory_order::seq_cst) const noexcept;
};
}
To [util.smartptr.atomic.shared]:
namespace std {
template<class T> struct atomic<shared_ptr<T>> {
voidshared_ptr<T> wait(shared_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
};
}
and
voidshared_ptr<T> wait(shared_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions:
order
is neither memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares it to old
.
- If the two are not equivalent, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
- Remarks: Two
shared_ptr
objects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).
To [util.smartptr.atomic.weak]:
namespace std {
template<class T> struct atomic<weak_ptr<T>> {
voidweak_ptr<T> wait(weak_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
};
}
voidweak_ptr<T> wait(weak_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: order is neither
memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares it to old
.
- If the two are not equivalent, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
- Remarks: Two
weak_ptr
objects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).
No changes to [atomics.nonmembers] are needed.
No changes to [atomic.flag]'s wait
APIs are needed.
Fallible and timed-versions of ::wait
APIs
To [atomics.ref.generic.general]:
namespace std {
template<class T> struct atomic_ref { // [atomics.ref.generic.general]
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(
T, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(
T, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
To [atomics.ref.ops]:
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old,
chrono::duration<Rep, Period> const& rel_time,
memory_order order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old,
chrono::time_point<Clock, Duration> const& abs_time,
memory_order order = memory_order::seq_cst
) const noexcept;
- Preconditions:
order
is neither memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares its value representation for equality against that of old
.
- If they compare unequal, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously or the timeout expired. If it is unblocked by the timeout there is no effect and it returns
nullopt
.
The timeout expires (thread.req.timing) when the current time is after abs_time
(for try_wait_until
) or when at least rel_time
has passed from the start of the function (for try_wait_for
).
An implementation should ensure that try_wait_for
and try_wait_until
do not consistently return nullopt
in the absence of contending atomic operations.
The timeout for try_wait
is finite but otherwise unspecified.
- Throws: Timeout-related exceptions (thread.req.timing).
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object
*ptr
.
To [atomics.ref.int]:
namespace std {
template<> struct atomic_ref<integral> {
optional<integral> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<integral> try_wait_for(
integral, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<integral> try_wait_until(
integral, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
To [atomics.ref.float]:
namespace std {
template<> struct atomic_ref<floating-point> {
optional<floating-point> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<floating-point> try_wait_for(
floating-point, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<floating-point> try_wait_until(
floating-point, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
To [atomics.ref.pointer]:
namespace std {
template<class T> struct atomic_ref<T*> {
optional<T*> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T*> try_wait_for(
T*, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T*> try_wait_until(
T*, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
To [atomics.types.generic.general]:
namespace std {
template<class T> struct atomic {
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(
integral, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
optional<T> try_wait_for(
integral, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(
integral, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(
integral, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
};
}
To [atomics.types.operations]:
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old,
chrono::duration<Rep, Period> const& rel_time,
memory_order order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old,
chrono::duration<Rep, Period> const& rel_time,
memory_order order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old,
chrono::time_point<Clock, Duration> const& abs_time,
memory_order order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old,
chrono::time_point<Clock, Duration> const& abs_time,
memory_order order = memory_order::seq_cst
) const volatile noexcept;
EDITORIAL: analogous to atomic_ref
. Intentionally left out from the current revision of this paper.
To [atomics.types.int]:
namespace std {
template<> struct atomic<integral> {
optional<integral> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<integral> try_wait_for(
integral, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<integral> try_wait_for(
integral, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
optional<integral> try_wait_until(
integral, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<integral> try_wait_until(
integral, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
};
}
To [atomics.types.float]:
namespace std {
template<> struct atomic<floating-point> {
optional<floating-point> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<floating-point> try_wait_for(
floating-point, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<floating-point> try_wait_for(
floating-point, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
optional<floating-point> try_wait_until(
floating-point, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<floating-point> try_wait_until(
floating-point, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
};
}
To [atomics.types.pointer]:
namespace std {
template<class T> struct atomic<T*> {
optional<T*> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T*> try_wait_for(
T*, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<T*> try_wait_for(
T*, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
optional<T*> try_wait_until(
T*, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T*> try_wait_until(
T*, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
};
}
To [util.smartptr.atomic.shared]:
namespace std {
template<class T> struct atomic<shared_ptr<T>> {
optional<shared_ptr<T>> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<shared_ptr<T>> try_wait_for(
shared_ptr<T>l, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<shared_ptr<T>> try_wait_until(
shared_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
EDITORIAL: analogous to the try_wait_for
/_until
APIS of atomic_ref
, with shared_ptr
/weak_ptr
tweaks. Intentionally left out of the current revision of this paper.
To [util.smartptr.atomic.weak]:
namespace std {
template<class T> struct atomic<weak_ptr<T>> {
optional<weak_ptr<T>> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<weak_ptr<T>> try_wait_for(
weak_ptr<T>l, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<weak_ptr<T>> try_wait_until(
weak_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
};
}
EDITORIAL: analogous to the try_wait_for
/_until
APIS of atomic_ref
, with shared_ptr
/weak_ptr
tweaks. Intentionally left out of the current revision of this paper.
EDITORIAL: No changes to [atomics.nonmembers] are needed.
To [atomic.flag]:
namespace std {
struct atomic_flag {
bool try_wait(memory_order = memory_order::seq_cst) const noexcept;
bool try_wait(memory_order = memory_order::seq_cst) const noexcept volatile;
template <class Rep, class Period>
bool try_wait_for(
bool, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
bool try_wait_for(
bool, chrono::duration<Rep, Period> const& rel_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
template <class Clock, class Duration>
bool try_wait_until(
bool, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
bool try_wait_until(
bool, chrono::time_point<Clock, Duration> const& abs_time,
memory_order = memory_order::seq_cst
) const volatile noexcept;
};
}
bool atomic_flag_try_wait(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const volatile noexcept;
bool atomic_flag_try_wait_for(const atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time) noexcept;
bool atomic_flag_try_wait_for(const volatile atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time) noexcept;
bool atomic_flag_try_wait_for_explicit(const atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag_try_wait_for_explicit(const volatile atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag::try_wait_for(bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait_for(bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) const volatile noexcept;
atomic_flag_try_wait_until(const atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time) noexcept;
bool atomic_flag_try_wait_until(const volatile atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time) noexcept;
bool atomic_flag_try_wait_until_explicit(const atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag_try_wait_until_explicit(const volatile atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag::try_wait_until(bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait_until(bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) const volatile noexcept;
For atomic_flag_try_wait
, atomic_flag_try_wait_for
, and atomic_flag_try_wait_until
, let order
be memory_order::seq_cst
. Let flag
be object
for the non-member functions, and this
for the member functions.
- Preconditions:
order
is neither memory_order::release
nor memory_order::acq_rel
.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates
load(order)
and compares its value representation for equality against that of old
.
- If they compare unequal, returns the result of the evaluation of
load(order)
in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously or the timeout expired. If it is unblocked by the timeout there is no effect and it returns
nullopt
.
The timeout expires (thread.req.timing) when the current time is after abs_time
(for try_wait_until
) or when at least rel_time
has passed from the start of the function (for try_wait_for
).
An implementation should ensure that try_wait_for
and try_wait_until
do not consistently return nullopt
in the absence of contending atomic operations.
The timeout for try_wait
is finite but otherwise unspecified.
- Throws: Timeout-related exceptions (thread.req.timing).
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object
*ptr
.
To [atomics.wait]:
- [Note 2: The following functions are atomic waiting operations:
2.1 atomic<T>::wait
, atomic<T>::wait_for
, atomic::<T>::wait_until
,
2.2 atomic_flag::wait
, atomic_flag::wait_for
, atomic_flag::wait_until
,
2.3 atomic_wait
and, atomic_wait_explicit
, atomic_wait_for
, atomic_wait_for_explicit
, atomic_wait_until
, atomic_wait_until_explicit
,
2.4 atomic_flag_wait
and, atomic_flag_wait_explicit
, atomic_flag_wait_for
, atomic_flag_wait_for_explicit
, atomic_flag_wait_until
, atomic_flag_wait_until_explicit
,and
2.5 atomic_ref<T>::wait
, atomic_ref<T>::wait_for
, atomic_ref<T>::wait_until
.
— end note]
UNRESOLVED QUESTION: do we need to change something else for the non-member versions of try_wait_for
and try_wait_until
operations?
To [thread.barrier]:
namespace std {
template <class Completion Function>
class barrier {
public:
bool try_wait(arrival_token& tok) const;
template <class Rep, class Period>
bool try_wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
};
}
UNRESOLVED QUESTION: arrival_token&
vs arrival_token const&
?
Note: try_wait_for
/try_wait_until
do not take ownership of the arrival_token
because if they fail, the user still needs the token to be able to call them again.
bool try_wait(arrival_token& tok) const;
template <class Rep, class Period>
bool try_wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
- Preconditions:
arrival
is associated with the phase synchronization point for the current phase or the immediately preceding phase of the same barrier object.
- Effects: Blocks at the synchronization point associated with
arrival
until the phase completion step of the synchronization point’s phase is run or the timeout expired. If it is unblocked by the timeout there is no effect and it returns false
; otherwise, it returns true
.
The timeout expires (thread.req.timing) when the current time is after abs_time
(for try_wait_until
) or when at least rel_time
has passed from the start of the function (for try_wait_for
).
The timeout for try_wait
is finite but otherwise unspecified.
An implementation must ensure that try_wait_for
and try_wait_until
do not consistently return false
after the phase completion step associated with arrival
has run.
[Note: If arrival
is associated with the synchronization point for a previous phase, the call returns immediately. — end note]
- Throws:
system_error
when an exception is required (thread.req.exception) or timeout-related exceptions (thread.req.timing).
- Error conditions: Error conditions: Any of the error conditions allowed for mutex types (thread.mutex.requirements.mutex).
To thread.latch:
namespace std {
class latch {
public:
bool try_wait() const;
template <class Rep, class Period>
bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
};
}
bool try_wait() const;
template <class Rep, class Period>
bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
EDITORIAL: semantics intentionally omitted from the current revision of this paper but analogous to barrier
.
Document number: P2643R1
Date: 2023-05-18
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers
Audience: Concurrency
Improving C++ concurrency features
Revisions
r0 - (Kona Drafts)
barrier::try_wait_for
andbarrier::try_wait_until
.barrier::try_wait_for/_until
signatures; they were incorrectly accepting anarrival_token&&
, but since these can be called in a loop consuming thearrival_token
is incorrect.try_wait
.r1 - (Varna Draft)
optional<T>
vspair<bool, T>
vsT
, reflecting Kona guidance.try_wait
with rationale, reflecting Kona guidance.Introduction
P1135R6 introduced serval new concurrency primitives to the C++20 concurrency library:
<atomic>
: added the classatomic_flag
, thewait
andnotify_one/_all
to class templateatomic<>
, and free function versions of these.<semaphore>
: added class templatecounting_semaphore<>
and classbinary_semaphore
.<barrier>
,<latch>
: added class templatebarrier<>
and classlatch
.Though each element included was long coming, and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made:
atomic::wait
; this value is lost otherwise.atomic::wait
and other concurrency primitves likebarrier
andlatch
, to make it easier to implement concurrency primitives on top ofatomic
that expose timed waiting facilities themselves, e.g., like<semaphore>
.atomic::wait
by accepting a predicate.This proposal proposes extensions to address these shortcomings. This branch demonstrates its implementability in libstdc++.
Design
The design of the features above is mostly orthogonal, and this section explores them independently.
Return last observed value on success
Desing:
::wait
APIs change the return type signature from returningvoid
T
wait(...)
;Example 0:
atomic<int> a(42);
a.wait(42);
auto o = a.load();
assert(o != 42); // MAY FAIL!
atomic<int> a(42);
auto o = a.wait(42);
assert(o != 42); // OK!
The
atomic<T>::wait
method guarantees that the thread is unblocked only if the value changed.Before this paper, the new
atomic<T>
value is not returned to the caller. This has the following two shortcomings:atomic<T>::load
(ABA Problem).atomic<T>::load
, even thoughatomic::wait<T>
most likely already loaded the value, to test that it did change.After this paper, the value returned by
wait
is returned to the caller, eliminating the need for the subsequent load.This is an ABI breaking change that could be resolved by picking a different to-be-determined name for these APIs.
Fallible and timed versions of wait APIs
The design of the fallible timed versions of wait APIs adds three new APIs. For
atomic
these areThey are non-blocking, i.e., they eventually return to the caller in a finite-set of steps, even if the value did not change. This enables the application to “do something else” before attempting to wait again.
On failure, i.e., if the value did not change, they return
nullopt
and the operation has no effects (it does not synchronize). On success, they return anoptional<T>
containing the last observed value, which is guaranteed to be different from the one the call site waited on.The untimed
try_wait
overload attempts to wait for a finite unspecified duration. The implementation may pick a different duration every time, enabling it decide for how long the operation should attempt to wait based on dynamic information, like the load of the system.Since <chrono> and <optional> are not freestanding, these APIs will not be available in freestanding implementations. We could attempt to improve this by:
optional
APIs that do not throw exceptions in freestanding.<chrono>
durations in freestanding.Example 1: the atomic variable
t
tracks how many tasks need to still be processed, and the application prints every second - if it is still waiting - how many tasks remain:atomic<int> t;
int old = 1;
auto b = clock::now();
while (old != 0) {
old = t.load();
if (auto e = clock::now(); (e - b) > 1s) {
cout << old;
b = e;
}
}
atomic<int> t;
auto old = 1;
while (old != 0) {
auto o = t.try_wait_for(old, 1s);
old = o.has_value()? o.value() : old;
cout << old;
}
Before this proposal, applications need to re-implement
atomic<T>::wait
logic, and like in this example, may accidentally callatomic<T>::load
in a loop without any back-off.After this proposal,
try_wait_for
expresses the application’s intent to wait for a certain duration of time, enabling the implementation to wait efficiently.For
barrier
andlatch
they look like:Example 2: using a
barrier
which tracks the completion oftask_count
tasks:Predicate versions
There is no wording for the predicate versions proposed in this revision of this paper.
When the program is waiting for a condition different from “not equal to”, there is an added re-try loop around the
wait
operation in the program. This loop causes each call towait
to be performed as if it were the first call towait
, oblivious to the fact that the program has already been waiting for some time. This leads to re-executing the short-term polling strategy.Taking a predicate instead of a value allows us to push the program-defined condition inside of
atomic::wait
, delete the outer loop, and allows the implementation to track time spent.At least two implementations currently implement
atomic::wait
in terms of a wait taking a predicate.The authors see two APIs that are both equvialent in terms of implementation complexity and usability, but have an opposite API depending on how the semantics of the predicate are defined.
Both options have this signature
T atomic<T>::wait_with_predicate(Predicate&& p)
and the predicate takes aT
and returns abool
, but the predicate returns:true
if we synchronizesfalse
if we synchronizesatomic::wait(T)
waits untilT
changes)Feedback needed: We should pick an option before working on wording.
Example 3: same example as Example 1 using Option 1 above (
true
if we synchronize):atomic<int> t;
int old = 1;
auto b = clock::now();
while (old != 0) {
old = t.load();
auto e = clock::now();
if ((e - b) > 1s) {
cout << old;
b = e;
}
}
atomic<int> t;
int old = t.load();
auto p = [&](int v) {
old = v;
return v == 0;
};
while(!t.try_wait_with_predicate_for(p, 1s).has_value()) {
cout << old;
}
In Example 1, the “After this paper” implementation has the issue that
try_wait_for(old, 1s)
returns as soon asold
changes. That is, while the goal of the application is to wait for the value becoming zero, and it could wait up to 1s for that, if after 0.1s the value changes from 10 to 9 this API will return.The “After this paper” implementation in Example 2 fixes that using the fallibe timed
_with_predicate
version. The predicate is called every time the value changes, but it doesn’t “succeed” until the value is 0.Wording
Return last observed value from
atomic::wait
To [atomics.ref.generic.general]:
To [atomics.ref.ops]:
order
is neithermemory_order::release
normemory_order::acq_rel
.load(order)
and compares its value representation for equality against that ofold
.load(order)
in the previous step.*ptr
.To [atomics.ref.int]:
To [atomics.ref.float]:
To [atomics.ref.pointer]:
To [atomics.types.generic.general]:
To [atomics.types.operations]:
memory_order::release
normemory_order::acq_rel
.load(order)
and compares its value representation for equality against that ofold
.load(order)
in the previous step.To [atomics.types.int]:
To [atomics.types.float]:
To [atomics.types.pointer]:
To [util.smartptr.atomic.shared]:
and
order
is neithermemory_order::release
normemory_order::acq_rel
.load(order)
and compares it toold
.load(order)
in the previous step.shared_ptr
objects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).To [util.smartptr.atomic.weak]:
memory_order::release
normemory_order::acq_rel
.load(order)
and compares it toold
.load(order)
in the previous step.weak_ptr
objects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).No changes to [atomics.nonmembers] are needed.
No changes to [atomic.flag]'s
wait
APIs are needed.Fallible and timed-versions of
::wait
APIsTo [atomics.ref.generic.general]:
To [atomics.ref.ops]:
order
is neithermemory_order::release
normemory_order::acq_rel
.load(order)
and compares its value representation for equality against that ofold
.load(order)
in the previous step.nullopt
.The timeout expires (thread.req.timing) when the current time is after
abs_time
(fortry_wait_until
) or when at leastrel_time
has passed from the start of the function (fortry_wait_for
).An implementation should ensure that
try_wait_for
andtry_wait_until
do not consistently returnnullopt
in the absence of contending atomic operations.The timeout for
try_wait
is finite but otherwise unspecified.*ptr
.To [atomics.ref.int]:
To [atomics.ref.float]:
To [atomics.ref.pointer]:
To [atomics.types.generic.general]:
To [atomics.types.operations]:
To [atomics.types.int]:
To [atomics.types.float]:
To [atomics.types.pointer]:
To [util.smartptr.atomic.shared]:
To [util.smartptr.atomic.weak]:
To [atomic.flag]:
For
atomic_flag_try_wait
,atomic_flag_try_wait_for
, andatomic_flag_try_wait_until
, letorder
bememory_order::seq_cst
. Letflag
beobject
for the non-member functions, andthis
for the member functions.order
is neithermemory_order::release
normemory_order::acq_rel
.load(order)
and compares its value representation for equality against that ofold
.load(order)
in the previous step.nullopt
.The timeout expires (thread.req.timing) when the current time is after
abs_time
(fortry_wait_until
) or when at leastrel_time
has passed from the start of the function (fortry_wait_for
).An implementation should ensure that
try_wait_for
andtry_wait_until
do not consistently returnnullopt
in the absence of contending atomic operations.The timeout for
try_wait
is finite but otherwise unspecified.*ptr
.To [atomics.wait]:
2.1
atomic<T>::wait
,atomic<T>::wait_for
,atomic::<T>::wait_until
,2.2
atomic_flag::wait
,atomic_flag::wait_for
,atomic_flag::wait_until
,2.3
atomic_wait
and,atomic_wait_explicit
,atomic_wait_for
,atomic_wait_for_explicit
,atomic_wait_until
,atomic_wait_until_explicit
,2.4
atomic_flag_wait
and,atomic_flag_wait_explicit
,atomic_flag_wait_for
,atomic_flag_wait_for_explicit
,atomic_flag_wait_until
,atomic_flag_wait_until_explicit
,and2.5
atomic_ref<T>::wait
,atomic_ref<T>::wait_for
,atomic_ref<T>::wait_until
.— end note]
To [thread.barrier]:
arrival
is associated with the phase synchronization point for the current phase or the immediately preceding phase of the same barrier object.arrival
until the phase completion step of the synchronization point’s phase is run or the timeout expired. If it is unblocked by the timeout there is no effect and it returnsfalse
; otherwise, it returnstrue
.The timeout expires (thread.req.timing) when the current time is after
abs_time
(fortry_wait_until
) or when at leastrel_time
has passed from the start of the function (fortry_wait_for
).The timeout for
try_wait
is finite but otherwise unspecified.An implementation must ensure that
try_wait_for
andtry_wait_until
do not consistently returnfalse
after the phase completion step associated witharrival
has run.[Note: If
arrival
is associated with the synchronization point for a previous phase, the call returns immediately. — end note]system_error
when an exception is required (thread.req.exception) or timeout-related exceptions (thread.req.timing).To thread.latch: