1. Introduction
This paper is the unification of the wording for a series of related C++20 proposals for introducing new synchronization and thread coordination facilities and enhancing existing ones:
-
[P0514]: Efficient
atomic
waiting and semaphores. -
[P0666]: Latches and barriers.
-
[P0995]:
atomic_flag::test
and lockfree integral types.
2. Wording
Note: The � character is used to denote a placeholder number which shall be selected by the editor.
Modify the header synopsis for <atomic>
in [atomics.syn] as follows:
32.2 Header<atomic>
synopsis [atomics.syn]namespace std { // ... // 32.3, type aliases // ... using atomic_intptr_t = atomic<intptr_t>; using atomic_uintptr_t = atomic<uintptr_t>; using atomic_size_t = atomic<size_t>; using atomic_ptrdiff_t = atomic<ptrdiff_t>; using atomic_intmax_t = atomic<intmax_t>; using atomic_uintmax_t = atomic<uintmax_t>;
using atomic_int_fast_wait_t = atomic<implementation-defined>;
using atomic_uint_fast_wait_t = atomic<implementation-defined>;
using atomic_signed_lock_free = see below;
using atomic_unsigned_lock_free = see below;
// ... // 32.8, flag type and operations struct atomic_flag;
bool atomic_flag_test(volatile atomic_flag*) noexcept; bool atomic_flag_test(atomic_flag*) noexcept; bool atomic_flag_test_explicit(volatile atomic_flag*, memory_order) noexcept; bool atomic_flag_test_explicit(atomic_flag*, memory_order) noexcept;bool atomic_flag_test_and_set(volatile atomic_flag*) noexcept; bool atomic_flag_test_and_set(atomic_flag*) noexcept; bool atomic_flag_test_and_set_explicit(volatile atomic_flag*, memory_order) noexcept; bool atomic_flag_test_and_set_explicit(atomic_flag*, memory_order) noexcept; void atomic_flag_clear(volatile atomic_flag*) noexcept; void atomic_flag_clear(atomic_flag*) noexcept; void atomic_flag_clear_explicit(volatile atomic_flag*, memory_order) noexcept; void atomic_flag_clear_explicit(atomic_flag*, memory_order) noexcept; #define ATOMIC_FLAG_INIT see below // 32.9, fences extern "C" void atomic_thread_fence(memory_order) noexcept; extern "C" void atomic_signal_fence(memory_order) noexcept; // 32.�, waiting and notifying functions
template<class T> void atomic_notify_one(const volatile atomic<T>*); template<class T> void atomic_notify_one(const atomic<T>*); void atomic_notify_one(const volatile atomic_flag*); void atomic_notify_one(const atomic_flag*); template<class T> void atomic_notify_all(const volatile atomic<T>*); template<class T> void atomic_notify_all(const atomic<T>*); void atomic_notify_all(const volatile atomic_flag*); void atomic_notify_all(const atomic_flag*); template<class T> void atomic_wait(const volatile atomic<T>*, typename atomic<T>::value_type); template<class T> void atomic_wait(const atomic<T>*, typename atomic<T>::value_type); void atomic_wait(const volatile atomic_flag*, bool); void atomic_wait(const atomic_flag*, bool); template<class T> void atomic_wait_explicit(const volatile atomic<T>*, typename atomic<T>::value_type, memory_order); template<class T> void atomic_wait_explicit(const atomic<T>*, typename atomic<T>::value_type, memory_order); void atomic_wait_explicit(const volatile atomic_flag*, bool, memory_order); void atomic_wait_explicit(const atomic_flag*, bool, memory_order);}
Modify [atomics.alias] as follows:
32.3 Type aliases [atomics.alias]The type aliasesatomic_intN_t
,atomic_uintN_t
,atomic_intptr_t
, andatomic_uintptr_t
are defined if and only ifintN_t
,uintN_t
,intptr_t
, anduintptr_t
are defined, respectively.The type aliasesatomic_signed_lock_free
andatomic_unsigned_lock_free
are defined to be specializations ofatomic
whose template arguments are integral types, respectively signed and unsigned, other thanbool
.is_always_lock_free
shall betrue
foratomic_signed_lock_free
andatomic_unsigned_lock_free
. An implementation should choose the integral specialization ofatomic
for which the waiting and notifying functions are most efficient.The type aliasesatomic_int_fast_wait_t
andatomic_uint_fast_wait_t
are integral atomic types. Implementations should ensure that invocations of waiting and notifying functions (32.�) with these types have the lowest performance overhead among integer types.
Note: The reference to "waiting and notifying functions" in the above change should refer to the new [atomic.wait] subclause.
Modify [atomics.flag] as follows:
32.8 Flag type and operations [atomics.flag]namespace std { struct atomic_flag {bool test(memory_order = memory_order_seq_cst) volatile noexcept; bool test(memory_order = memory_order_seq_cst) noexcept;bool test_and_set(memory_order = memory_order_seq_cst) volatile noexcept; bool test_and_set(memory_order = memory_order_seq_cst) noexcept; void clear(memory_order = memory_order_seq_cst) volatile noexcept; void clear(memory_order = memory_order_seq_cst) noexcept; atomic_flag() noexcept = default; atomic_flag(const atomic_flag&) = delete; atomic_flag& operator=(const atomic_flag&) = delete; atomic_flag& operator=(const atomic_flag&) volatile = delete; };
bool atomic_flag_test(volatile atomic_flag*) noexcept; bool atomic_flag_test(atomic_flag*) noexcept; bool atomic_flag_test_explicit(volatile atomic_flag*, memory_order) noexcept; bool atomic_flag_test_explicit(atomic_flag*, memory_order) noexcept;bool atomic_flag_test_and_set(volatile atomic_flag*) noexcept; bool atomic_flag_test_and_set(atomic_flag*) noexcept; bool atomic_flag_test_and_set_explicit(volatile atomic_flag*, memory_order) noexcept; bool atomic_flag_test_and_set_explicit(atomic_flag*, memory_order) noexcept; void atomic_flag_clear(volatile atomic_flag*) noexcept; void atomic_flag_clear(atomic_flag*) noexcept; void atomic_flag_clear_explicit(volatile atomic_flag*, memory_order) noexcept; void atomic_flag_clear_explicit(atomic_flag*, memory_order) noexcept;
#define ATOMIC_FLAG_INIT see below
}
Theatomic_flag
type provides the classic test-and-set functionality. It has two states, set and clear.Operations on an object of typeatomic_flag
shall be lock-free. [ Note: Hence the operations should also be address-free. — end note ]Theatomic_flag
type is a standard-layout struct. It has a trivial default constructor and a trivial destructor.The macroATOMIC_FLAG_INIT
shall be defined in such a way that it can be used to initialize an object of typeatomic_flag
to the clear state. The macro can be used in the form:atomic_flag guard = ATOMIC_FLAG_INIT;It is unspecified whether the macro can be used in other initialization contexts. For a complete static-duration object, that initialization shall be static. Unless initialized with
ATOMIC_FLAG_INIT
, it is unspecified whether anatomic_flag
object has an initial state of set or clear.bool atomic_flag_test(volatile atomic_flag* object) noexcept; bool atomic_flag_test(atomic_flag* object) noexcept; bool atomic_flag_test_explicit(volatile atomic_flag* object, memory_order order) noexcept; bool atomic_flag_test_explicit(atomic_flag* object, memory_order order) noexcept; bool atomic_flag::test(memory_order order = memory_order_seq_cst) volatile noexcept; bool atomic_flag::test(memory_order order = memory_order_seq_cst) noexcept;Requires: Theorder
argument shall not bememory_order_release
normemory_order_acq_rel
.Effects: Memory is affected according to the value oforder
.Returns: Atomically returns the value pointed to byobject
orthis
.bool atomic_flag_test_and_set(volatile atomic_flag* object) noexcept; bool atomic_flag_test_and_set(atomic_flag* object) noexcept; bool atomic_flag_test_and_set_explicit(volatile atomic_flag* object, memory_order order) noexcept; bool atomic_flag_test_and_set_explicit(atomic_flag* object, memory_order order) noexcept; bool atomic_flag::test_and_set(memory_order order = memory_order_seq_cst) volatile noexcept; bool atomic_flag::test_and_set(memory_order order = memory_order_seq_cst) noexcept;Effects: Atomically sets the value pointed to byobject
or bythis
totrue
. Memory is affected according to the value oforder
. These operations are atomic read-modify-write operations (4.7).Returns: Atomically, the value of the object immediately before the effects.void atomic_flag_clear(volatile atomic_flag* object) noexcept; void atomic_flag_clear(atomic_flag* object) noexcept; void atomic_flag_clear_explicit(volatile atomic_flag* object, memory_order order) noexcept; void atomic_flag_clear_explicit(atomic_flag* object, memory_order order) noexcept; void atomic_flag::clear(memory_order order = memory_order_seq_cst) volatile noexcept; void atomic_flag::clear(memory_order order = memory_order_seq_cst) noexcept;Requires: The
order
argument shall not bememory_order_consume
,memory_order_acquire
, normemory_order_acq_rel
.Effects: Atomically sets the value pointed to by
object
or bythis
tofalse
. Memory is affected according to the value oforder
.
Add a new subclause after [atomics.fences]:
32.� Waiting and notifying functions [atomics.wait]The functions in this subclause provide a mechanism to wait for the value of an atomic object to change, more efficiently than can be achieved with polling. Waiting functions in this facility may block until they are unblocked by notifying functions, according to each function’s effects. [ Note: Programs are not guaranteed to observe transient atomic values, an issue known as the A-B-A problem, resulting in continued blocking if a condition is only temporarily met. – End note ]The functionsatomic_wait
andatomic_wait_explicit
are waiting functions. The functionsatomic_notify_one
andatomic_notify_all
are notifying functions.template<class T> void atomic_notify_one(const volatile atomic<T>* object); template<class T> void atomic_notify_one(const atomic<T>* object); void atomic_notify_one(const volatile atomic_flag* object); void atomic_notify_one(const atomic_flag* object);Effects: Unblocks up to execution of a waiting function that blocked after observing the result of an atomic operationX
, if there exists another atomic operationY
, such thatX
precedesY
in the modification order of*object
, andY
happens before this call.template<class T> void atomic_notify_all(const volatile atomic<T>* object); template<class T> void atomic_notify_all(const atomic<T>* object); void atomic_notify_all(const volatile atomic_flag* object); void atomic_notify_all(const atomic_flag* object);Effects: Unblocks each execution of a waiting function that blocked after observing the result of an atomic operationX
, if there exists another atomic operationY
, such thatX
precedesY
in the modification order of*object
, andY
happens before this call.template<class T> void atomic_wait(const volatile atomic<T>* object, typename atomic<T>::value_type old); template<class T> void atomic_wait(const atomic<T>* object, typename atomic<T>::value_type old); void atomic_wait(const volatile atomic_flag* object, bool old); void atomic_wait(const atomic_flag* object, bool old);Effects: Equivalent toatomic_wait_explicit(object, old, memory_order_seq_cst);
template<class T> void atomic_wait_explicit(const volatile atomic<T>* object, typename atomic<T>::value_type old, memory_order order); template<class T> void atomic_wait_explicit(const atomic<T>* object, typename atomic<T>::value_type old, memory_order order);Requires: The order argument shall not bememory_order_release
normemory_order_acq_rel
.Effects: Repeatedly performs the following steps, in order:
Evaluates
object->load(order) != old
then, if the result istrue
, returns.Blocks until an implementation-defined condition has been met. [ Note: Consequently, it may unblock for reasons other than a call to a notifying function. — end note ]
void atomic_wait_explicit(const volatile atomic_flag* object, bool old, memory_order order); void atomic_wait_explicit(const atomic_flag* object, bool old, memory_order order);Requires: The order argument shall not bememory_order_release
normemory_order_acq_rel
.Effects: Repeatedly performs the following steps, in order:
Evaluates
object->test(order) != old
then, if the result istrue
, returns.Blocks until an implementation-defined condition has been met. [ Note: Consequently, it may unblock for reasons other than a call to a notifying function. — end note ]
Modify Table 140 "Thread support library summary" in [thread.general] as follows:
Table 140 — Thread support library summary
Subclause Header(s) 33.2 Requirements 33.3 Threads <thread>
33.4 Mutual exclusion <mutex>
<shared_mutex>
33.5 Condition variables <condition_variable>
33.� Semaphores <semaphore>
33.� Latches and barriers <latch>
<barrier>
33.6 Futures <future>
Add two new subclauses after [thread.condition]:
33.� Semaphores [thread.semaphore]Semaphores are lightweight synchronization primitives used to constrain concurrent access to a shared resource. They are widely used to implement other synchronization primitives and, whenever both are applicable, can be more efficient than condition variables.A counting semaphore is a semaphore object that models a non-negative resource count. A binary semaphore is a semaphore object that has only two states, also known as available and unavailable. [ Note: A binary semaphore should be more efficient than a counting semaphore with a unit magnitude count. – end note ]
33.�.1 Header<semaphore>
synopsis [thread.semaphore.syn]namespace std { template<ptrdiff_t least_max_value = implementation-defined> class counting_semaphore; using binary_semaphore = counting_semaphore<1>; }
33.�.2 Class templatecounting_semaphore
[thread.semaphore.counting.class]namespace std { template<ptrdiff_t least_max_value> class counting_semaphore { public: static constexpr ptrdiff_t max() noexcept; explicit constexpr counting_semaphore(ptrdiff_t); ~counting_semaphore(); counting_semaphore(const basic_semaphore&) = delete; counting_semaphore(basic_semaphore&&) = delete; counting_semaphore& operator=(const basic_semaphore&) = delete; counting_semaphore& operator=(basic_semaphore&&) = delete; void release(ptrdiff_t update = 1); void acquire(); bool try_acquire(); template<class Clock, class Duration> bool try_acquire_until(const chrono::time_point<Clock, Duration>&); template<class Rep, class Period> bool try_acquire_for(const chrono::duration<Rep, Period>&); private: ptrdiff_t counter; // exposition only }; }
Classcounting_semaphore
maintains an internal counter that is initialized when the semaphore is created. Threads may block waiting untilcounter >= 1
.Semaphores permit concurrent invocation of therelease
,acquire
,try_acquire
,try_acquire_for
, andtry_acquire_until
member functions.static constexpr ptrdiff_t max() noexcept;Returns: The maximum value ofcounter
. This value shall not be less than that of the template argumentleast_max_value
. [ Note: The value may exceed least_max_value. – end note ]explicit constexpr counting_semaphore(ptrdiff_t desired);Requires:desired >= 0
anddesired <= max()
.Effects:counter = desired
.~counting_semaphore();Requires: For every function call that blocks oncounter
, a function call that will cause it to unblock and return shall happen before this call. [ Note: This relaxes the usual rules, which would have required all wait calls to happen before destruction. — end note ]Effects: Destroys the object.void release(ptrdiff_t update = 1);Requires:update >= 0
, andcounter + update <= max()
.Effects:counter += update
, executed atomically. If any threads are blocked on counter, unblocks them.Synchronization: Strongly happens before invocations oftry_acquire
that observe the result of the effects.bool try_acquire();Effects:
With low probability, returns immediately. [ Note: An implementation should ensure that
try_acquire
does not consistently returnfalse
in the absence of contending acquisitions. — end note ]Otherwise, if
counter >= 1
, thencounter -= 1
is executed atomically.Returns:true
ifcounter
was decremented, otherwisefalse
.void acquire();Effects: Repeatedly performs the following steps, in order:
Evaluates
try_acquire
, then, if the result istrue
, returns.Blocks until
counter >= 1
.template<class Clock, class Duration> bool try_acquire_until(const chrono::time_point<Clock, Duration>& abs_time); template<class Rep, class Period> bool try_acquire_for(const chrono::duration<Rep, Period>& rel_time);Effects: Repeatedly performs the following steps, in order:
Evaluates
try_acquire
. If the result istrue
, returnstrue
.Blocks until the timeout expires or
counter >= 1
. If the timeout expired, returnsfalse
.Throws: Timeout-related exceptions (33.2.4).
33.� Coordination Types [thread.coord]This section describes various concepts related to thread coordination, and defines the coordination typeslatch
andbarrier
. These types facilitate concurrent computation performed by a number of threads, in one or more phases.In this subclause, a synchronization point represents a condition that a thread may contribute to or wait for, potentially blocking until it is satisfied. A thread arrives at the synchronization point when it has an effect on the state of the condition, even if it does not cause it to become satisfied.Concurrent invocations of the member functions of coordination types, other than their destructors, do not introduce data races.
33.�.1 Latches [thread.coord.latch]A latch is a thread coordination mechanism that allows any number of threads to block until an expected count is summed (exactly) by threads that arrived at the latch. The expected count is set when the latch is constructed. An individual latch is a single-use object; once the count has been reached, the latch cannot be reused.
33.�.1.1 Header<latch>
synopsis [thread.coord.latch.syn]namespace std { class latch }
33.�.1.2 Classlatch
[thread.coord.latch.class]namespace std { class latch { public: explicit constexpr latch(ptrdiff_t expected); ~latch(); latch(const latch&) = delete; latch(latch&&) = delete; latch& operator=(const latch&) = delete; latch& operator=(latch&&) = delete; void arrive(ptrdiff_t update = 1); bool try_wait() const noexcept; void wait() const; void arrive_and_wait(ptrdiff_t n update = 1); private: ptrdiff_t counter; // exposition only }; }
Alatch
maintains an internal counter that is initialized when thelatch
is created. Threads may block at thelatch
’s synchronization point, waiting forcounter
to be decremented to0
.explicit constexpr latch(ptrdiff_t expected);Requires:expected >= 0
.Effects:counter = expected
.~latch();Requires: No threads are blocked at the synchronization point.Effects: Destroys the latch.Remarks: May be called even if some threads have not yet returned from functions that block at the synchronization point, provided that they are unblocked. [ Note: The destructor may block until all threads have exited invocations ofwait
on this object. — end note ]void arrive(ptrdiff_t update);Requires:counter >= update
andupdate >= 0
.Effects: Atomically decrementscounter
byupdate
.Synchronization: Synchronizes with the returns from all calls unblocked by the effects.Remarks: Arrives at the synchronization point withupdate
count.bool try_wait() const noexcept;Returns:counter == 0
.void wait() const;Effects: Ifcounter == 0
, returns immediately. Otherwise, blocks the calling thread at the synchronization point untilcounter == 0
.void arrive_and_wait(ptrdiff_t update);Effects: Equivalent toarrive(update); wait();
.
33.�.2 Barriers [thread.coord.barrier]A barrier is a thread coordination mechanism that allows at most an expected count of threads to block until that count is summed (exactly) by threads that arrived at the barrier in each of its successive phases. Once threads are released from blocking at the synchronization point for a phase, they can reuse the same barrier immediately in its next phase. [ Note: It is thus useful for managing repeated tasks, or phases of a larger task, that are handled by multiple threads. — end note ]A barrier has a completion step that is a (possibly empty) set of effects associated with a phase of the barrier. When the member functions defined in this subclause arrive at the barrier, they have the following effects:
When the expected number of threads for this phase have arrived at the barrier, one of those threads executes the barrier type’s completion step.
When the completion step is completed, all threads blocked at the synchronization point for this phase are unblocked and the barrier enters its next phase. The end of the completion step strongly happens before the returns from all calls unblocked by its completion.
33.�.2.1 Header<barrier>
synopsis [thread.coord.barrier.syn]namespace std { template<class CompletionFunction = implementation-defined> class barrier; }
33.�.2.2 Class templatebarrier
[thread.coord.barrier.class]namespace std { template<class CompletionFunction> class barrier { public: using arrival_token = implementation-defined; explicit barrier(ptrdiff_t expected, CompletionFunction f = CompletionFunction()); ~barrier(); barrier(const barrier&) = delete; barrier(barrier&&) = delete; barrier& operator=(const barrier&) = delete; barrier& operator=(barrier&&) = delete; [[nodiscard]] arrival_token arrive(ptrdiff_t update = 1); bool try_wait(arrival_token) const; void wait(arrival_token) const; void arrive_and_wait(); void arrive_and_drop(); private: CompletionFunction completion; // exposition only }; }
Abarrier
is a barrier type with a completion step controlled by a function object. The completion step callscompletion
. Threads may block at the barrier’s synchronization point for a phase, waiting for the expected sum contributions by threads that arrive in that phase.CompletionFunction
shall beCopyConstructible
andis_invocable_r_v<void, CompletionFunction>
shall betrue
.barrier::arrival_token
is an implementation-defined type that isMoveConstructible
andMoveAssignable
.explicit barrier(ptrdiff_t expected, CompletionFunction f);Requires:expected >= 0
, andnoexcept(f())
shall betrue
.Effects: Initializes the barrier forexpected
number of threads in the first phase, and initializescompletion
withstd::move(f)
. [ Note: Ifexpected
is0
this object may only be destroyed. — end note ]~barrier();Requires: No threads are blocked at a synchronization point for any phase.Effects: Destroys the barrier.Remarks: May be called even if some threads have not yet returned from functions that block at a synchronization point, provided that they have unblocked. [ Note: The destructor may block until all threads have exited invocations of wait() on this object. — end note ][[nodiscard]] arrival_token arrive(ptrdiff_t update);Requires: The expected count is not less thanupdate
.Effects: Constructs an object of typearrival_token
that is associated with thebarrier
's synchronization point for the current phase, then arrivesupdate
times at the synchronization point for the current phase.Synchronization: The call toarrive
strongly happens before the start of the completion step for the current phase.Returns: The constructed object.Remarks: This may cause the completion step to start.bool try_wait(arrival_token arrival) const;Requires:arrival
is associated with a synchronization point for the current or the immediately preceding phases of thebarrier
.Returns:true
if the synchronization point condition associated witharrival
is satisfied, otherwisefalse
.void wait(arrival_token arrival) const;Requires:arrival
is associated with a synchronization point for the current or the immediately preceding phases of thebarrier
.Effects: Blocks at the synchronization point associated witharrival
until the condition is satisfied.void arrive_and_wait();Effects: Equivalent towait(arrive())
.void arrive_and_drop();Requires: The expected number of threads for the current phase is not0
.Effects: Decrements the expected number of threads for subsequent phases by1
, then arrives at the synchronization point for the current phase.Synchronization: The call toarrive_and_drop
strongly happens before the start of the completion step for the current phase.Remarks: This may cause the completion phase to start.
Create the following feature test macros:
-
__cpp_lib_atomic_lock_free_type_aliases
, which implies thatatomic_signed_lock_free
andatomic_unsigned_lock_free
types are available. -
__cpp_lib_atomic_flag_test
, which implies thetest
methods and free functions foratomic_flag
are available. -
__cpp_lib_atomic_wait
, which implies thenotify_*
andwait
methods and free functions foratomic
andatomic_flag
and theatomic_int_fast_wait_t
andatomic_uint_fast_wait_t
types are available. -
__cpp_lib_semaphore
, which implies thatcounting_semaphore
andbinary_semaphore
are available. -
__cpp_lib_latch
, which implies thatlatch
is available. -
__cpp_lib_barrier
, which implies thatbarrier
is available.