This document presents a complete proposal for Chapter 29, Atomic Operations Library, of the C++ standard. It is based on the atomic operations proposals N2047 by Hans Boehm and N2145 by Hans Boehm and Lawrence Crowl, on the memory model proposed in N2153 by Silvera, Wong, McKenney and Blainey, and on Alexander Terekhov's contributions to the mailing lists and the comp.programming.threads discussion group.
This document deviates from the previous proposals in the following major areas:

- it mandates the complete contents of the <atomic> header, making no part of it optional;
- it allows atomic operations on any POD type that meets implementation-defined size and alignment requirements;
- it does not include a high-level C++ interface (such as a std::atomic<T> class template);
- it adds a weaker bidirectional constraint, __acq_rel;
- it passes the constraint as a first argument instead of encoding it in the function name;
- it includes bidirectional memory fences and compiler-only fences;
- it adopts a hybrid approach to the ordered constraint.
The proposal mandates the existence of the complete contents of the <atomic> header and does not make any part of it optional. Given the expected time frame of the C++ standard and the current hardware trends, this should not pose a problem for most target platforms. A platform that does not provide a compare and swap primitive has two choices: not conform to the requirements, or provide an emulation using a hidden spinlock pool, as explained in N2145. A spinlock primitive is included in the proposed text in order to avoid the undesirable case where the application implements a spinlock on top of such emulated atomics.
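A rough sketch of such an emulation, expressed in terms of the spinlock primitives proposed below; the pool size, hash function and helper names are illustrative assumptions, not part of the proposed text:

    #include <cstddef>   // size_t

    // A hidden pool of spinlocks; each element is assumed to be initialized
    // with ATOMIC_SPINLOCK_INITIALIZER. The size 41 is an arbitrary choice.
    static atomic_spinlock_t pool_[ 41 ];

    static atomic_spinlock_t * lock_for( void const * p )
    {
        return &pool_[ (size_t)p % 41 ];
    }

    // Emulated atomic load: every operation on the same address serializes
    // through the same hidden spinlock.
    template< class T > T emulated_atomic_load( T const * p )
    {
        atomic_spinlock_t * l = lock_for( p );
        while( !atomic_spin_trylock( l ) ) { }   // spin until acquired
        T r = *p;
        atomic_spin_unlock( l );
        return r;
    }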
Since most, if not all, lock-free algorithms and data structures require the compare and swap primitive, a program or library is not likely to be able to use the facilities in <atomic> in a meaningful way if atomic_compare_swap is not present.
The general consensus is that an atomic library should be constrained to operate on a certain specifically designated set of types. This proposal disagrees, and requires that any POD type that meets certain implementation-defined size and alignment restrictions be atomic with respect to the library operations. The motivation for this is twofold: first, to allow C structs such as those shown below to be manipulated atomically:
struct rwlock_state
{
    unsigned writer: 1;
    unsigned readers: 31;
};

struct uses_dwcas
{
    void * p1;
    void * p2;
};
Second, to allow a variable to be manipulated both atomically and non-atomically in different parts of the code. This scenario typically arises in "mostly lock-free" data structures, where the concurrent atomic accesses are guarded by a read lock, whereas the exclusive accesses are performed under the protection of a write lock and therefore need not be atomic or pay the corresponding performance penalty.
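A minimal sketch of the pattern, under the assumption that long satisfies the implementation-defined atomicity requirements and that a reader/writer lock is managed elsewhere; the struct and function names are hypothetical:

    // Callers of shared_update hold a read (shared) lock, so concurrent
    // calls are possible and the access must be atomic; callers of
    // exclusive_update hold the write lock, so a plain access suffices.
    struct mostly_lock_free
    {
        long count_;
    };

    void shared_update( mostly_lock_free * p )      // under a read lock
    {
        atomic_fetch_add( __relaxed, &p->count_, 1L );
    }

    void exclusive_update( mostly_lock_free * p )   // under the write lock
    {
        p->count_ = 0;   // non-atomic; the write lock excludes all other accesses
    }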
The ability to use atomic and non-atomic accesses on the same variable is a very sharp knife. I maintain, however, that the ability to use low-level atomic accesses at all is a sufficiently sharp knife in itself and will only be exercised by expert programmers. At a certain level of expertise, one values the ability to succinctly express a notion without the compiler throwing obstacles along the way much more than safeguards erected where none are needed.
I should note, however, that the rationale in this section only applies to the low-level, C-compatible API presented in the current proposal. A high-level interface will obviously have different properties, since it would target a different audience.
It is also worth mentioning that the rest of the proposal is independent of the definition of an atomic type, and can be accepted even if the requirements are tightened to only allow a specific set of atomic types.
The proposed text does not include a high-level C++ interface to the low-level atomic operations, such as a std::atomic<T> class template. This is partially motivated by the fact that the precise semantics of std::atomic are still being discussed and are generally not agreed upon. Another reason for the exclusion of std::atomic is that it is completely implementable as a library component (whereas the low-level operations will generally be either compiler intrinsics or thin wrappers over compiler intrinsics). This makes std::atomic a non-critical component for C++09 and a candidate for a technical report.

Even though the proposed text does not contain std::atomic, it has been written with its various definitions in mind and is sufficiently capable of expressing them.
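As a minimal sketch (not part of the proposed text) of this implementability claim, a std::atomic<T> could be layered over the proposed operations roughly as follows; the member names load, store, swap and compare_swap are assumptions, since the precise interface is still under discussion:

    template< class T > class atomic
    {
    private:

        T v_;

        atomic( atomic const & );              // not copyable
        atomic & operator=( atomic const & );

    public:

        atomic(): v_() {}
        explicit atomic( T v ): v_( v ) {}

        T load() const
        {
            return atomic_load( __ordered, &v_ );
        }

        void store( T v )
        {
            atomic_store( __ordered, &v_, v );
        }

        T swap( T v )
        {
            return atomic_swap( __ordered, &v_, v );
        }

        bool compare_swap( T & expected, T desired )
        {
            return atomic_compare_swap( __ordered, &v_, &expected, desired );
        }
    };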
The previous proposals contained only one bidirectional constraint, ordered. Its precise definition varied, but the trend has been towards making it as strong as possible. At the same time, it is generally acknowledged that there is a need for a weaker bidirectional constraint that combines the semantics of the two unidirectional constraints acquire and release. This document proposes the name __acq_rel for it.
A previous version of this proposal used a function/macro name suffix to specify the constraint, such as atomic_load_acquire( &x ). This was consistent with the direction that the other proposals have been taking. However, in the course of producing a prototype implementation, I discovered that I had to introduce a lower layer that was parameterized on the constraint type and used the notation __atomic_load( __acquire, &x ). This simplified many parts of the implementation considerably, because it allowed the constraint to be passed to a lower-level function unmodified, rather than requiring five separate functions to be written for the same purpose, as shown in the following example:
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v )
{
    // static_assert( __is_integral(T) );

    T r = *p;
    while( !atomic_compare_swap( Cn(), p, &r, r ^ v ) );
    return r;
}
The same constraint notation then happened to also provide a nice interface for specifying the various kinds of fences.
This document includes bidirectional fences as proposed in N2153, which also gives motivating examples. One additional fence that "falls out" of the notation is the acquire+release fence, expressed as atomic_memory_fence( __acq_rel ), which provides the equivalent of #LoadLoad | #LoadStore | #StoreStore in Sun notation, or lwsync on IBM platforms.
The atomic_compiler_fence( constraint ) function/macro provides fences that only affect compiler reorderings and have no effect on the hardware. A need for such control arises in low-level code that communicates with interrupt or signal handlers.
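A hedged sketch of such a use; the variables and the handler are illustrative, and the handler is assumed to run on the same thread as the interrupted code, which is why no hardware fence is needed:

    #include <signal.h>   // sig_atomic_t

    int data;                          // written by the main code
    volatile sig_atomic_t ready = 0;   // flag examined by the handler

    void prepare()
    {
        data = 42;
        atomic_compiler_fence( __release );   // keep the store to data before the flag
        ready = 1;
    }

    void handler( int )                // assumed registered with signal()
    {
        if( ready )
        {
            atomic_compiler_fence( __acquire );   // order the reads after the flag
            // ... safe to use data here ...
        }
    }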
The current approaches for dealing with the ordered constraint have ranged from specifying its semantics in isolation to making it strong enough to guarantee sequential consistency.
The current text adopts a hybrid approach. It specifies the semantics of ordered in isolation while at the same time demanding that:

- a program that contains no data races except those introduced by atomic operations with an ordered constraint produce a sequentially consistent (SC) execution;
- a program that contains no data races except those introduced by atomic operations with an ordered constraint or atomic loads with an acquire constraint produce a cache-coherent causally consistent (CCCC) execution.
This allows programmers to pick the level of consistency they desire. It also allows a std::atomic<T> class template to provide either SC or CCCC.
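As an illustration of the first guarantee, the classic store/load pattern below is only correct under sequential consistency; with _Ordered on every access, at most one of the two functions can return true (the variables are illustrative, and int is assumed to be an atomic type):

    int x = 0, y = 0;

    bool thread1_try_enter()
    {
        atomic_store( __ordered, &x, 1 );
        return atomic_load( __ordered, &y ) == 0;   // sees the other store, or fails
    }

    bool thread2_try_enter()
    {
        atomic_store( __ordered, &y, 1 );
        return atomic_load( __ordered, &x ) == 0;
    }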
This clause describes facilities that C++ programs may use to concurrently access data from multiple threads without introducing undefined behavior.
A POD type is atomic if it meets implementation-defined size and alignment requirements. If a type is atomic, all types with the same size and equal or stricter alignment shall also be atomic.
// Constraints

typedef unspecified _Relaxed;
typedef unspecified _Acquire;
typedef unspecified _Release;
typedef unspecified _Acq_Rel;
typedef unspecified _Ordered;

#define __relaxed see below
#define __acquire see below
#define __release see below
#define __acq_rel see below
#define __ordered see below

// Operations

template< class Cn, class T > inline T atomic_load( Cn, T const * p );
template< class Cn, class T > inline T atomic_load( Cn, T const volatile * p );

template< class Cn, class T > inline void atomic_store( Cn, T * p, T v );
template< class Cn, class T > inline void atomic_store( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_swap( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_swap( Cn, T volatile * p, T v );

template< class Cn, class T > inline bool atomic_compare_swap( Cn, T * p, T * v, T w );
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T volatile * p, T * v, T w );

template< class Cn, class T > inline T atomic_fetch_add( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_add( Cn, T volatile * p, T v );

template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * * p, ptrdiff_t v );
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * volatile * p, ptrdiff_t v );

template< class Cn, class T > inline T atomic_fetch_and( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_and( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_fetch_or( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_or( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T volatile * p, T v );

template< class T > inline void atomic_increment( T * p );
template< class T > inline void atomic_increment( T volatile * p );

template< class T > inline bool atomic_decrement( T * p );
template< class T > inline bool atomic_decrement( T volatile * p );

template< class T > inline T * atomic_load_address( T * const * p );
template< class T > inline T * atomic_load_address( T * const volatile * p );

// Fences

template< class Cn > inline void atomic_memory_fence( Cn );
template< class Cn > inline void atomic_compiler_fence( Cn );

// Spinlocks

typedef unspecified atomic_spinlock_t;
#define ATOMIC_SPINLOCK_INITIALIZER unspecified

inline bool atomic_spin_trylock( atomic_spinlock_t * lock );
inline void atomic_spin_unlock( atomic_spinlock_t * lock );
The header <atomic> defines facilities for concurrent access to shared data without synchronization.
All symbols in <atomic> are allowed to be defined as macros or compiler intrinsics. The operations are shown as function templates for specification and presentation purposes. The implementation is not required to provide actual functions or function templates. A program that passes an explicit template argument to an atomic operation is ill-formed. A program that attempts to obtain the address of any function or function template defined in <atomic> is ill-formed.

The inclusion of <atomic> in a translation unit reserves all identifiers starting with atomic_ or ATOMIC_ to the implementation.
The symbols __relaxed, __acquire, __release, __acq_rel and __ordered are shown in the synopsis as macros for presentation purposes. The implementation is allowed to not define them as macros, subject to the requirements below.
All atomic operations and fences take a constraint as a first argument.
A constraint is a value of type _Relaxed, _Acquire, _Release, _Acq_Rel or _Ordered.

A constraint of type _Relaxed places no additional requirements on the operation.

A constraint of type _Acquire ensures that the load part of the atomic operation has acquire semantics as defined in [intro.concur].

A constraint of type _Release ensures that the store part of the atomic operation has release semantics as defined in [intro.concur].

A constraint of type _Acq_Rel combines the semantics of _Acquire and _Release and is only allowed on read-modify-write atomic operations.

A constraint of type _Ordered ensures that all operations that precede the atomic operation in program order are performed before the atomic operation with respect to any other thread, and that all operations that follow the atomic operation are performed after the atomic operation with respect to any other thread.

A program that contains no data races except those introduced by atomic operations with an _Ordered constraint shall produce a sequentially consistent execution.

A program that contains no data races except those introduced by atomic operations with an _Ordered constraint or atomic_load operations with an _Acquire constraint shall produce a cache-coherent causally consistent execution.
An atomic operation on a non-volatile object is not part of the observable behavior ([intro.execution]). An implementation is allowed to transform, reorder, coalesce or eliminate atomic operations on non-volatile objects if this produces a valid execution according to [intro.execution] and [intro.concur].
All atomic operations require an atomic type as T. A program that attempts to use an atomic operation on a non-atomic type is ill-formed.
No atomic operation or fence throws an exception.
typedef unspecified _Relaxed;
typedef unspecified _Acquire;
typedef unspecified _Release;
typedef unspecified _Acq_Rel;
typedef unspecified _Ordered;
The types _Relaxed, _Acquire, _Release, _Acq_Rel and _Ordered are unspecified distinct POD types.
#define __relaxed see below
#define __acquire see below
#define __release see below
#define __acq_rel see below
#define __ordered see below
The symbols __relaxed, __acquire, __release, __acq_rel and __ordered are values of type _Relaxed, _Acquire, _Release, _Acq_Rel and _Ordered, respectively. It is unspecified whether they are lvalues or rvalues.
[Note: two possible definitions of _Relaxed and __relaxed might be

struct _Relaxed {};
#define __relaxed _Relaxed()

and

enum _Relaxed { __relaxed };

--end note]
template< class Cn, class T > inline T atomic_load( Cn, T const * p );
template< class Cn, class T > inline T atomic_load( Cn, T const volatile * p );
Requires: Cn shall be one of _Relaxed, _Acquire or _Ordered.

Returns: *p.
template< class Cn, class T > inline void atomic_store( Cn, T * p, T v );
template< class Cn, class T > inline void atomic_store( Cn, T volatile * p, T v );
Requires: Cn shall be one of _Relaxed, _Release or _Ordered.

Effects: *p = v;
template< class Cn, class T > inline T atomic_swap( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_swap( Cn, T volatile * p, T v );
Effects: *p = v;

Returns: The old contents of *p.
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T * p, T * v, T w );
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T volatile * p, T * v, T w );
Effects: Compares *p with *v by using a bitwise comparison and, if they match, updates *p to contain w. In either case, updates *v with the old value of *p by using a bitwise copy. Performs no lvalue-to-rvalue conversion on *p or *v.

Returns: true if the update to *p has succeeded, false otherwise. The function is allowed to fail spuriously.
[Example:
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v )
{
    static_assert( std::is_integral<T>::value, "This function requires an integral type" );

    T r = *p;
    while( !atomic_compare_swap( Cn(), p, &r, r ^ v ) );
    return r;
}
--end example]
template< class Cn, class T > inline T atomic_fetch_add( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_add( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p += v;

Returns: The old contents of *p.
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * * p, ptrdiff_t v );
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * volatile * p, ptrdiff_t v );
Effects: *p += v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_and( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_and( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p &= v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_or( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_or( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p |= v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p ^= v;

Returns: The old contents of *p.
template< class T > inline void atomic_increment( T * p );
template< class T > inline void atomic_increment( T volatile * p );
Requires: T shall be integral.

Effects: ++*p;

Constraint: _Relaxed.
template< class T > inline bool atomic_decrement( T * p );
template< class T > inline bool atomic_decrement( T volatile * p );
Requires: T shall be integral.

Effects: --*p;

Returns: true if the new value of *p is zero, false otherwise.

Constraint: _Acquire if the new value of *p is zero, _Release otherwise.
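The constraints on atomic_increment and atomic_decrement match the classic reference counting idiom, sketched below; long is assumed to meet the atomicity requirements, and destroy is a hypothetical cleanup function:

    struct ref_counted
    {
        long count_;
    };

    void add_ref( ref_counted * p )
    {
        atomic_increment( &p->count_ );       // _Relaxed is sufficient here
    }

    void release( ref_counted * p )
    {
        if( atomic_decrement( &p->count_ ) )  // true when the count drops to zero
        {
            destroy( p );                     // hypothetical; safe due to _Acquire
        }
    }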
template< class T > inline T * atomic_load_address( T * const * p );
template< class T > inline T * atomic_load_address( T * const volatile * p );
Returns: *p.

Constraint: _Acquire only with respect to **p.
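A minimal sketch of the intended use: reading a pointer that another thread publishes with a release store, where only the pointed-to data needs to become visible (the node type and variables are illustrative):

    struct node
    {
        int value;
    };

    node * head = 0;   // published elsewhere with atomic_store( __release, &head, n )

    int read_head_value()
    {
        node * p = atomic_load_address( &head );
        return p != 0 ? p->value : -1;   // the fields of *p are visible through p
    }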
template< class Cn > inline void atomic_memory_fence( Cn );
Effects: When Cn is

- _Relaxed, has no effect;
- _Acquire, ensures that all subsequent operations in program order are performed after all preceding loads in program order;
- _Release, ensures that all preceding operations in program order are performed before all subsequent stores in program order;
- _Acq_Rel, combines the semantics of _Acquire and _Release;
- _Ordered, ensures that all preceding operations in program order are performed before all subsequent operations in program order.

template< class Cn > inline void atomic_compiler_fence( Cn );
Effects: As atomic_memory_fence, but only inhibits compiler reorderings and has no effect on hardware reorderings.
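A hedged sketch of pairing the fences with relaxed operations to publish data between two threads; the variables are illustrative:

    int data = 0;    // the payload
    int ready = 0;   // the publication flag

    void producer()
    {
        atomic_store( __relaxed, &data, 42 );
        atomic_memory_fence( __release );   // order the data store before the flag
        atomic_store( __relaxed, &ready, 1 );
    }

    int consumer()
    {
        while( atomic_load( __relaxed, &ready ) == 0 ) { }  // wait for the flag
        atomic_memory_fence( __acquire );   // order the data load after the flag
        return atomic_load( __relaxed, &data );
    }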
typedef unspecified atomic_spinlock_t;
atomic_spinlock_t is an unspecified POD type that represents a spinlock. The behavior of a program that overwrites the contents of a locked spinlock is undefined.
#define ATOMIC_SPINLOCK_INITIALIZER unspecified
ATOMIC_SPINLOCK_INITIALIZER shall be used as an initializer for atomic_spinlock_t objects with static storage duration.
inline bool atomic_spin_trylock( atomic_spinlock_t * spinlock );
Requires: *spinlock shall be an initialized object of type atomic_spinlock_t.

Effects: Attempts to lock *spinlock. The attempt shall succeed only if *spinlock is not locked.

Returns: true if *spinlock has been locked by the current thread, false otherwise.

Constraint: _Acquire.
inline void atomic_spin_unlock( atomic_spinlock_t * spinlock );
Requires: *spinlock shall be locked by the current thread.

Effects: Unlocks *spinlock.

Postconditions: *spinlock is no longer locked.

Constraint: _Release.
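A minimal usage sketch, assuming a namespace-scope lock and counter:

    atomic_spinlock_t lock = ATOMIC_SPINLOCK_INITIALIZER;
    long counter = 0;

    void locked_increment()
    {
        while( !atomic_spin_trylock( &lock ) ) { }   // spin until acquired
        ++counter;                                   // the critical section
        atomic_spin_unlock( &lock );
    }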
Add to the synopsis of <type_traits> the following type property trait:
template< class T > struct is_atomic;
Add to Table 39, Type Property Predicates, the following:
Template | Condition | Preconditions
template< class T > struct is_atomic; | T is atomic ([atomics]) | T shall be a complete type.
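A possible use of the trait, sketched with a hypothetical wrapper that rejects non-atomic types at compile time:

    template< class T > struct lock_free_cell
    {
        static_assert( std::is_atomic<T>::value, "T must be an atomic type" );
        T value_;
    };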
A prototype implementation of the proposal for Microsoft Visual C++ 7.1 is available at:
http://www.pdimov.com/cpp/N2195/atomic.hpp
Thanks to Hans Boehm and Lawrence Crowl for reviewing this document.
--end