This document presents a complete proposal for Chapter 29, Atomic Operations Library, of the C++ standard. It is based on the atomic operations proposals N2047 by Hans Boehm and N2145 by Hans Boehm and Lawrence Crowl, on the memory model proposed in N2153 by Silvera, Wong, McKenney and Blainey, and on Alexander Terekhov's contributions to the mailing lists and the comp.programming.threads discussion group.
This document deviates from the previous proposals in the following major areas:

- it mandates the complete contents of the <atomic> header, making no part of it optional;
- it allows atomic operations on any POD type that meets implementation-defined size and alignment requirements;
- it does not include a high-level C++ interface (such as a std::atomic<T> class template);
- it adds a weaker bidirectional constraint, __acq_rel;
- it passes the constraint as a first argument instead of encoding it in the function name;
- it includes bidirectional memory fences and compiler-only fences;
- it adopts a hybrid approach to the ordered constraint.
The proposal mandates the existence of the complete contents of the <atomic> header and does not make any part of it optional. Given the expected time frame of the C++ standard and the current hardware trends, this should not pose a problem for most target platforms. A platform that does not provide a compare and swap primitive has two choices: not conform to the requirements, or provide an emulation using a hidden spinlock pool, as explained in N2145. A spinlock primitive is included in the proposed text in order to avoid the undesirable case where the application implements a spinlock on top of such emulated atomics.
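A rough sketch of such an emulation, expressed in terms of the spinlock primitives proposed below; the pool size, hash function and helper names are illustrative assumptions, not part of the proposed text:

    #include <cstddef>   // size_t

    // A hidden pool of spinlocks; each element is assumed to be initialized
    // with ATOMIC_SPINLOCK_INITIALIZER. The size 41 is an arbitrary choice.
    static atomic_spinlock_t pool_[ 41 ];

    static atomic_spinlock_t * lock_for( void const * p )
    {
        return &pool_[ (size_t)p % 41 ];
    }

    // Emulated atomic load: every operation on the same address serializes
    // through the same hidden spinlock.
    template< class T > T emulated_atomic_load( T const * p )
    {
        atomic_spinlock_t * l = lock_for( p );
        while( !atomic_spin_trylock( l ) ) { }   // spin until acquired
        T r = *p;
        atomic_spin_unlock( l );
        return r;
    }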
Since most, if not all, lock-free algorithms and data structures require the compare and swap primitive, a program or library is not likely to be able to use the facilities in <atomic> in a meaningful way if atomic_compare_swap is not present.
The general consensus is that an atomic library should be constrained to operate on a certain specifically designated set of types. This proposal disagrees, and requires that any POD type that meets certain implementation-defined size and alignment restrictions be atomic with respect to the library operations. The motivation for this is twofold: first, to allow C structs such as those shown below to be manipulated atomically:
struct rwlock_state
{
    unsigned writer: 1;
    unsigned readers: 31;
};

struct uses_dwcas
{
    void * p1;
    void * p2;
};
Second, to allow a variable to be manipulated both atomically and non-atomically in different parts of the code. This scenario typically arises in "mostly lock-free" data structures, where the concurrent atomic accesses are guarded by a read lock, whereas the exclusive accesses are performed under the protection of a write lock and therefore need not be atomic or pay the corresponding performance penalty.
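A minimal sketch of the pattern, under the assumption that long satisfies the implementation-defined atomicity requirements and that a reader/writer lock is managed elsewhere; the struct and function names are hypothetical:

    // Callers of shared_update hold a read (shared) lock, so concurrent
    // calls are possible and the access must be atomic; callers of
    // exclusive_update hold the write lock, so a plain access suffices.
    struct mostly_lock_free
    {
        long count_;
    };

    void shared_update( mostly_lock_free * p )      // under a read lock
    {
        atomic_fetch_add( __relaxed, &p->count_, 1L );
    }

    void exclusive_update( mostly_lock_free * p )   // under the write lock
    {
        p->count_ = 0;   // non-atomic; the write lock excludes all other accesses
    }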
The ability to use atomic and non-atomic accesses on the same variable is a very sharp knife. I maintain, however, that the ability to use low-level atomic accesses at all is a sufficiently sharp knife in itself and will only be exercised by expert programmers. At a certain level of expertise, one values the ability to succinctly express a notion without the compiler throwing obstacles along the way much more than safeguards erected where none are needed.
I should note, however, that the rationale in this section only applies to the low-level, C-compatible API presented in the current proposal. A high-level interface will obviously have different properties, since it would target a different audience.
It is also worth mentioning that the rest of the proposal is independent of the definition of an atomic type, and can be accepted even if the requirements are tightened to only allow a specific set of atomic types.
The proposed text does not include a high-level C++ interface to the low-level atomic operations, such as a std::atomic<T> class template. This is partially motivated by the fact that the precise semantics of std::atomic are still being discussed and are generally not agreed upon. Another reason for the exclusion of std::atomic is that it is completely implementable as a library component (whereas the low-level operations will generally be either compiler intrinsics or thin wrappers over compiler intrinsics). This makes std::atomic a non-critical component for C++09 and a candidate for a technical report.

Even though the proposed text does not contain std::atomic, it has been written with its various definitions in mind and is sufficiently capable of expressing them.
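As a minimal sketch (not part of the proposed text) of this implementability claim, a std::atomic<T> could be layered over the proposed operations roughly as follows; the member names load, store, swap and compare_swap are assumptions, since the precise interface is still under discussion:

    template< class T > class atomic
    {
    private:

        T v_;

        atomic( atomic const & );              // not copyable
        atomic & operator=( atomic const & );

    public:

        atomic(): v_() {}
        explicit atomic( T v ): v_( v ) {}

        T load() const
        {
            return atomic_load( __ordered, &v_ );
        }

        void store( T v )
        {
            atomic_store( __ordered, &v_, v );
        }

        T swap( T v )
        {
            return atomic_swap( __ordered, &v_, v );
        }

        bool compare_swap( T & expected, T desired )
        {
            return atomic_compare_swap( __ordered, &v_, &expected, desired );
        }
    };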
The previous proposals contained only one bidirectional constraint, ordered. Its precise definition varied, but the trend has been towards making it as strong as possible. At the same time, it is generally acknowledged that there is a need for a weaker bidirectional constraint that combines the semantics of the two unidirectional constraints acquire and release. This document proposes the name __acq_rel for it.
A previous version of this proposal used a function/macro name suffix to specify the constraint, such as atomic_load_acquire( &x ). This was consistent with the direction that the other proposals have been taking. However, in the course of producing a prototype implementation, I discovered that I had to introduce a lower layer that was parameterized on the constraint type and used the notation __atomic_load( __acquire, &x ). This simplified many parts of the implementation considerably, because it allowed the constraint to be passed to a lower-level function unmodified, rather than requiring five separate functions to be written for the same purpose, as shown in the following example:
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v )
{
    // static_assert( __is_integral(T) );

    T r = *p;
    while( !atomic_compare_swap( Cn(), p, &r, r ^ v ) );
    return r;
}
The same constraint notation then happened to also provide a nice interface for specifying the various kinds of fences.
This document includes bidirectional fences as proposed in N2153, which also gives motivating examples. One additional fence that "falls out" of the notation is the acquire+release fence, expressed as atomic_memory_fence( __acq_rel ), which provides the equivalent of #LoadLoad | #LoadStore | #StoreStore in Sun notation, or lwsync on IBM platforms.
The atomic_compiler_fence( constraint ) function/macro provides fences that only affect compiler reorderings and have no effect on the hardware. A need for such control arises in low-level code that communicates with interrupt or signal handlers.
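A hedged sketch of such a use; the variables and the handler are illustrative, and the handler is assumed to run on the same thread as the interrupted code, which is why no hardware fence is needed:

    #include <signal.h>   // sig_atomic_t

    int data;                          // written by the main code
    volatile sig_atomic_t ready = 0;   // flag examined by the handler

    void prepare()
    {
        data = 42;
        atomic_compiler_fence( __release );   // keep the store to data before the flag
        ready = 1;
    }

    void handler( int )                // assumed registered with signal()
    {
        if( ready )
        {
            atomic_compiler_fence( __acquire );   // order the reads after the flag
            // ... safe to use data here ...
        }
    }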
The current approaches for dealing with the ordered constraint have ranged from specifying its semantics in isolation to making it strong enough to guarantee sequential consistency.
The current text adopts a hybrid approach. It specifies the semantics of ordered in isolation while at the same time demanding that:

- a program that contains no data races except those introduced by atomic operations with an ordered constraint produce a sequentially consistent (SC) execution;
- a program that contains no data races except those introduced by atomic operations with an ordered constraint or atomic loads with an acquire constraint produce a cache-coherent causally consistent (CCCC) execution.
This allows programmers to pick the level of consistency they desire. It also allows a std::atomic<T> class template to provide either SC or CCCC.
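As an illustration of the first guarantee, the classic store/load pattern below is only correct under sequential consistency; with _Ordered on every access, at most one of the two functions can return true (the variables are illustrative, and int is assumed to be an atomic type):

    int x = 0, y = 0;

    bool thread1_try_enter()
    {
        atomic_store( __ordered, &x, 1 );
        return atomic_load( __ordered, &y ) == 0;   // sees the other store, or fails
    }

    bool thread2_try_enter()
    {
        atomic_store( __ordered, &y, 1 );
        return atomic_load( __ordered, &x ) == 0;
    }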
This clause describes facilities that C++ programs may use to concurrently access data from multiple threads without introducing undefined behavior.
A POD type is atomic if it meets implementation-defined size and alignment requirements. If a type is atomic, all types with the same size and equal or stricter alignment shall also be atomic.
// Constraints

typedef unspecified _Relaxed;
typedef unspecified _Acquire;
typedef unspecified _Release;
typedef unspecified _Acq_Rel;
typedef unspecified _Ordered;

#define __relaxed see below
#define __acquire see below
#define __release see below
#define __acq_rel see below
#define __ordered see below

// Operations

template< class Cn, class T > inline T atomic_load( Cn, T const * p );
template< class Cn, class T > inline T atomic_load( Cn, T const volatile * p );

template< class Cn, class T > inline void atomic_store( Cn, T * p, T v );
template< class Cn, class T > inline void atomic_store( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_swap( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_swap( Cn, T volatile * p, T v );

template< class Cn, class T > inline bool atomic_compare_swap( Cn, T * p, T * v, T w );
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T volatile * p, T * v, T w );

template< class Cn, class T > inline T atomic_fetch_add( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_add( Cn, T volatile * p, T v );

template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * * p, ptrdiff_t v );
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * volatile * p, ptrdiff_t v );

template< class Cn, class T > inline T atomic_fetch_and( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_and( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_fetch_or( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_or( Cn, T volatile * p, T v );

template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T volatile * p, T v );

template< class T > inline void atomic_increment( T * p );
template< class T > inline void atomic_increment( T volatile * p );

template< class T > inline bool atomic_decrement( T * p );
template< class T > inline bool atomic_decrement( T volatile * p );

template< class T > inline T * atomic_load_address( T * const * p );
template< class T > inline T * atomic_load_address( T * const volatile * p );

// Fences

template< class Cn > inline void atomic_memory_fence( Cn );
template< class Cn > inline void atomic_compiler_fence( Cn );

// Spinlocks

typedef unspecified atomic_spinlock_t;
#define ATOMIC_SPINLOCK_INITIALIZER unspecified

inline bool atomic_spin_trylock( atomic_spinlock_t * lock );
inline void atomic_spin_unlock( atomic_spinlock_t * lock );
The header <atomic> defines facilities for concurrent access to shared data without synchronization.
All symbols in <atomic> are allowed to be defined as macros or compiler intrinsics. The operations are shown as function templates for specification and presentation purposes. The implementation is not required to provide actual functions or function templates. A program that passes an explicit template argument to an atomic operation is ill-formed. A program that attempts to obtain the address of any function or function template defined in <atomic> is ill-formed.

The inclusion of <atomic> in a translation unit reserves all identifiers starting with atomic_ or ATOMIC_ to the implementation.
The symbols __relaxed, __acquire, __release, __acq_rel and __ordered are shown in the synopsis as macros for presentation purposes. The implementation is allowed to not define them as macros, subject to the requirements below.
All atomic operations and fences take a constraint as a first argument.
A constraint is a value of type _Relaxed, _Acquire, _Release, _Acq_Rel or _Ordered.

A constraint of type _Relaxed places no additional requirements on the operation.

A constraint of type _Acquire ensures that the load part of the atomic operation has acquire semantics as defined in [intro.concur].

A constraint of type _Release ensures that the store part of the atomic operation has release semantics as defined in [intro.concur].

A constraint of type _Acq_Rel combines the semantics of _Acquire and _Release and is only allowed on read-modify-write atomic operations.

A constraint of type _Ordered ensures that all operations that precede the atomic operation in program order are performed before the atomic operation with respect to any other thread, and that all operations that follow the atomic operation are performed after the atomic operation with respect to any other thread.

A program that contains no data races except those introduced by atomic operations with an _Ordered constraint shall produce a sequentially consistent execution.

A program that contains no data races except those introduced by atomic operations with an _Ordered constraint or atomic_load operations with an _Acquire constraint shall produce a cache-coherent causally consistent execution.
An atomic operation on a non-volatile object is not part of the observable behavior ([intro.execution]). An implementation is allowed to transform, reorder, coalesce or eliminate atomic operations on non-volatile objects if this produces a valid execution according to [intro.execution] and [intro.concur].
All atomic operations require an atomic type as T. A program that attempts to use an atomic operation on a non-atomic type is ill-formed.
No atomic operation or fence throws an exception.
typedef unspecified _Relaxed;
typedef unspecified _Acquire;
typedef unspecified _Release;
typedef unspecified _Acq_Rel;
typedef unspecified _Ordered;
The types _Relaxed, _Acquire, _Release, _Acq_Rel and _Ordered are unspecified distinct POD types.
#define __relaxed see below
#define __acquire see below
#define __release see below
#define __acq_rel see below
#define __ordered see below
The symbols __relaxed, __acquire, __release, __acq_rel and __ordered are values of type _Relaxed, _Acquire, _Release, _Acq_Rel and _Ordered, respectively. It is unspecified whether they are lvalues or rvalues.
[Note: two possible definitions of _Relaxed and __relaxed might be

struct _Relaxed {};
#define __relaxed _Relaxed()

and

enum _Relaxed { __relaxed };

--end note]
template< class Cn, class T > inline T atomic_load( Cn, T const * p );
template< class Cn, class T > inline T atomic_load( Cn, T const volatile * p );
Requires: Cn shall be one of _Relaxed, _Acquire or _Ordered.

Returns: *p.
template< class Cn, class T > inline void atomic_store( Cn, T * p, T v );
template< class Cn, class T > inline void atomic_store( Cn, T volatile * p, T v );
Requires: Cn shall be one of _Relaxed, _Release or _Ordered.

Effects: *p = v;
template< class Cn, class T > inline T atomic_swap( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_swap( Cn, T volatile * p, T v );
Effects: *p = v;

Returns: The old contents of *p.
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T * p, T * v, T w );
template< class Cn, class T > inline bool atomic_compare_swap( Cn, T volatile * p, T * v, T w );
Effects: Compares *p with *v by using a bitwise comparison and, if they match, updates *p to contain w. In either case, updates *v with the old value of *p by using a bitwise copy. Performs no lvalue-to-rvalue conversion on *p or *v.

Returns: true if the update to *p has succeeded, false otherwise. The function is allowed to fail spuriously.
[Example:
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v )
{
    static_assert( std::is_integral<T>::value, "This function requires an integral type" );

    T r = *p;
    while( !atomic_compare_swap( Cn(), p, &r, r ^ v ) );
    return r;
}
--end example]
template< class Cn, class T > inline T atomic_fetch_add( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_add( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p += v;

Returns: The old contents of *p.
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * * p, ptrdiff_t v );
template< class Cn, class T > inline T * atomic_fetch_add( Cn, T * volatile * p, ptrdiff_t v );
Effects: *p += v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_and( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_and( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p &= v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_or( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_or( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p |= v;

Returns: The old contents of *p.
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T * p, T v );
template< class Cn, class T > inline T atomic_fetch_xor( Cn, T volatile * p, T v );
Requires: T shall be integral.

Effects: *p ^= v;

Returns: The old contents of *p.
template< class T > inline void atomic_increment( T * p );
template< class T > inline void atomic_increment( T volatile * p );
Requires: T shall be integral.

Effects: ++*p;

Constraint: _Relaxed.
template< class T > inline bool atomic_decrement( T * p );
template< class T > inline bool atomic_decrement( T volatile * p );
Requires: T shall be integral.

Effects: --*p;

Returns: true if the new value of *p is zero, false otherwise.

Constraint: _Acquire if the new value of *p is zero, _Release otherwise.
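The constraints on atomic_increment and atomic_decrement match the classic reference counting idiom, sketched below; long is assumed to meet the atomicity requirements, and destroy is a hypothetical cleanup function:

    struct ref_counted
    {
        long count_;
    };

    void add_ref( ref_counted * p )
    {
        atomic_increment( &p->count_ );       // _Relaxed is sufficient here
    }

    void release( ref_counted * p )
    {
        if( atomic_decrement( &p->count_ ) )  // true when the count drops to zero
        {
            destroy( p );                     // hypothetical; safe due to _Acquire
        }
    }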
template< class T > inline T * atomic_load_address( T * const * p );
template< class T > inline T * atomic_load_address( T * const volatile * p );
Returns: *p.

Constraint: _Acquire only with respect to **p.
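A minimal sketch of the intended use: reading a pointer that another thread publishes with a release store, where only the pointed-to data needs to become visible (the node type and variables are illustrative):

    struct node
    {
        int value;
    };

    node * head = 0;   // published elsewhere with atomic_store( __release, &head, n )

    int read_head_value()
    {
        node * p = atomic_load_address( &head );
        return p != 0 ? p->value : -1;   // the fields of *p are visible through p
    }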
template< class Cn > inline void atomic_memory_fence( Cn );
Effects: When Cn is

- _Relaxed, has no effect;
- _Acquire, ensures that all subsequent operations in program order are performed after all preceding loads in program order;
- _Release, ensures that all preceding operations in program order are performed before all subsequent stores in program order;
- _Acq_Rel, combines the semantics of _Acquire and _Release;
- _Ordered, ensures that all preceding operations in program order are performed before all subsequent operations in program order.

template< class Cn > inline void atomic_compiler_fence( Cn );
Effects: As atomic_memory_fence, but only inhibits compiler reorderings and has no effect on hardware reorderings.
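A hedged sketch of pairing the fences with relaxed operations to publish data between two threads; the variables are illustrative:

    int data = 0;    // the payload
    int ready = 0;   // the publication flag

    void producer()
    {
        atomic_store( __relaxed, &data, 42 );
        atomic_memory_fence( __release );   // order the data store before the flag
        atomic_store( __relaxed, &ready, 1 );
    }

    int consumer()
    {
        while( atomic_load( __relaxed, &ready ) == 0 ) { }  // wait for the flag
        atomic_memory_fence( __acquire );   // order the data load after the flag
        return atomic_load( __relaxed, &data );
    }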
typedef unspecified atomic_spinlock_t;
atomic_spinlock_t is an unspecified POD type that represents a spinlock. The behavior of a program that overwrites the contents of a locked spinlock is undefined.
#define ATOMIC_SPINLOCK_INITIALIZER unspecified
ATOMIC_SPINLOCK_INITIALIZER shall be used as an initializer for atomic_spinlock_t objects with static storage duration.
inline bool atomic_spin_trylock( atomic_spinlock_t * spinlock );
Requires: *spinlock shall be an initialized object of type atomic_spinlock_t.

Effects: Attempts to lock *spinlock. The attempt shall succeed only if *spinlock is not locked.

Returns: true if *spinlock has been locked by the current thread, false otherwise.

Constraint: _Acquire.
inline void atomic_spin_unlock( atomic_spinlock_t * spinlock );
Requires: *spinlock shall be locked by the current thread.

Effects: Unlocks *spinlock.

Postconditions: *spinlock is no longer locked.

Constraint: _Release.
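A minimal usage sketch, assuming a namespace-scope lock and counter:

    atomic_spinlock_t lock = ATOMIC_SPINLOCK_INITIALIZER;
    long counter = 0;

    void locked_increment()
    {
        while( !atomic_spin_trylock( &lock ) ) { }   // spin until acquired
        ++counter;                                   // the critical section
        atomic_spin_unlock( &lock );
    }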
Add to the synopsis of <type_traits> the following type property trait:
template< class T > struct is_atomic;
Add to Table 39, Type Property Predicates, the following:
Template | Condition | Preconditions
template< class T > struct is_atomic; | T is atomic ([atomics]) | T shall be a complete type.
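A possible use of the trait, sketched with a hypothetical wrapper that rejects non-atomic types at compile time:

    template< class T > struct lock_free_cell
    {
        static_assert( std::is_atomic<T>::value, "T must be an atomic type" );
        T value_;
    };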
A prototype implementation of the proposal for Microsoft Visual C++ 7.1 is available at:
http://www.pdimov.com/cpp/N2195/atomic.hpp
Thanks to Hans Boehm and Lawrence Crowl for reviewing this document.
--end