ISO/IEC JTC1 SC22 WG14 N1489 - 2010-05-29
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Introduction
Operators
    Loads and Stores
    Compound Assignment
    Ternary Compare-Exchange Operator
    Lost Non-Volatile Optimization
Type Names
    Over-Constrained Size and Alignment
    Prohibiting Internal Locks
    Qualification Inconsistency
    Nested Atomicity
    Recommendation
Summary
The C++ standards committee, with liaison support from the C committee, developed a facility for atomic types and operations with a goal of making it possible to write code using atomics for inclusion and compilation in both C and C++. That is, there should be a common subset of the atomics facility that means the same in both languages. The resulting facility is embodied in the C++ Final Committee Draft N3092 and in the C Working Draft N1425. Code using the common subset has been successfully compiled and executed in both languages.
This common facility requires that
all atomic operations be written as function calls.
In contrast,
programmers writing in pure C++ can use the natural operators.
Furthermore, the common subset defines a fixed set of types;
it is not extensible.
Pure C++ programs can obtain new atomic types
using the atomic class template.
To enable C programmers to have extensible types and use operators,
Blaine Garst has proposed a new facility for C,
with the latest description in
N1473.
This paper compares the full C++ facility to the proposed C facility, and makes some recommendations that improve the C proposal.
Both facilities provide for sequentially consistent operations when using the operator syntax.
Both facilities provide implicit lvalue-to-rvalue conversion for atomic loads and assignment operator syntax for atomic stores.
Most modern processors cannot support both a read of one location and a write to another location as a single atomic action. As a consequence, an implementation must perform an assignment from one atomic to another as two separate atomic actions, a load followed by a store. C++ makes such an assignment ill-formed to prevent the illusion of full atomicity when it is not present.
    atomic_int a = { 1 }, b = { 2 };
    a = 3; // okay, a single atomic write
    a = b; // error, an implicit two-location atomic
    a = (int)b; // okay, clearly a two-step process
Recommendation: Make assignment from one atomic to another ill-formed.
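As a minimal sketch of what the two-step process looks like when written out in C++, the helper below copies one atomic into another with an explicit load and an explicit store. The name copy_atomic is hypothetical and not part of either facility.

```cpp
#include <atomic>

// Hypothetical helper: copies the value of one atomic into another as two
// visibly separate atomic actions. Another thread may modify `from` or `to`
// between the load and the store, so the pair is not one atomic action.
inline int copy_atomic(std::atomic<int>& to, const std::atomic<int>& from) {
    int snapshot = from.load();  // step 1: atomic read of the source
    to.store(snapshot);          // step 2: atomic write of the destination
    return snapshot;
}
```

Writing the copy this way keeps the window between the two actions visible to the reader of the code.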
Both facilities provide atomic compound assignment, unlike Java.
Both facilities provide conventional return values
for the compound assignment operators,
i.e. the result of the operation and not the prior values
as with the fetch_op functions.
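The difference in return values can be illustrated with a short C++ sketch. The struct AddResults and the function compare_add_results are hypothetical names used only for this illustration.

```cpp
#include <atomic>

// Hypothetical record of what each form of atomic addition returns.
struct AddResults {
    int from_operator;  // result of `counter += 5`
    int from_fetch;     // result of `counter.fetch_add(5)`
    int final_value;    // value of the counter afterwards
};

inline AddResults compare_add_results() {
    std::atomic<int> counter(10);
    AddResults r;
    r.from_operator = (counter += 5);     // yields the new value
    r.from_fetch = counter.fetch_add(5);  // yields the prior value
    r.final_value = counter.load();
    return r;
}
```

Starting from 10, the operator form yields 15 (the result of the operation), the fetch form also yields 15 (the value before its own addition), and the counter ends at 20.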
The C++ facility provides only the compound assignment operators
'+=', '-=',
'&=', '|=', and '^=',
and only for the integer and pointer types.
The C facility provides the same set of compound operators
as the underlying types.
C++ programmers can add the additional operations in user code
by defining a non-member operator function.
    float operator+=( atomic<float>& object, float value )
    {
        float expected = object.load(memory_order_relaxed);
        float desired;
        do desired = expected + value;
        while ( !object.compare_exchange_weak(expected, desired) );
        return desired;
    }
The compound operators
that are in the C proposal but not in the C++ proposal
are under-specified.
C++ carefully defined '+=' to use
two's complement arithmetic
to prevent any undefined behavior,
redundant representations, or trap values in the operation.
In particular, this observation applies to signaling NaNs.
Without additional specification,
a floating-point '+=' would suffer from these problems.
These problems are significant because
programmers have no opportunity to check whether or not
an upcoming operation will stray into undefined behavior.
Recommendation: Either explicitly remove the extended operations or explicitly specify them to have semantics "as if" the above code.
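One way to read the "as if" specification is as a single compare-exchange loop parameterized by the operation. The sketch below assumes that reading. The names atomic_compound_assign and atomic_multiply are hypothetical, and the helper says nothing about how an implementation would actually realize the operators.

```cpp
#include <atomic>

// Hypothetical helper: applies op(current, value) to the atomic object with a
// compare-exchange loop and returns the new value, which is the conventional
// result of a compound assignment.
template <class T, class Op>
T atomic_compound_assign(std::atomic<T>& object, T value, Op op) {
    T expected = object.load(std::memory_order_relaxed);
    T desired;
    do {
        desired = op(expected, value);  // recomputed after every failed attempt
    } while (!object.compare_exchange_weak(expected, desired));
    return desired;
}

// Example of an extended operation: an atomic '*=' for float.
inline float atomic_multiply(std::atomic<float>& object, float value) {
    return atomic_compound_assign(
        object, value, [](float a, float b) { return a * b; });
}
```

Any undefined behavior or trap in the operation itself, such as integer overflow or a signaling NaN, still occurs inside op, so the loop fixes only the atomicity of the update and not the arithmetic concerns raised above.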
The C proposal provides a new ternary compare-exchange operator.
The closest equivalent in C++
are the compare_exchange_* member functions.
    bool done = object ?= expected : desired; // proposed C operator
    bool done = object.compare_exchange_weak(expected, desired); // C++ member function
The return value is a boolean indicating whether or not the assignment occurred. This behavior is consistent with the existing C/C++ compare-exchange functions.
The 'expected' argument is an r-value. However, the corresponding argument in the C/C++ compare-exchange functions is a pointer or a reference. The reason is two-fold. First, processors tend to pick one of three compare-exchange signatures: return only the boolean, return only the value found, or both. The C/C++ compare-exchange functions map efficiently to all three signatures. Second, the functions write back through the 'expected' argument when the comparison fails, which makes 'expected' ready for recomputation. The resulting loop is both simpler to write and more efficient.
Recommendation: Change the second operand of the operator to an l-value rather than an r-value.
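The benefit of the write-back can be seen in a small C++ sketch that raises an atomic to a maximum. The name atomic_store_max is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: raises `object` to at least `candidate`.
// Returns true if this call performed the store.
inline bool atomic_store_max(std::atomic<int>& object, int candidate) {
    int expected = object.load(std::memory_order_relaxed);
    while (expected < candidate) {
        // On failure, compare_exchange_weak writes the value it found into
        // `expected`, so the loop condition is re-tested without a reload.
        if (object.compare_exchange_weak(expected, candidate)) {
            return true;
        }
    }
    return false;  // the stored value was already at least `candidate`
}
```

With an r-value 'expected', each failed attempt would need a separate atomic load before the loop could decide whether to retry.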
The existing C/C++ facility provides separate functions for the weak and strong semantics of compare-exchange. The C proposal does not specify which of these semantics it provides. Of the two semantics, the weak semantics are the best choice if only one is available. (The reasons are efficiency on some platforms and robustness to padding and redundant representations.)
Recommendation: Define the operator to have weak semantics.
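One further argument for weak semantics as the single primitive is that strong semantics can be recovered from it with a loop, at least for types whose values compare by simple equality. The sketch below is limited to int under that assumption, and the name strong_from_weak is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: strong compare-exchange built from the weak form.
// Limited to int, where value equality and representation equality agree.
inline bool strong_from_weak(std::atomic<int>& object, int& expected, int desired) {
    const int wanted = expected;
    for (;;) {
        if (object.compare_exchange_weak(expected, desired)) {
            return true;   // the exchange happened
        }
        if (expected != wanted) {
            return false;  // genuine mismatch; `expected` holds the value found
        }
        // Otherwise the failure was spurious: the value still matched, so retry.
    }
}
```

The reverse construction is not useful, since a weak operation built from a strong one gains none of the efficiency that motivates the weak form.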
The proposal presents spinlocks as an example use of the new compare-exchange operator. However, the compare-exchange operation is significantly stronger than necessary for spinlocks. Furthermore, spinlocks can behave pathologically on multi-programmed systems, for example when the thread holding the lock is descheduled while other threads spin.
Recommendation: Use a better example, perhaps from wait-free data structures.
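As one possible candidate, the sketch below shows insertion into a shared singly linked stack. It is lock-free rather than strictly wait-free, and it omits single-element removal to avoid the ABA problem. The names Node and PushOnlyStack are hypothetical and the sketch is not taken from N1473.

```cpp
#include <atomic>

// Hypothetical node type; the caller owns the storage.
struct Node {
    int value;
    Node* next;
};

// Hypothetical lock-free stack supporting concurrent push and bulk removal.
class PushOnlyStack {
public:
    PushOnlyStack() : head_(nullptr) {}

    void push(Node* node) {
        node->next = head_.load(std::memory_order_relaxed);
        // On failure, node->next is refreshed with the current head,
        // so the loop body is empty.
        while (!head_.compare_exchange_weak(node->next, node)) {
        }
    }

    // Detaches the whole list in one atomic exchange and returns its head.
    Node* take_all() { return head_.exchange(nullptr); }

private:
    std::atomic<Node*> head_;
};
```

Unlike a spinlock, a thread that is descheduled in the middle of push does not prevent other threads from completing their own pushes.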
The C proposal states that the __atomic qualifier
implies the volatile qualifier.
This implication is unfortunate
because it prevents a broad class of optimizations.
For example, given a non-volatile atomic object a,
the operation sequence a+=1,a+=2
could be optimized to a+=3.
This optimization is valid because
an execution in which no other thread
observes the state between those two operations is always permitted.
Given that atomic operations may take a hundred cycles,
such optimizations could be valuable.
The C++ committee also anticipated potential future compilers that would completely remove threads of execution and, in the process, turn variables that formerly needed to be atomic into simple sequential variables. Such an optimization would not be possible if all atomic variables were implicitly volatile.
Recommendation: Keep volatile and atomic as distinct and orthogonal concepts.
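The orthogonality can be sketched in C++, where volatile may be applied to an atomic type or omitted. The function names below are hypothetical, and the sketch makes no claim about whether any particular compiler actually fuses the two additions.

```cpp
#include <atomic>

// Non-volatile atomic: a compiler is permitted to treat these two additions
// as the single addition performed by fused_increment below.
inline int two_increments(std::atomic<int>& a) {
    a += 1;
    a += 2;
    return a.load();
}

inline int fused_increment(std::atomic<int>& a) {
    a += 3;
    return a.load();
}

// Volatile atomic: each access is also a volatile access, so both additions
// must be performed as written.
inline int volatile_increments(volatile std::atomic<int>& v) {
    v.fetch_add(1);
    v.fetch_add(2);
    return v.load();
}
```

All three functions produce the same final value; they differ only in the intermediate states an implementation is obliged to make observable.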
The C proposal constructs atomic types
by qualifying the base type
with a new __atomic type qualifier.
The C++ FCD constructs atomic types
via a class template taking the base type as an argument.
The qualifier approach has several problems.
Because qualification implies a differing interpretation over a given base type, changes in alignment and size are not possible. Changing the alignment and size is important to efficiency. For example, some IA32 ABIs require only 16-bit alignment for 32-bit integers. However, the atomic operations require 32-bit alignment. One would want to impose additional alignment constraints on the atomic versions of many types. Likewise, structures that are of odd sizes would become more efficient when extended to sizes that are a multiple of the word size. C++ avoids these problems because atomic types are separate types with separate sizes and alignments, not qualified types.
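The freedom that separate types provide can be inspected directly in C++. The sketch below reports the size and alignment of a base type and of its atomic counterpart. The names Layout, Rgb, plain_layout, and atomic_layout are hypothetical, and the actual numbers depend on the platform and library.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical record of a type's size and alignment in bytes.
struct Layout {
    std::size_t size;
    std::size_t alignment;
};

// A structure of odd size, as mentioned in the text.
struct Rgb {
    unsigned char r, g, b;
};

template <class T>
constexpr Layout plain_layout() {
    return Layout{sizeof(T), alignof(T)};
}

// The atomic type is a distinct type, so an implementation is free to pad it
// or align it more strictly than the base type.
template <class T>
constexpr Layout atomic_layout() {
    return Layout{sizeof(std::atomic<T>), alignof(std::atomic<T>)};
}
```

On some implementations atomic_layout<long long>() reports stricter alignment than plain_layout<long long>(), which a qualified type could not do.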
The representation constraints of the qualifier approach prevent use of locks internal to the atomic. While we expect most programmers would not be happy with such a representation, the existing C/C++ specification has been carefully crafted to enable such an implementation when unavoidable, without penalizing implementations that do not use internal locks.
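A minimal sketch of an atomic with an internal lock shows why the representation must be allowed to differ from the base type. The names locked_atomic and Wide are hypothetical, and a real library would be free to choose a different strategy, such as a shared table of locks.

```cpp
#include <mutex>

// Hypothetical atomic wrapper that stores its lock next to the value.
// The object is necessarily larger than T, which a qualified type cannot be.
template <class T>
class locked_atomic {
public:
    explicit locked_atomic(T initial) : value_(initial) {}

    T load() const {
        std::lock_guard<std::mutex> hold(lock_);
        return value_;
    }

    void store(T desired) {
        std::lock_guard<std::mutex> hold(lock_);
        value_ = desired;
    }

private:
    mutable std::mutex lock_;
    T value_;
};

// A type too wide for single-instruction atomic access on most processors.
struct Wide {
    long long words[4];
};
```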
Qualifiers are often added or removed when passing pointers, which is very likely to produce programs in which some operations on a memory location are atomic and some are not. Such programs may silently fail in their value computations or silently become sequentially inconsistent.
The C proposal
allows both a struct and its members
to be __atomic qualified.
This nesting is problematic,
as one thread could access an atomic member
while another accesses the struct as a whole.
If both are intended to be atomic,
which is not clear from the proposal,
then the accesses require
a substantially more expensive implementation,
one that is reminiscent of transactional memory, but weaker.
As Hans Boehm noted,
a hierarchy of lock acquisitions
is sufficient to implement the nested atomicity.
The C proposal
further allows non-atomic access
to members of __atomic qualified structs.
If such accesses are not atomic with respect to full-struct accesses,
then the specification provides the weak atomicity
of transactional memory.
If such accesses are atomic,
then the specification provides the strong atomicity
of transactional memory.
Strong atomicity is not presently supported by hardware.
C++ prevents such problems in two ways.
First,
the argument to the atomic class template
must be trivially copyable.
Atomic types are not themselves trivially copyable,
and hence are invalid as arguments.
Second,
all operations on atomic objects
are value-in/value-out.
The types and operations provide no references to any internal state.
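The value-in/value-out discipline can be sketched for a small struct. Updating one member means loading the whole value, changing a private copy, and exchanging the whole value back. The names Point and set_x are hypothetical.

```cpp
#include <atomic>
#include <type_traits>

// A trivially copyable struct, and so a valid argument to std::atomic.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable to be used with std::atomic");

// Hypothetical helper: sets member x while leaving y as found.
// The atomic object never exposes a reference to its members.
inline Point set_x(std::atomic<Point>& object, int new_x) {
    Point expected = object.load();
    Point desired;
    do {
        desired = expected;  // start from the whole value most recently seen
        desired.x = new_x;   // modify the private copy only
    } while (!object.compare_exchange_weak(expected, desired));
    return desired;
}
```

Because every access goes through the whole object, there is no member-level access whose atomicity relative to whole-struct access would need to be defined.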
C++ adopted the approach of distinct constructed types rather than qualified types to prevent use of both atomic and non-atomic operations on the same type, which provides for increased program safety and increased opportunity for optimization.
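The safety property can be checked at compile time in C++. The sketch below contrasts pointer conversions for a constructed atomic type with those for a qualified type. The function bump is a hypothetical example of an interface that accepts only atomic objects.

```cpp
#include <atomic>
#include <type_traits>

// A pointer to an atomic object does not convert to a pointer to the base
// type, or the reverse, so non-atomic access cannot be introduced silently.
static_assert(!std::is_convertible<std::atomic<int>*, int*>::value,
              "atomic pointer must not decay to a plain pointer");
static_assert(!std::is_convertible<int*, std::atomic<int>*>::value,
              "plain pointer must not become an atomic pointer");

// Qualifiers, by contrast, are added implicitly when passing pointers.
static_assert(std::is_convertible<int*, volatile int*>::value,
              "qualifiers are added implicitly");

// Hypothetical interface: only atomic objects can be passed here.
inline void bump(std::atomic<int>* counter) {
    counter->fetch_add(1);
}
```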
Given the problems with the qualification approach in contrast to the constructed type approach, I must recommend against the qualifier approach in favor of the constructed type approach.
There still remains the issue of another syntax for constructing atomic types. Compatibility and interoperability with C++ would be enhanced if that syntax resembles C++ syntax, e.g. a new syntax rule:
- type-specifier:
_Atomic <type-name>
Note that support of such a syntax does not imply the introduction of templates into the C language.
Other type construction syntax is certainly possible, at the cost of more complexity in code that must be compiled by both languages.
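The added complexity can be sketched with a macro that hides the difference in type construction syntax. Everything here is an assumption for illustration: the macro name ATOMIC_OF is hypothetical, and the C branch uses the stdatomic.h header and the _Atomic(T) spelling later standardized in C11, neither of which appears in N1473 or in this paper's proposed rule.

```cpp
// Shared header intended to compile as either C or C++.
#ifdef __cplusplus
#include <atomic>
#define ATOMIC_OF(T) std::atomic<T>
using std::atomic_fetch_add;  // make the common function name visible unqualified
#else
#include <stdatomic.h>        /* assumption: a C11 or later implementation */
#define ATOMIC_OF(T) _Atomic(T)
#endif

/* Code below this point uses only the common subset of function calls. */
static ATOMIC_OF(int) hits;   /* static storage, so it starts at zero */

/* Records one hit and returns the running total. */
static int record_hit(void) {
    return atomic_fetch_add(&hits, 1) + 1;
}
```

A syntax shared by both languages would make the macro and the conditional compilation unnecessary.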
The C proposal N1473 provides syntactically convenient access to atomics. Much of its semantics is already consistent with C++. However, it has several problems that are a consequence of an excess of generality. Most of these problems can be solved with well-targeted restrictions, which also bring the semantics in line with those of C++. However, the type qualifier approach is inconsistent with the state of the art in systems implementation. Instead, the proposal should use more direct type construction, and therefore should propose a different syntax.