ISO/IEC JTC1 SC22 WG21 DRAFT N2955 = 09-0145 - 2009-09-28
Mark Batty, mjb220@cl.cam.ac.uk
Motivation for Suggestions
Mixed Accesses to Memory Locations
Release Sequence and Visible Sequence of Side Effects
The Conditions for Sequential Consistency
Reading the Initial State
Restriction on Compiler Hoisting
Collected Suggestions for Change to the Standard
1.10p2
1.10p4
1.10p6
1.10p12
1.10p13
1.10p14
29.3p9
References
The atomic memory operations in section 29 [Atomics] are the programmer's interface to the explicitly synchronized memory operations. They are presented as a set of types with functions for operating on them. It does not seem sufficiently clear what provision there is for non-atomic access to objects of this type.
At first glance the Standard seems to allow non-atomic accesses at atomic locations. In 1.10 [intro.multithread] paragraph 14 this is implied from the definition of a data race (note that memory operations in conflict act on the same location):
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens-before the other. Any such data race results in undefined behavior.
How then would a programmer write down a non-atomic access to the atomic location? There are no operations built into the C++ library functions on the atomic types for non-atomic access. Non-atomic access will have to take place on the object through an lvalue of some other type. 3.10 [basic.lval] paragraph 15 lists the types of lvalue that would not result in undefined behavior:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
All of these types are slight variations of the original type and are still atomic, except for the char and unsigned char types. This suggests that any non-atomic access to an atomic location would have to be through a char or unsigned char lvalue. There is evidence in the standard to suggest that using such an lvalue to modify an atomic object would result in undefined behavior, but it is not definitive.
Each atomic type in section 29 has a deleted copy constructor,
therefore has no trivial copy constructor, and ought not to be
trivially copyable. This is not certain however from the definition in
9p6 [class] which defines a trivially copyable class, but does not
explicitly require such a class to have a trivial copy
constructor. Section 3.9 [basic.types] leaves memcpy
undefined over non-trivially copyable types. If memcpy
is
undefined over the atomics, this suggests that other non-atomic memory
accesses might not work either.
This argument is not very strong, but if it does indeed hold, then the following note, to be appended to 1.10p4, will clarify that mixed accesses are not allowed:
( Note: Atomic and locking objects are not trivially copyable [3.9], so executions that access atomic or locking objects as non-atomic objects, through an array of character type, for example, shall have undefined behaviour - end note )
The possibility of mixed accesses should be removed from the definition of a data race in 1.10p14:
The execution of a program contains a data race if it contains two non-atomic conflicting actions in different threads,
at least one of which is not atomic,and neither happens-before the other. Any such data race results in undefined behavior.
The standard presents several definitions with the phrase "maximal contiguous sub-sequence of side effects in the modification order of M". The usages of this phrase seem unclear, in the definitions of a visible sequence of side effects and a release sequence in 1.10p13 and 1.10p6 respectively:
The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens-before it. [our emphasis]
A release sequence on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation is performed by the same thread that performed the release, or is an atomic read-modify-write operation. [our emphasis]
Early in the process of formalization, the word 'maximal' was understood to be specifying that the first operation in the sequence is maximal in the modification order. This was an error in interpretation rather than in the prose of the standard, but it might improve the clarity of the paragraphs to change the word 'maximal' for the words 'maximal-length'. So respectively 1.10p13 and 1.10p6 become:
The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens-before it.
A release sequence on an atomic object M is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation is performed by the same thread that performed the release, or is an atomic read-modify-write operation.
The standard alludes to supporting a simplified sequentially consistent model of program execution although it does not explicitly specify it. It would be valuable for programmers to be guaranteed sequential consistency with minimal well-specified conditions.
The standard does promise that programmers can rely on a sequentially consistent execution of memory operations, but only if they do not use atomic operations at all. The note that makes this claim is in 1.10p14:
It can be shown that programs that correctly use simple locks to prevent all data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved, with each observed value of an object being the last value assigned in that interleaving. This is normally referred to as "sequential consistency". However, this applies only to race-free programs, and race-free programs cannot observe most program transformations that do not change single-threaded program semantics. [...]
In 1.10p12 the standard alludes to requiring sequential consistency with slightly weaker conditions, describing the correspondence of data races in restricted executions to data races in a sequentially consistent execution.
This states that operations on ordinary variables are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined here, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution.
Given the models described in [BoehmAdve08] and [BattySewell09] correspond to data race free executions in the model of the standard over a restricted set of the atomic operations, it seems that the language regarding the conditions for sequential consistency could be made stronger.
The suggestion for change is that the note in 1.10p14 be changed to:
It can be shown that programs that correctly use simple locks and atomic
memory_order_seq_cst
operations to prevent all data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved [...]
There is only one paragraph discussing reads from the initial state in the standard. In 1.10p2 it is mentioned that reads might read from the initial state.
The value of an object visible to a thread T at a particular point might be the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
Given the weakness of the memory model it does not seem enough to leave the reader to come up with a common-sense semantics for reads from the initial state. The standard should provide explicit conditions for such reads, for the non atomic memory objects, 1.10p12 states:
The value of a non-atomic scalar object M as determined by evaluation B, shall be the value stored by the visible side effect A.
Which could be changed to:
The value of a non-atomic scalar object M as determined by evaluation B, shall be the value stored by the visible side effect A, if one exists, or the initial value of the object otherwise.
Similarly, 1.10p13 describes the values determined for evaluations on atomic objects:
The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens before it.
The value of an atomic object M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B.
In the atomic case, if there is no visible side effect with respect to an evaluation, then we assume that the evaluation could take the value of any side effect that it does not happen before or the initial value of the object. Altering 1.10p13 to incorporate this gives:
The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens before it.
If there are no visible side effects with respect to
B and there are some side effects thatB does not happen before, thenB has an initial side effect sequence, a maximal-length contiguous sub-sequence of side effects in the modification order ofM where for each side effect, it is not the case thatB happens before it.The value of an atomic object M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B, or if no such sequence exists, the initial value or any value from the initial side effect sequence.
Returning now to 1.10p2, the word 'might' suggests that there may be some ambiguity over what values are read. The intended meaning is not that the value of an object might be the initial value at any arbitrary point, nor is the intention that the value might not be the initial value at the initial point. Given the edits of 10.1p12 and 10.1p13 above, 10.1p2 could be changed slightly to remove the word 'might':
The value of an object visible to a thread T at a particular point
might beis the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
Consider paragraph 29.3p9 [atomics.order]:
Implementations shall not move an atomic operation out of an unbounded loop.
The phrase "moving an atomic operation" may be meaningful in the context of a specific implementation, but the presentation of the rest of the model has been more general. The word "moved" seems to refer to memory operations being moved by a compiler (or some abstract interpreter), whereas the style of the rest of the standard is more axiomatic: given some particular candidate execution, the memory model determines whether it is valid or not.
Even leaving that aside, paragraph 29.3p9 mentions an unbounded loop. This term is not used anywhere else in the standard. Given the context it seems to be referring to iteration statements that are not assumed to terminate by 6.5p5 [stmt.iter]. Any iteration statement containing an atomic operation is not assumed to terminate by paragraph 6.5p5, so 29.3p9 is redundant.
There seems to be a serious question as to just what is intended here. If the clause is meant to prevent compilers from hoisting atomic memory actions out of the body of a loop, then this restriction is overly strong. If a given transformation would produce observably different behavior, then 1.9p1 [intro.execution] dictates that the transformation would not be valid:
The semantic descriptions in this International Standard define a parametrized non-deterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
The suggestion for change is to remove the second sentence in 29.3p9 entirely.
This section collects all of the suggestions for change to the
standard described in this document. The relevant paragraphs label is
given then the whole paragraph is reproduced with any added text in
bold and any removed text struck through.
The value of an object visible to a thread might be is the initial value of the object, a value
assigned to the object by
The library defines a number of atomic operations (paragraph 29) and
operations on locks (paragraph 30) that are specially identified as
synchronization operations. These operations play a special role in
making assignments in one thread visible to another. A synchronization
operation on one or more memory locations is either a consume
operation, an acquire operation, a release operation, or both an
acquire and release operation. A synchronization operation without an
associated memory location is a fence and can be either an acquire
fence, a release fence, or both an acquire and release fence. In
addition, there are relaxed atomic operations, which are not
synchronization operations, and atomic read-modify-write operations,
which have special characteristics. ( Note: For example, a call that
acquires a lock will perform an acquire operation on the locations
comprising the lock. Correspondingly, a call that releases the same
lock will perform a release operation on those same
locations. Informally, performing a release operation on
A release sequence on an atomic object
A visible side effect
The value of a non-atomic scalar object
The visible sequence of side effects on an atomic object
If there are no visible side effects with respect
to
memory_order_seq_cst
operations to prevent all
data races, and use no other synchronization operations, behave as
though the executions of their constituent threads were simply
interleaved, with each observed value of an object being the last
value assigned in that interleaving. This is normally referred to as
"sequential consistency". However, this applies only to race-free
programs, and race-free programs cannot observe most program
transformations that do not change single-threaded program
semantics. In fact, most single-threaded program transformations
continue to be allowed, since any program that behaves differently as
a result must perform an undefined operation. - end note )