This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of Open status.
Section: 33.5.4 [atomics.order] Status: Open Submitter: Brian Demsky Opened: 2013-06-17 Last modified: 2016-01-28
Priority: 4
Discussion:
I believe that the following variation on IRIW should admit executions in which c1 = d1 = 5 and c2 = d2 = 0. If this is allowed, then what is the sequence of program evaluations for 33.5.4 [atomics.order] p9 that justifies the store to z? It seems that 33.5.4 [atomics.order] p9 should not allow this execution, because one of the stores to x or y has to appear earlier in the sequence, each of the fetch_adds uses the value read by the previous load in its thread (and thus must appear later in the sequence), and 33.5.4 [atomics.order] p9 states that each load must read from the last prior assignment in the sequence.
atomic_int x;
atomic_int y;
atomic_int z;
int c1, c2, d1, d2;

static void a(void* obj) {
  atomic_store_explicit(&x, 5, memory_order_relaxed);
}

static void b(void* obj) {
  atomic_store_explicit(&y, 5, memory_order_relaxed);
}

static void c(void* obj) {
  c1 = atomic_load_explicit(&x, memory_order_relaxed);
  // this could also be an atomic load if the address depends on c1:
  c2 = atomic_fetch_add_explicit(&y, c1, memory_order_relaxed);
}

static void d(void* obj) {
  d1 = atomic_load_explicit(&y, memory_order_relaxed);
  d2 = atomic_fetch_add_explicit(&x, d1, memory_order_relaxed);
}

int user_main(int argc, char** argv) {
  thrd_t t1, t2, t3, t4;

  atomic_init(&x, 0);
  atomic_init(&y, 0);

  printf("Main thread: creating 4 threads\n");
  thrd_create(&t1, (thrd_start_t)&a, NULL);
  thrd_create(&t2, (thrd_start_t)&b, NULL);
  thrd_create(&t3, (thrd_start_t)&c, NULL);
  thrd_create(&t4, (thrd_start_t)&d, NULL);

  thrd_join(t1);
  thrd_join(t2);
  thrd_join(t3);
  thrd_join(t4);
  printf("c1=%d c2=%d\n",c1,c2);
  printf("d1=%d d2=%d\n",d1,d2);

  // Can this store write 1000 (i.e., c1=d1=5, c2=d2=0)?
  atomic_store(&z, (c1+d1)*100+c2+d2);

  printf("Main thread is finished\n");

  return 0;
}
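The counting argument above can be checked mechanically. The following is a minimal sketch, not part of the original submission: it reads the p9 wording as "put the six atomic evaluations into one sequence in which every read sees the last prior write", and enumerates every such sequence that keeps each thread's load before its fetch_add. The names Eval, Outcome, run, respects_program_order and outcome_reachable are hypothetical and introduced only for illustration; treating p9 as a plain interleaving is a simplifying assumption.

```cpp
#include <algorithm>
#include <array>

// Hypothetical labels for the six atomic evaluations in the example above.
enum Eval { StoreX, StoreY, LoadX, AddY, LoadY, AddX };

struct Outcome { int c1, c2, d1, d2; };

// Execute the evaluations in the given order; every read observes the
// last prior write to the same variable (assumed reading of p9).
Outcome run(const std::array<Eval, 6>& order) {
    int x = 0, y = 0;  // values set by atomic_init
    Outcome r{0, 0, 0, 0};
    for (Eval e : order) {
        switch (e) {
            case StoreX: x = 5; break;                 // thread a
            case StoreY: y = 5; break;                 // thread b
            case LoadX:  r.c1 = x; break;              // thread c, load
            case AddY:   r.c2 = y; y += r.c1; break;   // thread c, fetch_add
            case LoadY:  r.d1 = y; break;              // thread d, load
            case AddX:   r.d2 = x; x += r.d1; break;   // thread d, fetch_add
        }
    }
    return r;
}

// Each fetch_add must come after the load in the same thread.
bool respects_program_order(const std::array<Eval, 6>& order) {
    auto pos = [&](Eval e) {
        return std::find(order.begin(), order.end(), e) - order.begin();
    };
    return pos(LoadX) < pos(AddY) && pos(LoadY) < pos(AddX);
}

// True if some sequence of the six evaluations produces the given values.
bool outcome_reachable(int c1, int c2, int d1, int d2) {
    // Starts in ascending enum order, so next_permutation visits all orders.
    std::array<Eval, 6> order{{StoreX, StoreY, LoadX, AddY, LoadY, AddX}};
    do {
        if (!respects_program_order(order)) continue;
        Outcome r = run(order);
        if (r.c1 == c1 && r.c2 == c2 && r.d1 == d1 && r.d2 == d2) return true;
    } while (std::next_permutation(order.begin(), order.end()));
    return false;
}
```

Under these assumptions, outcome_reachable(5, 0, 5, 0) returns false, while an outcome such as outcome_reachable(5, 5, 5, 5) returns true. This matches the concern that no single sequence with "last prior assignment" reads justifies storing 1000 to z.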
It seems that the easiest fix is to allow a load in 33.5.4 [atomics.order] p9 to read from any prior store in the evaluation order.
That said, I would personally advocate the following: It seems to me that C/C++ atomics are in a somewhat different situation than Java because:
- People are expected to use relaxed C++ atomics in potentially racy situations, so it isn't clear that semantics as complicated as the JMM's causality would be sane.
- People who use C/C++ atomics are likely to be experts and use them in a very controlled fashion. I would be really surprised if compilers would find any real wins by optimizing the use of atomics.
Why not do something like:
There is a satisfaction DAG of all program evaluations. Each evaluation observes the values of variables as computed by some prior assignment in the DAG. There is an edge x->y between two evaluations x and y if:
- the evaluation y observes a value computed by the evaluation x, or
- the evaluation y is an atomic store, the evaluation x is an atomic load, and there is a conditional branch c that may depend (intra-thread dependence) on x, with x-sb->c and c-sb->y.
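The acyclicity requirement can be sketched as a graph check. The code below is an illustration under stated assumptions, not wording from the proposal: Edge, is_acyclic, iriw_variation_is_acyclic and thin_air_example_is_acyclic are hypothetical names, the edges are written out by hand for two executions, and the second execution (two threads that each load 42 and conditionally store 42 to the other variable) is a standard out-of-thin-air example that does not appear in the issue text. It also assumes that observing the atomic_init value, or the non-atomic c1 and d1, counts as "observes a value".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical edge type: "from" must be satisfied before "to".
struct Edge { std::size_t from, to; };

// Kahn's algorithm: the graph is a DAG iff every node can be removed
// once all of its predecessors have been removed.
bool is_acyclic(std::size_t n, const std::vector<Edge>& edges) {
    std::vector<std::vector<std::size_t>> succ(n);
    std::vector<std::size_t> indegree(n, 0);
    for (const Edge& e : edges) {
        succ[e.from].push_back(e.to);
        ++indegree[e.to];
    }
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (indegree[i] == 0) ready.push_back(i);
    std::size_t removed = 0;
    while (!ready.empty()) {
        std::size_t node = ready.back();
        ready.pop_back();
        ++removed;
        for (std::size_t next : succ[node])
            if (--indegree[next] == 0) ready.push_back(next);
    }
    return removed == n;
}

// The IRIW variation with c1 = d1 = 5 and c2 = d2 = 0.
bool iriw_variation_is_acyclic() {
    enum { InitX, InitY, StoreX, StoreY, LoadX, AddY, LoadY, AddX, Count };
    std::vector<Edge> edges = {
        {StoreX, LoadX},  // c1 observes the store of 5 to x
        {InitY, AddY},    // c2 observes the initial value of y
        {StoreY, LoadY},  // d1 observes the store of 5 to y
        {InitX, AddX},    // d2 observes the initial value of x
        {LoadX, AddY},    // the fetch_add operand is c1
        {LoadY, AddX},    // the fetch_add operand is d1
    };
    return is_acyclic(Count, edges);
}

// Assumed thin-air example: r1 = x; if (r1 == 42) y = 42;  in one thread,
// r2 = y; if (r2 == 42) x = 42;  in the other, with r1 = r2 = 42.
bool thin_air_example_is_acyclic() {
    enum { LoadX, StoreY, LoadY, StoreX, Count };
    std::vector<Edge> edges = {
        {StoreX, LoadX},  // r1 observes the store to x
        {StoreY, LoadY},  // r2 observes the store to y
        {LoadX, StoreY},  // store to y is guarded by a branch on r1
        {LoadY, StoreX},  // store to x is guarded by a branch on r2
    };
    return is_acyclic(Count, edges);
}
```

With these hand-built graphs, iriw_variation_is_acyclic() returns true and thin_air_example_is_acyclic() returns false, which is the intended distinction: the rule would permit the IRIW variation and reject the self-justifying cycle.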
This seems to allow the reorderings of relaxed atomics that processors perform without requiring extra fence instructions, to allow most reorderings by the compiler, and to get rid of satisfaction cycles.
[2015-02 Cologne]
Handed over to SG1.
[2015-05 Lenexa, SG1 response]
This was partially addressed (weasel-worded) in C++14 (See N3786). The remainder is an open research problem. N3710 outlines a "solution" that doesn't have a consensus behind it because it costs performance. We have no better solution at the moment.
Proposed resolution: