This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of Open status.
Section: 33.5.4 [atomics.order] Status: Open Submitter: Brian Demsky Opened: 2013-06-17 Last modified: 2016-01-28
Priority: 4
Discussion:
I believe that the following variation on IRIW (independent reads of independent writes) should admit executions in which c1 = d1 = 5 and c2 = d2 = 0. If this is allowed, then what is the sequence of program evaluations required by 33.5.4 [atomics.order] p9 that justifies the store to z? It seems that 33.5.4 [atomics.order] p9 should not allow this execution, because one of the stores to x or y has to appear earlier in the sequence, each fetch_add uses the value read by the preceding load in its thread (and thus must appear later in the sequence than that load), and 33.5.4 [atomics.order] p9 states that each load must read from the last prior assignment in the sequence.
atomic_int x;
atomic_int y;
atomic_int z;
int c1, c2, d1, d2;
static void a(void* obj)
{
  atomic_store_explicit(&x, 5, memory_order_relaxed); 
}
static void b(void* obj)
{
  atomic_store_explicit(&y, 5, memory_order_relaxed); 
}
static void c(void* obj)
{
  c1 = atomic_load_explicit(&x, memory_order_relaxed);
  // this could also be an atomic load if the address depends on c1:
  c2 = atomic_fetch_add_explicit(&y, c1, memory_order_relaxed);  
}
static void d(void* obj)
{
  d1 = atomic_load_explicit(&y, memory_order_relaxed);
  d2 = atomic_fetch_add_explicit(&x, d1, memory_order_relaxed); 
}
int user_main(int argc, char** argv)
{
  thrd_t t1, t2, t3, t4;
  atomic_init(&x, 0);
  atomic_init(&y, 0);
  printf("Main thread: creating 4 threads\n");
  thrd_create(&t1, (thrd_start_t)&a, NULL);
  thrd_create(&t2, (thrd_start_t)&b, NULL);
  thrd_create(&t3, (thrd_start_t)&c, NULL);
  thrd_create(&t4, (thrd_start_t)&d, NULL);
  thrd_join(t1);
  thrd_join(t2);
  thrd_join(t3);
  thrd_join(t4);
  printf("c1=%d c2=%d\n",c1,c2);
  printf("d1=%d d2=%d\n",d1,d2);
  // Can this store write 1000 (i.e., c1=d1=5, c2=d2=0)?
  atomic_store(&z, (c1+d1)*100+c2+d2);
  printf("Main thread is finished\n");
  return 0;
}
It seems that the easiest fix is to allow a load in 33.5.4 [atomics.order] p9 to read from any prior store in the evaluation order.
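The argument above can be checked mechanically. The following is a minimal sketch in C, not part of the original issue, that enumerates every ordering of the six evaluations in threads a to d and counts the orderings that justify c1 = d1 = 5 and c2 = d2 = 0, first under the "last prior assignment" rule and then under the suggested "any prior store" rule. The event names, struct history, and count_justifying_sequences are hypothetical helpers introduced for illustration. The model assumes that each fetch_add must follow the load whose result it adds, and that under the relaxed rule a fetch_add may also observe any prior store.

```c
#include <stdbool.h>

/* The six evaluations performed by threads a, b, c and d (hypothetical names). */
enum { STORE_X, STORE_Y, LOAD_X, FADD_Y, LOAD_Y, FADD_X, N_EVENTS };

/* Values assigned so far to one atomic variable, oldest first. */
struct history {
    int vals[4];
    int n;
};

static void assign(struct history *h, int v)
{
    h->vals[h->n++] = v;
}

/* Can an evaluation observe `want`? Either only from the last prior
 * assignment, or (assumed relaxed rule) from any prior assignment. */
static bool can_observe(const struct history *h, int want, bool any_prior)
{
    if (!any_prior)
        return h->vals[h->n - 1] == want;
    for (int i = 0; i < h->n; i++)
        if (h->vals[i] == want)
            return true;
    return false;
}

/* Replay one candidate sequence; true if it justifies c1=d1=5, c2=d2=0. */
static bool justifies(const int seq[N_EVENTS], bool any_prior)
{
    struct history x = { {0}, 1 };  /* atomic_init(&x, 0) */
    struct history y = { {0}, 1 };  /* atomic_init(&y, 0) */
    bool have_c1 = false, have_d1 = false;

    for (int i = 0; i < N_EVENTS; i++) {
        switch (seq[i]) {
        case STORE_X: assign(&x, 5); break;
        case STORE_Y: assign(&y, 5); break;
        case LOAD_X:  /* c1 must be 5 */
            if (!can_observe(&x, 5, any_prior)) return false;
            have_c1 = true;
            break;
        case LOAD_Y:  /* d1 must be 5 */
            if (!can_observe(&y, 5, any_prior)) return false;
            have_d1 = true;
            break;
        case FADD_Y:  /* needs c1; c2 must be 0; writes 0 + c1 */
            if (!have_c1 || !can_observe(&y, 0, any_prior)) return false;
            assign(&y, 0 + 5);
            break;
        case FADD_X:  /* needs d1; d2 must be 0; writes 0 + d1 */
            if (!have_d1 || !can_observe(&x, 0, any_prior)) return false;
            assign(&x, 0 + 5);
            break;
        }
    }
    return true;
}

/* Recursively enumerate all permutations of the six evaluations. */
static int count(int seq[N_EVENTS], bool used[N_EVENTS], int depth, bool any_prior)
{
    if (depth == N_EVENTS)
        return justifies(seq, any_prior) ? 1 : 0;
    int total = 0;
    for (int e = 0; e < N_EVENTS; e++) {
        if (used[e])
            continue;
        used[e] = true;
        seq[depth] = e;
        total += count(seq, used, depth + 1, any_prior);
        used[e] = false;
    }
    return total;
}

/* Number of orderings that justify the outcome under the chosen rule. */
int count_justifying_sequences(bool any_prior)
{
    int seq[N_EVENTS];
    bool used[N_EVENTS] = { false };
    return count(seq, used, 0, any_prior);
}
```

Under these assumptions, count_justifying_sequences(false) finds no ordering, which matches the argument that p9 as worded rules out the execution, while count_justifying_sequences(true) finds at least one (for example: store x, store y, load x, fetch_add y, load y, fetch_add x).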
That said, I would personally advocate the following. It seems to me that C/C++ atomics are in a somewhat different situation than Java because:
People are expected to use relaxed C++ atomics in potentially racy situations, so it isn't clear that semantics as complicated as the JMM's causality would be sane.
People who use C/C++ atomics are likely to be experts and use them in a very controlled fashion. I would be really surprised if compilers would find any real wins by optimizing the use of atomics.
Why not do something like:
There is a satisfaction DAG of all program evaluations. Each evaluation observes the values of variables as computed by some prior assignment in the DAG. There is an edge x->y between two evaluations x and y if:
the evaluation y observes a value computed by the evaluation x, or
the evaluation y is an atomic store, the evaluation x is an atomic load, and there is a conditional branch c that may depend (intrathread dependence) on x, with x-sb->c and c-sb->y (where sb is sequenced-before).
This seems to allow the reorderings of relaxed atomics that processors perform, without requiring extra fence instructions; it allows most reorderings by the compiler; and it gets rid of satisfaction cycles.
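As a rough illustration of the satisfaction-DAG idea, the sketch below (in C, not part of the original issue) records the edges for one candidate execution and rejects the execution if the edges form a cycle. struct sat_graph and all function names are hypothetical. The sketch assumes that a fetch_add "observes" the value computed by the load that feeds its operand. The second graph is an assumed, commonly cited out-of-thin-air shape: thread 1 runs `r1 = x; if (r1 == 42) y = 42;`, thread 2 runs `r2 = y; if (r2 == 42) x = 42;`, and the candidate outcome is r1 = r2 = 42.

```c
#include <stdbool.h>

#define MAX_EVALS 8

/* Directed graph over evaluations; edge[a][b] means there is an edge a->b. */
struct sat_graph {
    int n;
    bool edge[MAX_EVALS][MAX_EVALS];
};

static void add_edge(struct sat_graph *g, int from, int to)
{
    g->edge[from][to] = true;
}

/* Depth-first search; state 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs_finds_cycle(const struct sat_graph *g, int v, int state[])
{
    state[v] = 1;
    for (int w = 0; w < g->n; w++) {
        if (!g->edge[v][w])
            continue;
        if (state[w] == 1)
            return true;
        if (state[w] == 0 && dfs_finds_cycle(g, w, state))
            return true;
    }
    state[v] = 2;
    return false;
}

/* True if the edges contain a cycle, i.e. the graph is not a DAG. */
bool has_satisfaction_cycle(const struct sat_graph *g)
{
    int state[MAX_EVALS] = { 0 };
    for (int v = 0; v < g->n; v++)
        if (state[v] == 0 && dfs_finds_cycle(g, v, state))
            return true;
    return false;
}

/* Candidate execution from the issue: c1 = d1 = 5, c2 = d2 = 0. */
struct sat_graph issue_execution(void)
{
    enum { INIT_X, INIT_Y, STORE_X, STORE_Y, LOAD_X, FADD_Y, LOAD_Y, FADD_X };
    struct sat_graph g = { .n = 8 };
    add_edge(&g, STORE_X, LOAD_X);  /* c1 = 5 observed from thread a */
    add_edge(&g, STORE_Y, LOAD_Y);  /* d1 = 5 observed from thread b */
    add_edge(&g, INIT_Y, FADD_Y);   /* c2 = 0 observed from atomic_init */
    add_edge(&g, INIT_X, FADD_X);   /* d2 = 0 observed from atomic_init */
    add_edge(&g, LOAD_X, FADD_Y);   /* fetch_add operand is c1 (assumed edge) */
    add_edge(&g, LOAD_Y, FADD_X);   /* fetch_add operand is d1 (assumed edge) */
    return g;                       /* no conditional branches, so no rule-2 edges */
}

/* Assumed out-of-thin-air candidate: r1 = r2 = 42 via guarded stores. */
struct sat_graph thin_air_execution(void)
{
    enum { LOAD_X, STORE_Y, LOAD_Y, STORE_X };
    struct sat_graph g = { .n = 4 };
    add_edge(&g, STORE_X, LOAD_X);  /* r1 observes thread 2's store */
    add_edge(&g, STORE_Y, LOAD_Y);  /* r2 observes thread 1's store */
    add_edge(&g, LOAD_X, STORE_Y);  /* branch on r1 is sequenced before the store to y */
    add_edge(&g, LOAD_Y, STORE_X);  /* branch on r2 is sequenced before the store to x */
    return g;
}
```

With these edges, has_satisfaction_cycle reports no cycle for the execution in this issue, so the proposed rule would admit it, and reports a cycle for the guarded-store example, so that execution would be rejected.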
[2015-02 Cologne]
Handed over to SG1.
[2015-05 Lenexa, SG1 response]
This was partially addressed (weasel-worded) in C++14 (see N3786). The remainder is an open research problem. N3710 outlines a "solution" that doesn't have a consensus behind it because it costs performance. We have no better solution at the moment.
Proposed resolution: