Doc. no.	N2016=06-0086
Date:	2006-04-21
Reply to:	Hans Boehm <Hans.Boehm@hp.com>
Authors:	Hans Boehm & Nick Maclaren

Should `volatile` Acquire Atomicity and Thread Visibility Semantics?

Traditionally, the semantics of the volatile keyword in C and C++ have been unclear. In particular, these languages state that operations on volatile objects are "evaluated strictly according to the rules of the abstract machine" (C99, 6.7.3-6). But, at least in the pthread context, this has generally not been interpreted to apply to inter-thread visibility. According to David Butenhof, "the use of volatile accomplishes nothing but to prevent the compiler from making useful and desirable optimizations, providing no help whatsoever in making code 'thread safe'" (comp.programming.threads posting, July 3, 1997, according to the Google archive).

As a result, most implementations do not insert sufficient memory fences to guarantee that other threads, or even hardware devices, see volatile operations in the order in which they were issued.

On some platforms, some limited ordering guarantees are provided, either because they are automatically enforced by the underlying hardware or, as on Itanium, because different instructions are generated for volatile references. But the specific rules are highly platform dependent. And even when they are specified for a specific platform, they may be inconsistently implemented.

It is unclear how to use such weak guarantees in portable code, except in a few rare instances, which we list below.

This raises the issue of whether volatile should be given a real meaning that provides both atomicity and inter-thread visibility, roughly along the lines of Java volatiles.

Although we believe that abstractly this provides a substantial improvement by giving semantics to something that currently has almost no portable semantics, there seem to be a number of practical obstacles driven by backward compatibility issues that lead us to at least hesitate. These are discussed below.

Existing portable uses

There appear to be three classes of volatile use that are actually portable. Though such uses appear relatively rare, they would be slowed down on some platforms if we adopted stronger semantics. These are:

volatile may be used to mark local variables in the same scope as a setjmp whose value should be preserved across a longjmp. It is unclear what fraction of such uses would be slowed down, since the atomicity and ordering constraints have no effect if there is no way to share the local variable in question. (It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
volatile may be used when variables may be "externally modified", but the modification in fact is triggered synchronously by the thread itself, e.g. because the underlying memory is mapped at multiple locations.
A volatile sig_atomic_t may be used to communicate with a signal handler in the same thread, in a restricted manner. One could consider weakening the requirements for the sig_atomic_t case, but that seems rather counterintuitive.

The last of these is a bit more frequent but unlikely to be performance critical.
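The first and third of these uses can be sketched as follows. This is a minimal illustration rather than code from any particular program: the function names value_after_longjmp and flag_set_by_handler, the variable names, and the choice of SIGINT are all assumptions made for the example.

```cpp
#include <csetjmp>
#include <csignal>

// Use 1: a local variable modified between setjmp and longjmp.
// Declaring it volatile is what makes its value well defined after the jump.
int value_after_longjmp() {
    std::jmp_buf env;
    volatile int progress = 0;   // hypothetical name
    if (setjmp(env) == 0) {
        progress = 42;           // modified after setjmp ...
        std::longjmp(env, 1);    // ... and before the jump back
    }
    return progress;             // reached only via longjmp
}

// Use 3: a volatile sig_atomic_t flag shared with a handler in the same thread.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;              // the only thing the handler does
}

bool flag_set_by_handler() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);          // raise returns after the handler has run
    std::signal(SIGINT, SIG_DFL);
    return got_signal == 1;
}
```

Neither function shares the volatile object with another thread, which is why the current, weak semantics suffice here.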

Existing non-portable uses

We believe that volatile is routinely used, e.g. in OS kernels, in connection with explicit platform-dependent memory fencing instructions. Although these uses are inherently nonportable and not strictly standard-conforming, they represent a substantial code base that would be adversely affected by changing the meaning of volatile. Requiring special compiler options when building kernels is normal, but it would be desirable to avoid options that select between old and new semantics for applications. It is unclear whether this is a serious problem.

Issues with adding atomicity

For volatile variables to be useful for inter-thread communication, updates would have to be atomic, as would reads of those variables. Lock-based emulation is probably impractical, since it doesn't work with asynchronous signals. But we can usually provide hardware atomicity only for a few types. This raises the question of what to do when we cannot.
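The point that hardware atomicity is available only for a few types can be made concrete with the atomics library that was standardized after this paper. The sketch below assumes a C++17 compiler, since it relies on std::atomic<T>::is_always_lock_free. The Sample structure and the always_lock_free helper are hypothetical names introduced for illustration.

```cpp
#include <atomic>

// Hypothetical 64-byte aggregate, standing in for the common case of
// declaring an entire structure volatile.
struct Sample {
    double values[8];
};

// True only if the implementation promises lock-free atomic operations
// for every object of type T.
template <typename T>
constexpr bool always_lock_free() {
    return std::atomic<T>::is_always_lock_free;
}

// A possible compile-time diagnostic for types that would need a hidden lock:
// static_assert(always_lock_free<Sample>(), "Sample cannot be updated atomically");
```

Whether a given type qualifies is implementation-defined. On many mainstream platforms one would expect int and pointers to qualify and a structure like Sample not to, but the embedded targets discussed in this section may differ.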

Providing a diagnostic is problematic, since it is common in existing code to declare entire structures volatile, with no requirement for atomic update. It might be acceptable to provide a diagnostic only when a use, e.g. in a struct assignment, would require unimplementable atomicity. But even then there is probably a danger of introducing diagnostics for existing code that was, and remains, correct.

If no diagnostic is issued when a volatile operation is not atomic, then we introduce a large new opportunity for subtle, unreproducible errors. A further complication is that this is really only reasonable if the language defines which types may be used with volatile. This is difficult to do, since C++ supports some low-end embedded platforms that may have difficulty providing atomic updates of, say, a pointer, because a pointer there exceeds the natural word length.

Conclusions

If we were designing a language from scratch, we believe it would be clearly preferable to give volatile semantics that allow such variables to be used for inter-thread communication. However, making such a change now would add performance costs to some correct existing code. And the existing body of code would make it difficult or impossible to properly issue diagnostics for such thread-communication uses.

Since a library interface to "atomic" objects could support the same uses, it is unclear that such a change is worth the transition cost. If evidence is later produced that the benefits of a change would outweigh the disadvantages, there would be no difficulty in providing threading semantics for volatile. The change would have the benefit of cleaning up the semantics of a currently underdefined construct.
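As an illustration of the library alternative, the sketch below uses std::atomic and std::thread, which were standardized in C++11, after this paper was written. It is not the interface proposed here. The function name publish_and_consume and the variable names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// One thread publishes ordinary data and then sets an atomic flag.
// The other waits on the flag and then reads the data.
int publish_and_consume() {
    int payload = 0;                    // ordinary, non-atomic data
    std::atomic<bool> ready{false};     // library-provided atomic flag

    std::thread producer([&] {
        payload = 42;                                   // write the data first
        ready.store(true, std::memory_order_release);   // then publish it
    });

    // The acquire load pairs with the release store, so once the flag reads
    // true, the write to payload is guaranteed to be visible here.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

The flag provides both the atomicity and the inter-thread visibility discussed above, without changing what volatile means for existing code.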
