Doc. no.: N2016=06-0086
Date: 2006-04-21
Reply to: Hans Boehm <Hans.Boehm@hp.com>
Authors: Hans Boehm & Nick Maclaren
Should volatile Acquire Atomicity
and Thread Visibility Semantics?
Traditionally, the semantics of C's and C++'s volatile
keyword have been
unclear. In particular, these languages state that operations on volatile objects are
"evaluated strictly according to the rules of the
abstract machine" (C99, 6.7.3-6). But, at
least in the pthread context, this has generally not been interpreted to apply
to inter-thread visibility. According to David Butenhof,
"the use of volatile accomplishes nothing
but to prevent the compiler from making useful and desirable
optimizations, providing no help whatsoever in making code 'thread
safe'" (comp.programming.threads posting, July 3, 1997, according to
the Google archive).
As a result, most implementations do not insert sufficient memory fences to
guarantee that other threads, or even hardware devices, see volatile operations
in the order in which they were issued.
On some platforms, some limited ordering guarantees are provided, either
because they are automatically enforced by the underlying hardware or, as on
Itanium, because different instructions are generated for volatile references.
But the specific rules are highly platform-dependent. And even when they are
specified for a given platform, they may be inconsistently implemented.
It is unclear how to use such weak guarantees in portable code, except in a few
rare instances, which we list below.
This raises the issue of whether volatile
should be given a real meaning that
provides both atomicity and inter-thread visibility, roughly along the lines of
Java volatiles.
Although we believe that abstractly this provides a substantial improvement by
giving semantics to something that currently has almost no portable semantics,
there seem to be a number of practical obstacles driven by backward
compatibility issues that lead us to at least hesitate. These are discussed
below.
Existing portable uses
There appear to be three classes of volatile
use that are actually portable.
Though such uses appear relatively rare, they would be slowed down on some
platforms if we adopted stronger semantics. These are:
- volatile may be used to mark local variables in the same scope as a setjmp
whose value should be preserved across a longjmp. It is unclear what fraction
of such uses would be slowed down, since the atomicity and ordering constraints
have no effect if there is no way to share the local variable in question.
(It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
- volatile may be used when variables
may be "externally modified", but the
modification in fact is triggered synchronously by the thread itself, e.g.
because the underlying memory is mapped at multiple locations.
- A volatile sig_atomic_t may be used to communicate with a signal handler
in the same thread, in a restricted manner. One could consider weakening the requirements for the
sig_atomic_t case, but that seems rather counterintuitive.
The last of these is a bit more frequent but unlikely to be performance
critical.
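The first and third of these uses can be sketched as follows. This is a minimal illustration, not code from any particular code base; the names count_attempts, on_signal, raise_and_check, and got_signal are hypothetical, and SIGINT is chosen only because it is available in standard C and C++.

```cpp
#include <cassert>
#include <csetjmp>
#include <csignal>

// Use 1: a volatile local that is modified between setjmp and longjmp.
// Without volatile, its value after the jump would be indeterminate.
int count_attempts() {
    std::jmp_buf env;
    volatile int attempts = 0;
    if (setjmp(env) != 0) {
        // Reached via longjmp; the increment below is visible here.
        return attempts;
    }
    attempts = attempts + 1;
    std::longjmp(env, 1);  // does not return
}

// Use 3: a volatile sig_atomic_t flag shared with a signal handler
// that runs in the same thread.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // setting the flag is all the handler does
}

bool raise_and_check() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // handler runs before raise returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal == 1;
}
```

Neither function shares the volatile object with another thread, so the atomicity and ordering constraints under discussion would add cost here without adding any guarantee the code relies on.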
Existing non-portable uses
We believe that volatile is routinely used,
e.g. in OS kernels, in connection
with explicit platform-dependent memory fencing instructions. Although these
uses are inherently nonportable and not strictly standard-conforming, they
represent a substantial code base that would be adversely affected by changing
the meaning of volatile.
Requiring special compiler options when building kernels is normal, but
it would be desirable to avoid options that select between old and new semantics
for applications. It is unclear whether this is a serious problem.
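The general shape of such code might look like the sketch below. It is only an illustration under stated assumptions: platform_fence is a hypothetical stand-in for a platform-specific fence instruction or kernel macro, implemented here with std::atomic_thread_fence (a facility standardized later, in C++11) so that the example compiles portably. The names payload, ready, publish, and try_consume are likewise hypothetical. Whether such code is correct across threads depends entirely on the platform and compiler, which is exactly why it is non-portable.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a platform-specific fence instruction.
// Real kernels typically use their own macros or inline assembly.
inline void platform_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

volatile int payload = 0;
volatile int ready = 0;

// Writer side: store the data, fence, then set the flag.
void publish(int value) {
    payload = value;
    platform_fence();  // intended to keep the payload store ahead of the flag store
    ready = 1;
}

// Reader side: test the flag, fence, then load the data.
bool try_consume(int& out) {
    if (ready == 0) return false;
    platform_fence();  // intended to keep the flag load ahead of the payload load
    out = payload;
    return true;
}
```

If volatile accesses themselves implied fences, code of this kind would pay for ordering twice: once in the explicit fence and once in every volatile access.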
Issues with adding atomicity
For volatile variables to be useful for inter-thread
communication, updates
would have to be atomic, as would reads of those variables. Lock-based
emulation is probably impractical, since it doesn't work with asynchronous
signals. But we can usually provide hardware atomicity only for a few types.
This raises the question of what to do when we cannot.
Providing a diagnostic is problematic, since it is common in existing code to
declare entire structures volatile, with no
requirement for atomic update. It
might be acceptable to provide a diagnostic only when a use, e.g. in a struct
assignment, would require unimplementable atomicity. But even then there is
probably a danger of introducing diagnostics for existing code that was, and
remains, correct.
If no diagnostic is issued when a volatile operation is not atomic, then we
introduce a large new opportunity for subtle, unreproducible errors. A further
complication is that this is really only reasonable if the language defines
which types may be used with volatile. This is difficult to do, since C++
supports some low-end embedded platforms that may have difficulty providing
atomic updates of, say, a pointer, because a pointer may exceed the natural word length.
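The type dependence can be made concrete with a facility that postdates this paper. The sketch below assumes a C++17 compiler and uses std::atomic<T>::is_always_lock_free to ask whether the implementation can update a type without a lock; Packet, hardware_atomic, and shared_packet are hypothetical names introduced for illustration. The answers are implementation-specific and may differ on small embedded targets.

```cpp
#include <atomic>
#include <cassert>

// A structure of the kind that existing code often declares volatile
// as a whole, with no expectation of atomic update.
struct Packet { char bytes[64]; };

// True when the implementation can always update T without a lock.
// Lock-free operations are also the ones usable from signal handlers.
template <class T>
constexpr bool hardware_atomic = std::atomic<T>::is_always_lock_free;

// Legal today; nothing requires accesses to the whole object to be atomic.
volatile Packet shared_packet;
```

On mainstream desktop and server platforms one would expect hardware_atomic<int> to be true and hardware_atomic<Packet> to be false, which is the situation in which a diagnostic or a silent loss of atomicity would have to be chosen.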
Conclusions
If we were designing a language from scratch, we believe it would be
clearly preferable to give volatile semantics that allow such
variables to be used for inter-thread communication. However, making
such a change now would add performance costs to some correct existing
code. And the existing body of code would make it difficult or impossible
to properly issue diagnostics for such thread-communication uses.
Since a library interface to "atomic" objects could support the same uses,
it is unclear that such a change is worth the transition cost.
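As one illustration of what such a library interface can look like, the sketch below uses std::atomic, which was standardized in C++11, after this paper was written. It is not the interface the authors had in hand at the time; pass_message and the variable names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hands one int from the calling thread to a consumer thread,
// using an atomic flag rather than a volatile one.
int pass_message(int value) {
    int payload = 0;
    std::atomic<bool> ready{false};
    int received = 0;

    std::thread consumer([&] {
        // The acquire load pairs with the release store below.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        received = payload;  // ordered after the producer's write
    });

    payload = value;
    ready.store(true, std::memory_order_release);
    consumer.join();
    return received;
}
```

Calling pass_message(42) returns 42. Only the flag carries atomicity and ordering requirements, so existing volatile code is unaffected and the cost falls only on code that opts in.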
If evidence is later produced that the benefits of a change would outweigh
the disadvantages, there would be no difficulty in providing
threading semantics for volatile.
The change
would have the benefit of cleaning up the semantics of a currently
underdefined construct.