Doc. No.: | WG21/N3910 |
---|---|
Revision of: | WG21/N3787 |
Date: | 2014-02-14 |
Reply to: | Hans-J. Boehm |
Phone: | +1-650-857-3406 |
Email: | Hans.Boehm@hp.com |
This is a revision of N3787. It attempts to reflect the remaining comments from CWG discussions in Chicago. It also reflects further in-person discussions with Jens Maurer and Clark Nelson at the Issaquah meeting, and subsequent feedback from both SG1 and core. Some of the wording here came directly from Jens.
CWG Issue 1441 points out that in the process of relaxing the restrictions on asynchronous signal handlers to allow use of atomics, we inadvertently made it impossible to use even local variables of non-volatile, non-atomic type. As a result of an initial discussion within CWG, Jens Maurer generated a proposed resolution, which addresses that specific issue.
Pre-Bristol discussion in SG1, both in Portland and during the February 2013 SG1 teleconference, raised a number of additional issues. Both Jens' solution and all prior versions of the standard still give undefined behavior to code involving signal handlers which we believe should clearly be legal. Our goal was to correct such oversights, and allow some realistic signal handlers to be portable, while preserving a significant amount of implementation freedom with respect to what is allowable in a signal handler. In particular, we do not want to reinvent POSIX's notion of async-signal-safe functions here.
This issue was revisited as part of the concurrency group (SG1) meetings in Bristol. After some initial debate, that discussion concluded with several straw polls reaffirming that we want lock-free atomics to be usable in signal handlers, that signal handlers should be able to read data written before the handler installation, and that accesses to ordinary variables in signal handlers should be OK, so long as there are happens-before relationships separating such code from mainline code accesses. Some of us remember this as the original intent behind the C++11 changes, but recollections vary.
In spite of reaffirming the intent of this paper, the changes below were not brought forward as a change to the C++14 working paper, since there was a feeling we should take more time to consider alternate approaches to the wording, and that input from WG14 would be desirable.
My current hope is that something along the lines of this proposal, but probably not precisely these words, will find its way into the draft.
We give several proposed changes and summarize the reasoning behind the change as well as some of the past discussion:
Replace 1.9p6 [intro.execution]:
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects which are neither of type volatile std::sig_atomic_t nor lock-free atomic objects (29.4) are unspecified during the execution of the signal handler, and the value of any object not in either of these two categories that is modified by the handler becomes undefined.
with
If a signal handler is executed as a result of a call to the raise function, then the execution of the handler is sequenced after the invocation of the raise function and before its return. [Note: When a signal is received for another reason, the execution of the signal handler is usually unsequenced with respect to the rest of the program. -- end note.]
The original restriction would now be expressed elsewhere (see below) in terms of data races. This means that signal handlers can now access variables also accessed in mainline code, so long as the required happens-before orders are established.
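As a rough illustration of the raise case, here is a minimal sketch (not proposed wording). The names handled, on_term and raise_and_observe are made up for the example, and SIGTERM is an arbitrary choice of signal.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical flag, used only in this sketch.
volatile std::sig_atomic_t handled = 0;

extern "C" void on_term(int) { handled = 1; }

// Installs the handler, raises the signal, and returns the flag value
// observed immediately after raise() returns.
int raise_and_observe() {
    auto prev = std::signal(SIGTERM, on_term);
    assert(prev != SIG_ERR);
    (void)prev;
    handled = 0;
    std::raise(SIGTERM);  // the handler runs here, before raise() returns
    return handled;       // sequenced after the handler, so this reads 1
}
```

Because the handler is sequenced between the call to raise and its return, the read of handled after the call cannot miss the handler's store.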
We concluded during the February discussion that the old "interrupted by a signal" phrase referred to an asynchronous signal, and was basically OK. But after reading the C standard I'm not sure, and it makes sense to me to be more explicit. This is my latest attempt to do so.
Add after 1.10p1 [intro.multithread]:
A signal handler that is executed as a result of a call to the raise function belongs to the same thread of execution as the call to the raise function. Otherwise it is unspecified which thread of execution contains a signal handler invocation.
Change one sentence in 1.9p15 [intro.execution]:
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, and they are not potentially concurrent (1.10, intro.multithread), the behavior is undefined. [Note: The next section imposes similar, but more complex restrictions on potentially concurrent computations. -- end note]
Discussion:
This is a delicate area. Asynchronous signal handlers are unsequenced. If we said nothing else, even atomic operations in mainline code and the signal handler might introduce undefined behavior. We don't want that.
This is not an issue for regular unsequenced expressions. Consider the question of whether the following is legal:
{
  atomic<int *> p(0);  // direct-initialization; atomics are not copyable
  int i;
  (i = 17, p = &i, 1) + (p ? *p : 0);
}
After some false starts, we concluded in the February phone call that
the answer is yes, for reasons having more to do with function calls in
expressions than atomics.
The store to p and the initial test of p are indeterminately sequenced. If the latter occurs first, the potentially unsequenced access to *p doesn't occur. In the other case, the store to i is sequenced before the store to p, which is sequenced before the test on p, which is sequenced before the questionable load from *p.
This again relies heavily on the fact that atomic operations are
function calls in C++. The situation in C is unfortunately different.
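A runnable version of the example above, as a minimal sketch. The function name evaluate_example is hypothetical, and i is given an initial value only to keep compilers quiet; that does not affect the argument.

```cpp
#include <atomic>
#include <cassert>

// Evaluates the expression from the example and returns its value.
int evaluate_example() {
    std::atomic<int*> p(nullptr);
    int i = 0;
    // The store to p and the test of p are both function calls on the
    // atomic, so they are indeterminately sequenced, not unsequenced.
    return (i = 17, p = &i, 1) + (p ? *p : 0);
}
```

By the reasoning above, the result is 1 if the test of p runs first and 18 if the store to p runs first; which of the two occurs is up to the implementation.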
In spite of earlier contradictory conclusions, there are, however, strong reasons to treat unsequenced expressions differently from data races in signal handlers. These have to do with weaker memory orders. Consider the following example:
Mainline code:
x = 1; y.store(1, memory_order_mo1);
Signal handler:
if (y.load(memory_order_mo2)) tmp = x;
This should or should not result in undefined behavior, depending on mo1 and mo2. I don't think this is expressible without relying on happens-before.
Fortunately, I think this doesn't apply within expressions:
(x = 1, y.store(1, memory_order_relaxed), 0) + (y.load(memory_order_relaxed)? x : 1)
(all variables initially zero as usual)
must return 1. A compiler that violates this by reordering the initial two stores and performing the y.load() in the middle is broken. (At least so we claim with only mild uncertainty.)
Thus the restriction on unsequenced operations should apply only to code that may not run concurrently. For code that may run concurrently (threads and signal handlers) we need the happens-before-based notion of data races that reflects memory_order specifications.
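To make the mainline/handler example concrete, here is a sketch of the case mo1 = release, mo2 = acquire, which is the combination that should be free of undefined behavior. Several things here are assumptions rather than part of the proposal: the names seen, read_if_published and publish_then_signal are invented, atomic<int> is assumed to be lock-free, the handler uses the non-member forms of the atomic operations, and raise is used only so that the sketch behaves deterministically.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

int x = 0;                  // ordinary, non-atomic data
std::atomic<int> y(0);      // assumed lock-free on the target
std::atomic<int> seen(-1);  // what the handler observed; -1 if nothing

extern "C" void read_if_published(int) {
    // mo2 = acquire: pairs with the release store in mainline code,
    // so the read of x below happens after the write x = 1.
    if (std::atomic_load_explicit(&y, std::memory_order_acquire))
        std::atomic_store_explicit(&seen, x, std::memory_order_relaxed);
}

int publish_then_signal() {
    auto prev = std::signal(SIGTERM, read_if_published);
    assert(prev != SIG_ERR);
    (void)prev;
    x = 1;
    std::atomic_store_explicit(&y, 1, std::memory_order_release);  // mo1 = release
    std::raise(SIGTERM);  // stands in for an asynchronous delivery
    return std::atomic_load_explicit(&seen, std::memory_order_relaxed);
}
```

With both orders weakened to memory_order_relaxed, an asynchronously delivered handler could read x without any happens-before edge from the write, which is the data race the proposed wording is meant to catch.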
Change the normative part of 1.10p21 [intro.multithread] as follows:
Two actions are potentially concurrent if
- they are performed by different threads, or
- they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions ~~in different threads~~, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior. Two accesses to the same object of type volatile sig_atomic_t do not result in a data race if both occur in the same thread, even if one or more occurs in a signal handler. For each signal handler invocation, evaluations performed by the thread invoking a signal handler can be divided into two groups A and B, such that no evaluations in B happen before evaluations in A, and the evaluations of such volatile sig_atomic_t objects take values as though all evaluations in A happened before the execution of the signal handler and the execution of the signal handler happened before all evaluations in B.
Discussion:
By the above reasoning, we need to give signal handlers the same data-race-based treatment as threads. Memory_order specifications must be respected in determining whether there is undefined behavior.
There was some discussion during the February 2013 phone call as to whether
we should view signal handlers as being performed by a specific thread at
all, and I think we were moving towards removing that notion.
A signal handler probably cannot portably tell which thread it's running on. But after thinking about this more, I don't know how to reconcile this change with atomic_signal_fence, so I am once again inclined to leave things more like they are.
These changes should now have the effect of allowing full atomics to be used in communicating with a signal handler. I can now allocate an object, assign it to an atomic pointer variable, and have a signal handler access the non-atomic objects through that variable, just as another thread could. Since signal handlers obey strictly more scheduling constraints than threads, I think this is entirely expected, and what we had in mind all the time.
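The pointer-publication pattern described above might look roughly like this. It is a sketch under assumptions: Config, current, observed, read_config and publish_config are hypothetical names, atomic<Config*> and atomic<int> are assumed to be lock-free, the object has static storage so that the sketch need not decide when it could safely be freed, and raise is used only to make the run deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

struct Config { int verbosity; };       // hypothetical payload type

std::atomic<Config*> current(nullptr);  // assumed lock-free
std::atomic<int> observed(-1);          // value the handler read; -1 if none

extern "C" void read_config(int) {
    // The acquire load orders the handler's read of *c after the
    // mainline initialization that preceded the release store.
    Config* c = std::atomic_load_explicit(&current, std::memory_order_acquire);
    if (c)
        std::atomic_store_explicit(&observed, c->verbosity,
                                   std::memory_order_relaxed);
}

int publish_config() {
    static Config cfg;                  // non-atomic object
    auto prev = std::signal(SIGTERM, read_config);
    assert(prev != SIG_ERR);
    (void)prev;
    cfg.verbosity = 3;                  // plain write, before publication
    std::atomic_store_explicit(&current, &cfg, std::memory_order_release);
    std::raise(SIGTERM);
    return std::atomic_load_explicit(&observed, std::memory_order_relaxed);
}
```

The handler never touches cfg unless it has seen the published pointer, so an invocation that arrives before the release store simply does nothing.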
Objects of type volatile sig_atomic_t seem to require special treatment. Unlike relaxed atomics, if two such assignments occur in mainline code, a signal handler cannot see the second without seeing the first. Especially in single-threaded code, it's important that the signal handler appear to execute at one point, during which the mainline thread makes no progress. Handlers in single-threaded code are not prepared to see volatile sig_atomic_t objects change asynchronously while the handler is running (unless they're changed by another signal handler). Conversely, if a handler executes x = y; x++; mainline code should not see the intermediate assignment to x.
Note that in this formulation volatile sig_atomic_t objects are not immune from races, e.g. if one access is from a handler and the other is from another thread. Nonetheless a volatile sig_atomic_t evaluation in a signal handler may properly receive its value from an assignment in another thread, due to synchronization.
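The ordering property for volatile sig_atomic_t can be sketched as follows, for the single-threaded case. All names are hypothetical, and the handler is reinstalled before each raise because some implementations reset the disposition after delivery.

```cpp
#include <cassert>
#include <csignal>

volatile std::sig_atomic_t first = 0;
volatile std::sig_atomic_t second = 0;
volatile std::sig_atomic_t violation = 0;

extern "C" void check_order(int) {
    // Mainline code assigns first, then second. A handler in the same
    // thread must not observe the second assignment without the first.
    if (second == 1 && first == 0) violation = 1;
}

// Reinstalls the handler and delivers one signal synchronously.
void deliver() {
    auto prev = std::signal(SIGINT, check_order);
    assert(prev != SIG_ERR);
    (void)prev;
    std::raise(SIGINT);
}

int assign_in_order_and_check() {
    first = 1;
    deliver();   // the handler may see first == 1, second == 0
    second = 1;
    deliver();   // the handler sees both assignments
    return violation;
}
```

With relaxed atomics accessed from another thread, the analogous check could fail; the A/B grouping in the proposed wording is what rules that out for a handler running in the same thread.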
Insert in 18.10 [support.runtime] after p8:
A call to the function signal synchronizes with any resulting invocation of the signal handler so installed.
Discussion:
This is necessary to allow signal handlers to access data that is read-only after installation of the handler. I expect this happens all the time already.
Note that 29.8p6 already talks about synchronizes-with relationships between a thread and a signal handler in the same thread, so I don't think this is a very fundamental change in perspective.
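A sketch of the kind of code this is meant to bless: ordinary data is initialized, the handler is installed, and the data is never written again. The names table, result, sum_table and install_and_trigger are invented for the example, atomic<int> is assumed to be lock-free, and raise again stands in for an asynchronous delivery.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

int table[4];                // ordinary data, read-only after installation
std::atomic<int> result(0);  // assumed lock-free

extern "C" void sum_table(int) {
    int s = 0;
    for (int i = 0; i < 4; ++i) s += table[i];
    std::atomic_store_explicit(&result, s, std::memory_order_relaxed);
}

int install_and_trigger() {
    for (int i = 0; i < 4; ++i) table[i] = i + 1;  // 1, 2, 3, 4
    // Under the proposed wording this call synchronizes with the handler
    // invocation, so the writes above happen before the handler's reads.
    auto prev = std::signal(SIGTERM, sum_table);
    assert(prev != SIG_ERR);
    (void)prev;
    std::raise(SIGTERM);
    return std::atomic_load_explicit(&result, std::memory_order_relaxed);
}
```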
Change 18.10 [support.runtime] p9 as follows:
The common subset of the C and C++ languages consists of all declarations, definitions, and expressions that may appear in a well formed C++ program and also in a conforming C program. A POF ("plain old function") is a function that uses only features from this common subset, and that does not directly or indirectly use any function that is not a POF, except that it may use ~~functions defined in Clause 29 that are not member functions~~ plain lock-free atomic operations. A plain lock-free atomic operation is an invocation of a function f from clause 29, such that f is not a member function, and either f is the function atomic_is_lock_free, or for every atomic argument A passed to f, atomic_is_lock_free(A) yields true. All signal handlers shall have C linkage. ~~A POF that could be used as a signal handler in a conforming C program does not produce undefined behavior when used as a signal handler in a C++ program.~~ The behavior of any ~~other~~ function other than a POF used as a signal handler in a C++ program is implementation-defined.
(Keep footnote 229.)
Discussion:
Since we currently refer to C99 as the base document and C99 does not support thread_local, this somewhat accidentally prohibits use of thread_local in signal handlers. Discussion in Bristol suggests this may be a good thing, since thread_local might be implemented with, e.g., a locked hash table, which would result in deadlocks if access from signal handlers were allowed.
Some of the earlier phone call discussion seems to have overlooked the existing clause 29 exemption which, for example, makes calls to atomic_is_lock_free() legal.
That exemption was too broad, since it allowed non-lock-free calls. All calls that acquire locks need to be prohibited in signal handlers, since they typically deadlock if the mainline thread already holds the lock.
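As an illustration of what a handler restricted to plain lock-free atomic operations might look like, here is a sketch. The names counter, count_signal, deliver and count_two_signals are hypothetical. The handler uses only non-member functions from clause 29 and checks atomic_is_lock_free before touching the atomic; the handler is reinstalled before each raise because some implementations reset the disposition after delivery.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

std::atomic<int> counter(0);

extern "C" void count_signal(int) {
    // atomic_is_lock_free is always permitted; the fetch-add qualifies
    // as a plain lock-free atomic operation only if the check succeeds.
    if (std::atomic_is_lock_free(&counter))
        std::atomic_fetch_add_explicit(&counter, 1, std::memory_order_relaxed);
}

// Reinstalls the handler and delivers one signal synchronously.
void deliver() {
    auto prev = std::signal(SIGTERM, count_signal);
    assert(prev != SIG_ERR);
    (void)prev;
    std::raise(SIGTERM);
}

int count_two_signals() {
    deliver();
    deliver();
    return std::atomic_load_explicit(&counter, std::memory_order_relaxed);
}
```

On an implementation where atomic<int> is not lock-free, the handler leaves the counter alone rather than risk acquiring a lock that mainline code might already hold.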
I don't understand the meaning of a normative sentence that says "X does not have undefined behavior". We otherwise define its meaning, so why would it possibly have undefined behavior without this sentence? Hence I'm proposing to rephrase.
Jens points out that it's confusing to have this text in the library rather than core language section of the standard. Either the committee or the editor may wish to address that in the future.
(This paragraph removes the need for a 1.10p5 change I previously proposed.)