Doc. No.:	WG21/N3910
Revision of:	WG21/N3787
Date:	2014-02-14
Reply to:	Hans-J. Boehm
Phone:	+1-650-857-3406
Email:	Hans.Boehm@hp.com

N3910: What can signal handlers do? (CWG 1441)

This is a revision of N3787. It attempts to reflect the remaining comments from CWG discussions in Chicago. It also reflects further in-person discussions with Jens Maurer and Clark Nelson at the Issaquah meeting, and subsequent feedback from both SG1 and core. Some of the wording here came directly from Jens.

Background

CWG Issue 1441 points out that in the process of relaxing the restrictions on asynchronous signal handlers to allow use of atomics, we inadvertently made it impossible to use even local variables of non-volatile, non-atomic type. As a result of an initial discussion within CWG, Jens Maurer generated a proposed resolution, which addresses that specific issue.

Pre-Bristol discussion in SG1, both in Portland and during the February 2013 SG1 teleconference, raised a number of additional issues. Both Jens' solution and all prior versions of the standard still give undefined behavior to code involving signal handlers that we believe should clearly be legal. Our goal was to correct such oversights, and allow some realistic signal handlers to be portable, while preserving a significant amount of implementation freedom with respect to what is allowable in a signal handler. In particular, we do not want to reinvent POSIX's notion of async-signal-safe functions here.

This issue was revisited as part of the concurrency group (SG1) meetings in Bristol. After some initial debate, that discussion concluded with several straw polls reaffirming that we want lock-free atomics to be usable in signal handlers, that signal handlers should be able to read data written before the handler installation, and that accesses to ordinary variables in signal handlers should be OK, so long as there are happens-before relationships separating such code from mainline code accesses. Some of us remember this as the original intent behind the C++11 changes, but recollections vary.

In spite of reaffirming the intent of this paper, the changes below were not brought forward as a change to the C++14 working paper, since there was a feeling we should take more time to consider alternate approaches to the wording, and that input from WG14 would be desirable.

My current hope is that something along the lines of this proposal, but probably not precisely these words, will find its way into the draft.

Proposed resolution and discussion

We give several proposed changes and summarize the reasoning behind the change as well as some of the past discussion:

Replace 1.9p6, the paragraph imposing the restrictions on signal handlers

Replace 1.9p6 [intro.execution]:

~~When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects which are neither~~

~~of type volatile std::sig_atomic_t nor~~
~~lock-free atomic objects (29.4)~~
~~are unspecified during the execution of the signal handler, and the value of any object not in either of these two categories that is modified by the handler becomes undefined.~~

with

If a signal handler is executed as a result of a call to the raise function, then the execution of the handler is sequenced after the invocation of the raise function and before its return. [Note: When a signal is received for another reason, the execution of the signal handler is usually unsequenced with respect to the rest of the program. -- end note]

The original restriction would now be expressed elsewhere (see below) in terms of data races. This means that signal handlers can now access variables also accessed in mainline code, so long as the required happens-before orders are established.
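To make the raise case concrete, here is a minimal sketch of a handler that touches only an ordinary int and is invoked synchronously. The names handled_signal, record_signal and raise_and_observe are hypothetical and exist only for illustration. Under the proposed wording, as I read it, the handler's write is sequenced between the two mainline accesses, so no data race arises.

```cpp
#include <csignal>

// Ordinary object: neither volatile nor atomic.
int handled_signal = 0;

// Signal handlers need C linkage; this one only assigns to an ordinary int.
extern "C" void record_signal(int signo) {
    handled_signal = signo;
}

// Hypothetical helper: installs the handler, raises SIGINT synchronously,
// and returns the value the handler stored.
int raise_and_observe() {
    handled_signal = 0;                  // sequenced before the call to raise
    std::signal(SIGINT, record_signal);
    std::raise(SIGINT);                  // handler runs before raise returns
    std::signal(SIGINT, SIG_DFL);        // restore the default disposition
    return handled_signal;               // sequenced after the handler's write
}
```

Calling raise_and_observe() from main should yield SIGINT. The same handler delivered asynchronously would not be covered by this sequencing argument.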

We concluded during the February discussion that the old "interrupted by a signal" phrase referred to an asynchronous signal, and was basically OK. But after reading the C standard I'm not sure, and it makes sense to me to be more explicit. This is my latest attempt to do so.

Clarify relationship between signal handlers and threads

Add after 1.10p1 [intro.multithread]:

A signal handler that is executed as a result of a call to the raise function belongs to the same thread of execution as the call to the raise function. Otherwise it is unspecified which thread of execution contains a signal handler invocation.

Weaken the restriction on unsequenced operations

Change one sentence in 1.9p15 [intro.execution]:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, and they are not potentially concurrent (1.10, intro.multithread), the behavior is undefined. [Note: The next section imposes similar, but more complex restrictions on potentially concurrent computations. -- end note]

Discussion:

This is a delicate area. Asynchronous signal handlers are unsequenced. If we said nothing else, even atomic operations in mainline code and the signal handler might introduce undefined behavior. We don't want that.

This is not an issue for regular unsequenced expressions. Consider the question of whether the following is legal:

{ atomic<int *>p = 0; int i; (i = 17, p = &i, 1) + (p? *p : 0); }

After some false starts, we concluded in the February phone call that the answer is yes, for reasons having more to do with function calls in expressions than atomics. The store to p and the initial test of p are indeterminately sequenced. If the latter occurs first, the potentially unsequenced access to *p doesn't occur. In the other case, the store to i is sequenced before the store to p, which is sequenced before the test on p, which is sequenced before the questionable load from *p. This again relies heavily on the fact that atomic operations are function calls in C++. The situation in C is unfortunately different.

In spite of earlier contradictory conclusions, there are, however, strong reasons to treat unsequenced expressions differently from data races in signal handlers. These have to do with weaker memory orders. Consider the following example:

Mainline code: x = 1; y.store(1, memory_order_mo1);

Signal handler: if (y.load(memory_order_mo2)) tmp = x;

This should or should not result in undefined behavior, depending on mo1 and mo2. I don't think this is expressible without relying on happens-before.
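As a sketch of one concrete choice, the following fills in mo1 = memory_order_release and mo2 = memory_order_acquire, which I believe is a combination under which the handler's read of x is ordered by happens-before. The namespace sketch and the function names are hypothetical. The handler uses the non-member atomic functions, and the example assumes std::atomic<int> is lock-free. For a deterministic result it delivers the signal with raise, although the asynchronous case is the one the memory orders actually matter for.

```cpp
#include <atomic>
#include <csignal>

namespace sketch {

int x = 0;                 // ordinary data, written by mainline code
std::atomic<int> y(0);     // flag; assumed to be lock-free on this platform
int tmp = -1;              // written only by the handler

// Handler from the example: reads x only if it observes the flag.
extern "C" void on_signal(int) {
    if (std::atomic_load_explicit(&y, std::memory_order_acquire))
        tmp = x;
}

// Hypothetical driver: mainline code from the example, followed by raise.
int publish_then_raise() {
    std::signal(SIGINT, on_signal);
    x = 1;                                                         // ordinary store
    std::atomic_store_explicit(&y, 1, std::memory_order_release);  // mo1
    std::raise(SIGINT);                                            // handler runs here
    std::signal(SIGINT, SIG_DFL);
    return tmp;                                                    // value the handler copied
}

}  // namespace sketch
```

With memory_order_relaxed substituted for both orders, an asynchronously delivered signal would, as I read the proposed wording, leave the two accesses to x unordered by happens-before, and hence racing.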

Fortunately, I think this doesn't apply within expressions:

(x = 1, y.store(1, memory_order_relaxed), 0) + (y.load(memory_order_relaxed)? x : 1)

(all variables initially zero as usual) must return 1. A compiler that violates this by reordering the initial two stores and performing the y.load() in the middle is broken. (At least so we claim with only mild uncertainty.)

Thus the restriction on unsequenced operations should apply only to code that may not run concurrently. For code that may run concurrently (threads and signal handlers) we need the happens-before-based notion of data races that reflects memory_order specifications.

Expand the discussion of data races to cover signal handler invocations we want to prohibit

Change the normative part of 1.10p21 [intro.multithread] as follows:

Two actions are potentially concurrent if

they are performed by different threads, or
they are unsequenced, and at least one is performed by a signal handler.

The execution of a program contains a data race if it contains two potentially concurrent conflicting actions ~~in different threads~~, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
Two accesses to the same object of type volatile sig_atomic_t do not result in a data race if both occur in the same thread, even if one or more occurs in a signal handler. For each signal handler invocation, evaluations performed by the thread invoking a signal handler can be divided into two groups A and B, such that no evaluations in B happen before evaluations in A, and the evaluations of such volatile sig_atomic_t objects take values as though all evaluations in A happened before the execution of the signal handler and the execution of the signal handler happened before all evaluations in B.

Discussion:

By the above reasoning, we need to give signal handlers the same data-race-based treatment as threads. Memory_order specifications must be respected in determining whether there is undefined behavior.

There was some discussion during the February 2013 phone call as to whether we should view signal handlers as being performed by a specific thread at all, and I think we were moving towards removing that notion. A signal handler probably cannot portably tell which thread it's running on. But after thinking about this more, I don't know how to reconcile this change with atomic_signal_fence, so I am once again inclined to leave things more like they are.

These changes should now have the effect of allowing full atomics to be used in communicating with a signal handler. I can now allocate an object, assign it to an atomic pointer variable, and have a signal handler access the non-atomic objects through that variable, just as another thread could. Since signal handlers obey strictly more scheduling constraints than threads, I think this is entirely expected, and what we had in mind all the time.
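The publication pattern described above might look as follows. This is only a sketch: Message, published, read_message and publish_and_signal are hypothetical names, std::atomic<Message*> is assumed to be lock-free, and raise is used so that the outcome is deterministic.

```cpp
#include <atomic>
#include <csignal>

struct Message { int code; int detail; };   // hypothetical payload type

std::atomic<Message*> published(nullptr);   // assumed lock-free
volatile std::sig_atomic_t observed_code = -1;

// Handler: reaches the non-atomic Message only through the atomic pointer.
extern "C" void read_message(int) {
    Message* m = std::atomic_load_explicit(&published, std::memory_order_acquire);
    if (m)
        observed_code = m->code;
}

// Hypothetical driver: allocate, initialize, publish, then signal.
int publish_and_signal() {
    Message* msg = new Message;
    msg->code = 7;                                     // ordinary writes made
    msg->detail = 42;                                  // before publication
    std::signal(SIGINT, read_message);
    published.store(msg, std::memory_order_release);   // publish the object
    std::raise(SIGINT);
    std::signal(SIGINT, SIG_DFL);                      // no further handler runs
    published.store(nullptr, std::memory_order_release);
    delete msg;
    return observed_code;
}
```

The release store and acquire load play the same role they would play between two threads. The handler never sees a pointer to a partially initialized Message.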

Objects of type volatile sig_atomic_t seem to require special treatment. Unlike relaxed atomics, if two such assignments occur in mainline code, a signal handler cannot see the second without seeing the first. Especially in single-threaded code, it's important that the signal handler appear to execute at one point, during which the mainline thread makes no progress. Handlers in single-threaded code are not prepared to see volatile sig_atomic_t objects change asynchronously while the handler is running (unless they're changed by another signal handler). Conversely, if a handler executes x = y; x++; mainline code should not see the intermediate assignment to x.
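The ordering guarantee for volatile sig_atomic_t objects can be illustrated with a small sketch. All names here are hypothetical. The handler checks whether it ever observes the second mainline assignment without the first. The signals are delivered with raise, once between the two assignments and once after both, and the handler is reinstalled before each raise because some implementations reset the disposition on delivery.

```cpp
#include <csignal>

volatile std::sig_atomic_t first_done = 0;
volatile std::sig_atomic_t second_done = 0;
volatile std::sig_atomic_t inconsistent = 0;  // set if second is seen without first

// Handler: records a violation of the "second implies first" property.
extern "C" void check_order(int) {
    if (second_done && !first_done)
        inconsistent = 1;
}

// Hypothetical driver: returns true if the handler ever saw an inconsistent state.
bool handler_saw_inconsistent_state() {
    first_done = 0;
    second_done = 0;
    inconsistent = 0;

    std::signal(SIGINT, check_order);
    first_done = 1;
    std::raise(SIGINT);                  // handler runs between the assignments

    std::signal(SIGINT, check_order);    // reinstall in case it was reset
    second_done = 1;
    std::raise(SIGINT);                  // handler runs after both assignments

    std::signal(SIGINT, SIG_DFL);
    return inconsistent != 0;
}
```

Under the proposed A/B grouping, the same property is intended to hold when the signal arrives asynchronously in the thread that performs the assignments.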

Note that in this formulation volatile sig_atomic_t objects are not immune from races, e.g. if one access is from a handler and the other is from another thread. Nonetheless a volatile sig_atomic_t evaluation in a signal handler may properly receive its value from an assignment in another thread, due to synchronization.

Ensure that signal handler invocation happens after signal handler installation

Insert in 18.10 [support.runtime] after p8:

A call to the function signal synchronizes with any resulting invocation of the signal handler so installed.

Discussion:

This is necessary to allow signal handlers to access data that is read-only after installation of the handler. I expect this happens all the time already.
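A minimal sketch of the pattern this is meant to bless: a table is filled in before the handler is installed and never modified afterwards. The names exit_codes, chosen, pick_code and install_then_raise are hypothetical. The signal is delivered with raise here for determinism; the proposed synchronizes-with rule is what would cover the asynchronous case.

```cpp
#include <csignal>

int exit_codes[3];                          // ordinary data, read-only after installation
volatile std::sig_atomic_t chosen = 0;

// Handler: only reads the table that was filled in before installation.
extern "C" void pick_code(int) {
    chosen = exit_codes[1];
}

// Hypothetical driver: initialize, install, then deliver a signal.
int install_then_raise() {
    exit_codes[0] = 10;
    exit_codes[1] = 20;
    exit_codes[2] = 30;
    std::signal(SIGINT, pick_code);         // writes above happen before any invocation
    std::raise(SIGINT);
    std::signal(SIGINT, SIG_DFL);
    return chosen;
}
```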

Note that 29.8p6 already talks about synchronizes-with relationships between a thread and a signal handler in the same thread, so I don't think this is a very fundamental change in perspective.

Clarify which C-like functions can be used in a signal handler

Change 18.10 [support.runtime] p9 as follows:

The common subset of the C and C++ languages consists of all declarations, definitions, and expressions that may appear in a well formed C++ program and also in a conforming C program. A POF ("plain old function") is a function that uses only features from this common subset, and that does not directly or indirectly use any function that is not a POF, except that it may use ~~functions defined in Clause 29 that are not member functions~~ plain lock-free atomic operations. A plain lock-free atomic operation is an invocation of a function f from clause 29, such that f is not a member function, and either f is the function atomic_is_lock_free, or for every atomic argument A passed to f, atomic_is_lock_free(A) yields true. All signal handlers shall have C linkage. ~~A POF that could be used as a signal handler in a conforming C program does not produce undefined behavior when used as a signal handler in a C++ program.~~ The behavior of any ~~other~~ function other than a POF used as a signal handler in a C++ program is implementation-defined.

(Keep footnote 229.)

Discussion:

Since we currently refer to C99 as the base document and C99 does not support thread_local, this somewhat accidentally prohibits use of thread_local in signal handlers. Discussion in Bristol suggests this may be a good thing, since thread_local might be implemented with, e.g., a locked hash table, which would result in deadlocks if access from signal handlers were allowed.

Some of the earlier phone call discussion seems to have overlooked the existing clause 29 exemption which, for example, makes calls to atomic_is_lock_free() legal.

That exemption was too broad, since it allowed non-lock-free calls. All calls that acquire locks need to be prohibited in signal handlers, since they typically deadlock if the mainline thread already holds the lock.
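One way a program might stay within the narrowed exemption is to check lock-freedom before installing a handler that uses an atomic. This is a sketch under that assumption: pending, count_signal, install_counter_handler and raise_twice_and_count are hypothetical names. The handler calls only a non-member clause 29 function.

```cpp
#include <atomic>
#include <csignal>

std::atomic<int> pending(0);   // counter shared with the handler

// Handler: a single non-member atomic operation on a lock-free object.
extern "C" void count_signal(int) {
    std::atomic_fetch_add_explicit(&pending, 1, std::memory_order_relaxed);
}

// Hypothetical helper: refuses to install the handler if the counter
// is not lock-free, since the handler could then block on a lock.
bool install_counter_handler() {
    if (!std::atomic_is_lock_free(&pending))
        return false;
    return std::signal(SIGINT, count_signal) != SIG_ERR;
}

// Hypothetical driver: returns the number of signals counted, or -1 if
// the handler could not be installed.
int raise_twice_and_count() {
    pending.store(0);
    for (int i = 0; i < 2; ++i) {
        if (!install_counter_handler())   // reinstall before each raise
            return -1;
        std::raise(SIGINT);
    }
    std::signal(SIGINT, SIG_DFL);
    return pending.load();
}
```

On implementations where std::atomic<int> is lock-free, raise_twice_and_count() should report two signals.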

I don't understand the meaning of a normative sentence that says "X does not have undefined behavior". We otherwise define its meaning, so why would it possibly have undefined behavior without this sentence? Hence I'm proposing to rephrase.

Jens points out that it's confusing to have this text in the library rather than core language section of the standard. Either the committee or the editor may wish to address that in the future.

(This paragraph removes the need for a 1.10p5 change I previously proposed.)