Document Number:	N2959=09-0149
Date:	2009-09-21
Author:	Anthony Williams Just Software Solutions Ltd

N2959 - Managing the lifetime of thread_local variables with contexts (Revision 1)

This paper discusses a suggestion I made on the LWG reflector and cpp-thread mailing list to address the issues raised in N2880 surrounding the lifetime of thread_local variables.

The basic idea of this proposal is that the lifetime of thread_local variables is tied to the lifetime of an instance of the new class thread_local_context. Each thread has an implicit instance of such a class constructed prior to the invocation of the thread function, and destroyed after completion of the thread function, but additional instances can be created in order to deliberately limit the lifetime of thread_local variables: when a thread_local_context object is destroyed, all the thread_local variables tied to it are also destroyed.

This is a revision of N2907 to take account of the discussions that took place in Frankfurt. The key change is the lack of support for nested contexts.

Addressing the concerns of N2880

This enables us to address several of the concerns of N2880. Firstly, if we use a mechanism other than thread::join to wait for a thread to complete its work (such as waiting for a unique_future to be ready), then N2880 correctly highlights that under the current working paper the destructors of thread_local variables may still be running after the waiting thread has resumed. By judicious use of a thread_local_context instance and block scoping, we can ensure that the thread_local variables are destroyed before the future value is set. For example:

int find_the_answer();
void thread_func(std::promise<int> * p)
{
    int local_result;
    {
        thread_local_context context; // create a new context for thread_locals
        local_result=find_the_answer();
    } // destroy thread_local variables along with the context object
    p->set_value(local_result);
}

int main()
{
    std::promise<int> p;
    std::thread t(thread_func,&p);
    t.detach(); // we're going to wait on the future
    std::cout<<p.get_future().get()<<std::endl;
}

When the call to get() returns, we know that not only is the future value ready, but the thread_local variables on the other thread have also been destroyed.

Reusing threads

A second concern of N2880 was the potential for accumulating large numbers of thread_local variables when reusing threads for multiple independent tasks, such as when implementing a thread pool. Under such circumstances, the thread pool implementation can wrap each task inside a scope containing a thread_local_context variable to ensure that when a task is completed its thread_local variables are destroyed in a timely fashion. For example:

std::mutex task_mutex;
std::queue<std::function<void()>> tasks;
std::condition_variable task_cond;
bool done=false;

void worker_thread()
{
    std::unique_lock<std::mutex> lk(task_mutex);
    while(!done)
    {
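        task_cond.wait(lk,[]{return done || !tasks.empty();}); // also wake when shutting down
        if(tasks.empty())
            break; // done was set and no work remains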
        std::function<void()> task=tasks.front();
        tasks.pop(); // std::queue provides pop(), not pop_front()
        lk.unlock();
        {
            thread_local_context context;
            task();
        }
        lk.lock();
    }
}

With this scheme, the thread_local variables are destroyed between each task invocation when the thread_local_context object is destroyed, so if the sets of variables used by the tasks do not overlap then the problem of increasing memory usage is avoided.

Consequences for implementations

Obviously, such a class would have to be tightly integrated with the mechanism for thread_local variables used by a compiler, so that they can be destroyed at the appropriate points, and constructed again if necessary. This is a key point: for the second scenario to work, if a thread_local_context is destroyed and a fresh one constructed, then any thread_local variables used during the lifetime of the new context object must be created afresh, even if they were already created and destroyed during the lifetime of a prior context object on the same thread.

This does mean that implementations are effectively restricted to initializing thread_local variables on first use, with a mechanism that allows the destructor of thread_local_context objects to reset that "first use" flag. If the thread_local_context is implemented with compiler intrinsics then the compiler may still be able to find optimization opportunities that allow batching of initializations or less-frequent checking of the "first use" flag.
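
The "first use" mechanism can be approximated in library code. The following is a minimal sketch under stated assumptions, not a description of how a compiler would do it: emulated_context, lazy_slot and associate are hypothetical names invented for this illustration, and lazy_slot stands in for a single compiler-managed thread_local variable whose "first use" flag is modelled by a null pointer.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for thread_local_context (illustration only).
class emulated_context {
public:
    emulated_context() { current() = this; }
    emulated_context(emulated_context const&) = delete;
    emulated_context& operator=(emulated_context const&) = delete;

    ~emulated_context() {
        current() = nullptr;
        // Destroy associated slots in reverse order of initialization.
        for (auto it = resets_.rbegin(); it != resets_.rend(); ++it)
            (*it)();
    }

    // The context active on the calling thread, or nullptr if there is none.
    static emulated_context*& current() {
        thread_local emulated_context* ctx = nullptr;
        return ctx;
    }

    // Remember how to destroy a slot that was first used under this context.
    void associate(std::function<void()> reset) {
        resets_.push_back(std::move(reset));
    }

private:
    std::vector<std::function<void()>> resets_;
};

// Hypothetical stand-in for one compiler-managed thread_local variable.
template <typename T>
class lazy_slot {
public:
    template <typename Init>
    T& get(Init init) {
        if (!value_) {  // a null pointer plays the role of the "first use" flag
            value_.reset(new T(init()));
            if (emulated_context* ctx = emulated_context::current())
                ctx->associate([this] { value_.reset(); });  // destroys and resets the flag
        }
        return *value_;
    }

private:
    std::unique_ptr<T> value_;
};

int foo() {
    thread_local lazy_slot<int> x;
    return ++x.get([] { return 42; });
}

std::vector<int> bar() {
    emulated_context ctx;
    std::vector<int> seen;
    for (unsigned i = 0; i < 3; ++i)
        seen.push_back(foo());
    return seen;
}

void demo() {
    assert((bar() == std::vector<int>{43, 44, 45}));
    assert((bar() == std::vector<int>{43, 44, 45}));  // initialized afresh
}
```

Each call to bar() sees the slot initialized afresh because the destructor of the emulated context clears the flag. A compiler-integrated version could avoid both the per-access check and the heap allocation used here.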

C compatibility

For this mechanism to be compatible with the use of objects with thread storage duration from C, the C compiler must register the existence of such objects in a way that can be accessed by thread_local_context objects in order that they can be restored to their initial state.

Nesting of `thread_local_context` object lifetimes

As mentioned in the introduction, constructing a thread_local_context object whilst one already exists for a given thread is not permitted. This should result in an exception at run-time when the attempt is made to construct the second object.

Notifying other threads after `thread_local` variables have been destroyed

One of the key issues raised by N2880 is how to ensure that thread_local variables have been destroyed in a timely fashion for detached threads. If the completion of the work on a thread can be detected through another mechanism, such as a future or a flag and condition variable, then it is common practice to detach the thread and rely on the other synchronization mechanism as the sole means of waiting for the thread to finish.

thread_local variables with destructors interact badly with such practice, as those destructors run after the synchronization mechanism has notified any waiting threads of the completion of the task associated with the thread. The thread is thus still executing code even though other threads are proceeding as if it has completed. Where the task associated with a thread can be wrapped in a thread_local_context, this can be used as a mechanism to ensure that the synchronization is not triggered until after the thread_local variables have been destroyed. Unfortunately, this is not possible in all circumstances.

For example, if we replace int with some more complex type in the example at the beginning of this paper, then local_result will be destroyed after the call to set_value() has completed, and thus after any waiting threads have been woken.

complex_type find_the_answer();
void thread_func(std::promise<complex_type> * p)
{
    complex_type local_result;
    {
        thread_local_context context; // create a new context for thread_locals
        local_result=find_the_answer();
    } // destroy thread_local variables along with the context object
    p->set_value(local_result); // wake waiting threads
} // destroy local_result

To this end I propose to add new overloads of promise::set_value() and promise::set_exception() which take a thread_local_context object by reference. These overloads can then be used to delay the waking of waiting threads until the context is destroyed:

complex_type find_the_answer();
void thread_func(std::promise<complex_type> * p)
{
    thread_local_context context; // create a new context for thread_locals
    p->set_value(context,find_the_answer()); // set value, but delay waking waiting threads
} // destroy thread_local variables along with the context object
// wake threads waiting on futures associated with p.

To the same end, I also propose adding a new member function execute() to std::packaged_task with the same properties: the task is executed and the value or exception stored, but the associated future is not made ready until the context is destroyed.

void task_executor(std::packaged_task<void(int)> task,int param)
{
    thread_local_context context;
    task.execute(context,param); // execute stored task
} // destroy context and wake threads waiting on futures from task

Finally, to allow this facility to be extended to other synchronization mechanisms, I propose that thread_local_context has a member function call_on_close which registers a function to be called when the thread_local variables associated with that context have been destroyed. It is undefined behaviour for this function to access thread_local variables.

std::condition_variable cv;
std::mutex m;
complex_type the_data;
void thread_func()
{
    thread_local_context context;
    std::lock_guard<std::mutex> lk(m);
    the_data=find_the_answer();
    context.call_on_close([]{cv.notify_all();});
} // destroy context, notify cv

Interaction with the proposed `std::async` function

If this proposal is adopted, then it could be used as part of an implementation of std::async (as proposed in N2889 and N2901) to ensure that the associated future did not become ready before the thread-local variables for the asynchronous task had been destroyed.

Proposed Wording

Modification to lifetime management clauses

Modify 3.6.3 [basic.start.term] paragraph 1 as follows:

Destructors (12.4) for initialized objects with static storage duration are called as a result of returning from main and as a result of calling std::exit (18.5). Destructors for initialized objects with thread storage duration within a given thread are called as a result of returning from the initial function of that thread, as part of the destruction of a thread_local_context object, and as a result of that thread calling std::exit. ... (rest unchanged)

Modify 3.6.3 [basic.start.term] paragraph 2 as follows:

If a function contains a local object of thread storage duration that has been destroyed as part of the destruction of a thread_local_context object, and the flow of control passes through the definition of the previously destroyed object, then the object shall be initialized as if this is its first use. Otherwise, if a function contains a local object of static or thread storage duration that has been destroyed and the function is called during the destruction of an object with static or thread storage duration, the program has undefined behavior if the flow of control passes through the definition of the previously destroyed local object. Likewise, the behavior is undefined if the function-local object is used indirectly (i.e., through a pointer) after its destruction.

Modify 3.7.2 [basic.stc.thread] paragraph 2 as follows:

An object or reference with thread storage duration shall be initialized before its first use and, if constructed, shall be destroyed on thread exit. If a thread_local_context object exists for a given thread at the first use of an object of thread storage duration in that thread then that object shall become associated with the thread_local_context object, and destroyed as part of its destruction (30.3.3.2). The first use of an object of thread storage duration on a given thread following destruction of that object as part of the destruction of a thread_local_context object shall be treated as if it was the first use of that object by that thread.

Definition of `std::thread_local_context`

Add the following declaration to the synopsis of chapter 30.3:

class thread_local_context;

Add a new section to 30.3 as follows:

30.3.3 class `thread_local_context`

namespace std {
class thread_local_context {
public:
    thread_local_context();
    thread_local_context(thread_local_context const&) = delete;
    thread_local_context& operator=(thread_local_context const&) = delete;

    template<typename FunctionType>
    void call_on_close(FunctionType func);
};
}

The class thread_local_context provides a means of managing the lifetime of objects with thread storage duration (3.7.2). The construction of an instance of thread_local_context on a given thread marks the start of a new context for objects of thread storage duration. This context persists until the thread exits or the thread_local_context object is destroyed. When the context is destroyed then all objects of thread storage duration initialized on that thread during the life of the context are destroyed in reverse order of their initialization (6.7).

For an object of thread storage duration that was destroyed as part of the destruction of a thread_local_context object, the first use following the destruction is treated as the first use of that object, and the object is initialized again.

[Example:

int foo()
{
    static thread_local int x=42;
    return ++x;
}

void bar()
{
    thread_local_context ctx;
    for(unsigned i=0;i<3;++i)
    {
        std::cout<<foo()<<std::endl;
    }
}

int main()
{
    bar(); // will output 43 44 45
    bar(); // will also output 43 44 45
}

— end example]

Only one thread_local_context object may exist on a given thread at any one time. Any attempt to create a second such object will fail.

[Example:

void inner()
{
    thread_local_context ctx;
}

void outer()
{
    thread_local_context ctx;
    inner();
}

int main()
{
    inner(); // OK
    outer(); // construction of thread_local_context in inner() will fail
}

— end example]

30.3.3.1 thread_local_context constructor

thread_local_context();

Effects: Create a new context for thread_local variables.
Throws: std::system_error if an error occurs.
Error Conditions: operation_not_permitted: There is already a thread_local_context object for this thread.
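
The error reporting described for the constructor could be modelled as follows. This is only a sketch: single_context is a hypothetical stand-in that tracks nothing but whether a context already exists on the calling thread, and the use of the generic error category via std::make_error_code is an assumption, since the wording above names only the error condition.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical stand-in for thread_local_context (illustration only).
class single_context {
public:
    single_context() {
        if (active())
            throw std::system_error(
                std::make_error_code(std::errc::operation_not_permitted),
                "a context already exists on this thread");
        active() = true;
    }
    single_context(single_context const&) = delete;
    single_context& operator=(single_context const&) = delete;
    ~single_context() { active() = false; }

private:
    // One flag per thread: true while a context object is alive.
    static bool& active() {
        thread_local bool flag = false;
        return flag;
    }
};

// Returns true if constructing a second context reports
// operation_not_permitted while the first one is still alive.
bool nested_construction_fails() {
    single_context outer;
    try {
        single_context inner;
    } catch (std::system_error const& e) {
        return e.code() == std::errc::operation_not_permitted;
    }
    return false;
}

// Returns true if two contexts with disjoint lifetimes can both be constructed.
bool sequential_construction_succeeds() {
    try {
        { single_context first; }
        { single_context second; }
    } catch (std::system_error const&) {
        return false;
    }
    return true;
}
```

Because the failed constructor throws before setting the flag, the destructor of the inner object never runs, and the outer context remains the active one.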

30.3.3.2 thread_local_context destructor

~thread_local_context();

Effects: Destroys the context for thread_local variables. All objects with thread storage duration (3.7.2) constructed on this thread after the construction of the thread_local_context object are destroyed in reverse order of construction (see 3.6.3), and restored to their initial state. Once all such objects have been destroyed, any functions registered with the context by calling call_on_close() are invoked in reverse order.
Throws: Nothing.

30.3.3.3 thread_local_context members

template<typename FunctionType>
void call_on_close(FunctionType func);

Effects: Register a copy of func to be called when *this is destroyed.
Throws: std::bad_alloc if any required storage cannot be allocated. Any exceptions thrown by the copy constructor of func.
Requirements: Invocation of the stored copy of func shall not exit via an exception, nor shall it access any objects of thread storage duration.
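
The bookkeeping behind call_on_close() might look something like the sketch below. closing_context is a hypothetical stand-in that models only the registration and invocation of callbacks, not the destruction of thread_local variables, and it assumes that "reverse order" means reverse order of registration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for thread_local_context (illustration only).
class closing_context {
public:
    closing_context() = default;
    closing_context(closing_context const&) = delete;
    closing_context& operator=(closing_context const&) = delete;

    template <typename FunctionType>
    void call_on_close(FunctionType func) {
        // Stores a copy of func; the allocation may throw std::bad_alloc.
        callbacks_.push_back(std::function<void()>(std::move(func)));
    }

    ~closing_context() {
        // A real implementation would destroy the thread_local variables
        // first. The destructor is implicitly noexcept, so a callback that
        // throws terminates the program.
        for (auto it = callbacks_.rbegin(); it != callbacks_.rend(); ++it)
            (*it)();
    }

private:
    std::vector<std::function<void()>> callbacks_;
};

// Registers three callbacks and returns the order in which they ran.
std::vector<int> registration_demo() {
    std::vector<int> order;
    {
        closing_context context;
        for (int id = 1; id <= 3; ++id)
            context.call_on_close([&order, id] { order.push_back(id); });
    }
    return order;
}

void demo() {
    assert((registration_demo() == std::vector<int>{3, 2, 1}));
}
```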

Modifications to `std::promise` and `std::packaged_task`

Add the following to the class definition of std::promise in section 30.6.4 [futures.promise]:

void set_value(thread_local_context & context,const R& r);
void set_value(thread_local_context & context,see below);
void set_exception(thread_local_context & context,exception_ptr p);

Add the following to the end of section 30.6.4 [futures.promise]:

void set_value(thread_local_context & context,const R& r);
void promise::set_value(thread_local_context & context,R&& r);
void promise<R&>::set_value(thread_local_context & context,R& r);
void promise<void>::set_value(thread_local_context & context);

Effects: Stores r in the associated state. Updates context to set that state to ready when context is destroyed, as if by registering an appropriate function with context.call_on_close().
Throws: future_error if its associated state already has a stored value or exception.
Error conditions: promise_already_satisfied if its associated state already has a stored value or exception.

void set_exception(thread_local_context & context,exception_ptr p);

Effects: Stores p in the associated state. Updates context to set that state to ready when context is destroyed, as if by registering an appropriate function with context.call_on_close().
Throws: future_error if its associated state already has a stored value or exception.
Error conditions: promise_already_satisfied if its associated state already has a stored value or exception.
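
The deferred readiness of these overloads can be approximated with today's std::promise. The sketch below uses hypothetical names (deferring_context, set_value_on_close, readiness) and differs from the proposed wording in one important way: the value is held inside the registered callback rather than in the associated state, so an already-satisfied promise is detected only when the context is destroyed. The promise must also outlive the context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for thread_local_context (callbacks only).
class deferring_context {
public:
    deferring_context() = default;
    deferring_context(deferring_context const&) = delete;
    deferring_context& operator=(deferring_context const&) = delete;

    template <typename FunctionType>
    void call_on_close(FunctionType func) {
        callbacks_.push_back(std::function<void()>(std::move(func)));
    }

    ~deferring_context() {
        for (auto it = callbacks_.rbegin(); it != callbacks_.rend(); ++it)
            (*it)();
    }

private:
    std::vector<std::function<void()>> callbacks_;
};

// Hypothetical free-function analogue of promise::set_value(context, r).
template <typename R>
void set_value_on_close(deferring_context& context, std::promise<R>& p, R value) {
    context.call_on_close(
        [&p, value]() mutable { p.set_value(std::move(value)); });
}

struct readiness {
    bool before_close;  // was the future ready while the context was alive?
    bool after_close;   // was it ready once the context had been destroyed?
    int value;
};

readiness deferred_demo() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    readiness r{};
    {
        deferring_context context;
        set_value_on_close(context, p, 42);
        r.before_close =
            f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }  // context destroyed here: the callback makes the future ready
    r.after_close =
        f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    r.value = f.get();
    return r;
}
```

In this sketch the future is not ready while the context is alive and becomes ready as soon as the context goes out of scope, which is the ordering the proposed overloads are meant to guarantee.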

Add the following member function to the class definition for std::packaged_task in 30.6.7 [futures.task]:

void execute(thread_local_context const& context,ArgTypes...);

Add the following to 30.6.7 [futures.task] following paragraph 17:

void execute(thread_local_context const& context,ArgTypes... args);

Effects: INVOKE (f, t1, t2, ..., tN, R), where f is the associated task of *this and t1, t2, ..., tN are the values in args.... If the task returns normally, the return value is stored as the asynchronous result associated with *this, otherwise the exception thrown by the task is stored. context is updated to ensure that any threads blocked waiting for the asynchronous result associated with the task are unblocked when context is destroyed, as if by passing an appropriate function to context.call_on_close().
Throws: std::bad_function_call if the task has already been invoked.

Acknowledgements

Thanks to Alberto Ganesh Barbati, Peter Dimov, Lawrence Crowl, Beman Dawes and others who have commented on earlier versions of this proposal on the mailing lists and via personal email.