Document Number: | N3038=10-0028 |
Date: | 2010-01-30 |
Author: | Anthony Williams, Just Software Solutions Ltd |
This is a revision of my earlier paper N2959 based on feedback from the LWG at the Santa Cruz WG21 meeting in October 2009. Following this meeting I was asked to produce a revised paper allowing for nested contexts.
The basic idea of this proposal is that the lifetime of thread_local variables is tied to the lifetime of an instance of the new class thread_local_context. Each thread has an implicit instance of such a class constructed prior to the invocation of the thread function, and destroyed after completion of the thread function, but additional instances can be created in order to deliberately limit the lifetime of thread_local variables: when a thread_local_context object is destroyed, all the thread_local variables tied to it are also destroyed.
This enables us to address several of the concerns of N2880. Firstly, if we use a mechanism other than thread::join to wait for a thread to complete its work, such as waiting for a unique_future to be ready, then N2880 correctly highlights that under the current working paper the destructors of thread_local variables may still be running after the waiting thread has resumed. By judicious use of a thread_local_context instance and block scoping, we can ensure that the thread_local variables are destroyed before the future value is set. For example:
#include <future>
#include <iostream>
#include <thread>

int find_the_answer();

void thread_func(std::promise<int> * p)
{
    int local_result;
    {
        thread_local_context context; // create a new context for thread_locals
        local_result=find_the_answer();
    } // destroy thread_local variables along with the context object
    p->set_value(local_result);
}

int main()
{
    std::promise<int> p;
    std::thread t(thread_func,&p);
    t.detach(); // we're going to wait on the future
    std::cout<<p.get_future().get()<<std::endl;
}
When the call to get() returns, we know that not only is the future value ready, but the thread_local variables on the other thread have also been destroyed.
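For contrast, the behaviour that N2880 objects to can be observed with standard C++ alone. The sketch below is an illustration rather than part of the proposal: tracer, event_log and run_without_context are hypothetical names introduced here. It records the order of events on a worker thread that does not use a context, and suggests that the destructor of a thread_local object runs only after the promise has been satisfied.

```cpp
#include <future>
#include <string>
#include <thread>
#include <vector>

// Events recorded by the worker thread; the caller reads them only after join().
std::vector<std::string> event_log;

// Hypothetical type whose destructor records when it runs.
struct tracer {
    int value = 42;
    ~tracer() { event_log.push_back("thread_local destroyed"); }
};

int find_the_answer() {
    thread_local tracer t;  // constructed on first use on this thread
    return t.value;
}

void thread_func(std::promise<int>* p) {
    int local_result = find_the_answer();
    p->set_value(local_result);  // waiting threads may resume from here on
    event_log.push_back("value set");
}   // thread exit: only now is the thread_local tracer destroyed

std::vector<std::string> run_without_context() {
    event_log.clear();
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t(thread_func, &p);
    int answer = f.get();  // may return before the tracer has been destroyed
    t.join();              // join, not get, is what guarantees destruction here
    event_log.push_back("answer=" + std::to_string(answer));
    return event_log;
}
```

Calling run_without_context() returns the recorded events in order. The sketch joins the thread only so that the log can be read safely; a detached thread would give the caller no such guarantee.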
A second concern of N2880 was the potential for accumulating vast numbers of thread_local variables when reusing threads for multiple independent tasks, such as when implementing a thread pool. Under such circumstances, the thread pool implementation can wrap each task inside a scope containing a thread_local_context variable to ensure that when a task is completed its thread_local variables are destroyed in a timely fashion. For example:
std::mutex task_mutex;
std::queue<std::function<void()>> tasks;
std::condition_variable task_cond;
bool done=false;

void worker_thread()
{
    std::unique_lock<std::mutex> lk(task_mutex);
    while(!done)
    {
        task_cond.wait(lk,[]{return !tasks.empty();});
        std::function<void()> task=tasks.front();
        tasks.pop(); // std::queue provides pop(), not pop_front()
        lk.unlock();
        {
            thread_local_context context; // fresh context for this task
            task();
        } // this task's thread_local variables are destroyed here
        lk.lock();
    }
}
With this scheme, the thread_local variables are destroyed between each task invocation when the thread_local_context object is destroyed, so if the sets of variables used by the tasks do not overlap then the problem of increasing memory usage is avoided.
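The per-task reset can be approximated in portable C++ today, which may help in seeing what the worker loop relies on. This is only a sketch under stated assumptions: task_locals, task_scope, current_locals and run_tasks are hypothetical names, the per-task state is an explicit object rather than real thread_local variables, and the done || !tasks.empty() predicate is an addition so that the sketch can shut down cleanly.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the thread_local variables a task may use.
struct task_locals { int uses = 0; };

// Points at the locals of the task currently running on this thread.
thread_local task_locals* current_locals = nullptr;

// Hypothetical RAII scope: fresh locals on entry, destroyed on exit.
class task_scope {
    task_locals locals_;
public:
    task_scope() { current_locals = &locals_; }
    ~task_scope() { current_locals = nullptr; }
    task_scope(task_scope const&) = delete;
    task_scope& operator=(task_scope const&) = delete;
};

std::mutex task_mutex;
std::queue<std::function<void()>> tasks;
std::condition_variable task_cond;
bool done = false;

void worker_thread() {
    std::unique_lock<std::mutex> lk(task_mutex);
    for (;;) {
        // Wake for new work or for shutdown.
        task_cond.wait(lk, [] { return done || !tasks.empty(); });
        if (tasks.empty()) return;  // shutdown requested and nothing left to run
        std::function<void()> task = std::move(tasks.front());
        tasks.pop();
        lk.unlock();
        {
            task_scope scope;  // per-task state starts afresh here
            task();
        }                      // and is destroyed here, before the next task
        lk.lock();
    }
}

// Runs n tasks on one worker; each task records the use count it observed.
std::vector<int> run_tasks(int n) {
    std::vector<int> seen;  // written by the worker only, read after join()
    std::thread worker(worker_thread);
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lk(task_mutex);
        tasks.push([&seen] { seen.push_back(++current_locals->uses); });
    }
    {
        std::lock_guard<std::mutex> lk(task_mutex);
        done = true;
    }
    task_cond.notify_one();
    worker.join();
    return seen;
}
```

Each task sees a use count that starts from zero, because the state belonging to the previous task has already been destroyed. With real thread_local variables that reset is exactly what needs the compiler support discussed in this paper.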
Obviously, such a class would have to be tightly integrated with the mechanism for thread_local variables used by a compiler, so that they can be destroyed at the appropriate points, and constructed again if necessary. This is a key point: for the second scenario to work, when a thread_local_context is destroyed and a fresh one constructed, any thread_local variables used during the lifetime of the new context object must be created afresh, even if they were already created and destroyed during the lifetime of a prior context object on the same thread.
This does mean that implementations are effectively restricted to initializing thread_local variables on first use, with a mechanism that allows the destructor of thread_local_context objects to reset that "first use" flag. If the thread_local_context is implemented with compiler intrinsics then the compiler may still be able to find optimization opportunities that allow batching of initializations or less-frequent checking of the "first use" flag.
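The "first use" flag and its reset can be modelled in library code, which may make the implementation constraint more concrete. The following is a minimal sketch, not a description of any compiler's mechanism: lazy_slot, x_slot and the explicit reset() call are hypothetical, and a std::unique_ptr plays the role of both the storage and the flag.

```cpp
#include <memory>
#include <vector>

// Hypothetical per-variable slot: storage plus a "first use" flag.
// A null pointer means "not yet initialized in the current context".
template <typename T>
class lazy_slot {
    std::unique_ptr<T> value_;
    T initial_;
public:
    explicit lazy_slot(T initial) : initial_(initial) {}

    // Initialize on first use, then hand out the live object.
    T& get() {
        if (!value_) value_.reset(new T(initial_));
        return *value_;
    }

    // What a context destructor would do: destroy the object and clear the flag.
    void reset() { value_.reset(); }

    bool initialized() const { return static_cast<bool>(value_); }
};

// Stands in for: static thread_local int x=42;
thread_local lazy_slot<int> x_slot(42);

int foo() { return ++x_slot.get(); }

std::vector<int> bar() {
    std::vector<int> out;
    for (unsigned i = 0; i < 3; ++i) out.push_back(foo());
    x_slot.reset();  // emulates destruction of a context at the end of bar()
    return out;
}
```

Two successive calls to bar() produce the same sequence, because the second call finds the flag cleared and initializes the value again. A real implementation would have to do this for every thread_local variable touched during the context, without the explicit reset() call.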
For this mechanism to be compatible with the use of objects with thread storage duration from C, the C compiler must register the existence of such objects in a way that can be accessed by thread_local_context objects in order that they can be restored to their initial state.
thread_local_context object lifetimes
As mentioned in the introduction, constructing a thread_local_context object whilst one already exists for a given thread is now permitted. This ensures that it is always safe to create a new thread_local_context object inside a library function without having to place restrictions on the use of thread_local_context objects by the calling code.
When a nested instance of thread_local_context is created then the existing thread_local variables are untouched. However, they are no longer accessible unless pointers or references have been stored elsewhere, because the names of thread_local variables now refer to the objects within the new context, which are thus freshly initialized on first use within the lifetime of the nested context. Pointers or references to the existing thread_local variables remain valid, and continue to refer to the objects from the outer context.
A nested context must be destroyed before its parent context. Destroying a context whilst nested contexts are still alive yields undefined behaviour.
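The lookup rule for nested contexts can be modelled with a per-thread stack of frames. This is a sketch under stated assumptions: context_frame and its static s() accessor are hypothetical, s() stands in for naming a thread_local std::string initialized to "hello", and the sketch has no implicit per-thread context, so s() may only be used while a frame exists.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical emulation: each frame owns its own lazily created copy of the
// variable, and names always resolve to the innermost frame on this thread.
class context_frame {
    std::unique_ptr<std::string> s_;
    context_frame* parent_;

    static context_frame*& innermost() {
        thread_local context_frame* top = nullptr;
        return top;
    }
public:
    context_frame() : parent_(innermost()) { innermost() = this; }
    ~context_frame() {
        assert(innermost() == this);  // frames must be destroyed innermost-first
        innermost() = parent_;
    }
    context_frame(context_frame const&) = delete;
    context_frame& operator=(context_frame const&) = delete;

    // Stand-in for naming the thread_local variable s.
    static std::string& s() {
        context_frame* f = innermost();
        assert(f != nullptr);
        if (!f->s_) f->s_.reset(new std::string("hello"));  // first use in this frame
        return *f->s_;
    }
};

std::vector<std::string> log_lines;

void inner(std::string* ps) {
    context_frame ctx;
    log_lines.push_back("inner s=" + context_frame::s());
    log_lines.push_back("*ps=" + *ps);
    *ps = "changed";  // writes to the outer frame's object through the pointer
    log_lines.push_back("inner s=" + context_frame::s());
}

std::vector<std::string> outer() {
    log_lines.clear();
    context_frame ctx;
    context_frame::s() = "outer";
    log_lines.push_back("outer s=" + context_frame::s());
    inner(&context_frame::s());
    log_lines.push_back("outer s=" + context_frame::s());
    return log_lines;
}
```

The assertion in the destructor corresponds to the strict nesting requirement: in this model, destroying a frame while a later frame is still alive is detected rather than being silently undefined.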
Ensuring that thread_local variables have been destroyed
One of the key issues raised by N2880 is how to ensure that thread_local variables have been destroyed in a timely fashion for detached threads. If the completion of the work on a thread can be detected through another mechanism such as a future or a flag and condition variable then it is common practice to detach the thread and rely on the other synchronization mechanism as the sole means of waiting for the thread to finish. thread_local variables with destructors interact badly with such practice, as the destructors will run after the synchronization mechanism has notified any waiting threads of the completion of the task associated with the thread. Thus the thread is continuing to execute code even though other threads are proceeding as if it has completed. Where the task associated with a thread can be wrapped in a thread_local_context, this can be used as a mechanism to ensure that the synchronization is not triggered until after the thread_local variables have been destroyed. Unfortunately, this is not possible in all circumstances.
For example, if we replace int with some more complex type in the example at the beginning of this paper then the local_result will be destroyed after the call to set_value() has completed, and thus after any waiting threads have been woken.
complex_type find_the_answer();

void thread_func(std::promise<complex_type> * p)
{
    complex_type local_result;
    {
        thread_local_context context; // create a new context for thread_locals
        local_result=find_the_answer();
    } // destroy thread_local variables along with the context object
    p->set_value(local_result); // wake waiting threads
} // destroy local_result
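The ordering problem with local_result does not depend on thread_local_context at all, so it can be checked with standard C++. In this sketch, which is illustrative only, complex_type is a hypothetical type that records its own destruction, the promise carries a std::string payload so that only one complex_type object exists, and thread_func is called directly because the order of events inside it is the same on any thread.

```cpp
#include <future>
#include <string>
#include <vector>

std::vector<std::string> order;

// Hypothetical stand-in for complex_type: records its own destruction.
struct complex_type {
    std::string payload = "answer";
    ~complex_type() { order.push_back("local_result destroyed"); }
};

void thread_func(std::promise<std::string>* p) {
    complex_type local_result;
    p->set_value(local_result.payload);  // waiting threads may wake here
    order.push_back("value set");
}   // local_result is destroyed only now

std::vector<std::string> run_once() {
    order.clear();
    std::promise<std::string> p;
    std::future<std::string> f = p.get_future();
    thread_func(&p);
    order.push_back("got " + f.get());
    return order;
}
```

The recorded order places the destruction of local_result after the value has been set, which is the gap that the overloads proposed next are intended to close.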
To this end I propose to add new overloads of promise::set_value() and promise::set_exception() which take a thread_local_context object by reference. These overloads can then be used to delay the waking of waiting threads until the context is destroyed:
complex_type find_the_answer();

void thread_func(std::promise<complex_type> * p)
{
    thread_local_context context; // create a new context for thread_locals
    p->set_value(context,find_the_answer()); // set value, but delay waking waiting threads
} // destroy thread_local variables along with the context object
  // wake threads waiting on futures associated with p.
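For comparison only, standard C++11 and later provides std::promise::set_value_at_thread_exit, which stores the value immediately but makes the shared state ready at thread exit, after the thread's objects of thread storage duration have been destroyed. It is scoped to the whole thread rather than to a context object, so it is not the overload proposed here. The sketch below assumes a conforming implementation; tracer, tls_destroyed and destroyed_before_ready are hypothetical names.

```cpp
#include <atomic>
#include <future>
#include <thread>

std::atomic<bool> tls_destroyed{false};

// Hypothetical type whose destructor flags that it has run.
struct tracer {
    ~tracer() { tls_destroyed.store(true); }
};

int find_the_answer() {
    thread_local tracer t;  // destroyed when this thread exits
    (void)t;
    return 42;
}

void thread_func(std::promise<int>* p) {
    // Store the value now; the future becomes ready only at thread exit,
    // after this thread's thread_local objects have been destroyed.
    p->set_value_at_thread_exit(find_the_answer());
}

// Reports whether the tracer had already been destroyed when get() returned.
bool destroyed_before_ready() {
    tls_destroyed.store(false);
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t(thread_func, &p);
    int answer = f.get();
    bool seen = tls_destroyed.load() && answer == 42;
    t.join();
    return seen;
}
```

The difference in granularity matters for the thread pool case: a pooled thread does not exit between tasks, so a facility tied to thread exit cannot release waiters per task, whereas one tied to a context object could.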
To the same end, I also propose adding a new member function execute() to std::packaged_task with the same properties: the task is executed and the value or exception stored, but the associated future is not made ready until the context is destroyed.
void task_executor(std::packaged_task<void(int)> task,int param)
{
    thread_local_context context;
    task.execute(context,param); // execute stored task
} // destroy context and wake threads waiting on futures from task
Finally, to allow this facility to be extended to other synchronization mechanisms, I propose that thread_local_context has a member function call_on_close which registers a function to be called when the thread_local variables associated with that context have been destroyed. It is undefined behaviour for this function to access thread_local variables.
std::condition_variable cv;
std::mutex m;
complex_type the_data;

void thread_func()
{
    thread_local_context context;
    std::lock_guard<std::mutex> lk(m);
    the_data=find_the_answer();
    context.call_on_close([]{cv.notify_all();});
} // destroy context, notify cv
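The callback half of this facility can be emulated with an ordinary RAII class, which shows the intended ordering even though it cannot manage thread_local variables. This is a sketch under stated assumptions: close_callbacks is a hypothetical stand-in for the call_on_close part of thread_local_context only, and data_ready is an added flag so that the waiting thread has a predicate to check.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the call_on_close part of thread_local_context.
// It does not manage thread_local variables at all.
class close_callbacks {
    std::vector<std::function<void()>> funcs_;
public:
    close_callbacks() = default;
    close_callbacks(close_callbacks const&) = delete;
    close_callbacks& operator=(close_callbacks const&) = delete;

    template <typename FunctionType>
    void call_on_close(FunctionType func) { funcs_.emplace_back(std::move(func)); }

    ~close_callbacks() {
        // Invoke in reverse order of registration.
        for (auto it = funcs_.rbegin(); it != funcs_.rend(); ++it) (*it)();
    }
};

std::condition_variable cv;
std::mutex m;
int the_data = 0;
bool data_ready = false;

void thread_func() {
    close_callbacks context;
    std::lock_guard<std::mutex> lk(m);
    the_data = 42;
    data_ready = true;
    context.call_on_close([] { cv.notify_all(); });
}   // lk is released first, then context is destroyed and cv is notified

int wait_for_data() {
    std::thread t(thread_func);
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return data_ready; });
    int result = the_data;
    lk.unlock();
    t.join();
    return result;
}
```

Because context is declared before lk, the lock is released before the notification is sent, so a woken thread does not immediately block on the mutex.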
std::async
If this proposal is adopted, then it could be used as part of an implementation of std::async to ensure that the associated future does not become ready before the thread-local variables for the asynchronous task have been destroyed. This would allow a single thread to be reused for multiple asynchronous tasks.
Modify 3.6.3 [basic.start.term] paragraph 1 as follows:
Destructors (12.4) for initialized objects with static storage duration are called as a result of returning from main and as a result of calling std::exit (18.5). Destructors for initialized objects with thread storage duration within a given thread are called as a result of returning from the initial function of that thread, as part of the destruction of a thread_local_context object, and as a result of that thread calling std::exit. [... rest unchanged]
Modify 3.6.3 [basic.start.term] paragraph 2 as follows:
If a function contains a local object of static or thread storage duration that has been destroyed and the function is called during the destruction of an object with static or thread storage duration, the program has undefined behavior if the flow of control passes through the definition of the previously destroyed local object. Likewise, the behavior is undefined if the function-local object is used indirectly (i.e., through a pointer) after its destruction. [Note: If an object with thread storage duration was destroyed as part of the destructor of a thread_local_context object then its state is restored to that prior to the construction of the thread_local_context, and subsequent use does not trigger undefined behaviour unless it would do so in the absence of the thread_local_context object. -- end note]
Modify 3.7.2 [basic.stc.thread] paragraph 2 as follows:
An object or reference with thread storage duration shall be initialized before its first use and, if constructed, shall be destroyed on thread exit. The first use of an object or reference of thread storage duration on a given thread following the construction of a thread_local_context object for that thread shall be treated as the first use of that object on that thread, and that object shall become associated with the thread_local_context object, and destroyed as part of its destruction (30.3.3.2). The state of an object of thread storage duration on a given thread following destruction of that object as part of the destruction of a thread_local_context object shall be restored to the state of that object that existed prior to the construction of the thread_local_context.
std::thread_local_context
Add the following declaration to the synopsis of chapter 30.3:
class thread_local_context;
Add a new section to 30.3 as follows:
thread_local_context
namespace std {
    class thread_local_context
    {
    public:
        thread_local_context();
        ~thread_local_context();
        thread_local_context(thread_local_context const&) = delete;
        thread_local_context& operator=(thread_local_context const&) = delete;

        template<typename FunctionType>
        void call_on_close(FunctionType func);
    };
}
The class thread_local_context provides a means of managing the lifetime of objects with thread storage duration (3.7.2). The construction of an instance of thread_local_context on a given thread marks the start of a new context for objects of thread storage duration. This context persists until the thread exits or the thread_local_context object is destroyed. When the context is destroyed then all objects of thread storage duration initialized on that thread during the life of the context are destroyed in reverse order of their initialization (6.7).
The existing state of all objects with thread storage duration for a thread is set aside when a thread_local_context object is constructed. The first use of an object with thread storage duration after the construction of a thread_local_context on the same thread is treated as if it were the first use of that object on that thread. Pointers and references to such objects remain valid, and continue to point at the existing objects.
After the completion of the thread_local_context destructor, the state of all objects of thread storage duration is returned to that prior to the construction of the thread_local_context object.
[Example:
#include <iostream>

int foo()
{
    static thread_local int x=42;
    return ++x;
}

void bar()
{
    thread_local_context ctx;
    for(unsigned i=0;i<3;++i)
    {
        std::cout<<foo()<<std::endl;
    }
}

int main()
{
    bar(); // will output 43 44 45
    bar(); // will also output 43 44 45
}
-- end example]
Multiple thread_local_context objects may be constructed on a single thread. The construction of each object creates a new context. The lifetime of such instances must be strictly nested: if two objects of type thread_local_context a and b are constructed on the same thread such that a is constructed before b then b must be destroyed before a, otherwise the behavior of the program is undefined.
[Example:
#include <iostream>
#include <string>

thread_local std::string s="hello";

void inner(std::string* ps)
{
    thread_local_context ctx;
    std::cout<<"inner s="<<s<<std::endl;
    std::cout<<"*ps="<<*ps<<std::endl;
    *ps="changed";
    std::cout<<"inner s="<<s<<std::endl;
}

void outer()
{
    thread_local_context ctx;
    s="outer";
    std::cout<<"outer s="<<s<<std::endl;
    inner(&s);
    std::cout<<"outer s="<<s<<std::endl;
}

int main()
{
    outer(); // OK
}

This program will output
outer s=outer
inner s=hello
*ps=outer
inner s=hello
outer s=changed
-- end example]
thread_local_context();
Effects: Creates a new context for thread_local variables.
Throws: std::bad_alloc if any required storage cannot be allocated.
~thread_local_context();
Effects: Destroys the context for thread_local variables. All objects with thread storage duration (3.7.2) constructed on this thread after the construction of the thread_local_context object are destroyed in reverse order of construction (see 3.6.3), and restored to their initial state. Once all such objects have been destroyed, any functions registered with the context by calling call_on_close() are invoked in reverse order of registration. It is undefined behaviour to destroy an instance of thread_local_context on a thread other than that on which it was constructed. It is undefined behaviour to destroy an instance of thread_local_context when instances of thread_local_context created later on the same thread have not yet been destroyed.
template<typename FunctionType> void call_on_close(FunctionType func);
Effects: Registers a copy of func to be called when *this is destroyed.
Throws: std::bad_alloc if any required storage cannot be allocated. Any exceptions thrown by the copy constructor of func.
Requires: func shall not exit via an exception, nor shall it access any objects of thread storage duration.
std::promise and std::packaged_task
Add the following to the class definition of std::promise in section 30.6.4 [futures.promise]:
void set_value(thread_local_context & context,const R& r);
void set_value(thread_local_context & context,see below);
void set_exception(thread_local_context & context,exception_ptr p);
Add the following to the end of section 30.6.4 [futures.promise]:
void promise::set_value(thread_local_context & context,const R& r);
void promise::set_value(thread_local_context & context,R&& r);
void promise<R&>::set_value(thread_local_context & context,R& r);
void promise<void>::set_value(thread_local_context & context);
Effects: Stores the value in the associated state and updates context to set that state to ready when context is destroyed, as if by registering an appropriate function with context.call_on_close().
Throws: future_error if its associated state already has a stored value or exception.
Error conditions: promise_already_satisfied if its associated state already has a stored value or exception.
void set_exception(thread_local_context & context,exception_ptr p);
Effects: Stores the exception pointer p in the associated state and updates context to set that state to ready when context is destroyed, as if by registering an appropriate function with context.call_on_close().
Throws: future_error if its associated state already has a stored value or exception.
Error conditions: promise_already_satisfied if its associated state already has a stored value or exception.
Add the following member function to the class definition for std::packaged_task in 30.6.7 [futures.task]:
void execute(thread_local_context const& context,ArgTypes...);
Add the following to 30.6.7 [futures.task] following paragraph 17:
void execute(thread_local_context const& context,ArgTypes... args);
Effects: INVOKE(f, t1, t2, ..., tN, R), where f is the task associated with *this and t1, t2, ..., tN are the values in args.... If the task returns normally, the return value is stored as the asynchronous result associated with *this, otherwise the exception thrown by the task is stored. context is updated to ensure that any threads blocked waiting for the asynchronous result associated with the task are unblocked when context is destroyed, as if by passing an appropriate function to context.call_on_close().
Thanks to Alberto Ganesh Barbati, Peter Dimov, Lawrence Crowl, Beman Dawes, Herb Sutter and others who have commented on earlier versions of this proposal on the mailing lists and via personal email.