Document Number: N2907=09-0097
Date: 2009-06-18
Author: Anthony Williams, Just Software Solutions Ltd
This paper discusses a suggestion I made on the LWG reflector and cpp-thread mailing list to address the issues raised in N2880 surrounding the lifetime of thread_local variables.

The basic idea of this proposal is that the lifetime of thread_local variables is tied to the lifetime of an instance of the new class thread_local_context. Each thread has an implicit instance of such a class constructed prior to the invocation of the thread function, and destroyed after completion of the thread function, but additional instances can be created in order to deliberately limit the lifetime of thread_local variables: when a thread_local_context object is destroyed, all the thread_local variables tied to it are also destroyed.
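For concreteness, a minimal sketch of the interface such a class might provide is shown below. The class is of course hypothetical (it is the subject of this proposal), and a real implementation would need compiler support rather than the plain C++ shown here.

    // Hypothetical sketch of the proposed class; the real facility would be
    // tied into the compiler's thread_local machinery rather than expressible
    // as ordinary library code.
    class thread_local_context
    {
    public:
        // Starts a fresh context on the calling thread; thread_local variables
        // first used from now on are tied to this context.
        thread_local_context();

        // Destroys all thread_local variables tied to this context.
        ~thread_local_context();

        // Not copyable: the scoped lifetime is the whole point of the object.
        thread_local_context(thread_local_context const&)=delete;
        thread_local_context& operator=(thread_local_context const&)=delete;
    };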
This enables us to address several of the concerns of N2880. Firstly, if we use a mechanism other than thread::join to wait for a thread to complete its work — such as waiting for a unique_future to be ready — then N2880 correctly highlights that under the current working paper the destructors of thread_local variables will still be running after the waiting thread has resumed. By judicious use of a thread_local_context instance and block scoping, we can ensure that the thread_local variables are destroyed before the future value is set, e.g.:
    int find_the_answer();

    void thread_func(std::promise<int> * p)
    {
        int local_result;
        {
            thread_local_context context; // create a new context for thread_locals
            local_result=find_the_answer();
        } // destroy thread_local variables along with the context object
        p->set_value(local_result);
    }

    int main()
    {
        std::promise<int> p;
        std::thread t(thread_func,&p);
        t.detach(); // we're going to wait on the future
        std::cout<<p.get_future().get()<<std::endl;
    }
When the call to get() returns, we know that not only is the future value ready, but the thread_local variables on the other thread have also been destroyed.
A second concern of N2880 was the potential for accumulating vast numbers of thread_local variables when reusing threads for multiple independent tasks, such as when implementing a thread pool. Under such circumstances, the thread pool implementation can wrap each task inside a scope containing a thread_local_context variable to ensure that when a task is completed its thread_local variables are destroyed in a timely fashion, e.g.:
    std::mutex task_mutex;
    std::queue<std::function<void()>> tasks;
    std::condition_variable task_cond;
    bool done=false;

    void worker_thread()
    {
        std::unique_lock<std::mutex> lk(task_mutex);
        while(!done)
        {
            task_cond.wait(lk,[]{return !tasks.empty();});
            std::function<void()> task=tasks.front();
            tasks.pop();
            lk.unlock();
            {
                thread_local_context context;
                task();
            }
            lk.lock();
        }
    }
With this scheme, the thread_local variables are destroyed between task invocations, when the thread_local_context object is destroyed, so if the sets of variables used by the tasks do not overlap then the problem of increasing memory usage is avoided.
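To make the intended behaviour concrete, here is a small illustrative example; the tracer type, run_two_tasks function and the output noted in the comments are my own, and assume the proposed thread_local_context class is available.

    #include <iostream>

    struct tracer
    {
        tracer() { std::cout<<"constructed\n"; }
        ~tracer() { std::cout<<"destroyed\n"; }
    };

    thread_local tracer t; // constructed on first use within each context

    void task()
    {
        (void)&t; // first use of t in the current context constructs it
    }

    void run_two_tasks()
    {
        {
            thread_local_context context;
            task();   // prints "constructed"
        }             // prints "destroyed": t is tied to this context
        {
            thread_local_context context;
            task();   // prints "constructed" again: a fresh copy of t
        }             // prints "destroyed" again
    }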
Obviously, such a class would have to be tightly integrated with the mechanism for thread_local variables used by a compiler, so that they can be destroyed at the appropriate points, and constructed again if necessary. This is a key point — for the second scenario to work, if a thread_local_context is destroyed and a fresh one constructed, then any thread_local variables used during the lifetime of the new context object must be created afresh, even if they were already created and destroyed during the lifetime of a prior context object on the same thread.
This does mean that implementations are pretty much restricted to initializing thread_local variables on first use, with a mechanism that allows the destructor of thread_local_context objects to reset that "first use" flag. If the thread_local_context is implemented with compiler intrinsics then the compiler may still be able to find optimization opportunities that allow batching of initializations or less-frequent checking of the "first use" flag.
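As an illustration only, the per-variable bookkeeping a compiler might generate under such a scheme could look something like the following; the helper register_with_current_context() is invented here to stand for whatever hook the thread_local_context destructor would use to run the destructors and reset the flags.

    #include <new>

    struct widget
    {
        widget();
        ~widget();
    };

    // Invented hook: the current thread_local_context records the object so
    // that its destructor can destroy it and clear the "first use" flag.
    void register_with_current_context(void* object,bool* constructed,
                                       void (*destroy)(void*));

    // Hypothetical compiler-generated state for "thread_local widget w;"
    thread_local bool w_constructed=false;                     // the "first use" flag
    thread_local alignas(widget) unsigned char w_storage[sizeof(widget)];

    widget& access_w() // every reference to w goes through code like this
    {
        if(!w_constructed)
        {
            ::new(w_storage) widget();                         // construct on first use
            register_with_current_context(
                w_storage,&w_constructed,
                [](void* p){ static_cast<widget*>(p)->~widget(); });
            w_constructed=true;
        }
        return *reinterpret_cast<widget*>(w_storage);
    }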
thread_local_context object lifetimes

There are interesting issues surrounding the behaviour of code with nested thread_local_context objects. Is such nesting allowed at all? What happens to thread_local variables that have already been assigned values when a thread_local_context object is constructed? What about pointers to such variables?
I believe there are several possible answers to these questions, and I will address each in turn.
Certainly it could be argued that things are simpler if nesting is disallowed, and the use cases primarily point to thread_local_context being used high up in the call chain, either directly in the thread function or not many levels down. However, I think this is an unnecessary restriction. What I do believe is important, however, is that lifetimes are properly nested, and a couple of simple rules should be enforced:
- Destruction of a thread_local_context object should be done on the same thread as construction, and
- thread_local_context objects must be destroyed in the reverse order of construction.

If these rules are not obeyed then std::terminate should be called in the destructor of the thread_local_context object being executed when the violation is discovered.
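Purely to illustrate how these rules might be enforced, an implementation could keep a per-thread stack of active contexts and check it in the destructor. The bookkeeping below is invented and not part of the proposal.

    #include <exception>
    #include <thread>
    #include <vector>

    class thread_local_context;

    namespace detail
    {
        // Invented per-thread stack of currently-active contexts.
        thread_local std::vector<thread_local_context*> context_stack;
    }

    class thread_local_context
    {
        std::thread::id owner;
    public:
        thread_local_context():
            owner(std::this_thread::get_id())
        {
            detail::context_stack.push_back(this);
        }
        ~thread_local_context()
        {
            if(owner!=std::this_thread::get_id() ||  // destroyed on another thread
               detail::context_stack.back()!=this)   // or out of construction order
            {
                std::terminate();
            }
            detail::context_stack.pop_back();
            // ... destroy the thread_local variables tied to this context ...
        }
    };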
What happens to thread_local variables with values assigned prior to construction of a thread_local_context?

The importance of this question can be neatly demonstrated by the following example. Note that this example does not use a nested context, but the same issues apply, and the answer should be the same in examples that do use nested contexts (if we permit them).
    static thread_local int i=0;

    int main()
    {
        i=42;
        {
            thread_local_context context;
            std::cout<<i<<",";
            i=123;
        }
        std::cout<<i<<std::endl;
    }
What does this program print?
I can see use cases for both option 1 (42,123) and option 2 (0,42). Option 3 is only there as a straw man — the whole point of the context objects is that thread_local objects created within the lifetime of the context object are then destroyed when the context object is destroyed.
Though potentially tempting, I think that undefined behaviour or termination is undesirable, as it would be hard to identify the problem when looking at the source code, and it would be easy to trigger such behaviour by calling a function that used thread_local prior to the construction of the context.
So, which of options 1 and 2 do we go for? I favour option 2: the construction of the context object creates a "clean slate" for thread_local variables.
The downside of doing so is that any library that uses thread_local data structures as a cache for optimization purposes (such as an allocator with thread-local heaps) will have to recreate those structures within each context, even though it might be desirable to preserve such structures across contexts. For example, with the worker_thread in the code above it might be desirable to preserve per-thread heaps across task invocations to avoid repeatedly constructing/destructing the heap.
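As a concrete (and entirely invented) illustration of the kind of library affected: a small per-thread block cache such as the one sketched below would, under option 2, be emptied and rebuilt for every task context in the thread-pool example above.

    #include <cstddef>
    #include <vector>

    struct block_cache
    {
        std::vector<void*> free_blocks;
        ~block_cache()
        {
            for(std::size_t i=0;i<free_blocks.size();++i)
                ::operator delete(free_blocks[i]); // hand cached blocks back to the global heap
        }
    };

    thread_local block_cache cache; // under option 2, a fresh cache per context

    void* allocate_block()
    {
        if(!cache.free_blocks.empty())
        {
            void* p=cache.free_blocks.back();
            cache.free_blocks.pop_back();
            return p;
        }
        return ::operator new(256); // cache miss: fall back to the global heap
    }

    void free_block(void* p)
    {
        cache.free_blocks.push_back(p); // keep the block for reuse in this context
    }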
However, I believe that this downside is outweighed by the clarity of the code: with option 2, within a new thread_local_context you know that you have a "clean slate", and that no thread_local variables have values left over from another scope. With option 1, our worker thread example would suddenly start "leaking" values from one task to another if a variable happened to be used in the code outside the context. With option 2 this is not possible, as each task gets a new copy of all the variables.
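To see why option 1 would be surprising in the thread-pool case, consider the following invented example; current_user is touched by the pool's own code outside any task context, so under option 1 it would outlive every per-task context.

    #include <iostream>
    #include <string>

    thread_local std::string current_user; // invented example variable

    void pool_bookkeeping()
    {
        current_user="<none>"; // used outside any task context, so under option 1
                               // the variable survives every per-task context
    }

    void task_a()
    {
        current_user="alice";
    }

    void task_b()
    {
        // Option 1: could print "alice", leaked from task_a's context.
        // Option 2: prints an empty string; task_b's context starts with a fresh copy.
        std::cout<<current_user<<'\n';
    }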
What about pointers to thread_local variables?

Let's look at our example again, but this time we'll also store the address of i in a normal local variable p, and dereference this pointer inside the context.
    static thread_local int i=0;

    int main()
    {
        i=42;
        int* p=&i;
        {
            thread_local_context context;
            std::cout<<i<<",";
            std::cout<<*p<<",";
            *p=99;
            i=123;
        }
        std::cout<<i<<std::endl;
    }
What does this example print now?
I believe that options 1 and 2 here are the behaviours that best correspond to options 1 and 2 for the lifetime issues: if we preserve values from the parent context (option 1) then *p and i refer to the same variable. On the other hand, if we go for option 2 (the "clean slate" option), then *p refers to the variable from the outer context, whereas in the nested context i refers to the new variable (which thus has a different address).
I think the third and fourth alternatives are understandable from an implementation perspective if we go for the "clean slate" option, but not desirable. The third alternative corresponds to an implementation that magically saves the values of the thread_local variables when the new context is initialized, and reuses the same addresses to refer to the value of that thread_local variable in the current context. For example, this could be done on a segmented architecture where thread_local variables live in a special segment, and that segment is remapped for the new context, with the mapping restored when the context is destroyed. However, I think this is undesirable behaviour — we allow pointers to thread_local variables to be passed between threads, and I think this is directly analogous: we should also allow pointers to thread_local variables to be passed between contexts in a single thread. The fourth option (undefined behaviour) is just a "give implementors freedom" option, but I think it is undesirable for the reasons just given, and because I think undefined behaviour should not be introduced without very good cause.
std::packaged_task and the proposed std::async function

If this proposal is adopted, then it could be used as part of an implementation of std::async (as proposed in N2889 and N2901) to ensure that the associated future did not become ready before the thread-local variables for the asynchronous task had been destroyed.
This proposal could also be integrated with std::packaged_task to ensure that the contained task was run in its own context, and that the context was destroyed (and the future result value stored) before the future became ready. This would allow end users to write a simple function for spawning a task with a return value on a new thread without having to worry about the issue of destruction of thread-local variables. However, it could potentially yield surprising behaviour if the task was invoked directly on an existing thread, particularly if the "clean slate" option was chosen.
I have no proposed wording at this time. If the committee agrees to proceed with this, then I can work to provide wording.
Thanks to Alberto Ganesh Barbati, Peter Dimov, Lawrence Crowl, Beman Dawes and others who have commented on this proposal on the mailing lists and via personal email.