Document Number: N2139=06-0209
Anthony Williams <anthony@justsoftwaresolutions.co.uk>
Just Software Solutions Ltd
2006-11-06
This document outlines my thoughts for a new thread library interface for C++. It is based heavily on N2094 by Howard Hinnant, but also reflects thoughts arising from discussion on the committee reflectors and the boost mailing list.
This proposal assumes that the "Generalized Constant Expressions" proposal by Gabriel Dos Reis and Bjarne Stroustrup will be accepted for inclusion in the next C++ Standard. If this is not the case, then this proposal will need refining.
Being able to initialize an object in a guaranteed thread-safe manner, or ensure
that a function is run only once is important for many libraries. It is for this
reason that POSIX provides pthread_once
, and boost provides call_once
. Microsoft
Windows Vista is also due to have a one-time initialization facility.
Given the importance of this facility, I think it should be included in any threading library for C++, and I was surprised to find that it had been omitted from all the thread-related papers in the Pre-Portland mailing.
I propose that we provide two mechanisms for this. The first, call_once
, is
analagous to the functions already mentioned, and should provide a means to
execute a function precisely once, and ensure that any threads executing the
call_once
will wait until the once-function has been completed. In order to
provide the most benefit to user code, the once-function should not be limited
to functions taking no parameters and returning void, but should be extended to
cover any callable object which can be called with no parameters.
In common with the pre-existing functions, this proposal requires the use of a
once_flag
to guarantee the one-time initialization. The interface is proposed as
follows:
struct once_flag { constexpr once_flag(); }; template<typename Callable> void call_once(once_flag& flag, Callable func);
The Callable
parameter func
is copyable. Copying shall have no side
effects, and the effect of calling the copy shall be equivalent to
calling the original.
Calling func
shall not throw an exception.
The parameter func
, or a copy thereof, is called exactly once, even when
call_once
is called multiple times with the same flag
. If multiple calls
to call_once
with the same flag
are executing concurrently in separate
threads, then only one thread shall call func
, and no thread shall
proceed until the call to func
has completed.
A call to call_once
is not affected by a prior or concurrent call to
call_once
from the same or another thread with a different flag
parameter.
Instances of once_flag
may be declared with any storage duration,
including automatic and dynamic storage. Initialization must proceed
correctly in these cases.
std::once_flag flag; void init(); void f() { std::call_once(flag,init); } struct initializer { void operator()(); }; void g() { static std::once_flag flag2; std::call_once(flag2,initializer()); }
As well as running a function just once, an important use case is to initialize
an object for use from multiple threads. Though such a task could be done by
passing the address of the object to initialize into a function passed to
call_once
, (e.g. call_once(flag,bind(&object::initialize,&some_instance))
), this
is a clumsy construction.
I understand that in Portland it was agreed to investigate reuse of the private
and protected
keywords to ensure thread-safe initialization of local statics. I propose the following class template
once_init
as a library-based alternative.
template<typename T> class once_init { public: implementation-defined-type operator->(); };
The template parameter T
is default-constructible.
Creating an instance oi
of type once_init<T>
, will default-construct a
single object of type T
, which will persist for the lifetime of oi
. In
the presence of a race-condition on the construction of oi
and use of
the contained object, exactly one thread shall construct the contained
object. All other threads involved in the race condition shall block
until construction of said object has completed.
If oi
is an instance of type once_init<T>
, and d
and f
are a data
member and a member function of T
respectively,
oi->f()
is well-formed, and calls the member function f
on the
contained instance of T
.oi->d
is well-formed, and refers to the data member d
of the
contained instance of T
.
Instances of once_init
may be declared with any storage duration,
including automatic and dynamic storage. Initialization must proceed
correctly in these cases.
In order to avoid the general object-initialization-order problems
associated with namespace-scope objects, should
operator->
be invoked on an instance of
once_init
with static storage duration prior to its
constructor being run, the behaviour shall be as-if the constructor
was called immediately prior to the call to
operator->
, and the actual call to the constructor
shall have no effect.
class A { public: void f(); }; void foo() { static once_init<A> a; a->f(); }
The basic synchronization primitive in common use is the mutex. N2094 covers a
variety of types of mutex and associated locks. I believe that "convertible
shared ownership" is a dangerous concept and should not be supported, so the
try_unlock_shareable_and_lock
and try_unlock_shareable_and_lock_upgradeable
functions should be removed, and unlock_and_lock_shareable
moved into the
upgradeable concept. That leaves three mutex concepts: exclusive ownership,
shared ownership, and upgradeable ownership, which I believe cover the majority
of use cases.
There are many possible scenarios in which a mutex object may be constructed, for example:
std::mutex global; void f() { static std::mutex local_static; std::mutex automatic; std::mutex* dynamic=new std::mutex; }
Mutex types should be guaranteed to be correctly initialized in all cases.
This includes the local static, which would be subject to race conditions with
many current compilers, if f
or g
were to be called from multiple threads
concurrently. Correct initialization in such cases could be accomplished via
platform and compiler-specific mechanisms, by use of a One-Time Initialization
mechanism as described above, or by use of a constexpr
constructor.
A mutex object may also be used as a non-static data member of a class, in order to protect other data members from concurrent modification:
struct X { std::mutex mtx; std::vector<std::string> entries; void foo(std::string const& s) { std::exclusive_lock<std::mutex> lk(mtx); entries.push_back(s); } };
The mutex type must be such that no explicit initialization of such a data
member is required. A constexpr
constructor should yield sufficient guarantees.
Each of these mutex concepts provides an associated set of lock and unlock functions, including "try lock" functions. Rather than a single signature for each "try lock" function, I propose an overloaded set, as follows:
bool try_lock(); bool try_lock(unsigned spin_count); bool try_lock(target_time_type target_time); bool try_lock(time_duration_type wait_time);
with similar sets for try_lock_shareable
, try_lock_upgradeable
and so on.
The overload with no arguments should just try and acquire the lock, as in N2094.
The overload taking a spin count should use the spin count as a hint when trying to acquire the lock. The intention is that on those implementations where the attempted lock acquisition is implemented as a single atomic instruction, the implementation should spin the specified number of times. This hint is merely advisory, and an implementation may choose to ignore it.
The overload taking a target time should keep trying until the specified
time. It is intended that target_time_type
be compatible with the datetime
type
from N2058, and should be one and the same, if datetime
is incorporated into the
Standard.
The final overload, taking a wait time, should keep trying for the specified
duration. It is intended that time_duration_type
be compatible with the duration
types from N2058, so that one can write
some_mutex.try_lock(milliseconds(50));
As such, it may be necessary for this overload to be a template:
template<typename time_impl> bool try_lock(basic_time_duration<time_impl> wait_time);
although providing a separate time_duration_type
with implicit conversions from
basic_time_duration<T>
would also be acceptable to the author.
Since timed locks cannot always be efficiently implemented on the underlying mutex, I propose that the additional overloads with time parameters are separate "add-on" concepts to the basic mutex concepts. The implementation of the standard mutex types may or may not choose to support these concepts.
The overload with spin count hint should be part of the standard mutex concepts, however. For those cases where the spin count makes no sense, it can be ignored.
I therefore propose a set of adaptor templates: timed_exclusive_adaptor
,
timed_shareable_adaptor
and timed_upgradeable_adaptor
. Each of these templates
will take a mutex type as a template parameter, and this mutex must meet the corresponding
concept without the timed aspect. The adaptor will then provide the appropriate timed
signatures. It is expected that implementations will provide specializations of
this adaptor for the standard mutex types alongside the generic template, to
cover cases where they already provide the interface, or it can be efficiently
implemented with knowledge of the implementation details of the mutex.
template<typename Mutex> class timed_exclusive_adaptor { public: timed_exclusive_adaptor(); ~timed_exclusive_adaptor(); void lock(); bool try_lock(); bool try_lock(unsigned spin_count); bool try_lock(target_time_type target_time); bool try_lock(time_duration_type wait_time); void unlock(); }; template<typename Mutex> class timed_shareable_adaptor { public: timed_shareable_adaptor(); ~timed_shareable_adaptor(); void lock(); bool try_lock(); bool try_lock(unsigned spin_count); bool try_lock(target_time_type target_time); bool try_lock(time_duration_type wait_time); void unlock(); void lock_shareable(); bool try_lock_shareable(); bool try_lock_shareable(unsigned spin_count); bool try_lock_shareable(target_time_type target_time); bool try_lock_shareable(time_duration_type wait_time); void unlock_shareable(); }; template<typename Mutex> class timed_upgradeable_adaptor { public: timed_upgradeable_adaptor(); ~timed_upgradeable_adaptor(); void lock(); bool try_lock(); bool try_lock(unsigned spin_count); bool try_lock(target_time_type target_time); bool try_lock(time_duration_type wait_time); void unlock(); void lock_shareable(); bool try_lock_shareable(); bool try_lock_shareable(unsigned spin_count); bool try_lock_shareable(target_time_type target_time); bool try_lock_shareable(time_duration_type wait_time); void unlock_shareable(); void lock_upgradeable(); bool try_lock_upgradeable(); bool try_lock_upgradeable(unsigned spin_count); bool try_lock_upgradeable(target_time_type target_time); bool try_lock_upgradeable(time_duration_type wait_time); void unlock_upgradeable(); void unlock_upgradeable_and_lock(); bool try_unlock_upgradeable_and_lock(); bool try_unlock_upgradeable_and_lock(unsigned spin_count); bool try_unlock_upgradeable_and_lock(target_time_type target_time); bool try_unlock_upgradeable_and_lock(time_duration_type wait_time); void unlock_upgradeable_and_lock_sharable(); void unlock_and_lock_shareable(); void unlock_and_lock_upgradeable(); };
N2094 describes 3 different types of lock object: exclusive, shareable and upgradeable. Whilst I agree with the overall concepts, I believe that the use of move constructors to upgrade and downgrade locks is dangerous. For example:
void f(upgradeable_mutex& m) { upgradeable_lock ul(m); if(xyz()) { exclusive_lock el(move(ul)); do_stuff(); } // do we have a lock or not? }
The move constructors have their place, as they allow transfer of the lock
between functions. Therefore I propose an additional class
upgrade_to_exclusive_lock
, which takes an upgradeable_lock
and upgrades it to exclusive for
the lifetime of the upgrade_to_exclusive_lock
instance. e.g.
void f(upgradeable_mutex& m) { upgradeable_lock ul(m); if(xyz()) { upgrade_to_exclusive_lock el(ul); // we now have an exclusive lock // ul cannot be used } // ul is back to being an upgradeable }
template <class Mutex> class upgrade_to_exclusive_lock { public: typedef Mutex mutex_type; public: explicit upgrade_to_exclusive_lock(upgradeable_lock<mutex_type>& m); ~upgrade_to_exclusive_lock(); upgrade_to_exclusive_lock(upgrade_to_exclusive_lock&& sl); upgrade_to_exclusive_lock& operator= (upgrade_to_exclusive_lock&& sl); private: upgrade_to_exclusive_lock(upgrade_to_exclusive_lock const&); explicit upgrade_to_exclusive_lock(upgradable_lock<mutex_type> const&); upgrade_to_exclusive_lock& operator= (upgrade_to_exclusive_lock const&); public: bool owns() const; operator unspecified-bool-type() const; const mutex_type* mutex() const; void swap(upgrade_to_exclusive_lock&& w); };
On construction, the mutex owned by the upgradeable_lock
is upgraded to an
exclusive lock as-if by calling unlock_upgradeable_and_lock()
on the mutex
owned by the upgradeable_lock
.
On destruction, the mutex owned by the upgradeable_lock
is downgraded to an
upgradeable lock as-if by calling unlock_and_lock_upgradeable()
on the mutex
owned by the upgradeable_lock
.
Rather than just upgrading the mutex, but leaving the upgradeable_lock
alone, the constructor should modify the upgradeable_lock
so that any
operations performed directly on it fail in a clear manner, rather than
potentially deadlock.
Thanks to Howard Hinnant, Peter Dimov and Roland Schwarz for comments on some of the interfaces proposed here, and Jeff Garland who pointed me to the Date-Time papers N1900 and N2058.