Document Number: N2139=06-0209
Anthony Williams <anthony@justsoftwaresolutions.co.uk>
Just Software Solutions Ltd
2006-11-06

Thoughts on a Thread Library for C++

Introduction

This document outlines my thoughts for a new thread library interface for C++. It is based heavily on N2094 by Howard Hinnant, but also reflects thoughts arising from discussion on the committee reflectors and the boost mailing list.

This proposal assumes that the "Generalized Constant Expressions" proposal by Gabriel Dos Reis and Bjarne Stroustrup will be accepted for inclusion in the next C++ Standard. If this is not the case, then this proposal will need refining.

One-time initialization

Being able to initialize an object in a guaranteed thread-safe manner, or ensure that a function is run only once is important for many libraries. It is for this reason that POSIX provides pthread_once, and boost provides call_once. Microsoft Windows Vista is also due to have a one-time initialization facility.

Given the importance of this facility, I think it should be included in any threading library for C++, and I was surprised to find that it had been omitted from all the thread-related papers in the Pre-Portland mailing.

One-time function calls

I propose that we provide two mechanisms for this. The first, call_once, is analagous to the functions already mentioned, and should provide a means to execute a function precisely once, and ensure that any threads executing the call_once will wait until the once-function has been completed. In order to provide the most benefit to user code, the once-function should not be limited to functions taking no parameters and returning void, but should be extended to cover any callable object which can be called with no parameters.

In common with the pre-existing functions, this proposal requires the use of a once_flag to guarantee the one-time initialization. The interface is proposed as follows:

    struct once_flag
    {
        constexpr once_flag();
    };

    template<typename Callable>
    void call_once(once_flag& flag, Callable func);

Requires:

The Callable parameter func is copyable. Copying shall have no side effects, and the effect of calling the copy shall be equivalent to calling the original.

Calling func shall not throw an exception.

Semantics:

The parameter func, or a copy thereof, is called exactly once, even when call_once is called multiple times with the same flag. If multiple calls to call_once with the same flag are executing concurrently in separate threads, then only one thread shall call func, and no thread shall proceed until the call to func has completed.

A call to call_once is not affected by a prior or concurrent call to call_once from the same or another thread with a different flag parameter.

Instances of once_flag may be declared with any storage duration, including automatic and dynamic storage. Initialization must proceed correctly in these cases.

Examples:

        std::once_flag flag;

        void init();

        void f()
        {
            std::call_once(flag,init);
        }

        struct initializer
        {
            void operator()();
        };

        void g()
        {
            static std::once_flag flag2;
            std::call_once(flag2,initializer());
        }

One-time object initialization

As well as running a function just once, an important use case is to initialize an object for use from multiple threads. Though such a task could be done by passing the address of the object to initialize into a function passed to call_once, (e.g. call_once(flag,bind(&object::initialize,&some_instance))), this is a clumsy construction.

I understand that in Portland it was agreed to investigate reuse of the private and protected keywords to ensure thread-safe initialization of local statics. I propose the following class template once_init as a library-based alternative.

    template<typename T>
    class once_init
    {
    public:
        implementation-defined-type operator->();
    };

Requires:

The template parameter T is default-constructible.

Semantics:

Creating an instance oi of type once_init<T>, will default-construct a single object of type T, which will persist for the lifetime of oi. In the presence of a race-condition on the construction of oi and use of the contained object, exactly one thread shall construct the contained object. All other threads involved in the race condition shall block until construction of said object has completed.

If oi is an instance of type once_init<T>, and d and f are a data member and a member function of T respectively,

oi->f() is well-formed, and calls the member function f on the contained instance of T.
oi->d is well-formed, and refers to the data member d of the contained instance of T.

Instances of once_init may be declared with any storage duration, including automatic and dynamic storage. Initialization must proceed correctly in these cases.

In order to avoid the general object-initialization-order problems associated with namespace-scope objects, should operator-> be invoked on an instance of once_init with static storage duration prior to its constructor being run, the behaviour shall be as-if the constructor was called immediately prior to the call to operator->, and the actual call to the constructor shall have no effect.

Examples:


        class A
        {
        public:
            void f();
        };

        void foo()
        {
            static once_init<A> a;
            a->f();
        }

General Thread Synchronization

The basic synchronization primitive in common use is the mutex. N2094 covers a variety of types of mutex and associated locks. I believe that "convertible shared ownership" is a dangerous concept and should not be supported, so the try_unlock_shareable_and_lock and try_unlock_shareable_and_lock_upgradeable functions should be removed, and unlock_and_lock_shareable moved into the upgradeable concept. That leaves three mutex concepts: exclusive ownership, shared ownership, and upgradeable ownership, which I believe cover the majority of use cases.

Mutex Initialization

There are many possible scenarios in which a mutex object may be constructed, for example:

    std::mutex global;

    void f()
    {
        static std::mutex local_static;
        std::mutex automatic;
        std::mutex* dynamic=new std::mutex;
    }

Mutex types should be guaranteed to be correctly initialized in all cases.

This includes the local static, which would be subject to race conditions with many current compilers, if f or g were to be called from multiple threads concurrently. Correct initialization in such cases could be accomplished via platform and compiler-specific mechanisms, by use of a One-Time Initialization mechanism as described above, or by use of a constexpr constructor.

A mutex object may also be used as a non-static data member of a class, in order to protect other data members from concurrent modification:

    struct X
    {
        std::mutex mtx;
        std::vector<std::string> entries;

        void foo(std::string const& s)
        {
            std::exclusive_lock<std::mutex> lk(mtx);
            entries.push_back(s);
        }
    };

The mutex type must be such that no explicit initialization of such a data member is required. A constexpr constructor should yield sufficient guarantees.

Try Locks and Timed Locks

Each of these mutex concepts provides an associated set of lock and unlock functions, including "try lock" functions. Rather than a single signature for each "try lock" function, I propose an overloaded set, as follows:

    bool try_lock();
    bool try_lock(unsigned spin_count);
    bool try_lock(target_time_type target_time);
    bool try_lock(time_duration_type wait_time);

with similar sets for try_lock_shareable, try_lock_upgradeable and so on.

The overload with no arguments should just try and acquire the lock, as in N2094.

The overload taking a spin count should use the spin count as a hint when trying to acquire the lock. The intention is that on those implementations where the attempted lock acquisition is implemented as a single atomic instruction, the implementation should spin the specified number of times. This hint is merely advisory, and an implementation may choose to ignore it.

The overload taking a target time should keep trying until the specified time. It is intended that target_time_type be compatible with the datetime type from N2058, and should be one and the same, if datetime is incorporated into the Standard.

The final overload, taking a wait time, should keep trying for the specified duration. It is intended that time_duration_type be compatible with the duration types from N2058, so that one can write

    some_mutex.try_lock(milliseconds(50));

As such, it may be necessary for this overload to be a template:

    template<typename time_impl>
    bool try_lock(basic_time_duration<time_impl> wait_time);

although providing a separate time_duration_type with implicit conversions from basic_time_duration<T> would also be acceptable to the author.

Since timed locks cannot always be efficiently implemented on the underlying mutex, I propose that the additional overloads with time parameters are separate "add-on" concepts to the basic mutex concepts. The implementation of the standard mutex types may or may not choose to support these concepts.

The overload with spin count hint should be part of the standard mutex concepts, however. For those cases where the spin count makes no sense, it can be ignored.

I therefore propose a set of adaptor templates: timed_exclusive_adaptor, timed_shareable_adaptor and timed_upgradeable_adaptor. Each of these templates will take a mutex type as a template parameter, and this mutex must meet the corresponding concept without the timed aspect. The adaptor will then provide the appropriate timed signatures. It is expected that implementations will provide specializations of this adaptor for the standard mutex types alongside the generic template, to cover cases where they already provide the interface, or it can be efficiently implemented with knowledge of the implementation details of the mutex.

    template<typename Mutex>
    class timed_exclusive_adaptor
    {
    public:
        timed_exclusive_adaptor();
        ~timed_exclusive_adaptor();

        void lock();
        bool try_lock();
        bool try_lock(unsigned spin_count);
        bool try_lock(target_time_type target_time);
        bool try_lock(time_duration_type wait_time);
        void unlock();
    };

    template<typename Mutex>
    class timed_shareable_adaptor
    {
    public:
        timed_shareable_adaptor();
        ~timed_shareable_adaptor();

        void lock();
        bool try_lock();
        bool try_lock(unsigned spin_count);
        bool try_lock(target_time_type target_time);
        bool try_lock(time_duration_type wait_time);
        void unlock();

        void lock_shareable();
        bool try_lock_shareable();
        bool try_lock_shareable(unsigned spin_count);
        bool try_lock_shareable(target_time_type target_time);
        bool try_lock_shareable(time_duration_type wait_time);
        void unlock_shareable();
    };

    template<typename Mutex>
    class timed_upgradeable_adaptor
    {
    public:
        timed_upgradeable_adaptor();
        ~timed_upgradeable_adaptor();

        void lock();
        bool try_lock();
        bool try_lock(unsigned spin_count);
        bool try_lock(target_time_type target_time);
        bool try_lock(time_duration_type wait_time);
        void unlock();

        void lock_shareable();
        bool try_lock_shareable();
        bool try_lock_shareable(unsigned spin_count);
        bool try_lock_shareable(target_time_type target_time);
        bool try_lock_shareable(time_duration_type wait_time);
        void unlock_shareable();

        void lock_upgradeable();
        bool try_lock_upgradeable();
        bool try_lock_upgradeable(unsigned spin_count);
        bool try_lock_upgradeable(target_time_type target_time);
        bool try_lock_upgradeable(time_duration_type wait_time);
        void unlock_upgradeable();

        void unlock_upgradeable_and_lock();
        bool try_unlock_upgradeable_and_lock();
        bool try_unlock_upgradeable_and_lock(unsigned spin_count);
        bool try_unlock_upgradeable_and_lock(target_time_type target_time);
        bool try_unlock_upgradeable_and_lock(time_duration_type wait_time);

        void unlock_upgradeable_and_lock_sharable();
        void unlock_and_lock_shareable();
        void unlock_and_lock_upgradeable();
    };

Lock objects

N2094 describes 3 different types of lock object: exclusive, shareable and upgradeable. Whilst I agree with the overall concepts, I believe that the use of move constructors to upgrade and downgrade locks is dangerous. For example:

    void f(upgradeable_mutex& m)
    {
        upgradeable_lock ul(m);

        if(xyz())
        {
            exclusive_lock el(move(ul));
            do_stuff();
        }
        // do we have a lock or not?
    }

The move constructors have their place, as they allow transfer of the lock between functions. Therefore I propose an additional class upgrade_to_exclusive_lock, which takes an upgradeable_lock and upgrades it to exclusive for the lifetime of the upgrade_to_exclusive_lock instance. e.g.

    void f(upgradeable_mutex& m)
    {
        upgradeable_lock ul(m);

        if(xyz())
        {
            upgrade_to_exclusive_lock el(ul);
            // we now have an exclusive lock
            // ul cannot be used
        }
        // ul is back to being an upgradeable
    }

Interface:

    template <class Mutex>
    class upgrade_to_exclusive_lock
    {
    public:
        typedef Mutex mutex_type;
    public:
        explicit upgrade_to_exclusive_lock(upgradeable_lock<mutex_type>& m);

        ~upgrade_to_exclusive_lock();

        upgrade_to_exclusive_lock(upgrade_to_exclusive_lock&& sl);
        upgrade_to_exclusive_lock& operator=  (upgrade_to_exclusive_lock&& sl);
    private:
        upgrade_to_exclusive_lock(upgrade_to_exclusive_lock const&);
        explicit upgrade_to_exclusive_lock(upgradable_lock<mutex_type> const&);
        upgrade_to_exclusive_lock& operator=  (upgrade_to_exclusive_lock const&);
    public:
        bool owns() const;
        operator unspecified-bool-type() const;
        const mutex_type* mutex() const;

        void swap(upgrade_to_exclusive_lock&& w);
    };

Semantics:

On construction, the mutex owned by the upgradeable_lock is upgraded to an exclusive lock as-if by calling unlock_upgradeable_and_lock() on the mutex owned by the upgradeable_lock.

On destruction, the mutex owned by the upgradeable_lock is downgraded to an upgradeable lock as-if by calling unlock_and_lock_upgradeable() on the mutex owned by the upgradeable_lock.

Rather than just upgrading the mutex, but leaving the upgradeable_lock alone, the constructor should modify the upgradeable_lock so that any operations performed directly on it fail in a clear manner, rather than potentially deadlock.

Acknowledgements

Thanks to Howard Hinnant, Peter Dimov and Roland Schwarz for comments on some of the interfaces proposed here, and Jeff Garland who pointed me to the Date-Time papers N1900 and N2058.