The level of thread safety offered by shared_ptr
is the default for
the standard library: const operations on the same instance can be performed concurrently
by multiple threads, but updates require exclusive access. Some scenarios require
stronger guarantees; there is considerable demand (on the Boost mailing lists and
otherwise) for an atomic shared_ptr
, one that can withstand
concurrent updates from two threads without synchronization. This is often motivated
by the desire to port an existing scalable pattern that relies on atomic operations
on raw pointers in combination with garbage collection.
We propose the addition of the following members to shared_ptr
's interface
to address this use case:
shared_ptr<T> atomic_load() const; void atomic_store( shared_ptr<T> r ); shared_ptr<T> atomic_swap( shared_ptr<T> r ); bool atomic_compare_swap( shared_ptr<T> & v, shared_ptr<T> w );
A typical example scenario — a "lock-free" reader/writer pattern — that takes advantage of this functionality is outlined below:
shared_ptr<State> ps; void reader() { shared_ptr<State const> p = ps.atomic_load(); use *p; } // single writer case void writer() { shared_ptr<State> p( new State( *ps ) ); update *p reflecting new information; ps.atomic_store( move( p ) ); } // or, multiple writers case void writer() { shared_ptr<State> p = ps.atomic_load(); do { shared_ptr<State> q( new State( *p ) ); update *q reflecting new information; } while( !ps.atomic_compare_swap( p, move( q ) ) ); }
This feature extends the interface of shared_ptr
in a backward-compatible
way. We believe that it is a strong candidate for addition to the C++0x standard.
It introduces no source- or binary compatibility issues.
The proposed additions have been implemented in Boost and will be part of Boost 1.36. Performance tests have identified reader/writer scenarios where the above pattern achieves close to linear scalability, outperforming a read/write lock.
There are several possible approaches that lead to a lock-free implementation: hazard pointers, k-CAS, software transactional memory. The challenge is to consistently outperform the spinlock baseline implementation, and this part is not easy; that is why it makes sense to provide the functionality as part of the standard library.
Yes, the spinlock implementation is still worth having. For certain tasks and workloads, it still scales better than mutexes or read/write locks. See the performance section.
A spinlock-based implementation can be done non-intrusively in a TR, but a lock-free
implementation cannot, as it inherently requires access to shared_ptr
internals. We don't want to disallow lock-free implementations because they would
scale better. The area is a subject of active research and having a stable interface
as a target will, hopefully, lead to the emergence of such implementations.
We do not know whether a lock-free implementation will need to change shared_ptr
's
internal details, but the conservative assumption is that it might, which could
potentially require an ABI change unsuitable for a technical report.
shared_ptr
maintain a version counter?We believe that the interface is sufficient.
The need for a version counter is motivated by the read-upgrade-write pattern:
void maybe_writer() { obtain a read lock; use object; if( object needs updating ) { upgrade read lock to write lock; update object; } }
which in shared_ptr
terms looks like:
void maybe_writer() { shared_ptr<State> p = ps.atomic_load(); do { use *p; if( object doesn't need updating ) break; shared_ptr<State> q( new State( *p ) ); update *q; } while( !ps.atomic_compare_swap( p, move( q ) ) ); }
The other possible use for a version counter is to avoid the
ABA problem that is common for CAS-based algorithms. ABA cannot occur in
our case because the storage for the object referenced by p
cannot
be reused while p
is still alive.
std::atomic
specialization?
An std::atomic< shared_ptr<> >
specialization is not an
appropriate way to provide the functionality. The template parameter of std::atomic<T>
is required to have a trivial copy constructor and an assignment operator, for good
reasons; calling user code from within the atomic operations is a recipe for disaster.
Since the copy constructor and the assignment operator of shared_ptr
aren't trivial, it is not acceptable to instantiate std::atomic
on
it.
Providing a separate type atomic_shared_ptr<T>
is a legitimate
alternative. We have instead chosen to propose additions to the existing shared_ptr
interface for the following reasons:
shared_ptr
internals,
member functions seem the best place from an organizational point of view; it is
likely that they will be implemented by the same person who implements and maintains
shared_ptr
.shared_ptr
interface, we make it clear that
the support for atomic access and manipulation does not alter the type or the layout
of shared_ptr
.shared_ptr
instances atomically without sacrificing
the ability to keep them in containers. In particular, we could have an std::map<Key,
shared_ptr<State>>
and use the reader-writer pattern on the
elements of this map.atomic_shared_ptr
type and do
not feel confident that we can propose the right interface for it.atomic_shared_ptr
on top of shared_ptr
, but not vice versa.
Add to shared_ptr
[util.smartptr.shared] the following:
// [util.smartptr.shared.atomic], atomic access: shared_ptr atomic_load( memory_order mo = memory_order_seq_cst ) const; void atomic_store( shared_ptr r, memory_order mo = memory_order_seq_cst ); shared_ptr atomic_swap( shared_ptr r, memory_order mo = memory_order_seq_cst ); bool atomic_compare_swap( shared_ptr & v, shared_ptr w ); bool atomic_compare_swap( shared_ptr & v, shared_ptr w, memory_order success, memory_order failure );
Add a new section [util.smartptr.shared.atomic]:
Concurrent access to a
shared_ptr
instance from multiple threads does not introduce a data race if the access is done exclusively via the member functions in this section.The meaning of the arguments of type
memory_order
is explained in 29.1/1 [atomics.order] and 29.1/2.shared_ptr atomic_load( memory_order mo = memory_order_seq_cst ) const;Requires:
mo
shall not bememory_order_release
ormemory_order_acq_rel
.Returns:
*this
.Throws: nothing.
void atomic_store( shared_ptr r, memory_order mo = memory_order_seq_cst );Requires:
mo
shall not bememory_order_acquire
ormemory_order_acq_rel
.Effects:
swap( r )
.Throws: nothing.
shared_ptr atomic_swap( shared_ptr r, memory_order mo = memory_order_seq_cst );Effects:
swap( r )
.Returns: the previous value of
*this
.Throws: nothing.
bool atomic_compare_swap( shared_ptr & v, shared_ptr w, memory_order success, memory_order failure );Requires:
failure
shall not bememory_order_release
,memory_order_acq_rel
, or stronger thansuccess
.Effects: If
*this
is equivalent tov
, assignsw
to*this
and has synchronization semantics corresponding to to the value ofsuccess
, otherwise assigns*this
tov
and has synchronization semantics corresponding to to the value offailure
.Returns:
true
if*this
was equivalent tov
,false
otherwise.Throws: nothing.
Remarks: two
shared_ptr
instances are equivalent if they store the same pointer value and share ownership.bool atomic_compare_swap( shared_ptr & v, shared_ptr w );Returns:
atomic_compare_swap(v, w, memory_order_seq_cst, memory_order_seq_cst)
.
A straightforward implementation would use a per-instance spinlock, obtained from a spinlock pool keyed on a hash of the address of the instance. In pseudocode, the functions will be implemented as follows:
shared_ptr atomic_load() const { lock the spinlock for *this; shared_ptr r( *this ); unlock the spinlock for *this; return r; } void atomic_store( shared_ptr r ) { lock the spinlock for *this; swap( r ); unlock the spinlock for *this; } shared_ptr atomic_swap( shared_ptr r ) { lock the spinlock for *this; swap( r ); unlock the spinlock for *this; return r; } bool atomic_compare_swap( shared_ptr & v, shared_ptr w ) { lock the spinlock for *this; if( *this is equivalent to v ) { swap( w ); unlock the spinlock for *this; return true; } else { shared_ptr tmp( *this ); unlock the spinlock for *this; tmp.swap( v ); return false; } }
Note that the code carefully avoids destroying a non-empty shared_ptr
instance while holding the spinlock, as this may lead to user code being called
and present an opportunity for a deadlock. It also attempts to limit the spinlock
scope to the minimum necessary to prevent contention.
An implementation that closely follows this outline has already been committed to the Boost SVN and will be part of Boost release 1.36. The relevant files can be browsed online at:
We used
sp_atomic_mt2_test.cpp to evaluate the performance of the reader-writer
pattern from the overview. For comparison purposes we used
synchronization with a boost::detail::lightweight_mutex
, equivalent
to CRITICAL_SECTION
under Windows, and boost::shared_mutex
,
a reader/writer lock that allows multiple concurrent read locks and a single exclusive
write lock.
The test is intended to emulate a server process that fulfills client read or write requests from a number of worker threads.
We picked the following test parameters:
The results, obtained on a non-hyperthreaded dual core Pentium D (two hardware threads) under Windows XP64, are shown in the following tables.
Time in seconds:
Primitive | 1 thread | 2 threads | 4 threads |
mutex | 8.312 | 20.921 | 42.812 |
rwlock | 8.437 | 23.390 | 46.468 |
atomic access | 8.515 | 9.421 | 18.781 |
Operations per millisecond:
Primitive | 1 thread | 2 threads | 4 threads |
mutex | 120 | 96 | 93 |
rwlock | 119 | 86 | 86 |
atomic access | 117 | 212 | 213 |
It is clear that for this combination of parameters, the atomic access reader/writer pattern offers the best scalability.
Note that we make no claim that the atomic access approach unconditionally outperforms the others, merely that it may be best for certain scenarios that can be encountered in practice.
Thanks to Hans Boehm and Lawrence Crowl for reviewing this paper.
--end