Significant changes since P0652R2 are marked with blue.
There are many use cases where concurrent modification of an unordered associative container is required. For example, there is a popular web service named Memcached that simply caches objects in memory. We can also use such a container to collect statistics, to store connection metadata, or just as a lightweight key-value storage. Some big companies, such as Facebook, have one in their standard library. There were attempts to add such containers/data structures in the past (N3425, N3732, and others).
This paper is another attempt to deal with the problem. This time we are trying to keep the interface familiar to users and hard to misuse, but still functional.
Reference implementation is available at https://github.com/BlazingPhoenix/concurrent-hash-map.
When users wish to use a concurrent associative data structure, they are looking for performance and scalability. The fastest known implementations rely on open addressing (see MemC3), so the interface of this proposal allows an open-addressing implementation under the hood.
In C++17 the std::shared_ptr::unique() function was deprecated because users were misusing it. Users were hoping that the result of the function was still accurate at the point where they tried to use it. However, as soon as the result is returned from the function it may already be out of date, because another thread can modify the value.
We followed that decision and removed all the functions that are useless or dangerous in concurrent environments: size(), count(), empty(), bucket_count(), and so forth.
Iterators must take care of synchronization; otherwise, the user cannot safely dereference an iterator at all. But if iterators do synchronize, performance suffers and deadlocks become possible. For example, if the first thread iterates from the beginning of the container to the end while the second thread iterates from the end to the beginning, a deadlock happens in the middle, where the iterators meet.
It is possible to drop the open-addressing idea and make the iterators share ownership of buckets. In that case an iterator may keep its bucket alive. This seems implementable and usable, but it significantly affects performance by adding multiple additional atomic operations and additional indirections. We tried to stick to this idea for a long time and to minimize the performance impact. Finally, we created a list of use cases for concurrent associative data structures and found that in all of them iterators are either useless or kill the performance of the whole class (so they are also useless). Instead, we came up with the idea of an unsynchronized view/policy.
This is the killer feature of this proposal that attempts to fix all the limitations from above and provide a much more useful interface.
The idea is the following: multiple operations on unordered containers make sense only if the container is not concurrently modified. A user may take the responsibility that no one is modifying the container at this point and gain access to all the operations and iterators.
Such an approach allows initializing/preparing the data for the container without additional synchronization overhead. It also allows advanced usage:
- multiple threads use the concurrent_unordered_map for reads and writes simultaneously;
- multiple threads use a const unordered_map_view on the same concurrent_unordered_map simultaneously (no modifications through the concurrent_unordered_map interface are allowed!);
- a single thread uses an unordered_map_view (no modifications are allowed from other threads!);
- multiple threads use a const concurrent_unordered_map& for reads and multiple threads use a const unordered_map_view on the same concurrent_unordered_map simultaneously (ineffective: use multiple const unordered_map_view instead).
Iterator invalidation guarantees differ from those of std::unordered_map: this is a consequence of allowing open addressing as an underlying implementation.
This is a very risky decision because it unleashes new ways for deadlocking/breaking the container (users may recursively visit the same value; users may call heavy functions that will degrade the overall performance of the container; users can call some parallel functions of the standard library that may potentially use the same mutexes as the container implementation...).
However, there are many use cases where a value must be updated depending on the value currently stored in the container. Without visitation there is no way to do that safely, as all the functions return values by copy. See examples B and C.
We have also added an ability to visit all elements of the container without locking the whole container. In this case the contents may change during the iteration. Such functionality can be useful for debugging.
Some implementations (for example Intel TBB) provide a possibility to get an item from a container together with an appropriate lock guard. There are two possible ways to implement this: return an accessor object that keeps the element locked, or make mapped_type some lock-guarded type (for example boost::synchronized_value). With the latter we can get thread-safe shared ownership of the collection objects.
In line with the thoughts provided above, we decided not to propose such an interface, because it decreases performance and gives the user a very simple way to create a deadlock.
namespace std {
    template <class Key,
              class T,
              class Hash = hash<Key>,
              class Pred = equal_to<Key>,
              class Allocator = allocator<pair<const Key, T>>>
    class concurrent_unordered_map;

    template <class Key,
              class T,
              class Hash = hash<Key>,
              class Pred = equal_to<Key>,
              class Allocator = allocator<pair<const Key, T>>>
    void swap(concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& lhs,
              concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& rhs);
}
concurrent_unordered_map
namespace std {
    template <class Key,
              class T,
              class Hash = hash<Key>,
              class Pred = equal_to<Key>,
              class Allocator = allocator<pair<const Key, T>>>
    class concurrent_unordered_map {
    public:
        // type aliases
        using key_type = Key;
        using mapped_type = T;
        using value_type = pair<const Key, T>;
        using hasher = Hash;
        using key_equal = Pred;
        using allocator_type = Allocator;
        using size_type = implementation-defined;
        class unordered_map_view;

        // construct/copy/assign/destroy
        concurrent_unordered_map();
        explicit concurrent_unordered_map(size_type n);
        concurrent_unordered_map(size_type n, const Hash& hf,
                                 const key_equal& eql = key_equal(),
                                 const allocator_type& a = allocator_type());
        template <typename InputIterator>
        concurrent_unordered_map(InputIterator first, InputIterator last,
                                 size_type n = implementation-defined,
                                 const hasher& hf = hasher(),
                                 const key_equal& eql = key_equal(),
                                 const allocator_type& a = allocator_type());
        concurrent_unordered_map(const Allocator&);
        concurrent_unordered_map(initializer_list<value_type> il,
                                 size_type n = implementation-defined,
                                 const hasher& hf = hasher(),
                                 const key_equal& eql = key_equal(),
                                 const allocator_type& a = allocator_type());
        concurrent_unordered_map(concurrent_unordered_map&& other) noexcept;
        concurrent_unordered_map(concurrent_unordered_map&& other, const Allocator&);
        concurrent_unordered_map& operator=(concurrent_unordered_map&& other) noexcept;
        concurrent_unordered_map& operator=(initializer_list<value_type> il);
        ~concurrent_unordered_map();

        // observers
        allocator_type get_allocator() const;
        hasher hash_function() const;
        key_equal key_eq() const;

        // visitation
        template <typename Visitor> void visit(const key_type& key, Visitor& f);
        template <typename Visitor> void visit(const key_type& key, Visitor& f) const;
        template <typename Visitor> void visit_all(Visitor& f);
        template <typename Visitor> void visit_all(Visitor& f) const;
        template <typename K, typename Visitor, typename... Args>
        bool visit_or_emplace(K&& key, Visitor& f, Args&&... args);

        // access
        optional<mapped_type> find(const key_type& key) const;
        template <typename... Args>
        mapped_type find(const key_type& key, Args&&... args) const;

        // modifiers
        template <typename K, typename... Args> bool emplace(K&& key, Args&&... args);
        template <typename K, typename... Args> bool insert_or_assign(K&& key, Args&&... args);
        template <typename... Args> size_type update(const key_type& key, Args&&... args);
        size_type erase(const key_type& key);
        template <class H2, class P2>
        void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>& source);
        template <class H2, class P2>
        void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
        void swap(concurrent_unordered_map&)
            noexcept(allocator_traits<Allocator>::is_always_equal::value &&
                     is_nothrow_swappable_v<Hash> && is_nothrow_swappable_v<Pred>);
        void clear() noexcept;

        // view retrieval
        unordered_map_view make_unordered_map_view(bool lock = false) const noexcept;
    };
}
The concurrent_unordered_map class is an associative data structure that provides efficient key-value storage. Concurrent member function calls on the same instance of concurrent_unordered_map do not introduce data races.
key_type satisfies Cpp17MoveConstructible requirements. mapped_type satisfies Cpp17MoveConstructible requirements.
Unless specified otherwise, all the member functions of concurrent_unordered_map have the following additional requirements and remarks:
- Hash, Pred, and Allocator do not introduce data races [Note: It is safe to use the same instance of Hash, Pred, or Allocator concurrently. - end note].
- Key and T do not introduce data races for the following functions: all the constructors (including default, move, and copy constructors); copy and move assignments; destructor [Note: It is safe to concurrently use different instances of Key or T for copying, moving, etc. - end note].
- Member function calls do not introduce data races on the same instance of concurrent_unordered_map [Note: It is safe to concurrently use the same instance of concurrent_unordered_map. - end note].
- Calls to functions of concurrent_unordered_map that successfully modify keys or values synchronize with calls to functions successfully reading the value or key of the same keys [Note: Modifications of the key or value in one thread are visible to the thread that reads (or modifies) the same key or value. - end note] [Note: Implementations are encouraged to implement concurrent reads of the same key or value without contention. - end note].
- A destructor call for an unordered_map_view referencing an instance of concurrent_unordered_map synchronizes with calls to functions successfully reading a value or key of the same instance of concurrent_unordered_map [Note: The destructor of the unordered_map_view makes all the changes done through that unordered_map_view visible to all the threads that use the concurrent_unordered_map. - end note].
    concurrent_unordered_map();
    explicit concurrent_unordered_map(size_type n);
    concurrent_unordered_map(size_type n, const Hash& hf, const key_equal& eql = key_equal(), const allocator_type& a = allocator_type());
    template <typename InputIterator>
    concurrent_unordered_map(InputIterator first, InputIterator last, size_type n = implementation-defined, const hasher& hf = hasher(), const key_equal& eql = key_equal(), const allocator_type& a = allocator_type());
    concurrent_unordered_map(const Allocator&);
    concurrent_unordered_map(initializer_list<value_type> il, size_type n = implementation-defined, const hasher& hf = hasher(), const key_equal& eql = key_equal(), const allocator_type& a = allocator_type());

Effects: Constructs a concurrent_unordered_map with effects analogous to those of unordered_map.

    concurrent_unordered_map(concurrent_unordered_map&& other) noexcept;
    concurrent_unordered_map(concurrent_unordered_map&& other, const Allocator&);

Effects: Constructs a concurrent_unordered_map with the content of other, leaving other in a valid but unspecified state.
Remarks: The value of other before the constructor call may differ from the value of *this after the constructor call only in case of concurrent access to other.

    concurrent_unordered_map& operator=(concurrent_unordered_map&& other) noexcept;

Effects: Move-assigns the content of other to *this, leaving other in a valid but unspecified state.
Remarks: The value of other before the operator call may differ from the value of *this only in case of concurrent access to other or concurrent access to *this.

    concurrent_unordered_map& operator=(initializer_list<value_type> il);

Effects: Assigns the content of il to *this.
Remarks: The value of il before the operator call may differ from the value of *this only in case of concurrent access to *this.

    ~concurrent_unordered_map();
    allocator_type get_allocator() const;
    hasher hash_function() const;
    key_equal key_eq() const;

Requires: Copying of allocator_type, hasher, and key_equal does not introduce data races.
Returns: A copy of the allocator_type, hasher, or key_equal respectively.

    template <typename Visitor> void visit(const key_type& key, Visitor& f);

Requires: is_invocable_v<Visitor, mapped_type&> is true.
Effects: Invokes f on the value stored with a key equivalent to key, if such a key exists in *this.
Remarks: Invocations of f do not introduce data races.

    template <typename Visitor> void visit(const key_type& key, Visitor& f) const;

Requires: is_invocable_v<Visitor, const mapped_type&> is true.
Effects: Invokes f on the value stored with a key equivalent to key, if such a key exists in *this.
Remarks: Invocations of f do not introduce data races.

    template <typename Visitor> void visit_all(Visitor& f);
Requires: is_invocable_v<Visitor, const key_type&, mapped_type&> is true.
Effects: Invokes f on the key and value pairs stored in *this.
Remarks: Invocations of f do not introduce data races. It is guaranteed to visit all the keys of *this only if there are no concurrent modifications of *this. [Note: Invocation of f on some key and value does not prevent modifications of other keys and values in *this. - end note]

    template <typename Visitor> void visit_all(Visitor& f) const;

Requires: is_invocable_v<Visitor, const key_type&, const mapped_type&> is true.
Effects: Invokes f on the key and value pairs stored in *this.
Remarks: Invocations of f do not introduce data races. It is guaranteed to visit all the keys of *this only if there are no concurrent modifications of *this. [Note: Invocation of f on some key and value does not prevent modifications of other keys and values in *this. - end note]

    template <typename K, typename Visitor, typename... Args> bool visit_or_emplace(K&& key, Visitor& f, Args&&... args);
Requires: is_invocable_v<Visitor, mapped_type&> && is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> is true.
Effects: Invokes f on the value stored with the key equivalent to key, if such a key exists in *this. Otherwise stores key_type(std::forward<K>(key)) and mapped_type(std::forward<Args>(args)...).
Remarks: Constructions of key_type and mapped_type, and access or modification of the arguments passed into f, do not introduce data races.

    optional<mapped_type> find(const key_type& key) const;

Requires: is_copy_constructible_v<mapped_type> is true.
Returns: An empty optional if there is no key equivalent to key in *this; otherwise an optional holding a copy of the mapped_type for that key.

    template <typename... Args> mapped_type find(const key_type& key, Args&&... args) const;

Requires: is_copy_constructible_v<mapped_type> && is_constructible_v<mapped_type, Args&&...> is true.
Returns: mapped_type(std::forward<Args>(args)...) if there is no key equivalent to key in *this; otherwise a copy of the mapped_type for that key.

    template <typename K, typename... Args> bool emplace(K&& key, Args&&... args);
Requires: is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> is true.
Returns: true if the key was not in the container and the function emplaced it; false otherwise.

    template <typename K, typename... Args> bool insert_or_assign(K&& key, Args&&... args);

Requires: is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> && is_move_assignable_v<mapped_type> is true.
Returns: true if the key was not in the container and the function emplaced it; false if the key was in the container and the mapped_type for that key was replaced by move-assigning mapped_type(std::forward<Args>(args)...).

    template <typename... Args> size_type update(const key_type& key, Args&&... args);

Requires: is_constructible_v<mapped_type, Args&&...> && is_move_assignable_v<mapped_type> is true.
Returns: 0 if the key was not in the container; 1 if the key was in the container and the mapped_type for that key was replaced by move-assigning mapped_type(std::forward<Args>(args)...).

    size_type erase(const key_type& key);

Returns: 0 if the key was not in the container; 1 if the key was in the container and was erased.

    template <class H2, class P2> void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>& source);
    template <class H2, class P2> void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
Effects: Merges source into *this. For each key and value pair from source the following rules apply: if *this already has the key from source, the key and value are left in source; otherwise, the key and value are moved into *this.
[Note: If new values are being concurrently inserted into source during this operation, then at the end of this function invocation source may have keys that do not exist in *this. - end note]

    void swap(concurrent_unordered_map& other) noexcept(allocator_traits<Allocator>::is_always_equal::value && is_nothrow_swappable_v<Hash> && is_nothrow_swappable_v<Pred>);
Effects: Swaps the contents of other and *this.
[Note: If new values are being concurrently inserted into other or *this during this operation, then at the end of this function invocation other may not be equal to *this before the invocation, and *this may not be equal to other before the invocation. - end note]

    void clear() noexcept;

Effects: Erases all the keys and values in *this.
[Note: If new values are being concurrently inserted into *this during this operation, then at the end of this function invocation *this may contain some keys. - end note]

    unordered_map_view make_unordered_map_view(bool lock = false) const noexcept;

Effects: If lock is true, any concurrent access to *this through member functions, and any subsequent attempt to invoke this->make_unordered_map_view(true), blocks as long as the view is not destroyed.
Returns: An unordered_map_view referencing *this.
unordered_map_view
template <class Key, class T, class Hash, class Pred, class Allocator>
class concurrent_unordered_map<Key, T, Hash, Pred, Allocator>::unordered_map_view {
    concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& delegate; // exposition only
public:
    // types:
    using key_type = Key;
    using mapped_type = T;
    using value_type = pair<const Key, T>;
    using hasher = Hash;
    using key_equal = Pred;
    using allocator_type = Allocator;
    using pointer = typename allocator_traits<Allocator>::pointer;
    using const_pointer = typename allocator_traits<Allocator>::const_pointer;
    using reference = value_type&;
    using const_reference = const value_type&;
    using size_type = implementation-defined;
    using difference_type = implementation-defined;
    using iterator = implementation-defined;
    using const_iterator = implementation-defined;
    using local_iterator = implementation-defined;
    using const_local_iterator = implementation-defined;

    // construct/copy/destroy
    unordered_map_view() = delete;
    unordered_map_view(unordered_map_view&&) noexcept = delete;
    unordered_map_view(const unordered_map_view&) noexcept = delete;
    unordered_map_view& operator=(unordered_map_view&&) noexcept = delete;
    unordered_map_view& operator=(const unordered_map_view&) noexcept = delete;
    ~unordered_map_view();

    // iterators:
    iterator begin() noexcept;
    const_iterator begin() const noexcept;
    iterator end() noexcept;
    const_iterator end() const noexcept;
    const_iterator cbegin() const noexcept;
    const_iterator cend() const noexcept;

    // capacity:
    bool empty() const noexcept;
    size_type size() const noexcept;
    size_type max_size() const noexcept;

    // modifiers:
    template <typename... Args> pair<iterator, bool> emplace(Args&&... args);
    // We considered having only the forwarding-reference variant of insert
    // to avoid overloading the interface.
    template <class P> pair<iterator, bool> insert(P&& x);
    template <class InputIterator> void insert(InputIterator first, InputIterator last);
    void insert(initializer_list<value_type> il);
    template <typename... Args> pair<iterator, bool> try_emplace(const key_type& k, Args&&... args);
    template <typename... Args> pair<iterator, bool> try_emplace(key_type&& k, Args&&... args);
    template <class M> pair<iterator, bool> insert_or_assign(const key_type& k, M&& obj);
    template <class M> pair<iterator, bool> insert_or_assign(key_type&& k, M&& obj);
    iterator erase(iterator position);
    iterator erase(const_iterator position);
    size_type erase(const key_type& k);
    iterator erase(const_iterator first, const_iterator last);
    void swap(concurrent_unordered_map&)
        noexcept(allocator_traits<Allocator>::is_always_equal::value &&
                 is_nothrow_swappable_v<Hash> && is_nothrow_swappable_v<Pred>);
    void clear() noexcept;
    template <class H2, class P2> void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
    template <class H2, class P2> void merge(concurrent_unordered_multimap<Key, T, H2, P2, Allocator>&& source);

    // observers:
    allocator_type get_allocator() const;
    hasher hash_function() const;
    key_equal key_eq() const;

    // map operations:
    iterator find(const key_type& k);
    const_iterator find(const key_type& k) const;
    size_type count(const key_type& k) const;
    pair<iterator, iterator> equal_range(const key_type& k);
    pair<const_iterator, const_iterator> equal_range(const key_type& k) const;

    // element access:
    mapped_type& operator[](const key_type& k);
    mapped_type& operator[](key_type&& k);
    const mapped_type& at(const key_type& k) const;
    mapped_type& at(const key_type& k);

    // bucket interface:
    size_type bucket_count() const;
    size_type max_bucket_count() const;
    size_type bucket_size(size_type n);
    size_type bucket(const key_type& k) const;
    local_iterator begin(size_type n);
    const_local_iterator begin(size_type n) const;
    local_iterator end(size_type n);
    const_local_iterator end(size_type n) const;
    const_local_iterator cbegin(size_type n) const;
    const_local_iterator cend(size_type n) const;

    // hash policy:
    void rehash(size_type n);
    float load_factor() const noexcept;
};
}
The unordered_map_view class refers to a concurrent_unordered_map and provides an interface to it that satisfies the unordered associative container requirements [unord.req], except for the iterator invalidation requirements, the load_factor functions, the size() complexity requirements, and the bucket and node operations.
[ Note: Concurrent non-const member function calls on instances of unordered_map_view referencing the same concurrent_unordered_map introduce data races. - end note ]
[ Note: Concurrent member function calls on a concurrent_unordered_map instance A and on an unordered_map_view referencing the instance A introduce data races. - end note ]
    ~unordered_map_view();

Effects: If *this was created by make_unordered_map_view(true), resumes execution of all the operations waiting on the concurrent_unordered_map.

Example A:

#include <concurrent_unordered_map>
#include <atomic>
#include <chrono>
#include <iostream>
#include <memory>
#include <string_view>
#include <thread>

using namespace std;

void process_user(shared_ptr<const user_t> user, auto data);
auto get_new_user();
auto get_request();

int main() {
    concurrent_unordered_map<string_view, shared_ptr<user_t>> users;

    // single threaded fill
    read_users_from_file(users.make_unordered_map_view());

    constexpr unsigned threads_count = 10;
    while (1) {
        // concurrent work
        std::atomic<int> b{threads_count * 100500};
        thread threads[threads_count];
        for (auto& t : threads) {
            // processing users
            t = thread([&users, &b]() {
                while (--b > 0) {
                    auto [user_name, data] = co_await get_request();
                    shared_ptr<const user_t> user = users.find(user_name, shared_ptr<const user_t>{});
                    if (!user) continue;
                    process_user(user, data);
                }
            });
        }

        // accepting users
        auto start = std::chrono::system_clock::now();
        while (--b > 0) {
            auto [new_user_name, user] = co_await get_new_user();
            users.emplace(new_user_name, user);
            if (std::chrono::system_clock::now() - start > std::chrono::seconds(5)) {
                // debug print rough contents every 5 seconds
                users.visit_all([](const string_view& name, const shared_ptr<user_t>& user) {
                    cout << name << " " << user->status() << endl;
                });
                start = std::chrono::system_clock::now();
            }
        }

        for (auto& t : threads) {
            t.join();
        }

        // single threaded processing
        auto unsafe_users = users.make_unordered_map_view();
        count_statistics(unsafe_users);
        dump_to_file(unsafe_users);
        cleanup(unsafe_users);
    }
}
Example B:

#include <concurrent_unordered_map>
#include <algorithm>
#include <execution>
#include <memory>

int main() {
    using namespace std;
    using event_id = ...;

    struct event_data {
        event_data(const event_data&) = delete;
        event_data& operator=(const event_data&) = delete;
        ...
    };

    concurrent_unordered_map<event_id, unique_ptr<event_data>> events;

    // Getting unique events.
    auto event_generators = get_event_generators();
    for_each(execution::par_unseq, event_generators.begin(), event_generators.end(),
        [&events](auto& g) {
            while (1) {
                auto [event_name, data] = co_await g.get_event();
                if (event_name.empty()) {
                    break; // no more events
                }
                events.visit_or_emplace(event_name,
                    [&data](unique_ptr<event_data>& v) {
                        // keep the highest-priority data for this event
                        if (!v || v->priority() < data->priority()) {
                            std::swap(data, v);
                        }
                    },
                    unique_ptr<event_data>{});
            }
        });

    auto v = events.make_unordered_map_view();
    for_each(execution::par_unseq, v.begin(), v.end(), [](auto& e) {
        process(e.first, std::move(e.second));
    });
}
Example C:

#include <concurrent_unordered_map>
#include <thread>
#include <utility>

int main() {
    using namespace std;
    using id_t = ...;
    using use_count_t = size_t;

    concurrent_unordered_map<id_t, use_count_t> stats;

    constexpr unsigned threads_count = 10;
    thread threads[threads_count];
    for (auto& t : threads) {
        t = thread([&stats]() {
            while (1) {
                auto [id, data] = co_await get_something();
                // first occurrence stores 1; later ones increment in place
                stats.visit_or_emplace(id, [](auto& v) { ++v; }, 1);
                process_stuff(id, data);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }

    process_stats(stats.make_unordered_map_view());
}
Write fraction | Thread count | boost::synchronized_value<std::unordered_map>, ms | std::concurrent_unordered_map prototype, ms | Sharded unordered map, ms | libcuckoo cuckoohash_map, ms | folly::ConcurrentHashMap, ms |
---|---|---|---|---|---|---|
1/1 | 1 | 358 | 324 | 413 | 332 | 373 |
1/1 | 2 | 2034 | 563 | 1347 | 444 | 724 |
1/1 | 4 | 4236 | 678 | 2047 | 619 | 1412 |
1/1 | 8 | 15234 | 872 | 3382 | 703 | 2760 |
1/1 | 16 | 32094 | 1248 | 5593 | 1370 | 5889 |
1/1 | 32 | 73701 | 2123 | 8669 | 1585 | 13838 |
1/1 | 64 | 165842 | 5624 | 12076 | 7264 | 26686 |
1/5 | 1 | 274 | 178 | 294 | 238 | 333 |
1/5 | 2 | 1157 | 169 | 805 | 195 | 347 |
1/5 | 4 | 2565 | 189 | 1075 | 222 | 380 |
1/5 | 8 | 7562 | 400 | 1550 | 262 | 620 |
1/5 | 16 | 17829 | 558 | 2730 | 488 | 1368 |
1/5 | 32 | 40953 | 622 | 4826 | 598 | 2771 |
1/5 | 64 | 91353 | 1902 | 6925 | 3423 | 5332 |
1/20 | 1 | 199 | 146 | 257 | 189 | 192 |
1/20 | 2 | 694 | 152 | 426 | 193 | 196 |
1/20 | 4 | 1425 | 216 | 622 | 260 | 271 |
1/20 | 8 | 4483 | 167 | 1212 | 279 | 264 |
1/20 | 16 | 12908 | 200 | 1341 | 292 | 337 |
1/20 | 32 | 30692 | 310 | 2219 | 407 | 620 |
1/20 | 64 | 70597 | 703 | 3266 | 2795 | 1535 |