1. Motivation and Scope
[N3657] merged into C++14 IS introduced heterogeneous lookup support for ordered associative containers
(
,
, etc) to the C++ Standard Library. Authors of that document pointed that
the requirement to construct (either implicitly or explicitly) the object of
to do the lookup
may be really expensive.
Unordered containers still lack support for such functionality and users are often hit by that performance problem.
1.1. Performance related concerns
Consider such use case:
std :: unordered_map < std :: string , int > map = /* ... */ ; auto it1 = map . find ( "abc" ); auto it2 = map . find ( "def" sv );
In C++17 above code will construct
temporary and then will compare it with container’s
elements to find the key. There is no implementation-specific reason to prevent lookup by an arbitrary
key type
, as long as
for any key
in the map, if
.
1.2. Design related concerns
Another motivating case is mentioned in [N3573]. Consider:
Whilst it’s possible to insert
into the set, there are no means to erase or test for
membership, as that would involve constructing two
to the same resource.
In such a case C++ developer is forced to either:
-
Weaken the design and not use smart pointers for memory ownership management which may result int stability or security issues.
-
Provide custom stateful (memory overhead) deleter that only optionally destroys the managed resource as suggested by [STACKOVERFLOW-1]:
class opt_out_deleter { bool delete_ ; public : explicit opt_out_deleter ( bool do_delete = true) : delete_ { do_delete } {} template < typename T > void operator ()( T * p ) const { if ( delete_ ) delete p ; } }; template < typename T > using set_unique_ptr = std :: unique_ptr < T , opt_out_deleter > ; template < typename T > set_unique_ptr < T > make_find_ptr ( T * raw ) { return set_unique_ptr < T > { raw , opt_out_deleter { false}}; } set_unique_ptr set = /* ... */ ; auto it = set . find ( make_find_ptr ( raw ));
-
Use
instead which again results in memory overhead.std :: unordered_map < T * , std :: unique_ptr < T >>
The similar code may also have a different side effect. Let’s consider:
struct my_data { size_t i ; std :: array < char , 256 > data ; explicit my_data ( size_t i_ ) : i { i_ } { std :: iota ( begin ( data ), end ( data ), 0 ); } }; struct my_data_equal { bool operator ()( const std :: unique_ptr < my_data >& l , const std :: unique_ptr < my_data >& r ) const { return l -> i == r -> i ; } }; struct my_data_hash { size_t operator ()( const std :: unique_ptr < my_data >& v ) const { return std :: hash < size_t > {}( v -> i ); } }; using my_set = std :: unordered_set < std :: unique_ptr < my_data > , my_data_hash , my_data_equal > ; my_set set = /* ... */ ; auto it = set . find ( std :: make_unique < my_data > ( 1 ));
This case not only introduces a dynamic memory allocation related performance hit on every lookup but also messes up with nicely defined ownership strategy.
2. Prior Work
[N3573] tried to address this issue. While the motivation described in that paper sounds reasonable the proposed solution goes too far and may cause problems. See §4 Design Decisions for more details.
3. Impact On The Standard
This proposal modifies the unordered associative containers in
and
by
overloading the lookup member functions with member function templates.
There are no language changes.
Almost all existing C++17 code is unaffected because new member functions are disabled from overload
resolution process unless
template parameter has
property. That is not the case
for the code created before this proposal.
4. Design Decisions
4.1. Heterogeneous hash function object
[N3573] paper suggests adding
namespace std { template < typename T = void > struct hash ; template <> struct hash < void > { template < typename T > std :: size_t operator ()( T && t ) { return std :: hash < typename std :: decay < T >:: type > ()( std :: forward < T > ( t )); } }; }
While that could be useful and compatible with changes introduced for many operations in [N3421], there is too big chance of two types being equality-comparable but having incompatible hashes.
Following issue was pointed out in the [REFLECTOR-1].
For example, under gcc 7.2.0,
std :: hash < long > {}( - 1L ) == 18446744073709551615ULL std :: hash < double > {}( - 1.0 ) == 11078049357879903929ULL
which makes following code fail
std :: unordered_set < double , std :: hash <> , std :: equal_to <>> s ; s . insert ( - 1L ); // Internally converts -1L to -1.0 assert ( s . find ( - 1L ) != s . end ()); // Fails: calls hash<long>(-1L) and gets the wrong bucket
Note that under C++17 rules that code succeeds, because
also converts its parameter to
before hashing.
This proposal intentionally does not suggest standardizing heterogeneous hash function object
. Doing that might be tempting but it requires more investigations and
can be always added via future proposals.
4.2. Additional parameters in lookup member functions overloads
[N3573] also proposes adding additional parameters to lookup functions so the users may provide different hash and equality comparison functor objects for each member function call.
template < typename T , typename Hash = std :: hash <> , typename Eq = std :: equal_to <>> iterator find ( T t , Hash h = Hash (), Eq e = Eq ()); template < typename T , typename Hash = std :: hash <> , typename Eq = std :: equal_to <>> const_iterator find ( T t , Hash h = Hash (), Eq e = Eq ()) const ;
That is not consistent with the current interface of ordered associative containers and therefore it is not proposed by this paper. If such functionality is considered useful it can be added in the future by other paper both for ordered and unordered associative containers.
4.3. Lookup member functions template overloads
For consistency reasons this paper proposes heterogeneous lookup for unordered associative containers
should be provided by the similar means as it is the case for ordered ones. Containers will only
change their interface when hash functor will define nested tag type called
that specifies transparent equality comparator type to be used by the container instead of a type
provided (or default type) for
template parameter.
The container will fail to compile (with proper diagnostics applied) when:
-
equality comparator type provided by the
is not transparent (does not providehasher :: transparent_key_equal
tag type)is_transparent -
container’s template argument is neitherPred -
the default type of that template parameter (namely
)equal_to < Key > -
the same type as provided by the hasher via
tagtransparent_key_equal
-
member type of the container will specify either:
-
the type provided by the
template argument of the container (or its default type) in case the heterogeneous lookup is disabledPred -
the type provided by the
tag of the hash functor object otherwisetransparent_key_equal
Note: Changing the specification of the default type in container’s template parameters would cause the ABI break, therefore, it is not suggested by that proposal.
By providing explicit tag
in the hash functor object, the user explicitly states that
the intention of this type is to provide coherent and interchangeable hash values for all the types
supported by the functor’s call operators.
Concerns raised in §1 Motivation and Scope are addressed by this proposal in the following way:
struct string_hash { using transparent_key_equal = std :: equal_to <> ; // Pred to use using hash_type = std :: hash < std :: string_view > ; // just a helper local type size_t operator ()( std :: string_view txt ) const { return hash_type {}( txt ); } size_t operator ()( const std :: string & txt ) const { return hash_type {}( txt ); } size_t operator ()( const char * txt ) const { return hash_type {}( txt ); } }; std :: unordered_map < std :: string , int , string_hash > map = /* ... */ ; map . find ( "abc" ); map . find ( "def" sv );
Note that in the above example the 4th template argument (
) is intentionally skipped and
will be overwritten with the type provided by the
.
In case the user needs to provide custom
type the
arguments needs to match the type
provided by
:
std :: unordered_map < std :: string , int , string_hash , string_hash :: transparent_key_equal , std :: allocator < std :: pair < const std :: string , int >>> map = /* ... */ ;
To find more details on how to address all code examples provided in this paper please refer to §7 Implementation Experience.
5. Proposed Wording
The proposed changes are relative to the working draft of the standard as of [n4762].
Modify 21.2.7 [unord.req] paragraph 5 as follows:
Two valuesand
k1
k2 of type Keyare considered equivalent if the container’s key equality predicateis valid and returns
pred ( k1 , k2 ) true
when passed those values. Ifand
k1 are equivalent, the container’s hash function shall return the same value for both. ...
k2
Modify 21.2.7 [unord.req] paragraph 11 as follows:
In Table 91:denotes an unordered associative container class,
X denotes a value of type
a ,
X denotes a value of a type with nodes compatible with type
a2 (Table 89),
X denotes a possibly
b value of type
const ,
X denotes a value of type
a_uniq when
X supports unique keys,
X denotes a value of type
a_eq when
X supports equivalent keys,
X denotes a possibly
a_tran value of type
const when the qualified-id
X is valid and denotes a type (12.9.2),
X :: hasher :: transparent_key_equal and
i denote input iterators that refer to
j ,
value_type denotes a valid range,
[ i , j ) and
p denote valid constant iterators to
q2 ,
a and
q denote valid dereferenceable constant iterators to
q1 ,
a denotes a valid dereferenceable iterator to
r ,
a denotes a valid range in
[ q1 , q2 ) ,
a denotes a value of type
il ,
initializer_list < value_type > denotes a value of type
t ,
X :: value_type denotes a value of type
k ,
key_type denotes a possibly
hf value of type
const ,
hasher denotes a possibly
eq value of type
const ,
key_equal is a value such that (1)
ke with
eq ( r1 , ke ) == eq ( ke , r1 ) the key value of
r1 and
e in
e , (2)
a_tran if
hf ( r1 ) == hf ( ke ) is
eq ( r1 , ke ) true
, and (3)where
( eq ( r1 , ke ) && eq ( r1 , r2 )) == eq ( r2 , ke ) is the key of an element in
r2 ,
a_tran denotes a value of type
n ,
size_type denotes a value of type
z , and
float denotes a non-const rvalue of type
nh .
X :: node_type
Modify table 72 in section 21.2.7 [unord.req] as follows:
Expression Return type Assertion/note pre-/post-condition Complexity X::key_equal Predif such a qualified-id is valid and denotes a type (12.9.2); otherwise,
Hash :: transparent_key_equal .
Pred Requires: is
key_equal .
CopyConstructible shall be a binary predicate that takes two arguments of type
key_equal .
Key is an equivalence relation.
key_equal compile time ...
b . find ( k ) ;
iterator for const
const_iterator .
b Returns an iterator pointing to an element with key equivalent to , or
k if no such element exists.
b . end () Average case O(1), worst case O( ).
b . size ()
a_tran . find ( ke ) ;
iterator for const
const_iterator .
a_tran Returns an iterator pointing to an element with key equivalent to , or
ke if no such element exists.
a_tran . end () Average case O(1), worst case O( ).
a_tran . size ()
b . count ( k )
size_type Returns the number of elements with key equivalent to .
k Average case O( ), worst case O(
b . count ( k ) ).
b . size ()
a_tran . count ( ke )
size_type Returns the number of elements with key equivalent to .
ke Average case O( ), worst case O(
a_tran . count ( ke ) ).
a_tran . size ()
b . contains ( k ) bool Equivalent to
b . find ( k ) != b . end () Average case O(1), worst case O( )
b . size ()
a_tran . contains ( ke ) bool Equivalent to
a_tran . find ( ke ) != a_tran . end () Average case O(1), worst case O( )
a_tran . size ()
b . equal_range ( k ) ;
pair < iterator , iterator > for const
pair < const_iterator , const_iterator > .
b Returns a range containing all elements with keys equivalent to . Returns
k if no such elements exist.
make_pair ( b . end (), b . end ()) Average case O( ), worst case O(
b . count ( k ) ).
b . size ()
a_tran . equal_range ( ke ) ;
pair < iterator , iterator > for const
pair < const_iterator , const_iterator > .
a_tran Returns a range containing all elements with keys equivalent to . Returns
ke if no such elements exist.
make_pair ( a_tran . end (), a_tran . end ()) Average case O( ), worst case O(
a_tran . count ( ke ) ).
a_tran . size ()
Add new paragraphs (18, 19) in 21.2.7 [unord.req]:
If the qualified-idis valid and denotes a type (12.9.2), then the program is ill-formed if either:
Hash :: transparent_key_equal
qualified-id
is not valid or does not denote a type, or
Hash :: transparent_key_equal :: is_transparent
is a different type than
Pred or
equal_to < Key > .
Hash :: transparent_key_equal
The member function templates,
find ,
count , and
equal_range shall not participate in overload resolution unless the qualified-id
contains is valid and denotes a type (12.9.2).
Hash :: transparent_key_equal
Modify 21.5.4.1 [unord.map.overview] paragraph 3 as follows:
namespace std { template < class Key , class T , class Hash = hash < Key > , class Pred = equal_to < Key > , class Allocator = allocator < pair < const Key , T >>> class unordered_map { public : // types using key_type = Key ; using mapped_type = T ; using value_type = pair < const Key , T > ; using hasher = Hash ; using key_equal = Pred ; using key_equal = see [ unord . req ] ; using allocator_type = Allocator ;
// map operations: iterator find ( const key_type & k ); const_iterator find ( const key_type & k ) const ; template < class K > iterator find ( const K & k ); template < class K > const_iterator find ( const K & k ) const ; size_type count ( const key_type & k ) const ; template < class K > size_type count ( const K & k ) const ; bool contains ( const key_type & k ) const ; template < class K > bool contains ( const K & k ) const ; pair < iterator , iterator > equal_range ( const key_type & k ); pair < const_iterator , const_iterator > equal_range ( const key_type & k ) const ; template < class K > pair < iterator , iterator > equal_range ( const K & k ); template < class K > pair < const_iterator , const_iterator > equal_range ( const K & k ) const ;
In 21.5.5.1 [unord.multimap.overview] add:
namespace std { template < class Key , class T , class Hash = hash < Key > , class Pred = equal_to < Key > , class Allocator = allocator < pair < const Key , T >>> class unordered_multimap { public : // types using key_type = Key ; using mapped_type = T ; using value_type = pair < const Key , T > ; using hasher = Hash ; using key_equal = Pred ; using key_equal = see [ unord . req ] ; using allocator_type = Allocator ;
// map operations: iterator find ( const key_type & k ); const_iterator find ( const key_type & k ) const ; template < class K > iterator find ( const K & k ); template < class K > const_iterator find ( const K & k ) const ; size_type count ( const key_type & k ) const ; template < class K > size_type count ( const K & k ) const ; bool contains ( const key_type & k ) const ; template < class K > bool contains ( const K & k ) const ; pair < iterator , iterator > equal_range ( const key_type & k ); pair < const_iterator , const_iterator > equal_range ( const key_type & k ) const ; template < class K > pair < iterator , iterator > equal_range ( const K & k ); template < class K > pair < const_iterator , const_iterator > equal_range ( const K & k ) const ;
In 21.5.6.1 [unord.set.overview] add:
namespace std { template < class Key , class Hash = hash < Key > , class Pred = equal_to < Key > , class Allocator = allocator < pair < const Key , T >>> class unordered_set { public : // types using key_type = Key ; using value_type = Key ; using hasher = Hash ; using key_equal = Pred ; using key_equal = see [ unord . req ] ; using allocator_type = Allocator ;
// set operations: iterator find ( const key_type & k ); const_iterator find ( const key_type & k ) const ; template < class K > iterator find ( const K & k ); template < class K > const_iterator find ( const K & k ) const ; size_type count ( const key_type & k ) const ; template < class K > size_type count ( const K & k ) const ; bool contains ( const key_type & k ) const ; template < class K > bool contains ( const K & k ) const ; pair < iterator , iterator > equal_range ( const key_type & k ); pair < const_iterator , const_iterator > equal_range ( const key_type & k ) const ; template < class K > pair < iterator , iterator > equal_range ( const K & k ); template < class K > pair < const_iterator , const_iterator > equal_range ( const K & k ) const ;
In 21.5.7.1 [unord.multiset.overview] add:
namespace std { template < class Key , class Hash = hash < Key > , class Pred = equal_to < Key > , class Allocator = allocator < pair < const Key , T >>> class unordered_multiset { public : // types using key_type = Key ; using value_type = Key ; using hasher = Hash ; using key_equal = Pred ; using key_equal = see [ unord . req ] ; using allocator_type = Allocator ;
// set operations: iterator find ( const key_type & k ); const_iterator find ( const key_type & k ) const ; template < class K > iterator find ( const K & k ); template < class K > const_iterator find ( const K & k ) const ; size_type count ( const key_type & k ) const ; template < class K > size_type count ( const K & k ) const ; bool contains ( const key_type & k ) const ; template < class K > bool contains ( const K & k ) const ; pair < iterator , iterator > equal_range ( const key_type & k ); pair < const_iterator , const_iterator > equal_range ( const key_type & k ) const ; template < class K > pair < iterator , iterator > equal_range ( const K & k ); template < class K > pair < const_iterator , const_iterator > equal_range ( const K & k ) const ;
6. Feature Testing
Add the following row to a Table 35 in 16.3.1 [support.limits.general] paragraph 3:
Macro name | Value | Header(s) |
---|---|---|
... | ||
__cpp_lib_generic_associative_lookup | 201304L | <map> <set> |
__cpp_lib_generic_unordered_lookup | <unordered_map> <unordered_set> |
7. Implementation Experience
Changes related to this proposal as well as answers to all of the code examples provided in this paper are partially implemented in GitHub repo against libc++ 5.0.0.
Simple performance tests provided there proved more than:
-
20% performance gain for short text (SSO used in
temporary) in EXAMPLE 1std :: string -
35% performance gain for long text (dynamic memory allocation in
temporary) in EXAMPLE 1std :: string -
85% performance gain in EXAMPLE 3
8. Possible Future Extensions
§4.1 Heterogeneous hash function object and §4.2 Additional parameters in lookup member functions overloads are not proposed by this paper but can be explored and proposed in the future.
9. Revision History
9.1. r2 ➡ r3 [diff]
-
Rebased to [n4762]
-
Wording updated according to the LWG feedback
-
Feature test macro name changed
9.2. r1 ➡ r2 [diff]
-
Added support for
member functions introduced by [P0458r2]contains ()
9.3. r0 ➡ r1 [diff]
The paper was reviewed by LEWG at the 2018 Jacksonville meeting and resulted with the following straw polls
-
Do we want encourage to do more work in that subject? Unanimous consent.
-
Put the comparator in the hash (and forbid specifying it as an explicit 4th template parameter).
| SF | F | N | A | SA | | 6 | 8 | 2 | 0 | 2 |
r1 changes the way the transparent equality comparator is provided to the class template. Instead of depending on the user to do the right thing in providing both hasher and comparator that are transitive and compatible with each other, now the design forces hasher to provide compatible comparator.
10. Acknowledgements
Thanks to Casey Carter for initial review of this proposal and help with wording issues.
Special thanks and recognition goes to Epam Systems for supporting my membership in the ISO C++ Committee and the production of this proposal.