map
and
unordered_map
Document #: | P3091R0 |
Date: | 2024-02-06 10:36 EST |
Project: | Programming Language C++ |
Audience: |
LEWGI |
Reply-to: |
Pablo Halpern <phalpern@halpernwightsoftware.com> |
The most convenient way to look up an element of a
map
orunordered_map
is to use the index operator, i.e.,theMap[key]
. This operator has a number of limitations, however: 1) it does not work forconst
containers, 2) it does not work if the mapped type is not default-constructible, 3) it modifies the container if the key is not found 4) a default-constructed value is not always the desired value when the key is not found. These limitations often require that the user resort to using thefind
member function, which returns an iterator that points to apair
and typically leads to more complex code having at least oneif
statement and/or duplicate lookup operations. This paper take inspiration from other languages, especially Python, and proposes three simple member functions (get
,get_ref
, andget_as
) that simplify lookups formap
andunordered_map
in situations where the index operator is suboptimal.
Just about every modern computer language has, as part of the
language or its standard library, one or more associative containers
that uniquely associates a key with a mapped value, variously called
map
,
hash_map
,
dictionary
, or something
similar. In C++ we have
std::map
,
std::unordered_map
, and
eventually (per [P0429R9] and [P2767R2])
std::flat_map
. The easy-to-write
and easy-to-read expression for getting a value for an associated key is
simply, theMap[key]
; in other
words, a mapped value is retrieved (and often set) using the index
operator, which returns a reference to the value associated with the
key. Unfortunately, the index operator in the C++ associative containers
has a number of shortcomings compared to many other languages:
const
containers.Consider a std::map
named
theMap
, that maps an integer key
to a floating-point value, modeling a sparse array of
double
. If we want to find the
largest of the double
values
mapped to the integer keys in the range 1 to 100, we might be tempted to
write the following loop:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
= std::max(largest, theMap[i]); largest
If theMap
is
const
, the loop will not
compile. If any of the keys in the range
[1, 100]
are absent from the
map, then theMap[i]
will return
a default-constructed double
having value 0.0, which might or might not be desirable, depending on
whether we want to treat missing values as truly having value 0.0 or
whether they should be ignored (or, equivalently, treated as having
value -INF). Finally if, before executing this loop,
theMap
contains only, say, 5
entries, at the end of the loop it will contain at least 100 entries,
most of which will have zero values.
There are alternatives that avoid all of these shortcomings, but the
alternatives are significantly less elegant, and therefore more error
prone. For example, the at
member function looks a lot like
operator[]
and has none of the
above shortcomings, but missing keys are handled by throwing exceptions,
making it impractical for situations other than when the key is almost
certain to exist. Moreover, a
try
-catch
block within a loop is rarely a clean way to structure code:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
{
try {
= std::max(largest, theMap.at(i));
largest } catch (const std::out_of_range&) { }
}
The above code would work with a
const
map, would ignore missing
elements (rather than treating them as zeros), and would not populate
the map with useless entries, but many would argue that the loop is
inelegant, at best. In most C++ implementations, it would be extremely
inefficient unless key misses are rare.
The other obvious alternative uses the
find
member function:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
{
auto iter = theMap.find(i);
if (iter != theMap.end())
= std::max(largest, iter->second);
largest }
This version of the loop is only slightly more verbose than our
starting point and avoids all of the issues, but the use of iterators,
the need to call two member functions
(find
and
end
), and having to extract the
second
element of the element
pair for what should be a simple operation increases the
cognitive load on both the programmer and the reader. Moreover, a
generic use of find
can yield a
subtle bug. Consider function template
f
:
template <class Key, class Value>
void f(const Key& k, const std::map<Key, Value>& aMap)
{
= some-default-obj-value-expression;
Value obj auto iter = aMap.find(k);
if (iter != aMap.end())
= iter->second;
obj // code that uses `obj` ...
}
The above code will not compile unless
Value
is copy-assignable. Worse,
unless tested with a non-assignable type, the bug could go undetected
for a long time. One fix would be initialize
obj
in a single expression:
= aMap.count(k) ? aMap.at(k) : some-default-obj-value-expression; Value obj
While correct, this solution involves two lookup operations: One for
the count
and one for the
at
. A better fix requires a bit
more code:
auto iter = aMap.find(k);
= iter != aMap.end() ? iter->second : some-default-obj-value-expression; Value obj
Note that the last solution again involves
iterator
,
pair
, and a conditional
expression, which is a far cry from the simplicity of
operator[]
.
What’s desired is a simple member function that, given a key, returns
the mapped value if the key exists and a specified alternative
if the value does not exist. Inspired by a similar member of Python
dictionaries, I propose a get
member function and related
get_ref
and
get_as<T>
member functions
for all C++ dictionary-like associative containers. A slightly
simplified version of set of the proposed prototypes is:
// Return by value
template <class... Args>
(const key_type& key, Args&&... args) const;
mapped_type get
// Return by reference
template <class Arg>
<mapped_type&, Arg&> get_ref(const key_type& key, Arg& ref);
common_reference_t
template <class Arg>
<const mapped_type&, Arg&> get_ref(const key_type& key, Arg& ref) const;
common_reference_t
// Return as a specific type
template <class R, class... Args>
(const key_type& key, Args&&... args);
R get_as
template <class R, class... Args>
(const key_type& key, Args&&... args) const; R get_as
In each case, if key
is
found, the corresponding mapped value is returned, otherwise, an
alternative return value is constructed from the remaining arguments.
The differences among the proposed member functions are in the way that
the return types are determined.
get
member functionThe proposed get
member is
the simplest to use and is suitable for types having relatively
inexpensive copy constructors. If the key is found, the return value is
a copy of the mapped value, otherwise the return value is
constructed from args
. If the
key is not found and args
is an
empty pack, then the return value is default constructed.
Returning by value is not as expensive as it once was because the
materialization rules (formerly copy elision rules) mean that
fewer temporaries are created. Moreover, returning by value allows the
alternative value to be specified as a prvalue such as the literal
-1
or
nullptr
.
get_ref
member functionSometimes returning a reference to the mapped item is desirable,
either for efficiency (to avoid an expensive copy constructor) or
functionality (if you want to modify the item or inspect its address).
The get_ref
member provides the
necessary functionality. Consider an example where we modify mapped
values through the reference returned from
get_ref
:
::map<std::string, int> theMap;
std// ...
// Increment the entries matching `names`, but only if they are already in `theMap`.
for (const auto& name : names) {
int temp = 0; // Value is irrelevant
++theMap.get_ref(name, temp); // increment through reference
// Possibly-modified value of `temp` is discarded here.
}
Note that the second argument to
get_ref
must be a single lvalue,
in this case a local variable that will be discarded whether or not it
is modified.
The function’s interface is designed to avoid accidentally binding an
rvalue to a const
reference
argument, even when the map is
const
. For example, the
temporary object created when converting a string literal argument to
std::string
would go out of
scope before ref
goes out of
scope (lifetime extension is not transitive). The interface is designed
so that this error will be caught at compile time:
void f(const std::map<int, std::string>& theMap) {
const std::string& ref = theMap.get_ref(0, "zero"); // ERROR: temporary `std::string("zero")`
// ...
}
Finally, the return type for
get_ref
is the common
reference type between
mapped_type&
(or
const mapped_type&
) and the
reference passed as the second argument. This definition makes it easy
to generate a meaningful result when mixing cv-qualifications or when
mixing base-class references with derived-class references in a single
call:
void g(const std::map<int, int>& theMap)
{
// ...
int alt = 0;
auto& ref = theMap.get_ref(key, alt); // `ref` has type `const int&`
// ...
}
The result of get_ref
is
const
if either or both of the
map or alternative reference are
const
. Thus
ref
is
const
in the example above
because theMap
is a
const
lvalue, even though
alt
is non-const. A similar
normalization occurs when mixing base-class and derived-class
references:
::map<int, Derived> theMap;
std// ...
{ ... };
Base altauto& ref = theMap.get_ref(key, alt); // `ref` has type `Base&`
get_as
member functionThe get_as
member allows the
user to specify the desired return type. It is useful when the desired
type is convertible from the mapped type and where a more
efficient construction is possible for the alternative value. A
ubiquitous example is
std::string_view
, which can be
constructed efficiently from a
std::string
or a
char
array. The
get_as
member allows the
mapped_type
to be
std::string
and the alternative
value to be char[]
without
generating extra temporary values:
::map<int, std::string> theMap;
std// ...
::string_view sv = theMap.get_as<std::string_view>(key, "none"); std
The above example creates the resulting
std::string_view
from either the
std::string
stored in the map or
the const char[5]
passed as the
alternative value, without creating an intervening
std::string
. It would be
inefficient to convert the char
array to std::string
before
converting it to a
std::string_view
and it would
create lifetime issues:
// BUG: Dangling reference converting returned temporary `string` to `string_view`
::string_view sv = theMap.get(key, "none");
std
// ERROR: cannot bind temporary `string` to `string&` parameter
::string_view sv = theMap.get_ref(key, "none"); std
If the specified return type is a reference type, then
get_as
behaves much like
get_ref
except that the return
type is exactly the specified type, rather than the deduced common
reference type between the map and the alternative argument:
::map<int, int> theMap;
stdconst int zero = 0;
auto& v1 = theMap.get_as<int&>(0, zero); // ERROR: `const` mismatch
auto& v2 = theMap.get_as<const int&>(0, zero); // OK
The following tables shows how operations are simplified using the
proposed new member functions. In these examples
K
,
T
, and
U
are object types;
m
is an object of (possibly
cv-)
std::map<K, T>
,
unordered_map<K, T>
or
flat_map<K, T>
,
k
is a value compatible with
K
,
v
is an lvalue of type (possibly
cv-) T
, and
a1..aN
are arguments that can
used to construct an object of type
T
.
Before
|
After/Alternative
|
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The name get
is borrowed from
the Python dictionary member of the same name. Other names considered
were lookup
,
lookup_or
, and
get_or
; the latter two follow
the precedent of
std::optional<T>::value_or
.
The get
name was selected for
conciseness and familiarity.
optional
or
expected
Some of the deficiencies of
operator[]
could be addressed by
adding a member function that returns an
optional<mapped_type>
object:
// ALTERNATIVE (not proposed)
auto get(const key_type& k) const -> optional<mapped_type>;
Better yet, if [P2988R1] is adopted,
it could return an
optional<T&>
:
// ALTERNATIVE (not proposed) with `optional<T&>` per [P2988R1]
auto get(const key_type& k) -> optional<mapped_type&>;
auto get(const key_type& k) const -> optional<const mapped_type&>;
While this approach solves the issues listed in the motivation
section of this paper, typical use would require an
if
statement (making it almost
as verbose to use as find
), or
the use of value_or
, making it
syntactically similar to, but more complicated, than the proposed
get
member.
Returning T
(proposed)
|
Returning
optional<T> (not
proposed)
|
---|---|
|
|
In the second column, a call to
get
, followed by a call to
value_or
on the returned
optional
causes the mapped value
to be copied twice: first into the
optional
value, then again into
the temporary returned by
value_or
. Even if
optional<T&>
is
adopted, per [P2988R1], there
would be no equivalent to
get_ref
(because [P2988R1] currently has
value_or
return by value, even
for a reference value_type
) or
get_as
. Ultimately, it was
decided that the get
interface
returning optional
does not
provide a substantial advantage over the proposed interface, especially
in the absence of optional references.
get
and
get_ref
into a single functionIf the Args
parameter pack
happens to consist of a single lvalue reference that is compatible with
mapped_type&
, it would be
possible to return a reference instead of a value, obviating
get_ref
as a separate function.
Such a “simplification” was seen as confusing, especially to a human
reader, who must analyze the code carefully to determine whether
non-const operations on the returned entity might change a value in the
original map.
get_as<mapped_type&>
instead of get_ref
During development of the reference implementation for this proposal,
get_as
was seen as sufficient
for returning a reference to a member object. Trying to use the library
in usage examples turned up some disadvantages, however:
const
qualifications were
added.const
qualifications correct when using
non-const
maps with
const
alternative values.The introduction of get_ref
made the code easier to read and write, which is the point of this
entire paper.
I considered adding an rvalue-reference overload for
get_ref
that would accept rvalue
reference alternative and return an rvalue reference. None of the other
element lookup functions are overloaded in this way, however (e.g.,
std::move(m)[k]
does not return
an rvalue reference), so an rvalue overload would be novel and
inconsistent. If, in the future,
operator[]
and
at
were to be enhanced with
rvalue overloads, then get_ref
should be, as well.
It is possible to provide the functionality described in this paper
using namespace-scope functions, without modifying
std::map
and
std::unordered_map
:
template <class Map, class K, class... Args>
typename Map::mapped_type get(Map& m, const K& k, Args&&... args);
auto x = get(m, k, a1...aN);
One benefit to this approach is that the
get
template can handle any
dictionary type that follows the general formula of
std::map
(i.e., having a
mapped_type
, and a
find
that returns an
iterator
pointing to a key-value
pair
). Such an approach has
disadvantages, however:
get
outside of the map
interface.In the end, it seemed wise to stick to member functions.
An implementation, with tests and usage examples, can be found at https://github.com/phalpern/WG21-halpern/tree/main/P3091-map_lookup.
Changes are relative to the 2023-12-18 working draft, [N4971].
In 24.4.4.1
[map.overview]1/2, insert the
get
,
get_ref
, and
get_as
element-access
members:
//
24.4.4.3 [map.access], element access
mapped_type& operator[](const key_type& x);
mapped_type& operator[](key_type&& x);
template<class K> mapped_type& operator[](K&& x);
mapped_type& at(const key_type& x);
const mapped_type& at(const key_type& x) const;
template<class K> mapped_type& at(const K& x);
template<class K> const mapped_type& at(const K& x) const;
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Wording question: In my reference implementation, I
declared all of these functions as
[[nodiscard]]
, but the standard
currently seems to use that attribute very parsimoniously, mostly (but
not exclusively) in situations where a resource might leak or where a
function returning a result could be confused with a function that
modifies its input. These functions fit neither criteria, should they be
[[nodiscard]]
or not?
At the end of 24.4.4.3 [map.access], add these descriptions:
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
Constraints: For the second overload, the qualified-id
Compare::is_transparent
is valid and denotes a type.
Returns:
get_as<mapped_type>(x, std::forward<Args>(args)...)
.
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
Constraints: For the third and fourth overloads, the qualified-id
Compare::is_transparent
is valid and denotes a type.
Mandates:
- The expression
find(x)
is well-formed.- The result type is a reference type.
find(x)->second
andref
are both convertible to the result type.
Preconditions: The expression
find(x)
has well-defined behavior.
Returns:
find(x)->second
iffind(x) == end()
isfalse
, otherwiseref
.
Throws: nothing
Complexity: Logarithmic
Editorial comment: The last two mandates
are expressed in English rather than code because to express them in
code would involve a hard-to-read construction like is_convertible_v<decltype((find(x)->second)), common_reference_t<decltype((find(x)->second)), U&>
.
This could be simplified using some “Let CV_MAPPED_TYPE be…”
language if LWG prefers it.
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Constraints: For the third and fourth overloads, the qualified-id
Compare::is_transparent
is valid and denotes a type.
Mandates:
- The expression
find(x)
is well-formed.find(x)->second
is convertible toR
.is_constructible<R, Args...>
istrue
.- If
R
is a reference type, thensizeof...(Args)
is1
andis_lvalue_reference_v<Args...>
istrue
.
Preconditions: The expression
find(x)
has well-defined behavior.
Returns:
find(x)->second
iffind(x) == end()
isfalse
, otherwiseR(std::forward<Args>(args)...)
.
Throws: nothing unless the result object’s constructor throws.
Complexity: Logarithmic
In 24.5.4.1
[unord.map.overview]/3,
insert the get
,
get_ref
, and
get_as
element-access
members:
//
24.5.4.3 [unord.map.elem], element access
mapped_type& operator[](const key_type& x);
mapped_type& operator[](key_type&& x);
template<class K> mapped_type& operator[](K&& x);
mapped_type& at(const key_type& x);
const mapped_type& at(const key_type& x) const;
template<class K> mapped_type& at(const K& x);
template<class K> const mapped_type& at(const K& x) const;
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
At the end of 24.5.4.3 [unord.map.elem], add these descriptions:
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
Constraints: For the second overload, the qualified-ids
Hash::is_transparent
andPred::is_transparent
are valid and denote types.
Returns:
get_as<mapped_type>(x, std::forward<Args>(args)...)
.
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
Constraints: For the third and fourth overloads, the qualified-ids
Hash::is_transparent
andPred::is_transparent
are valid and denote types.
Mandates:
- The expression
find(x)
is well-formed.- The result type is a reference type.
find(x)->second
andref
are both convertible to the result type.
Preconditions: The expression
find(x)
has well-defined behavior.
Returns:
find(x)->second
iffind(x) == end()
isfalse
, otherwiseref
.
Throws: nothing
Complexity: Logarithmic
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Constraints: For the third and fourth overloads, the qualified-ids
Hash::is_transparent
andPred::is_transparent
are valid and denote types.
Mandates:
- The expression
find(x)
is well-formed.find(x)->second
is convertible toR
.is_constructible<R, Args...>
istrue
.- If
R
is a reference type, thensizeof...(Args)
is1
andis_lvalue_reference_v<Args...>
istrue
.
Preconditions: The expression
find(x)
has well-defined behavior.
Returns:
find(x)->second
iffind(x) == end()
isfalse
, otherwiseR(std::forward<Args>(args)...)
.
Throws: nothing unless the result object’s constructor throws.
Complexity: Logarithmic
All citations to the Standard are to working draft N4971 unless otherwise specified.↩︎