map and
unordered_map| Document #: | P3091R0 |
| Date: | 2024-02-06 10:36 EST |
| Project: | Programming Language C++ |
| Audience: |
LEWGI |
| Reply-to: |
Pablo Halpern <phalpern@halpernwightsoftware.com> |
The most convenient way to look up an element of a
maporunordered_mapis to use the index operator, i.e.,theMap[key]. This operator has a number of limitations, however: 1) it does not work forconstcontainers, 2) it does not work if the mapped type is not default-constructible, 3) it modifies the container if the key is not found 4) a default-constructed value is not always the desired value when the key is not found. These limitations often require that the user resort to using thefindmember function, which returns an iterator that points to apairand typically leads to more complex code having at least oneifstatement and/or duplicate lookup operations. This paper take inspiration from other languages, especially Python, and proposes three simple member functions (get,get_ref, andget_as) that simplify lookups formapandunordered_mapin situations where the index operator is suboptimal.
Just about every modern computer language has, as part of the
language or its standard library, one or more associative containers
that uniquely associates a key with a mapped value, variously called
map,
hash_map,
dictionary, or something
similar. In C++ we have
std::map,
std::unordered_map, and
eventually (per [P0429R9] and [P2767R2])
std::flat_map. The easy-to-write
and easy-to-read expression for getting a value for an associated key is
simply, theMap[key]; in other
words, a mapped value is retrieved (and often set) using the index
operator, which returns a reference to the value associated with the
key. Unfortunately, the index operator in the C++ associative containers
has a number of shortcomings compared to many other languages:
const
containers.Consider a std::map named
theMap, that maps an integer key
to a floating-point value, modeling a sparse array of
double. If we want to find the
largest of the double values
mapped to the integer keys in the range 1 to 100, we might be tempted to
write the following loop:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
largest = std::max(largest, theMap[i]);If theMap is
const, the loop will not
compile. If any of the keys in the range
[1, 100] are absent from the
map, then theMap[i] will return
a default-constructed double
having value 0.0, which might or might not be desirable, depending on
whether we want to treat missing values as truly having value 0.0 or
whether they should be ignored (or, equivalently, treated as having
value -INF). Finally if, before executing this loop,
theMap contains only, say, 5
entries, at the end of the loop it will contain at least 100 entries,
most of which will have zero values.
There are alternatives that avoid all of these shortcomings, but the
alternatives are significantly less elegant, and therefore more error
prone. For example, the at
member function looks a lot like
operator[] and has none of the
above shortcomings, but missing keys are handled by throwing exceptions,
making it impractical for situations other than when the key is almost
certain to exist. Moreover, a
try-catch
block within a loop is rarely a clean way to structure code:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
{
try {
largest = std::max(largest, theMap.at(i));
} catch (const std::out_of_range&) { }
}The above code would work with a
const map, would ignore missing
elements (rather than treating them as zeros), and would not populate
the map with useless entries, but many would argue that the loop is
inelegant, at best. In most C++ implementations, it would be extremely
inefficient unless key misses are rare.
The other obvious alternative uses the
find member function:
double largest = -std::numeric_limits<double>::infinity();
for (int i = 1; i <= 100; ++i)
{
auto iter = theMap.find(i);
if (iter != theMap.end())
largest = std::max(largest, iter->second);
}This version of the loop is only slightly more verbose than our
starting point and avoids all of the issues, but the use of iterators,
the need to call two member functions
(find and
end), and having to extract the
second element of the element
pair for what should be a simple operation increases the
cognitive load on both the programmer and the reader. Moreover, a
generic use of find can yield a
subtle bug. Consider function template
f:
template <class Key, class Value>
void f(const Key& k, const std::map<Key, Value>& aMap)
{
Value obj = some-default-obj-value-expression;
auto iter = aMap.find(k);
if (iter != aMap.end())
obj = iter->second;
// code that uses `obj` ...
}The above code will not compile unless
Value is copy-assignable. Worse,
unless tested with a non-assignable type, the bug could go undetected
for a long time. One fix would be initialize
obj in a single expression:
Value obj = aMap.count(k) ? aMap.at(k) : some-default-obj-value-expression;While correct, this solution involves two lookup operations: One for
the count and one for the
at. A better fix requires a bit
more code:
auto iter = aMap.find(k);
Value obj = iter != aMap.end() ? iter->second : some-default-obj-value-expression;Note that the last solution again involves
iterator,
pair, and a conditional
expression, which is a far cry from the simplicity of
operator[].
What’s desired is a simple member function that, given a key, returns
the mapped value if the key exists and a specified alternative
if the value does not exist. Inspired by a similar member of Python
dictionaries, I propose a get
member function and related
get_ref and
get_as<T> member functions
for all C++ dictionary-like associative containers. A slightly
simplified version of set of the proposed prototypes is:
// Return by value
template <class... Args>
mapped_type get(const key_type& key, Args&&... args) const;
// Return by reference
template <class Arg>
common_reference_t<mapped_type&, Arg&> get_ref(const key_type& key, Arg& ref);
template <class Arg>
common_reference_t<const mapped_type&, Arg&> get_ref(const key_type& key, Arg& ref) const;
// Return as a specific type
template <class R, class... Args>
R get_as(const key_type& key, Args&&... args);
template <class R, class... Args>
R get_as(const key_type& key, Args&&... args) const;In each case, if key is
found, the corresponding mapped value is returned, otherwise, an
alternative return value is constructed from the remaining arguments.
The differences among the proposed member functions are in the way that
the return types are determined.
get member functionThe proposed get member is
the simplest to use and is suitable for types having relatively
inexpensive copy constructors. If the key is found, the return value is
a copy of the mapped value, otherwise the return value is
constructed from args. If the
key is not found and args is an
empty pack, then the return value is default constructed.
Returning by value is not as expensive as it once was because the
materialization rules (formerly copy elision rules) mean that
fewer temporaries are created. Moreover, returning by value allows the
alternative value to be specified as a prvalue such as the literal
-1 or
nullptr.
get_ref member functionSometimes returning a reference to the mapped item is desirable,
either for efficiency (to avoid an expensive copy constructor) or
functionality (if you want to modify the item or inspect its address).
The get_ref member provides the
necessary functionality. Consider an example where we modify mapped
values through the reference returned from
get_ref:
std::map<std::string, int> theMap;
// ...
// Increment the entries matching `names`, but only if they are already in `theMap`.
for (const auto& name : names) {
int temp = 0; // Value is irrelevant
++theMap.get_ref(name, temp); // increment through reference
// Possibly-modified value of `temp` is discarded here.
}Note that the second argument to
get_ref must be a single lvalue,
in this case a local variable that will be discarded whether or not it
is modified.
The function’s interface is designed to avoid accidentally binding an
rvalue to a const reference
argument, even when the map is
const. For example, the
temporary object created when converting a string literal argument to
std::string would go out of
scope before ref goes out of
scope (lifetime extension is not transitive). The interface is designed
so that this error will be caught at compile time:
void f(const std::map<int, std::string>& theMap) {
const std::string& ref = theMap.get_ref(0, "zero"); // ERROR: temporary `std::string("zero")`
// ...
}Finally, the return type for
get_ref is the common
reference type between
mapped_type& (or
const mapped_type&) and the
reference passed as the second argument. This definition makes it easy
to generate a meaningful result when mixing cv-qualifications or when
mixing base-class references with derived-class references in a single
call:
void g(const std::map<int, int>& theMap)
{
// ...
int alt = 0;
auto& ref = theMap.get_ref(key, alt); // `ref` has type `const int&`
// ...
}The result of get_ref is
const if either or both of the
map or alternative reference are
const. Thus
ref is
const in the example above
because theMap is a
const lvalue, even though
alt is non-const. A similar
normalization occurs when mixing base-class and derived-class
references:
std::map<int, Derived> theMap;
// ...
Base alt{ ... };
auto& ref = theMap.get_ref(key, alt); // `ref` has type `Base&`get_as member functionThe get_as member allows the
user to specify the desired return type. It is useful when the desired
type is convertible from the mapped type and where a more
efficient construction is possible for the alternative value. A
ubiquitous example is
std::string_view, which can be
constructed efficiently from a
std::string or a
char array. The
get_as member allows the
mapped_type to be
std::string and the alternative
value to be char[] without
generating extra temporary values:
std::map<int, std::string> theMap;
// ...
std::string_view sv = theMap.get_as<std::string_view>(key, "none");The above example creates the resulting
std::string_view from either the
std::string stored in the map or
the const char[5] passed as the
alternative value, without creating an intervening
std::string. It would be
inefficient to convert the char
array to std::string before
converting it to a
std::string_view and it would
create lifetime issues:
// BUG: Dangling reference converting returned temporary `string` to `string_view`
std::string_view sv = theMap.get(key, "none");
// ERROR: cannot bind temporary `string` to `string&` parameter
std::string_view sv = theMap.get_ref(key, "none");If the specified return type is a reference type, then
get_as behaves much like
get_ref except that the return
type is exactly the specified type, rather than the deduced common
reference type between the map and the alternative argument:
std::map<int, int> theMap;
const int zero = 0;
auto& v1 = theMap.get_as<int&>(0, zero); // ERROR: `const` mismatch
auto& v2 = theMap.get_as<const int&>(0, zero); // OKThe following tables shows how operations are simplified using the
proposed new member functions. In these examples
K,
T, and
U are object types;
m is an object of (possibly
cv-)
std::map<K, T>,
unordered_map<K, T> or
flat_map<K, T>,
k is a value compatible with
K,
v is an lvalue of type (possibly
cv-) T, and
a1..aN are arguments that can
used to construct an object of type
T.
Before
|
After/Alternative
|
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The name get is borrowed from
the Python dictionary member of the same name. Other names considered
were lookup,
lookup_or, and
get_or; the latter two follow
the precedent of
std::optional<T>::value_or.
The get name was selected for
conciseness and familiarity.
optional or
expectedSome of the deficiencies of
operator[] could be addressed by
adding a member function that returns an
optional<mapped_type>
object:
// ALTERNATIVE (not proposed)
auto get(const key_type& k) const -> optional<mapped_type>;Better yet, if [P2988R1] is adopted,
it could return an
optional<T&>:
// ALTERNATIVE (not proposed) with `optional<T&>` per [P2988R1]
auto get(const key_type& k) -> optional<mapped_type&>;
auto get(const key_type& k) const -> optional<const mapped_type&>;While this approach solves the issues listed in the motivation
section of this paper, typical use would require an
if statement (making it almost
as verbose to use as find), or
the use of value_or, making it
syntactically similar to, but more complicated, than the proposed
get member.
Returning T
(proposed)
|
Returning
optional<T> (not
proposed)
|
|---|---|
|
|
In the second column, a call to
get, followed by a call to
value_or on the returned
optional causes the mapped value
to be copied twice: first into the
optional value, then again into
the temporary returned by
value_or. Even if
optional<T&> is
adopted, per [P2988R1], there
would be no equivalent to
get_ref (because [P2988R1] currently has
value_or return by value, even
for a reference value_type) or
get_as. Ultimately, it was
decided that the get interface
returning optional does not
provide a substantial advantage over the proposed interface, especially
in the absence of optional references.
get and
get_ref into a single functionIf the Args parameter pack
happens to consist of a single lvalue reference that is compatible with
mapped_type&, it would be
possible to return a reference instead of a value, obviating
get_ref as a separate function.
Such a “simplification” was seen as confusing, especially to a human
reader, who must analyze the code carefully to determine whether
non-const operations on the returned entity might change a value in the
original map.
get_as<mapped_type&>
instead of get_refDuring development of the reference implementation for this proposal,
get_as was seen as sufficient
for returning a reference to a member object. Trying to use the library
in usage examples turned up some disadvantages, however:
const qualifications were
added.const
qualifications correct when using
non-const maps with
const alternative values.The introduction of get_ref
made the code easier to read and write, which is the point of this
entire paper.
I considered adding an rvalue-reference overload for
get_ref that would accept rvalue
reference alternative and return an rvalue reference. None of the other
element lookup functions are overloaded in this way, however (e.g.,
std::move(m)[k] does not return
an rvalue reference), so an rvalue overload would be novel and
inconsistent. If, in the future,
operator[] and
at were to be enhanced with
rvalue overloads, then get_ref
should be, as well.
It is possible to provide the functionality described in this paper
using namespace-scope functions, without modifying
std::map and
std::unordered_map:
template <class Map, class K, class... Args>
typename Map::mapped_type get(Map& m, const K& k, Args&&... args);
auto x = get(m, k, a1...aN);One benefit to this approach is that the
get template can handle any
dictionary type that follows the general formula of
std::map (i.e., having a
mapped_type, and a
find that returns an
iterator pointing to a key-value
pair). Such an approach has
disadvantages, however:
get outside of the map
interface.In the end, it seemed wise to stick to member functions.
An implementation, with tests and usage examples, can be found at https://github.com/phalpern/WG21-halpern/tree/main/P3091-map_lookup.
Changes are relative to the 2023-12-18 working draft, [N4971].
In 24.4.4.1
[map.overview]1/2, insert the
get,
get_ref, and
get_as element-access
members:
//24.4.4.3 [map.access], element access
mapped_type& operator[](const key_type& x);
mapped_type& operator[](key_type&& x);
template<class K> mapped_type& operator[](K&& x);
mapped_type& at(const key_type& x);
const mapped_type& at(const key_type& x) const;
template<class K> mapped_type& at(const K& x);
template<class K> const mapped_type& at(const K& x) const;template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Wording question: In my reference implementation, I
declared all of these functions as
[[nodiscard]], but the standard
currently seems to use that attribute very parsimoniously, mostly (but
not exclusively) in situations where a resource might leak or where a
function returning a result could be confused with a function that
modifies its input. These functions fit neither criteria, should they be
[[nodiscard]] or not?
At the end of 24.4.4.3 [map.access], add these descriptions:
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
Constraints: For the second overload, the qualified-id
Compare::is_transparentis valid and denotes a type.
Returns:
get_as<mapped_type>(x, std::forward<Args>(args)...).
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
Constraints: For the third and fourth overloads, the qualified-id
Compare::is_transparentis valid and denotes a type.
Mandates:
- The expression
find(x)is well-formed.- The result type is a reference type.
find(x)->secondandrefare both convertible to the result type.
Preconditions: The expression
find(x)has well-defined behavior.
Returns:
find(x)->secondiffind(x) == end()isfalse, otherwiseref.
Throws: nothing
Complexity: Logarithmic
Editorial comment: The last two mandates
are expressed in English rather than code because to express them in
code would involve a hard-to-read construction like is_convertible_v<decltype((find(x)->second)), common_reference_t<decltype((find(x)->second)), U&>.
This could be simplified using some “Let CV_MAPPED_TYPE be…”
language if LWG prefers it.
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Constraints: For the third and fourth overloads, the qualified-id
Compare::is_transparentis valid and denotes a type.
Mandates:
- The expression
find(x)is well-formed.find(x)->secondis convertible toR.is_constructible<R, Args...>istrue.- If
Ris a reference type, thensizeof...(Args)is1andis_lvalue_reference_v<Args...>istrue.
Preconditions: The expression
find(x)has well-defined behavior.
Returns:
find(x)->secondiffind(x) == end()isfalse, otherwiseR(std::forward<Args>(args)...).
Throws: nothing unless the result object’s constructor throws.
Complexity: Logarithmic
In 24.5.4.1
[unord.map.overview]/3,
insert the get,
get_ref, and
get_as element-access
members:
//24.5.4.3 [unord.map.elem], element access
mapped_type& operator[](const key_type& x);
mapped_type& operator[](key_type&& x);
template<class K> mapped_type& operator[](K&& x);
mapped_type& at(const key_type& x);
const mapped_type& at(const key_type& x) const;
template<class K> mapped_type& at(const K& x);
template<class K> const mapped_type& at(const K& x) const;template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
At the end of 24.5.4.3 [unord.map.elem], add these descriptions:
template <class... Args> mapped_type get(const key_type& x, Args&&... args) const;
template <class K, class... Args> mapped_type get(const K& x, Args&&... args) const;
Constraints: For the second overload, the qualified-ids
Hash::is_transparentandPred::is_transparentare valid and denote types.
Returns:
get_as<mapped_type>(x, std::forward<Args>(args)...).
template <class U>
common_reference_t<mapped_type&, U&> get_ref(const key_type& x, U& ref);
template <class U>
common_reference_t<const mapped_type&, U&> get_ref(const key_type& x, U& ref) const;
template <class K, class U>
common_reference_t<mapped_type& , U&> get_ref(const K& x, U& ref);
template <class K, class U>
common_reference_t<const mapped_type&, U&> get_ref(const K& x, U& ref) const;
Constraints: For the third and fourth overloads, the qualified-ids
Hash::is_transparentandPred::is_transparentare valid and denote types.
Mandates:
- The expression
find(x)is well-formed.- The result type is a reference type.
find(x)->secondandrefare both convertible to the result type.
Preconditions: The expression
find(x)has well-defined behavior.
Returns:
find(x)->secondiffind(x) == end()isfalse, otherwiseref.
Throws: nothing
Complexity: Logarithmic
template <class R, class... Args> R get_as(const key_type& x, Args&&... args);
template <class R, class... Args> R get_as(const key_type& x, Args&&... args) const;
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args);
template <class R, class K, class... Args> R get_as(const K& x, Args&&... args) const;
Constraints: For the third and fourth overloads, the qualified-ids
Hash::is_transparentandPred::is_transparentare valid and denote types.
Mandates:
- The expression
find(x)is well-formed.find(x)->secondis convertible toR.is_constructible<R, Args...>istrue.- If
Ris a reference type, thensizeof...(Args)is1andis_lvalue_reference_v<Args...>istrue.
Preconditions: The expression
find(x)has well-defined behavior.
Returns:
find(x)->secondiffind(x) == end()isfalse, otherwiseR(std::forward<Args>(args)...).
Throws: nothing unless the result object’s constructor throws.
Complexity: Logarithmic
All citations to the Standard are to working draft N4971 unless otherwise specified.↩︎