While the C++ iterators and algorithms interface has proven remarkably successful, a number of short-comings have come to light over the decade since it was designed. Many of these were addressed by the concepts work occurring as part of the C++0x project, and with the passing of that feature, all the work done on iterators has been effectively lost. This paper sets out to resolve as many of the known design and documentation issues as possible, following the model laid out in the concepts work.
There are a number of LWG issues in this area. Those highlighted in the list below should change status as a result of this paper:
Number | Clause | Description | Status (before) | Status (after) |
408 | 24.2 [iterator.requirements] | Is vector<reverse_iterator<char*> > forbidden? | Open | NAD Editorial |
446 | 24.2 [iterator.requirements] | Iterator equality between different containers | Open | NAD Editorial |
1210 | 24.2 [iterator.requirements] | iterator reachability should not require a container | New | NAD Editorial |
1212 | 24.2 [iterator.requirements] | result of post-increment/decrement operator | New | NAD Editorial |
1185 | 24.2 [iterator.requirements] | iterator categories and output iterators | New | NAD Editorial |
485 | 24.2.2 [output.iterators] | output iterator insufficiently constrained | Ready | NAD Editorial |
476 | 24.2.3 [forward.iterators] | Forward Iterator implied mutability | NAD | Dup 1185 |
1311 | 24.2.3 [forward.iterators] | multi-pass property of Forward Iterator underspecified | New | NAD Editorial |
299 | 24.2.4 [bidirectional.iterators] | Incorrect return types for iterator dereference | Open | NAD Editorial |
458 | 24.2.5 [random.access.iterators] | unintended limitation for operator- | NAD | NAD |
1079 | 24.2.5 [random.access.iterators] | RandomAccessIterator's operator- has nonsensical effects clause | Open | NAD Editorial |
940 | 24.4.4 [iterator.operations] | std::distance | Ready | NAD Editorial |
1211 | 24.5.3.1 [move.iterator] | move iterators should be restricted as input iterators | Open | NAD Editorial |
The following issues are related to topics in this paper, but will not change status. They may be updated by other papers proceeding in parallel at the same meeting.
Number | Clause | Description | Status |
594 | 20.2.1 [utility.arg.requirements] | Disadvantages of defining Swappable in terms of CopyConstructible and Assignable | Open |
742 | 20.2.1 [utility.arg.requirements] | Enabling swap for proxy iterators | Open |
1213 | 24.2 [iterator.requirements] | Meaning of valid and singular iterator underspecified | New |
Note that several of the issues marked NAD were resolved as such on the basis that concepts would solve the problem, or remove the underlying wording. That is no longer the case, so we revisit those issues in this paper to see if there should be a solution as part of the 0x wording.
Here are several simple examples of types that either fail to be iterators in unexpected ways, or appear to offer more than they can actually deliver, while staying within the letter of the standard.
template<unsigned M, unsigned N> { struct iterator { unsigned position; iterator(unsigned val) : position{val} {} unsigned operator*() const { return position; } iterator& operator++() { ++position; return *this; } iterator& operator++(int) { return iterator{++position}; } bool operator==(iterator rhs) const { return position == rhs.position; } }; static iterator begin() { return iterator{M}; } static iterator end() { return iterator{N}; } };
This example is typical of a range of problematic iterators frequently referred to as proxy iterators. The key problem is that dereferencing the iterator does not return a true native reference to the underlying element. Examples in the library would include the move_iterator adaptor and the well-known vector<bool>::iterator.
Note that the proxy in the example is quite subtle, as it would be quite reasonable to return a const unsigned & from operator*, yet this would still be a proxy iterator. The reason is that each iterator is maintaining its own copy of position, rather than sharing a reference to the same element in the underlying sequence. Naturally, it is difficult to share elements that exist purely in the abstract, rather than physically in memory. The key to this problem is the following requirement on Forward Iterators:
"If a and b are both dereferenceable, then a == b if and only if *a and *b are the same object."
The following constant iterator allows iteration over an infinite sequence of values.
template<unsigned N> { struct iterator { static const unsigned value = N; iterator() {} const unsigned & operator*() const { return value; } iterator& operator++() { return *this; } iterator& operator++(int) { return *this; } bool operator==(iterator rhs) const { return true; } };
Note that this iterator seems fine at first, and does not suffer the proxy issue as all references truly refer to the same object. However, it has the unusual property that the distance between any two iterators is zero. This will be very surprising for consumers of Forward Iterators and the more capable categories, yet there is nothing restricting this to a mere Input Iterator.
Note that it would be very simple to add support for a distance property to this iterator by adding an additional size_t member to track position. It would seem a stretch to describe such a range as truly infinite though, if the maximum expressible distance is distinctly finite.
Consider the following iterator adaptor:
template<typename Iter> struct shared_iterator { shared_ptrposition; shared_iterator(Iter val) : position{new Iter{val}} {} unsigned operator*() const { return *position; } shared_iterator& operator++() { ++*position; return *this; } shared_iterator operator++(int) { shared_iterator result{*this}; ++*iterator; return result; } bool operator==(const shared_iterator& rhs) const { return *position == *rhs.position; } };
This example subverts the notion that a Forward Iterator is a multi-pass iterator by simply wrapping any other iterator in a shared_ptr, and deferring all operations to the shared iterator. All copies of this iterator share the same underlying base iterator, so incrementing any one will increment them all. Yet this adaptor still meets all the requirements of a conforming Forward Iterator. While iterating the underlying sequence is consumed, but all copied iterators reach the end at the same time, and will continue to behave correctly as past-the-end iterators at that point. This is the basis of LWG issue 1311.
One of the features of the 0x standardization process in the library clauses has been an improved consistency of wording and style. There is less duplication, as common terms are defined in a single place, typically the Requirement section of clause 20, and then referenced as needed. The iterators clause relied on concepts to provide the same consistency, so the wording has not benefitted from the same attention to detail. The effects are more noticeable as the core language grows, delivering an increasingly diverse range of syntax to effect the same expression.
One of the key discoveries of the concepts work was a new iterator concept dubbed the Shuffle Iterator. This combined many of the features of a mutable Forward Iterator with the Swappable concept in order to better describe the requirements on algorithms in clause 25.
It is not clear that the requirements for a Shuffle Iterator can be so easily expressed without the language of concepts, so this paper makes no attempt to add Shuffle Iterators into the standard taxonomy of iterators.
The goal of this paper is in many ways to be evolutionary, not revolutionary. Rather than immediately solving all the possible problems, the goal is to update the wording of this clause to a level consistent with the rest of the working paper, and consistent with the lessons from concepts. This clarified wording will form a strong basis for future papers or issues to address specific concerns
It is generally thought that the biggest problem facing iterator consumers is the poorly defined support for proxied iterators. This paper starts to address that by naming the reference type returned by an iterator's operator*. This is transparently made possible by extending iterator_traits with a new type alias using the new decltype keyword. This automatically supports pointers and all existing iterators with an appropriate default, and could be opened up as an extension point in the future, once clear requirements on the reference type have been defined. We choose to go no further on this issue at the moment, waiting on clarification of the wording for Swappable types among other issues.
There are many useful terms defined in clause 20.2 [Requirements] that can simplify the presentation of the iterator requirements. This happened by default in the concept-based version of the library. The benefit is clear, as many of these terms have been tweaked with a considerable attention to detail in the currently proposed standard. The requirements clauses and tables for each iterator category are adjusted accordingly.
LWG 476 points out that it is very easy to read into the current wording the requirement that all Forward, Bidirectional and Random Access Iterators must also be Output Iterators. Hence, std::vector<T>::const_iterator is merely an Input Iterator. This was much clearer in the concpets version of the clause, so that wording is adopted here.
1 Iterators are a generalization of pointers that allow a C++ program to work with different data structures (containers) in a uniform manner. To be able to construct template algorithms that work correctly and efficiently on different types of data structures, the library formalizes not just the interfaces but also the semantics and complexity assumptions of iterators. All input iterators i support the expression *i, resulting in a value of some class, enumeration, or built-in type T, called the value type of the iterator. All output iterators support the expression *i = o where o is a value of some type that is in the set of types that are writable to the particular iterator type of i. All iterators i for which the expression (*i).m is well-defined, support the expression i->m with the same semantics as (*i).m. For every iterator type X for which equality is defined, there is a corresponding signed integral type called the difference type of the iterator.
2 Since iterators are an abstraction of pointers, their semantics is a generalization of most of the semantics of pointers in C++. This ensures that every function template that takes iterators works as well with regular pointers. This International Standard defines five categories of iterators, according to the operations defined on them: input iterators, output iterators, forward iterators, bidirectional iterators and random access iterators, as shown in Table 100. Table 100 — Relations among iterator categories Random Access ! Bidirectional ! Forward ! Input ! Output
3 Forward iterators satisfy all the requirements of the input and output iterators and can be used whenever
either kindan input iterator is specified; Bidirectional iterators also satisfy all the requirements of the forward iterators and
can be used whenever a forward iterator is specified; Random access iterators also satisfy all the requirements
of bidirectional iterators and can be used whenever a bidirectional iterator is specified.
4 Besides its category, a forward, bidirectional, or random access iterator can also be mutable or constant
depending on whether the result of the expression *i behaves as a reference or as a reference to a constant.
Constant iterators do not satisfy the requirements for output iterators, and the result of the expression *i
(for constant iterator i) cannot be used in an expression where an lvalue is required.
4 Iterators that further satisy all the requirements of output iterators are called mutable iterators. Nonmutable iterators are referred to as constant iterators.
5 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of
the array, so for any iterator type there is an iterator value that points past the last element of a corresponding
sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is
defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable.
Iterators can also have singular values that are not associated with any containersequence. [ Example: After the
declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular
value of a pointer. —end example ] Results of most expressions are undefined for singular values; the only
exceptions are destroying an iterator that holds a singular value and, the assignment of a non-singular value
to an iterator that holds a singular value, and, for iterators that satisfy the DefaultConstructible
requirements, using a value-initialized iterator as the source of a copy or
move operation. [Note: this guarantee is not offered for
default-initialization, although the distinction only matters for types with
trivial default contructors, such as pointers or
aggregates holding pointers.— end note]. In this case the singular value is overwritten the same way as any
other value. Dereferenceable values are always non-singular.
6 An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to elements of the same sequence.
7 Most of the library's algorithmic templates that operate on data structures have interfaces that use ranges.
A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is
an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the oneelement
pointed to by i and up to but not including the oneelement pointed to by j. Range [i,j) is valid if and only if j
is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.
8 All the categories of iterators require only those functions that are realizable for a given category in constant time (amortized). Therefore, requirement tables for the iterators do not have a complexity column.
9 Destruction of an iterator may invalidate pointers and references previously obtained from that iterator.
10 An invalid iterator is an iterator that may be singular.267
11 In the following sections, a and b denote values of type
X or const X, difference_type and
reference refer to the types iterator_traits<X>::difference_type
and iterator_traits<X>::reference respectively, n
denotes a value of the difference type Distance
difference_type, u, tmp, and m
denote identifiers, r denotes a value of X&, t denotes a value of
value type T, o denotes a value of some type
that is writable to the output iterator. [Note:
for an iterator type X there must be an instantiation of
iterator_traits<X> (24.4.1 [iterator.traits]) - end note]
267) This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
1 The Iterator requirements form the basis of the iterator concept taxonomy, and every iterator meets the Iterator requirements. This set of requirements specifies operations for dereferencing and incrementing the iterator. Most algorithms will require additional operations to read ([input.iterators]) or write ([output.iterators]) values, or to provide a richer set of iterator movements ([forward.iterators],[birectional.iterators], [random.access.iterators]).
2 A type X conforms to the Iterator requirements if:
Expression | Return type | Operational semantics | Assertion/note/ pre-/post-condition |
*r | reference | pre: r is dereferenceable. | |
++r | X& |
1 A class or a built-in type X satisfies the requirements of an input iterator for the value type T if X satisfies the Iterator (24.2.1) and EqualityComparable (20.2) requirements , and the following expressions are valid, as shown in Table 101.
2 In Table 101, the term the domain of == is used in the ordinary mathematical sense to denote the set of values over which == is (required to be) defined. This set can change over time. Each algorithm places additional requirements on the domain of == for the iterator values it uses. These requirements can be inferred from the uses that algorithm makes of == and !=. [ Example: the call find(a,b,x) is defined only if the value of a has the property p defined as follows: b has property p and a value i has property p if (*i==x) or if (*i!=x and ++i has property p). —end example ]
Return Type | Operational semantics | ||
X u(a); | X |
post: u is a copy of a A destructor is assumed to be present and accessible. |
|
u = a; | X& | result: u post: u is a copy of a |
|
a == b | convertible to bool | == is an equivalence relation over its domain. | |
a != b | contextually convertible to bool | !(a == b) |
pre: (a,b) is in the domain of ==. |
*a | convertible to T |
pre: a is dereferenceable. (void)*a,*a is equivalent to *a. If a == b and (a,b) is in the domain of == then *a is equivalent to *b. |
|
a->m | (*a).m |
pre: a is dereferenceable. |
|
++r | X& |
pre: r is dereferenceable. post: r is dereferenceable or r is past-the-end. post: any copies of the previous value of r are no longer required either to be dereferenceable or to be in the domain of ==. |
|
(void)r++ | equivalent to (void)++r | ||
*r++ | convertible to T | { T tmp = *r; ++r; return tmp; } |
3 [ Note: For input iterators, a == b does not imply ++a == ++b.
(Equality does not guarantee the substitution property or referential transparency.)
Algorithms on input iterators should never attempt to pass through the same
iterator twice. They should be single pass algorithms. Value type T is
not required to be an CopyAssignable type
(230.2). These algorithms can be used with istreams
as the source of the input data through the istream_iterator class
template. —end note ]
1 A class or a built-in type X satisfies the requirements of an output
iterator if X is a CopyConstructible (34) and
Assignablesatisfies the
Iterator requirements type (23.2) and also the following expressions
are valid, as shown in Table 102.
Expression | Return type | Operational semantics | Assertion/note/ pre-/post-condition |
X(a) | a = t is equivalent to X(a) = t. note: a destructor is assumed. |
||
X u(a); | |||
X u = a; | |||
*r = o | result is not used |
post: r is not required to be dereferenceable. post: r is incrementable. |
|
++r | X& |
&r == &++r. post: r is dereferenceable, unless otherwise specified. post: r is not required to be incrementable. |
|
r++ | convertible to const X& | { X tmp = r; ++r; return tmp; } |
post: r is dereferenceable, unless otherwise specified. post: r is not required to be incrementable. |
*r++ = o | result is not used |
post: r is dereferenceable, unless otherwise specified. post: r is not required to be incrementable. |
2 [ Note: The only valid use of an operator* is on the left side of the assignment statement. Assignment through the same value of the iterator happens only once. Algorithms on output iterators should never attempt to pass through the same iterator twice. They should be single pass algorithms. Equality and inequality might not be defined. Algorithms that take output iterators can be used with ostreams as the destination for placing data through the ostream_iterator class as well as with insert iterators and insert pointers. —end note ]
1 A class or a built-in type X satisfies the requirements of a forward
iterator if the following expressions are valid, as shown in Table 103.:
The domain of == for forward iterators is that of iterators over the same underlying sequence.
Two dereferenceable iterators a and b of type X deliver
the multi-pass guarantee if:
Expression | Return type | Operational semantics | Assertion/note/ pre-/post-condition |
X u; | note: u might have a singular value. note: a destructor is assumed. |
||
X() | note: X() might be singular. | ||
X(a) | a == X(a) | ||
X u(a); X u = a; |
X u; u = a; | post: u == a | |
a == b | convertible to bool | == is an equivalence relation. | |
a != b | convertible to bool | !(a == b) | |
r = a | X& | post: r == a | |
*a |
T& if X is mutable, otherwise const T& |
pre: a is dereferenceable. a == b implies *a == *b. If X is mutable, *a = t is valid. |
|
a->m |
U& if X is mutable, otherwise const U& |
(*a).m | pre: (*a).m is well-defined. |
++r | X& |
pre: r is dereferenceable. post: r is dereferenceable or r is past-the-end. r == s and r is dereferenceable implies ++r == ++s. &r == &++r. |
|
r++ | convertible to const X& | { X tmp = r; ++r; return tmp; } | |
*r++ |
reference |
— If a and b are equal, then either a and b
are both dereferenceable or else neither is dereferenceable.
— If a and b are both dereferenceable, then a == b
if and only if *a and *b are bound to the same object.
2 [ Note: The conditionrequirement that a == b
implies ++a == ++b (which is not true for input and output iterators)
and the removal of the restrictions on the number of the assignments through
thea mutable iterator (which applies to output iterators)
allows the use of multi-pass one-directional algorithms with forward iterators.
—end note ]
1 A class or a built-in type X satisfies the requirements of a bidirectional iterator if, in addition to satisfying the requirements for forward iterators, the following expressions are valid as shown in Table 104.
Expression | Return type | Operational semantics | Assertion/note/ pre-/post-condition |
--r | X& |
pre: there exists s such that r == ++s. post: r is dereferenceable. --(++r) == r. --r == --s implies r == s. &r == &--r. |
|
r-- | convertible to const X& | { X tmp = r; --r; return tmp; } | |
*r-- |
reference |
2 [ Note: Bidirectional iterators allow algorithms to move iterators backward as well as forward. —end note ]
1 A class or a built-in type X satisfies the requirements of a random access iterator if, in addition to satisfying the requirements for bidirectional iterators, the following expressions are valid as shown in Table 105.
Expression | Return type | Operational semantics | Assertion/note/ pre-/post-condition |
r += n | X& |
{ |
|
a + n n + a |
X | { X tmp = a; return tmp += n; } | a + n == n + a. |
r -= n | X& | return r += -n; | |
a - n | X | { X tmp = a; return tmp -= n; } | |
b - a |
return n; |
pre: there exists a value n of b == a + (b - a). |
|
a[n] | convertible to |
*(a + n) | |
a < b | contextually convertible to bool | b - a > 0 | < is a total ordering relation |
a > b | contextually convertible to bool | b < a | > is a total ordering relation opposite to <. |
a >= b | contextually convertible to bool | !(a < b) | |
a <= b | contextually convertible to bool | !(a > b) |
In order to remain compatible with Ready issues currently going being handled by regular issues list processing, the following changes should also be made:
Change 24.4.4 [iterator.operations]/4+5 as indicated:
template<class InputIterator> typename iterator_traits<InputIterator>::difference_type distance(InputIterator first, InputIterator last);
4 Effects: If InputIterator meets the requirements of random access iterator then
returns (last - first), otherwise rReturns the number of
increments or decrements needed to get from first to last.
5 Requires: If InputIterator meets the requirements of random access iterator then last shall be reachable from first or first shall be reachable from last, otherwise last shall be reachable from first.