Author | Doug Gregor, Jeremy Siek |
---|---|
Contact | dgregor@cs.indiana.edu |
Date | 2005-04-14 |
Number | N1798=05-0058 |
Working Group | Evolution |
This paper illustrates that explicit model definitions (as required by the Indiana concepts proposal, N1758) are necessary for any safe, correct, and backward-compatible formulation of concepts in C++. We show that certain common idioms necessitate explicit model definitions, in particular the use of iterators in the C++ Standard Library. More precisely, we show that existing (correct) code using the standard library will still compile but will fail at run-time if concepts are matched implicitly based on structure, as in the Texas A&M concepts proposal, N1782.
istream_iterator
Consider the following definition of
the ForwardIterator
concept that is not mutable,
adapted from N1782:
concept Forward_iterator<Input_iterator Iter> where Default_constructible<Iter> && Assignable<Iter> { Iter p, q; Iter& r = (p = q); const value_type& t = *p; Iter& q2 = ++p; const Iter& q3 = p++; const value_type& t2 = *p++ };
and this definition of istream_iterator
(from the GNU implementation
of the C++ Standard Library, libstdc++):
template<typename _Tp, typename _CharT = char, typename _Traits = char_traits<_CharT>, typename _Dist = ptrdiff_t> class istream_iterator : public iterator<input_iterator_tag, _Tp, _Dist, const _Tp*, const _Tp&> { public: typedef _CharT char_type; typedef _Traits traits_type; typedef basic_istream<_CharT, _Traits> istream_type; istream_iterator(); istream_iterator(istream_type& __s); istream_iterator(const istream_iterator& __obj); const _Tp& operator*() const; const _Tp* operator->() const; istream_iterator& operator++(); istream_iterator operator++(int); }; template<typename _Tp, typename _CharT, typename _Traits, typename _Dist> inline bool operator==(const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __x, const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __y); template <class _Tp, class _CharT, class _Traits, class _Dist> inline bool operator!=(const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __x, const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __y);
Does istream_iterator
match the syntactic
constraints of the Forward_iterator
concept? Manually
checking this confirms that it does. We also
checked istream_iterator
against
the ForwardIterator
concept checker in the Boost Concept
Checking library (commenting
out iterator_category
checks, of course!) and it did
pass, i.e., it structurally matches
the Forward_iterator
concept.
So, there are certain input iterator types (such
as istream_iterator
) that would be misclassified as
forward iterators. What is the danger in this? Some algorithms
dispatch based on Input_iterator
vs. Forward_iterator
. For instance, the range
constructor for a std::vector
has two conceptual
implementations required by the standard:
// O(lg n) allocations template<Input_iterator Iter> vector(Iter first, Iter last) { while (first != last) push_back(*first++); } // 1 allocation template<Forward_iterator Iter> vector(Iter first, Iter last) { typename Iter::difference_type n = distance(first, last); reserve(n); while (n--) push_back(*first++); }
Concept-based overloading will pick the bottom version for
Forward_iterators
or the top version
for Input_iterator
s. The Forward_iterator
version is preferred when it can be used
because Forward_iterator
is a refinement
of Input_iterator
.
Now, we put the pieces together in one line of code, which
reads integers from cin
and puts them into
the vector
named ints
:
vector<int> ints(istream_iterator<int>(cin), istream_iterator<int>());
We've established that istream_iterator
models Forward_iterator
, so the
second vector
range constructor will be
selected. Now, consider the execution of that function:
vector
.vector
, but the stream has already been exhausted
(remember, input iterators are single-pass: you can only read them
once!). This causes undefined behavior, because "first" is a
past-the-end iterator and we are incrementing it.Why did this fail? The ForwardIterator
concept
adds one very important semantic guarantee
to InputIterator
: you can pass through and read a
ForwardIterator
sequence many times, but
an InputIterator
sequence can only be read
once. Structural conformance assumes that semantics follow from
syntax, which is not the case with these iterator concepts.
The code above works today (without concepts) because we have
the notion of iterator_category
(in iterator_traits
), which is a an explicit model
declaration. input_iterator_tag
,
forward_iterator_tag
, etc. are just hacks that help
us tell the library "my type semantically models
the InputIterator
concept", etc. However, if we go
to a purely structural model for concepts in C++ (only syntactic
matching), the example program will compile but fail at run
time.
The second problem with misinterpreting input iterators as forward iterators is that one will not receive a compiler error for code such as:
istream_iterator<int> i = adjacent_find(istream_iterator<int>(cin), istream_iterator<int>());
This code is incorrect, and will fail to compile with standard
library implementations that explicitly check
the iterator_category
. Using purely structural
conformance, the program compiles. While it executes, it invokes
undefined behavior (because it reads an input iterator after a
copy of it has been incremented), but without checking it returns
almost the right answer: the iterator returned will be one step
too far, violating the semantics in the standard. Using explicit
model declarations ensures that this error is detected at compile
time.
Here are some other examples of concepts that differ only by semantics, for which code that works today (because it emulates explicit model declarations) would not work if we introduce a purely syntactic concept system into C++0x:
The Abstract Algebra concepts of Groupoid
and SemiGroup
can be described by:
template<typeid T> concept Groupoid { T operator+(T,T); }; template<typeid T> concept SemiGroup : Groupoid<T> { // operator+ is associative };
The semantic difference becomes important in parallel
reduction operations, which can only operate in parallel when the
operation is associative (i.e., the type models
the SemiGroup
concept). For instance, a parallelizing
STL could apply parallel reduction to
implement std::accumulate
when the "plus" operation
is associative (i.e., the type models the SemiGroup
concept) but it must perform sequential accumulation when the
"plus" operation is not associative (i.e., the type only models
the Groupoid
concept). The two concepts are
syntactically indistinguishable, meaning that there is no way to
safely determine that the parallelization can be performed.
Input_iterator
, Output_iterator
,
or Forward_iterator
include read-once iterators and
write-once iterators that are useful in the context of move
semantics.template<typeid PG> concept MessagingProcessGroup {}; template<typeid PG> concept BSPProcessGroup : MessagingProcessGroup {}; template<typeid PG> concept ImmediateProcessGroup : MessagingProcessGroup {};
The difference between these concepts is entirely semantic:
MessagingProcessGroup
gives very weak
guarantees about message delivery time (in a distributed computing
environment), whereas the two refining concepts tighen up these
delivery time guarantees. We need to dispatch on the various
process group types because certain distributed algorithms require
certain delivery time guarantees that are part
of BSPProcessGroup
or ImmediateProcessGroup
.
Concepts contain both syntax and semantics. When model definitions are required, the user states explicitly that the semantics are satisfied. When model definitions are optional, the compiler matches the syntax of the concept and then assumes that the semantics are correct. The danger is that a data type can have the semantics of one concept but the syntax of another (typically more refined) concept, invoking improper optimizations. We avoid these problems in current C++ by emulating explicit model definitions using traits and category tags.