Explicit model definitions are necessary

Author	Doug Gregor, Jeremy Siek
Contact	dgregor@cs.indiana.edu
Date	2005-04-14
Number	N1798=05-0058
Working Group	Evolution

Abstract

This paper illustrates that explicit model definitions (as required by the Indiana concepts proposal, N1758) are necessary for any safe, correct, and backward-compatible formulation of concepts in C++. We show that certain common idioms necessitate explicit model definitions, in particular the use of iterators in the C++ Standard Library. More precisely, we show that existing (correct) code using the standard library will still compile but will fail at run-time if concepts are matched implicitly based on structure, as in the Texas A&M concepts proposal, N1782.

The trouble with istream_iterator
Additional examples
Conclusions

The trouble with `istream_iterator`

Consider the following definition of the ForwardIterator concept that is not mutable, adapted from N1782:

  concept Forward_iterator<Input_iterator Iter>
    where Default_constructible<Iter>
          && Assignable<Iter> {
      Iter p, q;     
      Iter& r = (p = q);   
      const value_type& t = *p;     
      Iter& q2 = ++p;    
      const Iter& q3 = p++;    
      const value_type& t2 = *p++  
  };

and this definition of istream_iterator (from the GNU implementation of the C++ Standard Library, libstdc++):

  template<typename _Tp, typename _CharT = char, 
           typename _Traits = char_traits<_CharT>, typename _Dist = ptrdiff_t> 
    class istream_iterator 
      : public iterator<input_iterator_tag, _Tp, _Dist, const _Tp*, const _Tp&>
    {
    public:
      typedef _CharT                         char_type;
      typedef _Traits                        traits_type;
      typedef basic_istream<_CharT, _Traits> istream_type;

      istream_iterator();
      istream_iterator(istream_type& __s);
      istream_iterator(const istream_iterator& __obj);
      const _Tp& operator*() const;
      const _Tp* operator->() const;
      istream_iterator& operator++();
      istream_iterator operator++(int);
    };
  
  template<typename _Tp, typename _CharT, typename _Traits, typename _Dist>
    inline bool 
    operator==(const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __x,
	       const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __y);

  template <class _Tp, class _CharT, class _Traits, class _Dist>
    inline bool 
    operator!=(const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __x,
	       const istream_iterator<_Tp, _CharT, _Traits, _Dist>& __y);

Does istream_iterator match the syntactic constraints of the Forward_iterator concept? Manually checking this confirms that it does. We also checked istream_iterator against the ForwardIterator concept checker in the Boost Concept Checking library (commenting out iterator_category checks, of course!) and it did pass, i.e., it structurally matches the Forward_iterator concept.

So, there are certain input iterator types (such as istream_iterator) that would be misclassified as forward iterators. What is the danger in this? Some algorithms dispatch based on Input_iterator vs. Forward_iterator. For instance, the range constructor for a std::vector has two conceptual implementations required by the standard:

  // O(lg n) allocations
  template<Input_iterator Iter>
    vector(Iter first, Iter last)
    { while (first != last) push_back(*first++); }

  // 1 allocation
  template<Forward_iterator Iter>
    vector(Iter first, Iter last)
    {
      typename Iter::difference_type n = distance(first, last);
      reserve(n);
      while (n--) push_back(*first++); 
    }

Concept-based overloading will pick the bottom version for Forward_iterators or the top version for Input_iterators. The Forward_iterator version is preferred when it can be used because Forward_iterator is a refinement of Input_iterator.

Now, we put the pieces together in one line of code, which reads integers from cin and puts them into the vector named ints:

  vector<int> ints(istream_iterator<int>(cin), istream_iterator<int>());

We've established that istream_iterator models Forward_iterator, so the second vector range constructor will be selected. Now, consider the execution of that function:

The first line determines how long the sequence is, by running through the input stream. Note that this reads all values in the stream in the process.
The second line reserves space in the vector.
The third line tries to read the sequence to put it into the vector, but the stream has already been exhausted (remember, input iterators are single-pass: you can only read them once!). This causes undefined behavior, because "first" is a past-the-end iterator and we are incrementing it.

Why did this fail? The ForwardIterator concept adds one very important semantic guarantee to InputIterator: you can pass through and read a ForwardIterator sequence many times, but an InputIterator sequence can only be read once. Structural conformance assumes that semantics follow from syntax, which is not the case with these iterator concepts.

The code above works today (without concepts) because we have the notion of iterator_category (in iterator_traits), which is a an explicit model declaration. input_iterator_tag, forward_iterator_tag, etc. are just hacks that help us tell the library "my type semantically models the InputIterator concept", etc. However, if we go to a purely structural model for concepts in C++ (only syntactic matching), the example program will compile but fail at run time.

The second problem with misinterpreting input iterators as forward iterators is that one will not receive a compiler error for code such as:

istream_iterator<int> i = adjacent_find(istream_iterator<int>(cin), istream_iterator<int>());

This code is incorrect, and will fail to compile with standard library implementations that explicitly check the iterator_category. Using purely structural conformance, the program compiles. While it executes, it invokes undefined behavior (because it reads an input iterator after a copy of it has been incremented), but without checking it returns almost the right answer: the iterator returned will be one step too far, violating the semantics in the standard. Using explicit model declarations ensures that this error is detected at compile time.

Additional examples

Here are some other examples of concepts that differ only by semantics, for which code that works today (because it emulates explicit model declarations) would not work if we introduce a purely syntactic concept system into C++0x:

The Abstract Algebra concepts of Groupoid and SemiGroup can be described by:
```
    template<typeid T>
      concept Groupoid
      {
        T operator+(T,T);
      };

    template<typeid T>
      concept SemiGroup : Groupoid<T>
      {
        // operator+ is associative
      };
      
```
The semantic difference becomes important in parallel reduction operations, which can only operate in parallel when the operation is associative (i.e., the type models the SemiGroup concept). For instance, a parallelizing STL could apply parallel reduction to implement std::accumulate when the "plus" operation is associative (i.e., the type models the SemiGroup concept) but it must perform sequential accumulation when the "plus" operation is not associative (i.e., the type only models the Groupoid concept). The two concepts are syntactically indistinguishable, meaning that there is no way to safely determine that the parallelization can be performed.
Other iterator categories that syntactically match Input_iterator, Output_iterator, or Forward_iterator include read-once iterators and write-once iterators that are useful in the context of move semantics.
The Parallel Boost Graph Library has the following concepts with identical syntax:
```
          template<typeid PG> concept MessagingProcessGroup {};
          template<typeid PG> concept BSPProcessGroup : MessagingProcessGroup {};
          template<typeid PG> concept ImmediateProcessGroup : MessagingProcessGroup {};
        
```
The difference between these concepts is entirely semantic: MessagingProcessGroup gives very weak guarantees about message delivery time (in a distributed computing environment), whereas the two refining concepts tighen up these delivery time guarantees. We need to dispatch on the various process group types because certain distributed algorithms require certain delivery time guarantees that are part of BSPProcessGroup or ImmediateProcessGroup.

Conclusions

Concepts contain both syntax and semantics. When model definitions are required, the user states explicitly that the semantics are satisfied. When model definitions are optional, the compiler matches the syntax of the concept and then assumes that the semantics are correct. The danger is that a data type can have the semantics of one concept but the syntax of another (typically more refined) concept, invoking improper optimizations. We avoid these problems in current C++ by emulating explicit model definitions using traits and category tags.

Explicit model definitions are necessary

Abstract

The trouble with istream_iterator

Additional examples

Conclusions

The trouble with `istream_iterator`