Doc no: N2231=07-0091
Reply to: Matt Austern <austern@google.com>
Date: 2007-03-19

STL singly linked lists

The standard container class list is a doubly linked list. It provides bidirectional iterators, and the obvious implementation is a collection of list nodes each of which contains a pointer to both the following and the preceding node. Doubly linked lists are more flexible than singly linked lists, but they also have higher overhead in both space and time. In practice, many programs don't need the flexibility of doubly linked lists; singly linked lists are the primary data structure in many functional languages, and structs that are essentially hand-written singly linked list nodes are a common pattern for C APIs.

There is extensive experience with STL singly linked lists classes. Many C++ standard library implementations provide an slist class as an extension; it's similar to list, but it has forward iterators instead of bidirectional. The SGI, Dinkumware, and Metrowerks libraries all include slist. This proposal is a subset of the SGI version. (And hence a subset of the GNU version, which is descended from the SGI.)

This proposal is a pure extension. It does not propose any changes to the list class, or to the sequence concepts. A future proposal might reasonably introduce new concepts for node-based containers, and might implement algorithms in terms of node-based concepts.

Design decisions

Except where there is reason to differ, slist is modeled on list.

Since singly linked lists have strictly less functionality than doubly linked lists, the only reason to include them in the library is performance. It's important to notice, furthermore, that this isn't a matter of asymptotic complexity, but a matter of bytes and cycles. Inserting a new node is O(1) in either a singly linked list or a doubly linked list, but in a singly linked list it's faster. This fact influences most of the design decisions in this proposal.

Insert-before versus insert-after

There is one fundamental design decision that can be expressed in several different ways. From the point of view of the standard it is most naturally expressed as a question about the complexity of certain container operations, such as insert and erase, but it is more illuminating to think about this decision in terms of the implementation.

Consider a typical C singly-linked list node, such as this example from libxml2:

typedef struct _xmlEnumeration xmlEnumeration;
struct _xmlEnumeration {
struct _xmlEnumeration *next; /* next one */
const xmlChar *name; /* Enumeration name */
};

A list would just be represented as a pointer to the first list node, an xmlEnumeration*. An "iterator", i.e. a way of referring to a particular list element, would also just be a pointer to some list node. Traversing the list is a loop with a skeleton of this form:

while (p != 0) {
...
p = p->next;
}

Inserting a new element into such a list is equally straightforward: create a new node, and fix up pointers:

xmlEnumeration *new = malloc(sizeof(xmlEnumeration));
new->name = strdup(name);
new->next = p->next;
p->next = new;

Note that the new node is inserted after p, not before. This is unavoidable given the definition of xmlEnumeration, since there is no way to find the node whose next pointer points to p.

It's easy to translate this general idea into a C++ class. The core of such a list class might be written something like this:

struct list_node_base {
list_node_base() : next(0) { }
list_node_base* next;
};

template <typename Value>
struct list_node : public list_node_base {
list_node(const Value& v) : list_node_base(), val(v) { }
Value val;
};

template <typename Value>
struct list_iterator {
typedef Value value_type;
typedef Value* pointer;
typedef Value& reference;
typedef ptrdiff_t difference_type;
typedef std::forward_iterator_tag iterator_category;

list_iterator(list_node_base* n) : node(n) { }

value_type& operator*() const {
    return static_cast<list_node<value_type>*>(node)->val;
}

list_iterator& operator++() { node = node->next; return *this; }
list_iterator operator++(int) {
    list_iterator tmp(*this); ++*this; return tmp;
}

bool operator==(const list_iterator<value_type>& x) const {
    return node == x.node;
}
bool operator!=(const list_iterator<value_type>& x) const {
    return node != x.node;
}

list_node_base* node;
};

template <typename Value>
class slist {
private:
typedef list_node<Value> Node;
list_node_base head;
public:
slist() { head.next = 0; }

slist(const slist& x) : head() {
    const list_node_base* from = &x.head;
    list_node_base* to = &this->head;
    while (from->next != 0) {
      const Node* nextf = static_cast<Node*>(from->next);
      to->next = new Node(*nextf);
      from = from->next;
      to = to->next;
    }
}

slist<Value>& operator=(const slist& x) {
    if (&x != this) {
      slist<Value> tmp(x);
      this->swap(tmp);
    }
    return *this;
}

~slist() {
    list_node_base* p = head.next;
    while (p != 0) {
      Node* tmp = static_cast<Node*>(p);
      p = p->next;
      delete tmp;
    }
}

void swap(slist<Value>& x) { std::swap(this->head.next, x.head.next); }

public:
typedef list_iterator<Value> iterator;
iterator begin() { return iterator(head.next); }
iterator end() { return iterator(0); }
iterator before_begin() { return iterator(&head); }

public:
iterator insert_after(iterator i, const Value& x) {
    Node* p = new Node(x);
    p->next = i.node->next;
    i.node->next = p;
    return iterator(p);
}

iterator erase_after(iterator i) {
    Node* p = static_cast<Node*>(i.node->next);
    i.node->next = p->next;
    delete *p;
    return iterator(i.node->next);
}
};

In some respects this is an ordinary STL container with an ordinary iterator interface, but in other respects it is highly nontraditional. It differs from std::list more than one might have expected at first sight. While std::list provides O(1) insert(i), which inserts an element before i, and erase(i), which deletes the element that i points to, this class instead provides O(1) insert_after and erase_after. If we wanted to provide the more traditional versions in this design, they would have to be O(N): essentially, they would take an iterator argument, search for the preceding list node, and and then call insert_after or erase_after. (Or, of course, if we didn't provide insert and erase, it would be easy for users to write their own functions that do exactly the same thing.) This is directly analogous to the use model of C-style singly linked lists like xmlEnumeration.

These restrictions aren't inevitable. It is perfectly possible to write an slist with O(1) insert and erase, at the cost of making iterators slightly more complicated. The general idea is quite simple: the iterator points to a node before the one it conceptually refers to. For example, the operator* above could be replaced by the following definition:
value_type& operator*() const {
return static_cast<list_node<value_type>*>(node->next)->val;
}
This permits "insert-before" semantics because in low-level implementation terms insert(i, x) would insert the new list node after i.node, but from the user's view it would insert the new element before the element referred to by *i. One consequence is that end() must become slightly more complicated. Conceptually a past-the-end iterator points to a nonexistent element, which in these terms would mean that end().node->next, as opposed to just end().node, is a null pointer, and end().node is the last node in the list. In this design, then, slist will have to have two member variables instead of one: it still needs head, but it also needs a last_node member variable so that end() can return a node whose next pointer is null. Finally, insert and erase also have to become slightly more complicated to maintain the invariant that last_node is always the last node in the list.

We thus have two different designs, which can be characterized from an implementation or a user point of view:

Does an iterator point to the list node of the element it refers to, or to the preceding list node?
Is a past-the-end iterator represented as a null pointer, or as a pointer to the last list node?
Does an slist object contain a single pointer, the list head, or does it contain both a head pointer and a tail pointer?
Are the traditional STL definitions of insert and erase O(1), or do programmers need to use insert-after and erase-after if they want O(1) operations?

This is the most important design choice for a singly-linked list proposal.

This proposal chooses the version in which an iterator points to the list node of the element it refers to, instead of the preceding node. The main disadvantage of that design is that programmers have to use insert_after and erase_after instead of the more familiar insert and erase. The advantage, however, is that all other operations are more efficient: iterator dereference is a single pointer dereference, and the slist object itself doesn't need to use a second word to store a tail pointer. Admittedly these performance issues are small: they involve changes in small constant factors, not changes of asymptotic complexity. I still consider them important, however, because those constant factors are the only reason for using slist instead of std::list in the first place. The main design goal for this proposal is that slist should have zero overhead relative to a hand-written C-style linked list like xmlEnumeration, and a design in which iterators point to the preceding list node does not satisfy that goal. Long experience shows that for some uses the restrictions of C-style or lisp-style singly linked lists are not onerous. For other uses, std::list is still available.

A smaller question: should we provide O(N) insert and erase, or omit those operations entirely? I have chosen the latter, because providing operations with the same names as those in std::list, but with much worse performance, would invite bugs. It's also the more conservative choice, because it's always easier to add features at a later date than to remove them. One consequence of this fact is that slist will not satisfy the Sequence requirements of Table 67.

Size

The only way to calculate the size of a singly linked list is to walk through it, which is an O(N) operation. All existing STL containers have a size() member function, and indeed this is part of the container requirements of Table 65. We have three choices for slist::size():

Provide an O(N) slist::size(), essentially equivalent to std::distance(begin(), end())
Provide an O(1) slist::size(), where the container keeps the node count as a separate member variable that it updates whenever nodes are added or destroyed.
Don't provide slist::size() at all.

I have chosen the third option.

Option 3 is superior to Option 1 because experience has shown that users find an O(N) size surprising. Users expect such a simple member function to be fast; it's too tempting to write code like

if (l.size() > 1) { ... }

and so an O(N) size invites serious performance bugs. Since users can already get the effect of an O(N) size by writing std:distance(l.begin(), l.end()), leaving it out doesn't decrease slist's functionality. This does mean that we will have an STL container that doesn't satisfy the Container requirements of table 65, but we already have precedent for that: the unordered associative containers don't satisfy all of those requirements either. It seems more important to have a useful set of container member functions than to worry about strict conformance to this interface.

The choice between Option 3 and Option 2 is more a matter of judgment. I have chosen Option 3 for the same reason that I chose insert-after instead of insert-before: Option 3 is more consistent with the goal of zero overhead compared to a hand-written C-style linked list. Maintaining a count doubles the size of an slist object (one word for the list head and one for the count), and it slows down every operation that changes the number of nodes. In most cases this isn't a change in asymptotic complexity (the one change in asymptotic complexity is is one of the forms of splice), but it is nonzero overhead. It's a cost that all users would have to pay for, whether they need this feature or not, and, for users who care about maintaining a count, it's just as easy to maintain it outside the list, by incrementing the count with every insert and decrementing it with every erase, as it is to maintain the count within the list.

Node-based algorithms

The standard list class includes a number of special functions that are reminiscent of STL algorithms, including remove, unique, and sort. These member functions don't duplicate the freestanding STL algorithms: std::remove operates by copying elements from one location in a sequence to another, while std::list::remove operates on the list's node structure itself. The freestanding std::remove can't create and destroy nodes, since it only knows about iterators, but list::remove can.

All of these algorithms can be implemented for singly-linked lists (Common Lisp is an existence proof, as is the SGI slist), and this proposal includes all of them.

This is not the only reasonable decision. An alternative would be to define a framework for node-based algorithms (probably based on the notion of iterators with the ability to mutate the next/previous relationship) and then pull out all of these operations from both list and slist, defining them as generic algorithms for arbitrary node-based structures. That alternative would be elegant, and would be consistent with the principles of generic programming. I am not proposing it at this time for two related reasons. First, it would mean that this proposal wouldn't be a pure extension—it would modify an existing component, list. Second, generic node-based algorithms are experimental, while an slist with special algorithms is existing practice.

Proposed wording

Add the following class synopsis to section 23.2 [lib.sequences], and the rest of the text as a new subsection of 23.2.

Header <slist> synopsis

namespace std {
template <class T, class Allocator = allocator<T> > class slist;
template <class T, class Allocator>
    bool operator==(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator< (const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator!=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator> (const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator>=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator<=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    void swap(slist<T,Allocator>& x, slist<T,Allocator>& y);
}

23.2.x Class template slist

An slist is a container that supports forward iterators and allows constant time insert and erase operations anywhere within the sequence, with storage management handled automatically. Fast random access to list elements is not supported.

A slist satisfies all of the requirements of a container (table 65), except that the size() member function is not provided. Descriptions are provided here only for operations on slist that are not described in that table or for operations where there is additional semantic information.

namespace std {
template <class T, class Allocator = allocator<T> >
class slist {
public:
    // types:
    typedef typename Allocator::reference reference;
    typedef typename Allocator::const_reference const_reference;
    typedef implementation_defined iterator;       // See 23.1
    typedef implementation_defined const_iterator; // See 23.1
    typedef implementation_defined size_type;      // See 23.1
    typedef implementation_defined difference_type;// See 23.1
    typedef T value_type;
    typedef Allocator allocator_type;
    typedef typename Allocator::pointer pointer;
    typedef typename Allocator::const_pointer const_pointer;

    // 23.2.x.1 construct/copy/destroy:
    explicit slist(const Allocator& = Allocator());
    explicit slist(size_type n, const T& value = T(),
                   const Allocator& = Allocator());
    template <class InputIterator>
      slist(InputIterator first, InputIterator last,
            const Allocator& = Allocator());
    slist(const slist<T,Allocator>&x);
    ~slist();
    slist<T,Allocator>& operator=(const slist<T,Allocator>&x);
    template <class InputIterator>
      void assign(InputIterator first, InputIterator last);
    void assign(size_type n, const T& t);
    allocator_type get_allocator() const;

    // 23.2.x.2 iterators:
    iterator before_begin();
    const_iterator before_begin() const;
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;

    // capacity:
    bool empty() const;
    size_type max_size() const;

    // 23.2.x.3 element access:
    reference front();
    const_reference front() const;

    // 23.2.x.4 modifiers:
    void push_front(const T& x);
    void pop_front();

    iterator insert_after(iterator position, const T& x);
    void insert_after(iterator position, size_type n, const T& x);
    template <class InputIterator>
      void insert_after(iterator position, InputIterator first,
                        InputIterator last);
    iterator erase_after(iterator position);
    iterator erase_after(iterator position, iterator last);
    void swap(slist<T,Allocator>&);
    void clear();

    // 23.2.x.5 slist operations:
    void splice_after(iterator position, slist<T,Allocator>& x);
    void splice_after(iterator position, slist<T,Allocator>& x, iterator i);
    void splice_after(iterator position, slist<T,Allocator>& x, iterator first,
                      iterator last);

    void remove(const T& value);
    template <class Predicate> void remove_if(Predicate pred);

    void unique();
    template <class BinaryPredicate> void unique(BinaryPredicate binary_pred);

    void merge(slist<T,Allocator>& x);
    template <class Compare> void merge(slist<T,Allocator>& x, Compare comp);

    void sort();
    template <class Compare> void sort(Compare comp);

    void reverse();
};

// Comparison operators
template <class T, class Allocator>
    bool operator==(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
bool operator< (const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator!=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator> (const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator>=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);
template <class T, class Allocator>
    bool operator<=(const slist<T,Allocator>& x, const slist<T,Allocator>& y);

// 23.2.x.6 specialized algorithms:
template <class T, class Allocator>
    void swap(slist<T,Allocator>& x, slist<T,Allocator>& y);
}

3.2.x.1 slist constructors, copy, and assignment

explicit slist(const Allocator& = Allocator());

Effects: Constructs an empty slist, using the specified allocator.
Complexity: Constant.

explicit slist(size_type n, const T& value= T(),
const Allocator& = Allocator());

Effects: Constructs an slist with n copies of value, using the specified allocator.
Complexity: Linear in n.

template <class InputIterator>
slist(InputIterator first, InputIterator last,
const Allocator& = Allocator());

Effects: Constructs an slist equal to the range [first,last).
Complexity: Linear in distance(first, last)

template <class InputIterator>
void assign(InputIterator first, InputIterator last);

Effects: clear(); insert_after(before_begin(), first, last);

void assign(size_type n, const T& t);

Effects: clear(); insert_after(before_begin(), n, t);

23.2.x.2 slist iterators

iterator before_begin();
const_iterator before_begin() const;

Returns: A non-dereferenceable iterator that, when incremented, is equal to the iterator returned by begin().

23.2.x.3 slist element access

reference front();
const_reference front() const;

Returns: *begin().

23.2.x.4 slist modifiers

None of the overloads of insert_after shall affect the validity of iterators and reference, and erase_after shall invalidate only the iterators and references to the erased elements. If an exception is thrown during insert_after there shall be no effect. Insertion of n elements into an slist is linear in n, and the number of calls to the copy constructor of T is exactly equal to n. Erasing n elements from an slist is linear time in n and the number of calls to the destructor of type T is exactly equal to n.

void push_front(const T& x);

Effects: insert_after(before_begin(), x);

void pop_front();

Effects: erase_after(before_begin());

iterator insert_after(iterator position, const T& x);

Requires: position is dereferenceable or equal to before_begin().
Effects: Inserts a copy of x after position.
Returns: An iterator pointing to the copy of x.

void insert_after(iterator position, size_type n, const T& x);

Requires: position is dereferenceable or equal to before_begin().
Effects: Inserts n copies of x after position.

template <class InputIterator>
void insert_after(iterator position, InputIterator first,
InputIterator last);

Requires: position is dereferenceable or equal to before_begin(). first and last are not iterators in *this.
Effects: Inserts copies of elements in [first, last) after position.

iterator erase_after(iterator position);

Requires: The iterator following position is dereferenceable.
Effects: Erases the element pointed to by the iterator following position.
Returns: An iterator pointing to the element following the one that was erased, or end() if no such element exists.

iterator erase_after(iterator position, iterator last);

Requires: All iterators in the range (position, last) are dereferenceable.
Effects: Erases the elements in the range (position, last).
Returns: last

void clear();

Effects: Erases all elements in the range [begin(), end()).

23.2.x.5 slist operations

void splice_after(iterator position, slist<T,Allocator>& x);

Requires: position is dereferenceable or equal to before_begin(). &x != this.
Effects: Inserts the contents of x before position, and x becomes empty. Invalidates all iterators and references to x.
Throws: nothing
Complexity: O(1)

void splice_after(iterator position, slist<T,Allocator>& x, iterator i);

Requires: position is dereferenceable or equal to before_begin(). The iterator following i is a dereferenceable iterator in x.
Effects: Inserts the element following i into *this, following position, and removes it from x. Invalidates only the iterators and references to the spliced elements.
Throws: nothing
Complexity: O(1)

void splice_after(iterator position, slist<T,Allocator>& x, iterator first,
iterator last);

Requires: position is dereferenceable or equal to before_begin(). [first, last) is a valid range in x, and all iterators in the range (first, last) are dereferenceable. position is not an iterator in the range (first, last).
Effects: Inserts elements in the range (first, last)after position and removes the elements from x. Invalidates only the iterators and references to the spliced elements.

void remove(const T& value);
template <class Predicate> void remove_if(Predicate pred);

Effects: Erases all the elements in the list referred by a list iterator i for which the following conditions hold: *i == value, pred(*i) is true. This operation shall be stable: the relative order of the elements that are not removed is the same as their relative order in the original list.
Throws: Nothing unless an exception is thrown by the equality comparison or the predicate.
Complexity: Exactly distance(begin(), end()) applications of the corresponding predicate.

void unique();
template <class BinaryPredicate> void unique(BinaryPredicate binary_pred);

Effects: Eliminates all but the first element from every consecutive group of equal elements referred to by the iterator i in the range [first + 1, last) for which *i == *(i-1) (for the version of unique with no arguments) or pred(*i, *(i - 1)) (for the version of unique with a predicate argument) holds.
Throws: Nothing unless an exception in thrown by the equality comparison or the predicate.
Complexity: If the range (last - first) is not empty, exactly (last - first) - 1 applications of the corresponding predicate, otherwise no applications of the predicate.

void merge(slist<T,Allocator>& x);
template <class Compare> void merge(slist<T,Allocator>& x, Compare comp);

Requires: comp defines a strict weak ordering (25.3), and *this and x are both sorted according to this ordering.
Effects: Merges x into *this. This operation shall be stable: for equivalent elements in the two lists, the elements from *this shall always precede the elements from x. x is empty after the merge. If an exception is thrown other than by a comparison there are no effects.
Complexity: At most size() + x.size() - 1 comparisons.

void sort();
template <class Compare> void sort(Compare comp);

Requires: operator< (for the first version) or comp (for the second version) defines a strict weak ordering (25.3).
Effects: Sorts the list according to the operator<or a Compare function object. This operation shall be stable: the relative order of the equivalent elements is preserved. If an exception is thrown the order of the elements in *this is indeterminate.
Complexity: Approximately NlogN comparisons, where N == distance(begin(), end()).

void reverse();

Effects: Reverses the order of the elements in the list.
Throws: Nothing.
Complexity: Linear time.

23.2.x.6 slist specialized algorithms

template <class T, class Allocator>
void swap(slist<T,Allocator>& x, slist<T,Allocator>& y);

Effects: x.swap(y)