Introduction of std::colony to the standard library

The purpose of a container in the standard library cannot be to provide the optimal solution for all scenarios. Inevitably, in fields such as high-performance trading or gaming, the optimal solution within critical loops will be a custom-made one that fits that scenario perfectly. However, outside of the most critical of hot paths, there is a wide range of applications for more generalized solutions.

Colony is a formalisation, extension and optimization of what is typically known as a 'bucket array' container in game programming circles; similar structures exist in various incarnations across the high-performance computing, high performance trading, 3D simulation, physics simulation, robotics, server/client application and particle simulation fields (see: https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ).

The concept of a bucket array is: you have multiple memory blocks of elements, and a boolean token for each element which denotes whether that element is 'active' or 'erased'; the set of these tokens is commonly known as a skipfield. If an element is 'erased', it is skipped over during iteration. When all elements in a block are erased, the block is removed, so that iteration does not lose performance by having to skip empty blocks. If an insertion occurs when all the blocks are full, a new memory block is allocated.
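The bucket array described above can be sketched roughly as follows. This is a minimal illustration rather than any particular engine's implementation: the names bucket_array, block, insert, erase and for_each are hypothetical, the block size is fixed, T is assumed to be default-constructible and copy-assignable, and removal of fully-erased blocks is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical minimal bucket array: fixed-size blocks plus one bool per slot.
template <class T, std::size_t BlockSize = 4>
class bucket_array {
    struct block {
        T items[BlockSize];          // element storage
        bool erased[BlockSize] = {}; // boolean skipfield: true means "skip this slot"
        std::size_t used = 0;        // number of slots handed out so far
    };
    // Blocks are held by pointer, so growing this vector never moves elements.
    std::vector<std::unique_ptr<block>> blocks_;

public:
    // Insert at the back; allocate a new block only when the last one is full.
    T* insert(const T& value) {
        if (blocks_.empty() || blocks_.back()->used == BlockSize)
            blocks_.push_back(std::make_unique<block>());
        block& b = *blocks_.back();
        b.items[b.used] = value;
        return &b.items[b.used++];
    }

    // Erase by flagging the slot; no other element is moved.
    void erase(T* p) {
        for (auto& b : blocks_) {
            T* first = b->items;
            // std::less gives a well-defined order even across different arrays.
            if (!std::less<T*>{}(p, first) && std::less<T*>{}(p, first + BlockSize))
                b->erased[p - first] = true;
        }
    }

    // Visit every non-erased element; note the one branch per slot.
    template <class F>
    void for_each(F f) {
        for (auto& b : blocks_)
            for (std::size_t i = 0; i != b->used; ++i)
                if (!b->erased[i]) f(b->items[i]);
    }
};
```

Pointers returned by insert stay valid across later insertions and erasures, which is the property the following paragraphs build on.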

The advantages of this structure are as follows: because a skipfield is used, no reallocation of elements is necessary upon erasure. Because the structure uses multiple memory blocks, insertions to a full container also do not trigger reallocations. This means that element memory locations stay stable and iterators stay valid regardless of erasure/insertion. This is highly desirable, for example, in game programming because there are usually multiple elements in different containers which need to reference each other during gameplay and elements are being inserted or erased in real time.

Problematic aspects of a typical bucket array are that they tend to have a fixed memory block size, do not re-use memory locations from erased elements, and utilize a boolean skipfield. The fixed block size (as opposed to block sizes with a growth factor) and lack of erased-element re-use lead to far more allocations/deallocations than are necessary. Given that allocation is a costly operation in most operating systems, this becomes important in performance-critical environments. The boolean skipfield makes iteration time complexity undefined, as there is no way of knowing ahead of time how many erased elements occur between any two non-erased elements. This can create variable latency during iteration. It also requires branching code, which may cause issues on processors with deep pipelines and poor branch-prediction failure performance.

A colony uses a non-boolean, non-branching method for skipping runs of erased elements, which allows for O(1) amortized iteration time complexity and more-predictable iteration performance than a bucket array. It also utilizes a growth factor for memory blocks and reuses erased element locations upon insertion, which leads to fewer allocations/reallocations. Because it reuses erased element memory space, the exact location of insertion is undefined, unless no erasures have occurred or an equal number of erasures and insertions have occurred (in which case the insertion location is the back of the container). The container is therefore considered unordered but sortable. Lastly, because there is no way of predicting in advance where erasures ('skips') may occur during iteration, an O(1) time complexity [ ] operator is not possible and the container is bidirectional, but not random-access.

There are two patterns for accessing stored elements in a colony: the first is to iterate over the container and process each element (or skip some elements using the advance/prev/next/iterator ++/-- functions). The second is to store the iterator returned by the insert() function (or a pointer derived from the iterator) in some other structure and access the inserted element in that way. To better understand how insertion and erasure work in a colony, see the following images.

Insertion to back

The following images demonstrate how insertion works in a colony compared to a vector when size == capacity.

Non-back erasure

The following images demonstrate how non-back erasure works in a colony compared to a vector.

II. Questions for the Committee

III. Motivation and Scope

Note: Throughout this document I will use the term 'link' to denote any form of referencing between elements whether it be via ids/iterators/pointers/indexes/references/etc.

There are situations where data is heavily interlinked, iterated over frequently, and changing often. An example is the typical video game engine. Most games will have a central generic 'entity' or 'actor' class, regardless of their overall schema (an entity class does not imply an ECS). Entity/actor objects tend to be 'has a'-style objects rather than 'is a'-style objects, which link to, rather than contain, shared resources like sprites, sounds and so on. Those shared resources are usually located in separate containers/arrays so that they can be re-used by multiple entities. Entities are in turn referenced by other structures within a game engine, such as quadtrees/octrees, level structures, and so on.

Entities may be erased at any time (for example, a wall gets destroyed and no longer is required to be processed by the game's engine, so is erased) and new entities inserted (for example, a new enemy is spawned). While this is all happening, the links between entities, resources and superstructures such as levels and quadtrees must stay valid in order for the game to run. The order of the entities and resources themselves within the containers is, in the context of a game, typically unimportant, so an unordered container is okay.

Unfortunately the container with the best iteration performance in the standard library, vector^[1], loses pointer validity to elements within it upon insertion, and pointer/index validity upon erasure. This tends to lead to sophisticated and often restrictive workarounds when developers attempt to utilize vector or similar containers under the above circumstances.

std::list and the like are not suitable due to their poor locality, which leads to poor cache performance during iteration. This is however an ideal situation for a container such as colony, which has a high degree of locality. Even though that locality can be punctuated by gaps from erased elements, it still works out better in terms of iteration performance^[1] than every existing standard library container other than deque/vector, regardless of the ratio of erased to non-erased elements.

Some more specific requirements for containers in the context of game development are listed in the appendix.

As another example, particle simulation (weather, physics, etc.) often involves large clusters of particles which interact with external objects and each other. The particles each have individual properties (spin, momentum, direction, etc.) and are being created and destroyed continuously. Therefore the order of the particles is unimportant; what is important is the speed of erasure and insertion. No current standard library container has both strong insertion and non-back erasure speed, so again this is a good match for colony.

Reports from other fields suggest that, because most developers aren't aware of containers such as this, they often end up using solutions which are sub-par for iterative performance, such as std::map and std::list, in order to preserve pointer validity, when most of their processing work is actually iteration-based. So, introducing this container would both create a convenient solution to these situations and increase awareness of better-performing approaches in general. It will also ease communication across fields, as opposed to the current scenario where each field uses a similar container but each has a different name for it.

IV. Impact On the Standard

V. Design Decisions

Each memory block houses multiple elements. The metadata about each block may or may not be allocated with the blocks themselves (it could be contained in a separate structure). This metadata should include, at a minimum, the number of non-erased elements within each block and the block's capacity. This allows the container to know when the block is empty and needs to be removed from the iterative chain, and also allows iterators to judge when the end of one block has been reached. A non-boolean skipfield is required in order to skip over erased elements during iteration while maintaining O(1) amortized iteration time complexity (amortized due to block traversal, which requires a few more operations). Finally, a mechanism for keeping track of elements which have been erased must be present, so that those memory locations can be reused upon subsequent element insertions.

The following aspects of a colony must be implementation-defined in order to allow for variance and possible performance improvement, and to conform with possible changes to C++ in the future:

However the implementation of these is significantly constrained by the requirements of the container (lack of reallocation, stable pointers to non-erased elements regardless of erasures/insertions).

In the reference implementation, the specific structure and mechanisms have changed many times over the course of development. However, the interface to the container and its time complexity guarantees have remained largely unchanged (with the exception of the time complexity for updating skipfield nodes, which has not impacted significantly on performance). So it is reasonably likely that regardless of specific implementation, it will be possible to maintain this general specification without obviating future improvements in implementation, so long as time complexity guarantees for the above list are implementation-defined.

Below I explain the reference implementation's approach in terms of the three core aspects described above, along with descriptions of some alternative implementation approaches.

1. Collection of element memory blocks + metadata

In the reference implementation this is essentially a doubly-linked list of 'group' structs containing (a) memory blocks, (b) memory block metadata and (c) skipfields. The memory blocks and skipfields have a growth factor of 2 from one group to the next. The metadata includes information necessary for an iterator to iterate over colony elements, such as the last insertion point within the memory block, and other information useful to specific functions, such as the total number of non-erased elements in the node. This approach keeps the operation of freeing empty memory blocks from the colony container at O(1) time complexity. Further information is available here.
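As a rough illustration of the structure just described, the sketch below models a doubly-linked chain of groups whose capacities double from one group to the next. It is not the reference implementation's code: the names group, append_group and remove_group, the field names, the unsigned short skipfield type and the initial capacity of 8 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical 'group': one memory block, its skipfield and its metadata.
template <class T>
struct group {
    std::unique_ptr<T[]> elements;               // (a) memory block
    std::unique_ptr<unsigned short[]> skipfield; // (c) one zero-initialised node per element
    std::size_t capacity;                        // (b) metadata: slots in this block
    std::size_t live = 0;                        //     number of non-erased elements
    std::size_t group_number;                    //     position of the block in the chain
    group* previous;
    group* next = nullptr;

    group(std::size_t cap, group* prev)
        : elements(new T[cap]), skipfield(new unsigned short[cap]()),
          capacity(cap), group_number(prev ? prev->group_number + 1 : 0),
          previous(prev) {}
};

// Append a block twice the size of the current back block (growth factor of 2).
template <class T>
group<T>* append_group(group<T>* back, std::size_t first_capacity = 8) {
    auto* g = new group<T>(back ? back->capacity * 2 : first_capacity, back);
    if (back) back->next = g;
    return g;
}

// Unlink and free one block in O(1); no other block or element is touched.
template <class T>
void remove_group(group<T>*& head, group<T>* g) {
    if (g->previous) g->previous->next = g->next; else head = g->next;
    if (g->next) g->next->previous = g->previous;
    delete g;
}
```

Because removal only relinks two neighbours, pointers into the remaining blocks are unaffected when an empty block is freed.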

An alternative implementation could be to use a vector of pointers to dynamically-allocated memory blocks + skipfields in one struct, with a separate vector of memory block metadata structs. This approach would have some advantages in terms of increasing the locality for metadata during iteration, but would create reallocation costs when memory blocks + their skipfields and metadata were removed upon becoming empty.

A vector of memory blocks, as opposed to a vector of pointers to memory blocks, would not work as it would (a) disallow a growth factor in the memory blocks and (b) invalidate pointers to elements in subsequent blocks when a memory block became empty of elements and was therefore removed from the vector. In short, it would negate colony's beneficial aspects.

2. A non-boolean skipfield which allows for O(1) traversal from each non-erased element to the next

The reference implementation currently uses a skipfield pattern called the Low complexity jump-counting pattern (formerly under working title 'bentley pattern', current version of paper). This effectively encodes the length of runs of consecutive erased elements into a skipfield, which allows for O(1) time complexity during iteration. Since there is no branching involved in iterating over the skipfield aside from end-of-block checks, it can be less problematic computationally than a boolean skipfield (which has to branch for every skipfield read) in terms of CPUs which don't handle branching or branch-prediction failure efficiently (e.g. Core2).

The pattern stores and modifies the run-lengths during insertion and erasure with O(1) time complexity. It has a lot of similarities to the High complexity jump-counting pattern, which was a pattern previously used by the reference implementation. Using the High complexity jump-counting pattern is an alternative, though the skipfield update time complexity guarantees for that pattern are effectively undefined, or between O(1) and O(skipfield length) for each insertion/erasure. In actual practice those updates result in one memcpy operation which resolves to a single block-copy operation, but it is still a little slower than the Low complexity jump-counting pattern. The skipfield pattern you use will also typically have an effect on the type of memory-reuse mechanism you can utilize.
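To make the run-length idea concrete, here is a minimal sketch of a jump-counting skipfield over a single block, based on my reading of the description above: the first and last node of each run of erased slots store the run's length, and middle nodes only need to be non-zero. The names skip_t, mark_erased, first_live and next_live are hypothetical, the skipfield is assumed to carry one extra trailing node that always stays 0, and un-erasing a slot on insertion is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using skip_t = unsigned short; // assumed skipfield node type

// Mark slot i (currently non-erased) as erased in O(1).
// sf has one node per slot plus a trailing node that is always 0.
inline void mark_erased(std::vector<skip_t>& sf, std::size_t i) {
    const skip_t left  = (i > 0) ? sf[i - 1] : skip_t(0); // length of the run ending at i-1
    const skip_t right = sf[i + 1];                       // length of the run starting at i+1
    const skip_t len   = static_cast<skip_t>(left + right + 1);
    sf[i] = 1;           // any non-zero value will do for a middle node
    sf[i - left]  = len; // start node of the (possibly merged) run
    sf[i + right] = len; // end node of the (possibly merged) run
}

// Index of the first non-erased slot (equals the slot count if all are erased).
inline std::size_t first_live(const std::vector<skip_t>& sf) {
    return sf[0];
}

// Index of the next non-erased slot after live slot i, found without a loop
// or a per-slot branch: step once, then add the run length stored there.
inline std::size_t next_live(const std::vector<skip_t>& sf, std::size_t i) {
    return i + 1 + sf[i + 1];
}
```

Erasing slots 3, 5 and then 4 of an eight-slot block leaves the value 3 in nodes 3 and 5, so stepping forward from slot 2 lands on slot 6 with a single addition.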

A pure boolean skipfield is not usable because it makes iteration time complexity undefined - it could for example result in thousands of branching statements + skipfield reads for a single ++ operation in the case of many consecutive erased elements. In the high-performance fields for which this container was initially designed, this brings with it unacceptable latency. However another strategy using a combination of a jump-counting and boolean skipfield, which saves memory at the expense of computational efficiency, is possible as follows:

This approach has the advantage of still performing O(1) iterations from one non-erased element to the next, unlike a pure boolean skipfield approach, but compared to a pure jump-counting approach introduces 3 additional costs per iteration via (1) a branch operation when checking the bitfield, (2) an additional read (of the erased element's memory space) and (3) a bitmasking operation + bitshift to read the bit. But it does reduce the memory overhead of the skipfield to 1 bit per-element, which reduces the cache load. An implementation and benchmarking would be required in order to establish whether this approach improves upon the current implementation's performance.

3. Erased-element location recording mechanism

There are two valid approaches here; both involve per-memory-block free lists, utilizing the memory space of erased elements. The first approach forms a free list of all erased elements. The second forms a free list of the first element in each run of consecutive erased elements ("skipblocks", in terms of the terminology used in the jump-counting pattern papers). The second can be more efficient, but requires a doubly-linked free list rather than a singly-linked free list - otherwise it becomes an O(N) operation to update links in the skipfield, when a skipblock expands or contracts during erasure or insertion.
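The first approach (a free list of all erased elements, stored in the erased elements' own memory) might look roughly like the sketch below. It is an illustration under simplifying assumptions, not the reference implementation: free_list_block and its members are hypothetical names, the list is singly-linked and indexed rather than pointer-based, there is a single fixed-capacity block, and destruction of elements still live when the block itself is destroyed is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity block whose erased slots double as free-list nodes.
template <class T, std::size_t N>
class free_list_block {
    static constexpr std::size_t npos = N; // sentinel: "no further free slot"

    union slot {
        T value;               // active while the slot holds a live element
        std::size_t next_free; // active while the slot is erased or unused
        slot() : next_free(npos) {}
        ~slot() {}             // element lifetime is managed by the block
    };

    slot slots_[N];
    std::size_t free_head_ = npos; // most recently erased slot
    std::size_t end_ = 0;          // slots below this index have been used at least once

public:
    // Reuse an erased slot if one exists, otherwise take the next unused slot.
    T* insert(const T& v) {
        std::size_t i;
        if (free_head_ != npos) {
            i = free_head_;
            free_head_ = slots_[i].next_free;
        } else if (end_ < N) {
            i = end_++;
        } else {
            return nullptr; // block full; a real container would add a block
        }
        return ::new (static_cast<void*>(&slots_[i].value)) T(v);
    }

    // Destroy the element and push its slot onto the free list.
    // No allocation happens here, so erase cannot throw because of bookkeeping.
    void erase(T* p) noexcept {
        slot* s = reinterpret_cast<slot*>(p); // a union and its member share an address
        p->~T();
        s->next_free = free_head_;
        free_head_ = static_cast<std::size_t>(s - slots_);
    }
};
```

The per-slot cost is that each slot must be at least as large as the link it stores, which is the same alignment caveat the text raises for storing skipfield data in erased elements.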

The reference implementation currently uses the second approach, using three things to keep track of erased element locations:

Previous versions of the reference implementation used a singly-linked free list of erased elements instead of a doubly-linked free list of skipblocks. This was possible with the High complexity jump-counting pattern, but is not possible using the Low complexity jump-counting pattern, as it cannot calculate a skipblock's start node location from a middle node's value like the High complexity pattern can. Using free lists of skipblocks is a more efficient approach.

One cannot use a stack of pointers (or similar) to erased elements for this mechanism, as early versions of the reference implementation did, because this can create allocations during erasure, which changes the exception guarantees of erase. One could instead scan all skipfields until an erased location was found, or simply have the first item in the list above and then scan the first available block, though both of these approaches would be slow.

In terms of the alternative boolean + jump-counting skipfield approach described in the skipfield section above, one could store both the jump-counting data and free list data in any given erased element's memory space, provided of course that elements are aligned to be wide enough to fit both.

Implementation of iterator class

The reference implementation's iterator stores a pointer to the current 'group' struct mentioned above, plus a pointer to the current element and a pointer to its corresponding skipfield node. An alternative approach is to store the group pointer + an index, since the index can indicate both the offset from the memory block for the element, as well as the offset from the start of the skipfield for the skipfield node. However multiple implementations and benchmarks across many processors have shown this to be worse-performing than the separate pointer-based approach, despite the increased memory cost for the iterator class itself.

++ operation is as follows, utilising the reference implementation's Low-complexity jump-counting pattern:

-- operation is the same except both step 1 and 2 involve subtraction rather than adding, and step 3 checks to see if the element pointer is now before the beginning of the memory block. If so it traverses to the back of the previous group, and subtracts the value of the back skipfield node from the element pointer and skipfield pointer.
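A minimal sketch of the ++ operation follows, with its three steps inferred from the description of -- above (an assumption on my part, not the reference implementation's code). The types int_group and int_iterator are hypothetical, elements are plain ints, and each skipfield is assumed to have one extra trailing node that stays 0 so that step 2 never reads past the end.

```cpp
#include <cassert>
#include <cstddef>

using skip_t = unsigned short; // assumed skipfield node type

// Hypothetical group: 'capacity' elements and capacity + 1 skipfield nodes.
struct int_group {
    int* elements;
    skip_t* skipfield;
    std::size_t capacity;
    int_group* next;
};

// Iterator holding a group pointer, an element pointer and a skipfield pointer.
struct int_iterator {
    int_group* group;
    int* element;
    skip_t* skip;

    int& operator*() const { return *element; }

    int_iterator& operator++() {
        // Step 1: advance both pointers by one.
        ++element;
        ++skip;
        // Step 2: add the value of the skipfield node now pointed to
        // (0 for a live element, the run length for the start of a skipblock).
        const skip_t jump = *skip;
        element += jump;
        skip += jump;
        // Step 3: if we ran off the end of the block, move to the next group
        // and skip any erased run at its front.
        if (element == group->elements + group->capacity && group->next) {
            group = group->next;
            const skip_t front = group->skipfield[0];
            element = group->elements + front;
            skip = group->skipfield + front;
        }
        return *this; // with no next group, this position acts as end()
    }
};
```

Only step 3 branches, and it does so once per increment regardless of how many erased elements were jumped over.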

Iterators are bidirectional but also provide constant time complexity >, <, >=, <= and <=> operators for convenience (eg. in for loops when skipping over multiple elements per loop and there is a possibility of going past a pre-determined end element). This is achieved by keeping a record of the order of memory blocks. In the reference implementation this is done by assigning a number to each memory block in its metadata. In an implementation using a vector of pointers to memory blocks instead of a linked list, one could use the position of the pointers within the vector to determine this. Comparing relative order of the two iterators' memory blocks via this number, then comparing the memory locations of the elements themselves, if they happen to be in the same memory block, is enough to implement all greater/lesser comparisons.
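The ordering comparison described above can be sketched as follows. The types numbered_group and position are hypothetical stand-ins for a group's metadata and an iterator, and elements are plain ints; only the comparison logic is the point of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical metadata: each memory block records its order in the container.
struct numbered_group {
    std::size_t group_number;
};

// Hypothetical iterator state: owning group plus element address.
struct position {
    const numbered_group* group;
    const int* element;
};

// Constant time: compare block order first, element addresses only within a block.
inline bool operator<(const position& a, const position& b) {
    if (a.group != b.group)
        return a.group->group_number < b.group->group_number;
    return std::less<const int*>{}(a.element, b.element);
}
inline bool operator>(const position& a, const position& b)  { return b < a; }
inline bool operator<=(const position& a, const position& b) { return !(b < a); }
inline bool operator>=(const position& a, const position& b) { return !(a < b); }
```

Element addresses are never compared across blocks, so the result does not depend on where the allocator happened to place each block.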

Additional notes on specific functions

Results of implementation

In practical application the reference implementation is generally faster for insertion and (non-back) erasure than current standard library containers, and generally faster for iteration than any container except vector and deque. For full details, see benchmarks.

VI. Technical Specification

26.3.7 Header <colony> synopsis [colony.syn]

Iterator Invalidation

26.3.14 Class template colony [colony]

26.3.14.1 Class template colony overview [colony.overview]

T - the element type. In general T shall meet the requirements of Erasable, CopyAssignable and CopyConstructible.
However, if emplace is utilized to insert elements into the colony, and no functions which involve copying or moving are utilized, T is only required to meet the requirements of Erasable.
If move-insert is utilized instead of emplace, T shall also meet the requirements of MoveConstructible.

Allocator - an allocator that is used to acquire memory to store the elements. The type shall meet the requirements of Allocator. The behavior is undefined if Allocator::value_type is not the same as T.

Skipfield - an unsigned integer type. This type is used to form skipfields, which are used to indicate which elements are erased. Use of this type by an implementation is not guaranteed.

namespace std {
struct limits
{
   size_t min, max;
   limits(size_t _min, size_t _max);
};


template <class T, class Allocator = allocator<T>, typename Skipfield = implementation-defined>
class colony {
public:

  // types
  using value_type = T;
  using allocator_type = Allocator;
  using skipfield_type = Skipfield;
  using pointer = typename allocator_traits<Allocator>::pointer;
  using const_pointer = typename allocator_traits<Allocator>::const_pointer;
  using reference = value_type&;
  using const_reference = const value_type&;
  using size_type = implementation-defined; // see 26.2
  using difference_type = implementation-defined; // see 26.2
  using iterator = implementation-defined; // see 26.2
  using const_iterator = implementation-defined; // see 26.2
  using reverse_iterator = implementation-defined; // see 26.2
  using const_reverse_iterator = implementation-defined; // see 26.2



  colony() noexcept(noexcept(Allocator())) : colony(Allocator()) { }
  explicit colony(std::limits block_capacity_limits) noexcept(noexcept(Allocator())) : colony(block_capacity_limits, Allocator()) { }
  explicit colony(const Allocator&) noexcept;
  explicit colony(std::limits block_capacity_limits, const Allocator&) noexcept;
  explicit colony(size_type n, std::limits block_capacity_limits = implementation-defined, const Allocator& = Allocator());
  colony(size_type n, const T& value, std::limits block_capacity_limits = implementation-defined, const Allocator& = Allocator());
  template <class InputIterator>
    colony(InputIterator first, InputIterator last, std::limits block_capacity_limits = implementation-defined, const Allocator& = Allocator());
  colony(const colony& x);
  colony(colony&&) noexcept;
  colony(const colony&, const Allocator&);
  colony(colony&&, const Allocator&);
  colony(initializer_list<T>, std::limits block_capacity_limits = implementation-defined, const Allocator& = Allocator());
  ~colony() noexcept;
  colony& operator= (const colony& x);
  colony& operator= (colony&& x) noexcept(allocator_traits<Allocator>::propagate_on_container_move_assignment::value || allocator_traits<Allocator>::is_always_equal::value);
  colony& operator= (initializer_list<T>);
  template<class InputIterator> void assign(InputIterator first, InputIterator last);
  void assign(size_type n, const T& t);
  void assign(initializer_list<T>);
  allocator_type get_allocator() const noexcept;


  // iterators
  iterator               begin() noexcept;
  const_iterator         begin() const noexcept;
  iterator               end() noexcept;
  const_iterator         end() const noexcept;
  reverse_iterator       rbegin() noexcept;
  const_reverse_iterator rbegin() const noexcept;
  reverse_iterator       rend() noexcept;
  const_reverse_iterator rend() const noexcept;

  const_iterator         cbegin() const noexcept;
  const_iterator         cend() const noexcept;
  const_reverse_iterator crbegin() const noexcept;
  const_reverse_iterator crend() const noexcept;


  // capacity
  [[nodiscard]] bool empty() const noexcept;
  size_type size() const noexcept;
  size_type max_size() const noexcept;
  size_type capacity() const noexcept;
  size_type memory() const noexcept;
  void reserve(size_type n);
  void shrink_to_fit();
  void trim() noexcept;


  // modifiers
  template <class... Args> iterator emplace(Args&&... args);
  iterator insert(const T& x);
  iterator insert(T&& x);
  void insert(size_type n, const T& x);
  template <class InputIterator> void insert(InputIterator first, InputIterator last);
  void insert(initializer_list<T> il);
  iterator erase(const_iterator position);
  iterator erase(const_iterator first, const_iterator last);
  void swap(colony&) noexcept(allocator_traits<Allocator>::propagate_on_container_swap::value || allocator_traits<Allocator>::is_always_equal::value);
  void clear() noexcept;


  // colony operations
  void splice(colony &x);

  std::limits block_limits() const noexcept;
  void reshape(std::limits block_capacities);

  iterator get_iterator_from_pointer(pointer p) const noexcept;

  void sort();
  template <class Compare> void sort(Compare comp);
};


template<class InputIterator, class Allocator = allocator<iter-value-type <InputIterator>>>
  colony(InputIterator, InputIterator, Allocator = Allocator())
    -> colony<iter-value-type <InputIterator>, Allocator>;

// swap
template <class T, class Allocator, typename Skipfield>
  void swap(colony<T, Allocator, Skipfield>& x, colony<T, Allocator, Skipfield>& y)
    noexcept(noexcept(x.swap(y)));

// advance
template <class T, class Allocator, typename Skipfield, class Distance>
  void advance(colony<T, Allocator, Skipfield>::iterator &it, Distance n);
template <class T, class Allocator, typename Skipfield, class Distance>
  void advance(colony<T, Allocator, Skipfield>::const_iterator &it, Distance n);
template <class T, class Allocator, typename Skipfield, class Distance>
  void advance(colony<T, Allocator, Skipfield>::reverse_iterator &it, Distance n);
template <class T, class Allocator, typename Skipfield, class Distance>
  void advance(colony<T, Allocator, Skipfield>::const_reverse_iterator &it, Distance n);

// next
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator next(const colony<T, Allocator, Skipfield>::iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::const_iterator next(const colony<T, Allocator, Skipfield>::const_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::reverse_iterator next(const colony<T, Allocator, Skipfield>::reverse_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::const_reverse_iterator next(const colony<T, Allocator, Skipfield>::const_reverse_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);

// prev
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator prev(const colony<T, Allocator, Skipfield>::iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::const_iterator prev(const colony<T, Allocator, Skipfield>::const_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::reverse_iterator prev(const colony<T, Allocator, Skipfield>::reverse_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::const_reverse_iterator prev(const colony<T, Allocator, Skipfield>::const_reverse_iterator it, colony<T, Allocator, Skipfield>::iterator::difference_type distance = 1);

// distance
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator::difference_type distance(const colony<T, Allocator, Skipfield>::iterator first, const colony<T, Allocator, Skipfield>::iterator last);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator::difference_type distance(const colony<T, Allocator, Skipfield>::const_iterator first, const colony<T, Allocator, Skipfield>::const_iterator last);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator::difference_type distance(const colony<T, Allocator, Skipfield>::reverse_iterator first, const colony<T, Allocator, Skipfield>::reverse_iterator last);
template <class T, class Allocator, typename Skipfield>
  colony<T, Allocator, Skipfield>::iterator::difference_type distance(const colony<T, Allocator, Skipfield>::const_reverse_iterator first, const colony<T, Allocator, Skipfield>::const_reverse_iterator last);

// erase
template <class T, class Allocator, class Skipfield, class Predicate>
  colony<T, Allocator, Skipfield>::size_type erase_if(colony<T, Allocator, Skipfield>& c, Predicate pred);
template <class T, class Allocator, class Skipfield, class U>
  colony<T, Allocator, Skipfield>::size_type erase(colony<T, Allocator, Skipfield>& c, const U& value);
} // namespace std

26.3.14.2 colony constructors, copy, and assignment [colony.cons]

26.3.14.3 colony capacity [colony.capacity]

227) reserve() uses Allocator::allocate() which may throw an appropriate exception.

26.3.14.4 colony modifiers [colony.modifiers]

26.3.14.5 Operations [colony.operations]

26.3.14.6 Specialized algorithms [colony.special]

26.3.14.7 Erasure [colony.erasure]

VII. Acknowledgements

Matt would like to thank: Glen Fernandes and Ion Gaztanaga for restructuring advice, Robert Ramey for documentation advice, various Boost and SG14 members for support, critiques and corrections, Baptiste Wicht for teaching me how to construct decent benchmarks, Jonathan Wakely, Sean Middleditch, Jens Maurer (very nearly a co-author at this point really), Patrice Roy and Guy Davidson for standards-compliance advice and critiques, support, representation at meetings and bug reports, Henry Miller for getting me to clarify why the intrusive list/free list approach to memory location reuse is the most appropriate, that ex-Lionhead guy for annoying me enough to force me to implement the original skipfield pattern, Jon Blow for some initial advice and Mike Acton for some influence, the community at large for giving me feedback and bug reports on the reference implementation.
Also Nico Josuttis for doing such a great job in terms of explaining the general format of the structure to the committee.

VIII. Appendices

Appendix A - Basic usage examples

Example demonstrating pointer stability

Appendix B - Reference implementation benchmarks

Benchmark results for the colony v5 reference implementation under GCC 8.1 x64 on an Intel Xeon E3-1241 (Haswell) are here.

Old benchmark results for an earlier version of colony under MSVC 2015 update 3, on an Intel Xeon E3-1241 (Haswell) are here. There is no commentary for the MSVC results.

Appendix C - Frequently Asked Questions

Appendix D - Specific responses to previous committee feedback

Appendix E - Typical game engine requirements

Here are some more specific requirements with regards to game engines, verified by game developers within SG14:

Game developers therefore either develop custom solutions for each scenario or implement workarounds for vector. The most common workarounds are most likely the following or derivatives:

Colony brings a more generic solution to these contexts. While some developers, particularly AAA developers, will almost always develop a custom solution for specific use-cases within their engine, I believe most sub-AAA and indie developers are more likely to rely on third party solutions. Regardless, standardising the container will allow for greater cross-discipline communication.

Appendix F - Time complexity requirement explanations

Insert (single): O(1)

One of the requirements of colony is that pointers to non-erased elements stay valid regardless of insertion/erasure within the container. For this reason the container must use multiple memory blocks. If a single memory block were used, like in a std::vector, reallocation of elements would occur when the container expanded (and the elements were copied to a larger memory block). Instead, colony will insert into existing memory blocks when able, and create a new memory block when all existing memory blocks are full. This keeps insertion at O(1).

Insert (multiple): O(N)

Multiple insertions may allow an implementation to reserve suitably-sized memory blocks in advance, reducing the number of allocations necessary (whereas singular insertion would generally follow the implementation's block growth pattern, possibly allocating more than necessary). However, when it comes to time complexity it has no advantage over singular insertion: it is linear in the number of elements inserted.

Erase (single): O(1)

Erasure is a simple matter of destructing the element in question and updating the skipfield. Since we use a skipfield to indicate erasures to the iterator, no reallocation of subsequent elements is necessary and the process is O(1). Additionally, when using the Low-complexity jump-counting pattern the skipfield update is also always O(1).

Note: When a memory block becomes empty of non-erased elements it must be freed to the OS (or stored for future insertions, depending on implementation) and removed from the colony's sequence of memory blocks. If it were not, we would end up with non-O(1) iteration, since there would be no way to predict how many empty memory blocks there would be between the current memory block being iterated over and the next memory block with non-erased (active) elements in it.
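The block-removal rule in the note can be sketched as follows. The names block_meta, block_list and erase_one are hypothetical, element storage and the skipfield are left out, and a std::list is assumed for the sequence of blocks purely so that unlinking a block is O(1); an implementation may organise its blocks differently.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical per-block metadata; element storage is omitted for brevity.
struct block_meta {
    std::size_t capacity;
    std::size_t live;  // number of non-erased elements in the block
};

using block_list = std::list<block_meta>;

// Records the erasure of one element from the block `blk`. When the block
// holds no more non-erased elements it is unlinked, so that iteration never
// has to walk over an unknown number of empty blocks.
// Returns true if the block was removed from the sequence.
inline bool erase_one(block_list& blocks, block_list::iterator blk) {
    assert(blk->live > 0);
    if (--blk->live == 0) {
        blocks.erase(blk);  // O(1) for a linked sequence of blocks
        return true;
    }
    return false;
}
```

Whether the unlinked block is returned to the OS or kept for later insertions is left open here, as it is in the text above.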

Erase (multiple): O(N) for non-trivially-destructible types, for trivially-destructible types between O(1) and O(N) depending on range start/end, approximating O(log n) average

In this case, where the element is non-trivially destructible, the time complexity is O(N), with infrequent deallocation necessary from the removal of an empty memory block as noted above. However where the elements are trivially-destructible, if the range spans an entire memory block at any point, that block and its skipfield can simply be removed without doing any individual writes to its skipfield or individual destruction of elements, potentially making this an O(1) operation.

In addition (when dealing with trivially-destructible types) for those memory blocks where only a portion of elements are erased by the range, if no prior erasures have occurred in that memory block you can erase that range in O(1) time, as there will be no need to check the skipfield within the range for previously erased elements. The reason you would need to check for previously erased elements within that portion's range is so you can update the metadata for that memory block to accurately reflect how many non-erased elements remain within the block. The non-erased element-count metadata is necessary because there is no other way to ascertain when a memory block is empty of non-erased elements, and hence needs to be removed from the colony's iteration sequence. The reasoning for why empty memory blocks must be removed is included in the Erase(single) section, above.

However in most cases the erase range will not perfectly match the size of all memory blocks, and with typical usage of a colony there are usually some prior erasures in most memory blocks. So, for example, when dealing with a colony of a trivially-destructible type, you might end up with a tail portion of the first memory block in the erasure range being erased in O(N) time, the second and intermediary memory block being completely erased and freed in O(1) time, and only a small front portion of the third and final memory block in the range being erased in O(N) time. Hence the time complexity for trivially-destructible elements approximates O(log n) on average, being between O(1) and O(N) depending on the start and end of the erasure range.
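The three-block example above can be put into numbers with a small cost model. This is a sketch under stated assumptions, not a measurement: erase_cost and range_erase_cost are hypothetical names, the element type is assumed trivially destructible, every partially covered block is assumed to contain prior erasures (so each of its slots in the range is visited individually), and freeing a wholly covered block is counted as a single step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct erase_cost {
    std::size_t element_steps;  // slots handled one at a time
    std::size_t blocks_freed;   // blocks removed whole, one step each
};

// Cost of erasing slots [first, last), with slots numbered consecutively
// across all blocks in iteration order.
inline erase_cost range_erase_cost(const std::vector<std::size_t>& capacities,
                                   std::size_t first, std::size_t last) {
    assert(first <= last);
    erase_cost cost{0, 0};
    std::size_t block_begin = 0;
    for (std::size_t capacity : capacities) {
        const std::size_t block_end = block_begin + capacity;
        const std::size_t lo = std::max(first, block_begin);
        const std::size_t hi = std::min(last, block_end);
        if (lo < hi) {  // the range overlaps this block
            if (hi - lo == capacity) {
                ++cost.blocks_freed;            // whole block: no per-slot work
            } else {
                cost.element_steps += hi - lo;  // partial block: per-slot work
            }
        }
        block_begin = block_end;
    }
    return cost;
}
```

For blocks of capacity 8, 16 and 32 and an erase range of slots [5, 30), the model gives 3 per-slot steps in the first block, one whole-block removal for the second, and 6 per-slot steps in the third: 9 element steps for 25 erased slots.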

std::find: O(N)

splice: O(1)

Colony only does full-container splicing, not partial-container splicing (use range-insert with std::make_move_iterator to achieve the latter, albeit with the loss of pointer validity to the moved range). When splicing, the memory blocks from the source colony are transferred to the destination colony without processing the individual elements. These blocks may be placed either at the front of the destination colony or at the end, depending on how full the source back block is compared to the destination back block. If the destination back block is more full, i.e. there is less unused space in it, it is better to place the destination's blocks before the source's blocks, as doing otherwise creates a larger gap to skip during iteration, which in turn affects cache locality. If there are unused element memory spaces at the back of the destination container (i.e. the final memory block is not full), the skipfield nodes corresponding to those empty spaces must be altered to indicate that these are skipped elements. Again, when using the Low-complexity jump-counting pattern for the skipfield this is also an O(1) operation, hence the overall operation is O(1).
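The block-transfer step can be sketched with standard containers standing in for colony's internals. Here a std::vector<int> plays the role of one element memory block and a std::list plays the role of the sequence of blocks; block, block_chain and splice_all are hypothetical names. The choice between front and back placement and the skipfield adjustment described above are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

using block = std::vector<int>;       // stand-in for one element memory block
using block_chain = std::list<block>; // stand-in for the sequence of blocks

// Moves every block of `source` to the end of `destination`. Only list
// nodes are relinked: no element is copied, moved or destroyed, and the
// cost does not depend on the number of elements.
inline void splice_all(block_chain& destination, block_chain& source) {
    destination.splice(destination.end(), source);
}
```

Because the blocks themselves are untouched, a pointer taken to an element of the source before the call still refers to the same element afterwards, now reachable through the destination.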

Iterator operators ++ and --: O(1) amortized

Generally the time complexity is O(1), and any skipfield pattern used must allow for O(1) skipping of multiple erased elements. However every so often iteration will involve a transition to the next/previous memory block in the colony's sequence of blocks, depending on whether we are doing ++ or --. At this point a read of the next/previous memory block's corresponding skipfield is necessary, in case the front/back element(s) in that memory block are erased and hence skipped. So for every block transition, 2 reads of the skipfield are necessary instead of 1. Hence the time complexity is O(1) amortized.
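A minimal single-block sketch of a jump-counting skipfield follows, written from my understanding of the low-complexity pattern rather than copied from the reference implementation. The type skipfield and its members are hypothetical. The assumptions are: a node value of 0 means "not erased"; the first and last node of each run of erased elements hold the run's length; nodes inside a run may hold stale non-zero values; and one extra trailing node, always 0, lets the code read one position past the last element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct skipfield {
    std::vector<unsigned> nodes;

    explicit skipfield(std::size_t capacity) : nodes(capacity + 1, 0) {}

    std::size_t capacity() const { return nodes.size() - 1; }

    // Marks element i as erased in O(1), merging it with any run of erased
    // elements directly to its left and/or right.
    void erase(std::size_t i) {
        assert(i < capacity() && nodes[i] == 0);
        const unsigned left = (i > 0) ? nodes[i - 1] : 0;  // run ending at i-1
        const unsigned right = nodes[i + 1];               // run starting at i+1
        const unsigned total = left + right + 1;
        nodes[i] = total;          // keeps node i non-zero in every case
        nodes[i - left] = total;   // first node of the merged run
        nodes[i + right] = total;  // last node of the merged run
    }

    // Index of the first non-erased element, or capacity() if there is none.
    std::size_t first() const { return nodes[0]; }

    // Index of the next non-erased element after the non-erased element i,
    // or capacity() if there is none: one increment plus one skipfield read.
    std::size_t next(std::size_t i) const {
        assert(i < capacity() && nodes[i] == 0);
        ++i;
        return i + nodes[i];
    }
};

// Collects the indices of all non-erased elements in iteration order.
inline std::vector<std::size_t> live_indices(const skipfield& s) {
    std::vector<std::size_t> out;
    for (std::size_t i = s.first(); i < s.capacity(); i = s.next(i)) {
        out.push_back(i);
    }
    return out;
}
```

Neither erase nor next contains a loop, which is the property the complexity requirements rely on: a run of any length is skipped with a single addition.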

Skipfields must be per-block and independent between memory blocks, as otherwise you would end up with a vector for a skipfield, which would need a range erased every time a memory block was removed from the colony (see notes under Erase above), and reallocation to a larger skipfield memory block when a colony expanded. Both of these procedures carry reallocation costs, meaning you could have thousands of skipfield nodes needing to be reallocated based on a single erasure (from within a memory block which only had one non-erased element left and hence would need to be removed from the colony). This is unacceptable latency for any field involving high timing sensitivity (all of SG14).

begin()/end(): O(1)

For any implementation these should generally be stored as member variables and so returning them is O(1).

advance/next/prev: between O(1) and O(n), depending on current iterator location, distance and implementation. Average for reference implementation approximates O(log N).

The reasoning for this is similar to that of Erase(multiple), above. Complexity is dependent on state of colony, position of iterator and length of distance, but in many cases will be less than linear. It is necessary in a colony to store metadata both about the capacity of each block (for the purpose of iteration) and how many non-erased elements are present within the block (for the purpose of removing blocks from the iterative chain once they become empty). For this reason, intermediary blocks between the iterator's initial block and its final destination block (if these are not the same block, and if the initial block and final block are not immediately adjacent) can be skipped rather than iterated linearly across, by subtracting the "number of non-erased elements" metadata from distance for those blocks.

This means that the only linear time operations are any iterations within the initial block and the final block. However if either the initial or final block have no erased elements (as determined by comparing whether the block's capacity metadata and the block's "number of non-erased elements" metadata are equal), linear iteration can be skipped for that block and pointer/index math used instead to determine distances, reducing complexity to constant time. Hence the best case for this operation is constant time, the worst is linear to the distance.
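The strategy described above can be sketched as follows. All names here (block_meta, position, advance_by) are hypothetical, and a std::vector<bool> stands in for the per-block skipfield purely to keep the example short; it is not the skipfield pattern the proposal recommends. The sketch assumes the starting position refers to a non-erased element and that at least n non-erased elements follow it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct block_meta {
    std::vector<bool> erased;  // stand-in skipfield: true marks an erased slot
    std::size_t live;          // "number of non-erased elements" metadata
    std::size_t capacity() const { return erased.size(); }
    bool has_erasures() const { return live != capacity(); }
};

struct position {
    std::size_t block_index;
    std::size_t slot;
};

// Moves `pos` forward by n non-erased elements.
inline position advance_by(const std::vector<block_meta>& blocks,
                           position pos, std::size_t n) {
    if (n == 0) return pos;
    const block_meta& initial = blocks[pos.block_index];
    if (!initial.has_erasures()) {
        // No erasures in the initial block: plain index arithmetic.
        const std::size_t after = initial.capacity() - pos.slot - 1;
        if (n <= after) return {pos.block_index, pos.slot + n};
        n -= after;
    } else {
        // Erasures present: walk the rest of the initial block linearly.
        for (std::size_t s = pos.slot + 1; s < initial.capacity(); ++s) {
            if (!initial.erased[s] && --n == 0) return {pos.block_index, s};
        }
    }
    // Skip intermediary blocks in constant time each, using the metadata.
    std::size_t b = pos.block_index + 1;
    assert(b < blocks.size());
    while (n > blocks[b].live) {
        n -= blocks[b].live;
        ++b;
        assert(b < blocks.size());
    }
    // The target is the n-th non-erased element of the final block.
    const block_meta& last = blocks[b];
    if (!last.has_erasures()) return {b, n - 1};
    for (std::size_t s = 0;; ++s) {
        assert(s < last.capacity());
        if (!last.erased[s] && --n == 0) return {b, s};
    }
}
```

Only the two loops over individual slots are linear, and each is bypassed when its block has no erasures, which gives the constant-time best case and linear worst case stated above.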

distance: between O(1) and O(n), depending on current iterator location, distance and implementation. Average for reference implementation approximates O(log N).

The same considerations which apply to advance, prev and next also apply to distance - intermediary blocks between iterator1 and iterator2's blocks can be skipped in constant time, if they exist. iterator1's block and iterator2's block (if these are not the same block) must be linearly iterated across using ++ unless either block has no erased elements, in which case the operation becomes pointer/index math and is reduced to constant time for that block. In addition, if iterator1's block is not the same as iterator2's block, and iterator2 is equal to end() or (end() - 1), or is the last element in that block, iterator2's block's elements can also be counted from the metadata rather than by iteration.
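A matching sketch for distance is given below, under the same assumptions as any such illustration: block_meta, position, count_live and live_distance are hypothetical names, a std::vector<bool> stands in for the per-block skipfield, and the first position must not come after the second. The additional shortcut for an iterator2 at or near end() is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct block_meta {
    std::vector<bool> erased;  // stand-in skipfield: true marks an erased slot
    std::size_t live;          // "number of non-erased elements" metadata
    std::size_t capacity() const { return erased.size(); }
};

struct position {
    std::size_t block_index;
    std::size_t slot;
};

// Number of non-erased elements in slots [from, to) of one block.
inline std::size_t count_live(const block_meta& b,
                              std::size_t from, std::size_t to) {
    assert(from <= to && to <= b.capacity());
    if (b.live == b.capacity()) return to - from;  // no erasures: index math
    std::size_t n = 0;
    for (std::size_t s = from; s < to; ++s) n += b.erased[s] ? 0 : 1;
    return n;
}

// Number of non-erased elements in [first, last).
inline std::size_t live_distance(const std::vector<block_meta>& blocks,
                                 position first, position last) {
    if (first.block_index == last.block_index) {
        return count_live(blocks[first.block_index], first.slot, last.slot);
    }
    const block_meta& initial = blocks[first.block_index];
    std::size_t n = count_live(initial, first.slot, initial.capacity());
    // Intermediary blocks contribute their metadata count, one read each.
    for (std::size_t b = first.block_index + 1; b < last.block_index; ++b) {
        n += blocks[b].live;
    }
    return n + count_live(blocks[last.block_index], 0, last.slot);
}
```

As with advance, per-slot work is confined to the first and last blocks, and disappears for whichever of them has no erased elements.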

All read-only operations, swap, std::swap, splice, operator= && (source), reserve, trim	Never.
clear, operator= & (destination), operator= && (destination)	Always.
reshape	Only if memory blocks exist whose capacities do not fit within the supplied limits.
shrink_to_fit	Only if capacity() != size().
erase	Only for the erased element. If an iterator is == end() it may be invalidated if the back element of the colony is erased (similar to deque (26.3.8)). Likewise if a reverse_iterator is == rend() it may be invalidated if the first element in the colony is erased.
insert, emplace	If an iterator is == end() it may be invalidated by a subsequent insert/emplace. Likewise if a reverse_iterator is == rend() it may be invalidated by a subsequent insert/emplace.

colony	Insertion	Erasure	Iteration	Read
Insertion	No	No	No	Yes
Erasure	No	No	No	Mostly*
Iteration	No	No	Yes	Yes
Read	Yes	Mostly*	Yes	Yes

std::vector	Insertion	Erasure	Iteration	Read
Insertion	No	No	No	No
Erasure	No	No	No	No
Iteration	No	No	Yes	Yes
Read	No	No	Yes	Yes

Introduction of std::colony to the standard library

Table of Contents

Revision history

I. Introduction

Insertion to back

Non-back erasure

II. Questions for the Committee

III. Motivation and Scope

IV. Impact On the Standard

V. Design Decisions

1. Collection of element memory blocks + metadata

2. A non-boolean skipfield which allows for O(1) traversal from each non-erased element to the next

3. Erased-element location recording mechanism

Implementation of iterator class

Additional notes on specific functions

Results of implementation

VI. Technical Specification

26.3.7 Header <colony> synopsis [colony.syn]

Iterator Invalidation

26.3.14 Class template colony [colony]

26.3.14.1 Class template colony overview [colony.overview]

26.3.14.2 colony constructors, copy, and assignment [colony.cons]

26.3.14.3 colony capacity [colony.capacity]

26.3.14.4 colony modifiers [colony.modifiers]

26.3.14.5 Operations [colony.operations]

26.3.14.6 Specialized algorithms [colony.special]

26.3.14.7 Erasure [colony.erasure]

VII. Acknowledgements

VIII. Appendices

Appendix A - Basic usage examples

Example demonstrating pointer stability

Appendix B - Reference implementation benchmarks

Appendix C - Frequently Asked Questions

Where is it worth using a colony in place of other std:: containers?

What are some examples of situations where a colony might improve performance?

Is it similar to a deque?

What are the thread-safe guarantees?

Any pitfalls to watch out for?

What is the purpose of limiting memory block minimum and maximum sizes?

What is colony's Abstract Data Type (ADT)?

Why must blocks be removed from the iterative sequence when empty?

Why not reserve all empty memory blocks for future use during erasure, or none, rather than leaving this decision undefined by the specification?

Memory block sizes - what are they based on, how do they expand, etc

Can a colony be used with SIMD instructions?

Appendix D - Specific responses to previous committee feedback

"Why not 'bag'? Colony is too selective a name."

"Unordered and no associative lookup, so this only supports use cases where you're going to do something to every element."

"Do we really need the skipfield_type template argument?"

"Prove this is not an allocator"

"If this is for games, won't game devs just write their own versions for specific types in order to get a 1% speed increase anyway?"

"Is there active research in this problem space? Is it likely to change in future?"

Why not iterate across the memory blocks backwards to find the first block with erasures to reuse, during insert?