Proposal to extend atomic with priority update functions

Document number: WG21 N3696
Date: 2013-06-26
Reply-to: Bronek Kozicki brok@spamcop.net
Subgroup: SG1 - Concurrency

Introduction

Recent research in concurrent programming [1] have identified a useful primitive which may be used to significantly reduce memory write contention in parallel and concurrent programs. A priority update is an operation which reads a memory location, compares its content to the value provided using a predicate, and writes the value to the memory location only if the predicate returns true. Two common predicates employed for this operation are less-than (in which case a priority update would be called write-with-min, however for consistency with existing atomic operations I propose to use name fetch-min) and greater-than (which I suggest to call fetch-max). A generalization of this operation would take an arbitrary predicate provided by the user, alongside with the value.

The operation can be robustly implemented using only memory read and CAS operation, as member functions of std::atomic might look like:

template <typename V>
T priority_update(T value, V predicate)
{
  T read = this->load();
  while (predicate(value, read) {
    if (this->compare_exchange_weak(read, value))
      return read;
  }
  return read;
}

T fetch_min(T value)
{ return priority_update(value, less<T>); }

T fetch_max(T value)
{ return priority_update(value, greater<T>); }

Paper [1] identifies a range of concurrent algorithms which, when implemented using the above described primitive, exhibit very good performance characteristics. If such algorithms were to become more popular in C++ , it would be useful to provide the primitive in <atomic> , rather than rely on the user to "Bring Your Own". This would serve the purpose of establishing a primitive which can be used for reasoning about, writing and reading of such concurrent algorithms, as well as allow users to automatically benefit from the hardware support for certain specializations of these operations, where it is available [2].

Proposal

I propose that a set of new member functions priority_update, fetch_min, fetch_max, and associated set of overloads taking explicit memory ordering parameters, with the behaviour as proposed in the code snippet above, be added to the template std::atomic for all types, that is integral, pointer and user types; as well as corresponding free functions atomic_priority_update, atomic_fetch_min, atomic_fetch_max and atomic_priority_update_explicit, atomic_fetch_min_explicit, atomic_fetch_max_explicit.

The rationale for including pointer types may require some explanation - the primary use of priority update is not to yield a meaningful number (in fact, the return value may often be ignored), but it is to significantly reduce the number of memory writes. For such uses it does not matter what quantity is being compared as long as full ordering is guaranteed, thus comparing certain memory addresses is, from the point of view of algorithm designer, a perfectly valid operation.

Existing practice

Operations Atomic_IMIN, Atomic_IMAX, Atomic_UMIN, Atomic_UMAX in Intel GFX L3 cache [2]

Operations atomic_imin, atomic_imax, atomic_umin, atomic_umax in Microsoft Shader Model 5 [3]

Paper [1] contains large number of references to research on priority updates.

Acknowledgments

Phillip B. Gibbons and Arch Robison encouraged writing of this paper.

References

[1] Julian Shun, Guy E. Blelloch, Jeremy T. Finemany, Phillip B. Gibbons "Reducing Contention Through Priority Updates", February 2013 CMU-CS-13-101 , http://reports-archive.adm.cs.cmu.edu/anon/2013/CMU-CS-13-101.pdf

[2] Intel® OpenSource HD Graphics Programmer’s Reference Manual (PRM) Volume 1 Part 7: L3$/URB (Ivy Bridge), May 2012 , https://01.org/linuxgraphics/sites/default/files/documentation/ivb_ihd_os_vol1_part7.pdf

[3] Microsoft Shader Model 5 Assembly http://msdn.microsoft.com/en-us/library/windows/desktop/hh447232%28v=vs.85%29.aspx