Recent research in concurrent programming [1] have identified a useful primitive which may be used to significantly reduce memory write contention in parallel and concurrent programs. A priority update is an operation which reads a memory location, compares its content to the value provided using a predicate, and writes the value to the memory location only if the predicate returns true. Two common predicates employed for this operation are less-than (in which case a priority update would be called write-with-min, however for consistency with existing atomic operations I propose to use name fetch-min) and greater-than (which I suggest to call fetch-max). A generalization of this operation would take an arbitrary predicate provided by the user, alongside with the value.
The operation can be robustly implemented using only memory read and CAS operation, 
as member functions of std::atomic might look like:
	
template <typename V>
T priority_update(T value, V predicate)
{
  T read = this->load();
  while (predicate(value, read) {
    if (this->compare_exchange_weak(read, value))
      return read;
  }
  return read;
}
T fetch_min(T value)
{ return priority_update(value, less<T>); }
T fetch_max(T value)
{ return priority_update(value, greater<T>); }
Paper [1] identifies a range of concurrent algorithms which, when implemented 
using the above described primitive, exhibit very good performance characteristics. 
If such algorithms were to become more popular in C++ , it would be useful to provide 
the primitive in <atomic> , rather than rely on the user to "Bring 
Your Own". This would serve the purpose of establishing a primitive which can 
be used for reasoning about, writing and reading of such concurrent algorithms, 
as well as allow users to automatically benefit from the hardware support for certain 
specializations of these operations, where it is available [2].
I propose that a set of new member functions priority_update,
fetch_min, fetch_max, and associated set of overloads 
taking explicit memory ordering parameters, with the behaviour as proposed in the 
code snippet above, be added to the template std::atomic for all types, 
that is integral, pointer and user types; as well as corresponding free functions
atomic_priority_update, atomic_fetch_min, atomic_fetch_max
and atomic_priority_update_explicit, atomic_fetch_min_explicit, atomic_fetch_max_explicit.
The rationale for including pointer types may require some explanation - the primary use of priority update is not to yield a meaningful number (in fact, the return value may often be ignored), but it is to significantly reduce the number of memory writes. For such uses it does not matter what quantity is being compared as long as full ordering is guaranteed, thus comparing certain memory addresses is, from the point of view of algorithm designer, a perfectly valid operation.
Operations Atomic_IMIN, Atomic_IMAX, Atomic_UMIN, Atomic_UMAX in 
Intel GFX L3 cache [2]
Operations atomic_imin, atomic_imax, atomic_umin, atomic_umax in Microsoft Shader Model 5 [3]
Paper [1] contains large number of references to research on priority updates.
Phillip B. Gibbons and Arch Robison encouraged writing of this paper.
[1] Julian Shun, Guy E. Blelloch, Jeremy T. Finemany, Phillip B. Gibbons "Reducing Contention Through Priority Updates", February 2013 CMU-CS-13-101 , http://reports-archive.adm.cs.cmu.edu/anon/2013/CMU-CS-13-101.pdf
[2] Intel® OpenSource HD Graphics Programmer’s Reference Manual (PRM) Volume 1 Part 7: L3$/URB (Ivy Bridge), May 2012 , https://01.org/linuxgraphics/sites/default/files/documentation/ivb_ihd_os_vol1_part7.pdf
[3] Microsoft Shader Model 5 Assembly http://msdn.microsoft.com/en-us/library/windows/desktop/hh447232%28v=vs.85%29.aspx