Document number: N4348
Date: 2015-02-02
Project: Programming Language C++, Library Working Group
Revises: N4159
Reply to: Geoff Romer <gromer@google.com>, Samuel Benzaquen <sbenza@google.com>

Making `std::function` safe for concurrency

Motivation and Scope
Design Overview
Alternatives Considered
User Impact
Proposed Wording
Appendix A: A Digression on the Meaning of const

Motivation and Scope

std::function's operator() is a const member, and yet it invokes the target function through a non-const access path (the standard is somewhat ambiguous, but this appears to be the universal existing practice). Consequently, it may mutate the target, if the target is a function object with a non-const operator(). This violates not only basic tenets of const-correctness, but also the data race avoidance guarantees in [res.on.data.races]/p3, which states that "A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this".
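The following is a minimal sketch of the behavior described above, as it can be observed with existing implementations. The names `Counter`, `call_twice`, and `second_call_result` are hypothetical and introduced only for illustration.

```cpp
#include <functional>

// Hypothetical functor whose call operator is non-const and mutates state.
struct Counter {
  int calls = 0;
  int operator()() { return ++calls; }
};

// Takes the wrapper by const reference, yet each call changes the
// Counter stored inside it.
int call_twice(const std::function<int()>& f) {
  f();
  return f();
}

// Returns the result of the second call made through a const wrapper.
int second_call_result() {
  const std::function<int()> f = Counter{};
  return call_twice(f);
}
```

Because the second call observes the increment made by the first, two threads that share `f` through const references would both write to `calls` without synchronization.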

The wording in [res.on.data.races] embodies a best-practice approach to concurrency in C++. One of the fundamental challenges of writing concurrent code is avoiding data races, which the language specifies in terms of reading and writing memory locations. The standard library for the most part treats how it operates on memory locations as a hidden implementation detail, but it must somehow specify its behavior in sufficient detail that the user can easily tell which combinations of library operations may induce data races, and which won't. By far the most common way to do this is to classify operations according to what data they read and what data they write, and specify that a data race can occur between a pair of unsynchronized operations only if they both operate on the same data, and at least one of them writes to it. Notably, the core language takes this approach in specifying the fundamental types. The only other plausible approaches would be to specify that no two operations can race, or to specify that any two operations can race, but these are, respectively, unimplementable and unusable.

So in order to be usable in concurrent code, a library must specify, for every input to every operation, whether that operation writes to it or only reads from it. This is exactly what the const language feature was designed for, and what it is used for: to specify which function parameters are guaranteed not to be written by that function, in a way that can be enforced at compile time (recall that const was spelled readonly in early versions of C++). To be sure, it is logically possible for the library to specify one set of read/write semantics for purposes of static checking, and a separate set of read/write semantics for purposes of run-time program correctness. However, this would make the library harder to specify and harder to understand, and deny users the benefits of static checking exactly where it would be most useful. It would also seriously degrade code reusability, because polymorphic code would be unable to distinguish reads from writes, and therefore be unable to avoid data races other than by aggressively locking all operations.

By the same token, when the library specifies whether it reads or writes to its inputs, it cannot directly make that guarantee in terms of reading and writing memory locations, because those may be hidden implementation details of user types. Instead, the library must specify whether it reads or writes according to the input type's definition of reading and writing; in other words, by whether it invokes any non-const operations on the input.

One corollary of these points is the widely-accepted design principle that the const methods of well-behaved types must not invoke non-const operations on the object's internal state without synchronization. std::function violates this principle, which would be a defect regardless of whether it violates the strict wording of [res.on.data.races]. It would be bad enough that basic concurrency principles do not apply to such a central vocabulary type, but this is compounded by the fact that a core use case for std::function is as a callback wrapper for concurrency APIs, so it is particularly likely to be used in concurrent settings.

One response to this issue has been to claim that std::function is not defective, but is simply implementing pointer-like "shallow const" semantics (i.e. the target function object is not part of the std::function's internal state). However, std::function does not behave like a pointer in any other respect, so behaving like a pointer in this one will be surprising and confusing. Furthermore, it is fundamentally incoherent for a type to have "shallow" const semantics, but "deep" copy semantics. For a detailed discussion of this point, see the Appendix.

We propose to correct this defect by specifying that std::function invokes its target through a const access path, and that the program is ill-formed if the user supplies a target that cannot be invoked that way. We also present wording for an alternative resolution, explicitly specifying that operator() is exempt from the guarantees of [res.on.data.races]/p3, but we do not recommend adopting that approach.

In N4159 we discussed this problem together with some related issues in std::function's API, in order to ensure they were addressed in a consistent way. We are satisfied that the joint discussion has served its purpose, and the issues are now better handled separately: the issue outlined above is a clear defect in the library as it stands, and so should be addressed by LWG, targeting C++17 as a shipping vehicle. However, the other issues discussed in N4159 are clearly feature requests, and so they belong with LEWG, with a Library Fundamentals TS as a target ship vehicle. Consequently, the other aspects of N4159 will be addressed in a separate paper; this paper will focus on the aforementioned issue.

Design Overview

Our core proposal is to revise std::function's operator() to explicitly specify that it invokes the target via a const access path. This implicitly requires that such an invocation be well-formed; we make that requirement explicit at the points where the user specifies the target (namely the constructor and assignment operator) because it is neither possible nor desirable to postpone the error until operator() is actually called. We introduce the term Const-Lvalue-Callable to refer to callable types that satisfy this requirement.
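As a rough illustration of the requirement, the Const-Lvalue-Callable check can be approximated with the C++17 trait `std::is_invocable_r`. This is a sketch only: the name `is_const_lvalue_callable_v` and the example types are assumptions made for illustration, not part of the proposed wording.

```cpp
#include <type_traits>

// Hypothetical trait: true if a const lvalue of type F can be invoked with
// ArgTypes and the result is convertible to R.
template <class F, class R, class... ArgTypes>
constexpr bool is_const_lvalue_callable_v =
    std::is_invocable_r_v<R, const F&, ArgTypes...>;

struct ConstCallable   { int operator()(int x) const { return x; } };
struct MutableCallable { int operator()(int x) { return x; } };

// A lambda declared mutable has a non-const call operator.
auto mutable_lambda = [n = 0](int x) mutable { return n += x; };

static_assert(is_const_lvalue_callable_v<ConstCallable, int, int>, "");
static_assert(!is_const_lvalue_callable_v<MutableCallable, int, int>, "");
static_assert(is_const_lvalue_callable_v<int (*)(int), int, int>, "");
static_assert(
    !is_const_lvalue_callable_v<decltype(mutable_lambda), int, int>, "");
```

Under the proposal, targets like `MutableCallable` and `mutable_lambda` would be rejected when the std::function is constructed or assigned, rather than when it is called.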

All known implementations of std::function currently permit it to be initialized with a target that is callable only via a non-const access path, so this change has the potential to break existing user code. Most such breakages should be trivially fixable by adding const to functor definitions (and removing unnecessary mutable qualifiers from lambda expressions), but there will also be a small minority of cases where std::function wraps a target whose operator() genuinely does modify internal state (see below for further discussion of the likely user impact).

In order to provide a smooth migration path for such code, we propose an adaptor std::const_unsafe_fun to coerce any callable type to an otherwise-equivalent Const-Lvalue-Callable type (which effectively holds the target function as a mutable member). Thus, any valid C++14 code which passes a target f to std::function can be mechanically rewritten to pass std::const_unsafe_fun(f) instead, thereby exactly restoring the C++14 behavior.
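One possible shape for such an adaptor is sketched below. It is placed in a hypothetical namespace `sketch` because no such function exists in the standard library; the class name `const_unsafe_wrapper` is also an assumption, since the proposal leaves the return type unspecified.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical wrapper type holding the target as a mutable member.
template <class F>
class const_unsafe_wrapper {
 public:
  explicit const_unsafe_wrapper(F f) : target_(std::move(f)) {}

  // const call operator that forwards to the target through a non-const
  // access path, so the target may be modified.
  template <class... Args>
  decltype(auto) operator()(Args&&... args) const {
    return std::invoke(target_, std::forward<Args>(args)...);
  }

 private:
  mutable F target_;
};

template <class F>
const_unsafe_wrapper<F> const_unsafe_fun(F f) {
  return const_unsafe_wrapper<F>(std::move(f));
}

}  // namespace sketch

// Hypothetical functor that genuinely mutates its own state.
struct Counter {
  int calls = 0;
  int operator()() { return ++calls; }
};

// Returns the result of the second call made through the wrapped target.
int second_call_through_wrapper() {
  const std::function<int()> f = sketch::const_unsafe_fun(Counter{});
  f();
  return f();
}
```

The wrapper is invocable through a const access path even though `Counter` is not, which is what makes the rewrite mechanical. It is also exactly as unsafe as the original code.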

Of course, code that uses std::const_unsafe_fun will be unsafe in exactly the same way that code using std::function currently is (hence the name); this is a transitional measure rather than a permanent solution. In the longer run, the standard library will need to provide a mutable function wrapper to support such use cases, but as discussed above this should go through the TS process before entering the main standard, and so cannot feasibly be ready for C++17. Once this wrapper is available, any remaining uses of std::const_unsafe_fun should migrate to it, after which std::const_unsafe_fun can be removed from the language. We propose introducing std::const_unsafe_fun in Annex D, to make clear that it is not a best-practice, and to reflect our intention to eventually remove it.

There is one case in which this change can affect the run-time behavior of existing code: if a std::function is passed a functor that has both const and non-const operator() overloads, it will currently target the non-const overload, but with this change it would target the const overload. However, such a combination of circumstances appears to be very rare; we found no occurrences of it in our codebase. Furthermore, for such an overload pair to have observably different behavior, in a way that's detrimental to the program, strikes us as an extremely dubious coding practice. So behavioral changes should be extremely rare, and when they do occur, they will most likely be exposing latent bugs rather than introducing new ones.

Jonathan Wakely observes that if this proposal is adopted, library vendors can begin to prepare their users for this transition immediately, by implementing the proposed SFINAE restrictions on std::function's constructor and assignment operator, and providing complementary SFINAE-controlled overloads for handling functors that are permitted in C++14 but not in C++17. The latter could be flagged with a [[deprecated]] attribute, and conditionally disabled in C++17 mode.
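The transitional scheme described above might look roughly like the following. This is a sketch on a stand-in class, not vendor code: `migrating_function` and its helper names are hypothetical, and a real implementation would apply the same constraints to std::function's own constructor and assignment operator.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

template <class Sig>
class migrating_function;  // hypothetical stand-in for std::function

template <class R, class... Args>
class migrating_function<R(Args...)> {
  template <class F>
  using target_t = std::decay_t<F>;
  template <class F>
  static constexpr bool is_self =
      std::is_same_v<target_t<F>, migrating_function>;
  template <class F>
  static constexpr bool const_callable =
      std::is_invocable_r_v<R, const target_t<F>&, Args...>;
  template <class F>
  static constexpr bool mutable_callable =
      std::is_invocable_r_v<R, target_t<F>&, Args...>;

 public:
  // Target is callable via a const access path: accepted without complaint.
  template <class F,
            std::enable_if_t<!is_self<F> && const_callable<F>, int> = 0>
  migrating_function(F&& f) : impl_(std::forward<F>(f)) {}

  // Target is callable only via a non-const access path: still accepted
  // during the transition, but flagged at compile time.
  template <class F,
            std::enable_if_t<!is_self<F> && !const_callable<F> &&
                                 mutable_callable<F>, int> = 0>
  [[deprecated("target is not callable via a const access path")]]
  migrating_function(F&& f) : impl_(std::forward<F>(f)) {}

  R operator()(Args... args) const {
    return impl_(std::forward<Args>(args)...);
  }

 private:
  std::function<R(Args...)> impl_;
};

// Uses only the non-deprecated constructor.
int run_migration_example() {
  migrating_function<int(int)> f = [](int x) { return x + 1; };
  return f(41);
}
```

The two constructors have mutually exclusive constraints, so the deprecated one could be disabled in C++17 mode without affecting targets that already satisfy the new requirement.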

Feature test macro

It will ease the transition if std::const_unsafe_fun is provided in C++14 mode as a vendor extension, so that users can introduce it incrementally, rather than applying it to their code and switching to C++17 in a single atomic step. However, since this practice may not be universal, we recommend that SD-6 provide a feature-test macro for std::const_unsafe_fun, so that portable code can use it conditionally. We recommend the macro be named __cpp_lib_const_unsafe_fun and be defined in the <functional> header. No feature-test macro is needed for the changes to std::function itself, because (apart from the aforementioned rare and dubious exceptions) this change is a pure narrowing of std::function's interface.
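Portable code could use the recommended macro roughly as follows. This is a sketch under the assumption that the macro and adaptor are provided as proposed; the helper `wrap_mutating_target` and the `Counter` type are hypothetical names introduced for illustration.

```cpp
#include <functional>
#include <utility>

// __cpp_lib_const_unsafe_fun is the macro name recommended above; this
// sketch does not assume that any shipping library defines it.
#if defined(__cpp_lib_const_unsafe_fun)
template <class F>
auto wrap_mutating_target(F f) {
  return std::const_unsafe_fun(std::move(f));
}
#else
// Fallback: pass the target through unchanged (C++14 behavior).
template <class F>
F wrap_mutating_target(F f) {
  return f;
}
#endif

// Hypothetical functor that genuinely mutates its own state.
struct Counter {
  int calls = 0;
  int operator()() { return ++calls; }
};

// Returns the result of the second call through the wrapped target.
int run_portable_example() {
  std::function<int()> f = wrap_mutating_target(Counter{});
  f();
  return f();
}
```

With this arrangement the call sites can be updated before the switch to C++17, and the fallback branch can be deleted afterwards.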

Alternatives Considered

The first alternative below is new to this paper. The others were presented in N4159 and discussed in Urbana, but are presented here for completeness, in some cases with updated discussion. LEWG straw poll results from the Urbana discussion can be found here. No straw polls were recorded for the LWG discussion, but there was a clear consensus in favor of the approach described above.

Overloading on constness

In the vast majority of cases, our proposed change will either have no effect, or make existing code ill-formed. However, there is one case where this change can change the run-time behavior of existing code: if a std::function is passed a functor that has both const and non-const operator() overloads, it will currently target the non-const overload, but with this change it would target the const overload.

We could mitigate this problem by giving std::function itself a non-const operator() overload, which would invoke the target via a non-const access path. Thus, std::function would continue to invoke the target via a non-const access path, unless the std::function is itself const. This is still a behavior change, but in a narrower set of cases where the change is more unambiguously correct. This approach would also give users the ability to control how std::function's constness affects the target, by defining and using wrappers (like const_unsafe_fun) that implement their preferred policies using a const/non-const overload pair.

However, std::function currently has the property that it wraps only a single function, not an overload set: the code executed by operator() does not vary with the type, value category, or cv-qualification of its arguments (including the implicit object argument). Adding such an overload would compromise this useful conceptual simplification, and invite the question of whether we should also overload on reference qualifiers, or even on argument qualifiers. Given that it is not feasible for std::function to represent an entire overload set, it seems preferable for it to represent only a single function, rather than attempt to occupy some intermediate position.

Furthermore, as discussed earlier, functors which have both const and non-const overloads with non-equivalent behavior should be extremely rare (and of dubious value). The benefits of this approach therefore seem relatively minimal, compared to the drawbacks.

Run-time error handling

A variant of the above approach would be for std::function to provide both const and non-const overloads of operator(), and accept all the same targets it currently does (so that no code would be rendered ill-formed by this change), and specify the const overload to throw an exception (or perhaps terminate the program) if invoked with a target that is not Const-Lvalue-Callable. This means that more code would continue to work unchanged, but at the cost of introducing many more run-time failures, mostly in code that could be caught and trivially fixed at build time under our proposed approach. It also still has the problem of adding complexity to std::function's interface.

Internal synchronization

In principle, we could solve the concurrency issue (but not the const issue) by adding internal synchronization to std::function. However, this renders it vulnerable to deadlock. LEWG consensus against this was unanimous (the word "unthinkable" was used), so we need say no more about it.

The status quo

A final option is to standardize the status quo, making it explicit that std::function::operator() invokes the target via a non-const access path, and is exempt from [res.on.data.races]/p3. This approach is expedient because it requires no implementation changes, and hence no overt breakage that could be blamed on vendors or the standard. However, we believe this would do users a serious disservice, by passing the concurrency buck to them while withholding a vital tool for avoiding data races.

While we don't wish to make an argument from popularity, it is noteworthy that this option has not attracted any significant support from standard library vendors, who would benefit the most from its expediency advantages.

User Impact

The proposed change can only affect code that initializes a std::function (via construction or assignment) from an object of class type that has a non-const operator() overload that matches the std::function's signature. There are two cases to consider:

If the class does not also have a matching operator() that is const, the code will be ill-formed.
If the class has a matching const operator(), the code will remain well-formed, but its run-time behavior will change. This difference will be observable only to the extent that the two overloads have different behavior.

To assess how common these cases are likely to be in practice, we conducted a ClangMR analysis of a codebase of over 100 million lines of C++ code, identifying every location where a std::function is given a new target. More precisely, we identified every construction or assignment of a std::function where the sole argument (hereinafter the "target expression") is a function, functor (excluding std::function), or function pointer (pointers to member functions, which are fairly rare, were not considered for simplicity), and classified according to whether the target type has a const and/or non-const operator(). Note that the objects produced by std::bind (and, equivalently, boost::bind) may have const/non-const overload pairs for forwarding purposes, but they just forward to the corresponding overloads (if any) of the bound function. So for purposes of this analysis, when a std::function is initialized with a bind wrapper, we treat bind's first argument as the target expression.

The codebase contained over 70,000 distinct target expressions, of which less than 5,000 had a non-const operator(). We manually classified the operator definitions according to what they mutate, because that dictates how to fix them. Less than 80 target expressions genuinely mutated the internal state of the target, and so would have to be wrapped in std::const_unsafe_fun. The remainder can be fixed by adding const to or removing mutable from the operator definition (with no change to the target expression). Less than 150 distinct operator definitions would have to be modified in this way to fix all the remaining target expressions. Other than outputs of std::bind and boost::bind (handled as discussed above), we found no cases at all where a single target expression had both const and non-const overloads.

In short, this codebase would not experience any run-time changes from our proposal, and all compile-time problems could be fixed with less than 250 single-line mechanical edits (compared to over 70,000 initializations of std::function, and over 100 million lines of code total).

Of the operator definitions that did not genuinely modify internal state, roughly 15% mutated nothing at all (almost all were class types, where the author presumably just forgot the const). Another 15% were class types that mutated some state outside the type, and 70% were lambda expressions that mutated external state via reference or pointer captures. One might object that mutated data could be logically internal to the object, even if it is not physically internal. This could be true in principle, but in practice physical and logical constness were equivalent in every case. Our criterion for whether the data was logically internal to the function object was based on the principles discussed in the Appendix: if the function object's copy constructor does not make a copy of the data (e.g. if it shallow-copies a pointer to it), the data cannot be considered internal. For example, consider code like the following:

int i = 0;
auto f = [&i]() mutable {i++;};

We conjecture that the people writing code like this are thinking of constness as representing functional purity (absence of side effects). This is a tempting intuition, but it is defeated by the fact that code with only const access to f can still invoke operator() on a copy of f.
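The point about copies can be made concrete with a short sketch. The helper name `invoke_copy` and the wrapper function `run_lambda_example` are hypothetical.

```cpp
// Hypothetical helper: receives only const access to the callable, but
// copies it and invokes the copy through a non-const access path.
template <class F>
void invoke_copy(const F& f) {
  F copy = f;
  copy();
}

// Returns the value of i after the lambda has been invoked via a copy.
int run_lambda_example() {
  int i = 0;
  auto f = [&i]() mutable { i++; };
  invoke_copy(f);  // i is modified even though f was passed as const
  return i;
}
```

Since the copy shares the reference capture, the mutable qualifier on the lambda does nothing to prevent the side effect.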

In interpreting this data, we must bear in mind its limitations. Most notably, there are many ways it could fail to be representative of C++ programmers at large. It covers a single codebase, produced (with the exception of a small amount of open-source code) by a single company with a fairly uniform C++ culture. This codebase has a well-established pre-C++11 callback infrastructure, and although std::function is now encouraged over those legacy callbacks, it was forbidden until less than a year ago. Nonetheless, this is the only data we have; we must look for our keys where the light is.

Proposed Wording

Note that this wording incorporates an updated resolution of LWG 2393. All changes are relative to N4296.

Change 20.9.11.2 [func.wrap.func] p2 as follows:

A callable ~~object f of~~ type (20.9.1 [func.def]) F is Const-Lvalue-Callable for argument types ArgTypes and return type R if the expression INVOKE(~~f~~declval<const F&>, declval<ArgTypes>()..., R), considered as an unevaluated operand (Clause 5), is well formed (20.9.2).

Change 20.9.11.2.1 [func.wrap.func.con] p8 and p21 as follows:

template<class F> function(F f);
template <class F, class A> function(allocator_arg_t, const A& a, F f);
…

8 Remarks: These constructors shall not participate in overload resolution unless ~~f~~F is Const-Lvalue-Callable (20.9.11.2) for argument types ArgTypes... and return type R.
…
template<class F> function& operator=(F&& f);
…

21 Remarks: This assignment operator shall not participate in overload resolution unless ~~declval<typename decay<F>::type&>()~~decay_t<F> is Const-Lvalue-Callable (20.9.11.2) for argument types ArgTypes... and return type R.

Change 20.9.11.2.4 [func.wrap.func.inv] as follows:

R operator()(ArgTypes... args) const
1 Effects: INVOKE(f, std::forward<ArgTypes>(args)..., R) (20.9.2), where f is a const lvalue expression referring to the target object (20.9.1) of *this.

2 Returns: Nothing if R is void, otherwise the return value of INVOKE(f, std::forward<ArgTypes>(args)..., R).

3 Throws: bad_function_call if !*this; otherwise, any exception thrown by the wrapped callable object.

Add a new section to Annex C as follows:

C.X.X Clause 20: General utilities library [diff.cpp14.func.wrap]

20.9.11.2

Change: std::function requires its target to be callable via a const access path, and always calls it via that path.

Rationale: Without this restriction, std::function cannot satisfy the library's data race avoidance requirements.

Effect on original feature: Valid C++ 2014 code may fail to compile or may change meaning in this International Standard. For example, the following code is valid in both C++ 2014 and in this International Standard, but produces different observable behavior:
struct S {
  bool operator()() { return true; }
  bool operator()() const { return false; }
};
const std::function<bool()> f(S{});
bool b = f();
// b is true in C++14, but false in this International Standard

Change D.8.2 [depr.function.adaptors] as follows:

1 The adaptors ptr_fun, mem_fun, mem_fun_ref, and their corresponding return types are deprecated. [ Note: The function template bind 20.9.9 provides a better solution. — end note ]

2 The adaptor const_unsafe_fun and its corresponding return type are deprecated.

Add a new section to D.8.2 [depr.function.adaptors] as follows:

D.8.2.X Const-Lvalue-Callable function adaptor [depr.const.unsafe.fun]

const_unsafe_fun takes an argument of a callable type and wraps it in a Const-Lvalue-Callable interface, regardless of whether the argument is Const-Lvalue-Callable.

[ Note: The purpose of this adaptor is to provide a temporary, mechanical migration path for C++ 2014 code that is affected by the changes described in [diff.cpp14.func.wrap]. — end note ]

template <class F> unspecified const_unsafe_fun(F f);

Requires: F shall be a callable type, and shall satisfy the requirements of CopyConstructible.

Returns: A Const-Lvalue-Callable (20.9.11.2) forwarding call wrapper (20.9.2) whose target object is a copy of f. The forwarding step shall access the target object as an lvalue via a non-const access path, and consequently is permitted to modify the target object, even when the wrapper is invoked via a const access path. [Note: this supersedes the data race avoidance requirements specified in 17.6.5.9. The caller is responsible for ensuring that the wrapper is not invoked simultaneously from multiple threads, or that simultaneous invocations of the target object do not lead to data races. — end note ]

Remarks: The return type shall satisfy the requirements of CopyConstructible.

Alternate wording

The following wording resolves the internal contradiction by standardizing the status quo that operator() is an exception to [res.on.data.races]/p3. We do not recommend this approach, but offer wording for it in case the committee is unwilling to make a breaking change as proposed above.

Change 20.9.11.2.4 [func.wrap.func.inv] as follows:

R operator()(ArgTypes... args) const
1 Effects: INVOKE(f, std::forward<ArgTypes>(args)..., R) (20.9.2), where f is a non-const lvalue expression referring to the target object (20.9.1) of *this. [ Note: This may modify the target object. This function is consequently an exception to the general requirement (17.6.5.9) that a standard library function will not modify objects except via the function's non-const arguments. — end note ]

2 Returns: Nothing if R is void, otherwise the return value of INVOKE(f, std::forward<ArgTypes>(args)..., R).

3 Throws: bad_function_call if !*this; otherwise, any exception thrown by the wrapped callable object.

Appendix A: A Digression on the Meaning of `const`

This issue raises the basic question of what we intend const to mean. For example, some people have argued for the status quo on the grounds that std::function just has "shallow const" semantics, like a pointer, and so this is not a defect. Similarly, our investigation of the user impact of this change raises the question of how to tell whether the absence of const on a method (or even the presence of mutable on a lambda) is a programmer error.

The language imposes certain constraints on what const means, but it gives the library designer the ability to impose whatever higher-level semantics they want, e.g. "physical" vs. "logical" constness, and "shallow" vs. "deep" constness. Intuition is normally a reliable guide, but in areas of disagreement or uncertainty, it's useful to make our intuitions explicit, so we can reason about them. To that end, we present the following:

The core intuitive meaning of const is "read-only" (indeed, it was spelled readonly in early versions of C++). By declaring something as const, we give up the ability to mutate the underlying object, or in other words to change the object's logical state. Thus, if a member function can leave the object in a different logical state, it must not be marked const (notice that this is a generalization of the rule that the core language already imposes on the physical state). Conversely, if a member function cannot modify the object's logical state, it is wasteful (and frequently pointless, as we will see) to mark it as non-const. A library designer may have wide latitude to choose how they define the logical states of a class, but having made that decision, they have little further discretion in how they const-qualify their members. Thus, for example, the choice is not between "shallow const" and "deep const", but between "shallow" and "deep" notions of the object's logical state (i.e. does the logical state depend on the pointer itself, or on the pointee?).

This is significant, because several other aspects of the design of a class depend on its "logical state". In particular, consider the copy constructor: it has a postcondition that the constructed object has the same logical state as the source object ([utility.arg.requirements] says "equivalent", without further defining that term). It has another, less obvious postcondition: the two objects must be independent, i.e. subsequent changes to the logical state of one will have no effect on the other. Notice that this also is a generalization of a language-level rule about the physical state. This postcondition often goes unmentioned (probably because it is so self-evident at the physical level), but it is nonetheless crucial: the fundamental purpose of copying, as a computing operation, is to decouple subsequent operations on the copies. It's also fundamental to the ordinary meaning of the word "copy": I cannot copy something by holding a mirror up to it, or by creating a new name to refer to it by. A copy is, by definition, an object with an independent existence from the original.

Thus, the copy constructor must copy the entire logical state of the object (or more precisely the representation of it), and the copy must be "deep", with the partial exception that copying of immutable logical state may be "shallow". Conversely, anything that's mutable and shallow-copied cannot be part of the logical state. This brings us back to the observation that it's often pointless to treat a member function as non-const because it mutates something outside the logical state: if the type is copyable, treating the member function as non-const provides no practical benefit, because the restriction can be trivially evaded (even accidentally) by performing the mutation via a copy of the object.

Notice the recurrence of the terms "deep" and "shallow", and the connection to mutation. This is not a coincidence. Like "deep const" and "shallow const", "deep copy" and "shallow copy" are not really independent type design options, but manifestations of the deeper choice between "shallow" and "deep" logical state. And this is the point: both the const semantics and the copy semantics are determined by that choice, so they must be consistent with each other. Const methods cannot modify what the copy constructor copies, and the copy constructor must copy anything that const methods aren't allowed to modify.

It bears mentioning that the equality comparison operators should also be defined in terms of the logical state, and so must be consistent with the copying and const semantics, but that isn't relevant for present purposes.

One corollary of this is that defenders of the status quo are not correct to say that std::function is properly using shallow const semantics (even leaving aside the point that its target() member has deep const semantics). That position would imply that std::function's logical state does not contain the target object, but only some pointer to it. Consequently, the copy constructor's postcondition must be that the two std::function objects point to the same target object, but the copy constructor's current specification clearly contradicts that. Either way, there's a defect in std::function, so by taking that tack, we'd just be deciding to break std::function in a way that creates more transitional pain for a less desirable end state.
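The mismatch can be observed directly in existing implementations. The sketch below uses a hypothetical `Counter` functor to show that copying a std::function copies the target, and that target() propagates constness.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical functor that mutates its own state when called.
struct Counter {
  int calls = 0;
  int operator()() { return ++calls; }
};

// Returns the result of the first call on a copy, made after the
// original has already been called twice.
int first_call_on_copy() {
  std::function<int()> original = Counter{};
  std::function<int()> copy = original;  // copies the Counter target
  original();
  original();
  return copy();  // the copy's Counter has not been advanced
}

// The two wrappers hold distinct target objects.
bool targets_are_distinct() {
  std::function<int()> original = Counter{};
  std::function<int()> copy = original;
  return original.target<Counter>() != copy.target<Counter>();
}

// target() on a const std::function yields a pointer to const.
static_assert(
    std::is_same_v<
        decltype(std::declval<const std::function<int()>&>()
                     .target<Counter>()),
        const Counter*>,
    "");
```

Both the copy constructor and target() therefore treat the target as part of the wrapper's logical state, whereas operator() does not.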

As a final note, N3339's proposed value_ptr<T> is sometimes cited as a counterexample to this principle, but we believe it simply violates it. Its logical state consists of a T object, not a pointer, and it would be materially improved if its const members did not provide non-const access to the underlying pointer (and likewise if its comparison operators compared T values rather than T* addresses). Analogies with e.g. shared_ptr<T> are misleading; in effect, value_ptr<T> is a variant of optional<T> that uses heap allocation to support dynamic polymorphism.
