Document number: | P2570R1 | |
---|---|---|
Date: | 2022-11-14 | |
Audience: | SG21 | |
Reply-to: | Andrzej Krzemieński <akrzemi1 at gmail dot com> |
In logic, a predicate is a symbol that represents a property or a relation.
A contract annotation, as proposed in
[P2521R2],
requires an expression of type bool
. In this paper we explore different possible
semantics of programs where the expressions inside contract annotations do not satisfy the
properties of a predicate from logic.
[[unsequenced]]
)
as potentilly useful in constraining contract predicates. See section {dis.uns}.Because the choice of syntax for contract annotations has no consensus yet, in this paper we use a placeholder notation:
int select(int i, int j) PRE(is_even(i)) // precondition PRE(is_even(j)) POST(r: is_even(r)) // postcondition; r names the return value { ASSERT(is_nonngative(_state)); // assertion statement if (_state == 0) return i; else return j; }
Ideally, a contract annotation reads something like the above, and we intuitively assume
that functions is_even()
and is_nonnegative()
are predicates in the logical sense,
describing the properties of the input variabes. Which implies that
if we made the program to evaluate these predicates
and discard the returned value, there would be no way to tell if
these predicates were actually evalated; and if we evaluated
any such predicate two times, the two evaluations would give the
same result.
However, this simple intuition is gone if programmers put arbitrary
expresions in contract predicates:
void f() PRE(++global_call_counter < 100);
Note that side effects may be hidden behind the function call:
void f() PRE(less_than_100_calls_so_far());
void turn_on_lights() PRE(it_is_evening()); // reads std::chrono::system_clock::now()
Note that side effects are only one problem, consider this example:
void g() PRE(global_thread_count < 32); // others can change it asynchronously
Because the above predicates are a possibility, we have to specify what the semantics of the resulting program are.
The range of possible choices includes:
assert
model: allow any expressions in the predicate and let the tools decide how to deal with them:
We want to remain faithful to the objectives outlined in [P2182R0], and called the MVP: while providing a useful — albeit small — addition, ramain open for the widest possible range of possible future directions. In consequence:
We recognize the following three consumers of contract annotiations:
The approach to misbehaving expressions in contract annotations should consider the effect on these three consumers.
Finally, we consider contract annotations to be a feature whose primary goal is to increase safety. This is in the context of the language that has a number of unsafe aspects. For this reason we want this feature to be safe: have as little surprising runtime behavior as possible. Also, for this feature we expect a different trade-off between safety and efficiency/expressibility than for other C++ features; therefore we are prepared to compromise the uniformity with other features, and not necessarily give in to arguments like, "treat contract predcates as any other expressions."
Naturally, we expect the contract support facility to be usable. As a practical criterion, we offer a quesiotn: can the Standard Library functions be used as contract predicates? We recomend three specific predicate examples.
void test1(std::vector<int> vec) PRE(!vec.empty());
This is somewhat artificial becuse it takes a vector by value. This is to avoid (for now) the quesiton of indirection. The quesions asked by this exmple are:
void test2(std::vector<int> const& vec) PRE(!vec.empty());
This exmple additionally asks if indirection through a reference is allowed. This quesiotn is not really tied to the Standard Library.
void test3(auto begin, auto end) PRE(std::all_of(begin, end, is_cute{}));
This is to additionally check the following:
Our intention is that programmers would never put contract predicates whose side effects affect the logic of the program, such as:
void g() PRE(start_device_if_not_started()) // discouraged! { use_device(); }
Nor contract predicates whose side effects affect the program correctness -- understood as compliance with contract annotations:
void f() PRE(less_than_100_calls_so_far()); // discouraged!
However, [P2388R4] lists side effects that could be considered benign:
<cmath>
, which store error results in global (thread-local)
variable errno
. These benign side effects do not affect the mental model for
contract annotations: from the programmer's perspective the predicatses
are still logical predicates not interfering with the program logic.
More, it is often impossible for programmers to even know if the
function they use are referentially transparent. For instance,
the specification of std::vector<T>::size()
does not prevent the implementations from
performing side effects, such as logging.
And in fact implementaions of STL do insert things like safety checks and conditional
logging to operations that are conceptually referentially transparent, in debugging mode. For instance,
consider _ITERATOR_DEBUG_LEVEL
in Visual Studio.
Finally, there are side effects that are only observale inside the invoked function, and cannot possibly have any effect on the calling program:
bool is_greater_than_one(int i) { int num = 0; ++num; return i > num; }
This is what [P2680R0] would call "side-effect-free when seen from the outside of the cone of evaluation".
The C++ Standard currently has the following definition of "side effect" in [intro.execution]:
Reading an object designated by avolatile
glvalue ([basic.lval]), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a subexpression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access through a volatile glvalue is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by thevolatile
access may not have completed yet.
Based on the above, we could introduce the classification of side effects:
The first one is formally defined. We could also provide a definition for the second.
The third class depends on the notion of "program logic" which only the programmer is aware of. For instance, logging into a file may or may not affect the program logic, depending on whether the program later decides to read the contents of the log.
In order to guarante that contract predicates are referentially transparent, we can simply make the program ill-formed when a predicate cannot be demonstrated to be referentially transparent. Conceptually this looks simple, but allows us to use very few predicates. Arithmetic operations on built-in types are fine:
void f(int i, int* p) PRE(i > 0) // demonstrably referentially transparent PRE(p != nullptr); // demonstrably referentially transparent
But the following is not:
void g(int i) PRE(some_array[i] > 0); // accessing object that can be changed externally
The above predicate is not referentially transparent: the result does not depend on function arguments.
As a consequence, if you verify that the function's precondition holds for value i0
,
you cannot be sure that it also holds for the same value the second later:
g(0); // checked that the precondition is satisfied x(); // modifies `some_array`? g(0); // precondition may not be satisfied
The same problem applies to dereferencing a pointer, or making an indirection through a reference,
because a pointer is no different than an index to a global array. And since we often take arguments by
a refernce (to const
), all of them are not suitable.
Interestingly, the following predicate could be considered valid:
struct X { int j; }; void h(int i) PRE(--(int&)X{i}.j != 0); // ok: modifies internal object
While it mutates the satate of an object, this object is internal to the predicate: itz life started after the evaluaiton of the predicate, and ended before the end of the evaluaion of the predicate. No-one eill observe this modidication. For the same reason, the following predicate, however useless, would be valid:
void h(int i) PRE((delete new int, ture)); // ok: no side effects are leaked
One relaxation to the above rules that would make the solution more usable would be to only require that predicates have no external side effetcs, but allow them to read the entire state of the program (not only constants or function parameters).
But this is not much. This doesn't allow us to write a simple precondition like:
void read(std::vector<int> const& vec) PRE(!vec.empty()); // error: calls a function, `vec` is a refernce
It is not clear how to approach the probem of indirection.
Can functions be called in this model? There is a number of ways to allow this, each with its set of problems.
One solution would be to only allow functions that are themselves declared as, and compiler-checked to be,
referentially transparent. But this would leave very little functions usable as predicates, because there are no
funcitons declared as referentially transparent today. And it may not be possible to get many in the future:
even functions as basic as std::vector<T>::size()
may need to do logging for debugging
purposes. Even if adapted, it has another cost: we put another fracture in the family of expressions,
and require even more specifiers to be put on function
declarations. This approach is similar to the one adapted for Transactional Memory support in
[N4302].
We could do what [P2680R0] suggests and allow any function whose definition is reachable and proven by the compiler to be referentially transparent.
This still has its problems. First, again, even std::vector<T>::size()
might fail the test.
Second, should the following be considered referentially transparent?
bool is_nonzero(int i) { if (false) std::cout << "side effect"; // never invoked, but present in code return i != 0; }
But most importantly, it is incompatible with the pracitce characteristic of C++ to put function declarations in header fies and funciton bodies in "cpp" files. Second, it now ties the correctness of the program with things that are traditionally considered implementation details in C++: where the definition of the function is located and if it performs non-referentially-transparent operations.
Imagine that a library author decides to move the implementation of their function libf()
from a header file to a "cpp" file.
Today, library authors can freely do that at their discretion. But if [P2680R0]
were in effect, moving the definition to a "cpp" file would risk breaking the library users' code, if that code were
using function libf()
in contract predicates. The library authors cannot know how their functions are used
in user code, so they would have to assume the worst (usage in contract predicates) and couldn't safey change the location
of function definitions. A similar reasoning applies to side effects. Library authors would not be able to safely add
any side effects to their functions (other than the modification of local variables) because this could break their
users' code. Adding a logging statement to my function might break a build.
It should be noted that even with any of these these extensions in place some intuitive uses of contract annotations will not work. For instance, when functions are called through function wrappers:
int algo(function<int(int)> op, int i) PRE(op(i) > 0); // error: function::operator() is not declared as referentially transparent
A function wrapper like std::function
must be prepared to handle both kinds of functions,
assigned at runtime. Therefore, the wrapper itself cannot be declared as referentially transparent.
Finally, it should be notet that if this strict approach is applied, it remains a possible future direction to simply lift it in the future and allow side effects in contract predicates. This door remains open if we take the conservative approach for the MVP.
Another relaxation to the above model is to allow functions that are declared [[unsequenced]]
as described
in [N2956] (which has been approved by WG14 for C22).
The [[unsequenced]]
means that the author allows the compiler to treat the function as referentially
transparent — have the freedom to replace the call to the function with its result — and the compiler isn't
required to verify this: it is undefined behavior if the function body doesn't satisfy the requirements of referential
transparency.
What is gained with this is that funcitons like std::vector<T>::size()
can be declared as
[[unsequenced]]
and usable in contract predicates, even if they perform logging: the author allows these
side effects to be erased.
Another approach to the problem of side effects in contract predicates is to impose no syntactic constraints on the predicates, but instead call it undefined behavior when the evaluated contract condition has a side effect. This is equivalent to saying that by putting an expression inside a contract annotation, the programmer allows the compiler to treat the expression as if it were referentially transparent.
This has an unsettling consequence of introducing a yet another UB, in the context of a safety feature such as contract support. On the positive side, this allows the implementations to evaluate the contract predicates arbitrary number of times. More, it encourages the implementations to emit diagnostics for side effects planted in contract predicates. The variant of this path is to make the program ill-formed, NDR. This enables the potential future direction of making side effects ill formed.
Another future direction is to assign a well-defined behavior to side effects:
for instance, apply the C-assert
model.
While term "undefined behavior" may be concerning, it should be noted that here we are talking about a different situation than the one that caused controversies over [P0542R5] (the "C++20 contracts"). Aggresive code transformations based on undefined behavior caused by contract annotation violations were, and remain, a source of concern. This is because contract annotations give the exact information that the optimizers need, and optimzations like this are of interest to users and are implemented in the compilers. The danger here stems from the interaction between UB and code optimizations. This has been described in [P1728R0]. We can call it "contract-based optimizations".
On the other hand, regarding the UB resulting from side effects in contract predicates — we can call it "side effect elision/duplication" — there is no known way for them to be used in optimizations. So, the concern of the same nature does not apply here. The remaining concern is that with appering and disappearing side effects. And the risk is even smaller when the predicates do not contain side effects. Note that the problem with "contract-based optimizations" occurs when contracts themselves are correct and only the programmers misuse the components (e.g., call functions out of contract). On the other hand, problems with "side effect elision/duplication" arise only if contract annotations themselves are fishy.
Another possibility is to allow just any expression in contract predicates. Then we have to specify the semantics in the two translation modes: No_eval and Eval_and_abort. Mode No_eval is simple: we guarantee that the predicates are never evaluated: this implies no side effects are effectuted. This mode offers a performance guarantee.
For the other mode — Eval_and_abort — we can see two options. It can express one of the two guarantees:
false
, but we do not guarantee that the side effects of the expression
will be observable, or if the expression is called twice. We will call it abort guarantee.assert()
does: that the expression is simply evaluated, single time, and its side effects are guaranteed
as for any other expression in the language. We will call it eval guarantee.So, our primary question is what the user expects from Eval_and_abort mode: program abort upon a detected bug, or conditional evaluations of expressions? For instance, should the following use case be guaranteed to work?
bool mode_eval_and_abort = false; void log_mode() PRE(mode_eval_and_abort = true) { clog << "running program in mode " << (mode_eval_and_abort ? "Eval_and_abort"sv : "No_eval"sv); }
The argument in favour of using eval guarantee (side effects executed, exactly once) is simplicity.
If there is any bug in the application — not detected by contract annotations, or caused by contract annotations —
compiled in Eval_and_abort mode, when you need to debug the problem, it is easier when you can rely on the fact that
the side effects from contract predicates are evaluated as any other expression. This is a more intuitive model,
in the spirit of an imperative language, such as C++. It follows the existing practice with assert()
:
it allows side effects, but a lot of advice comes with it, saying that side effects in the predicate cannot be relied upon.
The arguments in favor of duplicating or eliminating side effects are:
Some implementation strategies may need to evaluate the same predicate in a precondition twice.
For direct function calls, an implementation can easily insert the instrumentation code in the caller.
This is desired as it gives better diagnostics (the file name and line number of the caller who violated the precondition).
However, this is impossible when a function is called indirectly,
either through a pointer or std::function
: from the pointer signature we do not know if a function called
has a precondition or not. To address that case, one thing an implementation could do is to compile the pre- and
post-condition checks into the function body. This would give the result that the pre-/post-conditions are checked normally
when the function is called through an indirection, but are checked twice when the function is called directly:
once in the caller, and once inside the function body. We may want to enable such implementation strategies.
The consequence for the programmer is that when the predicate has side effects, these effects occur twice.
The implementation might want to provide a "mixed mode" where different translation units are compiled in different modes. For the MVP, we define this situation as IF-NDR, but in the future we may want to allow it. At this point we do not know the implementation difficulties that might arise from the mixed modes, the necessity to duplicate side effects in certain cases might be one of the results.
The mere possibility that the side effects of contract predicates can disappear in Eval_and_abort mode — provided that the programmer is aware of this — should be a strong incentive not put any "essential" side effects (e.g., logging is not such essential side effect) in their predicates. As we cannot easily prevent side effects statically in C++, this might be our best bet.
Note that runtime performance of the program in Eval_and_abort mode is not a motivation for side effect removal.
It should also be noted that the removal of side effects and the duplication of side effects are two decsions that can be made separately.
Also, it should be noted that the side effect elimination/dupication, if deemed useful, need to be added in the MVP and cannot be added later, because then it would be a breaking change for people who rely on these side effects.
One could ask if statcic analysis based on contract annotations must require that contract predicates be referentially transparent. After all, if we can force the compiler to determine if a apredicate is referentially transparent, the static analyzer can do that as well. The static analyzer, upon encountering a predicate, could first determine if the predicate has external side effects. If so, the predicate is ignored, otherwise it is consumed by the analyzer.
Such solution, however, loses one important guarante. If referential transparency is enforced in a contract annotation, the presence of the contract annotation in the code, is a visible guarantee given to programmers who review the code that the predicate is consumable by the analyzer. If this responsibility is assigned to the compiler, the analyzer, which usually reaches resource limits, can be releived of this part.
The compile-time enforcement of referential transparency, if doable at all, also helps prevent bugs where the programmer inadvertantly puts a side effect into their function, for instance by naming the wrong function. Such verification is especially required for contract predicates that are meant to be a tool for enforcing both correcntness and safety. Additional measures should be then taken to make sure that the tool itself does not compromise correctness or safety.
While it is true that some static analyzers could consume contract predicates even with side effects, or at leats some such predicates, enforcing referential transparency, makes the bar lower for static analyzers, and therefore more of such tools would be able to consume the predicates.
A solution that would both allow the above guarantee and the presence of external side effects
would be to allow the programmer to flag
a contract annotation with the information if referential transparency is guaranteed.
In the following example PRE
indicates that a referential transparency check is enforced,
whereas PRE_RELAXED
indicates no referential transparency enforcement:
bool is_nonzero(BigInt i) { i != 0; } // demonstrably free of external side effects bool is_special(BigInt i); // potentially has external side effects BigInt fun(BigInt num, BigInt den) PRE(is_nonzero(den)) // ok, static analyzer will consume PRE(is_special(den)) // error: potential external side effects PRE_RELAXED(is_nonzero(num)) // ok, but static analyzer will not consume even though it could PRE_RELAXED(is_special(num)) // ok, static analyzer will not consume { // static analyzer may guarantee that `is_nonzero(den)` is true // no other precondition guaranteed by static analyzer return num / den; }
With this additional flagging of contract annotations, when it comes to the runtime evaluation of the predicates, we have two options:
PRE
and with PRE_RELAXED
— are runtime-evaluated.PRE
may be proven
to be satisfied by the static analyzer. These do not need to be runtime checked as already proven never to be false
.
The remaining predicates (these marked with PRE_RELAXED
and the unpproven ones marked with PRE
are runtime-checked. Such skipping of runtime checks for analyzer-proven contract predicates is in a way similar to [[assume(x)]]
from
[P1774R8]
with one essential difference: it is not the user who injects the (possibly incorrect) information; instead the information
is inferred by the static analyzer, and safe from human error.
It should be noted that the feature represented by identifier PRE_RELAXED
can be added to the language
independent of the referential transparency enforcement.
This paper is a summary of discussion between SG21 members about side effects in contract predicates. Gabriel Dos Ries suggested and explained the approach of making side effects ill-formed. Ville Voutilainen suggested the approach to mix two kinds of predicates described in {dis.mix}. Herb Sutter suggested the test to what extent the Standard Library components can be used in contract predicates.