Document number: | P2388R4 | |
---|---|---|
Date: | 2021-11-15 | |
Audience: | SG21, SG22 | |
Reply-to: | Andrzej Krzemieński <akrzemi1 at gmail dot com> Gašper Ažman <gasper dot azman at gmail dot com> |
This paper proposes a minimal contract support framework for C++. It is the first phase of the plan in [P2182R1]. As such, it is the Minimum Viable Product (MVP) achieving the following goals:
As a sketch: a function containing an assertion annotation:
bool fun(int a, int b) { [[assert: are_compatible(a, b)]]; // assertion annotation return transform(a, b); }
In Eval_and_abort mode, this renders runtime code equivalent to:
bool fun(int a, int b) { [&]() noexcept { if (are_compatible(a, b)) {} else { CONTRACT_VIOLATION_HANDLER(); // maybe a no-op std::abort(); } }(); // immediately invoked lambda return transform(a, b); }
That is: a runtime check is performed, and if it returns something false
y, std::abort()
is called. If any exception is thrown during the evaluation of the predicate or the the optional contract violation handler, std::terminate()
is called.
In No_eval mode, the generated code is similar, but it does not actually invoke the lambda:
bool fun(int a, int b) { (void)[&]() noexcept { if (are_compatible(a, b)) {} else { CONTRACT_VIOLATION_HANDLER(); // maybe a no-op std::abort(); } }; // lambda not invoked return transform(a, b); }
That is, the predicate is odr-used but not evaluated.
std::unreachable()
.const
function parameters.
const_cast
in section {rat.arg}.
std::abort()
(rather than std::terminate()
)
upon contract violation.
void
.
In this document we use the following terms recommended in [P2038R0] and [P2358R0]:
++INT_MAX
, 1/0
or null pointer dereference.[[pre: i != 0]]
or [[assert: x != y]]
.
Contract annotations can express a subset of function specification.There are three kinds of contract annotations: preconditions and postconditions on function declarations, and assertions inside function bodies.
Contract annotations use an attribute-like syntax:
int select(int i, int j) // first declaration [[pre: i >= 0]] [[pre: j >= 0]] [[post r: r >= 0]]; // r names the return value int select(int i, int j); // subsequent declarations can // repeat or ignore the annotations int select(int i, int j) // the definition { [[assert: _state >= 0]]; if (_state == 0) return i; else return j; }
We specify the semantics of two modes of program translation:
false
,
an implementation-defined contract violation handler is invoked.
We recommend (the ISO word "should") that the default mode is Eval_and_abort.
The semantics of the contract violation handler are implementation-defined, subject to the following constraint:
control never leaves the violation handler.
It can stop the program, hit a breakpoint, or run forever after emailing a sysadmin, for instance.
If the program exits, it must signal unsuccessful termination appropriately for the context, such at returning a non-zero error code (as with abort()
).
A default safe implementation will do nothing in the violation handler, and will just let std::abort()
be called.
This corresponds to what [P2358R0]
calls "Unspecified but never returns" and what [P1606R0]
calls "check_never_continue".
Name lookup for preconditions and postconditions is performed as in the trailing return type.
Access control is performed as if the predicate was evaluated inside the body of the function, making private and protected members of the containing class accessible to predicates of preconditions and postconditions of member and befriended functions.
Preconditions and postconditions are checked (in Eval_and_abort mode) even if the function is called through a pointer.
This proposal does not actively prevent programmers from putting predicates with side effects inside contract annotations. However, doing so may have surprising effects if the program relies on these side effects.
Compilers are allowed to remove or duplicate these side effects (or parts thereof, on subexpression basis) at their discretion, in Eval_and_abort mode. In other words, mode Eval_and_abort guarantees that we will compute the result of a predicate, but not that we will evalueate its side effects.
Implementations are allowed to provide a mixed mode, where some translation units are translated in No_eval mode and others in Eval_and_abort mode. This may be necessary for linking the user program with compiled libraries.
These features are deferred due to unresolved issues. This proposal makes sure not to block any of these being proposed in the future.
assert()
, except that you can put annotations in more places.)We consider these use cases valid and important. Our choice is not to support them in the MVP, but to ultimately enable their support in the future. Their omission from this paper serves one purpose: enable programmers to declare contract annotations as soon as possible.
A library does not have to declare contract annotations (even the Standard Library). If so, this is as now.
Libraries that declare contract annotations and ship compiled binaries will have to ship two versions: one compiled in No_eval mode, and one in Eval_and_abort. This also applies to the Standard Library.
One can also see toolchains supporting mangling the contract mode into the symbol name and emitting code for both into the same fat binary.
The motivation for adding contract support to the language has already been provided multiple times, for example in [N1613], [N1800], [N4110], [N4075], [N4135], [P0380R0], [P2182R1]. In brief, contract annotations allow the programmers to encode what they consider a bug in code. This information can be used by tools in a number of ways.
The motivation for starting with a minimal set of features has been provided in [P2182R1]. In summary, it is to minimize the risk of having the discussion or disagreement about the secondary features of the contract support framework impede or prevent the addition of the primary functionality: the ability for the programmer to communicate what is considered a bug in the program. The plan for the MVP is therefore to find a minimal uncontroversial subset, gather consensus around it, polish, and deliver it quickly.
The secondary features can then be proposed on top of this stable core; but, since it has shipped, the core part of contract support framework will not be at risk of being removed or postponed even if they run into issues.
We have observed that even the portions of [P2182R1] arose controversies. This paper to removes these portions.
There are at least two views on what a contract violation means among the stakeholders.
One view is that contract violation indicates a bug, and this situation is as serious as dereferencing a null pointer or making an invalid access to memory. If this situation is detected at run-time, the only acceptable default action is to immediately abort the execution of the program to prevent any further damage.
The other view is that the contract violation is just a piece of information, usually about a bug (but not always: for instance, in unit tests a programmer may violate a contract on purpose), that can be processed in a number of ways. Program termination is one, but the program might as well just continue with the normal execution or jump to a different location, for example, by throwing an exception.
The two views conflict in whether anything other than program termination should be allowed after a runtime-confirmed contract violation, but they also have overlapping parts:
We believe that this overlapping part is already useful for many groups of programmers. It allows the programmers to declare what they know to be a bug in their components. It allows tools other than compilers — such as static analyzers or IDEs — to diagnose or help diagnose bugs in programs statically. The Eval_and_abort mode gives a guarantee to programmers that if a bug is detected at run-time, the program — apart from aborting — will do no further damage. This enables UB-sanitizers to diagnose not only abstract machine violations ("hard" UB) but also contract violations in a uniform way (provided that this way does not require the program to continue after the reported violation).
Because this proposal is — we believe — a common part of the two views, we hope that proponents of either view should find nothing unacceptable, other than that it is missing features.
It may appear that what is proposed is too small and not usable in many real-life use cases. In particular, the ability to continue the program execution after a runtime correctness check has failed is brought up in this context.
In section {rat.con}, we describe why the design of the continuation mode, such that it avoids introducing surprising new reasons for undefined behavior and gathers consensus, will take a long time, and thus not be ready for C++26.
We can see two options:
The latter option subsumes the former, even if one disagrees that the reduced feature set is "viable" for one's use cases.
We want to provide a uniform notation, usable in different contexts — in this proposal these contexts are function declarations and function bodies — for expressing runtime conditions, the violation of which necessarily constitutes a bug in the program. Such information can be consumed in a number of ways, by many different tools, most of which are outside of the scope of the C++ Standard; we want all the tools to consume the same notation.
We want the proposal — the number of features, the amount of wording changes — to be as small as possible, and remain as open for future extensions as possible. The goal is to deliver a small piece fast. We demonstrate how additional — often requested — features can be added on top of this proposal while preserving backwards compatibility.
We want the proposed to be zero-overhead: programmers should be able to add contract annotations and compile the program in a mode that doesn't add a single instruction to the compiled program.
We want other programmers to get the guarantee that the program never continues once it has been determined, through the correctness annotations, that it has encountered an illegal state. This goal seems somewhat contradictory with the previous one, but we manage to combine them through the introduction of translation modes. This means that a correctness annotation can have different effects on the compiled program, depending on what translation mode it was compiled with.
In addition to the grammar of contract annotations, this proposal includes the facility that uses them: the ability to check them at runtime, and stop the program if the check fails.
int fun(int i) { [[assert: i != 0]]; // correctness annotation return i; } int main() { fun(0); std::cout << "hello\n"; }
Running this program can have two different effects, depending on whether we enable
the generation of runtime-checks. If we don't, the program will output "hello"
and exit 0; but if we enable them, this program will exit via
std::abort()
before printing "hello".
[[assert:
Pred]]
is a correctness annotation:
one that appears inside function body, as a statement. Pred names a
predicate. If the predicate returns false
it means we have a bug.
If it returns true
, we do not know if we have any bug or not.
In order for such a runtime test to work, Pred has to be a valid C++
expression convertible to bool
. However, it differs slightly
from just any other expression.
Another consumer of contract annotations is a static analyzer that is able to perform a symbolic evaluation of the program and determine if any control path leads to a bug. (Correctness annotations provide a formal definition of a bug.)
int get_count(); int fun(int i) [[pre: i > 0]] // correctness annotation { return i - 1; } int main() { int count = get_count(); if (i < 0) return fun(-i); else return fun(i); }
[[pre:
Pred]]
is precondition:
it is another correctness annotation: one that appears in a function
declaration. It says that whenever this function is called —
just after the function parameters have been initialized — the
predicate must hold, otherwise we have a bug.
The static analyzer could determine that if function get_count()
returns 0, then function fun()
is called such that the predicate
in the precondition does not hold.
Here, a precondition "hold"ing implies having a predicate in the mathematical sense: one that returns a true-false answer, and has no effect on the environment, since in mathematics there is no such thing as an "environment". However, in an imperative language like C++, expressions have side effects. The expression in the precondition above doesn't have side effects, so the meaning is clear, but a static analyzer would have a hard time (and we also) if the precondition on the function read:
int fun(int i) [[pre: --i >= 0]] { return i; }
Therefore, while the predicates in contract annotations are C++ expressions, we take some measures to address the situations where these expressions have side effects. Thus, they are not 100% C++ expressions.
Our choice of notation accounts for the fact that we have to address these two, quite different, use cases.
Features added in the future may further increase the difference between predicates in contract annotations and normal C++ expressions. One such extension would be a notation for inexpressible conditions as discussed in [P2176r0].
A postcondition expresses the effects that a function guarantees when:
longjmp
).One of such guarantees is the state of the returned value:
char get_digit() [[post c: is_digit(c)]];
[[post
Name:
Pred]]
is a postcondition: it is another correctness annotation that
appears in a function declaration.
In a function declaration the return value (or reference) doesn't have a name, so there is a way to introduce it. The predicate can refer to this name. The type of the return value is usually visible in the declaration, except when we use a return placeholder type and do not provide the definition from which the type could be deduced:
auto get_digit() [[post c: is_digit(c)]]; // error: decltype(c) unknown
The compiler has to parse the contract annotation in order to determine if the expression is well formed, but this task is impossible when we do not know the type of the object. This problem does not occur in a function template declaration, because the task of verifying the validity of the expression is deferred until instantiation time:
template <int I> auto get_digit() [[post c: is_digit(c)]]; // OK
The problem also does not occur when we provide the body of the function from which the type of the return value can be deduced:
auto get_digit() [[post c: is_digit(c)]] // OK: decltype(c) is char { return '0'; }
This has been discussed in detail in [P1323R2].
A postcondition may describe a relation between the returned value and and function arguments:
int generate(int lo, int hi) [[pre lo <= hi]] [[post r: lo <= r && r <= hi]];
The intention of the above declaration seems quite obvious. The function guarantees that:
This expectation can be reflected by the following assertion in the code
that uses the above function generate()
:
int main() { int l = 1; int h = 100; int m = generate(l, h); [[assert: l <= m && m <= h]]; }
However, this is not what the runtime postcondition test will check in case
the values of objects lo
and hi
have been modified
inside function generate()
.
In order to prevent this situation, we make it ill-formed when a postcondition
references a function non-reference parameter, and this parameter is not declared
const
in the function definition.
The postcondition needs to have access to the returned value, so it is called after the function has initialized it. In order for the postcondition to check the condition that the caller expects, we have to make sure that the function has not modified the values of function parameters. In order to achieve that we require that non-reference function parameters referenced in the postcondition must be declared const in function definition:
int generate(int lo, int hi) // OK: not a definition [[pre lo <= hi]] [[post r: lo <= r && r <= hi]];
int generate(int lo, int hi) // error: hi and lo not const { return lo; } int generate(const int lo, const int hi) // OK { return lo; }
int generate(int& lo, int& hi) // OK: hi and lo are references [[pre lo <= hi]] [[post r: lo <= r && r <= hi]] { return lo; }
If you get an error about the postcondition referring to non-const
non-reference function parameters, you can:
const
.Another way to put this is that postconditions referring to non-reference function parameters make these parameters part of the interface: the function no longer has freedom to use it at will.
This restriction does not apply to references: in case of reference parameters it is clear to the caller that the objects the caller sees can be modified before the function finishes.
Pointers are values, so when they are passed by value, they are subject to the same constraints as other types passed by value.
Side effects are allowed in predicates. Some side effects are desired, as described in section {rat.eff}, but they have to be "self contained": removing them (or a well defined portion thereof) should not affect the program behavior.
Generally, a compiler is allowed to apply code transformations as if the expressions in contract predicates were side-effect-free. We do not require or expect that the predicates are referentially transparent, because it is often desired to refer from them to global variables, even if they are mutable.
The compiler is allowed to replace any subexpression appearing in a contract annotation with just the value of this expression. There are two cases where the compiler has enough information to do this:
bool is_positive(int i) { printf("is_positive() called\n"); // this does not affect the result return i > 0; }
int produce() [[post r: p(r) && q(r)]]; void consume(int i) [[pre: p(i)]] [[pre: q(i)]]; consume(produce());Here, it is enough to call
p()
and q()
only once, and compiler is allowed to reuse these results rather than
calling p()
and q()
again.
However, for the second case to work, an evaluation of any predicate in the sequence cannot have side effects that would affect the results of other predicates in the sequence. This cannot be enforced statically, so we call it undefined behavior.
It is undefined behavior if evaluation of one predicate or subexpression thereof affects the result of any other predicate or a subexpression thereof in the same test sequence. It is also undefined behavior if the evaluation of a predicate modifies the value of function parameters, the returned value, or any other objects odr-used by the predicate. Any other side effects are allowed and valid.
void f(int i) [[pre: ++i]]; // UB if evaluated
The compiler is also allowed to call the predicate twice:
void f(int i) [[pre: p1(i)]] [[pre: p2(i)]];
When function f()
is evaluated, it might cause
the following sequence of precondition tests: p1(i)
,
p2(i)
, p1(i)
, p2(i)
.
This allows the implementations to test the preconditions
both inside and outside the function body. This ability is
useful for generating good error messages from preconditions:
they have to be evaluated inside the function, in case the call
is made through a pointer. But when a call is made directly,
we can evaluate it outside, and then the error message points at
the caller, rather than the callee.
A possible, hypothetical but valid, implementation of a contract annotation check would be to start a transaction, evaluate the predicate, save the result on the side, cancel the transaction (thereby erasing all side effects since the beginning of the transaction), and just using the stored value instead as a result.
Function pointers and references cannot have contract annotations, but functions with contract annotations can be assigned to them:
using fpa = int(*)(int) [[pre: true]]; // error using fptr = int(*)(int); int (int i) [[pre: i >= 0]]; fptr fp = f; // OK fp(1); // precondition is checked
In other words, contract annotations are not part of function type. Thanks to this decision, you can write code like:
int fast(int i) [[pre: i > 0]]; int slow(int i) [[pre: true]]; // no precondition int f(int i) { int (*fp) (int) = i > 0 ? fast : slow; return fp(i); // if fast() is called, its precondition is checked }
The consequence of this behavior is that an implementation cannot check the precondition in the place where function is called. The check has to be performed inside the function. Therefore, it is a reasonable implementation strategy to implement the precondition check inside the function in case it is called indirectly, but where the function is called directly, check it in the caller, in order to provide the source location that is the culprit of contract violation. This may sometimes result in checking the same precondition twice.
The same mechanism works for function wrappers:
using fp = int(*)(int); int f(int i) [[pre: i >= 0]]; function<int(int)> fp = f; // OK fp(1); // precondition is checked
You can put contract annotations on virtual functions, however they behave in a certain way in the overriding functions. You can omit contract annotations in the overriding function. The overriding function then has the same contract annotations as the virtual function in the base class, and the names in the predicates are looked up in the context of the base class.
static const int N = 1; // #1 struct Base { virtual void f() [[pre: N == 1]]; }; template <int N> // N is shadowed struct Deriv : Base { void f() override; }; int main() { Deriv<2>{}.f(); // precondition test passes }
The precondition in the overriding function is N == 1
,
but the name lookup is performed in the context of class Base
,
so it sees the global variable N
declared in line #1.
You cannot declare a contract annotation in the overriding function if the virtual function in the base class doesn't have a corresponding contract annotation.
You can declare a contract annotation in the overriding function, but it has to be identical (modulo the names of function parameters) to the corresponding contract in the overridden function. However, the program is ill formed, no diagnostic required, if name lookup in the predicate finds different entities than if the name lookup were performed in the context of the base class:
static const int N = 1; // #1 struct Base { virtual void f() [[pre: N == 1]] [[post: sizeof(*this) < 100]]; }; template <int N> // N is shadowed struct Deriv : Base { int i[100]; void f() override [[pre: N == 1]] // IF-NDR, finds different N [[post: sizeof(*this) < 100]]; // IF-NDR, inspects different class };
We do not allow the preconditions in the overriding function to be "wider" and the postconditions to be "narrower" than in the overridden function. Inheritance also means inheriting the contract. If you need the overridden function to be called out of contract of the base class, you have to provide a separate function:
struct Base { virtual int fun(int i) [[pre: i >= 0]] [[post r: r >= 0]]; }; template <int N> struct Deriv : Base { int fun_wide(int i) { return i; } virtual int fun(int i) [[pre: i >= 0]] [[post r: r >= 0]] { return fun_wide(int i); } };
The above example also illustrates that a feature request "I want the
postcondition in the overriding to be the same or narrower than the
postcondition in the overridden function" is conceptually wrong.
Function fun_wide()
has a wider postcondition than
fun()
, but it is still substitutable for fun()
.
This is because fun_wide()
will never return a negative number
as long as it does not obtain a negative number as input. A postcondition
can never be treated in isolation from the function's precondition.
First, this proposal allows programmers to define what is considered a bug in their program; therefore tools like sanitizers can insert defensive checks based on correctness annotations, and report them in a way that is uniform with how UB (such as null pointer dereference) is reported. Such insertions are conformant with what we allow in Eval_and_abort mode, as long as the program terminates afterwards.
Second, if:
__builtin_unreachable()
, and__builtin_unreachable()
, and then you can put __builtin_unreachable()
inside your contract
violation handler. This way old UB-sanitizers can report on all declared
contract violations, and no special contract support has to be implemented
inside the sanitizers.
This proposal interacts with
[P0627R6],
which adds function std::unreachable()
, which could be used
for the same purpose. However, the specification in [P0627R6]
allows the following implementation:
[[noreturn]] void unreachable() [[pre: false]] // never satisfied {}
If this implementation was used in combination with the above suggestion to
put std::unreachable()
in a custom contract violation handler,
this would cause an infinite recursion. Therefore, this proposal disallows
the implementations to put contract annotations on std::unreachable()
,
if it is added.
In this proposal the predicates in contract annotations are not in the immediate context of the function. They behave similarly to exception specification:
template <std::regular T> void f(T v, T u) [[pre: v < u]]; // not part of std::regular template <typename T> constexpr bool has_f = std::regular<T> && requires(T v, T u) { f(v, u); }; static_assert( has_f<std::string>); // OK: has_f returns true static_assert(!has_f<std::complex<float>>); // ill-formed: has_f causes hard instantiation error
As a consequence, we may have a function template that works well for a given type, but stops working the moment we add a contract annotation. This also affects how concepts would be taught: a good concept should express not only the operations that are necessary in the implementation of the generic algorithms, but also these that are necessary in the specification of contract annotations in these algorithms.
The syntax that we propose uses the double square bracket notation, but because we require a colon inside, it does not match the syntax of attributes, neither in C++ not in C. While the the specification differs slightly between C and C++, they both agree that an attribute-specifier is a comma separated list of attributes inside double square brackets, each attribute taking the form:
namespace-name::attribute-name(balanced-token-seq)
Thus the notation that we propose:
[[identifier identifieropt : balanced-token-seq]]
Can never be recognized as an attribute based on the C or C++ specification.
The two alternatives to consider for contract annotations would be:
// not proposed int f(int i) [[pre(i >= 0)]] [[post(r: r >= 0)]];
Attributes in C and C++ have the property of "ignorability". It is expressed directly in the C specification ([P2182R1]):
A strictly conforming program using a standard attribute remains strictly conforming in the absence of that attribute.
Contract annotations have an analogous property with respect to bug-free
programs. If your contract annotations compile and never evaluate to
false
in a running program, then you can remove the annotations
from your code and you will not observe any difference. The notion of being
"bug-free" now has a formal definition: a program is bug free when its
contract annotaitons never render value false
for any inuput.
This notion of ignorability makes the contract annotations similar to attributes. Therefore we believe that the double bracket syntax is justified. However, strictly speaking, contract annotations are not attribute-specifiers.
Another reason is pragmatic. Adding new keywords is expensive, requires a lot
of creativity, and often results in solutions that feel compromised. Our
choice is further limited by the desire to use identifier assert
for representing assertions in function bodies. This precludes any solutions
that contain the sequence of tokens assert (
, which would be
parsed as a macro.
One reason for not choosing the attribute syntax, already indicated above
is that [[assert(pred)]]
would be interpretted as a macro
and render a compiler error. This problem would disappear if we used some
other name than assert
. However, name assert
is
a well established name, not only in C++, and abandoning it
would make the learning curve unnecessarily steeper.
The second reason is that the sqoare brackets do not indicate an attribute: they indicate an attribute-speciifier, which is a sequence of attributes. Treating contract annotaitons as attributes would allow users to write things like:
// not proposed int f(int i) [[noreturn, gnu::fastcall, pre(i >= 0)]];
The proposed syntax, due to its similarity with attributes, sends a message
to programmers: these annotations can be ignored if your predicates are
well formed and they always evaluate to true
.
The second reason is social: we do not want to depart from what P0542-contracts settled upon and what EWG agreed to, which has 6+ years of history, teaching and reference implementations.
An alternate syntax might be possible, but the design is subject to a number of concerns already mentioned. Also, providing a novel syntax would be incompatible with our primary goal: to provide the ability to declare contract annotations still in C++26. Experimenting with new syntax, waiting for a reference implementation will surely make the process longer. While doing things fast can never be a primary goal, there is also a risk of improving the design forever. An alternative syntax should offer a clear and significant improvement over the current one.
A concern has been raised that the square bracket notation used for contract annotations, even though not being an attribute-syntax, will be confused with attributes. This can result in the following:
[[
and ]]
, and outputs a diagnostic message. Such implementation
would just skip the parsing of contract annotations, and compromise the
guarantee that expressions in contract annotaitons should be at minimum
checked for syntactic and type-system correctness. The first two concerns boil down to how much you can trust your compiler to
implement the desired check. This proposal requires that the compiler
implements at least one of the two modes: Eval_and_abort or
No_eval. In either case users should expect that their predicates
in contract annotations are syntax-checked. However, this is not generally
enforcable. The standard only requires that an implementation issues at least
one diagnostic for an ill-formed program, and that it producess a binary with
correct semantics for a well formed program (the standard does not mention the quality
of this diagnostic). However, it allows diagnostics
for well defined programs (this is how warnings work), as well as generating
a binary with implementaiton-defined semantics for ill-formed programs
(this is how language extensions work). So, if your implementaiton chooses
to skip anything between [[
and ]]
and issue a diagnostic
"attribute ignored", it is conformant. If the predicate in your contract
annotation was well formed, an unsolicited diagnostic has been produced (legal),
and program compiles fine with no additional behavior, as you would expect
in No_eval mode. If your contract annotation contains an ill-formed
expression, the required diagnostic (albeit with confusing content) is produced,
and the binary that does something is generated (legal for ill-formed program).
So, there is no way to enforce the compiler to refuse to compile and produce
a binary for an ill-formed program.
The same observation applies to features like concepts or
static_assert
. However, this is not a problem in practice,
because we trust that the implementaitons will do what is intended. This
boils down to the QoI issue. If your compiler just skips contract
annotations, you know you cannot trust it for enforcing contracts. We could
not prevent the compiler from ignoring your annotations: even if they were
expressed with new keywords.
The reason that contract annotaitons expressed with double sqare bracket
syntax is particularly
subject to concerns, is that this approach to skipping anything between
[[
and ]]
has been suggested as a plausible
implementation of a C compiler, following the informal, but apparently common, reasoning that anything between
[[
and ]]
can be considered less important, secondary information.
The supporters of this view would naturally expect that contract annotations
do not use the double angle bracket syntax. While WG21 doe snot directly deal with
the C programming language, creating an unnecessary incompatibility with C,
(which plans to add contract support in the future) is a legitimate concern.
Before we consider switching to a different syntax, we would like to lerarn
if there exists an implementation
that actually makes use of the above view.
Addressing the third concern, the notion of appertaining to function type
is not understood well enough to be able to make definitive statements
if contract annotations apply to function types or to individual function
declarations. There are no standard attributes that appertain to function
type. We have non-standard ones, but because each of them has its own
implementaiton-defined semantics, no clear conclusion could be drawn from
them. For instance, Clang supports attribute [[gnu::fastcall]]
which appertains to function type, and affects the type system,
so that a function declared with with [[gnu::fastcall]]
cannot
be cast to a function pointer declared without [[gnu::fastcall]]
and vice versa. But this behavior is speciffic to this single attribute,
and there is no way to draw a conclusion that the same should apply to any
attribute that appertains to function type. There is nothing that prevents
a different attribute from appertaining to function type and not affecting
the type system at the same time.
The following is illegal under our proposal:
// not proposed using DivFun = int(int num, int den) [[pre: den != 0]];
One could make the case that because this is illegal, contract annotations do not appertain to function type. But because each attribute has its own semantics, including its own specification of where it can appear and where not, we again claim that this fits within what is allowed for an attribute that appertains to function type.
There is no stict criterion here, so we can only refer to an intuition, and the intuition is inconclusive also: one could say that given the declaration:
int divide(int num, int den) [[pre: num != 0]];
The precondition appertains more to a function type, as in: "this is one of those functions that disallow zero in the second parameter." Of course, this reasoning only makes sense only if "appertain to" was a property of contract annotations; but it is not, because "appertains to" is only defined for attributes.
Because we do not require contract annotations to be repeated on subsequent function declarations, it is possible that two translation units can see different sets of contract annotations for the same function:
// translation unit 1 // ------------------ // first declaration in this TU: void f(int i) [[pre: i > 0]]; void f(int i) { // ... } |
// translation unit 2 // ------------------ // first declaration in this TU: void f(int i); // no precondition int main() { f(0); // precondition checked or not? } |
In this context, the question whether pre- and postconditions "belong" to the caller or the called function, obtains a new dimension. A similar issue occurs for default function arguments. Ideally, from the user perspective, we would like the above situation to be an error diagnosable by the implementations. However, because this cannot be diagnosed from a single translation unit, it may be too burdensome to require of the implementations to diagnose it.
This looks similar to ODR violation. The consequences in the run-time behavior in Eval_and_abort mode are that the wrong set of contract annotations gets executed or that both sets are executed, if the implementation decides to test the pre-/postconditions both inside and outside the function.
However, inconsistent contract annotations have consequences on static analysis. What conditions should a static analyzer look for? It may turn out to be incompatible with the runtime checks.
We can see a couple of ways to address this.
We choose option 1, because this most clearly reflects what such situation is: simply having a function with two different sets of contract annotations is an error, even if this error cannot be diagnosed by the implementation.
const
non-reference
function parameters in postconditions ill-formed?
{rat.arg} As indicated in section {des.arg}, a non-reference
function parameter referenced in a postcondition must be declared
const
in function definition. This is to prevent any
modification of the parameter.
In order to show why this is necessary, consider the following example:
// declaration that the user sees: int generate(int lo, int hi) [[pre lo <= hi]] [[post r: lo <= r && r <= hi]]; // definition that only the author sees: int generate(int lo, int hi) { int result = lo; while (++lo <= hi) // note: lo modified { if (further()) ++result; // incremented slower than lo } return result; }
Because lo
is modified as the function is executed, we get an ambiguous situation where:
lo
;lo
as observed upon the exit from the function.We could just followed the rule: postcondition is evaluated at the end of the function, and whatever values function parameters have at that point, we read them. This would make the model for postconditions "simple", but it would give the semantic meaning to the postcondition that is clearly against the intent of the contract:
int generate(int lo, int hi) [[pre lo <= hi]] [[post r: lo <= r && r <= hi]];
The consumers of the contract only see the declaration. And the declaration clearly states:
This is how humans will read this declaration; and this is how the tools (like static analyzers) will read this declaration. So, in order for both tools and the C++ runtime checks to have the same meaning, we have to exclude some otherwise valid implementations of the function.
A similar problem, albeit more difficult to spot, is when the function parameter is implicitly moved from:
// declaration: string forward(string str) [[post r: r == str]]; // definition: string forward(string str) // disallowed in our proposal { // ... return str; // implicit move } // postcondition reads the moved-from state
If we allow this to compile in Eval_and_abort mode, we end up with the
postcondition reading the moved-from state. A programmer may not be even aware
that a move is taking place. We prefer a compiler error to Undefined Behavior
here. The programmer always has an option to fix that, either by dropping
the postcondition or declaring the function parameter const
.
const_cast
to cast away const
-ness? This falls under general rules in C++ for casting away const
-ness.
[dcl.type.cv]/4:
Any attempt to modify a const object during its lifetime results in undefined behavior.
This is a good opportunity to remind best practices of using const_cast
:
You can only use it safely when you already know, by other means, that the object you cast a
pointer/reference to is not const
.
There is no need to. Reference function parameters clearly indicate in the declaration that we are interested in modifying the objects that the caller sees. In fact this may be the desire of the caller to have the called function modify these objects, and this can even be reflected in the postcondition:
void fix_limits(int& lo, int& hi) [[post: lo <= hi]];
Is this measure enough? What about pointers? Even if they are const, the value they point to can be mutated.
The answer is the same as for the references: you can clearly see from the declaration that the function is allowed access to objects from your scope, so you are prepared and maybe expect them to change:
void fix_limits(int* lo, int* hi) [[post: *lo <= *hi]];
It has to be kept in mind that pointers, including smart pointers, are value-semantic types: their value is the address: not the value they point to.
The protective measure with adding const
does not work for
types that violate value semantics:
class Book { shared_ptr<string> _title = make_shared<string>(); // poor choice public: string const& title() const { return *_title; } void set_title(string t) { *_title = std::move(t); } friend auto operator <=>(Book const&, Book const&) = default; };
When we copy objects of this type, the copy is tied to the original, so when we modify the copy, this automatically modifies the original. We believe it is a fair compromise to not take care about types like this.
The downside of the solution with adding const
is that
when you add a postcondition to a function declaration, it can cause
the definition of the function to be come ill-formed. Adding const
may not suffice, because the implementation might be modifying function
parameters. There are some combinations of types and functions for which an alternative
implementation may not exist if we make function parameters const
.
Consider the following example by Ville Voutilainen:
struct X { int v = 0; }; void f(propagate_const<shared_ptr<X>> x) // [[post: x->v == 42]] { x->v = 42; }
For such cases, programmers will not be able to declare a postcondition.
The positive part of this proposal is that you get an explicit error rather than a possibly surprising runtime behavior.
Other options for solving non-reference parameters in postconditions have been considered. P0542-contracts allow them, but call it undefined behavior if such parameter was modified during the evaluation of the function. This included the situation where the parameter was implicitly moved from.
That solution encouraged too many silent bugs. The present solution looks superior. First, potential bugs are detected at compile time. Second, we leave more doors open for the future. An illegal declaration can be turned into a legal declaration with undefined behavior. The reverse is not possible.
Another option would be to, rather than requiring the programmer to manually
type const
on function parameters, make these parameters implicitly
const
:
int generate(int lo, int hi) // OK [[pre lo <= hi]] [[post r: lo <= r && r <= hi]] { int result = lo; while (++lo <= hi) // error: lo is implicitly const { if (further()) ++result; } return result; }
This spares some typing, but has downsides. It is not as
future proof as the proposed solution: the syntax is well formed with well defined semantics,
so we cannot change it in the future. Second, this alternative can cause
confusion. When an parameter is changed implicitly to const
,
overload resolution starts selecting different overloads, and the programmer
cannot easily figure out why, as there is no const
in sight.
Another option: disallow postconditions to reference function parameters.
This is possible, but the proposed solution is superior. Referencing
non-const non-reference parameter is already an error; we just make it easier
for people who can afford to put const
in the definition.
Another option: silently make a copy of every non-reference parameter referenced in the postcondition in Eval_and_abort mode. This copy would be only visible to the postcondition, so there is no way a function can possibly change it.
While attractive, this has certain disadvantages.
Book
in the example above — that do not correctly
implement value semantics, and modifying a copy causes the modification in
the original.Another option is a variant of the above: silently make a copy of a postcondition-referenced parameter in Eval_and_abort mode, and in No_eval mode fail to compile if this object's type is not copyable. This removes the compile-time surprise when switching between modes; but it does not remove the other two problems: unexpected copies and problems with types that do not have value semantics.
It is, however, a possible future opt-in extension, as described in section {fut.old}.
Finally, there is an option to remove postconditions from this proposal. This will however make the same problem reappear once we try to add postconditions in the future.
It should be kept in mind that making the situation with non-reference parameters ill-formed in the MVP is a good starting point for the future extensions: in the future this could be extended to most of the above alternatives.
The D programming language, which has quite a similar model to C++ has a
similar problem with referencing non-const
parameters in
postconditions. However, it does not impose any constraints on the function
parameters. As a result, postcondition runtime checks give counterintuitive
results, as can bee seen in this Compiler Explorer example:
https://d.godbolt.org/z/8xr1oqdq5.
The behavior of runtime-checking, logging but not aborting has a number of surprising and non-obvious effects.
First, in case the precondition is guarding against the Abstract Machine Violation (hard UB), reaching the point of the abstract machine violation might result in what would be observed as removing the log entry for the previous contract violation. This is discussed in detail in appendix A.
Second, a mechanical transformation of every contract annotation to "check, log and continue" can introduce a new Abstract Machine Violation (hard UB) if the programmer assumes the short-circuiting behavior for subsequent contract annotations. Consider the following example.
int f(int * p) [[pre: p]] [[pre: *p > 0]] { if (!p) throw Bug{}; // safety double-check if (*p <= 0) throw Bug{}; // safety double-check return *p - 1; // (*) business logic }
This program upon p == nullptr
behaves in a way that (1) does not cause abstract machine violation and
(2) guarantees that the business logic, indicated with *, is never reached. This happens for both No_eval
and Eval_and_abort mode. However, in a mode that allows the program to continue, this causes abstract machine violation
upon runtime-checking the second precondition. This is explained in detail in appendix A.
We believe that "the continuation mode" is a useful feature that is necessary for some essential applications (like adding contracts in libraries in stages). Our motivation for omitting it from the MVP is the timing concerns: we want to deliver a small but relatively useful feature quickly.
We propose to disallow throwing upon detected contract violation because it falls outside of one of the presented programming models: the one that says that it is unacceptable to allow the program with a detected bug to continue.
By disallowing throwing we also avoid exploring and describing many aspects observable behavior that this would trigger:
noexcept
.This makes the proposal smaller, and therefore more likely to progress faster through the WG21 process.
Not proposing the ability to install user-provided violation handlers is again motivated by timing constraints. This way we avoid the necessity to specify the interface for the violation handler and its constraints.
However, the way we specify the handler (mostly implementation defined) allows for things like programmer installing a callback in an implementation-defined way.
We intuitively expect that predicates in contract annotations have no side effects. This expectation is reflected in terms like "if the precondition holds". However, it is impossible to enforce such constraint in an imperative language like C++.
It is often impossible for programmers to even know if the
function they use has any side effects. For instance,
the specification of std::vector<T>::size() const
does not prevent the implementations from
performing side effects, such as logging.
Some side effects are practical to have, and they do not affect the reasoning about predicates in the mathematical sense. These side effects include:
<cmath>
, which store error results in global (thread-local)
variable errno
. Our choice follows the existing practice with assert()
: it allows side effects, but a lot of advice
comes with it, saying that side effects in the predicate cannot be relied upon.
[P0542R5] specified that invoking any side effect inside a contract annotation predicate is undefined behavior. This allowed certain practical optimizations, such as not performing two consecutive evaluations of the same predicate, which is often the case when postcondition of one function is the same as the precondition of another function. It also allowed checking the same predicate twice, for instance once inside the function body, and the second time in the calling context.
While we drop this undefined behavior on any side effects in a contract predicate, we allow similar code transformations, and introduce narrower cases that trigger undefined behavior: in situations where the programmer is clearly doing something nasty.
Why not just follow [P0542R5] and say that any side effect inside a predicate is undefined behavior? Because we do not want programs with "benign" side effects to be penalized, and render additional surprises. Only "malignant" side effects should trigger undefined behavior. We have a pretty good definition of "benign" side effect in a predicate: it does not modify any object that it odr-uses and it does not affect the result of any other subsequent predicate in the same correctness test sequence.
Why not just use simpler wording and say that a predicate listed in a contract annotation can be evaluated an unspecified number of times (including 0 times)? While this would send a clear signal to the users, that the side effects cannot be relied upon, it would not allow the predicate transformations that we described earlier.
We allow the implementation to pick any subexpression form a predicate, and discard all (as opposed to some)
its side effects. This allows the implementation in the following example to call p1
and p2
only once, even
though the preconditions and the postcondition are not ODR-identical:
int produce() [[post r: p1(r) && p2(r)]]; void consume(int i) [[pre: p1(i)]] [[pre: p2(i)]]; consume(produce());
Given the above provision, is it safe to evaluate a predicate that has to lock a mutex to read the value and then unlock the mutex? The answer is yes, as long as it all happens inside one function:
bool p1(int i) { lock_global_mutex(); int c = read_global_counter(); unlock_global_mutex(); return c <= i; }
Because in the contract annotation p1(i)
is the most nested
subexpressions, the entire function body has to either be executed
or the entire call to p1(i)
skipped.
On the other hand, the following is dangerous, and may cause concurrency issues, if the compiler decides to elide side effects per subexpression:
void consume2(int i) [[pre: (lock_mutex(), read_counter(), unlock_mutex(), counter() <= 1) ]];
Is such provision for per-subexpression side effect elision safe? Our response is: yes, because the chunks of side effects that the compiler will potentially be removing are always clearly visible to the programmer inside a contract annotation in the form of subexpressions.
First, it cannot be added later, because then it would be a breaking change. User may start to rely on the mandated side effects in Eval_and_abort mode, as per the Hyrum's Law, and these would suddenly disappear in future releases.
Second, it gives a strong encouragement to the users not to put side effects in their predicates. Their side effects may disappear, even in Eval_and_abort mode; or they can be duplicated.
Programming guidelines often recommend that in contract predicates of public member functions one should only use the public interface of the class. This is in case the class user needs to manually check if the contract is satisfied for an object whose state is not known. However, this is only a guideline, and enforcing it in the language would break other use cases that do not subscribe to the above advice.
In general, the users must ensure that the precondition of the called function is satisfied. If they do that, they do not have to check the precondition.
Allowing the access to protected and private members enables a practical usage scheme. In general, function precondition is something that cannot be fully expressed as C++ expression. The implementers choose how much of the function precondition they want to check. They may choose to check some parts of the precondition by accessing private members that they do not want to expose to the users, for instance, because the private implementation may change over time or under configuration:
class callback { #if !defined NDEBUG mutable int _call_count = 0; #endif // ... public: void operator() const // real contract: this function can be called no more than 8 times, // so the precondition is that the function has been called 7 or less times #if !defined NDEBUG // attempt to check the precondition [[pre: _call_count <= 7]]; #endif };
In the above example, the precondition can only be checked in debugging mode.
Once NDEBUG
is defined, member _call_count
is removed
and there is no way to test the precondition.
Also, a hypothetical constraint to use only public members in contract predicates could result in programmers turning their private and protected members into public members only to be able to express the pre- and postconditions, which does not sound like a good class design.
This has been described in detail in [P1289R1], and in fact adopted by EWG.
We recommend that the default mode is Eval_and_abort, but we do not require this of the implementations. The reason for that is that we believe that it is not possible to mandate this behavior in this International Standard.
Our ideal is that users get safety by default and performance on request. ("Safety" in this context is understood as "do not let the program with a bug execute".) But what the users often use is an IDE. Even if we were able to require of the compiler that its default mode is Eval_and_abort, the IDE can provide its own defaults which map on non-default values of compiler switches.
This has been discussed in detail in [P1769R0].
const
qualification Ideally, we would like contract predicates to be referentially transparent
(have no side effects, depend on nothing else but the objects directly
referenced in the expression). This goal is not realistically attainable.
We could get closer, however, if we required that all objects referenced
in the predicate are treated as if they were const
. Thus,
only operators and functions (including member functions) that take
arguments by value or a reference to const
are allowed.
This could cause some discrepancies in case we have two overloads:
one for type T const&
and the other for T&
doing different things:
struct X { bool is_fine() const { return /* impl 1 */; } bool is_fine() { return /* impl 2 */; } friend void fun(X& x) [[pre: x.is_fine()]] // uses impl 1 { [[asset: x.is_fine()]]; // uses impl 2 } };
Also, there exist referentially transparent functions that do not
mark their reference parameters as const
. While programmers are
often advised to mark any functions that do not modify their
arguments by contract as const
, this is just and advice, and does
not have to be followed. And it is not clear if such functions
should be excluded from contract predicates.
Because of these issues, and unclear gains, we do not propose
to treat objects as const
inside contract annotations.
constexpr
functionsDuring core constant expression evaluation in Eval_and_abort mode when any function precondition, postcondition or assertion is violated, the program is ill-formed.
Ideally, we would like such evaluation to be ill-formed even in No_eval mode. However, we are not sure if this is easily implementable on all platforms. This proposal allows but not requires this.
P0542-contracts has wording "There should be no programmatic way of setting, modifying, or querying the build level of a translation unit." Technically, we could add a similar requirement for the translation mode.
We decided to drop this requirement, as there is no way to actually test if it is satisfied, and it is unrealistic to expect that of the implementations. For instance, A program can open a configuration file of the project that lists the translation mode. Should programs be banned from opening and reading such file?
It is clear that at some point there will have to be a way to annotate contract annotations with additional information controlling how contract annotations should be interpreted, such as:
std::abort()
.We could provide the placeholder syntax for these annotations still in the MVP, so that C++ compilers can start parsing it as soon as possible, before we standardize individual "meta-annotations". This way, programs written in C++30 would compile in C++26 compilers (assuming this paper is accepted for C++26).
The serious concern about this idea is that while these meta-annotations will be parsed in C++26, they will have different semantics than in the future revisions of C++. This silent change of semantics is unacceptable. For this reason, we do not propose the placeholder syntax for meta-annotations.
We would end up in the situation, where the same program compiles on both older and newer contract support frameworks, but has different semantics. Consider the following program, with possible future extensions:
void f(const char* s) [[pre{axiom}: is_null_terminated(s)]]; // name must not odr-used bool binary_search(container<Val>& c, Val const& v) [[pre{audit}: is_sorted(c)]]; // must not be evaluated bool foo(int i) [[pre{review}: i >= 0]]; // must not abort bool bar(int i) [[pre{default}: i >= 0]]; // must evaluate and abort
No single semantic that doesn't understand the additional syntax can accommodate these expectations.
This proposal allows to declare preconditions and postconditions in arbitrary order:
int f(int x, int y) [[post r: p(r)]] [[pre: p(x)]] [[post r: q(r)]] [[pre: p(y)]];
We could be stricter about is and require that no precondition annotation can follow any precondition annotation. However:
int f(int x) [[pre: neat(x)]] [[post r: neat(r)]] [[pre: nice(x)]] [[post r: nice(r)]];
abort()
vs terminate()
{rat.end} In this proposal, throwing from the predicate calls std::terminate()
while a failed runtime check
calls std::abort()
. (We simply forbid the implementations
to end the violation handler by throwing an exception: it is up to implementations how this requirement is satisfied.)
The above distinction reflects the fundamental difference between the two situations.
Throwing from the predicate is a random, unpredictable, but correct situation in the program. Maybe a comparison
had to allocate memory, and this allocation failed, because today the server is exceptionally busy. We want to
handle it the way we usually handle exceptions when there is no suitable handler: std::terminate()
is an exception handler, with its unique control flow, however harsh.
In contrast, failing a runtime correctness test is an indication of a bug, and it is not clear if std::terminate()
,
which is the second level of exception handling mechanism, is a suitable tool. The call to std::terminate()
either calls std::abort()
or calls a terminate handler installed by the user. In case the contract is
violated, and we can be sure the program contains a bug, calling a user-installed function may be unsafe,
and can pose a security risk.
[[assert:]]
is not an expression {rat.exp} In this proposal [[assert:]]
can only appertain to a null statement, which effectively makes it a statement,
and it cannot be used as an expression. This is a downside compared to macro assert()
, which can be used
as a subexpression of a bigger expression and therefore be used to protect things like initialization of variables:
MyClass::MyClass(int i) : member((assert(i > 0), i)) {}
In the future [[assert:]]
could be extended and become an expression.
However, this is not in scope of the Minimum Viable Product.
This proposal does not allow code transformations based on assumption that
predicates in correctness annotations always evaluate to true
.
Such transformations could change the program (even parts far before the
contract annotation) in a surprising way.
We believe that removing this provision increases the level of consensus.
On the other hand, we believe that the Standard does not have to explicitly
allow such transformations. First, they can never be guaranteed or required:
a compiler vendor must choose to make an effort and implement them. If a
compiler vendor implements such transformations, it can offer the feature
as a non-conforming extension, such as __builtin_unreachable()
.
In fact, this proposal indirectly allows the programmer to achieve the same
effect, provided that the compiler vendor provides the necessary parts.
This proposal says that what happens in response to contract violation is
implementation defined. A conforming implementation may allow the programmer
to install a callback to be called on such occasion (a contract violation
handler). The programmer may then choose to put __builtin_unreachable()
inside the handler. The implementation then would have to guarantee that
the handler is inlined.
This should not be alarming to people focused on safety, because the above
scenario requires the usage of __builtin_unreachable()
, which
is easily detectable through simple mechanical checks, and cannot be installed at runtime.
This revision of the paper does not require or encourage any error message to be displayed to standard diagnostic stream, or anywhere in Eval_and_abort mode. There are two reasosn. First, there is no standard diagnostic stream on freestanding implementations, and we want contract support to be available on those platforms. Second, for security reasons. When an application is in a confirmed incorrect state, performing IO operations may pose a security risk. As the primary focus of this proposal is safety, we choose a conservative approach.
This decision is a consequence of modelling contract annotations after noexcept
.
Alternative designs would be:
Discarding the first alternative simplifies our model: we can say that contract annotations do not affect the type system. The second alternative could cause runtime surprises, where the programmer might be let to believe that their contract is enforced, when it is not.
The solution that we chose also comes with a surprise: "my program stopped compiling only because I added a contract". However, this is a compile-time surprise, which we find prefereble to runtime surprises. We also believe that this potential surprise will be mitigated by guideline that concepts constraining the generic functions should also include operations whose only use is to check the function preconditions:
template <Arithmetic T> T scale(T a, T b) [[pre: b != T{0}]] { return a / b; } template <typename T> concept Arithmetic = requires(T a, T b) { { a / b } -> std::convertible_to<T>; // other operations T{0}; // in case we need to specify function contract };
Another possibility would be to provide a way for the function author to say, "check this contract annotation only when it is well formed." We do not pdopose this for the MVP. If such functionality is needed, it can be emulated by adding a concept-constrained overload:
template <Arithmetic T> T scale(T a, T b) [[pre: b != T{0}]]; template <Arithmetic T> requires RichArithmetic<T> T scale(T a, T b) [[pre: b != T{0}]];
While we propose to drop a number of features from the MVP, this proposal does not close the door for adding them in the future, once (if) the MVP has been agreed upon and baked.
To guarantee backward compatibility as we add features in the future, all semantics added in the MVP must not change. Therefore all new features will have to be opt-ins: enabled either through new syntax/annotations, or through new translation modes.
The ability to install a custom violation handler (along with the handler's interface) can be provided as a future extension, with the behavior mandated by the MVP being the semantics of the default violation handler.
The ability to continue after logging the contract violation has two use cases:
In the first case, the application of the "continue" semantics is not a default behavior, and would require that a person who assembles the program instructs the compiler to use the special behavior. This can be added in the future as a third mode of translating the source code with contract annotations (in addition to No_eval and Eval_and_abort).
In the second case, there is a need to discriminate the newly added contract annotations from the stable ones. This would require some additional syntax to mark such annotations, for instance:
int f(int * p) [[pre: p]] // stable annotation [[pre: *p > 0; new]] // new annotation ;
This can be added in the future by introducing a new syntax for the newly added annotations and allowing the programmer to control separately what runtime code should be generated from these "new" annotations.
The syntax space for additional information in contract annotations is quite broad. The alternatives include:
[[post r: r > 0: new]] [[post{new} r: r > 0]]
Regarding the throwing violation handlers, the only known and
well explored use case is for "negative" unit-testing.
We note that this ability would only be used in special programs:
ones that execute unit tests. It would also require of the tested
functions not to be marked as noexcept
. For this
special case it seems reasonable to expect of the person that assembles the
program that they instruct the compiler in an explicit way
that a unit-test program is built. This ability could be added as a
future extension by introducing a yet another translation
mode where exceptions thrown from violation handlers are not
immediately turned into a call to std::terminate()
(but they can still be turned into std::terminate()
when noexcept
functions are executed in unit-test programs).
It is also possible to do negative testing of preconditions and
postconditions by offering a set of reflection features tailored
to that eventuality, without even running the function. A strawman
proposal could be std::are_preconditions_satisfied(f, x, y, z) -> bool
.
This might look like a lot of modes of translation, but it should be kept in mind that the perspective of a person that looks
at the code will not have modes: a declaration starting with [[pre:
is always a precondition: something that
evaluates to true
in correct programs. The modes will be visible only to the person assembling the program,
and in this case, having a lot of fine grained control is desired.
The point of this section is to illustrate that dropping features like violation handlers, continuation after a failed runtime check or throwing violation handlers from the MVP does not build any technical barriers that would prevent the addition of these features in the future revisions of the contract support framework. We also expect that once the syntax for declaring contract annotations has been standardized, compiler vendors will offer non-standard extensions that will allow the users to experiment with additional features and become a basis for the future standardization.
It is sometimes desirable to express in the postcondition how the state
of the program upon return from the function is different from the state
that the program had when the function was entered. One notable example
of this is when we want to check that after a successful
push_back
, the size of the sequence container upon return is
greater by one than the size of the container upon function entry.
However, by the time the precondition is evaluated, the previous state of
the container is no more. We would have to make a copy of the
portions of the program state that would be later inspected in the
postcondition.
There could be at least two ways to express this. One is to introduce even more names in the postcondition declaration, and use a syntax similar to the one used in lambda-captures:
int f(int& i, array<int, 8>& arr) [[post r, old_i = i: r >= old_i]] // value of i upon entry [[post r, old_7 = arr[7]: r >= old_7]]; // value of arr[7] upon entry
The other option is to use a new keyword (or token) for this purpose,
like oldof
proposed in
[N1962]:
int f(int& i, array<int, 8>& arr) [[post r: r >= oldof(i)]] // value of i upon entry [[post r: r >= oldof(arr[7])]]; // value of arr[7] upon entry
This future addition would also address the case where a postcondition
refers to the non-reference function parameter (see section
{rat.arg}), but the function cannot
afford to add const
:
template<class ForwardIt, class T> ForwardIt find(ForwardIt first, ForwardIt last, const T& value) [[post r: distance(oldof(first), r) >= 0u]] // `first` not referenced [[post r: distance(r, last) >= 0u]] { for (; first != last; ++first) { if (*first == value) { return first; } } return last; }
This proposal is open for either of these syntax variants (in the future).
It has been suggested that postconditions could have a better syntax for introducing the name of the returned value. Any such exploration of syntactic space should take into account that postconditions might need to introduce more names, as described in section {fut.old}.
int f(int& i, array<int, 8>& arr) [[post r, old_i = i: r >= old_i]] // value of i upon entry [[post r, old_7 = arr[7]: r >= old_7]]; // value of arr[7] upon entry
An alternate syntax for postconditions that does not preclude the option above in the future would be to use parentheses:
int f(int& i, array<int, 8>& arr) [[post(r): r >= 0]] // for the MVP [[post(r, old_i = i): r >= old_i]] // future [[post(r, old_7 = arr[7]): r >= old_7]]; // future
There has been a number of concerns raised against the usage of attribute-like syntax for contract annotations. This paper does not provide rationale for the choice of syntax. The syntax was inherited from P0542-contracts. The future revisions will either come with a different syntax, or a strong rationale for using the attribute-like notation.
The discussion about the choice of attribute-like syntax is provided in [P2487R0]. An alternative notation for contract annotations is explored in [P2461R1].
There is a number of ways to specify what visible changes to side effects in contract predicates a compiler is allowed to apply in Eval_and_abort mode. We are still investigating which of those ways to chose.
The notion of preconditions, postconditions and invariants originated from functional programming languages, where terms "value" and "predicate" make a lot of sense. In this paper we try to introduce preconditions and postconditions to an imperative language that doesn't have "values" or "predicates": it has "objects", "states" and "functions". We observed that we are are still applying terms "value" and "predicate" when describing the semantics of contract annotations. This is likely to lead to bugs in the design. The symptom of one such bug has recently been revealed and described in section {rat.arg}. This is a clear indication to the authors that the model for describing semantics of contract annotations needs to be revisited.
Generally, a precondition is the minimum set of requirements necessary for the invoked function to work within its contract. However, if contract support is added to C++ in the form of preconditions and postconditions, but with missing class invariants, a passionate programmer may implement the idea that "a class invariant is a precondition on every public member function of a class", by checking the invariant as a precondition in every pubic member function:
class Container { // ... public: bool invariant() const { return (size() == 0) == empty(); } int size() const [[pre: invariant()]]; bool empty() const [[pre: invariant()]]; };
One can easily see how this could lead into infinite recursion. The problem here is that the programmer tries to check anything that is potentially interesting as opposed to the minimum function requirements.
We can think of two ways to approach this. One is to educate the programmers that this is a dangerous thing to do, and trust them that they will avoid declarations like the above. The other is to disable the runtime checking of contract annotations when we are inside the evaluation of another contract annotation. The consequence of this technique would be that you cannot call functions in preconditions that themselves declare preconditions; and this would mean that all your preconditions must have a wide contract:
bool a(T v) [[pre: true]]; // wide contract bool b(T v) [[pre: a(v)]]; bool f(T v) [[pre: b(v)]]; // bad bool g(T v) [[pre: a(v) && b(v)]]; // good
This second approach has been implemented in Boost.Contract library ([CAMINITI]). The first approach has been implemented in D: https://d.godbolt.org/z/86zv45TnK.
Wording changes are against N4892.
The text in bluish font indicates comments that are intended to easily map the wording onto the overview part of the proposal, and to give a rationale for the wording choices. We use the following terms:
In 6.3/13 [basic.def.odr], add a new bullet after bullet 13.13:
if
D
invokes a function with a contract annotation, or is a function that contains an assertion or has a contract annotation (9.12.4.2), it is implementation-defined under which conditions all definitions ofD
shall be translated using the same translation mode (9.12.4.3); and
The above permits the implementations to allow the co-existence of two definitions of the same function compiled in two different translation modes. The above takes into account that postconditions, much like preconditions, can also be evaluated outside of their function.
Modify the beginning of 6.9.1/11 [intro.execution] as follows:
When invoking a function (whether or not the function is inline), every argument expression and the postfix expression designating the called function are sequenced before the function's precondition test sequence (9.12.4.3); the function's precondition test sequence is sequenced before every expression or statement in the body of the called function.
In 11.4.1/7 [class.mem.general], add a new bullet after bullet 7.4:
— contract annotation (9.12.4.2), or
In 7.6.1.3 [expr.call], add a new paragraph after paragraph 7:
The function's precondition test sequence (9.12.4.3) is sequenced after the initialization of function parameters. The function's postcondition test sequence (9.12.4.3) is sequenced before the destruction of function parameters.
In 7.7/5 [expr.const], add a new bullet after bullet 5.12:
— an unsuccessful correctness test (9.12.4.3), or
In 9.12.1/1 [dcl.attr.grammar], modify the production for attribute-specifier as follows:
…
attribute-specifier:
- [ [ attribute-using-prefixopt attribute-list ] ]
- correctness-specifier
- alignment-specifier
…
Contract annotations are not regular attributes, but we still define them in [dcl.attr]
because (1) they share some characteristics of attributes, and (2) we follow the precedent of
alignas
.
Add a new section 9.12.4 Correctness specifiers [dcl.correct] ([dcl.attr.deprecated] becomes 9.12.5).
Add a new section 9.12.4.1 Syntax [dcl.correct.syn]:
1. Correctness specifiers are used to specify preconditions, postconditions, and assertions for functions.
correctness-specifier:
- [ [ pre : conditional-expression ] ]
- [ [ post identifieropt : conditional-expression ] ]
- [ [ assert : conditional-expression ] ]
2. A correctness-specifier using pre is a precondition. The specifier may be applied to the function type of a function declaration. [Note: A precondition expresses a predicate, which can be evaluated upon entry into the function. If evaluated and the result is
false
, this indicates that the program contains a bug. — end note].
Term "bug" seems fine inside a non-normative note. This formulation in the note looks more unambiguous than an alternative that would say "function's expectation".
3. A correctness-specifier using post is a postcondition. The specifier may be applied to the function type of a function declaration. [Note: A postcondition expresses a predicate which can be evaluated when the function returns to its caller. If evaluated and the result is
false
, this indicates that the program contains a bug. — end note] A postcondition may introduce an identifier to represent the glvalue result or the prvalue result object of the function. When the declared return type of a non-templated function contains a placeholder type, the optional identifier shall be present only in a definition. [Example:int f(int& p) [[post: p >= 0]] // OK [[post r: r >= 0]]; // OK auto g(auto& p) [[post: p >= 0]] // OK [[post r: r >= 0]]; // OK auto h(int& p) [[post: p >= 0]] // OK [[post r: r >= 0]]; // error: cannot name the return value auto h(int& p) [[post: p >= 0]] // OK [[post r: r >= 0]] // OK { return p = 0; }— end example]
4. If a postcondition introduces an identifier and the return type of the function is
void
, the program is ill-formed.5. A correctness-specifier using assert is an assertion. The specifier may be applied to a null statement (8.3). An assertion correctness test (9.12.4.3) is performed as part of the evaluation of the null statement the assertion applies to. [Note: An assertion expresses a predicate. If evaluated and the result is
false
, this indicates that the program contains a bug. — end note]6. Preconditions, postconditions, and assertions are collectively called correctness annotations. The conditional-expression in a correctness annotation is contextually converted to
bool
(7.3); the converted expression is called the predicate of the correctness annotation. [Note: The predicate of a correctness annotation is potentially evaluated (6.3). — end note]
The requirement that the predicates are potenitally evaluated follows from [basic.def.odr]/p2, which says that any expression is potentially evaluated, unless it is explicitly listed as unevaluated.
Add a new section 9.12.4.2 Contract annotations [dcl.correct.anno]:
1. A contract annotation is a precondition or a postcondition. A contract annotation may be applied to the function type of a function declaration. The first declaration of a function D shall specify all contract annotations (if any) of the function. Other declarations of the function reachable (10.7) from D shall either specify no contract annotations or the same list of contract annotations; no diagnostic is required if corresponding conditions will always evaluate to the same value. If declarations of function F appear in two translation units, the first declarations of F in either translation unit their lists of contract annotations shall be the same; no diagnostic required; also no diagnostic is required if corresponding conditions will always evaluate to the same value. If a friend declaration D is the first declaration of the function in a translation unit and has a contract annotation, that declaration shall be a definition and there shall be no other declaration of the function or function template which is reachable from D or from which D is reachable.
The above means that if a friend function declaration introduces a new function and has contract annotations, it has to be a "hidden friend". This requirement was copied from [P0542R5].
2. Two lists of contract annotations are the same if they consist of the same contract annotations in the same order. Two contract annotations are the same if their predicates are the same. Two predicates contained in correctness-specifiers are the same if they would satisfy the one-definition rule (6.3 [basic.def.odr]) were they to appear in function definitions, except for renaming of parameters, return value identifiers (if any), and renaming of template parameters.
3. [Note: A function pointer cannot include contract conditions. [Example:
typedef int (*pfa)() [[post r: r != 0]]; // error: contract annotation not on a function declaration int g(int x) [[pre: x != 0]] [[post r: r != 0]]; int (*pf)(int) = g; // OK int x = pf(5); // contract annotations of g are tested— end example] — end note]
4. The predicate of a contract condition has the same semantic restrictions as if it appeared in the noexcept-specification of the function it applies to, except that the return type of the function is known in a contract condition appertaining to its definition, even if the return type contains a placeholder type. [Example:
class X { private: int m; public: void f() [[pre: m > 0]]; // OK friend void g(X x) [[pre: x.m > 0]]; // OK }; void h(X x) [[pre: x.m > 0]]; // error: m is a private member— end example]
5. A contract annotation is considered to be needed when:
- in an expression, the function is selected by overload resolution ([over.match], [over.over]);
- the function is odr-used (6.3) or, if it appears in an unevaluated operand, would be odr-used if the expression were potentially-evaluated; or
- the function is defined.
The conntract annotations of a specialization of a function template or member function of a class template are instantiated only when needed.
This implies that contract predicates are not in the immediate context in overload resolution.
6. A precondition test (9.12.4.3) is performed immediately before starting evaluation of the function body. [Note: The function body includes the function-try-block (Clause 14) and the ctor-initializer (11.9.3). — end note] A postcondition test is performed immediately before returning control to the caller of the function. [Note: The lifetime of local variables and temporaries has ended. Exiting via an exception or via
longjmp
(17.13.3) is not considered returning control to the caller of the function. — end note]7. The basic precondition test sequence of a given function
f
is the sequence of precondition tests corresponding to the list of preconditions off
. For two precondition annotations P1 and P2 applied to a function, if P1 appears lexically before P2 then the precondition test corresponding to P1 is sequenced before the precondition test corresponding to P2. If a function has multiple postconditions, their tests (if any) will be performed in the order they appear lexically. The basic postcondition test sequence of a given function is the sequence of postcondition tests corresponding to the list of postconditions of the function. For two postcondition annotations P1 and P2 applied to a function, if P1 appears lexically before P2 then the postcondition test corresponding to P1 is sequenced before the postcondition test corresponding to P2. [Note: A basic precondition test sequence can be empty if the function has not preconditions. — end note] [Example:void f(int* p, int*& q) [[post: q != nullptr]] // #4 [[pre: p != nullptr]] // #1 [[post: *q >= 0]] // #5 [[pre: *p >= 0]] // #2 { q = p; // #3 }The numbers indicate the order in which expressions in preconditions, postconditions and function body are evaluated. The basic precondition test sequence consists of two precondition tests corresponding to #1 and #2. The basic postcondition test sequence consists of two postconditions corresponding to #4 and #5. — end example]
8. If a predicate in the postcondition odr-uses (6.3) a non-reference parameter, this parameter shall be defined
const
inside the function definition. [Example:int f(int i) [[post r: r == i]]; int g(int i) [[post r: r == i]]; int f(const int i) // OK { return i; } int g(int i) // error: i is not declared const { return i; }— end example]
9. The postcondition test sequence (9.12.4.3) is sequenced after the initialization of the returned value, and after the destruction of objects with automatic storage duration declared in the function body. The postcondition test sequence is sequenced before the destruction of function arguments. [Note: Therefore, a postcondition can inspect the state of the return value and function arguments. — end note]
Add a new section 9.12.4.3 Testing contract annotations [dcl.correct.test]:
1. A translation may be performed in one of the following translation modes: ignore, or enforce. The mechanism for selecting the translation mode is implementation-defined. The translation of a program consisting of translation units where the translation mode is not the same in all translation units is conditionally-supported with implementation-defined semantics.
2. Recommended practice: If no translation mode is explicitly selected, enforce should be the default translation mode.
P0542-contracts have wording "There should be no programmatic way of setting, modifying, or querying the build level of a translation unit." We omit is as explained in section {rat.sec}.
3. An ignored correctness annotation test performs no operation. [Note: The predicate is potentially-evaluated (6.3). —end note]
4. An enforced correctness annotation test is performed as follows. The predicate is evaluated. If the evaluation exits via an exception,
std::terminate()
is called. If the evaluation exits via a call tolongjmp
(17.13.3) the behavior is undefined. An implementation is allowed to substitute the evaluation of a predicateP
that returns to the caller with an alternative predicate that returns the same value asP
but has no side effects. An enforced correctness annotation test where the evaluation ofP
returnsfalse
is called unsuccessful. For an unsuccessful correctness annotation test the contract violation handler is invoked. The contract violation handler does not exit via an exception or via a call tolongjmp
(17.13.3). The semantics of the contract violation handler are implementation defined.5. [Note: The contract violation handler does not have to be a function with any linkage. An implementation may provide a way to customize the behavior of the contract violation handler. — end note]
However, an implementation might want to implement it as a global pointer to function, lest when in the future the Standard starts to require an installable violation handler, this should cause an ABI breakage.
6. After the contract violation handler is executed, the program exits and an implementation-defined form of the status unsuccessful termination is returned. No destructors for objects of automatic, thread, or static storage duration are executed. Functions passed to
atexit()
(6.9.3.4) are not called.7. In translation mode enforce all correctness annotation tests are enforced. In translation mode ignore during constant evaluation (7.7) it is implementation-defined whether correctness annotation tests are ignored or enforced. In all other situations in ignore translation mode all correctness annotation tests are ignored.
8. If a correctness annotation test modifies the value of any object odr-used (6.3) in its predicate, the behavior is undefined.
9. Let
PRE
denote the basic precondition test sequence of a given functionf
. When performing enforced contract annotation tests, it is unspecified ifPRE
is performed one or two times. This is called a precondition test sequence. LetPOST
denote the basic postcondition test sequence of a given functionf
. When performing enforced contract annotation tests, it is unspecified ifPOST
is performed one or two times. This is called a postcondition test sequence. [Note: This allows the implementations to test the precondition and postcondition of a function both inside and outside of the function body. — end note]10. When the results of functions
fs
are used to initialize function arguments of a functiong
, the combined test sequence is a sequence of postcondition test sequences offs
followed by a precondition test sequence ofg
. [Example:int f1() [[post r: p1(r)]]; int f2() [[post r: p2(r)]]; void g(int i1, int i2) [[pre: p1(i1)]] [[pre: p2(i2)]]; g(f1(), f2());The combined test sequence of this function call graph is
p1(r)
,p2(r)
,p1(i1)
,p2(i2)
. — end example] If two expressionsE1
andE2
appearing in a combined test sequence are same (9.12.4.2) andE1
is sequenced beforeE2
the implementation is allowed to replace the evaluation ofE2
with the result ofE1
. If such substitution changes the result of the test containingE1
and te test is performed, the behavior is undefined. [Example:int f1() [[post r: p1(r)]]; int f2() [[post r: p2(r)]]; void g(int i1, int i2) [[pre: pg()]] [[pre: p1(i1) && p2(i2)]]; g(f1(), f2());The implementation is allowed to skip the evaluation of
p1(i1)
andp2(i2)
and instead use the values previously returned byp1(r)
andp2(r)
. — end example] [Example:int f1() [[post: q()]]; int f2() [[post: q()]]; void g(int i1, int i2); g(f1(), f2());The implementation is not allowed to substitute any evaluation of
q()
with the result of the other, because their corresponding tests are indeterminately sequenced. — end example]
In 11.7.3 [class.virtual] add a new paragraph after 11.7.3/17:
18. If an overriding function specifies contract annotations (9.12.4), it shall specify the same list of contract annotations as its overridden functions; no diagnostic is required if predicates of corresponding contract annotations always evaluate to the same value. Otherwise, it is considered to have the list of contract annotations from one of its overridden functions; the names in the contract annotations are bound, and the semantic constraints are checked, at the point where the contract annotations appear. Given a virtual function
f
with a contract annotation that odr-uses*this
(6.3), the class of whichf
is a direct member shall be an unambiguous and accessible base class of any class in whichf
is overridden. If a function overrides more than one function, all of the overridden functions shall have the same list of contract annotations (9.12.4); no diagnostic is required if predicates in corresponding annotations will always evaluate to the same value. [Example:struct A { virtual void g() [[pre: x == 0]]; int x = 42; }; int x = 42; struct B { virtual void g() [[pre: x == 0]]; } struct C : A, B { virtual void g(); // error: precondition annotations of overridden functions are not the same };— end example]
In 13.8.3.3/3 [temp.dep.expr] add a bullet after bullet 3.6:
— an identifier introduced in a postcondition ([dcl.correct]) to represent the result of a templated function whose declared return type contains a placeholder type,
In 13.9.4 [temp.expl.spec] add a note after 13.9.4/13:
14. [Note: For an explicit specialization of a function template, the contract annotations (9.12.4) of the explicit specialization are independent of those of the primary template.— end note]
In 13.10.3.1 [temp.deduct.general] add a note after 13.10.3.1/7:
14. [Note: The equivalent substitution in contract annotations (9.12.4.2) is done only when the contract annotation is instantiated, at which point a program is ill-formed if the substitution results in an invalid type or expression.— end note]
In 14.5.1/1 [except.terminate] add a new bullet after bullet 1.7:
— when a correctness annotation test (9.12.4) exits via an exception, or
Add a new entry to the [tab:cpp.predefined.ft] in table [cpp.predefined]:
Macro name Value __cpp_contracts
XXXXYYL
If [P0627R6] is accepted,
add the following to the specification of unreachable()
.
Remarks: The declaration of this funcion shall not contain contract annotations. [Note: This ensures that this function can be used inside the contract violation handler (9.12.4.2). — end note]
[P1680R0] by Andrew Sutton and Jeff Chapman describes the implementation in gcc of a variation of contract support. It is a superset of this proposal. Quoting from [P1680R0]:
The implementation is hosted online at
https://gitlab.com/lock3/gcc-new.
The implementation lives in the branch contracts
.
This compiler can also be used online through Compiler
Explorer (https://godbolt.org/) as x86-64 gcc (contracts).
[LOPEZ19] describes another implementation for Clang of P0542-contracts, which is a superset of this proposal. The implementation can be found as https://github.com/arcosuc3m/clang-contracts/.
Daveed Vandevoorde offered useful feedback on the syntax of contract annotations.
Joshua Berne reviewed this document, including the wording, and offered many useful suggestions.
Walter Brown reviewed the document, and offered many a correction, clarification and improvement.
Nathan Myers offered insights and guidance on the tricky corner cases of contract annotations.
Jens Maurer reviewed the proposed wording and offered significant improvements.
Ville Voutilainen offered numerous useful suggestions to improve the quality of the paper.
The following people also reviewed the proposal and offered suggestions, corrections and improvements: Aaron Ballman, Peter Brett, Ben Craig, Thomas Köppe, John McFarlane, Henry Miller.
We are aware of two unintuitive consequences of continuing violation handlers. First, in case the precondition is guarding against the abstract machine violation (hard UB), logging and then reaching the point of the abstract machine violation might result in what would be observed as removing the log entry for the contract violation. This has been described in detail in [P2339R0] as well as in [P1494R2]. This is no worse than disabling runtime checking altogether (which is uncontroversial), but can really fool whoever troubleshoots the bug: we see no contract-violation log entry, so we think control never reached this point, even though it did. Thus, the continuing violation handler has the potential to deceive whoever uses contract violation logs. There is no agreement on how realistic the possibility of encountering this effect is. The solution presented in [P1494R2] has the potential to address the above issue. But until this is explored, any wider contract proposal that allows continuation would be blocked on [P1494R2].
Second, the continuation mode can introduce new abstract machine violations. Going back to the example provided earlier:
int f(int * p) [[pre: p]] [[pre: *p > 0]] { if (!p) // safety double-check throw Bug{}; if (*p <= 0) // safety double-check throw Bug{}; return *p - 1; }
If this function is invoked in a program that doesn't runtime-check contract annotations,
it behaves in a tolerable way for p == nullptr
: it throws an exception.
But when runtime checking is enabled and the program is allowed to
continue after the failed check, this code is equivalent to:
int f(int * p) { if (!p) // (1) log_violation(); if (*p <= 0) // (2) log_violation(); if (!p) // safety double-check throw Bug{}; if (*p <= 0) // safety double-check throw Bug{}; return *p - 1; }
If p
happens to be null, check (1) will fail and the
violation will get logged. The program will proceed
to check (2) and there, it will dereference the null pointer causing
an Abstract Machine Violation (hard UB).
The key observation here is that the defensive checks that throw
exceptions (or return) have the "short circuiting" property: if one
fails,
the subsequent ones are not executed:
int f(int * p) { if (!p) throw Bug{}; if (*p <= 0) // null `p` never dereferenced throw Bug{}; return *p - 1; }
Short-circuiting is also guaranteed for subsequent precondition annotations, provided that the contract violation
logging ends in calling std::abort()
. But short-circuiting is gone,
when the handler allows the program flow to continue.
Splitting a precondition into smaller chunks is used for at least two purposes:
Until this problem is addressed, the continuation after a runtime-evaluated check is a potentially dangerous feature that can introduce an Abstract Machine Violation (hard UB) on top of a BizBug (a programmer bug). While this problem may be solvable, the analysis and solution will take time, which will delay the adoption of the minimum contract support if it allows the continuation.