Value constraints

ISO/IEC JTC1 SC22 WG21 N4160 2014-10-03

Andrzej Krzemieński, akrzemi1@gmail.com

Project: Programming Language C++, EWG

Introduction

In this paper we want to analyse how support for contract programming-like features could be added to C++. We use a different term, "value constraints", to stress that we do not necessarily want to copy the solutions that other imperative programming languages adopt. We want to change the focus from how broken contracts are responded to at run-time, to how contract violations can be detected at compile-time. This is a high-level overview of the problem; we do not even cover things like sub-contracting or other interactions with OO features.

Comparison with other contract-programming proposals

While [N1962] is a fairly complete proposal for adding contract-programming support to C++, this document offers an analysis of the problem domain rather than making a specific proposal. We focus on identifying expectations, potential implementation difficulties and costs.

All other contract-programming proposals that we are aware of — [N4075], [N4110] — start from the assumption that support for preconditions must be provided in the form of evaluating preconditions before function calls, occasionally disabling these precondition evaluations, and installing broken-contract handlers. In this paper, we do not take this assumption for granted. Run-time support is only a subset of the scope of our analysis. We explore in more detail an alternative approach: a focus on static analysis (also indicated in [N1962]).

What do we need?

The 'features' we describe here can be collectively called value constraints. We want to use them to express in the language that certain combinations of values of certain objects at certain times are invalid and are intended (and expected) never to occur.

Specify function's domain

The problem goes back to mathematics. Certain functions are well defined only for a subset of the values of the input type. For instance, a square root returning real numbers is not defined for negative numbers. What does it mean for a function in a programming language like C++ that it is not defined? Currently there are two approaches. One is to detect the values of the parameter(s) that the function is not prepared for and execute different logic: returning a special value, throwing an exception, etc. Using our example with a square root, a corresponding function sqrt() could be defined as follows:

double sqrt(double x)
{
  if (x >= 0.0) {
    // do proper algorithm
  }
  else {
    return numeric_limits<double>::signaling_NaN();
    // or throw an exception
  }
}

What this effectively does is change the function's domain. The function does not restrict its domain anymore: it is well defined for every value of the input type, and does other things than only the "proper algorithm". This has an unintended consequence: our function can be used for detecting negative numbers:

bool is_negative(double x)
{
  return isnan(sqrt(x));
}

Another way of approaching the function-domain problem is to informally 'announce' that the function is not defined for certain values of the input types, and to implement it under the assumption that the undesired values are never passed to the function:

double sqrt(double x)
// requires: x >= 0.0
{
  // proper algorithm:
  double y = 1.0;
  double prev_y;
 
  do {
    prev_y = y;
    y = (y + x / y) * 0.5;
  }
  while (!closeEnough(y, prev_y));
 
  return y;
}

This uses Newton's iteration algorithm. The loop terminates when values y and prev_y become close enough. They will only do that when x is non-negative: supplying a negative value will make the loop run forever and hang the program. Another adverse effect of supplying an argument the function is not prepared for is undefined behaviour (e.g., when providing a too-large index to vector::operator[]). Yet another is the risk of the function returning a result that has no meaningful interpretation.

Note that in the example above even a clever analyser is unlikely to detect from the function body that passing a negative input to this function is wrong. Thus we cannot say that checking such a condition is merely a Quality-of-Implementation issue. But everything is fine as long as no negative number is passed in. 'Announcing' that the function is not prepared for just any input is informal: it is written in a comment, or in paper or HTML documentation, or only spread by word, or assumed to be obvious. This risks an unintended negative program behaviour, like a halt, or undefined behaviour. However, it is often preferred because of performance and the inability to come up with a reasonable fall-back action. Consider std::vector::operator[]: it does not check the bounds for performance reasons. Suppose it is used in the following code:

for (std::size_t i = 0; i != vec.size(); ++i) {
  if (i > 0) cout << ", ";
  cout << vec[i];
}

The loop condition already checks that the index is within bounds. If operator[] were also checking the same condition inside, we would be wasting time on redundant checks. Also, for some functions it is not possible to think of any fall-back action. For instance, std::vector::swap must not throw exceptions and returns void; yet it is not well defined when the two vectors' allocators do not compare equal.
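For contrast, the Standard Library already offers a checked access function, std::vector::at, which throws on a broken domain. The following sketch (ours, not from the Standard) illustrates the redundant-check cost discussed above:

#include <iostream>
#include <vector>

void print(const std::vector<int>& vec)
{
  for (std::size_t i = 0; i != vec.size(); ++i) {
    if (i > 0) std::cout << ", ";
    std::cout << vec.at(i); // at() re-validates a bound that the loop
                            // condition has already established
  }
}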

Not checking whether the argument is within the domain, and delegating the check to the callers, is desirable, especially since the caller is often in a better position to verify that the arguments are within the function's domain. Consider:

return sqrt(abs(x));

The caller can be sure that no negative value is passed to function sqrt() even though no check is performed at all. A function domain defined this way is a contract between the function author and the function callers. The author defines the domain and assumes he will not get values from outside the domain. The caller guarantees that she will not pass values outside the domain. Out of all the contracts described here, this one is the most likely to be broken, because it is the only one where the person responsible for the guarantee is different from the person declaring the contract. The user may not be aware that a function has requirements on the values of parameters, or may misunderstand what the requirement is. Therefore this contract requires the most serious attention and support in the language.

What we need is the ability to declare the function's domain along with the function, so that it is visible to the callers and to automated tools. How the tools handle this is a secondary issue, but possible uses include: (1) automated documentation generation, (2) additional warnings from static analysers, (3) additional instrumentation injected by compilers, (4) compiler optimizations based on domain declarations, (5) the Standard adopting this notation to specify its preconditions more formally.
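As a sketch of what such a declaration might look like — the attribute name and syntax below are invented for illustration only, not proposed:

// Hypothetical notation: the domain is attached to the declaration,
// where callers and automated tools can see it.
double sqrt(double x) [[contract::requires(x >= 0.0)]];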

Specify function's range

Here we use the term 'range' in the mathematical sense: the set of all possible values that the function can return (a subset of the function's co-domain). In the earlier example with function abs(), a hypothetical tool could detect that expression sqrt(abs(x)) is always fine only if it knew that the range of abs is equal to (or smaller than) the domain of sqrt. We need to be able to constrain the allowed function output in order for automated tools to be able to verify that other functions' domain requirements are satisfied.
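For illustration, the matching could look as follows (comment-based notation, assumed rather than proposed):

double abs(double x);
// ensures: result >= 0.0     <- declared range of abs

double sqrt(double x);
// requires: x >= 0.0         <- declared domain of sqrt

double f(double x)
{
  return sqrt(abs(x)); // a tool can prove the call safe by matching
                       // abs's declared range against sqrt's domain
}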

In this case the function author specifies the contract and (most likely) the same author guarantees it. The function's user can rely on the guarantee. It is much less likely that this contract is broken, because it is the same person that declares and later ensures the obligation. Conversely, if the author makes an error in the function, he is equally likely to make an error in specifying the contract.

Block-level assertions

This type of value constraint is quite familiar to many C++ users, as it is implemented with macro assert and similar tools in many libraries. If we think of it as a contract, this time the function author defines it, he ensures that it holds, and he is the beneficiary of the contract: he guarantees something to himself, namely that a certain state of a block-local variable or set of variables shall never occur at the point of the assertion. The function author both guarantees something that he can be held accountable for, and relies on the guarantee expressed with the assertion.
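A minimal example of such a contract with today's macro assert (the function itself is hypothetical):

#include <cassert>

int clamp_index(int i, int size) // assumes size > 0
{
  if (i < 0)     i = 0;
  if (i >= size) i = size - 1;
  assert(i >= 0 && i < size); // the author guarantees this state to
                              // himself, and relies on it below
  return i;
}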

Class-level assertions

A similar constraint on variables is often required for (non-static) data members of objects of a given class. Consider the following example:

class WeighedAverage
{
  double wgt1_ = 0.5;
  double wgt2_ = 0.5;
public:
  double average(double val1, double val2) const
  // ensures: answer is between val1 and val2
  { return wgt1_ * val1 + wgt2_ * val2; }

  void set_1st_weight(double w1)
  // requires: w1 between 0.0 and 1.0
  { wgt1_ = w1; wgt2_ = 1.0 - w1; }

  // always true: wgt1_ between 0.0 and 1.0
  // always true: wgt2_ between 0.0 and 1.0
  // always true: wgt1_ + wgt2_ == 1.0
};

Note the three comments at the bottom. They are something that could be expressed with macro assert, except that assert is an expression and cannot be put at class scope. Note also that members wgt1_ and wgt2_ are private, so we are not declaring anything to the outside world. The word "always" needs to be made more formal. It applies to any object of this class whose lifetime has begun and has not yet ended, except that member functions can temporarily compromise the condition, provided that they restore it upon exit (either via an exception or a normal return).

Class-level assertions (or class invariants) could be used as a criterion for good abstractions, much like axioms in concepts. If you cannot specify an invariant, it might be an indication that the design/abstraction is poor, or that this should not have been made a class.

Specify equivalent expressions

Often library interfaces offer more than one way of doing the same thing. For instance, whether a container contains at least one element can be checked in two ways: !cont.empty() and cont.size() != 0. The latter is more general, but there are occasions where the former can execute faster. This justifies the existence of the two forms. However, the job of matching one function's domain with another function's co-domain would be severely impeded if each specified its contract with a different (albeit equivalent) expression. Consider:

template <typename T>
class vector
{
public:
  void resize (size_t s); // ensures: this->size() == s
  const T& front() const; // requires: !this->empty()
};

vec.resize(1);
return vec.front();       // safe?

Can we match what resize guarantees with what front expects? The two expressions are different. It would be much easier if we were able to explicitly declare that, for any std::vector, the expressions vec.empty() and vec.size() == 0 are equivalent. This is somewhat similar to class-level assertions, but the declaration applies to more than one class, we are now only interested in the public interfaces, and we can express equivalence between expressions that have side effects. For instance, we can specify that vec.clear() is equivalent to vec.resize(0). Out of all the "constraints" described in this section, this is the only one for which the term "value constraint" is inadequate: it is a constraint on the semantics of expressions. But we need it for the automated tools to make effective use of the other value constraints.
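Using the axiom notation of [N2887] (purely illustrative here; the declarations are not executed, they only license tools to rewrite one form into the other when matching contracts):

template <typename T>
axiom vector_equivalences(std::vector<T>& vec)
{
  vec.empty() <=> (vec.size() == 0);  // equivalent observers
  vec.clear() <=> vec.resize(0);      // equivalent mutators
}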

Loop variants

Note that this is not the same as a loop invariant (which can be handled by a block-level assertion). It is somewhat similar to an assertion. It can help convince oneself that a non-trivial loop will ultimately terminate. Sometimes this is not easily seen, because there is no loop counter: we may, for instance, inspect elements in a collection by skipping some elements, or by growing or shrinking the collection. A loop variant is an expression that evaluates to a non-negative integral number. It is expected that each loop iteration evaluates this expression to a number smaller than in the previous iteration, and that when the value reaches zero, the loop terminates.
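A sketch: a loop with no explicit counter, where a hypothetical "variant" annotation names the decreasing quantity. Each iteration removes one or two elements, so the variant l.size() strictly decreases and the loop must terminate:

#include <list>

void remove_negative_followers(std::list<int>& l)
{
  while (!l.empty())
  // variant: l.size()   (hypothetical annotation)
  {
    l.pop_front();
    if (!l.empty() && l.front() < 0)
      l.pop_front();
  }
}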

A framework for supporting value constraints need not address all of the above needs. We believe that supporting only the declaration of function domains would be a big help in itself. Note that while assertions and loop variants need to work with expressions, the features for specifying function domain and co-domain need not necessarily use expressions. We address this in detail in later sections.

Relaxations of implied function domain

Consider std::set_intersection: it takes two "input" ranges and one "output" range as its input. The Standard requires that neither of the "input" ranges overlap with the "output" range. However, it is possible to provide an implementation that works even if such an overlap exists. Suppose one vendor wants to offer an implementation that allows overlapping input, and wants to commit to allowing it in future releases of the library. He "relaxes" the requirements that every user may (and should) suspect are there (even if not mentioned explicitly). This is a useful annotation that can help the user decide to use the algorithm (if the user doesn't need cross-platform code).

What use can be made of value constraints?

Value constraints exist even though there is no support for them in the language. We have seen in the above examples that comments were used to convey the information. In this section we list the benefits obtained from standardizing or formalizing value constraints one way or another.

In order for an automated tool to understand the constraint, we have to provide a special-purpose syntax. Some tools expect that it is put inside source code comments. For instance [ACSL] uses the following notation:

/*@ requires x >= 0;
    ensures \result >= 0; 
*/
double sqrt(double x);

But for structured comments like the above, C++ has an alternative: [[attributes]]. Attributes are akin to comments because they are intended not to affect the semantics of correct programs. On the other hand, they can be used as hints for optimizations, warnings and any kind of program analysis. If value constraints are expected to affect the program behaviour, attributes will not do, and a special language feature would need to be devised.

An ideal — and unrealistic — support for value constraints would be for the compiler to check every possible flow in the program and signal a compile-time error whenever a breach of a constraint is going to happen. Because this is not doable, we aim at support that is not entirely satisfactory, but believed to be better than none.

Improved documentation

One of the obvious uses of formalized value constraints is automated generation of documentation. Even if developers do not use any tool for generating documentation, there is still a gain: when a developer sees the declaration of the function she is going to use, she can immediately see the value constraints as well. If they are a language feature, developers are more encouraged to use them (even though comments would do). Since this does not affect program semantics, [[attributes]] are sufficient.

Static analysis

Value constraints could enable static analysers to detect potential breaches of value constraints and issue warnings. Again, using value constraints this way does not affect program semantics, so [[attributes]] would do.

Constraint-based compiler optimizations

In this case the compiler is allowed to arbitrarily change the behaviour of the program wherever a value constraint has been violated, so adding a value constraint may change the meaning of the program. Here attributes will not suffice, and we need a language extension for specifying value constraints.

Auto-generation of runtime checks

This is what Eiffel provides and what [N1962] proposes. Value constraints need to be C++ expressions. An implementation can (optionally) inject, at well-defined places, code that evaluates at run-time the expressions representing value constraints. If any (boolean) expression evaluates to false, a certain action (like calling std::terminate) is performed. The insertion of additional logic requires that value constraints be introduced as a language feature. In fact, even more language and library features are required to control when and how the run-time checks are performed and responded to.

Run-time evaluation of value constraints

In this section we focus on one possible approach: treating value constraints as expressions and evaluating them at run-time.

Side effects of the expressions

Using expressions with side effects in value constraints is generally discouraged, but sometimes it might be difficult to spot that we have a side effect. For the purpose of this discussion we also consider run-time overhead, and especially run-time complexity, a side effect. To minimize the possibility of invoking a function with a side effect, [N1962] proposes that only const-qualified member functions be allowed; but even these can modify non-member variables, and it is not only member functions that may need to be used to express value constraints. Ideally, we would like to use only pure (referentially transparent) expressions, but C++ as of today does not offer the possibility of detecting whether an expression is pure, although the definition of relaxed constexpr functions makes a step towards pure functions.

A practical question to be answered, given the syntax from [N1962], is whether the following is legal code for detecting if run-time value constraint checks are enabled:

bool preconditions_on = false;
void test() precondition{ preconditions_on = true; }
{
  cout << "preconditions are on: " << preconditions_on;
}

Note that we assign a new value to a global object in the precondition. One possible way to address this issue is to say that any modification of the program state in value constraints renders the behaviour undefined. Another is to say that this is a correct program, but the expression may or may not be evaluated; this makes it unspecified behaviour.

And similarly, is the following a reliable way of checking if a number is negative?

double sqrt(double x) precondition { x >= 0.0; };

bool is_negative(double x)
{
  set_precondition_broken_handler(&throw_exception);
  try { sqrt(x); }
  catch (exception const&) { return true; }
  return false;
}

We probably do not want to encourage such contrived usages, but can this be expressed in the standard? If we say that it is unspecified whether value constraints are evaluated or not, we would make it clear that the code as above is not reliable. But then again, someone else may be disappointed that his precondition is not guaranteed to be checked.

If we consider value constraints with side effects 'invalid' code, do we force the compiler to reject every such case with a diagnostic message? If that is impossible in the general case (and so it seems), we have to accept that such code is valid; but do we allow the compiler to reject those cases where it can prove that there is a side effect?

Value constraints in overloaded functions

Consider the following declaration:

template <typename InputIter>
void displayFirstSecondNext(InputIter beg, InputIter end);
// requires: std::distance(beg, end) >= 2

Is function template std::distance referentially transparent? The answer is: it is not templates that can or cannot be pure, but functions. Some functions instantiated from this template will be pure, others will not — it depends on what iterator type the template is instantiated with. Consider an InputIterator: in the worst case (std::istream_iterator), incrementing the iterator invalidates other iterators referring to the same stream. This is tricky behaviour: by changing our internal copies of objects (the iterators), we alter (invalidate) other, external objects. Function std::distance does increment iterators. If our precondition were to be evaluated, this might cause undefined behaviour in the program.

Thus we have an expression that is pure and natural to use for some instantiations, and has severe side effects for other instantiations. We would like the predicate to be evaluated for forward iterators and never evaluated for other iterators. How do we want a contract-programming framework to address cases like this one?

One option is to say "this case is special: you will not be able to express a precondition". Another is to require the authors to provide two overloads: one for InputIterators, the other for ForwardIterators. They would do exactly the same thing, except that one would have a precondition and the other would not. Note that you cannot just say "if the predicate is constant, evaluate it; otherwise don't": in either case we take iterators by value, it is just that input iterators do not expose value semantics — and this cannot be checked by the compiler.

The above case can be solved by introducing yet another feature: an annotation saying that a precondition is only evaluated when some compile-time condition is met — in our case, only if the iterator is a ForwardIterator. Note that one global switch (similar to NDEBUG) saying "disable/enable all value constraint evaluations" will not be enough.
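With an invented syntax (mirroring the requires-clauses used later in this paper), such an annotation might read:

// Invented syntax: the precondition is declared unconditionally but
// evaluated only when the compile-time condition holds.
template <typename InputIter>
void displayFirstSecondNext(InputIter beg, InputIter end)
precondition{ std::distance(beg, end) >= 2; }
  requires ForwardIterator<InputIter>; // evaluate only for forward iterators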

Inadvertent increase in run-time complexity

There are other reasons for having more fine-grained control over which checks are enabled. Consider a binary search algorithm. Its run-time complexity is O(log n). It has an obvious precondition that the range we search through is sorted. Performing the sorted check has complexity O(n). Thus, by evaluating the precondition, we change the algorithm's complexity. This might be unacceptable, even for some debug/testing builds. Run-time complexity is a sort of side effect of an algorithm, and it can be silently added to a function, especially when, following good practice, a developer adds an obvious precondition (the primary task) and forgets the secondary task of specifying (using some contract-specific sub-language) the conditions under which this precondition shall be evaluated.

Run-time response to a broken contract

[N1962] as well as [N4075] propose that the action taken upon violating a value constraint be configurable by the programmer. While we agree this should be configurable, we object to achieving it with a std::set_terminate-like registration function. The language allows std::set_terminate to be called multiple times from multiple places: different, possibly dynamically-loaded/shared libraries. This makes sense for std::terminate: every part of the system may acquire a critical resource, one that has to be released even if "exception handling must be abandoned for less subtle error handling techniques", and in that case the component also needs to register an additional clean-up function. In the case of a broken contract, we do not need or want that flexibility. We believe that only the person who assembles the final program from the components and libraries should be empowered to decide how broken contracts are handled. Only he has the knowledge whether this is a test or a retail build, and whether the slow-down of some run-time checks can be afforded. Libraries do not know that: they do not know in what environment they will be used.

Additionally, the std::set_terminate mechanism has a certain flaw: unless you apply some syntactic contortions, you cannot set the handler for functions executed before main().

We are not aware of any mechanism currently available in C++ that would satisfy our needs. But since function main has the property of being required to be defined only once across the executable program, a solution we are leaning towards is to apply some special annotation around function main that would indicate the user's preference on the handling of broken contracts.
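For illustration only, such an annotation might look like this (invented syntax):

// Invented syntax: the broken-contract policy is chosen exactly once,
// by whoever assembles the final program.
int main() broken_contract_handler{ std::terminate(); }
{
  // ...
}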

What can a broken contract handler do?

The most reasonable default answer appears to be std::terminate, which means "release critical resources and abort". One may wish to override the default in order to do special logging. We believe that it is not a good idea to throw from a broken-contract handler. First, throwing an exception is often an action taken inside a function, in a situation where it cannot satisfy its postcondition, to make sure that class invariants are preserved. In other words, if you catch an exception or are in the middle of stack unwinding, you can safely assume that all objects' invariants are satisfied (and you can safely call destructors that may rely on invariants). If it were possible to throw from the handlers, this expectation would be violated. Also, you can imagine functions that are marked as noexcept and still impose a value constraint (which happens to fire). [N4110] argues that "the exception can be safely thrown in this case (even if the function is noexcept) as it would be thrown in the caller side and not in the callee side". We disagree with this reasoning. Consider this example:

bool negative_predicate(int i) noexcept /* requires: i < 0 */;
bool positive_predicate(int i) noexcept /* requires: i > 0 */;

bool predicate(int i) noexcept  /* requires: true */
{
  if (i < 0)
    return negative_predicate(i);
  else
    return positive_predicate(i);
}
Now imagine that we make a call predicate(0). Due to a bug in the function, we call positive_predicate with a broken precondition. It doesn't help that the throw from the handler is made from the context outside of function positive_predicate, because it is still made inside the caller function predicate, which is also noexcept.

The recommendations from [N3248] (that functions with narrow contracts should not be declared noexcept) may also apply in the development of value constraints. We consider this an open issue.

Disabling run-time checks

Unlike the static analysis approach, evaluating value constraints adds run-time overhead. While the overhead is acceptable in some cases, it is desirable to be able to globally disable the checks in others. This issue is similar to C-style asserts, which can be retained in debug/test builds but removed in release builds. C-style asserts are disabled by recompiling the entire program (including all 3rd-party libraries) or by linking with alternative pre-compiled versions. Can this be made better for language-level value constraints? [N1800] suggests an implementation (for preconditions) that could handle such a requirement: each function has two entry points, one that checks the precondition and then executes the function, and another that skips the evaluation of the precondition. Once you have the two entry points, it is just a question of changing addresses. This solution results in bigger libraries, although the final executables could be optimized.
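The idea can be emulated in today's C++ by splitting a function into a checking wrapper and an unchecked worker (a sketch; the names are ours):

#include <exception>

double sqrt_unchecked(double x);   // the entry point that skips the check

double sqrt_checked(double x)      // the entry point that validates first
{
  if (!(x >= 0.0))
    std::terminate();              // or any other configured response
  return sqrt_unchecked(x);
}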

Assisting static analysis

Automated tools already perform static analysis, even without any support from the programmer. However, additional hints from developers could enable even deeper analysis, or enable vendors to add analysis to compilers at lower expense. In an example like the following, a compiler can fairly easily detect that a potential undefined behaviour is at hand:

bool cond;
int  global;

int* produce()
{
  if (cond) return &global;
  else      return nullptr;
}

void consume(int* p)
{
  use(*p);
}

consume(produce());

But even here, certain problems occur. What if the two functions are compiled separately, e.g., if produce is part of a separate third-party library? Even if the compiler performs a per-file analysis, it may not have sufficient resources (RAM) to perform a global link-time analysis. And even if the analysis is successful and detects a potential undefined behaviour, it does not tell us which of the three — the author of produce, the author of consume, or the person who assembles them — is responsible for the situation and should fix the problem. This is somewhat similar to the situation we face today with template instantiation error messages: the compiler sees that there is a syntax/type error, it can give you the entire context, but it cannot tell at which level of the instantiation process the source of the problem lies.

Also, as shown in the example with sqrt, it is sometimes not possible to tell, from only observing the reads from a variable, what the function's requirement is. In the case of sqrt it is not undefined behaviour that we want to avoid but an infinite loop.

Another case where assisting static analysis may prove useful is when there is more than one way to express a given condition. Consider the following code:

template <typename T>
class Vector
{
public:
  const T& front() const; // requires: !this->empty()
  // ...
};

void test(Vector<int> & vec)
{
  if (vec.size() != 0)
    use(vec.front());
}

You know that vec.size() != 0 and !vec.empty() mean the same thing, but the compiler may not, especially if it cannot look inside the definitions of the member functions (because they are compiled separately).

However, an analysis like this can produce too many false warnings. This calls for another [[attribute]] for indicating that the programmer does not want certain parts of the code to be warned about potential value constraint violations.
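For instance (the attribute name and its placement are invented):

// Hypothetical attribute: the programmer vouches that front()'s
// precondition holds here, so the analyser should not warn.
void test(Vector<int> & vec)
{
  if (vec.size() != 0)
    [[contract::trusted]] use(vec.front());
}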

How long does a condition hold?

We illustrate the issue in question with the following two examples.

void test(Iter b, Iter e, Value_type<Iter> v)
{
  std::sort(b, e);
  fun();
  std::binary_search(b, e, v); // requires: is_sorted(b, e)
}

Assuming that a static analyser can somehow figure out that sort leaves the elements in a state that satisfies the condition is_sorted, can it be sure that the condition still holds when fun has finished? After all, fun could be manipulating iterators that alias b and e. Apparently, static analysis cannot give us a 100% bug-free guarantee; it only helps detect certain bugs. This is where run-time checks can still prove useful, even after heavy static analysis.

Second, it is obvious that a given condition does not last forever. How is the analyser supposed to know when the condition ceases to hold? One easy answer would be to say that when a non-const operation is called on an object, all conditions it might have satisfied are considered not satisfied anymore. But this would break even simple use cases:

void test(std::vector<int> & vec)
{
  auto b = vec.begin(); // non-const: invalidates all conditions
  auto e = vec.end();   // non-const: invalidates all conditions
  std::sort(b, e);      // requires: valid_range(b, e)
}

Even if the analyser knows that vec.begin() and vec.end() form a valid range, they are non-const operations and invalidate any assumptions about the vector's value/state. Any language feature for assisting static analysis must provide a way of specifying which operations invalidate which conditions (e.g., a way of declaring some functions as non-modifying even when they are non-const). But this is likely to make the specifications very complicated, and discourage everybody from using the feature.

"Properties": a hypothetical language feature

In this section we describe a language feature that would enable defining value constraints in a somewhat different way than the "imperative" approach described in [N1962]. We call it properties; they have been described in [N3351]. A property is a new kind of "entity" in the language: it is neither an object nor a function, it is just something else. It is introduced by an invented keyword "property":

template <typename T>
property bool is_non_empty (std::vector<T>);

A property is something that an object, or a group of objects, can acquire at a certain point in time, and then lose at a different point in time. There is no function body associated with the property: it only has a name. A property can be used in specifying pre- and postconditions:

template <typename T>
class vector
{
  void push_back (T const&); // post: is_non_empty(*this);
  const T& back() const; // pre: is_non_empty(*this);
};

vector<int> v;    // doesn't have property is_non_empty yet
v.push_back(1);   // acquires property is_non_empty
int i = v.back(); // required property is present
v.pop_back();     // loses property is_non_empty

How do we know when a property is lost? In the simplest case we could say that every mutating (non-const) function called on our object discards all its properties. In our example, we know that the property is_non_empty is not always lost after a call to pop_back, but in order to keep the specification language fairly simple we may need to accept this simplification. However, the rule "discard every property on non-const function call" would imply that in our case even calling back() discards the property, so a more complex way of specifying how properties are preserved may be required.

So, what do we buy by using properties rather than treating any boolean expression as such a property? First, properties are pure: they have no side effects. Second, we can express value constraints that are not expressible with expressions:

template <typename Iter>
property bool valid_range(Iter begin, Iter end);

template <typename T>
class vector
{
  // invariant: valid_range(this->begin(), this->end());
};

template <typename Iter>
void sort(Iter begin, Iter end); // pre: valid_range(begin, end);

In this model of static analysis, one function's preconditions are matched with another function's postconditions. But because properties have no body, it is not possible for the static analyser to determine whether a function that declares a property in its postcondition really satisfies it. Properties can be mixed with normal boolean predicates, though. The author decides whether his value constraint is implementable or not.

We do not propose to add such properties to the language; we just want to show a different perspective on value constraints. [N1962] also offers this perspective, but we fear that the focus on broken-contract handlers and the order of evaluation of different value constraints might have obfuscated the idea. A similar effect to having properties could be achieved by annotating some value constraints (which use normal expressions) to say that they must never be evaluated.

Interactions with other C++ features

When should precondition hold?

A common answer to this question is "after function parameters have been initialized, and before the function's first instruction is executed." While this looks good in typical simple examples, consider the following case:

void fun(Tool * const tool, int const i)
// requires: tool != nullptr
// requires: tool->value() >= 0;
{
  line_1: int j = transform(i); 
  line_2: int k = process(tool->value());
}

The precondition constrains the state of a remote object of type Tool, whose state can change asynchronously. Suppose we evaluate the precondition upon entering the function and determine that it is satisfied. We successfully execute instruction line_1 and proceed to instruction line_2. It is only now that we really need the precondition to hold, but by now the state of the remote object referred to by pointer tool might have already changed (perhaps function transform modified it). While the precondition held upon entry, the same condition caused undefined behaviour later on. Even if one argues that the function must not break the precondition itself, one should still consider the possibility that while executing line_1 another thread concurrently updates the object referred to by tool. So, should the precondition hold for the entire duration of the function? Or is it a bad idea to express a constraint on remote objects that we do not obtain by value? We do not have a ready answer to this question, but the question is very important, because this kind of interface is used in STL algorithms: they obtain and return iterators, which are handles to remote parts, likely aliased and accessible by other threads.

On the other hand, the following scenario appears to be all fine:

double fun(double x)
// requires: x >= 0.0
{
    double v = sqrt(x);
    if (v > MAX)
      x = -x;
    return do_something_with(v, x);
}

In the conditional assignment we set the value of x to something that would have broken the precondition on entry; but now that we are reusing the variable, we no longer care about the constraint.

constexpr functions and literal types

constexpr functions can be evaluated at run-time or at compile-time. Ideally, a compile-time evaluation of such a function should result in a compile-time error when its precondition or postcondition is broken. This is possible if the expressions in preconditions and postconditions are core constant expressions. It appears reasonable to require that constexpr functions have only constexpr value constraints.
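A technique available already today exploits the fact that a throw-expression is not a core constant expression, so a broken precondition turns a compile-time evaluation into a compile-time error (the function below is ours, for illustration):

#include <stdexcept>

constexpr double constrained(double x) // requires: x >= 0.0
{
  return x >= 0.0 ? x : throw std::domain_error("negative input");
}

constexpr double ok  = constrained(4.0);     // fine: precondition holds
// constexpr double bad = constrained(-1.0); // error: not a constant expression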

A similar question needs to be answered for literal types and invariants. Literal types are required to have at least one non-copy/move constexpr constructor. Such a constructor can be used to create a compile-time constant, whose invariant is expected to hold. This may require that the invariant also be constexpr, as otherwise it is impossible to verify it at compile time. On the other hand, the interface of a user-defined literal type can be divided into a compile-time (constexpr) part and a run-time part; the effects of the latter may alter parts of the program state that can only be invariant-verified with run-time expressions. Moreover, some constructors are constexpr even for non-literal classes: their purpose is to guarantee static initialization of global objects. It is not clear whether this should affect the constexpr requirement on the class invariant.

Function's public interface

For the purpose of specifying the semantics of value constraints, we may need to redefine the concept of a class's public interface. Consider the following definition:

class Account
{
public:
  void modify(Param p);
  friend void swap(Account& a1, Account& a2);
private:
  // ...
};

Function swap, although it is not a public member function, is clearly part of Account's public interface; i.e., we allow it to temporarily break the invariant of objects a1 and a2. The technique of defining operator-like free functions as friends is not just a trick: it is sometimes recommended for operators like operator== when the two arguments are to be treated symmetrically.

At this point we are not sure whether it is sufficient to define a class's public interface as the set of its public member functions and its friend functions. An additional way of specifying what constitutes a class's public interface may be necessary. This problem is not limited to value constraints: such a specification could also be useful for enabling a more fine-grained function lookup rule. Current ADL is known to pick too many overloads.

When should a class invariant hold?

When is a class invariant supposed to hold? Objects are manipulated via the class's public interface. Functions from the class's public interface (public member functions or friend functions) often need to compromise the invariant, but they always put the invariant back when they finish — even when exiting via an exception. This means that as long as no function from the object's public interface is being invoked on it, and none of its subobjects is being manipulated, the object's invariant should hold. Consider:

Tool t;
_1:;
fun1();
_2:;
fun2();
_3:;
t.fun();
_4:;

In the run-time checking approach, it is verified that the invariant of t holds at points _1, _3 and _4. But we claim that the invariant should also hold at point _2. It is impractical to inject a run-time check there, but if a static analyser detects that the invariant may be broken at _2, it should definitely report an invariant breach.

This is different from the common notion that invariants should hold before entering any public function or destructor, and after leaving any public member function or constructor. For instance, inside a public function you can invoke other public functions, and around these nested calls you do not expect the invariant to hold. Conversely, an invariant should hold at any time between calls to the public interface. Putting run-time checks in the places thus identified is impossible, which shows that the approach "check before and after the call to any public member function" is a consequence of implementation limitations — not of the definition of a class invariant. This also shows that run-time checks are not enough to guarantee the safety of the program: static analysis may be required to complement them.

Composing invariants

Given some class D, apart from its "direct" invariant, we expect that the invariants of D's base classes and D's data members also hold. This can be implicit: you do not have to write the recursive definitions yourself, just as you do not have to specify in a destructor which subobjects' destructors need to be called. But this does not cover all cases. Consider a class optional (similar to Boost.Optional); it can be implemented as follows:

template <typename T>
class optional
{
  bool is_initialized_ = false;
  std::aligned_storage_t<sizeof(T), alignof(T)> storage_;
  T* storage() { return reinterpret_cast<T*>(&storage_); }
public:
  explicit operator bool() const { return is_initialized_; }
  T& operator*() {
    assert (is_initialized_);
    return *storage();
  }
  optional(const T& v) {
    new (storage()) T(v);
    is_initialized_ = true;
  }
  ~optional() {
    if (is_initialized_)
      storage()->~T();
  }
};

Obviously, we (sometimes) store an object of type T inside, but it is neither a base class nor a member subobject. Yet, if it is there (if the optional object contains a value), we want its invariant to hold, and the compiler may not be able to deduce this. We might need to express a "conditional" invariant:

// invariant: !is_initialized_ || invariant(*storage());

Similarly, a vector of Ts (std::vector<T>) can be thought of as being composed of the Ts it contains. This is reflected in how its equality comparison is defined and how const-ness is propagated: a value of the container is composed of the values of its elements. In a similar manner, we would like to be able to express invariants on composed types, i.e., the vector's invariant holds iff some vector-specific conditions are satisfied and all the vector elements' invariants hold. We are not quite sure whether this needs to be specified explicitly, or whether it is enough for the machine to observe that if there is some nested T accessible from the vector, its invariant must hold. But will the machine know that the result of vec.back() should have its invariant hold? After all, calling back() is not always allowed, and may render undefined behaviour.

This issue does not appear if we adopt the view that invariants hold immediately before and after calls to public interface functions. But we do not constrain our analysis to this "run-time checks" view.

Another open question is whether we should check the invariants of member subobjects that are references (and therefore do not pertain to the master object's value).

Protected members

Can protected member functions be called in pre- and postconditions? In public functions, this makes no sense: the user of the class may not have access to the protected functions, and it is expected that the user, in order to verify whether a precondition holds, should be able to evaluate the condition herself.

On the other hand, protected functions may also need to specify their preconditions, as they constitute an interface for derived classes (which can also be considered users), and for them calling other protected members appears reasonable.

Similarly, it is possible that a class's protected interface may assume an invariant that is weaker than the invariant of the class's public interface: protected members may require less of the program state than the public members, but they still may require something. In other words, we claim that a class exposes two interfaces, one for external callers and the other for derived classes, and the two interfaces may require two sets of value constraints. We are not sure whether this use case is worth complicating the language for.

Constraining STL

As a usefulness test, any proposed value-constraint framework should be checked against whether it can express the value constraints in the STL. We consider some selected examples. We do not insist that a proposed framework be able to handle them all; we consider this rather a test of expressive power. The goal is to help determine where we want to stop in expressing semantics in the language.

Case 1: a valid range

This is probably the most difficult one to handle. Nearly every algorithm on STL ranges requires that end be reachable from begin. This cannot be validated by executing any expression; that is why it is required as a precondition: the caller may have better means of ensuring that the requirement is satisfied. For instance, it is obvious that c.begin() and c.end() called on any STL container form a valid range. Is a value-constraints framework capable of expressing that the values of c.begin() and c.end() satisfy the requirements of, say, std::for_each?

Case 2: sub-ranges

Similarly, in the call m = std::find_if(b, e, p), assuming that b and e form a valid range, can it be verified, or even expressed, that b and m as well as m and e form valid ranges, and that these combinations satisfy the requirements of subsequent STL algorithms about valid ranges?

Case 3: sorting

Consider functions sort and binary_search from the Standard Library (we consider only the overloads with a predicate). The former guarantees that it leaves the range sorted with respect to the given predicate. The latter requires that the range be partitioned with respect to the given predicate and the given element. Being partitioned is something less than being sorted. For instance, the range {3, 1, 2, 5, 9, 8, 7} is partitioned with respect to predicate [](int i){return i < 5;} but is not sorted.
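This claim can be verified with the Standard Library itself:

#include <algorithm>
#include <cassert>
#include <vector>

int main()
{
  std::vector<int> r = {3, 1, 2, 5, 9, 8, 7};
  auto pred = [](int i) { return i < 5; };

  assert( std::is_partitioned(r.begin(), r.end(), pred)); // partitioned
  assert(!std::is_sorted(r.begin(), r.end()));            // but not sorted
}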

Solution with properties and axioms

Below we show how the above tasks can be solved with sophisticated features like properties (described above) and axioms (similar to those described in [N2887]).

Case 1

template <typename I>
property bool is_valid_range(I b, I e);

template <typename T>
axiom vec_range(std::vector<T> v)
{
  is_valid_range(v.begin(), v.end());
}

template <typename Iter>
void sort(Iter b, Iter e)
precondition{ is_valid_range(b, e); };
  
vector<int> v = {/*...*/};
sort(v.begin(), v.end());

Case 2

template<typename I, typename V>
I find(I b, I e, V v)
precondition{ is_valid_range(b, e); }
postcondition(f) {
  is_valid_range(b, f);
  is_valid_range(f, e);
  // ... more
};

vector<int> v = {/*...*/};
auto i = find(v.begin(), v.end(), 7); // axiom vec_range
auto j = find(i, v.end(), 7);         // postcondition of find

Case 3

template <typename It, typename T, typename Comp>
bool is_partitioned_for_element(It b, It e, T v, Comp cmp)
{
  return is_partitioned(b, e, [&](auto&& x) {return cmp(x, v);})
      && is_partitioned(b, e, [&](auto&& x) {return !cmp(v, x);})
      && all_of(b, e, [&](auto&& x) {return !cmp(x, v) || !cmp(v, x);});
}

template <typename It, typename T, typename Comp>
axiom sorted_is_partitioned(It b, It e, T v, Comp cmp)
{
  is_sorted(b, e, cmp) => is_partitioned_for_element(b, e, v, cmp);
}

template <typename Iter, typename Comp>
void sort(Iter b, Iter e, Comp cmp)
postcondition{ is_sorted(b, e, cmp) };

template <typename It, typename T, typename Comp>
bool binary_search(It b, It e, T v, Comp cmp)
precondition{ is_partitioned_for_element(b, e, v, cmp) };

Can the above tasks be accomplished with the "run-time checks" approach? The biggest problem is that is_valid_range is not implementable in the general case as a function (template). The only way to check whether two iterators form a valid range is to increment the first one and check if we reach the second one; but given an invalid range, such a function, rather than returning false, will run forever or render undefined behaviour. Rather than introducing the new notion of a "property", we could annotate (with attributes) some functions as "not defined on purpose". If the compiler sees such a function in a value constraint, it skips the run-time check (but may still perform static analysis). This way we could mix normal functions with non-implementable functions in preconditions.

Again, we do not propose to standardize the above; we only show that the preconditions in STL algorithms are expressible with a sufficiently rich facility.

What do we need to standardize?

The language feature of value constraints can be divided into two main parts: (1) what value constraints we define and how, and (2) how we handle the run-time response to a broken value constraint.

Defining value constraints

At this point it is too early to decide on the details, but one thing appears fairly uncontroversial: value-constraint support needs to accommodate two fairly opposite requirements: (1) no run-time overhead in cases like operator[] for vectors, (2) guaranteed (in certain situations) evaluation of the predicates, where the performance requirement is not critical.

This calls for the following semantics:

  1. If a piece of code (a function or a block) relies on a given value constraint C (e.g., a function relies on its precondition, a public member function relies on the class invariant), and this code is executed while value constraint C would return false if it were evaluated (no evaluation is required), the behaviour of the program is undefined. — This accommodates situations ranging from buffer overflows to calling terminate handlers.
  2. If control reaches a point where a value constraint could be evaluated (for preconditions this moment is just before the function call, etc.), it is unspecified whether the value constraint is evaluated once or not evaluated at all. — This handles behaviours ranging from performing all run-time checks to disabling them all with a compiler switch or some [[attributes]].

One thing that appears non-controversial is that declarations of preconditions and postconditions need to be part of the function declaration, so that they are accessible to function users who may not have access to function definitions. We leave it open whether a value-constraint declaration should affect the function's signature or type, or affect the overload resolution mechanism; although it looks like affecting the function's type would make it impossible to retrofit value constraints into existing code.

Run-time response

One option is not to standardize what should happen when a broken value constraint is detected at run-time (whether to throw, terminate, abort or ignore), and to let the vendors decide how they want to handle these cases. With this approach a minimum conforming implementation only needs to syntax- and type-check the declarations.

But there may be parts of programs that require a guaranteed run-time evaluation of a precondition in order to prevent the subsequent function execution, and a guaranteed release of critical resources. The minimum requirement then is the ability to annotate "critical" value constraints, plus a call to std::terminate, whose goal is already to release the critical resources whose release cannot be skipped.

It may be useful to standardize a set of [[attributes]] for giving hints on the conditions under which to evaluate value constraints, e.g., "evaluate only if T satisfies some requirements", "this constraint has complexity O(N); evaluate it only when O(N) and bigger complexities are enabled", "never evaluate this precondition", or "never evaluate this function if it appears in a value constraint".

Concerns

Static analysers exist today, and apparently they do not need these value-constraint declarations: to some extent they can infer the constraints from the implementation. It is not clear to what extent value constraints will help the analysis. Even if the constraints for the STL can be formalized, it is not clear whether any analyser will want to perform this deep analysis. We cannot predict how useful this will be in practice. On the other hand, giving names to constraints may render the messages from the analyser more friendly to the users.

Another concern is that if we combine all kinds of possible function contracts — concepts, value constraints, conditions on these constraints — the declarations could become unbearably long. Reconsider the example of std::find from above. In order to make it short, we made it incorrect: the postcondition cannot be defined for non-ForwardIterator types. If we tried to really constrain std::find (using some invented syntax, and assuming Concepts Lite and the new Range design being proposed — see [N4128]), it would go:

template<InputIterator I, Regular S, Regular V>
  requires EqualityComparable<I, S> && EqualityComparable<Value_type<I>, V>
I find(I b, S e, const V& v)
precondition{ is_valid_range(b, e); }
postcondition(f) {
  is_valid_range(b, f)
    requires ForwardIterator<I>;
  is_valid_range(f, e);
  [[contract::never_evaluate]]
  none_of(b, f, [&](const Value_type<I>& x){ return x == v; })
    requires ForwardIterator<I>;
  [[contract::complexity(constant)]]
  f == e || *f == v 
    requires ForwardIterator<I>;
};

Does this declaration not look scary?

Implementability

In this paper we have discussed two uses of value-constraint declarations: run-time checks and static analysis. Auto-generation of run-time checks has already been implemented in a couple of languages, e.g., in Eiffel and D. To a great extent it has even been implemented in C++ as a library (see [CONTRACTPP]). However, the idea of inserting time-consuming run-time checks into a performance-critical program may not be the right route for C++. Regarding static analysis, languages for defining and exploiting value constraints have been used for languages like C (see [ACSL]), but we are not aware of any attempt at implementing a useful system for C++, with its templates and the Standard Template Library with its constraints. Of course, the minimum implementation would only need to type-check the declarations, but for that we do not need to standardize the constraints. We do not know whether the constraints would really help improve static analysis beyond what current analysers already do (with no help from programmers), and we will not know until a tool like this, with its own language for constraints, is built.

Acknowledgements

Lots of content in this paper is inspired by work in [N1962], [N2887] and [N3351].

Lawrence Crowl reviewed the paper, highlighted issues and suggested improvements.

Gabriel Dos Reis suggested the "STL test" for a precondition framework.

Joël Lamotte and Tomasz Kamiński offered a number of useful suggestions that improved the quality of the paper.

References