N4248 Library Preconditions are a Language Feature

ISO/IEC JTC1 SC22 WG21 N4248 - 2014-10-12

Alisdair Meredith, public@alisdairm.net

Introduction
What are Library Contracts?

Contract Violations
Contract Violations are not Undefined Behavior

Contracts as a Programming Construct

A Simple Proposition
Concern with Complexity
Benefits of Language Support

Proposed Solution

Straw Man Syntax
Multiple Build Modes
Checking From The Caller
Interaction With noexcept

Preconditions are always noexcept(false)
Preconditions are ignored
Precondition checks depend on build mode

Let The ~~Wookie~~User Win

Further Issues

One Definition Rule
Declaration vs. Definition
Querying for Contracts
Cost of Evaluation
Simple and Elegant Syntax
Not All Preconditions Are Amenable To Checking

Summary
Appendix A: Existing Support for Contracts in C++14

const Correctness
Non-null Pointers
unsigned Integer Types
Ownership Patterns And RAII
Pure Virtual Functions
override Specifier In C++11
static_assert
Concepts
noexcept
[[noreturn]]

Acknowledgements

Introduction

This paper is motivated by the confluence of several discussions over the last few years, and draws together several ideas trying to find the best point in the (language) design space given competing tensions. The most obvious inspiration is the ongoing debate over appropriate use of noexcept specifications in the standard library, but this also draws on the experience of the recent paper on a better assert facility, N4075, and the early work in the area of contract programming by Thorsten Ottosen and Lawrence Crowl, N1962.

The key recommendation of this paper is to add support for a feature to declare preconditions on function parameters that is visible to the compiler, and address the most important design constraints on that feature. It does not provide formal wording yet, nor even a preferred syntax. For the sake of exposition, the keywords from the concepts TS are overloaded to suggest what this might look like, but are not intended to be a formal suggestion for the final syntax.

What are Library Contracts?

For the sake of this paper, a contract is the agreement between a library and its callers that describes the outcome of a function call given valid inputs. One of the important aspects of a contract is to define which subset of inputs is valid. When all possible inputs for the function parameter data types are valid, a contract is said to be wide; otherwise there is some notion of out-of-contract behavior and the contract is said to be narrow. Commonly, some subset of valid inputs can produce error conditions, and the contract describes how such error conditions are reported, typically through exceptions or error codes.
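As a small illustration of the wide/narrow distinction, the sketch below wraps the two element-access functions of std::vector. The helper names element_wide, element_narrow and wide_call_reports_error are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Wide contract: every value of `i` is a valid input. An out-of-range index
// is an error condition that the contract says is reported by exception.
inline int element_wide(const std::vector<int>& v, std::size_t i) {
    return v.at(i);  // throws std::out_of_range when i >= v.size()
}

// Narrow contract: the caller must guarantee i < v.size(). Any other index
// is out of contract and the behavior is undefined.
inline int element_narrow(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Hypothetical helper: reports whether the wide-contract call signalled an error.
inline bool wide_call_reports_error(const std::vector<int>& v, std::size_t i) {
    try {
        element_wide(v, i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Note that the error path of element_wide is part of its contract and can be tested, whereas nothing at all can be said about element_narrow once the index is out of range.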

In addition to defining the range of values permissible for function parameters, a contract for a template must also call out the requirements on the template parameter types. Often, such a contract will mandate that some failures to satisfy these requirements will be met with a compilation failure, such as by a static_assert, and may go further to indicate that such failures would cause a failure subject to SFINAE.
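The two styles of requirement on a template parameter might look as follows in C++14. This is only a sketch; the function names twice_checked and twice_sfinae are invented for illustration.

```cpp
#include <type_traits>

// Hard requirement: violating it stops compilation with a readable message.
template <class T>
T twice_checked(T value) {
    static_assert(std::is_arithmetic<T>::value,
                  "twice_checked requires an arithmetic type");
    return value + value;
}

// SFINAE-friendly requirement: for a non-arithmetic T this overload drops
// out of overload resolution instead of producing an error.
template <class T>
typename std::enable_if<std::is_arithmetic<T>::value, T>::type
twice_sfinae(T value) {
    return value + value;
}

// Fallback that is selected only when the constrained overload is not viable.
inline int twice_sfinae(const void*) {
    return -1;
}
```

Calling twice_checked with a pointer is a compile-time error, while calling twice_sfinae with a pointer quietly selects the fallback.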

In the ideal world, all aspects of a contract are documented and available for users to rely on. In common practice, documentation frequently does not keep up with evolving software, and for some projects the code itself is deemed documentation enough, where the contracts must be inferred from the current implementation, making it difficult to diagnose bugs when it is not clear which side of the contract has been violated.

One particularly relevant case for WG21 is the C++ Standard Library which is entirely defined by contract, and has multiple implementations all trying to satisfy the same set of contracts.

Contract Violations

There are several categories of contract violation. The simplest to describe is the case that the implementation does not deliver the contracted behavior. This is generally a bug by the implementor of the library, and ideally would be caught and reported by their test suite. There are two simple ways to resolve such violations:

Correct the code to satisfy the contract
Amend the contract to agree with the code

Generally speaking, it is preferable to correct the code once a library has been deployed, as it is difficult for existing customers to audit all of their code to be sure it conforms to the amended contract. However, if the library has not yet been deployed and the incongruence is reported by a test suite, for example, then amending the contract may indeed be a viable approach.

The second kind of violation is when a caller of the library supplies function arguments that are outside the range of contracted behavior. This is an error on the part of the library user, and will frequently result in undefined behavior. Indeed, the standard library specification uses exactly that phrase, and for all intents and purposes, for a library customer, violating the library contract should be treated as invoking undefined behavior.

However, a contract violation is not undefined behavior as far as the compiler is concerned, as the compiler cannot see the library contract. A common feature of high quality libraries is some sort of diagnostic or debug mode that will help users find their own violations of library contracts. These often involve redundant checks on function arguments that are disabled in release builds for production, but enabled in debug builds for testing and validation. The C++ Standard Library provides the <cassert> header to support this library practice, and the proposal N4075 and its successors seek to enhance this approach, based on existing practice.
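The debug-mode practice described here typically looks like the following sketch, where element_at is a hypothetical library function with a narrow contract.

```cpp
#include <cassert>
#include <cstddef>

// Narrow contract: the behavior is undefined unless `data` is non-null and
// `index < size`.
inline int element_at(const int* data, std::size_t size, std::size_t index) {
    // Redundant defensive checks: active in debug builds, compiled out
    // entirely when NDEBUG is defined for a release build.
    assert(data != nullptr);
    assert(index < size);
    return data[index];
}
```

The checks live inside the library function, so any diagnostic names the library's file and line rather than the location of the caller's mistake.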

Contract Violations are not Undefined Behavior

One of the key observations of this paper is that contract violations are not undefined behavior, as the core language means the term, but will often result in such undefined behavior. After a contract violation, the library author is still in control, and has the option to detect and report that violation in any manner they choose, including using only well-defined behavior as far as the compiler is concerned.

This distinction becomes clearest when considering a constexpr function where the language is required to diagnose all undefined behavior in a constexpr evaluation:


#include <limits>

constexpr auto divide(double a, double b)
  // Return a/b.  The behavior is undefined if 0.0 == b
{
  if (0.0 == b) {
    // out-of-contract, return a NaN for now but subject to change
    return std::numeric_limits<double>::quiet_NaN();
  }
  return a / b;
}

constexpr double d = divide(1.0, 0.0);  // error?

If a contract violation is undefined behavior, then the compiler is required to diagnose the error in the definition of d. However, as this particular implementation does not rely on any core undefined behavior, the compiler has no way to see that the contract has been violated, regardless of whether the library documents undefined behavior, unless this is seen as a contractual requirement on the library to ensure core undefined behavior in such a call.
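The contrast can be made concrete with a C++14 compiler. In the sketch below, checked_divide restates the example above so that the block is self-contained, and shift_left is a hypothetical function whose misuse does reach core undefined behavior.

```cpp
#include <limits>

// Documented as: the behavior is undefined if 0.0 == b.
constexpr double checked_divide(double a, double b) {
    return (0.0 == b) ? std::numeric_limits<double>::quiet_NaN() : a / b;
}

// Out of contract, yet the evaluation involves no core undefined behavior,
// so this is accepted as a constant expression.
constexpr double nan_result = checked_divide(1.0, 0.0);
static_assert(nan_result != nan_result, "a NaN compares unequal to itself");

// Here an out-of-contract call would reach core undefined behavior.
constexpr int shift_left(int value, int count) {
    return value << count;
}

constexpr int sixteen = shift_left(1, 4);  // in contract, fine
// constexpr int bad = shift_left(1, -1);  // rejected: a negative shift count
//                                         // is undefined behavior, so this is
//                                         // not a constant expression
```

Only the second kind of violation is visible to the compiler; the first is known only to whoever reads the documentation.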

Contracts as a Programming Construct

A Simple Proposition

This paper makes a simple proposition - that we should introduce a syntax so that libraries can more clearly declare their contracts in a manner visible to the compiler. In practice, the details are rarely as simple as the proposition. For example, the concepts TS is a detailed attempt to introduce a language to declare contracts on type parameters in function templates.

Appendix A gives several examples where C++ already has explicit support for common kinds of contracts that were deemed such important special cases that they got their own specific support in the language.

Previous motivation for an elaborate feature was presented by Ottosen/Crowl in N1613, and N1800.

Concern with Complexity

My main concern with the original proposal is that the complexity of the feature, especially use of the feature, outweighs the benefit. My standard example for this comes from Kevlin Henney, who asked me to define the contract for a simple sort function. Almost everyone describes the basic post-condition that the destination range is sorted according to the predicate supplied to the function (or the default predicate, usually operator<). Most people miss the important post-condition that the output range is a permutation of the input range. This is the kind of precision that is necessary for the language of post-conditions and invariants to pay off, and we have not even talked about validating the is-permutation postcondition without violating the algorithmic complexity constraint of a good sort algorithm.
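To make the cost concrete, a checker for both sort post-conditions might be sketched as follows. The function sort_postconditions_hold is hypothetical, and it assumes the element type is equality comparable.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical checker for the two post-conditions of a sort.
template <class T, class Compare = std::less<T>>
bool sort_postconditions_hold(const std::vector<T>& input,
                              const std::vector<T>& output,
                              Compare comp = Compare()) {
    // Post-condition 1: the output is ordered by the predicate. Linear time.
    const bool sorted = std::is_sorted(output.begin(), output.end(), comp);

    // Post-condition 2: the output is a rearrangement of the input. This
    // needs a copy of the original input, and std::is_permutation may take
    // a quadratic number of comparisons.
    const bool permutation =
        input.size() == output.size() &&
        std::is_permutation(input.begin(), input.end(), output.begin());

    return sorted && permutation;
}
```

The second check requires keeping a copy of the input and can cost more than the O(N log N) sort it is validating, which is exactly the tension described above.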

Benefits of Language Support

There are real benefits to including language support for contracts, and specifically for pre-condition contracts. While invariants and post-conditions are the responsibility of the library implementer, and can be adequately checked with a good test suite, pre-condition violations are the responsibility of the library caller. Anything we can do to make it easier for library users to find their own errors in calling 3rd party code should lead to more reliable software.

In addition, by exposing (selected) contract details to the compiler, they become accessible to the optimizer. While the effect is not as significant without the benefit of post-conditions and invariants to eliminate redundant code and checks, some progress is still to be expected. For example (see appendix A) the relatively common case of non-null pointers is inferred in compilers today for parameters that are passed by reference.

If we can move the pre-condition checks to the caller's side of the function call, then they become amenable to additional optimizations, from simple instruction reordering to dead code elimination and beyond.

A further benefit of exposing contract details directly in the language is that they become accessible to a wider variety of tools than just the compiler. Such tools could help the user find errors in their code, provide hints in IDEs and text editors, help direct test coverage tools, and more.

Proposed Solution

The solution formally proposed here is to add a simple syntactic extension to support declaring pre-conditions in function contracts, and specifically to not extend that syntax to support post-conditions and invariants. If a contract violation is detected at runtime, a user-installable violation handler will be called, using the facility already documented in N4075.
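A user-installable violation handler could be sketched along the following lines. Every name here (violation_info, violation_handler, set_violation_handler, report_violation) is hypothetical and chosen for illustration; this is not the interface specified in N4075.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical description of a detected violation.
struct violation_info {
    const char* expression;  // text of the failed precondition
    const char* file;        // location of the offending call
    int line;
};

using violation_handler = void (*)(const violation_info&);

// Default behavior in this sketch: print a diagnostic and abort.
inline void default_violation_handler(const violation_info& info) {
    std::fprintf(stderr, "%s:%d: precondition failed: %s\n",
                 info.file, info.line, info.expression);
    std::abort();
}

// Storage for the currently installed handler.
inline violation_handler& current_handler() {
    static violation_handler handler = default_violation_handler;
    return handler;
}

// Installs a new handler and returns the previous one.
inline violation_handler set_violation_handler(violation_handler handler) {
    violation_handler previous = current_handler();
    current_handler() = handler;
    return previous;
}

// What a checked build would invoke when a precondition fails.
inline void report_violation(const char* expression, const char* file, int line) {
    const violation_info info = {expression, file, line};
    current_handler()(info);
}
```

A test driver might install a handler that records or throws, while a production build keeps the aborting default.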

Straw Man Syntax

For the sake of the design discussion, I propose the simplest syntax that I can imagine, but expect both simpler and more detailed variants will emerge if this direction is deemed worth pursuing. For now, I suggest extending the requires clauses from the Concepts Lite TS to allow additional non-constexpr expressions that depend on function parameters, including the implicit this pointer.


template <class T, class Alloc>
class vector {
   // ...

   bool empty();

   T& front() requires !empty();
};

Multiple Build Modes

One of the key assumptions of this feature is that there will be multiple build modes in the compiler. In the checked build mode, all preconditions are checked on each function invocation, and if any check fails, the user-supplied (or default) contract violation handler is called. In an unchecked build mode, violating a checked precondition is core undefined behavior. It is expected that this will lead to optimizing away the checks, in addition to allowing additional inferences to be drawn for further optimization.
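The two build modes can be approximated today with a macro, which gives a rough feel for the intended semantics. In the sketch below, CONTRACT_CHECKED, PRECONDITION, contract_violation and small_stack are all hypothetical names, and the unchecked branch relies on the GCC/Clang builtin __builtin_unreachable as a stand-in for "violation is core undefined behavior".

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical build-mode switch; the proposal envisages a compiler option.
#ifndef CONTRACT_CHECKED
#define CONTRACT_CHECKED 1
#endif

[[noreturn]] inline void contract_violation(const char* expr,
                                            const char* file, int line) {
    std::fprintf(stderr, "%s:%d: precondition failed: %s\n", file, line, expr);
    std::abort();
}

#if CONTRACT_CHECKED
// Checked mode: evaluate the precondition and report a failure.
#define PRECONDITION(expr) \
    ((expr) ? static_cast<void>(0) \
            : contract_violation(#expr, __FILE__, __LINE__))
#elif defined(__GNUC__) || defined(__clang__)
// Unchecked mode: a violation is undefined behavior, so the optimizer may
// assume the condition holds and discard the test.
#define PRECONDITION(expr) \
    ((expr) ? static_cast<void>(0) : __builtin_unreachable())
#else
#define PRECONDITION(expr) static_cast<void>(0)
#endif

template <class T>
class small_stack {
    std::vector<T> items_;

public:
    bool empty() const { return items_.empty(); }
    void push(const T& value) { items_.push_back(value); }

    // Straw-man spelling would be: T& top() requires !empty();
    T& top() {
        PRECONDITION(!empty());
        return items_.back();
    }
};
```

Unlike the proposed feature, this emulation still performs the check inside the callee and still depends on the preprocessor, which are the two shortcomings the proposal aims to remove.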

The Core Working Group have long been resistant to formalizing build modes in the standard and for good reason. A proliferation of build modes yields a much more complex language, as we actually have multiple languages according to all valid combinations of build modes. However, I believe the existing assert facility has already challenged that notion through the preprocessor changing the user's code underneath them. This proposal merely attempts to make those existing build modes accessible to the compiler, so that they can be reasoned about more effectively.

Checking From The Caller

The intent is to move the pre-condition check out of the function body, where it would be today as an assert, and evaluate it before the function is called, in the user's calling context. Hence, if an error is detected, the file and line number supplied to the violation handler will be those of the user's code, and not the library, greatly simplifying bug tracking.

The author imagines at least two techniques that could be used to accomplish this. The first is to inline the precondition checks directly into each call site, much like expanding an inline function. This is expected to yield the best opportunity to the optimizer to efficiently handle the code, at the expense of potential code bloat. The second technique would be to add a second entry point to each function having a precondition and having the caller pick the entry point appropriate to their build mode. Note that __FILE__, __func__ and __LINE__ need to be captured before calling the preamble in this case.
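The second technique can be imitated by hand to show where the source location comes from. This is a sketch only: square_root_checked, square_root_unchecked, SQUARE_ROOT, CONTRACT_CHECKED and precondition_failure are hypothetical names, and the violation is reported by throwing purely so that the effect is observable.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical report type carrying the caller's location.
struct precondition_failure : std::logic_error {
    precondition_failure(const std::string& file, int line)
        : std::logic_error(file + ":" + std::to_string(line)),
          line_number(line) {}
    int line_number;
};

// Unchecked entry point: the function body proper, with no check inside.
inline double square_root_unchecked(double x) {
    return std::sqrt(x);
}

// Checked entry point: a preamble that evaluates the precondition using the
// location captured at the call site, then forwards to the body.
inline double square_root_checked(double x, const char* file, int line) {
    if (!(x >= 0.0)) {
        throw precondition_failure(file, line);
    }
    return square_root_unchecked(x);
}

// Hypothetical build-mode switch; the caller picks the entry point.
#ifndef CONTRACT_CHECKED
#define CONTRACT_CHECKED 1
#endif

#if CONTRACT_CHECKED
#define SQUARE_ROOT(x) square_root_checked((x), __FILE__, __LINE__)
#else
#define SQUARE_ROOT(x) square_root_unchecked(x)
#endif
```

Because __FILE__ and __LINE__ are expanded where SQUARE_ROOT is written, the report names the caller's code. A language feature would do the same without the macro.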

Interaction With `noexcept`

One of the motivating factors to produce this paper now is the ongoing discussion in the Library Working Groups about appropriate use of noexcept in the standard library specification. My hope is that this paper gives most folks most of the features they want, while not granting absolutely everything to any one direction.

The heart of the issue is that noexcept is actually a post-condition contract built into the language. The use of noexcept in a library specification constrains how a library implementer may respond to a contract violation, as they cannot (easily) go outside the core language to deliver their effect. This leads to a natural tension between those who value marking up their contracts for the benefit of the compiler/tools/readers, and those who value their ability to respond to contract violations in the manner most helpful to their own users. The current status quo is that use of noexcept on narrow contracts in the library is a quality of implementation feature, and not specified by the library standard.

Once we add preconditions to function declarations though, we move a large part of detecting and reporting contract violations to the caller's side of the equation, where they are visible to the compiler. This still raises the question: how do precondition clauses interact with noexcept? There are three common suggestions.

Preconditions are always `noexcept(false)`

In this variation, the existence of any precondition clause signifies that there are some build modes where a precondition violation will be detected and handled by a user-supplied callback, which may in turn throw exceptions. To guarantee predictable behavior in all build modes, any call to such a function should always be treated as-if marked with noexcept(false), even if the function itself is decorated with noexcept(true).

Preconditions are ignored

In this variation, the question considers where the precondition checks are inserted in the expression being queried by the noexcept operator. For example:


template <class T, class Alloc>
class vector {
   // ...

   bool empty();

   T& front() noexcept
      requires !empty();
};

vector<int> v;
constexpr bool b = noexcept(v.front());

In this example, the noexcept expression evaluates whether the call to v.front() can throw exceptions. We know that the definition of this function does not throw, but where are the additional checks for the contract inserted? If they are inserted before the noexcept check, then they have no bearing on the result in any build mode, and so can be completely ignored.

Precondition checks depend on build mode

In the third variation, we ask the same question as in the second - where do the precondition checks go in relation to the noexcept operator, and pick the other response. In this view, the precondition checks do affect whether or not exceptions can be thrown, but are an artifact of the build mode. This allows a release build to realize library optimizations in addition to compiler optimizations at the risk of taking different code paths than are validated in the instrumented test environment.
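The difference between the second and third variations can be observed in today's language by writing the check by hand. In this sketch, int_stack, check_not_empty and checked_top are hypothetical, and the check reports by throwing.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

struct int_stack {
    std::vector<int> items;

    bool empty() const noexcept { return items.empty(); }

    // The body never throws when called in contract.
    int& top() noexcept { return items.back(); }
};

// Hypothetical caller-side check whose handler reports by throwing.
inline void check_not_empty(const int_stack& s) {
    if (s.empty()) {
        throw std::logic_error("precondition failed: !empty()");
    }
}

// Models a checked build where the check is part of the call expression.
inline int& checked_top(int_stack& s) {
    check_not_empty(s);
    return s.top();
}

// Second variation: the query sees only the function itself.
static_assert(noexcept(std::declval<int_stack&>().top()),
              "the function body is noexcept");

// Third variation, checked build: the check is part of what is queried.
static_assert(!noexcept(checked_top(std::declval<int_stack&>())),
              "the inserted check may throw");
```

Generic code that branches on the noexcept operator would therefore take different paths in the two models, which is the risk noted above for the third variation.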

Let The ~~Wookie~~User Win

The recommendation of the author is that trying to pick between these three options is wasted effort. Different software development efforts will want to pick different answers based on their own prevailing design philosophy. There are different groups where each of the three answers may be preferable, and groups that may even want different answers in different parts of their project. By putting this behavior under another compiler switch, the library owner does not need to pick a single answer as the choice is made by the end user building their own software.

Further Issues

There remain a number of issues not yet addressed by this paper that will need more effort if this paper is to go forward, with expert input from others more informed on the specific topics.

One Definition Rule

The most obvious problem with any conditional code is violation of the ODR, the One Definition Rule - see ISO 14882 section 3.2 [basic.def.odr]. This is true of code using the assert macro today and would be true for any code relying on a new precondition language feature.

The simplest way to manage the problem, as today, is for users to guarantee that all of their code is compiled and linked in the same build mode, with the same compiler flags and predefined macros. This is not always practical though, and as ODR violations are not required to be diagnosed, the typical result is that code will build and link fine, but some translation units might be missing expected checks, or have additional checks inserted, depending on which definition is selected by the linker.

The problem is more serious for inline functions and function templates, as use of macros will clearly violate the token-for-token identical requirement for multiple definitions. The language feature described above would satisfy the token-for-token requirement, but may reintroduce the problem that the token-for-token requirement was intended to solve. This topic needs discussion with experts from the core working group to see if there is a way to define this notion of 'benign' ODR violation in the special cases of concern to contract validation.

Declaration vs. Definition

Is a precondition a part of the function type? Must it be repeated at both declaration and definition, or even at every declaration?

This topic needs a little more research before nailing down an answer. Provisionally, requiring consistency everywhere seems the simplest choice, and easiest for users to understand as well. However, I suspect we probably don't want to mangle this into the type system, leaving a result similar to the old-style exception specifications, which do not appear a successful precedent to follow.

I believe that the key difference with the old exception specifications comes down to how the feature is expected to be used. The exception specifications were a post-condition check that pessimized code for the sake of catching library misuse, and could not be disabled in a conforming build. The proposed pre-condition checks are finding problems in the caller's code, which is of much more interest to the library consumer, and can optionally be disabled.

Encoding the precondition expressions into the function type would solve link-time issues with the ODR, but mangling arbitrary expressions may cost more in terms of complexity and symbol name than the feature buys us.

The other problem in this space is function pointers and function references. With the old exception specifications, the specification applied to a specific function pointer object, rather than the type of that object, and restricted functions that could be assigned to that function pointer/reference as having compatible exception specifications. Determining arbitrary compatibility for functions with preconditions could be a complex problem (although perhaps no worse than exception inference) and again, might not be information that is deemed semantically meaningful in this context. It may be that the best design really is to ignore the preconditions when dealing with function pointers and references, or to bind such pointers/references to a call point that performs the precondition check before evaluating the function. (I.e., to adopt the second suggested implementation strategy, at least when binding to function pointers or references.)

Querying for Contracts

Much like the noexcept operator for exception specifications, there have been suggestions that there should be an operator to query if a function has a wide or narrow contract, and in some cases to query the details of the compiler-visible contract. The author does not believe this feature would be helpful, and it would significantly increase the complexity of the proposal. There is no suggestion (at this time) to add such a feature.

Cost of Evaluation

One of the important features of N4075 was a recognition that not all checks are equally desirable in all build modes, and some users can afford a greater level of checking in some build modes, such as when trying to track down a particularly difficult to find bug. This initial proposal does not include a feature to vary which set of preconditions are evaluated in different build modes, other than a binary on/off. It would be simple enough to use the macro techniques suggested in that paper to implement a similar costing mechanism using the proposed new facility, but one of the benefits of this approach is to move away from conditional compilation through macros, and directly express the ideas to the compiler. One idea might be to mark up precondition expressions with attributes reflecting cost; another might be a richer syntax than simply hijacking requires as in the straw-man syntax example presented here. This remains a topic for future discussion.

Simple and Elegant Syntax

The straw man syntax is deliberately simple, using an existing reserved keyword in a manner similar to how it is already planned to be exploited in the Concepts Lite TS. One concern with the straw-man is that whether an expression is a precondition or a concept comes down to whether the evaluation is constexpr or not, which is a very subtle, yet significant, change of meaning. This may turn out to be less of a concern than is feared, much like the terse notation of the Concepts Lite TS appears to overload plain functions with types, and function templates with concepts.

Another feature that might be desired is a simpler syntax for expressing simple constraints on a single function parameter, that might be expressed directly on that parameter in the function parameter list. A common example would be the not null constraint on pointer-like arguments.
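Absent language support, a per-parameter constraint of this kind is sometimes approximated with a wrapper type. The sketch below uses a hypothetical not_null template that checks on conversion, which happens while the caller is preparing the argument. It reports by throwing only to keep the example short.

```cpp
#include <stdexcept>

// Hypothetical wrapper; not a standard library type.
template <class T>
class not_null {
    T* ptr_;

public:
    // Implicit on purpose: the check runs as the argument is converted,
    // before control enters the called function.
    not_null(T* p) : ptr_(p) {
        if (!p) {
            throw std::invalid_argument("null pointer passed to not_null");
        }
    }

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }
};

// The constraint is now stated directly on the parameter.
inline int length_of(not_null<const char> text) {
    int n = 0;
    for (const char* p = text.get(); *p != '\0'; ++p) {
        ++n;
    }
    return n;
}
```

A dedicated syntax would express the same constraint without a wrapper type and would leave the choice of reporting mechanism to the build mode.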

Not All Preconditions Are Amenable To Checking

There remain categories of preconditions that must still be handled by documentation, and may yield better to traditional assert-like treatment in their function definitions.

One classic problem that does not yield to either approach is the notion of an invalid object. Good examples are invalidated iterators (passed by reference) or stale pointers. There is nothing a library author can do to easily validate the state of such objects, so the contractual preconditions are typically not verified.

A second category of precondition is those that leak implementation details. Declaring preconditions exposes a certain amount of information in the header where a function signature is declared. It may be that the library author wishes to avoid introducing a dependency on an expensive header, or wishes to preserve their freedom to change an implementation at a later date without changing the contract. In such cases, they would use traditional assert-like techniques in the function implementation. Note that this was one of the problems with the early experience of concepts for C++0x. Constrained function templates typically leaked all of their implementation details through concepts describing all of the expressions that must be valid when a template is instantiated. That is less of a problem for this proposal, as the library author chooses which parts of their contract to expose to the compiler, rather than being forced to expose the whole implementation.

Summary

Contracts form the basic interaction between a library and its users. As such, it is valuable to be able to annotate functions in a way that can expose contract information to the compiler and other tools. However, the final arbiter of what contract checking is performed, if any, at runtime and how that interacts with other contracts expressed in the language should be ultimately decided by the end user assembling the final software product, who can pick a set of options that conform with their particular development philosophy. There is no single approach that rings true for all.

Appendix A: Existing Support for Contracts in C++14

There are a number of features in the language that already convey contractual information from the library to the compiler, to help the user find errors early in their development.

`const` Correctness

The most obvious example is const correctness. This provides a compile-time check at the call site that a non-modifiable object is not passed into a function that is declared to take a pointer or reference that may modify the referenced argument. This also comes with some post-condition (compile-time) checking, as any attempt to modify an object from a const member function will be caught as an error.

Non-null Pointers

One of the most commonly encountered narrow contracts is that a pointer not be null. This case is supported directly in the C++ language (but not C) through the use of references, which are not allowed to bind to null pointers. A function written with references in its interface clearly advertises that it must be passed a valid object, and the compiler can optimize based on the knowledge that the hidden pointer cannot be null.

`unsigned` Integer Types

A function that cannot handle negative values can indicate this by taking its arguments as unsigned parameters. Often this is merely simplifying a contract to have a single upper bound, rather than both upper and lower bounds, but may be sufficient to widen a contract completely.

Note that there is still a strong school of thought that it is easier to detect misuse of passing a negative value through an asserted precondition than specifying a single large upper bound. The unsigned quantity is mostly a valuable hint for tools examining functions for inferred contracts.
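Both positions can be seen in a two-line comparison. The function names in_range_signed and in_range_unsigned are hypothetical.

```cpp
#include <limits>

// Signed parameter: the contract needs both a lower and an upper bound.
inline bool in_range_signed(int index, int size) {
    return 0 <= index && index < size;
}

// Unsigned parameter: a single upper-bound test suffices, because a negative
// argument has already been converted to a very large unsigned value by the
// time the function sees it.
inline bool in_range_unsigned(unsigned index, unsigned size) {
    return index < size;
}
```

With the unsigned form, a caller who passes -1 is reported as exceeding the upper bound by a huge margin, which is the less obvious diagnostic that the school of thought above objects to.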

Ownership Patterns And RAII

One of the key contracts built into the language is that for any successfully constructed object, unless it has dynamic storage duration, its destructor is guaranteed to be called. This is commonly exploited in library design. A common idiom used in library contracts is the notion of a smart-pointer, where the class confers additional ownership semantics on the owned pointer, allowing ownership of dynamic objects to be easily tracked across function boundaries by users and tools alike.
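As a brief sketch of ownership expressed in a signature, the following uses std::unique_ptr with a hypothetical tracked type that counts live instances.

```cpp
#include <memory>
#include <utility>

// Hypothetical resource type that counts how many instances are alive.
struct tracked {
    static int& live() {
        static int count = 0;
        return count;
    }
    tracked() { ++live(); }
    ~tracked() { --live(); }
    tracked(const tracked&) = delete;
    tracked& operator=(const tracked&) = delete;
};

// Returning std::unique_ptr states in the signature that the caller now
// owns the object.
inline std::unique_ptr<tracked> make_tracked() {
    return std::unique_ptr<tracked>(new tracked());
}

// Taking std::unique_ptr by value states that ownership is handed over; the
// parameter is destroyed when the call completes, releasing the object.
inline void consume(std::unique_ptr<tracked>) {}
```

Neither function needs a comment about who deletes the object, since the types carry that part of the contract.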

Pure Virtual Functions

Declaring a pure virtual function guarantees that objects of a functionally incomplete base class cannot be accidentally created. This catches a category of user error at compile time, early in the development cycle.

`override` Specifier In C++11

The override specifier in C++11 allows a library user to inform the compiler that they believe they are implementing a contract on a supplied library by correctly overriding a specific virtual function. This allows the compiler to check and flag an error to the user, granting greater confidence in the client code, and catching unexpected changes to the underlying library in a fast-evolving code base.
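Pure virtual functions and override work together, as in this small sketch with hypothetical shape and square classes.

```cpp
#include <type_traits>

struct shape {
    virtual ~shape() {}
    virtual double area() const = 0;  // pure virtual: shape is abstract
};

struct square : shape {
    explicit square(double side) : side_(side) {}

    // `override` asks the compiler to verify that this really overrides a
    // virtual function of the base; omitting `const` here would be an error
    // rather than a silently unrelated function.
    double area() const override { return side_ * side_; }

private:
    double side_;
};

static_assert(std::is_abstract<shape>::value,
              "shape cannot be instantiated");
static_assert(!std::is_abstract<square>::value,
              "square implements the full interface");
```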

`static_assert`

The static_assert language feature was a direct result of some of the early work on contracts for C++11. This declares to the compiler a condition that must always be true. It can be used to track problems building on different architectures (e.g., size or alignment of a type is not as expected) or when instantiating templates with user-supplied types.

Concepts

The pending Concepts Lite TS provides a rich feature that is all about defining contracts in the type system for generic code and templates.

`noexcept`

The noexcept exception specification is a post-condition contract that the compiler can query through the noexcept operator. By understanding this important post-condition, generic code can be tuned to take an optimal choice of library implementation. In addition, the compiler can perform some mechanical optimizations when it sees no expression inside a block can throw.
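One way generic code acts on this post-condition is sketched below, in the style of std::move_if_noexcept. The types safe_move and risky_move, the strategy enumeration, and relocation_strategy are hypothetical.

```cpp
#include <type_traits>
#include <utility>

// Implicit move constructor, which cannot throw.
struct safe_move {};

// User-provided move constructor without noexcept: it is assumed to throw.
struct risky_move {
    risky_move() = default;
    risky_move(const risky_move&) = default;
    risky_move(risky_move&&) {}
};

enum class strategy { move, copy };

// Generic code queries the post-condition and picks an implementation.
template <class T>
constexpr strategy relocation_strategy() {
    return std::is_nothrow_move_constructible<T>::value ? strategy::move
                                                        : strategy::copy;
}

// The standard library makes the same decision in std::move_if_noexcept.
static_assert(
    std::is_same<decltype(std::move_if_noexcept(std::declval<safe_move&>())),
                 safe_move&&>::value,
    "nothrow-movable types are moved");
static_assert(
    std::is_same<decltype(std::move_if_noexcept(std::declval<risky_move&>())),
                 const risky_move&>::value,
    "types whose move may throw are copied instead");
```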

`[[noreturn]]`

The [[noreturn]] attribute is another post-condition contract that the compiler can use to perform certain local optimizations, but mostly to better inform tools when a code-path will not return without requiring it to solve the halting problem. This leads to better diagnostics, helping the user find problems earlier, or at least have confidence that the tools are not reporting false-positives on impossible conditions.
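A short sketch of the effect on diagnostics, using hypothetical functions fail and parse_digit:

```cpp
#include <stdexcept>

// Promises the compiler and other tools that control never comes back.
[[noreturn]] inline void fail(const char* message) {
    throw std::runtime_error(message);
}

inline int parse_digit(char c) {
    if ('0' <= c && c <= '9') {
        return c - '0';
    }
    fail("not a digit");
    // No return statement is needed here: tools can see that this point is
    // unreachable, so they have no reason to warn about a missing return.
}
```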

Acknowledgements

The author would like to thank John Lakos for explaining the importance of contracts to libraries, and in particular for coining the terms for narrow and wide contracts that simplify an important aspect of discussions in this area. Thanks also to Chandler Carruth who helped clarify why a library contract violation is a distinct category from general undefined behavior. Additional thanks to Thorsten Ottosen and Lawrence Crowl for their pioneering work bringing this topic to the attention of the committee almost a decade ago! Finally, thanks to Kevlin Henney for showing me that proof-checking detailed contracts in the language, while great in principle, rarely works as well in practice. While they are all sources of great inspiration, this paper in no way attempts to characterize their own opinions on the subject, and should not be read as an implicit endorsement on their part!
