P3565R0: Virtual floating-point values

Audience: SG6
S. Davis Herring <herring@lanl.gov>
Los Alamos National Laboratory
January 10, 2025

Introduction

The standard says very little about the actual results of floating-point evaluations. [basic.fundamental]/12 says the “accuracy of operations” is implementation-defined; [expr.pre]/6 makes the situation even less clear by suggesting that floating-point operands and results are somehow not even values of their types. Indeed, it is of practical value that implementations often interpret expressions involving floating-point types as mathematical expressions in order to improve the performance and accuracy of computing their overall results. Common techniques include fusing multiplications and additions, discarding canceling terms, and temporarily using extra precision (e.g., in x87 registers). Strict application of well-defined floating-point operations is of course critical to other numerical algorithms; the footnote in /6 suggests that “The cast and assignment operators” may be used for that purpose.
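
For instance, in a sketch like the following, the first computation is free to fuse the multiplication with the addition or to keep excess precision, while the cast in the second forces the product to be rounded to a double first:

double a = /* ... */, b = /* ... */, c = /* ... */;
double r1 = a * b + c;                       // may be fused or use extra precision
double r2 = static_cast<double>(a * b) + c;  // the cast discards any excess precision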

These ideas are derived from C, which additionally defines the FLT_EVAL_METHOD macro to describe the implementation’s choice of evaluation formats for such expressions. Matthias Kretz presented Floating-Point Excess Precision to SG6 and EWG, seeking guidance on how to interpret these ideas most consistently in the context of C++’s stronger type system, constant evaluation, and the larger set of contemporary floating-point types. No clear direction has yet been reached, suggesting that further research may be needed.
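
In C++, the macro is available via <cfloat>; a minimal probe of the implementation’s choice might look like this (illustrative only):

#include <cfloat>
#include <cstdio>

int main() {
  // -1: indeterminable
  //  0: evaluate in the precision and range of each type
  //  1: evaluate float and double as double
  //  2: evaluate all three standard types as long double (e.g., x87)
  std::printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
}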

Discussion

The idea that an operator result of a type does not have one of the values of that type is obviously problematic from the perspective of defining semantics for such an operator. Moreover, the idea that assigning to a variable forces extended precision to be discarded is problematic in C++ because of the routine use of wrapper class types in mathematical expressions. The creation of every such object involves the initialization of its member variables, which seems just as strong as assignment in terms of incompatibility with extralinguistic extended precision.
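
Consider a minimal wrapper (hypothetical, but representative of unit, interval, and expression-template libraries):

struct Meters {
  double value;
};

Meters operator+(Meters a, Meters b) {return {a.value + b.value};}
Meters operator*(double s, Meters m) {return {s * m.value};}

// Each temporary initializes its member; if initialization rounds like
// assignment, 2.0 * a + b cannot keep excess precision the way the
// equivalent expression on raw doubles can.
Meters f(Meters a, Meters b) {return 2.0 * a + b;}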

An alternative approach is to extend the set of values for a floating-point type beyond those that can even theoretically be stored in the memory that an object of that type occupies. The result of the subexpression in a * b + c (all doubles) might then have a value outside the set of values that can be stored in a double’s space in memory (typically the binary64 set); the choice of that value conveys the additional information needed to obtain the correctly rounded result of the overall expression, which is of course what an FMA instruction computes. Similar careful choices of value from a larger set might capture the bits (or finiteness) lost in a + b - b; for x87 implementations, the larger set is simply the values supported by the register format.
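
As a concrete sketch (assuming binary64 arithmetic), std::fma can observe bits that the separately rounded product has already lost:

#include <cmath>
#include <cstdio>

int main() {
  double a = 1.0 + std::ldexp(1.0, -27);  // 1 + 2^-27, exactly representable
  double b = a;
  double prod = a * b;                    // exact product is 1 + 2^-26 + 2^-54,
                                          // which rounds to 1 + 2^-26 in binary64
  double err = std::fma(a, b, -prod);     // recovers the lost 2^-54 exactly
  std::printf("%a\n", err);               // prints 0x1p-54, not 0
}

An extended value for a * b could carry exactly this information through to the addition.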

The crucial specification technology is the same as that used for pointer provenance: the values are put in a many-to-one correspondence with the value representations of the type. (The presence of multiple rounding modes might require the formal duplication of values based on the representable value to which they round, but this matters only if the value representation is examined.) Note that every operation and object still has a single value. Aside from merely making the semantics tractable, the stability of values prevents unfortunate practical results like taking both branches in

const double x = /* ... */, y = x + epsilon / 4;
if(x < y) {     // x87 comparison
  // ...
}
// further operations that cause spilling...
if(x == y) {    // binary64 comparison
  // ...
}

or failing the assert in

float id(float f) {return f;}  // no computations
void call() {
  const float x = /* ... */;
  assert(3 * x == 3 * id(x));
}

Note the implication that, if id is not inlined, x must be given a representable value for consistency; passing +x instead would avoid that coupling but might then fail the assert.

For obvious practical reasons, a value that escapes past an optimization frontier cannot actually store information beyond its bit pattern. The stability requirement implies that any such value must be normalized to its “memory type” upon computation. However, even the member variables of wrapper objects can have an extended value if the optimizer sees their entire lifetime (as is typical for temporaries in the sorts of expressions we want to optimize freely), because extended values truly are values of the type. Similarly, assignment does not need the normalization effect described in the [expr.pre]/6 footnote; even a value updated in a loop may be extended so long as its intermediate values do not escape. Values passed to and returned from standard-library mathematical functions can also be extended.
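
The loop case might look like the following sketch (the function is illustrative): s could be kept at extended precision throughout, because only its final value is observable:

double dot(const double* x, const double* y, int n) {
  double s = 0.0;
  for(int i = 0; i != n; ++i)
    s = s + x[i] * y[i];  // intermediate values of s never escape
  return s;               // normalized only if the return is an optimization frontier
}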

As there are no opaque functions (merely insufficiently clever optimizers), it is only prudent to retain an explicit means of requiring normalization; static_cast is the obvious candidate (the other part of the footnote), although std::memcpy would also have the effect of selecting the canonical value associated with a value representation. (For pointers, std::memcpy needs to be able to preserve the abstract-machine information to prevent undefined behavior, but here preserving the extended value would be both unnecessary and difficult to specify, since it does affect observable behavior.)
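
Either spelling would then provide a guaranteed rounding point; as a sketch of the intended semantics:

#include <cstring>

double normalize(double x) {return static_cast<double>(x);}  // selects the canonical value

double normalize2(double x) {
  double r;
  std::memcpy(&r, &x, sizeof x);  // copying the value representation has the same effect
  return r;
}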

Proposal

Modify the definition of trivially-copyable types to allow floating-point types to have multiple values (one of which is canonical) per value representation. Specify that acquiring a value representation gives a floating-point object the corresponding canonical value.

Replace the “greater precision and range” provision ([expr.pre]/6) with a note about the contextual dependence of rounding. Specify that unary + can, and that static_cast does, round floating-point values to their corresponding canonical values.
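
The distinction might be exercised as follows (an illustrative sketch):

double a = /* ... */, b = /* ... */;
double x = a * b;                   // may hold an extended value
double p = +x;                      // permitted, but not required, to round
double c = static_cast<double>(x);  // guaranteed to hold the canonical value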