Reproducible floating-point results

1. Abstract

The C++ standard, hereafter referred to as [C++], has very little to say about floating-point specification, which means that it is extremely hard to write floating-point expressions that, given the same inputs and rounding mode, will produce the same results on all implementations. This paper proposes the introduction of a new type which specifies sufficient conformance with the provisions specified in ISO/IEC 60559:2020 (which is itself imported from IEEE Std 754-2019, unmodified except for editorial adjustment), hereafter referred to as [559], to guarantee this reproducibility.

2. Revision history

R0 2024-09-09

Presented to the BSI C++ panel on 2024-09-30

R1 2024-10-08

  • Extended the contributor list. Many thanks, all of you.
  • Clarification of the relationship between IEE-754:2019 and ISO/IEC 60559:2020.
  • Added revision list.
  • Expanded motivation, clarifying non-associativity and varying error requirements.
  • Added banking use case.
  • Clarification of the need for bitwise identical results in simulations for games.
  • Added geometry use case.
  • Highlighted para 20 of Annex F.3 of the C standard.
  • Outlined non-goals.
  • Additional section on the provision of correctly rounded mathematical functions.
  • Added bibliography to the references for background reading.

3. Motivation

From [559] Clause 11:

Programs that can be reliably translated into an explicit or implicit sequence of reproducible operations on reproducible formats produce reproducible results. That is, the same numerical and reproducible status flag results are produced.

Reproducible results require cooperation from language standards, language processors and users. A language standard should support reproducible programming.

C++ does not support reproducible programming. In fact, the standard does not specify floating point behaviour at all. In [C++], [basic.fundamental] paragraph 12:

Except as specified in 6.8.3, the object and value representations and accuracy of operations of floating-point types are implementation-defined.

Even in 6.8.3 [basic.extended.fp], we read:

If the implementation supports an extended floating-point type [basic.fundamental] whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std​::​float16_t is defined in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT16_T__ is defined [cpp.predefined], and the floating-point literal suffixes f16 and F16 are supported [lex.fcon].

The subsequent paragraphs continue with the other binary interchange formats specified in Clause 3 of [559]. All that is specified is that the interchange format is supported. Behaviour is not specified and thus remains implementation defined.

C++ programmers might expect that any code they write shall behave identically on any platform, in much the same way that 2 + 2 will always equal 4. However, the very broad latitude granted here means that this is not necessarily the case, and that floating-point calculations are not necessarily reproducible on all platforms.

Worse than that, even the same function within the same executable compiled with the same compiler flags but called in different contexts can produce subtly different results due to different optimisations after inlining and due to link time optimisation. Even moving a function out of a header file to a source file is not guaranteed to help matters.

Of course, it bears clarifying that floating point numbers can only represent a small subset of their range. The results of operations are correct within specific rounding. How programs deal with this error is up to the programmer, and different domains have different error requirements. Additionally, floating point arithmetic is not associative: a + (b + c) is not necessarily equal to (a + b) + c. Since compilers may reorder instructions for optimisation purposes, this can have an unwanted effect on accuracy. Several domains would benefit from floating-point reproducibility.

Banking

The global banking sector is worth over $6 trillion [BAN]. Banks have very strict regulatory requirements relating to pricing and risk models. It is very common for software systems in banks to be deployed to several different platforms, including Windows and Linux, but also on different hardware platforms where the same models are used on AMD and Intel CPUs, but also on GPUs.

The fact that arithmetic on different platforms produces different results is an enormous challenge. When the business department signs off a specific set of figures generated on a specific platform, then any different behaviour on a different platform invalidates those sign-offs. This results in a huge challenge for many financial systems and having a possibility to get reproducible numbers across platforms would be invaluable in such contexts.

Game development

The video game industry is worth over $340bn worldwide [VGI]. Games typically provide a simulation which players interact with, sometimes from different machines over the internet. As each player interacts with the simulation, those interactions must be reflected on all machines so that the players continue to interact with the same simulation. This can only be achieved with bitwise identical results, otherwise the effects of errors propagate chaotically through these iterative function systems.

Games can be deployed to multiple platforms, typically to mobile phones, consoles, desktop and laptop computers. These platforms all enjoy different tool chains. Since simulations are specified using floating-point arithmetic, this prevents players on different platforms playing together unless the simulation is extremely basic, and the entire simulation can be transmitted trivially across the internet to each player. The divergence in floating-point results across implementations means that the simulation cannot remain synchronised across different platforms.

Reproducibility eliminates this divergence, opening up the possibility of all players being able to compete with one another regardless of their hardware platform.

Scientific computing

Since behaviour is not specified, implementations are free to fuse and reorder instructions to benefit execution speed. Sometimes this can have catastrophic results. From [GOL] 1.4:

Catastrophic cancellation occurs when the operands are subject to rounding errors. For example, in the quadratic formula, the expression b2 – 4ac occurs. The quantities b2 and 4ac are subject to rounding errors since they are the results of floating-point multiplications. Suppose they are rounded to the nearest floating-point number and so are accurate to within 1/2 ulp. When they are subtracted, cancellation can cause many of the accurate digits to disappear, leaving behind mainly digits contaminated by rounding error. Hence the difference might have an error of many ulps.

The paper then goes on to describe some strategies for mitigating this problem, involving rearrangement of operations. However, since there is no guarantee in the standard that the order of evaluation will be honoured, such rearrangement may be futile and catastrophic cancellation cannot be avoided.

Reproducibility requires the order of evaluation to be specified unambiguously. As a result, catastrophic cancellation can be avoided.

Geometry

Relevant to game development but also to the wider mathematical community, a typical problem in geometry is containment: does a point lie inside or outside of a polygon. One algorithm is to find the centroid of the polygon, find the nearest edge to the point, and find the line that connects the centroid, the point and the edge. If the distance from the point to the centroid exceeds the distance of the edge from the centroid, then it is outside of the polygon.

If the difference between these two values is very small, then rounding, fusion or reordering can yield different truth values for whether or not the polygon contains the point. This can be a very bad bug.

4. Status quo

The obvious question is β€œwhy are floating-point operations not reproducible on different implementations?”

Consider the following example available on [CEX]:

int main() {
	float line_0x(0.f);
	float line_0y(7.f);

	float normal_x(0.57f);
	float normal_y(0.8f);

	float px(10.f);
	float py(0.f);

	float v2p_x = px - line_0x;
	float v2p_y = py - line_0y;

	float distance = v2p_x * normal_x + v2p_y * normal_y;

	float direction_x = normal_x * distance;
	float direction_y = normal_y * distance;

	float proj_vector_x = px - direction_x;
	float proj_vector_y = py - direction_y;

	std::cout << "distance:      " << distance << std::endl;
	std::cout << "proj_vector_y: " << proj_vector_y << std::endl;
}

When built with version 24.5 of the Nvidia C++ compiler, without specifying any compiler options, the output is:

distance:  	   0.0999997
proj_vector_y: -0.0799998

When built with version 18.1.0 of the clang C++ compiler, without specifying any compiler options, the output is:

distance:  	   0.0999999
proj_vector_y: -0.0799999

Worse, if -march=skylake is passed to the clang C++ compiler, the output is:

distance:  	   0.1
proj_vector_y: -0.08

Analysing the memory of each running program reveals that, indeed, the bit patterns vary. Application of optimisations leads to variations between implementations in the nature, number and order of floating-point operations being executed during the program. This leads to different numbers of rounding operations, in different orders, and thus different values on different platforms.

5. Prior art

Before considering an approach, we should consider the approaches taken by the other languages standardised through ISO/IEC.

ISO/IEC 8652:2023 Ada

The Ada standard does not normatively reference any revision of ISO/IEC 60559. It defines its own floating-point format and operations. However, there is an expectation that some future version of Ada will indeed support a revision of ISO/IEC 60559.

ISO/IEC 13211-1:1995/Cor 3:2017 Prolog

The Prolog standard makes no reference to any revision of ISO/IEC 60559 whatsoever.

ISO/IEC 1989:2023 COBOL

The COBOL standard specifies language support for the facilities defined by ISO/IEC 60559:2011 in several paragraphs throughout sections such as 8.8.1 Arithmetic expressions, 8.8.4 Conditional expressions, 14.6.8 Alignment and transfer of data into data items and in Annex D.17 Forms of arithmetic. Typically, the specification is similar to the form:

All additions, subtractions, multiplications and divisions performed in the development of the result shall be performed in accordance with the corresponding rules in ISO/IEC/IEEE 60559.

Particularly, the specification directs implementers to follow ISO/IEC 60559:2011. There is no option to avoid supporting these facilities; all arithmetic support is specified with reference to ISO/IEC 60559:2011.

ISO/IEC 1539-1:2023 Fortran

The FORTRAN standard specifies, in Clause 17, language support for the facilities defined by [559]. From the first paragraph of that clause:

The intrinsic modules IEEE_EXCEPTIONS, IEEE_ARITHMETIC, and IEEE_FEATURES provide support for the facilities defined by ISO/IEC 60559:2020. Whether the modules are provided is processor dependent. If the module IEEE_FEATURES is provided, which of the named constants defined in this document are included is processor dependent.

The decision to support [559] facilities is not up to the implementer. Their only constraint is whether the processor supports [559] facilities. Again, all operations are specified with reference to [559].

ISO/IEC 9899:2018 C

In its annexes F though H, the C standard specifies language support for the facilities defined by [559]. From Annex F, Introduction, para 1:

This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Floating-point arithmetic (ISO/IEC 60559:2020), also designated as IEEE Standard for Floating-Point Arithmetic (IEEE 754–2019). IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc.

It does not mandate support for floating-point arithmetic as defined by [559], and implementations are free to ignore that standard when implementing floating-point arithmetic. The annexes explicitly describe how an implementation SHOULD support [559] floating point arithmetic, not how an implementation SHALL support it. Even then, at Annex F.3, para 20:

The C functions in the following table correspond to mathematical operations recommended by IEC 60559. However, correct rounding, which IEC 60559 specifies for its operations, is not required for the C functions in the table. 7.33.8 (potentially) reserves cr_ prefixed names for functions fully matching the IEC 60559 mathematical operations. In the table, the C functions are represented by the function name without a type suffix.

Clearly, correct rounding is a challenging and expensive feature to support.

ISO/IEC DIS 14882:2023 C++

[C++] barely references [559]. In fact, only when defining the optional extended floating point types in section 6.8.3 [basic.extended.fp] is there any meaningful reference. The type trait std::is_iec559 is true for those types, and signals that those types offer some representations from that specification.

6. Possible approaches

There are several ways in which reproducible floating-point operations could be achieved. Now is the time to point out that the intent of this paper is not to specify full conformance with [559]. That would be a substantial undertaking, although this author is interested in further work in that direction, particularly in consideration of [P3397]. Nor does this paper make any proposals relating to parallel reproducibility. The purpose of this paper is to introduce a solution to the problem of reproducibility for some use cases.

Also, any solution which prioritises reproducibility does so at the cost of performance. The free hand given to implementers regarding operations on floating point numbers opens up opportunities for optimisation that would be unavailable if reproducibility were required. Therefore, the choice of reproducibility must be made by the programmer, mindful of the performance penalty this introduces.

Compiler options

One approach is to look at what options compilers already offer for reproducibility. Surveying clang, GCC and MSVC, there is no explicit support for reproducibility. However, they all offer opportunities for controlling the way that floating-point maths arithmetic is implemented.

For example, MSVC offers a number of command line flags prefixed /fp. Of note are two options, /fp:fast and /fp:strict. /fp:fast allows the compiler to reorder, combine or simplify floating-point operations to optimise floating-point code for speed and space. /fp:strict preserves the source ordering and rounding properties of floating-point code when it generates object code, and observes [559] specifications when handling special values. Similarly, GCC offers -fno-fast-math as a command line option, which offers similar behaviour to MSVC’s /fp:strict option, although it is not sufficient on its own to get strict floating-point behaviour.

However, this relies on details of the implementation and is not future-proof. Additionally, optimisation flags can interfere with these floating-point arithmetic choices. This is a deeply unergonomic solution that is vulnerable to future decisions of the implementers.

Fences

Another approach might be to signal to the compiler that ordering and rounding properties be untouched in selected places by using a new kind of fence. This would be specified to prevent the compiler from reordering, combining or simplifying floating-point operations. It would ensure that the same number of rounding operations took place regardless of implementation, the same rounding modes were used regardless of implementation, and that the operations took place in the same order, regardless of implementation. Writing a larger floating-point expression as a series of single expression statements, interleaved with this new kind of fence, would detail to the compiler exactly what was wanted.

Again, this is a rather unergonomic solution. Additionally, given that the accuracy of operations of floating-point types are implementation-defined, this would require that latitude to be tightened and for floating-point behaviour to be fully specified. If that specification is to take place, then specifying reproducibility as defined in [559] would be the least astonishing thing to do.

Specification overhaul

The next approach to consider is following the example of prior art. Looking at COBOL and Fortran, their standards dictate that all floating-point arithmetic conforms to a revision of ISO/IEC 60559. Following this pattern, the C++ standard would need to be respecified to ensure that the object and value representations and accuracy of operations of the standard and extended optional floating-point types conform fully to a revision of ISO/IEC 60559.

While this solution offers much better ergonomics for the programmer, it would degrade performance of existing code, requiring programmers to interact more carefully with the command line options of their compilers.

#pragma options

[C++] does not require compilers to support any pragmas, but many implementations support the ISO C standard pragmas used to control floating-point behaviour:

#pragma STDC FENV_ACCESS
#pragma STDC FP_CONTRACT
#pragma STDC CX_LIMITED_RANGE

These could be introduced into the language, along with an additional pragma, for example:

#pragma STDC FP_REPRODUCIBLE

While this seems like a simple solution, it would still require re-specification of existing types, which takes us very close to the specification overhaul described above. Also, it would require that any function called from within the block covered by the pragma would also be modified with such a pragma.

Function qualification

Just as class member functions can be qualified as const, a new function qualifier could be introduced which indicates that all floating-point operations within the scope of the function should be reproducible. This has the advantage of being a transitive property, like const, and users would be able to call other functions that are similarly qualified and retain reproducibility.

Unfortunately, we are in the same position with pragmas: this approach would still require re-specification of existing types.

A new core type

We have std::int_fast32_t, std::int_least32_t and the related types of differing width specifications. We also have the optional extended floating-point types std::float32_t and so on. We could take the same approach with floating-point types.

A new type could be introduced that supports all the specifications of Clause 11 of [559], particularly reproducible operations, format, status flags, and an implicit reproducible attribute, with a name like std::float_strict32_t. The specification of this type would prohibit reordering, fusion and variation in rounding. All rounding should be roundTiesToEven, as specified in Clause 4 of [559].

The requirement for reproducibility would mean that it would not be feasible to specify implementations of the contents of the <cmath> header. Users would be expected to provide all operations beyond addition, subtraction, multiplication and division.

This solution contains all the constraints of the problem to a single type, rather than requiring annotation distributed through code. However, introducing a new type to the core language is a considerable undertaking.

A new library type

There is a proposal in flight which partially solves this problem, [P2746R5] Deprecate and Replace Fenv Rounding Modes. From its abstract:

We argue that floating point rounding modes as specified by fesetround() are largely unusable, at least in C++. Furthermore, this implicit argument to floating point operations, which is almost never used, and largely opaque to the compiler, causes both language design and compilation problems. We thus make the somewhat drastic proposal to deprecate it, in spite of its long history, and to replace it with a much better-behaved facility.

In summary, that proposal proposes the elimination of functions fesetround and fegetround and replacing them with a struct which offers a very small set of the [559]-specified operations along with a rounding mode provided at construction. The value is not stored in the struct; rather, the member functions which are binary operations take two operands and apply the operation with the stored rounding mode. The struct effectively defines the rounding attribute as an object with a lifetime rather than as a #pragma qualifying a section of source code.

By additionally specifying that the operations cannot be fused or reordered, rounding mode would be respected, the correct number of roundings would be applied in the same order on all implementations, and all operations would be reproducible. This carries the penalty that a function call may be incurred for every operation, rather than some function calls being optimised away via FMA, for example. However, just as there is a trade-off between accuracy and speed, so is there a trade-off between reproducibility and speed. As described earlier, compiler flags specify optimisations that improve floating-point performance at the cost of accuracy. None of these optimisations would necessarily be available to this type.

Alternatively, a type can be proposed which is a companion type to the type proposed in [P2746R5]. At Six Impossible Things Before Breakfast, we have implemented several classes that we use to emulate reproducible floating-point operations. In the next section we shall look at the design of such a type.

7. API design

[559] conformance

From [559] Clause 11:

Reproducible floating-point numerical and status flag results are possible for reproducible operations, with reproducible attributes, operating on reproducible formats, defined for each language as follows:

  • A reproducible operation is one of the operations described in Clause 5 or is a supported operation from 9.2, 9.3, 9.5, or 9.6.
  • A reproducible attribute is an attribute that is required by a language standard in all implementations (see Clause 4).
  • A reproducible status flag is one raised by the invalid operation, division by zero, or overflow exceptions (see 7.2, 7.3, and 7.4).
  • A reproducible format is an arithmetic format that is also an interchange format (see Clause 3).

Programs that can be reliably translated into an explicit or implicit sequence of reproducible operations on reproducible formats produce reproducible results. That is, the same numerical and reproducible status flag results are produced.

Reproducible results require cooperation from language standards, language processors, and users. A language standard should support reproducible programming. Any conforming language standard supporting reproducible programming shall:

  • Support the reproducible-results attribute.
  • Support a reproducible format by providing all the reproducible operations for that format.
  • Provide means to explicitly or implicitly specify any sequence of reproducible operations on reproducible formats supported by that language.

and shall explicitly define:

  • Which language element corresponds to which supported reproducible format.
  • How to specify in the language each reproducible operation on each supported reproducible format.
  • One or more unambiguous expression evaluation rules that shall be available for user selection on all conforming implementations of that language standard, without deferring any aspect to implementations. If a language standard permits more than one interpretation of a sequence of operations from this standard it shall provide a means of specifying an unambiguous evaluation of that sequence (such as by prescriptive parentheses).
  • A reproducible-results attribute, as described in 4.1, with values to indicate when reproducible results are required or reproducible results are not required. Language standards define the default value. When the user selects reproducible results required:
    1. Execution behavior shall preserve the literal meaning (see 10.4) of the source code.
    2. Conversions to and from external decimal character sequence shall not limit the maximum supported precision H (see 5.12.2).
    3. Language processors shall indicate where reproducibility of operations that can affect the results of floating-point operations can not be guaranteed.
    4. Only default exception handling (see Clause 7) shall be used.
    If a language supports separately compiled routines (e.g., library routines for common functions) there must be some mechanism to ensure reproducible behavior.

To encapsulate enough of these requirements into a single type to achieve reproducibility, we need to decide which set of operations to support, which attributes to support and how, how to support the status flags and which interchange formats to support.

Operations

[559] Clause 5 lists a large number of operations over several sections. Many of these operations have analogues in [C++] within the <cmath> header, for example rounding and succession (5.3.1), some of the arithmetic operations (5.4.1), trigonometric functions (9.2) and so on.

[C++] does not specify the algorithms or rounding used in the calculation of these functions. It will not be feasible to expect reproducibility across implementations when the functions in <cmath> are invoked.

Therefore, we propose the set of operations to support should be limited to the fundamental operations of add, subtract, multiply and divide. All other functions should be implemented by the user using these operations.

We should also consider how to specify the addition of different reproducible types, since this may require conversion, which must be achieved reproducibly.

Attributes

The relevant attributes to support are rounding-direction and reproducibility. The rounding-direction attribute is discussed by [P2746R5]; it is bound up as a member of the type specified therein rather than explicitly specified for a block of code. Similarly, this proposal specifies the reproducibility attribute implicitly as part of this type.

However, the rounding attribute must be specified in some way, either dynamically as a data member of the type or statically as part of the type via a template parameter. There are four rounding attributes specified in [559] Clause 4 for a binary format implementation.

One attractive feature of the type would be for it to be a drop-in replacement for regular floating-point types. To do this, it would have to be the same size as those types. Specifying attributes dynamically would increase the size of the type beyond the size of the representation of the value being contained, so these should either be enumerated and provided as a template parameter, or, like the reproducibility parameter, be implicitly part of the type, thus requiring four separate types, one for each rounding mode.

If operations are requested on reproducible types with different rounding modes, which one should be applied? Clearly, the operand being modified should have its rounding mode respected.

Status flags

Status flags will be raised by the operations, so these need to be specified with them. At the moment, there is no specification in [C++] for responding to floating-point operations that behave unexpectedly; [basic.fundamental] paragraph 12 disposes of this by leaving everything implementation defined.

However, in <cfenv> there is provision for setting exception flags. The macros FE_DIVBYZERO, FE_INVALID and FE_OVERFLOW cover the required status flags. Although not specified by Clause 11 of [559], FE_INEXACT could also be specified for signalling.

Formats

Clause 3 of [559] defines four binary interchange formats binary16, binary32, binary64 and binary128. These are analogous to the optional extended floating-point types described in [C++] at [basic.extended.fp] named std::float16_t, std::float32_t, std::float64_t, and std::float128_t.

Class template or multiple similar classes?

With four formats and four rounding modes to choose from, we must decide if we want to specify sixteen similar types (one for each combination), or specify a class template.

This hints at a problem described earlier, optimisation. This type exists to tell the compiler that it should be immune from optimisation. If the specification can forbid participation in optimisations which may reorder or fuse operations, or otherwise potentially change the result of an operation, then we have a solution. Otherwise, the implementation will need to be provided in the runtime as a separate library out of reach of such optimisations, including link time optimisation. This is a QoI decision.

Math library

C++ offers a maths library in the <cmath> header. The standard does not specify how these functions shall be implemented. This makes reproducibility unachievable without precisely specifying the algorithms, something that the standard has declined to do since superior algorithms may emerge,or may be optimised for the target processor.

Implementations of some correctly-rounded elementary functions have been widely available for a while. Sun’s [MCR] from 2004 has exp, log, pow, the trigonometric lines, and the one-parameter arctangent. The INRIA’s [CRL] from 2007 also has the arcsine and arccosine, expm1, log1p, as well as the trigonometric functions and their inverses in half-turns rather than radians. Worth noting is the [MCS] project (Zimmermann et al.), which is under active development.

8. Proposed wording

namespace std { enum class iec_559_rounding_mode : unspecified; // freestanding inline constexpr iec_559_rounding_mode iec_559_round_ties_to_even = iec_559_rounding_mode::round_ties_to_even; // freestanding inline constexpr iec_559_rounding_mode iec_559_round_toward_positive = iec_559_rounding_mode::round_toward_positive; // freestanding inline constexpr iec_559_rounding_mode iec_559_round_toward_negative = iec_559_rounding_mode::round_toward_negative; // freestanding inline constexpr iec_559_rounding_mode iec_559_round_toward_zero = iec_559_rounding_mode::round_toward_zero; // freestanding template<class T, iec_559_rounding_mode R = iec_559_round_ties_to_even> class strict_float; // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(strict_float<T, R>, <T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(strict_float<T, R>, T); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(T, strict_float<T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(strict_float<T, R>, <T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(strict_float<T, R>, T); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(T, strict_float<T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(strict_float<T, R>, <T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(strict_float<T, R>, T); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(T, strict_float<T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(strict_float<T, R>, <T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(strict_float<T, R>, T); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(T, strict_float<T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(strict_float<T, R>); // freestanding template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(strict_float<T, R>); // freestanding }

?.?.? Class template strict_float [strict.float.generic]

namespace std { template<class T, iec_559_rounding_mode R = round_ties_to_even> class strict_float { public: using value_type = T; constexpr strict_float(T); constexpr strict_float(const strict_float&) = default; template<class X, iec_559_rounding_mode Y> constexpr explicit strict_float(const strict_float&); constexpr operator value_type() const; constexpr strict_float& operator= (const T&); constexpr strict_float& operator+=(const T&); constexpr strict_float& operator-=(const T&); constexpr strict_float& operator*=(const T&); constexpr strict_float& operator/=(const T&); }; }
The class strict_float describes an object that can store a value whose format is one of those specified in Clause 3.4 of ISO/IEC 60559:2020, whose rounding mode is one of those specified in Clause 4.3 of ISO/IEC 60559:2020, and which supports reproducibility as specified in Clause 11 of ISO/IEC 60559:2020.
Template parameter T shall be one of float16_t, float32_t, float64_t or float128_t.
Execution behavior of operations on objects of type strict_float shall preserve the literal meaning of the source code. Operations shall not be reordered or fused.

??.?.?.? Member functions [strict.float.fun]

constexpr strict_float(T& val);
Postconditions: operator value_type() returns val.
template<class X, iec_559_rounding_mode Y> constexpr explicit strict_float(const strict_float<X, Y>& other);
Effects: Initialises the object with the value of other.
constexpr operator value_type() const;
Returns: the value of the object.

??.?.?.? Member operators [strict.float.ops]

constexpr strict_float& operator+=(const T& rhs);
Effects: Adds the value of rhs to the value of *this, using the rounding mode specified by template parameter R, storing the result in *this.
Returns: *this.
Postconditions: If rounding was necessary to store the result of the addition then fetestexcept(FE_INEXACT) shall be true.
Postconditions: If the result of the addition is too large to be representable then fetestexcept(FE_OVERFLOW) shall be true.
constexpr strict_float& operator-=(const T& rhs);
Effects: Subtracts the value of rhs from the value of *this, using the rounding mode specified by template parameter R, storing the result in *this.
Returns: *this.
Postconditions: If rounding was necessary to store the result of the subtraction then fetestexcept(FE_INEXACT) shall be true.
Postconditions: If the result of the subtraction is too large to be representable then fetestexcept(FE_OVERFLOW) shall be true.
constexpr strict_float& operator*=(const T& rhs);
Effects: Multiplies the value of *this by the value of rhs, using the rounding mode specified by template parameter R, storing the result in *this.
Returns: *this.
Postconditions: If rounding was necessary to store the result of the multiplication then fetestexcept(FE_INEXACT) shall be true.
Postconditions: If the result of the multiplication is too large to be representable then fetestexcept(FE_OVERFLOW) shall be true.
constexpr strict_float& operator/=(const T& rhs);
Effects: Divides the value of *this by the value of rhs, using the rounding mode specified by template parameter R, storing the result in *this.
Returns: *this.
Postconditions: If rhs equals 0 then fetestexcept(FE_DIVBYZERO) shall be true.
Postconditions: If rounding was necessary to store the result of the division then fetestexcept(FE_INEXACT) shall be true.
Postconditions: If the result of the division is too large to be representable then fetestexcept(FE_OVERFLOW) shall be true.
Postconditions: If the result of the division was subnormal with a loss of precision then fetestexcept(FE_UNDERFLOW) shall be true.

??.?.?.? Non-member operators [strict.float.nmops]

template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(strict_float<T, R> lhs, strict_float<T, R> rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(strict_float<T, R> lhs, T rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator+(T lhs, strict_float<T, R> rhs);
Returns: strict_float<T, R>(lhs) += rhs.
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(strict_float<T, R> lhs, strict_float<T, R> rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(strict_float<T, R> lhs, T rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator-(T lhs, strict_float<T, R> rhs);
Returns: strict_float<T, R>(lhs) -= rhs.
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(strict_float<T, R> lhs, strict_float<T, R> rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(strict_float<T, R> lhs, T rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator*(T lhs, strict_float<T, R> rhs);
Returns: strict_float<T, R>(lhs) *= rhs.
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(strict_float<T, R> lhs, strict_float<T, R> rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(strict_float<T, R> lhs, T rhs);
template<class T, iec_559_rounding_mode R> constexpr strict_float<T, R> operator/(T lhs, strict_float<T, R> rhs);
Returns: strict_float<T, R>(lhs) /= rhs.

Feature test macro

??.?.? Header <version> synopsis [version.syn]

#define __cpp_lib_strict_float 2024??L

9. Open questions

  1. What other domains have uses for reproducibility?
  2. [P2746R5] also specifies square root as an operation. Should this also be supported in this API? How about FMA? Any other operations?
  3. Should the rounding attribute be specified statically or dynamically? If statically, should there be a separate type for each attribute, or should the attribute be a class template parameter?
  4. Are all the rounding modes for binary arithmetic required? Are they used?
  5. Can specification be provided which will prevent code-motion optimisations?
  6. Should a single class template, over format and rounding, be provided, or sixteen individual classes?
  7. Can the type proposed in [P2746R5] fully meet all the requirements of reproducibility, or can the proposal be easily modified so to do?
  8. Clause 11 of [559] does NOT specify the inexact status flag as one of the reproducible flags. Is the overspecification in the wording problematic?
  9. Can core language types be specified β€œas-if” they were a full specialisation of std::strict_float? E.g. std::float32_strict_rte_t is specified β€œas-if” it were std::strict_float<std::float32_t, std::round_ties_to_even>?
  10. Should the non-member binary operators be hidden friends?
  11. Should there be member functions which perform the operations and offer the user the opportunity to override the rounding mode?
  12. Is there appetite for fully specifying some or all of the optional extended floating-point types described in [basic.extended.fp] such that they are conformant to some extent with [559]

10. References and bibliography

[559] ISO/IEC JTC 1/SC 25 (May 2020). ISO/IEC 60559:2020 β€” Information technology β€” Microprocessor Systems β€” Floating-Point arithmetic. ISO. pp. 1–74.

[BAN] Market capitalization of the 100 largest banks worldwide from 1st quarter 2016 to 1st quarter 2024 https://www.statista.com/statistics/265135/market-capitalization-of-the-banking-sector-worldwide/

[BYR] Beware of fast-math (Dec 2021). https://simonbyrne.github.io/notes/fastmath/

[C++] C++23 Working Draft https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf

[CEX] Compiler Explorer https://godbolt.org/z/W8shs6dao

[CRL] crlibm https://github.com/taschini/crlibm/blob/master/crlibm.h

[GOL] What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg, https://dl.acm.org/doi/10.1145/103162.103163

[HR1] You're Going To Have To Think!, Overload post. https://accu.org/journals/overload/18/99/harris_1702/

[HR2] Why Fixed Point Won't Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/18/100/harris_1717/

[HR3] Why Rationals Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/101/harris_1986/

[HR4] Why Computer Algebra Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/102/harris_1979/

[HR5] Why Interval Arithmetic Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/103/harris_1974/

[HNE] Beware of fast-math, Hacker News discussion. https://news.ycombinator.com/item?id=29201473

[MCR] Correctly rounded libm https://github.com/simonbyrne/libmcr

[MCS] The mathematical library for critical systems (LibmCS) https://gitlab.com/gtd-gmbh/libmcs/-/tree/core-math

[MPF] The GNU MPFR Library https://www.mpfr.org/

[OBI] Cross-Platform Issues With Floating-Point Arithmetics in C++, GΓΌnter Obiltschnig, https://www.appinf.com/download/FPIssues.pdf

[P2746R5] Deprecate and Replace Fenv Rounding Modes, Hans Boehm, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2746r5.pdf

[P3397R0] Clarify requirements on extended floating point types, Hans Boehm, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3397r0.pdf

[SEI] Deterministic cross-platform floating point arithmetics. http://christian-seiler.de/projekte/fpmath/

[UNU] Unum and Posit https://posithub.org/

[VGI] Video game industry - Statistics & Facts https://www.statista.com/topics/868/video-games/