Reproducible floating-point results
1. Abstract
The C++ standard, hereafter referred to as [C++], has very little to say about floating-point specification, which means that it is extremely hard to write floating-point expressions that, given the same inputs and rounding mode, will produce the same results on all implementations. This paper proposes the introduction of a new type which specifies sufficient conformance with the provisions specified in ISO/IEC 60559:2020 (which is itself imported from IEEE Std 754-2019, unmodified except for editorial adjustment), hereafter referred to as [559], to guarantee this reproducibility.
2. Revision history
R0 2024-09-09
Presented to the BSI C++ panel on 2024-09-30
R1 2024-10-08
- Extended the contributor list. Many thanks, all of you.
- Clarification of the relationship between IEEE 754-2019 and ISO/IEC 60559:2020.
- Added revision list.
- Expanded motivation, clarifying non-associativity and varying error requirements.
- Added banking use case.
- Clarification of the need for bitwise identical results in simulations for games.
- Added geometry use case.
- Highlighted para 20 of Annex F.3 of the C standard.
- Outlined non-goals.
- Additional section on the provision of correctly rounded mathematical functions.
- Added bibliography to the references for background reading.
R2 2025-01-13
- Extended the contributor list. Many thanks, all of you.
- Expanded the status quo to include discussion of the compile-time floating point environment and the parsing of literals.
- Added discussion of other papers related to the problem space.
- Added section on new direction for investigation with focus on core type formed from storage and attributes.
- Withdrew library type proposal and wording.
- Expanded core type proposal.
- Updated open questions.
3. Motivation
From [559] Clause 11:
Programs that can be reliably translated into an explicit or implicit sequence of reproducible operations on reproducible formats produce reproducible results. That is, the same numerical and reproducible status flag results are produced.
Reproducible results require cooperation from language standards, language processors and users. A language standard should support reproducible programming.
C++ does not support reproducible programming. In fact, the standard does not specify floating point behaviour at all. In [C++], [basic.fundamental] paragraph 12:
Except as specified in 6.8.3, the object and value representations and accuracy of operations of floating-point types are implementation-defined.
Even in 6.8.3 [basic.extended.fp], we read:
If the implementation supports an extended floating-point type [basic.fundamental] whose properties are specified by the ISO/IEC 60559 floating-point interchange format binary16, then the typedef-name std::float16_t is defined in the header <stdfloat> and names such a type, the macro __STDCPP_FLOAT16_T__ is defined [cpp.predefined], and the floating-point literal suffixes f16 and F16 are supported [lex.fcon].
The subsequent paragraphs continue with the other binary interchange formats specified in Clause 3 of [559]. All that is specified is that the interchange format is supported. Behaviour is not specified and thus remains implementation defined.
C++ programmers might expect that any code they write will behave identically on any platform, in much the same way that 2 + 2 always equals 4. However, the very broad latitude granted here means that this is not necessarily the case, and that floating-point calculations are not necessarily reproducible on all platforms.
Worse than that, even the same function within the same executable compiled with the same compiler flags but called in different contexts can produce subtly different results due to different optimisations after inlining and due to link time optimisation. Even moving a function out of a header file to a source file is not guaranteed to help matters.
Of course, it bears clarifying that floating-point numbers can represent only a finite subset of the reals, so the result of each operation is subject to rounding error. How programs deal with this error is up to the programmer, and different domains have different error requirements. Additionally, floating-point arithmetic is not associative: a + (b + c) is not necessarily equal to (a + b) + c. Since compilers may reorder instructions for optimisation purposes, this can have an unwanted effect on accuracy. Several domains would benefit from floating-point reproducibility.
Banking
The global banking sector is worth over $6 trillion [BAN]. Banks have very strict regulatory requirements relating to pricing and risk models. It is very common for software systems in banks to be deployed to several different platforms, including Windows and Linux, but also on different hardware platforms where the same models are used on AMD and Intel CPUs, but also on GPUs.
The fact that arithmetic on different platforms produces different results is an enormous challenge. When the business department signs off a specific set of figures generated on a specific platform, any different behaviour on another platform invalidates that sign-off. This poses a huge challenge for many financial systems, and the ability to obtain reproducible numbers across platforms would be invaluable in such contexts.
Game development
The video game industry is worth over $340bn worldwide [VGI]. Games typically provide a simulation which players interact with, sometimes from different machines over the internet. As each player interacts with the simulation, those interactions must be reflected on all machines so that the players continue to interact with the same simulation. This can only be achieved with bitwise identical results, otherwise the effects of errors propagate chaotically through these iterative function systems.
Games can be deployed to multiple platforms, typically to mobile phones, consoles, desktop and laptop computers. These platforms all enjoy different tool chains. Since simulations are specified using floating-point arithmetic, this prevents players on different platforms playing together unless the simulation is extremely basic, and the entire simulation can be transmitted trivially across the internet to each player. The divergence in floating-point results across implementations means that the simulation cannot remain synchronised across different platforms.
Reproducibility eliminates this divergence, opening up the possibility of all players being able to compete with one another regardless of their hardware platform.
Scientific computing
Since behaviour is not specified, implementations are free to fuse and reorder instructions to benefit execution speed. Sometimes this can have catastrophic results. From [GOL] 1.4:
Catastrophic cancellation occurs when the operands are subject to rounding errors. For example, in the quadratic formula, the expression b² − 4ac occurs. The quantities b² and 4ac are subject to rounding errors since they are the results of floating-point multiplications. Suppose they are rounded to the nearest floating-point number and so are accurate to within 1/2 ulp. When they are subtracted, cancellation can cause many of the accurate digits to disappear, leaving behind mainly digits contaminated by rounding error. Hence the difference might have an error of many ulps.
The paper then goes on to describe some strategies for mitigating this problem, involving rearrangement of operations. However, since there is no guarantee in the standard that the order of evaluation will be honoured, such rearrangement may be futile and catastrophic cancellation cannot be avoided.
Reproducibility requires the order of evaluation to be specified unambiguously. As a result, catastrophic cancellation can be avoided.
Geometry
Relevant to game development but also to the wider mathematical community, a typical problem in geometry is containment: does a point lie inside or outside a polygon? One algorithm finds the centroid of the polygon, finds the edge nearest the point, and considers the line that connects the centroid, the point and that edge. If the distance from the centroid to the point exceeds the distance from the centroid to the edge, then the point lies outside the polygon.
If the difference between these two values is very small, then rounding, fusion or reordering can yield different truth values for whether or not the polygon contains the point. This can be a very bad bug.
4. Status quo
The obvious question is “why are floating-point operations not reproducible on different implementations?”
There are three problems. Firstly, the compile-time floating-point environment is under-specified. There is no way of knowing or querying what the rounding mode is at compile time, nor of explicitly setting it. Similarly, there is no way of querying whether fusing, reordering or simplifying of floating-point operations is enabled for the translation unit, and taking mitigating measures to preserve reproducibility. This can lead to expressions evaluating to different results at compile time and run time, as has been observed with the <cmath> functions.
Secondly, the parsing of literals is under-specified. This shortcoming is identified in [CWG2752]. Inconsistent parsing may result in different representations.
Thirdly, consider the following example available on [CEX]:
#include <iostream>

int main() {
    float line_0x(0.f);
    float line_0y(7.f);
    float normal_x(0.57f);
    float normal_y(0.8f);
    float px(10.f);
    float py(0.f);
    float v2p_x = px - line_0x;
    float v2p_y = py - line_0y;
    float distance = v2p_x * normal_x + v2p_y * normal_y;
    float direction_x = normal_x * distance;
    float direction_y = normal_y * distance;
    float proj_vector_x = px - direction_x;
    float proj_vector_y = py - direction_y;
    std::cout << "distance: " << distance << std::endl;
    std::cout << "proj_vector_y: " << proj_vector_y << std::endl;
}
When built with version 24.5 of the Nvidia C++ compiler, without specifying any compiler options, the output is:
distance: 0.0999997
proj_vector_y: -0.0799998
When built with version 18.1.0 of the clang C++ compiler, without specifying any compiler options, the output is:
distance: 0.0999999
proj_vector_y: -0.0799999
Worse, if -march=skylake is passed to the clang C++ compiler, the output is:
distance: 0.1
proj_vector_y: -0.08
Analysing the memory of each running program reveals that, indeed, the bit patterns vary. Application of optimisations leads to variations between implementations in the nature, number and order of floating-point operations being executed during the program. This leads to different numbers of rounding operations, in different orders, and thus different values on different platforms.
5. Prior art
Before considering an approach, we should consider the approaches taken by the other languages standardised through ISO/IEC.
ISO/IEC 8652:2023 Ada
The Ada standard does not normatively reference any revision of ISO/IEC 60559. It defines its own floating-point format and operations. However, there is an expectation that some future version of Ada will indeed support a revision of ISO/IEC 60559.
ISO/IEC 13211-1:1995/Cor 3:2017 Prolog
The Prolog standard makes no reference to any revision of ISO/IEC 60559 whatsoever.
ISO/IEC 1989:2023 COBOL
The COBOL standard specifies language support for the facilities defined by ISO/IEC 60559:2011 in several paragraphs throughout sections such as 8.8.1 Arithmetic expressions, 8.8.4 Conditional expressions, 14.6.8 Alignment and transfer of data into data items and in Annex D.17 Forms of arithmetic. Typically, the specification is similar to the form:
All additions, subtractions, multiplications and divisions performed in the development of the result shall be performed in accordance with the corresponding rules in ISO/IEC/IEEE 60559.
Particularly, the specification directs implementers to follow ISO/IEC 60559:2011. There is no option to avoid supporting these facilities; all arithmetic support is specified with reference to ISO/IEC 60559:2011.
ISO/IEC 1539-1:2023 Fortran
The Fortran standard specifies, in Clause 17, language support for the facilities defined by [559]. From the first paragraph of that clause:
The intrinsic modules IEEE_EXCEPTIONS, IEEE_ARITHMETIC, and IEEE_FEATURES provide support for the facilities defined by ISO/IEC 60559:2020. Whether the modules are provided is processor dependent. If the module IEEE_FEATURES is provided, which of the named constants defined in this document are included is processor dependent.
The decision to support [559] facilities is not up to the implementer. Their only constraint is whether the processor supports [559] facilities. Again, all operations are specified with reference to [559].
ISO/IEC 9899:2018 C
In its Annexes F through H, the C standard specifies language support for the facilities defined by [559]. From Annex F, Introduction, para 1:
This annex specifies C language support for the IEC 60559 floating-point standard. The IEC 60559 floating-point standard is specifically Floating-point arithmetic (ISO/IEC 60559:2020), also designated as IEEE Standard for Floating-Point Arithmetic (IEEE 754–2019). IEC 60559 generally refers to the floating-point standard, as in IEC 60559 operation, IEC 60559 format, etc.
It does not mandate support for floating-point arithmetic as defined by [559], and implementations are free to ignore that standard when implementing floating-point arithmetic. The annexes explicitly describe how an implementation SHOULD support [559] floating point arithmetic, not how an implementation SHALL support it. Even then, at Annex F.3, para 20:
The C functions in the following table correspond to mathematical operations recommended by IEC 60559. However, correct rounding, which IEC 60559 specifies for its operations, is not required for the C functions in the table. 7.33.8 (potentially) reserves cr_ prefixed names for functions fully matching the IEC 60559 mathematical operations. In the table, the C functions are represented by the function name without a type suffix.
Clearly, correct rounding is a challenging and expensive feature to support.
ISO/IEC DIS 14882:2023 C++
[C++] barely references [559]. In fact, only when defining the optional extended floating point types in section 6.8.3 [basic.extended.fp] is there any meaningful reference. The type trait std::is_iec559 is true for those types, and signals that those types offer some representations from that specification.
6. Possible approaches
There are several ways in which reproducible floating-point operations could be achieved. Now is the time to point out that the intent of this paper is not to specify full conformance with [559]. That would be a substantial undertaking, although this author is interested in further work in that direction, particularly in consideration of [P3397]. Nor does this paper make any proposals relating to parallel reproducibility. The purpose of this paper is to introduce a solution to the problem of reproducibility for some use cases.
Also, any solution which prioritises reproducibility does so at the cost of performance. The free hand given to implementers regarding operations on floating point numbers opens up opportunities for optimisation that would be unavailable if reproducibility were required. Therefore, the choice of reproducibility must be made by the programmer, mindful of the performance penalty this introduces.
Compiler options
One approach is to look at what options compilers already offer for reproducibility. Surveying clang, GCC and MSVC, there is no explicit support for reproducibility. However, they all offer options for controlling the way that floating-point arithmetic is implemented.
For example, MSVC offers a number of command line flags prefixed /fp. Of note are two options, /fp:fast and /fp:strict. /fp:fast allows the compiler to reorder, combine or simplify floating-point operations to optimise floating-point code for speed and space. /fp:strict preserves the source ordering and rounding properties of floating-point code when it generates object code, and observes [559] specifications when handling special values. Similarly, GCC offers -fno-fast-math as a command line option, which offers similar behaviour to MSVC’s /fp:strict option, although it is not sufficient on its own to get strict floating-point behaviour.
However, this relies on details of the implementation and is not future-proof. Additionally, optimisation flags can interfere with these floating-point arithmetic choices. This is a deeply unergonomic solution that is vulnerable to future decisions of the implementers.
Fences
Another approach might be to signal to the compiler that ordering and rounding properties be untouched in selected places by using a new kind of fence. This would be specified to prevent the compiler from reordering, combining or simplifying floating-point operations. It would ensure that the same number of rounding operations took place regardless of implementation, the same rounding modes were used regardless of implementation, and that the operations took place in the same order, regardless of implementation. Writing a larger floating-point expression as a series of single expression statements, interleaved with this new kind of fence, would detail to the compiler exactly what was wanted.
Again, this is a rather unergonomic solution. Additionally, given that the accuracy of operations of floating-point types are implementation-defined, this would require that latitude to be tightened and for floating-point behaviour to be fully specified. If that specification is to take place, then specifying reproducibility as defined in [559] would be the least astonishing thing to do.
Specification overhaul
The next approach to consider is following the example of prior art. Looking at COBOL and Fortran, their standards dictate that all floating-point arithmetic conforms to a revision of ISO/IEC 60559. Following this pattern, the C++ standard would need to be respecified to ensure that the object and value representations and accuracy of operations of the standard and extended optional floating-point types conform fully to a revision of ISO/IEC 60559.
While this solution offers much better ergonomics for the programmer, it would degrade performance of existing code, requiring programmers to interact more carefully with the command line options of their compilers.
#pragma options
[C++] does not require compilers to support any pragmas, but many implementations support the ISO C standard pragmas used to control floating-point behaviour:
#pragma STDC FENV_ACCESS
#pragma STDC FP_CONTRACT
#pragma STDC CX_LIMITED_RANGE
These could be introduced into the language, along with an additional pragma, for example:
#pragma STDC FP_REPRODUCIBLE
While this seems like a simple solution, it would still require re-specification of existing types, which takes us very close to the specification overhaul described above. Also, it would require that any function called from within the block covered by the pragma would also be modified with such a pragma.
Function qualification
Just as class member functions can be qualified as const, a new function qualifier could be introduced which indicates that all floating-point operations within the scope of the function should be reproducible. This has the advantage of being a transitive property, like const, and users would be able to call other functions that are similarly qualified and retain reproducibility.
Unfortunately, we are in the same position with pragmas: this approach would still require re-specification of existing types.
A new core type
We have std::int_fast32_t, std::int_least32_t and the related types of differing width specifications. We also have the optional extended floating-point types std::float32_t and so on. We could take the same approach with floating-point types.
A new type could be introduced that supports all the specifications of Clause 11 of [559], particularly reproducible operations, format, status flags, and an implicit reproducible attribute, with a name like std::float_strict32_t. The specification of this type would prohibit reordering, fusion and variation in rounding. All rounding should be roundTiesToEven, as specified in Clause 4 of [559].
The requirement for reproducibility would mean that it would not be feasible to specify implementations of the contents of the <cmath> header. Users would be expected to provide all operations beyond addition, subtraction, multiplication and division.
This solution contains all the constraints of the problem to a single type, rather than requiring annotation distributed through code. However, introducing a new type to the core language is a considerable undertaking.
A new library type
There is a proposal in flight which partially solves this problem, [P2746R5] Deprecate and Replace Fenv Rounding Modes. From its abstract:
We argue that floating point rounding modes as specified by fesetround() are largely unusable, at least in C++. Furthermore, this implicit argument to floating point operations, which is almost never used, and largely opaque to the compiler, causes both language design and compilation problems. We thus make the somewhat drastic proposal to deprecate it, in spite of its long history, and to replace it with a much better-behaved facility.
In summary, that proposal calls for the elimination of the functions fesetround and fegetround, replacing them with a struct which offers a very small set of the [559]-specified operations along with a rounding mode provided at construction. The value is not stored in the struct; rather, the member functions which are binary operations take two operands and apply the operation with the stored rounding mode. The struct effectively defines the rounding attribute as an object with a lifetime rather than as a #pragma qualifying a section of source code.
By additionally specifying that the operations cannot be fused or reordered, rounding mode would be respected, the correct number of roundings would be applied in the same order on all implementations, and all operations would be reproducible. This carries the penalty that a function call may be incurred for every operation, rather than some function calls being optimised away via FMA, for example. However, just as there is a trade-off between accuracy and speed, so is there a trade-off between reproducibility and speed. As described earlier, compiler flags specify optimisations that improve floating-point performance at the cost of accuracy. None of these optimisations would necessarily be available to this type.
Alternatively, a type can be proposed which is a companion type to the type proposed in [P2746R5]. At Six Impossible Things Before Breakfast, we have implemented several classes that we use to emulate reproducible floating-point operations.
7. Related papers
In addition to this paper, other relevant papers were discussed at Wrocław in November 2024.
[P3479R0] seeks to extend C's floating-point #pragma directives for use in C++ code. During discussion, it was observed that introducing new #pragma directives was not a good fit for the Evolution Working Group. Complications arise for ODR since an inline function could be compiled under different contexts according to the disposition of the #pragma directives. There was broad support for controlling the floating point environment, but #pragma directives were an unpopular choice, as were compiler flags, because of the potential for ODR violations. The preferred choice was to embed the selected floating point environment into the type system.
[P3488R1] seeks to resolve [CWG2752] following discussion at Kona in November 2023, which arrived at no consensus. The paper arrived at no consensus in EWG.
8. Discussion
Discussion of the prior revision of this paper, which proposed a library type, focused on the implementation of that type and how supplying its implementation through correctly rounded functions would be a better solution than providing a type built on top of those functions. No polls were taken, but there was a general agreement that this would be a better approach. However, it was also recognised that there is a larger problem about floating-point behaviour in general beyond reproducible results.
We are being drawn by these arguments towards a solution that defines a core type with format and floating-point attributes as specified in [559]. Part of this work has already been done, as mentioned earlier in the motivation section. [P1467R9] was voted into the working draft for [C++] and imported the three binary formats specified in clause three of [559] as well as specifying additional formats. Additionally, new literal suffixes were added, conversion rules were expanded and overloads were added to the standard library. However, no additional specification was added regarding floating-point attributes, nor accuracy, nor even behaviour, of operations on these floating-point types.
Part of the remaining work is to decide how to parameterise the type such that the attributes described in clause four of [559] can be selected. The set of attributes is:
- rounding-direction
- roundTiesToEven
- roundTowardPositive
- roundTowardNegative
- roundTowardZero
- alternate exception handling
- invalid-operation
- division-by-zero
- overflow
- underflow
- inexact
- preferredWidth
- preferredWidthNone
- preferredWidthFormat
- value-changing optimisation
- associate
- distribute
- contraction-synthesis
- format-synthesis
- excess-precision
- reproducibility
- true
- false
Although this paper is concerned with reproducibility, this problem needs to be solved for all attributes. However, by enabling a user to specify the rounding-direction and the permitted value-changing optimisations, reproducibility can be achieved. This does not fully satisfy the requirements of clause four of [559], though, since that standard requires that some attributes be modifiable at runtime. This is the area of endeavour partially covered by [P2746R5].
9. Open questions
- What other domains have uses for reproducibility?
- Should the rounding attribute be specified statically or dynamically? If statically, should there be a separate type for each attribute, or should the attribute be a class template parameter?
- Can the type proposed in [P2746R5] fully meet all the requirements of reproducibility, or can the proposal be easily modified to do so?
- Clause 11 of [559] does NOT specify the inexact status flag as one of the reproducible flags. Is the overspecification in the wording problematic?
- Is there appetite for fully specifying some or all of the optional extended floating-point types described in [basic.extended.fp] such that they are conformant to some extent with [559]?
- Is there still appetite for cr_-prefixed functions as specified for reservation by the C standard at 7.33.8?
- Should we attempt to tighten the specification of floating-point behaviour for existing types, or limit ourselves to a new [559]-compliant type?
- Do we need to be able to query the floating point environment as currently specified, and would such machinery be obviated by the introduction of a new type?
10. References and bibliography
[559] ISO/IEC JTC 1/SC 25 (May 2020). ISO/IEC 60559:2020 — Information technology — Microprocessor Systems — Floating-Point arithmetic. ISO. pp. 1–74.
[BAN] Market capitalization of the 100 largest banks worldwide from 1st quarter 2016 to 1st quarter 2024 https://www.statista.com/statistics/265135/market-capitalization-of-the-banking-sector-worldwide/
[BYR] Beware of fast-math (Dec 2021). https://simonbyrne.github.io/notes/fastmath/
[C++] C++23 Working Draft https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf
[CEX] Compiler Explorer https://godbolt.org/z/W8shs6dao
[CRL] crlibm https://github.com/taschini/crlibm/blob/master/crlibm.h
[CWG2752] Core issue 2752 Excess-precision floating-point literals https://github.com/cplusplus/papers/issues/1584
[GOL] What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg, https://dl.acm.org/doi/10.1145/103162.103163
[HR1] You're Going To Have To Think!, Overload post. https://accu.org/journals/overload/18/99/harris_1702/
[HR2] Why Fixed Point Won't Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/18/100/harris_1717/
[HR3] Why Rationals Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/101/harris_1986/
[HR4] Why Computer Algebra Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/102/harris_1979/
[HR5] Why Interval Arithmetic Won’t Cure Your Floating Point Blues, Overload post. https://accu.org/journals/overload/19/103/harris_1974/
[HNE] Beware of fast-math, Hacker News discussion. https://news.ycombinator.com/item?id=29201473
[MCR] Correctly rounded libm https://github.com/simonbyrne/libmcr
[MCS] The mathematical library for critical systems (LibmCS) https://gitlab.com/gtd-gmbh/libmcs/-/tree/core-math
[MPF] The GNU MPFR Library https://www.mpfr.org/
[OBI] Cross-Platform Issues With Floating-Point Arithmetics in C++, Günter Obiltschnig, https://www.appinf.com/download/FPIssues.pdf
[P1467R9] Extended floating-point types and standard names, David Olsen et al, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1467r9.html
[P2746R5] Deprecate and Replace Fenv Rounding Modes, Hans Boehm, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2746r5.pdf
[P3397R0] Clarify requirements on extended floating point types, Hans Boehm, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3397r0.pdf
[P3479R0] Enabling C pragma support in C++, Joshua Cranmer, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3479r0.html
[P3488R1] Floating-Point Excess Precision, Matthias Kretz, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3488r1.pdf
[SEI] Deterministic cross-platform floating point arithmetics. http://christian-seiler.de/projekte/fpmath/
[UNU] Unum and Posit https://posithub.org/
[VGI] Video game industry - Statistics & Facts https://www.statista.com/topics/868/video-games/