"Is floating-point math broken?" - Cato Johnston (Stack Overflow)
1. Introduction
When std::format was proposed for standardization, its floating-point formatting
was defined in terms of std::to_chars to simplify specification. While being
a positive change overall, this introduced a small but undesirable difference
compared to the design and reference implementation in [FMT], resulting in
surprising behavior for users, a performance regression and an inconsistency with
other mainstream programming languages that have similar facilities. This paper
proposes fixing this issue, bringing floating-point formatting on par with
other languages and in line with the original design intent.
2. Problem
Since Steele and White’s seminal paper ([STEELE-WHITE]), based on their work in the 1970s, many programming languages have converged on a similar default representation of floating-point numbers. The paper formulates the properties of an algorithm that produces such a representation as follows:
No information is lost; the original fraction can be recovered from the output by rounding.
No "garbage digits" are produced.
The output is correctly rounded.
It is never necessary to propagate carries on rounding.
The second bullet point means
that the algorithm shouldn’t produce more decimal digits (in the significand)
than necessary to satisfy the other requirements, most importantly the
round-trip guarantee. For example, 0.1 should be formatted as 0.1 and not
0.1000000000000000055511151231257827021181583404541015625, even though both
produce the same value when read back into an IEEE 754 double.
The last bullet point is more of an optimization for retro computers and is less relevant on modern systems.
[STEELE-WHITE] and the papers that followed referred to the second criterion as "shortness", even though it only concerns the number of decimal digits in the significand and ignores other parts of the output such as the exponent and the decimal point.
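To make the round-trip guarantee concrete, here is a minimal sketch (an illustration added for this discussion, not taken from the referenced papers) showing that the default std::format output parses back to the exact same double:

#include <cassert>
#include <format>
#include <string>

int main() {
  double x = 0.1;
  std::string s = std::format("{}", x);  // shortest representation: "0.1"
  assert(std::stod(s) == x);             // no information is lost
}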
Once such a shortest decimal significand and the corresponding exponent are known,
the formatting algorithm normally chooses between fixed and exponential
representation based on the value range. For example, in Python
([PYTHON-FORMAT]) and Rust (whose default formatting gives the "shortest"
representation for floating-point numbers), if the decimal exponent is greater
than or equal to 16 for double, the number is printed in the exponential format:
>>> 1234567890123456.0
1234567890123456.0
>>> 12345678901234567.0
1.2345678901234568e+16
16 is a reasonable threshold: an IEEE 754 double can faithfully represent most 16-digit decimal significands (and all shorter ones), so switching from fixed to exponential notation at this point balances human readability with precision retention.
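The following small check (an assumed example, not from the referenced documents) illustrates the rationale: the 16-digit value below keeps all of its digits, while the 17-digit literal is not exactly representable and maps to the same double as its neighbor:

#include <cassert>
#include <format>

int main() {
  assert(std::format("{}", 1234567890123456.0) == "1234567890123456");
  assert(12345678901234567.0 == 12345678901234568.0);  // same double value
}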
[FMT], which is modeled after Python’s formatting facility, adopted a similar representation based on the exponent range.
Swift has similar logic, switching to exponential notation for numbers greater than or equal to 2^53 (9007199254740992). This is also a reasonable choice, although a threshold that is not a power of 10 might be less intuitive to some users.
Similarly, languages normally switch from fixed to exponential notation when the absolute value is smaller than some small power of 10, usually 10^-3 ([JAVA-DOUBLE]) or 10^-4 (Python, Rust, Swift).
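A minimal sketch of this value-range-based selection (an illustration using the thresholds mentioned above for double, not the wording proposed later in this paper) might look as follows:

#include <charconv>
#include <cmath>
#include <string>

// Illustration only: assumes a finite value; real implementations derive the
// decimal exponent from the shortest-digits algorithm itself rather than log10.
std::string format_double(double value) {
  int exp10 = value == 0 ? 0
                         : static_cast<int>(std::floor(std::log10(std::fabs(value))));
  auto fmt = (exp10 >= 16 || exp10 < -4) ? std::chars_format::scientific
                                         : std::chars_format::fixed;
  char buf[64];
  auto result = std::to_chars(buf, buf + sizeof(buf), value, fmt);
  return std::string(buf, result.ptr);
}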
When std::format was proposed for standardization, floating-point formatting
was defined in terms of std::to_chars to simplify specification, with the
assumption that the latter follows the industry practice for the default format
described above. This worked well for explicit format specifiers but,
as it turned out recently, it introduced an undesirable change to the default
format. The problem is that std::to_chars defines "shortness" in terms of the
number of characters in the output, which is different from the "shortness" of
the decimal significand normally used both in the literature and in the industry.
The exponent range is much easier to reason about. For example, in this model
100000.0 and 120000.0 are printed in the same format:
>>> 100000.0
100000.0
>>> 120000.0
120000.0
However, if we consider the output size, the two similar numbers are now printed completely differently:
auto s1 = std::format("{}", 100000.0);  // s1 == "1e+05"
auto s2 = std::format("{}", 120000.0);  // s2 == "120000"
This is surprising and undesirable.
Note that the output "1e+05" does not even have the shortest possible number of
characters, because the + and the leading zero in the exponent are redundant.
In fact, those are required: according to the specification of to_chars
([charconv.to.chars]), the value is converted to a string in the style of
printf in the "C" locale, and the exponential format is defined as follows by
the C standard ([N3220]):

A double argument representing a floating-point number is converted in the style [−]d.ddd e±dd ...
Nevertheless, users interpreting the shortness condition too literally may find this surprising.
Even more importantly, the current representation violates the original shortness requirement from [STEELE-WHITE]:
auto s = std::format("{}", 1234567890123456700000.0);  // s == "1234567890123456774144"
The last five digits, 74144, are what Steele and White referred to as "garbage
digits", which almost no modern formatting facility produces by default.
For example, Python avoids it by switching to the exponential format as one
would expect:
>>> 12345678901234567800000.0
1.2345678901234568e+22
The current behavior is a consequence of the shortness-in-characters criterion favoring the fixed format for large numbers while still satisfying the correct rounding condition ([charconv.to.chars]):

If there are several such representations, the representation with the smallest difference from the floating-point argument value is chosen, resolving any remaining ties using rounding according to round_to_nearest ([round.style]).
Apart from giving a false sense of accuracy to users, this also has negative
performance implications. Many of the optimized float-to-string algorithms
based on Steele and White’s criteria, such as Dragonbox ([DRAGONBOX]) and
Ryū ([RYU]), focus only on those criteria, in particular the shortness of the
decimal significand rather than the number of characters. As a result, an
implementation of the default floating-point handling of to_chars
(and std::format) cannot directly rely on these otherwise perfectly
appropriate algorithms. Instead, it has to introduce non-trivial logic
dedicated to computing these "garbage digits". Furthermore, the need for
dedicated logic is likely not just a matter of insufficient algorithmic
research: in this case we genuinely need to compute more digits than the
precision implied by the data type, so it is natural to expect that more
internal precision is required than in the garbage-free case.
(In other words, even if a new algorithm that correctly handles this
garbage-digit case according to the current C++ standard were invented, it
would likely still include some special handling of that case, in one form
or another.)
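To see the two styles side by side, here is a standalone illustration (an example added here, using the same value as above) that prints both the shortest scientific form and the fixed form containing the "garbage digits":

#include <charconv>
#include <cstdio>

int main() {
  double d = 1234567890123456700000.0;
  char sci[64], fix[64];
  auto* sci_end = std::to_chars(sci, sci + sizeof(sci), d, std::chars_format::scientific).ptr;
  auto* fix_end = std::to_chars(fix, fix + sizeof(fix), d, std::chars_format::fixed).ptr;
  std::printf("%.*s\n", static_cast<int>(sci_end - sci), sci);  // 1.2345678901234568e+21
  std::printf("%.*s\n", static_cast<int>(fix_end - fix), fix);  // 1234567890123456774144
}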
The performance issue can be illustrated with the following simple benchmark:
#include <format>
#include <benchmark/benchmark.h>

// Output: "1.2345678901234568e+22"
double normal_input = 12345678901234567000000.0;

// Output (current): "1234567890123456774144"
// Output (desired): "1.2345678901234568e+21"
double garbage_input = 1234567890123456700000.0;

void normal(benchmark::State& state) {
  for (auto s : state) {
    auto result = std::format("{}", normal_input);
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(normal);

void garbage(benchmark::State& state) {
  for (auto s : state) {
    auto result = std::format("{}", garbage_input);
    benchmark::DoNotOptimize(result);
  }
}
BENCHMARK(garbage);

BENCHMARK_MAIN();
Results on macOS with Apple clang version 16.0.0 (clang-1600.0.26.6) and libc++:
% ./double-benchmark
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
2025-02-02T08:06:13-08:00
Running ./double-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 7.61, 5.78, 5.16
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
normal           77.5 ns         77.5 ns      9040424
garbage          91.4 ns         91.4 ns      7675186
Results on GNU/Linux with gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 and libstdc++:
$ ./double-benchmark
2025-02-02T17:22:25+00:00
Running ./double-benchmark
Run on (2 X 48 MHz CPU s)
CPU Caches:
  L1 Data 128 KiB (x2)
  L1 Instruction 192 KiB (x2)
  L2 Unified 12288 KiB (x2)
Load Average: 0.25, 0.10, 0.02
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
normal           73.1 ns         73.1 ns      9441284
garbage          90.6 ns         90.6 ns      7360351
Results on Windows with Microsoft (R) C/C++ Optimizing Compiler Version 19.40.33811 for ARM64 and Microsoft STL:
>double-benchmark.exe
2025-02-02T08:10:39-08:00
Running double-benchmark.exe
Run on (2 X 2000 MHz CPU s)
CPU Caches:
  L1 Instruction 192 KiB (x2)
  L1 Data 128 KiB (x2)
  L2 Unified 12288 KiB (x2)
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
normal            144 ns          143 ns      4480000
garbage           166 ns          165 ns      4072727
Although the output has the same size, producing "garbage digits" makes
std::format 15-24% slower on these inputs. If we exclude string construction
time, the difference is even more pronounced. For example, profiling the
benchmark on macOS shows that the to_chars call itself is more than 50% (!)
slower:
garbage(benchmark::State&): 241.00 ms
  ... std::__1::to_chars_result std::__1::_Floating_to_chars[abi:ne180100]<...>(char*, char*, double, std::__1::chars_format, int)
normal(benchmark::State&): 159.00 ms
  ... std::__1::to_chars_result std::__1::_Floating_to_chars[abi:ne180100]<...>(char*, char*, double, std::__1::chars_format, int)
For comparison, here are the results of running the same benchmark with
std::format replaced with fmt::format, which doesn’t produce "garbage
digits":
$ ./double-benchmark
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
2025-03-15T08:18:56-07:00
Running ./double-benchmark
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 3.00, 3.91, 4.85
------------------------------------------------------
Benchmark               Time             CPU   Iterations
------------------------------------------------------
fmt_normal           53.0 ns         53.0 ns     13428484
fmt_garbage          53.4 ns         53.4 ns     13032712
As expected, the time is nearly identical between the two cases, which demonstrates that the performance gap can be eliminated if this paper is accepted.
Locale makes the situation even more confusing to users. Consider the following example:
std::locale::global(std::locale("en_US.UTF-8"));
auto s = std::format("{:L}", 1200000.0);  // s == "1,200,000"
Here s is "1,200,000" even though "1.2e+06" would be shorter.
3. Proposal
The current paper proposes fixing the default floating-point representation in
std::format to be based on the decimal exponent range, addressing the issues described above.
Consistent, easy to reason about output format:
Code | Before | After
---|---|---
std::format("{}", 100000.0) | 1e+05 | 100000
std::format("{}", 120000.0) | 120000 | 120000
No "garbage digits":
Code | Before | After
---|---|---
std::format("{}", 1234567890123456700000.0) | 1234567890123456774144 | 1.2345678901234568e+21
Consistent localized output (assuming the en_US.UTF-8 locale):
Code | Before | After
---|---|---
std::format("{:L}", 1000000.0) | 1e+06 | 1,000,000
std::format("{:L}", 1200000.0) | 1,200,000 | 1,200,000
4. Wording
Modify [format.string.std]:
Table 105 — Meaning of type options for floating-point types [tab:format.type.float]
Type | Meaning
---|---
… | If precision is specified, equivalent to … where …; equivalent to … otherwise.
... | ...
… | The same as …, except that it uses … to indicate exponent.
none | Let … be … if … is in the range [10^-4, 10^n), where 10^n is 2^… + 1 rounded down to the nearest power of 10, and … otherwise. If precision is specified, equivalent to … where …; equivalent to … otherwise.
5. Implementation and usage experience
The current proposal is based on the existing implementation in [FMT] which has been available and widely used for over 6 years. Similar logic based on the value range rather than the output size is implemented in Python, Java, JavaScript, Rust and Swift.
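For illustration, the same value used in the earlier examples, formatted with the {fmt} library (a sketch assuming <fmt/core.h> is available), already follows the proposed behavior:

#include <fmt/core.h>

int main() {
  fmt::print("{}\n", 1234567890123456700000.0);  // prints 1.2345678901234568e+21
}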
6. Impact on existing code
This is technically a breaking change for users who rely on the exact
output that is being changed. However, the change doesn’t affect the ABI or
round-trip guarantees. Also, reliance on the exact representation of
floating-point numbers is usually discouraged, so the impact of this change is
likely moderate.
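As a hypothetical example of code that could observe the change, consider a test that pins the exact default output for a large double (values taken from the examples above):

#include <cassert>
#include <format>

void test() {
  auto s = std::format("{}", 1234567890123456700000.0);
  assert(s == "1234567890123456774144");    // passes with the current wording
  // assert(s == "1.2345678901234568e+21"); // the output after this proposal
}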
In the past we had experience with changing the output format in [FMT], usage
of which is currently at least an order of magnitude higher than that of
std::format.