Values of floating-point types
- Document number:
- P3938R0
- Date:
- 2025-12-14
- Audience:
- SG6, EWG, CWG
- Project:
- ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
- Reply-to:
- Jan Schultke <janschultke@gmail.com>
- GitHub Issue:
- wg21.link/P3938/github
- Source:
- github.com/eisenwave/cpp-proposals/blob/master/src/floating-point-values.cow
Contents
Introduction
Q&A
Do infinity and NaN exist from a core language perspective?
Can there be an unsigned zero, infinity, or NaN value, or are all floating-point values signed?
Conversely, can there be a signed zero, infinity, or NaN value?
Can there be a negative zero and an unsigned infinity, or is the "signedness requirement" all-or-none?
Is negative zero negative? What about infinity and NaN?
Is "negative NaN" distinct from "positive NaN"?
Are different NaN payloads distinct values?
Can an extended floating-point type have no finite values?
Does the core language need to distinguish between normal/subnormal numbers?
Are there values beyond finite values, infinity, and NaN?
What does it mean for a type to "adhere to ISO/IEC 60559"?
Is 0.0 positive or negative zero?
Are arithmetic operations required to preserve the sign of zero?
How does template argument equivalence work for floating-point types?
Impact on the standard
Impact on implementations
Wording
[lex]
[basic]
[expr]
[temp]
[support]
[meta]
References
1. Introduction
The core language wording in the C++ standard does not specify what values a floating-point type may represent. There are a few questions that have no obvious answer:
- Do infinity and NaN exist from a core language perspective?
- Can there be an unsigned zero, infinity, or NaN value, or are all floating-point values signed?
- Conversely, can there be a signed zero, infinity, or NaN value?
- Can there be a negative zero and an unsigned infinity, or is the "signedness requirement" all-or-none?
- Are negative zero and positive zero distinct values? That is, do they compare equal, and if so, is there an observable difference (beyond looking at the sign bit) between them?
- Is negative zero negative? That is, when a Preconditions element requires a "non-negative" value, is the behavior undefined when negative zero is provided?
- Similarly, are negative infinity and negative NaN negative?
- Is "negative NaN" distinct from "positive NaN", or are these effectively the same NaN values with different "payloads"?
- Are different NaN payloads distinct values? If so, does that imply std::has_unique_object_representations_v<std::float32_t> is true?
- Can an extended floating-point type be so imprecise that it is incapable of representing any number? That is, could an infinity-t type in the style of nullptr_t be considered a floating-point type?
- Does the core language need to distinguish between normal and subnormal numbers?
- Are there any other possible categories of values beyond finite values, infinity, and NaN?
- What does it mean for a type to "adhere to ISO/IEC 60559", as mentioned in std::numeric_limits::is_iec559?
- Is 0.0 positive or negative zero, or is it implementation-defined/unspecified?
- Are arithmetic operations required to preserve the sign of zero?
- How does template argument equivalence work for floating-point types?
Bits of information may be found in various parts of the standard,
such as in the concept of "adhering to ISO/IEC 60559",
requirements,
the inheritance of C features such as , etc.
However, some of these questions are so deeply unclear that a core issue
alone wouldn't be sufficient to solve the problem.
The goal of this paper is to answer these questions, not by making any evolutionary changes to the language, but by investigating what the status quo is and turning that into wording.
2. Q&A
In the following subsections, the paper tries to find a good answer to the questions above. These answers are primarily based on the existing wording and on existing implementation practice.
2.1. Do infinity and NaN exist from a core language perspective?
Yes.
[basic.fundamental] mentions infinity.
While NaN is not mentioned explicitly,
implies it.
At the very least, floating-point types may represent
- zero, for zero-initialization to make sense,
- "negative zero", since both C and C++ already mention this term many times, and
- infinity, qNaN, and sNaN, for [numeric.limits] to make sense and to have types that adhere to ISO/IEC 60559.
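As a rough illustration of the library side of this answer, the following sketch queries which special values a type reports through std::numeric_limits. The names special_values, query_special_values, zero_init_is_zero, and check_special_values are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper type: records which special values a type T
// claims to represent according to std::numeric_limits<T>.
struct special_values {
    bool infinity;
    bool quiet_nan;
    bool signaling_nan;
};

template <class T>
constexpr special_values query_special_values() {
    return { std::numeric_limits<T>::has_infinity,
             std::numeric_limits<T>::has_quiet_NaN,
             std::numeric_limits<T>::has_signaling_NaN };
}

// Value-initialization of an arithmetic type performs zero-initialization,
// so the result has to compare equal to zero.
template <class T>
constexpr bool zero_init_is_zero() {
    return T() == T(0);
}

inline void check_special_values() {
    // Integer specializations report no infinity.
    assert(!query_special_values<int>().infinity);
    // Zero-initialization yields a zero for floating-point types.
    assert(zero_init_is_zero<float>());
    assert(zero_init_is_zero<double>());
}
```

Which of the three flags are set for float or double depends on the implementation; the sketch only reports them and does not assume a particular answer.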
2.2. Can there be an unsigned zero, infinity, or NaN value, or are all floating-point values signed?
There can be fully unsigned floating-point numbers.
The C standard explicitly mentions unsigned infinity and unsigned zero,
and different NaN signs are typically not distinct values.
The C++ standard presumably inherits the C model because all of the
functions are stated to work like C23's .
There is no C++ restriction on .
Furthermore, [numeric.limits.members]
states that is_signed is meaningful for all specializations,
so there is presumably no requirement that a floating-point type
or any of its values are signed.
2.3. Conversely, can there be a signed zero, infinity, or NaN value?
Presumably. The C++ standard mentions negative zero and negative infinity in a number of places. NaN with negative sign bit and NaN with positive sign bit are usually not considered distinct values (e.g. ISO/IEC 60559), but the C++ standard has no wording that would prohibit that distinction.
2.4. Can there be a negative zero and an unsigned infinity, or is the "signedness requirement" all-or-none?
It's not all-or-none. There are floating-point types that adhere to ISO/IEC 60559, and these do not have a distinct negative and positive NaN, so it cannot be all-or-none.
2.5. Is negative zero negative? What about infinity and NaN?
No.
The wording uses "negative" in the "less than zero" sense ([complex.numbers]),
applying "negative value" to values which are less than zero,
so negative zero and negative NaN
(NaN with negative sign bit) are not negative.
Negative infinity is negative because it compares less than zero.
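The distinction between "has the sign bit set" and "is less than zero" can be sketched as follows. This assumes that double adheres to ISO/IEC 60559 (the checks are skipped otherwise); is_negative and check_negativity are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// "Negative" in the "less than zero" sense.
inline bool is_negative(double x) { return x < 0.0; }

// Assumption: double adheres to ISO/IEC 60559; otherwise nothing is checked.
inline void check_negativity() {
    using lim = std::numeric_limits<double>;
    if (!lim::is_iec559) return;

    const double neg_zero = -0.0;
    const double neg_nan  = std::copysign(lim::quiet_NaN(), -1.0);
    const double neg_inf  = -lim::infinity();

    // Sign bit set, yet not less than zero.
    assert(std::signbit(neg_zero) && !is_negative(neg_zero));
    // Relational comparisons involving NaN are false.
    assert(std::signbit(neg_nan) && !is_negative(neg_nan));
    // Negative infinity compares less than zero.
    assert(is_negative(neg_inf));
}
```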
2.6. Is "negative NaN" distinct from "positive NaN"?
Presumably. While ISO/IEC 60559 does not consider NaNs with different sign bit to be distinct values, the C++ standard does not prohibit such a model.
2.7. Are different NaN payloads distinct values?
Sometimes.
and
are two NaNs distinguished only by payload.
However, not every implementation treats these distinctly.
For example, when compiling to WASM,
none of the or instructions handle signaling NaNs.
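To make the notion of a payload concrete, here is a minimal sketch that builds two quiet NaNs differing only in their lowest payload bit. It assumes a 32-bit float in the ISO/IEC 60559 binary32 format; float_from_bits, bits_of, and check_nan_payloads are hypothetical helpers, and std::memcpy stands in for std::bit_cast so that the sketch does not depend on C++20.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Reinterprets an integer bit pattern as a float.
inline float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reinterprets a float as its integer bit pattern.
inline std::uint32_t bits_of(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Assumption: float is binary32; otherwise nothing is checked.
inline void check_nan_payloads() {
    if (!std::numeric_limits<float>::is_iec559) return;
    const float a = float_from_bits(0x7fc00000u); // quiet NaN, payload 0
    const float b = float_from_bits(0x7fc00001u); // quiet NaN, payload 1
    assert(std::isnan(a) && std::isnan(b));
    assert(!(a == b));                // NaN never compares equal, even to itself
    assert(bits_of(a) != bits_of(b)); // the object representations differ
}
```

Whether an implementation treats a and b as distinct values is exactly the open question; the sketch only shows that they are distinguishable through their object representations.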
2.8. Can an extended floating-point type have no finite values?
No.
This would make zero-initialization unimplementable,
and would make members such as
worded in a nonsensical way.
2.9. Does the core language need to distinguish between normal/subnormal numbers?
No.
Subnormals either get "flushed" to zero,
meaning that they are alternative representations of zero, numerically,
or they are simply finite numbers like any other.
The classification of "normal" and "subnormal"
is an implementation detail of the floating-point format.
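The library-level classification can be observed with std::fpclassify, as in the following sketch. The function names category and check_categories are hypothetical; the subnormal check assumes that double adheres to ISO/IEC 60559 and that subnormals are not flushed to zero.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Maps the result of std::fpclassify to a readable name.
// The default branch covers any additional implementation-defined category.
inline std::string category(double x) {
    switch (std::fpclassify(x)) {
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        case FP_INFINITE:  return "infinite";
        case FP_NAN:       return "nan";
        default:           return "implementation-defined";
    }
}

inline void check_categories() {
    assert(category(0.0) == "zero");
    assert(category(1.0) == "normal");
    using lim = std::numeric_limits<double>;
    if (lim::is_iec559) {
        // Assumption: no flush-to-zero mode is active.
        assert(category(lim::denorm_min()) == "subnormal");
        // From an arithmetic perspective, it is a finite number like any other.
        assert(std::isfinite(lim::denorm_min()) && lim::denorm_min() > 0.0);
    }
}
```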
2.10. Are there values beyond finite values, infinity, and NaN?
The exotic and surprising part is that floating-point types may represent additional implementation-defined values beyond finite values, infinity, and NaN. This is backed by
- C23 §5.2.5.3.3 [Characteristics of floating types <float.h>] paragraph 8, and
- C23 §7.12 [Mathematics <math.h>] paragraph 12.
These paragraphs support the existence of such additional implementation-defined
classifications.
C++ inherits the behavior of from ,
so it must support the same classifications to be compatible.
macro for
VAX floating-point numbers
may yield (reserved operand),
which raises a reserved operand fault,
i.e. a CPU exception when processed,
similar to integer division by zero.
This is different from signaling NaN in that by default, signaling NaN only raises a floating-point exception but doesn't trap.
While neither the C23 wording nor the C++ wording handles these extra implementation-defined classifications well, they nonetheless exist, and it seems like an unmotivated breaking change to drop support for them.
2.11. What does it mean for a type to "adhere to ISO/IEC 60559"?
It just means that the value representation is one of binary16, binary32, binary64, binary128, or some extended or decimal floating-point format specified in ISO/IEC 60559.
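Since adherence is a statement about the format, it can be probed through format parameters alone. The following sketch compares the std::numeric_limits parameters of a type against those of binary32 and binary64; matches_binary_format, is_binary32, is_binary64, and check_formats are hypothetical helpers, and the sketch says nothing about how operations on the type behave.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if T reports ISO/IEC 60559 adherence and its
// radix, precision, and maximum exponent match a given binary format.
template <class T>
constexpr bool matches_binary_format(int precision_bits, int max_exp) {
    return std::numeric_limits<T>::is_iec559
        && std::numeric_limits<T>::radix == 2
        && std::numeric_limits<T>::digits == precision_bits
        && std::numeric_limits<T>::max_exponent == max_exp;
}

// binary32: 24 bits of precision, emax = 127 (max_exponent is emax + 1).
template <class T>
constexpr bool is_binary32() { return matches_binary_format<T>(24, 128); }

// binary64: 53 bits of precision, emax = 1023 (max_exponent is emax + 1).
template <class T>
constexpr bool is_binary64() { return matches_binary_format<T>(53, 1024); }

inline void check_formats() {
    // Integer types never adhere to ISO/IEC 60559.
    assert(!is_binary32<int>() && !is_binary64<int>());
    // A type cannot match both formats at once.
    assert(!(is_binary32<float>() && is_binary64<float>()));
}
```

On common implementations is_binary32<float>() and is_binary64<double>() are expected to be true, but that is an assumption about the platform, not a guarantee.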
When it comes to operations,
the C++ standard specifies no mapping between expressions (e.g. )
and the ISO/IEC 60559 operations (e.g. division
).
In fact, it seems to deliberately deviate from the ISO/IEC 60559 operations
by declaring division by zero to be undefined behavior,
even for floating-point types ([expr.mul]).
is not infinity,
but UB by omission
or UB by wording hole
,
even if adheres to ISO/IEC 60559
.
While it would be desirable to align the C++ operations with ISO/IEC 60559 operations, this would require significant wording effort. There exists no core wording that even attempts to do so. Note that implementations don't always compute results correctly rounded (with the greatest available precision), while the ISO/IEC 60559 operations are usually correctly rounded. Therefore, aligning the C++ expressions with ISO/IEC 60559 may break most implementations.
2.12. Is 0.0 positive or negative zero?
Presumably positive.
The C++ standard does not mandate a specific sign for 0.0.
2.13. Are arithmetic operations required to preserve the sign of zero?
Presumably no. While the ISO/IEC 60559 multiplication operation preserves the sign bit when multiplying with 1, the C++ standard does not clearly mandate such behavior.
Especially considering that not every type adheres to ISO/IEC 60559,
we should explicitly require a specific handling of sign bits.
For example, the unary - operator should be required to flip the sign bit
even for zero;
otherwise, -0.0 may not be a spelling of negative zero,
which users rely on.
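What users typically rely on can be sketched as follows. The checks describe behavior commonly observed when double adheres to ISO/IEC 60559 and the default rounding mode is in effect; they are not behavior that the C++ standard currently guarantees. The names negate, subtract_from_zero, times_one, and check_zero_signs are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

inline double negate(double x)             { return -x; }
inline double subtract_from_zero(double x) { return 0.0 - x; }
inline double times_one(double x)          { return x * 1.0; }

// Assumption: double adheres to ISO/IEC 60559 and rounding is to nearest;
// otherwise nothing is checked.
inline void check_zero_signs() {
    if (!std::numeric_limits<double>::is_iec559) return;
    assert(std::signbit(negate(0.0)));              // unary minus flips the sign of zero
    assert(!std::signbit(subtract_from_zero(0.0))); // (+0) - (+0) yields +0
    assert(std::signbit(times_one(-0.0)));          // multiplying by 1 keeps the sign
    assert(negate(0.0) == subtract_from_zero(0.0)); // both zeros compare equal
}
```

The second check shows why 0.0 - x is not a substitute for -x: the two expressions can produce zeros of opposite sign.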
2.14. How does template argument equivalence work for floating-point types?
In practice, it is based on bitwise identical values. [temp.type] states that
Two values are template-argument-equivalent if they are of the same type and
- they are of integral type and their values are the same, or
- they are of floating-point type and their values are identical, or
- […]
The distinction between "same" and "identical" is not obvious.
It was added during C++20 NB comment resolution by [P1907R1],
after [P1714R1]
(the paper which originally added support for floating-point template parameters)
was rejected.
The original paper used the terminology "identical value representations",
which makes the intent obvious,
unlike the current wording.
GCC, Clang, and MSVC implement the design of the original paper by bit-casting the value to an integer and mangling it into the name.
Instantiating a function template f with two different qNaN payloads results in two distinct
instantiations, despite there being only one distinct qNaN value,
at least from an ISO/IEC 60559 perspective.
Both GCC and Clang emit the assembly:
_Z1fILf7fc00000EEvv:
ret
_Z1fILf7fc00001EEvv:
ret
The wording should be clarified to match the design of the original paper
and to match what implementations actually do.
Creating a distinct instantiation for every bit pattern is the most useful behavior anyway;
it would be impossible to deliberately wrap a NaN payload of choice
in otherwise, for example.
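The same effect can be observed without NaNs by using the two zeros, which compare equal but are not bitwise identical. The following is a minimal sketch, assuming a compiler that supports floating-point template arguments; the preprocessor guard (including the Clang version check) is a conservative assumption, and wrap as well as the three constants are hypothetical names. Where the feature is unavailable, the constants are simply false.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: the feature-test macro, plus a conservative Clang version
// check, indicates support for floating-point template arguments.
#if defined(__cpp_nontype_template_args) && __cpp_nontype_template_args >= 201911L \
    && (!defined(__clang__) || __clang_major__ >= 18)

// Hypothetical wrapper with a floating-point template parameter.
template <float F>
struct wrap {};

constexpr bool float_template_args_supported = true;

// 0.0f == -0.0f is true, yet the specializations are expected to differ.
constexpr bool zeros_give_distinct_types =
    !std::is_same<wrap<0.0f>, wrap<-0.0f>>::value;

// The same literal always names the same specialization.
constexpr bool same_literal_gives_same_type =
    std::is_same<wrap<1.5f>, wrap<1.5f>>::value;

#else

constexpr bool float_template_args_supported = false;
constexpr bool zeros_give_distinct_types = false;
constexpr bool same_literal_gives_same_type = false;

#endif

inline void check_template_equivalence() {
    if (!float_template_args_supported) return;
    assert(zeros_give_distinct_types);
    assert(same_literal_gives_same_type);
}
```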
3. Impact on the standard
To clarify the floating-point specification, a bit of additional wording is required.
This paper only picks the "low-hanging fruit",
so to speak.
In the long run, specifying the handling of NaNs and infinities by C++ expressions,
documenting ISO/IEC 60559 conformance,
and other large changes may be desirable.
However, those would require much greater changes.
4. Impact on implementations
All proposed wording changes document the current behavior of major implementations.
5. Wording
The changes are relative to [N5014].
[lex]
Change [lex.fcon] paragraph 3 as follows:
If the scaled value is not in the range of representable values for its type,
the program is ill-formed.
Otherwise, the value of a
[Example:
is positive zero, and
is negative zero if
has a signed zero ([basic.fundamental], [expr.unary.op]).
— end example]
[basic]
Immediately preceding [basic.fundamental] paragraph 13, insert three new paragraphs:
A floating-point type adheres to ISO/IEC 60559 if its value representation is one of the floating-point formats specified in ISO/IEC 60559 and can represent the sets of floating-point values listed in ISO/IEC 60559. For a floating-point type whose value representation is in a binary format that adheres to ISO/IEC 60559, it is implementation-defined which (if any) representations of NaN are quiet or signaling.
[Note:
Adherence to ISO/IEC 60559 does not imply that operations on floating-point types
behave exactly as specified in that standard.
For example, the behavior of addition in C++ ([expr.add])
can differ from the "addition" operation in ISO/IEC 60559.
— end note]
A floating-point type shall at least represent a subset of rational numbers. Depending on the implementation-defined value representation for the type, it may additionally represent the non-finite values
- infinity,
- quiet "Not a Number" value,
- signaling "Not a Number" value, and
- further implementation-defined values.
For any of the above (including zero), a floating-point type may represent either a single unsigned value or two distinct values with negative and positive sign.
[Note: A floating-point type which adheres to ISO/IEC 60559 is capable of representing a negative and positive variant of finite values and infinity, an unsigned quiet NaN, and an unsigned signaling NaN. Such a type has multiple object representations for quiet NaN and signaling NaN. — end note]
Immediately preceding [basic.fundamental] paragraph 14, insert a new paragraph:
A value is negative if and only if it compares less than 0 ([expr.rel]).
[Note: Thus, negative zeros and NaNs are not negative values. — end note]
[expr]
At the end of [expr.pre], insert a new paragraph:
Unless otherwise stated, it is unspecified which of the alternative representations for a value is chosen as the result of an expression. Furthermore, if the result is of a floating-point type that can represent negative and positive zero, it is implementation-defined which zero is chosen as the result of the expression.
Change [conv.fpint] paragraph 2 as follows:
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. The result is exact if possible. If the value being converted is zero, the result is positive or unsigned zero. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.
[Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type. — end note]
If the value being converted is outside the range of values that can be represented,
the behavior is undefined.
If the source type is bool, the value false is converted to zero and
the value true is converted to one.
Change [expr.unary.op] paragraph 8 as follows:
The operand of the unary - operator shall be a prvalue of
arithmetic or unscoped enumeration type
and the result is the negative of its operand.
Integral promotion is performed on integral or enumeration operands.
The negative of an unsigned quantity is computed by subtracting its value from $2^n$,
where $n$ is the number of bits in the promoted operand.
For floating-point types that may represent negative and positive zero,
the unary - operator results in the zero with opposite sign.
The type of the result is the type of the promoted operand.
Change [expr.spaceship] paragraph 4 as follows:
If both operands have arithmetic types, or one operand has integral type and the other operand has unscoped enumeration type, the usual arithmetic conversions are applied to the operands. Then:
- If a narrowing conversion ([cl.init.list]) is required, other than from an integral type to a floating-point type, the program is ill-formed.
- Otherwise, if the operands have integral type, […].
- Otherwise, the operands have floating-point type, and the result is of type std::partial_ordering. The expression a <=> b yields std::partial_ordering::less if a is less than b, std::partial_ordering::greater if a is greater than b, std::partial_ordering::equivalent if a is equivalent to b, and std::partial_ordering::unordered otherwise. Positive zeros are equivalent to negative zeros.
Change [expr.rel] paragraph 6 as follows:
If both operands (after conversions) are of arithmetic or enumeration type,
each of the operators shall yield true
if the specified relationship is true and
false if it is false.
Positive zeros compare equal to negative zeros.
Change [expr.eq] paragraph 8 as follows:
If both operands are of arithmetic or enumeration type,
the usual arithmetic conversions ([expr.arith.conv]) are performed on both operands;
each of the operators shall yield true
if the specified relationship is true and
false if it is false.
Positive zeros compare equal to negative zeros.
[temp]
Immediately preceding [temp.type] paragraph 2, insert a new paragraph:
Two values and of a type
are bitwise identical if
([bit.cast])
is ,
where is a hypothetical unsigned integer type with the same size as and
the same amount and positioning of padding bits in its object representation as .
[Note:
It is possible that two values are the same but not bitwise identical.
For example, a floating-point type can have multiple representations
of zero,
of any other finite number,
of the same NaN value,
and more,
even if the object representation of has no padding bits.
— end note]
bitwise identical
.
Change [temp.type] paragraph 2 as follows:
Two values are template-argument-equivalent if they are of the same type and
- they are of integral type and their values are the same, or
- they are of floating-point type and their values are bitwise identical, or
- they are of type std::nullptr_t, or
- […]
While it would be possible to define "bitwise identical"
without the use of std::bit_cast,
the use of std::bit_cast makes the intent obvious at first glance.
The use of "unsigned" may seem unnecessary,
but it makes the wording more obvious,
and there exist integer types such as which don't behave
like the hypothetical type we want here.
[support]
Change [numeric.limits.members] as follows:
[…]
if the type has a representation for a signaling Not a Number
.
Meaningful for all floating-point types.
Shall be for all specializations in which
is .
for ,
despite there not existing a signaling NaN.
Implementations want to signal for a type,
even when none of the representations are signaling.
Doing so seemingly complies with ISO/IEC 60559 because it is not mandated that at least one signaling NaN representation exists. Even the specification of the setPayloadSignaling operation does not clearly prohibit an implementation that has no permissible payloads.
On the other hand, ISO/IEC 60559 defines which of the decimal floating-point NaNs are quiet and signaling.
Change [numeric.limits.members] as follows:
Representation of a signaling "Not a Number", if available.
Meaningful for all specializations for which .
Required in specializations for which is .
Change [cmp.alg] paragraph 2 as follows:
The name weak_order denotes a customization point object ([customization.point.object]).
Given subexpressions E and F,
the expression weak_order(E, F) is expression-equivalent ([defns.expression.equivalent])
to the following:
- […]
- Otherwise, if the decayed type T of E is a floating-point type, yields a value of type weak_ordering that is consistent with the ordering observed by T's comparison operators and strong_order, and if numeric_limits<T>::is_iec559 is true, is additionally consistent with the following equivalence classes, ordered from lesser to greater:
  - together, all NaN values with negative sign bit;
  - negative infinity;
  - each normal negative value;
  - each subnormal negative value;
  - together, both zero values;
  - each subnormal positive value;
  - each normal positive value;
  - positive infinity;
  - together, all NaN values with positive sign bit.
- […]
negative NaN
or positive NaN
,
so the existing wording makes no sense,
especially not with the new definition of negative
.
[meta]
Change [meta.unary.prop] paragraph 10 as follows:
The predicate condition for a template specialization has_unique_object_representations<T>
shall be satisfied if and only if
- T is trivially copyable, and
- any two objects of type T with the same value have the same object representation, where
  - two objects of array or non-union class type […]
  - two objects of union type […]
The set of scalar types ([basic.fundamental]) for which this condition holds is implementation-defined.
[Example: The condition does not hold for floating-point types that adhere to ISO/IEC 60559 because there are multiple representations of the same NaN value. The condition also does not hold for any type whose object representation contains padding bits. — end example]
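The example above can be checked against an implementation with a short sketch. It assumes that the library provides std::has_unique_object_representations (guarded by its feature-test macro) and that the implementation reports false for float, which is what the author of this sketch expects from current major compilers but which remains implementation-defined; all constant names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

#ifdef __cpp_lib_has_unique_object_representations
constexpr bool trait_available = true;
// Expected to be false: equal values can have different object representations.
constexpr bool float_is_unique =
    std::has_unique_object_representations<float>::value;
// unsigned char has no padding bits and one representation per value.
constexpr bool unsigned_char_is_unique =
    std::has_unique_object_representations<unsigned char>::value;
#else
constexpr bool trait_available = false;
constexpr bool float_is_unique = false;
constexpr bool unsigned_char_is_unique = false;
#endif

inline void check_unique_representations() {
    if (!trait_available) return;
    assert(unsigned_char_is_unique);
    assert(!float_is_unique); // assumption about the implementation, see above
}
```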