Document number: LEWG, EWG, SG14, SG6: P0037R1
Date: 2016-02-11
Project: Programming Language C++, Library Evolution WG, SG14
Reply-to: John McFarlane, fixed-point@john.mcfarlane.name
This proposal introduces a system for performing binary fixed-point arithmetic using integral types.
Floating-point types are an exceedingly versatile and widely supported method of expressing real numbers on modern architectures.
However, there are certain situations where fixed-point arithmetic is preferable. Some systems lack native floating-point registers and must emulate them in software. Many others are capable of performing some or all operations more efficiently using integer arithmetic. Certain applications can suffer from the variability in precision which comes from a dynamic radix point [1]. In situations where a variable exponent is not desired, it takes valuable space away from the significand and reduces precision.
Integer types provide the basis for an efficient representation of binary fixed-point real numbers. However, laborious, error-prone steps are required to normalize the results of certain operations and to convert to and from fixed-point types.
A set of tools for defining and manipulating fixed-point types is proposed. These tools are designed to make work easier for those who traditionally use integers to perform low-level, high-performance fixed-point computation. They are composable such that a wide range of trade-offs between speed, accuracy and safety are supported.
This proposal is a pure library extension. It does not require
changes to any standard classes or functions.
It adds two new class templates, resize<class, int>
and resize_t<class, int>
,
to existing header file, <type_traits>
.
It adds several new class and function templates
to new header file, <fixed_point>
.
The design is driven by the following aims in roughly descending order:
More generally, the aim of this proposal is to contain within a single API all the tools necessary to perform binary fixed-point arithmetic. The design facilitates a wide range of competing compile-time strategies for avoiding overflow and precision loss, but implements only the simplest by default. Similarly, orthogonal concerns such as run-time overflow detection and rounding modes are deferred to the underlying integer types used as storage.
Fixed-point numbers are specializations of
template <class ReprType, int Exponent>
class fixed_point;
where the template parameters are described as follows.
ReprType
Type Template ParameterThis parameter identifies the capacity and signedness of the
underlying type used to represent the value. In other words, the size
of the resulting type will be sizeof(ReprType)
and it will be
signed iff is_signed<ReprType>::value
is true. The default is
int
.
ReprType
may be a fundamental integral type or similar type with bit-shift and arithmetic operators.
The most suitable types are: std::int8_t
, std::uint8_t
,
std::int16_t
, std::uint16_t
, std::int32_t
and std::uint32_t
.
In limited situations, std::int64_t
and std::uint64_t
can be used.
The reasons for these limitations relate to the difficulty in finding
a type that is suitable for performing lossless integer
multiplication.
Exponent
Non-Type Template ParameterThe exponent of a fixed-point type is the equivalent of the exponent
field in a floating-point type and shifts the stored value by the
requisite number of bits necessary to produce the desired range. The
default value of Exponent
is zero, giving fixed_point<T>
the same
range as T
.
The resolution of a specialization of fixed_point
is
pow(2, Exponent)
and the minimum and maximum values are
std::numeric_limits<ReprType>::min() * pow(2, Exponent)
and
std::numeric_limits<ReprType>::max() * pow(2, Exponent)
respectively.
Any usage that results in values of Exponent
which lie outside the
range, (INT_MIN / 2
, INT_MAX / 2
), may result in undefined
behavior and/or overflow or underflow. This range of exponent values
is far in excess of the largest built-in floting-point type and should
be adequate for all intents and purposes.
make_fixed
and make_ufixed
Helper TypeThe Exponent
template parameter is versatile and concise. It is an
intuitive scale to use when considering the full range of positive and
negative exponents a fixed-point type might possess. It also
corresponds to the exponent field of built-in floating-point types.
However, most fixed-point formats can be described more intuitively by the cardinal number of integer and/or fractional digits they contain. Most users will prefer to distinguish fixed-point types using these parameters.
For this reason, two aliases are defined in the style of
make_signed
.
These aliases are declared as:
template <unsigned IntegerDigits, unsigned FractionalDigits = 0, class Archetype = signed>
using make_fixed;
and
template <unsigned IntegerDigits, unsigned FractionalDigits = 0, class Archetype = unsigned>
using make_ufixed;
They resolve to a fixed_point
specialization with the given
signedness and number of integer and fractional digits. They may
contain additional integer and fractional digits.
For example, one could define and initialize an 8-bit, unsigned, fixed-point variable with four integer digits and four fractional digits:
make_ufixed<4, 4> value { 15.9375 };
or a 32-bit, signed, fixed-point number with two integer digits and 29 fractional digits:
make_fixed<2, 29> value { 3.141592653 };
Type parameter, Archetype
, is provided in the case that a
fixed_point
specialization is desired which has as the ReprType
parameter some type other than a built-in integral. The signedness of
Archetype
corresponds to the signedness of the resultant
fixed_point
specialization although the size does not.
Fixed-point numbers can be explicitly converted to and from arithmetic types.
While effort is made to ensure that significant digits are not lost
during conversion, no effort is made to avoid rounding errors.
Whatever would happen when converting to and from ReprType
largely
applies to fixed_point
objects also. For example:
make_ufixed<4, 4>(.006) == make_ufixed<4, 4>(0)
...equates to true
and is considered an acceptable rounding error.
Any operators that might be applied to integer types can also be applied to fixed-point types. A guiding principle of operator overloads is that they perform as little run-time computation as is practically possible.
With the exception of shift and comparison operators, binary operators can take any combination of:
is_arithmetic
is true.Where the inputs are not identical fixed-point types, a simple set of promotion-like rules are applied to determine the return type:
Some examples:
make_ufixed<5, 3>{8} + make_ufixed<4, 4>{3} == make_ufixed<5, 3>{11};
make_ufixed<5, 3>{8} + 3 == make_ufixed<5, 3>{11};
make_ufixed<5, 3>{8} + float{3} == float{11};
The reasoning behind this choice is a combination of predictability and performance. It is explained for each rule as follows:
Shift operator overloads require an integer type as the right-hand parameter and return a type which is adjusted to accommodate the new value without risk of overflow or underflow.
Comparison operators convert the inputs to a common result type
following the rules above before performing a comparison and returning
true
or false
.
Because arithmetic operators return a result of equal capacity to their inputs, they carry a risk of overflow. For instance,
make_fixed<4, 3>(15) + make_fixed<4, 3>(1)
causes overflow because because a type with 4 integer bits cannot store a value of 16.
The result of overflow of any bits in a fixed-point value depends
entirely on how ReprType
handles overflow. Thus, for built-in
signed types, the result is undefined and for built-in unsigned types,
the value wraps around.
The other typical cause of lost bits is underflow where, for example,
make_fixed<7, 0>(15) / make_fixed<7, 0>(2)
results in a value of 7. This results in loss of precision but is generally considered acceptable.
However, when all bits are lost due to underflow, the value is said
to be flushed. As with overflow, the result of a flush is the same for
a fixed-point type as it is for its underlying ReprType
. In the case
of built-in integral types, the value becomes zero.
Errors resulting from overflow and flushes are two of the biggest headaches related to fixed-point arithmetic. Integers suffer the same kinds of errors but are somewhat easier to reason about as they lack fractional digits. Floating-point numbers are largely shielded from these errors by their variable exponent and implicit bit.
Four strategies for avoiding overflow in fixed-point types are presented:
ReprType
which behaves differently from built-in integral types;For arithmetic operators, choice 1) is taken because it most closely follows the behavior of integer types. Thus it should cause the least surprise to the fewest users. This makes it far easier to reason about in code where functions are written with a particular type in mind. It also requires the least computation in most cases.
Choice 2) is beyond the scope of this proposal
and is covered in more detail in the section, Alternative Types for ReprType
.
Choices 3) and 4) are reasonably robust to overflow events.
However, they represent different trade-offs and neither one is the best fit in all situations.
Notably, where any instance of c = a + b
is replaced with a += b
, results may change in surprising ways.
For these reasons, they are presented as named functions.
Function template, promote
, borrows a term from the language
feature which avoids integer overflow prior to certain operations. It
takes a fixed_point
object and returns the same value represented
by a larger fixed_point
specialization.
For example,
promote(make_fixed<5, 2>(15.5))
is equivalent to
make_fixed<11, 4>(15.5)
Complimentary function template, demote
, reverses the process,
returning a value of a smaller type.
The following named function templates can be used as a hassle-free alternative to arithmetic operators in situations where the aim is to avoid overflow.
Unary functions:
trunc_reciprocal, trunc_square, trunc_sqrt,
promote_reciprocal, promote_square
Binary functions:
trunc_add, trunc_subtract, trunc_multiply, trunc_divide
trunc_shift_left, trunc_shift_right,
promote_add, promote_sub, promote_multiply, promote_divide
Some notes:
trunc_
functions return the result as a type no larger than
the inputs and with an exponent adjusted to avoid overflow;promote_
functions return the result as a type large enough
to avoid overflow and underflow;_multiply
and _square
functions are not guaranteed to be
available for 64-bit types;_multiply
and _square
functions produce undefined behavior
when all input parameters are the most negative number;_square
functions return an unsigned type;_add
, _subtract
, _multiply
and _divide
functions take
heterogeneous fixed_point
specializations;_divide
and _reciprocal
functions in no way guard against
divide-by-zero errors;trunc_shift_
functions return results of the same type as
their first input parameter but with an adjusted Exponent
value andReprType
Using built-in integral types as the default underlying representation minimizes certain costs:
However, this choice also brings with it many of the deficiencies of built-in types. For example:
long long int
.The effort involved in addressing these deficiencies is non-trivial and on-going (for example [2]). As solutions are made available, it should become easier to define custom integral types which address concerns surrounding robustness and correctness. Such types deserve their place in the standard library.
integer
A composable system of integer types that is suitable for use with fixed_point
might take the following form:
// size may be rounded up in some cases;
template <int NumBytes, bool IsSigned = true>
class sized_integer;
// may take built-in or sized_integer as Repr parameter
template <class Repr = int, rounding Rounding = rounding::towards_odd>
class rounding_integer;
// may take built-in, sized_integer or rounding_integer as Repr parameter
template <class Repr = int, overflow Overflow = overflow::exception>
class overflow_integer;
// a 'kitchen sink' custom integer type
template <int NumBytes, bool IsSigned = true, rounding Rounding = rounding::towards_odd, overflow Overflow = overflow::exception>
using integer =
overflow_integer<
rounding_integer<
sized_integer<NumBytes, IsSigned>,
Rounding>,
Overflow>;
Any of these types might be used to compose fixed_point
specializations
without paying (in compile-time complexity) for features that are not used.
While the issues related to integer types affect the fixed-point types they support, they are not specific to fixed-point. It would not only be premature - but inappropriate - to attempt to address rounding and error handling at the level of a fixed-point type.
For a type to be suitable as parameter, ReprType
, of fixed_point
,
it must meet the following requirements:
is_signed
or is_unsigned
make_signed
and make_unsigned
resize
resize
and resize_t
Helper TypesAny type used as ReprType
(including built-in integers) must define a specialization of resize
:
template <class Archetype, int NumBytes>
struct resize;
An alias to resize
is resize_t
which can be defined as:
template <class Archetype, int NumBytes>
using resize_t = typename resize<Archetype, NumBytes>::type;
The resultant type is of the same signedness as Archetype
and has a capacity of no less than NumBytes
bytes
such that for any Integer, T
and positive integer value, 'N':
is_signed_v<resize_t<T, N>> == is_signed_v<T>
and
sizeof(resize_t<T, N>) >= N
are both true.
For example, the type of resize_t<unsigned, 1>
is uint8_t
and the type of resize_t<int8_t, 8>
is int64_t
.
The following example calculates the magnitude of a 3-dimensional vector.
template <class Fp>
constexpr auto magnitude(const Fp & x, const Fp & y, const Fp & z)
-> decltype(trunc_sqrt(trunc_add(trunc_square(x), trunc_square(y), trunc_square(z))))
{
return trunc_sqrt(trunc_add(trunc_square(x), trunc_square(y), trunc_square(z)));
}
Calling the above function as follows
static_cast<double>(magnitude(
make_ufixed<4, 12>(1),
make_ufixed<4, 12>(4),
make_ufixed<4, 12>(9)));
returns the value, 9.890625.
namespace std {
template <class ReprType, int Exponent> class fixed_point;
template <unsigned IntegerDigits, unsigned FractionalDigits = 0, class Archetype = signed>
using make_fixed;
template <unsigned IntegerDigits, unsigned FractionalDigits = 0, class Archetype = unsigned>
using make_ufixed;
template <class ReprType, int IntegerDigits>
using make_fixed_from_repr;
template <class ReprType, int Exponent, int NumBytes>
struct resize<fixed_point<ReprType, Exponent>, NumBytes>;
template <class FixedPoint>
using promote_result;
template <class FixedPoint>
promote_result<FixedPoint>
constexpr promote(const FixedPoint & from) noexcept;
template <class FixedPoint>
using demote_result;
template <class FixedPoint>
demote_result<FixedPoint>
constexpr demote(const FixedPoint & from) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator==(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator!=(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator<(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator>(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator>=(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr bool operator<=(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent> operator-(
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent> operator+(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent> operator-(
const fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
fixed_point<ReprType, Exponent> & operator+=(
fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
fixed_point<ReprType, Exponent> & operator-=(
fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
fixed_point<ReprType, Exponent> & operator*=(
fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class ReprType, int Exponent>
fixed_point<ReprType, Exponent> & operator/=(
fixed_point<ReprType, Exponent> & lhs,
const fixed_point<ReprType, Exponent> & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator==(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator!=(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator<(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator>(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator>=(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator<=(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator+(
const Lhs & lhs,
const Rhs & rhs) noexcept;
template <class Lhs, class Rhs>
constexpr auto operator-(
const Lhs & lhs,
const Rhs & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class RhsReprType, int RhsExponent>
constexpr auto operator*(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class RhsReprType, int RhsExponent>
constexpr auto operator/(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class Integer>
constexpr auto operator*(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const Integer & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class Integer>
constexpr auto operator/(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const Integer & rhs) noexcept;
template <class Integer, class RhsReprType, int RhsExponent>
constexpr auto operator*(
const Integer & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class Integer, class RhsReprType, int RhsExponent>
constexpr auto operator/(
const Integer & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class Float>
constexpr auto operator*(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const Float & rhs) noexcept;
template <class LhsReprType, int LhsExponent, class Float>
constexpr auto operator/(
const fixed_point<LhsReprType, LhsExponent> & lhs,
const Float & rhs) noexcept;
template <class Float, class RhsReprType, int RhsExponent>
constexpr auto operator*(
const Float & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class Float, class RhsReprType, int RhsExponent>
constexpr auto operator/(
const Float & lhs,
const fixed_point<RhsReprType, RhsExponent> & rhs) noexcept;
template <class LhsReprType, int Exponent, class Rhs>
fixed_point<LhsReprType, Exponent> & operator+=(fixed_point<LhsReprType, Exponent> & lhs, const Rhs & rhs) noexcept;
template <class LhsReprType, int Exponent, class Rhs>
fixed_point<LhsReprType, Exponent> & operator-=(fixed_point<LhsReprType, Exponent> & lhs, const Rhs & rhs) noexcept;
template <class LhsReprType, int Exponent>
template <class Rhs, typename std::enable_if<std::is_arithmetic<Rhs>::value, int>::type Dummy>
fixed_point<LhsReprType, Exponent> &
fixed_point<LhsReprType, Exponent>::operator*=(const Rhs & rhs) noexcept;
template <class LhsReprType, int Exponent>
template <class Rhs, typename std::enable_if<std::is_arithmetic<Rhs>::value, int>::type Dummy>
fixed_point<LhsReprType, Exponent> &
fixed_point<LhsReprType, Exponent>::operator/=(const Rhs & rhs) noexcept;
template <class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent>
sqrt(const fixed_point<ReprType, Exponent> & x) noexcept;
template <class FixedPoint, unsigned N = 2>
using trunc_add_result;
template <class FixedPoint, class ... Tail>
trunc_add_result<FixedPoint, sizeof...(Tail) + 1>
constexpr trunc_add(const FixedPoint & addend1, const Tail & ... addend_tail);
template <class Lhs, class Rhs = Lhs>
using trunc_subtract_result;
template <class Lhs, class Rhs>
trunc_subtract_result<Lhs, Rhs>
constexpr trunc_subtract(const Lhs & minuend, const Rhs & subtrahend);
template <class Lhs, class Rhs = Lhs>
using trunc_multiply_result;
template <class Lhs, class Rhs>
trunc_multiply_result<Lhs, Rhs>
constexpr trunc_multiply(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class FixedPointDividend, class FixedPointDivisor = FixedPointDividend>
using trunc_divide_result;
template <class FixedPointDividend, class FixedPointDivisor>
trunc_divide_result<FixedPointDividend, FixedPointDivisor>
constexpr trunc_divide(const FixedPointDividend & lhs, const FixedPointDivisor & rhs) noexcept;
template <class FixedPoint>
using trunc_reciprocal_result;
template <class FixedPoint>
trunc_reciprocal_result<FixedPoint>
constexpr trunc_reciprocal(const FixedPoint & fixed_point) noexcept;
template <class FixedPoint>
using trunc_square_result;
template <class FixedPoint>
trunc_square_result<FixedPoint>
constexpr trunc_square(const FixedPoint & root) noexcept;
template <class FixedPoint>
using trunc_sqrt_result;
template <class FixedPoint>
trunc_sqrt_result<FixedPoint>
constexpr trunc_sqrt(const FixedPoint & square) noexcept;
template <int Integer, class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent + Integer>
trunc_shift_left(const fixed_point<ReprType, Exponent> & fp) noexcept;
template <int Integer, class ReprType, int Exponent>
constexpr fixed_point<ReprType, Exponent - Integer>
trunc_shift_right(const fixed_point<ReprType, Exponent> & fp) noexcept;
template <class FixedPoint, unsigned N = 2>
using promote_add_result;
template <class FixedPoint, class ... Tail>
promote_add_result<FixedPoint, sizeof...(Tail) + 1>
constexpr promote_add(const FixedPoint & addend1, const Tail & ... addend_tail);
template <class Lhs, class Rhs = Lhs>
using promote_subtract_result
template <class Lhs, class Rhs>
promote_subtract_result<Lhs, Rhs>
constexpr promote_subtract(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs = Lhs>
using promote_multiply_result;
template <class Lhs, class Rhs>
promote_multiply_result<Lhs, Rhs>
constexpr promote_multiply(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class Lhs, class Rhs = Lhs>
using promote_divide_result;
template <class Lhs, class Rhs>
promote_divide_result<Lhs, Rhs>
constexpr promote_divide(const Lhs & lhs, const Rhs & rhs) noexcept;
template <class FixedPoint>
using promote_square_result;
template <class FixedPoint>
promote_square_result<FixedPoint>
constexpr promote_square(const FixedPoint & root) noexcept;
template <class Archetype, int NumBytes>
struct resize;
template <class Archetype, int NumBytes>
using resize_t;
}
fixed_point<>
Class Templatetemplate <class ReprType = int, int Exponent = 0>
class fixed_point
{
public:
using repr_type = ReprType;
constexpr static int exponent;
constexpr static int digits;
constexpr static int integer_digits;
constexpr static int fractional_digits;
fixed_point() noexcept;
template <class S, typename std::enable_if<_impl::is_integral<S>::value, int>::type Dummy = 0>
explicit constexpr fixed_point(S s) noexcept;
template <class S, typename std::enable_if<std::is_floating_point<S>::value, int>::type Dummy = 0>
explicit constexpr fixed_point(S s) noexcept;
template <class FromReprType, int FromExponent>
explicit constexpr fixed_point(const fixed_point<FromReprType, FromExponent> & rhs) noexcept;
template <class S, typename std::enable_if<_impl::is_integral<S>::value, int>::type Dummy = 0>
fixed_point & operator=(S s) noexcept;
template <class S, typename std::enable_if<std::is_floating_point<S>::value, int>::type Dummy = 0>
fixed_point & operator=(S s) noexcept;
template <class FromReprType, int FromExponent>
fixed_point & operator=(const fixed_point<FromReprType, FromExponent> & rhs) noexcept;
template <class S, typename std::enable_if<_impl::is_integral<S>::value, int>::type Dummy = 0>
explicit constexpr operator S() const noexcept;
template <class S, typename std::enable_if<std::is_floating_point<S>::value, int>::type Dummy = 0>
explicit constexpr operator S() const noexcept;
explicit constexpr operator bool() const noexcept;
template <class Rhs, typename std::enable_if<std::is_arithmetic<Rhs>::value, int>::type Dummy = 0>
fixed_point &operator*=(const Rhs & rhs) noexcept;
template <class Rhs, typename std::enable_if<std::is_arithmetic<Rhs>::value, int>::type Dummy = 0>
fixed_point & operator/=(const Rhs & rhs) noexcept;
constexpr repr_type data() const noexcept;
static constexpr fixed_point from_data(repr_type repr) noexcept;
};
The following definitions are added to <type_traits>
:
template <class Archetype, int NumBytes>
struct resize;
template <class Archetype, int NumBytes>
using resize_t = typename resize<Archetype, NumBytes>::type;
Because the aim is to provide an alternative to existing arithmetic types which are supported by the standard library, it is conceivable that a future proposal might specialize existing class templates and overload existing functions.
Possible candidates for overloading include the functions defined in
<cmath> and a templated specialization of numeric_limits
. A new type
trait, is_fixed_point
, would also be useful.
While fixed_point
is intended to provide drop-in replacements to
existing built-ins, it may be preferable to deviate slightly from the
behavior of certain standard functions. For example, overloads of
functions from <cmath> will be considerably less concise, efficient
and versatile if they obey rules surrounding error cases. In
particular, the guarantee of setting errno
in the case of an error
prevents a function from being defined as pure. This highlights a
wider issue surrounding the adoption of the functional approach and
compile-time computation that is beyond the scope of this document.
The bounded::integer library [3] exemplifies the benefits of keeping track of ranges of values in arithmetic types at compile time.
To a limited extent, the trunc_
functions defined here also keep
track of - and modify - the limits of values. However, a combination
of techniques is capable of producing superior results.
For instance, consider the following expression:
make_ufixed<2, 6> three(3);
auto n = trunc_square(trunc_square(three));
The type of n
is make_ufixed<8, 0>
but its value does not
exceed 81. Hence, an integer bit has been wasted. It may be possible
to track more accurate limits in the same manner as the
bounded::integer library in order to improve the precision of types
returned by trunc_
functions. For this reason, the exact value of
the exponents of these return types is not given.
Notes:
A notable feature of the fp library [4]
is the creation of an alias for integral_constant
which can be applied to the right-hand side of bit-shift operations.
The type returned from this operation has identical bit-wise value to the left-hand input
but with Exponent
value adjusted by the amount of the right-hand side.
It is essentially the same as the trunc_shift_
functions
and means that when shifting by literal values what looks looks like run-time operation
is a compile-time calculation which guarantees no overflow or underflow.
When devising a strategy for mitigating the risk of overflow during arithmetic operations,
the number of integer and fractional bits stored in the result is an important choice.
The fixed_point
type picks one of the simpler options by default,
but it is by no means the only viable one.
The fp library [4] returns a type
whose size matches the inputs but whose exponent is shifted to preserve high bits.
The arithmetic types proposed in P0106
[9]
increase capacity to ensure that precision is preserved.
(These two approaches share common aims with
the trunc_
and promote_
named arithmetic functions proposed earlier.)
Even greater control of the required capacity of a fixed-point type can be afforded by
systems such as the bounded::integer library [3].
A common requirement among these approaches is the ability to specify the return type of arithmetic operations. For this reason, named-function arithmetic operators which are more expressive than those proposed so far may be necessary.
These functions would specify a return type as a template parameter. For example:
template <class Result, class Lhs, class Rhs>
constexpr Result fixed_point_multiply(const Lhs & lhs, const Rhs & rhs);
Many examples of fixed-point support in C and C++ exist. While almost all of them aim for low run-time cost and expressive alternatives to raw integer manipulation, they vary greatly in detail and in terms of their interface.
One especially interesting dichotomy is between solutions which offer a discrete selection of fixed-point types and libraries which contain a continuous range of exponents through type parameterization.
One example of the former is found in proposal N1169 [6], the intent of which is to expose features found in certain embedded hardware. It introduces a succinct set of language-level fixed-point types and impose constraints on the number of integer or fractional digits each can possess.
As with all examples of discrete-type fixed-point support, the limited choice of exponents is a considerable restriction on the versatility and expressiveness of the API.
Nevertheless, it may be possible to harness performance gains provided by N1169 fixed-point types through explicit template specialization. This is likely to be a valuable proposition to potential users of the library who find themselves targeting platforms which support fixed-point arithmetic at the hardware level.
There are many other C++ libraries available which fall into the latter category of continuous-range fixed-point arithmetic [4] [7] [8]. In particular, an existing library proposal, P0106 [9], aims to achieve very similar goals through similar means and warrants closer comparison than N1169.
P0106 introduces four class templates covering the quadrant of signed versus unsigned and fractional versus integer numeric types. It is intended to replace built-in types in a wide variety of situations and accordingly, is highly compile-time configurable in terms of how rounding and overflow are handled. Parameters to these four class templates include the storage in bits and - for fractional types - the resolution.
The fixed_point
class template could probably - with a few caveats -
be generated using the two fractional types, nonnegative
and
negatable
, replacing the ReprType
parameter with the integer bit
count of ReprType
, specifying fastest
for the rounding mode and
specifying undefined
as the overflow mode.
However, fixed_point
more closely and concisely caters to the needs of
users who already use integer types and simply desire a more concise,
less error-prone form. It more closely follows the four design aims of
the library and - it can be argued - more closely follows the spirit
of the standard in its pursuit of zero-cost abstraction.
Some aspects of the design of the P0106 API which back up these conclusion are that:
trunc_
function templates and are potentially more costly at run-time;The added versatility that the P0106 API provides regarding rounding and overflow handling are of relatively low priority to users who already bear the scars of battles with raw integer types. Nevertheless, providing them as options to be turned on or off at compile time is an ideal way to leave the choice in the hands of the user.
Many high-performance applications - in which fixed-point is of potential value - favor run-time checks during development which are subsequently deactivated in production builds. The P0106 interface is highly conducive to this style of development. The design proposed in this paper aims to achieve similar results by composing fixed-point types from custom integral types.
Subgroup: Guy Davidson, Michael Wong
Contributors: Ed Ainsley, Billy Baker, Lance Dyson, Marco Foco,
Clément Grégoire, Nicolas Guillemot, Matt Kinzelman, Joël Lamotte,
Sean Middleditch, Patrice Roy, Peter Schregle, Ryhor Spivak
This paper revises P0037R0:
fixed_point
to address issues such as rounding and overflow with;resize
and resize_t
to header, type_traits
;make_fixed
and make_ufixed
;trunc_shift_
functions;ReprType
;An in-development implementation of the fixed_point class template and its essential supporting functions and types is available [10].
Items include:
numeric_limits
specialization;sg14::integer
, intended to explore and illustrate the potential of custom ReprType
;constexpr
operations;Despite a focus on usable interface and direct translation from integer-based fixed-point operations, there is an overwhelming expectation that the source code result in minimal instructions and clock cycles. A few preliminary numbers are presented to give a very early idea of how the API might perform.
Some notes:
A few test functions were run, ranging from single arithmetic operations to basic geometric functions, performed against integer, floating-point and fixed-point types for comparison.
Figures were taken from a single CPU, OS and compiler, namely:
Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
Target: x86_64-pc-linux-gnu
Thread model: posix
Fixed inputs were provided to each function, meaning that branch prediction rarely fails. Results may also not represent the full range of inputs.
Details of the test harness used can be found in the source project mentioned in Appendix 1;
Times are in nanoseconds;
Code has not yet been optimized for performance.
Where applicable various combinations of integer, floating-point and fixed-point types were tested with the following identifiers:
uint8_t
, int8_t
, uint16_t
, int16_t
, uint32_t
, int32_t
,
uint64_t
and int64_t
built-in integer types;float
, double
and long double
built-in floating-point types;Plus, minus, multiplication and division were tested in isolation using a number of different numeric types with the following results:
name cpu_time
add(float) 1.78011
add(double) 1.73966
add(long double) 3.46011
add(u4_4) 1.87726
add(s3_4) 1.85051
add(u8_8) 1.85417
add(s7_8) 1.82057
add(u16_16) 1.94194
add(s15_16) 1.93463
add(u32_32) 1.94674
add(s31_32) 1.94446
add(int8_t) 2.14857
add(uint8_t) 2.12571
add(int16_t) 1.9936
add(uint16_t) 1.88229
add(int32_t) 1.82126
add(uint32_t) 1.76
add(int64_t) 1.76
add(uint64_t) 1.83223
sub(float) 1.96617
sub(double) 1.98491
sub(long double) 3.55474
sub(u4_4) 1.77006
sub(s3_4) 1.72983
sub(u8_8) 1.72983
sub(s7_8) 1.72983
sub(u16_16) 1.73966
sub(s15_16) 1.85051
sub(u32_32) 1.88229
sub(s31_32) 1.87063
sub(int8_t) 1.76
sub(uint8_t) 1.74994
sub(int16_t) 1.82126
sub(uint16_t) 1.83794
sub(int32_t) 1.89074
sub(uint32_t) 1.85417
sub(int64_t) 1.83703
sub(uint64_t) 2.04914
mul(float) 1.9376
mul(double) 1.93097
mul(long double) 102.446
mul(u4_4) 2.46583
mul(s3_4) 2.09189
mul(u8_8) 2.08
mul(s7_8) 2.18697
mul(u16_16) 2.12571
mul(s15_16) 2.10789
mul(u32_32) 2.10789
mul(s31_32) 2.10789
mul(int8_t) 1.76
mul(uint8_t) 1.78011
mul(int16_t) 1.8432
mul(uint16_t) 1.76914
mul(int32_t) 1.78011
mul(uint32_t) 2.19086
mul(int64_t) 1.7696
mul(uint64_t) 1.79017
div(float) 5.12
div(double) 7.64343
div(long double) 8.304
div(u4_4) 3.82171
div(s3_4) 3.82171
div(u8_8) 3.84
div(s7_8) 3.8
div(u16_16) 9.152
div(s15_16) 11.232
div(u32_32) 30.8434
div(s31_32) 34
div(int8_t) 3.82171
div(uint8_t) 3.82171
div(int16_t) 3.8
div(uint16_t) 3.82171
div(int32_t) 3.82171
div(uint32_t) 3.81806
div(int64_t) 10.2286
div(uint64_t) 8.304
Among the slowest types are long double
. It is likely that they are
emulated in software. The next slowest operations are fixed-point
multiply and divide operations - especially with 64-bit types. This is
because values need to be promoted temporarily to double-width types.
This is a known fixed-point technique which inevitably experiences
slowdown where a 128-bit type is required on a 64-bit system.
Here is a section of the disassembly of the s15:16 multiply call:
30: mov %r14,%rax
mov %r15,%rax
movslq -0x28(%rbp),%rax
movslq -0x30(%rbp),%rcx
imul %rax,%rcx
shr $0x10,%rcx
mov %ecx,-0x38(%rbp)
mov %r12,%rax
4c: movzbl (%rbx),%eax
cmp $0x1,%eax
↓ jne 68
54: mov 0x8(%rbx),%rax
lea 0x1(%rax),%rcx
mov %rcx,0x8(%rbx)
cmp 0x38(%rbx),%rax
↑ jb 30
The two 32-bit numbers are multiplied together and the result shifted
down - much as it would if raw int
values were used. The efficiency
of this operation varies with the exponent. An exponent of zero should
mean no shift at all.
A fast sqrt
implementation has not yet been tested with
fixed_point
. (The naive implementation takes over 300ns.) For this
reason, a magnitude-squared function is measured, combining multiply
and add operations:
template <class FP>
constexpr FP magnitude_squared(const FP & x, const FP & y, const FP & z)
{
return x * x + y * y + z * z;
}
Only real number formats are tested:
float 2.42606
double 2.08
long double 4.5056
s3_4 2.768
s7_8 2.77577
s15_16 2.752
s31_32 4.10331
Again, the size of the type seems to have the largest impact.
A similar operation includes a comparison and branch:
template <class Real>
bool circle_intersect_generic(Real x1, Real y1, Real r1, Real x2, Real y2, Real r2)
{
auto x_diff = x2 - x1;
auto y_diff = y2 - y1;
auto distance_squared = x_diff * x_diff + y_diff * y_diff;
auto touch_distance = r1 + r2;
auto touch_distance_squared = touch_distance * touch_distance;
return distance_squared <= touch_distance_squared;
}
float 3.46011
double 3.48
long double 6.4
s3_4 3.88
s7_8 4.5312
s15_16 3.82171
s31_32 5.92
Again, fixed-point and native performance are comparable.