Better shifting
- Document number:
- P3793R1
- Date:
2025-11-19 - Audience:
- SG6
- Project:
- ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
- Reply-to:
- Brian Bi <bbi5291@gmail.com>
- Co-authors:
- Jan Schultke <janschultke@gmail.com>
- GitHub Issue:
- wg21.link/P3793/github
- Source:
- github.com/Eisenwave/cpp-proposals/blob/master/src/better-shifting.cow
header
to perform bit shifts on integer operands.
The and functions provide the
following advantages over the built-in shift operators:
- They always produce a mathematically correct result when possible.
- They never have undefined behavior.
- They avoid the confusing precedence of the built-in shift operators.
Contents
Revision history
Changes since R0
Introduction
Motivation — Why are overlong shifts needed?
Design considerations
Proposed behavior for overlong shifts
Wrapping behavior is not useful
Shift amounts which are overlong by more than one should be safe
Logical versus arithmetic shifts
Negative shift amounts
Emulating possible behaviors using std :: shl
Benchmarks of std :: shl variants
Naming
Signatures
SIMD support
Possible implementation
Wording
[version.syn]
[bit]
[simd]
References
1. Revision history
1.1. Changes since R0
-
changed the function signatures to take any signed or unsigned
instead ofS int - changed behavior for negative shift amounts to "shift in the opposite direction"
- added §4.3.1. Emulating possible behaviors using
std :: shl - added §4.3.2. Benchmarks of
variantsstd :: shl - added §5. Possible implementation
- rebased §6. Wording on [N5014]
- fixed various minor typos and wording issues
2. Introduction
C++ has built-in shift operators, and ,
inherited from C with semantics essentially unchanged,
including the following two inconvenient properties:
- The precedence of these operators is less than the precedence of the additive operators, which is counterintuitive because shift operations behave as multiplication and division by a power of 2.
- If the shift amount is greater than or equal to the width of the (promoted) left operand or is negative, the behavior is undefined. In the remainder of this document, we will refer to shifts by a positive amount that is greater than or equal to the bit width of the operand as overlong shifts.
The first property certainly cannot be changed at this point in time. Reflector discussion revealed that GCC relies on the second property when vectorizing multiple shift operations on adjacent memory locations, and changing it might therefore not be "free". Besides that, changing the second property is also likely to be contentious because some committee members prefer to make overlong shifts erroneous behavior instead of defining them to produce the mathematically correct result.
For these reasons, we are instead proposing to solve the problems with shifting in C++
by introducing new Standard Library facilities in the header.
3. Motivation — Why are overlong shifts needed?
We consider the undefined behavior of overlong shifts to be gratuitous. Unlike other arithmetic operators, which produce undefined behavior only when the result is "not mathematically defined or is outside the range of representable values" ([expr.pre] paragraph 4), the built-in shift operators unconditionally produce UB for overlong shifts. This behavior is inherited from C, in which the arithmetic operators were originally designed to do whatever the corresponding hardware instructions would do; processor families differ as to how overlong shifts are handled. However, it is unclear why the behavior of overlong shifts was standardized as being undefined behavior as opposed to producing an unspecified result; perhaps there's some CPU that we (the authors of this paper) don't know about (and that might even be obsolete) on which the behavior could include trapping, halting, or otherwise failing to produce a result.
Besides that, there are practical reasons why the undefined behavior of overlong shifts is inconvenient, particularly when the second operand is equal to the bit width of the first operand.
x num_bits num_bits 32 When is equal to 32, the behavior of is the natural
continuation of the behavior for smaller values of num_bits.
However, to implement this function,
we must guard against :
The first statement in the body of would be unnecessary
if the shift operation produced the mathematically correct result of
when is equal to .
Overlong bit-shifts can also be a problem with right-shifting, although these problems are significantly less common.
to represent a bitset,
where represents an element which is in the set,
and represents an integer that does not.
Reporting an element that has no capacity for
as not being in the set may be exactly the behavior we want, rather than UB,
and makes that easy.
4. Design considerations
4.1. Proposed behavior for overlong shifts
We propose the addition of Standard Library functions
and
that produce mathematically correct results for overlong shifts.
That is, these functions actually just shift the bits by the number of positions specified:
in the case of an overlong shift, that means that all bits that would
be shifted are shifted off the end.
For a logical shift, the result is ;
for an arithmetic right shift,
the result is when the first operand is negative, and otherwise.
4.1.1. Wrapping behavior is not useful
We believe that the "wrapping" behavior
(most famously exhibited by the x86 family of processors)
in which the shift amount is reduced modulo the bit width
of the other operand is not useful,
other than when implementing bit rotations.
Since the functions and
are already provided by the Standard
Library, C++ programmers do not need to implement rotations themselves anymore,
and would not benefit from wrapping behavior for shifts.
4.1.2. Shift amounts which are overlong by more than one should be safe
Although shifts by an amount that are strictly greater than the bit width of the
first operand are not as useful as when the amount is equal to the bit width,
we believe that requiring the implementation to produce the mathematically correct
result in those cases does not impose an additional performance burden.
For example, if is bits wide,
then requiring to produce
does not impose additional overhead
beyond only requiring to
produce .
Using instead of can improve performance
only in the sense that UB enables more optimizations in general
(e.g. assuming that the branch leading to the UB is not taken).
The generated code in Clang 20 is very similar.
Such similarities in generated assembly are observed across many other architectures
because if a single branch or conditional move instruction is
required in order to produce the mathematically correct result for amount ,
then a single one will also be needed merely to account for the case of
(when is disallowed).
4.2. Logical versus arithmetic shifts
The built-in operator performs an arithmetic shift on signed operands:
that is, the sign bit is extended.
One possible design is to provide both arithmetic and logical right shifts,
either as separate functions or as one function with
an additional parameter indicating the choice of arithmetic or logical shift.
However, we believe that this is unnecessary because in cases where the
programmer wishes to always perform a logical shift,
it is customary to employ unsigned types,
possibly by inserting a cast that will be optimized out.
Conversely, the deliberate choice to use a signed type for the left operand of
a right shift indicates intent to perform an arithmetic shift.
The proposed and
functions in [P3643R0] would make such conversions convenient,
even in generic code.
A survey of popular programming languages supports this design direction:
- Languages that provide an explicit choice between a logical shift or an arithmetic shift usually lack unsigned integer types (Fortran, Java, JavaScript, OCaml). C# had unsigned integer types from its initial release, but didn't have the logical right shift operator until version 11.
- Languages that do not provide an explicit choice between a logical shift and an arithmetic shift always perform arithmetic shifts for negative operands (Go, Haskell, Objective-C, Python, Ruby, Rust).
- Perl is possibly an exception to the above, but it is hard to track down the behavior of shifts prior to version 5.
4.3. Negative shift amounts
The behavior of negative shift amounts in some popular programming languages is listed below:
- Shift amount reduced modulo bit width of other operand: C#, Java, JavaScript
- Exception or panic: Go, Haskell, Python
- Shift in other direction: Fortran, Objective-C, Perl, Ruby
- Unspecified or implementation-defined result: OCaml, Rust
The mathematically correct behavior is to "shit in the other direction". That is if we consider left-shifting to be a multiplication with a power of two; a negative shift is a negative exponent, i.e. a division by a power of two, i.e. a right-shift. We propose this "mathematically correct" behavior.
R0 of this paper proposed an implementation-defined result and erroneous behavior. Possible alternatives were discussed in SG6, and neither the proposed direction nor the alternatives sparked confidence. However, we now believe we've identified the right direction for the following reasons:
-
The implementation-defined result in the R0
was perceived as a "half-measure".std :: shl -
Shifting in the opposite direction provides
a clear distinction in behavior from
. It is as safe as costly as possible, whereas<< is as unsafe and cheap as possible.<< -
All other behaviors can be obtained by modifying the call site (§4.3.1. Emulating possible behaviors using
).std :: shl -
The performance cost of shifting in the opposite direction was measured to be acceptable
(§4.3.2. Benchmarks of
variants).std :: shl
4.3.1. Emulating possible behaviors using std :: shl
,
from most safe (top) to least safe (bottom):
4.3.2. Benchmarks of std :: shl variants
To make an educated choice about which behavior would be appropriate for negative
shifts in ,
find a comparison of the following functions below
([QuickBench], [CompilerExplorer]):
and functions
display behavior that was originally proposed (meaningless result for negative shifts).
Some of the implementations rely on behavior that is technically undefined, but "works in practice".
| Function | Assembly ( |
Time |
|---|---|---|
The key takeaways are:
-
Handling of negative shift amounts incurs roughly a 25% performance cost
relative to producing a meaningless result,
and a 60% cost relative to the builtin
operator.<< - Attempts at manual branchless optimization provide little to no value, or perform significantly worse.
is unsigned, and are equivalent.
4.4. Naming
We chose the names and
because they are as short as possible,
while still being familiar to many programmers.
For example, the x86 and ARM instruction sets
have shift instructions named and ,
Pascal has built-in operators with these names,
and Rust uses and as the names of the traits
that must be implemented for types to support the shift operators.
We hope the brevity will encourage adoption of and
as safe alternatives to the built-in operators.
These abbreviations are also consistent
with the existing and functions,
which are conspicuously not named and ,
respectively.
4.5. Signatures
We propose the following signatures:
The rationale for these signature is as follows:
-
We follow the design of
andstd :: rotl in proposing that the shift functions do not perform integer promotion.std :: rotr -
Like
androtl , these functions don't participate in overload resolution unlessrotr is an integer type. UnlikeT androtl , the proposedrotr andshl accept either signed or unsigned integral types for the first operand: forshr , the signedness of the first parameter determines whether an arithmetic or logical shift is performed, and it would be surprising ifshr did not also accept signed types.shl -
Unlike
androtl , the shift amount is not passed viarotr . That is because shifting by is not equivalent to shifting by , but such a large value stored in aint may be turned into zero due to upper bits being discarded in the conversion. For rotations, this is not a problem because for power-of-two wide integers, discarding upper bits has no effect on the result.long long -
The functions are
because they have a wide contract.noexcept
4.6. SIMD support
Following [P2933R4],
almost all functions have a corresponding overload in ,
including and .
The proposed functions should also have overloads,
in the style of and .
That is, overloads that can either shift a by a
of shift amounts,
or by a scalar shift amount which applies to all elements in the .
5. Possible implementation
The scalar and may be implemented as follows:
See also [Cxx26BitPermutations] for a reference implementation and test code.
and may be implemented (naively)
by performing the operation for each element.
A less naive implementation would be based on the variants
shown in §4.3.2. Benchmarks of variants.
in terms of e.g.
6. Wording
The following changes are relative to [N5014].
and functions should immediately precede the
and functions in the wording. Therefore, if this paper
and [P3764R0] are adopted in the same meeting, the and
functions should be inserted between the and
functions.
6.1. [version.syn]
Bump feature-test macros in [version.syn] as follows:
6.2. [bit]
In [bit.syn], change the synopsis as follows:
In [bit], add a new subclause immediately preceding [bit.rotate]:
Shifting [bit.shift]
1
In the following descriptions,
the result of an arithmetic operation is
first represented in a hypothetical integer type with sufficient range,
then converted to as if by .
2
[Note:
The value of in the following descriptions may be negative,
which has the effect of shifting in the opposite direction
.
— end note]
3
Constraints:
Each of and
is a signed or unsigned integer type ([basic.fundamental]).
4 Returns: rounded towards negative infinity.
5
Constraints:
Each of and
is a signed or unsigned integer type ([basic.fundamental]).
6 Returns: rounded towards negative infinity.
[Note:
is equivalent to
if is of signed type,
except that can overflow ([expr.pre], [expr.unary.op]).
— end note]
6.3. [simd]
In [simd.syn], change the synopsis as follows:
In [simd.bit], immediately preceding the first declaration of
, insert the following:
11 Constraints:
- The type
is a signed or unsigned integer type ([basic.fundamental]),V0 :: value_type - the type
is a signed or unsigned integer type,V1 :: value_type isV0 :: size ( ) == V1 :: size ( ) , andtrue issizeof ( typename V0 :: value_type ) == sizeof ( typename V1 :: value_type ) .true
12
Returns:
A object where the element
is initialized to the result of
for all in the range
[, ), where has the same
behavior as the corresponding scalar function from .
13
Constraints:
Each of and
is a signed or unsigned integer type ([basic.fundamental]).
14
Returns:
A object where the element is
initialized to the result of
for all in the range [, ), where
is the corresponding scalar function from
.