Document number: P2724R0
Date: 2022-12-11
Reply-to: Jarrad J. Waterloo <descender76 at gmail dot com>
Audience: Evolution Working Group (EWG)
constant dangling
Changelog
R0
- The constant content was extracted and merged from the temporary storage class specifiers and implicit constant initialization proposals.
Abstract
This paper proposes that the standard add anonymous global constants to the language, with the intention of automatically fixing a shocking type of dangling: the dangling of constants, or of instances that should be constants. This is shocking because constant-like instances should really have constant initialization, meaning that they should have static storage duration and consequently should not dangle. This trips up beginners, forcing dangling to be taught on day one. It is annoying to non-beginners. Constants are used as defaults in production code. Constants are also frequently used in test and example code. Further, many of the dangling examples used in comparisons between C++ and other languages use constants.
Motivation
There are multiple resolutions to dangling in the C++ language.
- Produce an error
- Fix with block/variable scoping
  - Fix the range-based for loop, Rev2
  - Get Fix of Broken Range-based for Loop Finally Done
- Fix by making the instance global
All are valid resolutions, and each is better than the others in its own scenario. This proposal is focused on the third option, which is to fix by making the instance global.
Dangling the stack is shocking because it violates our trust in our compilers and language, since they are primarily responsible for the stack. However, there are three types of dangling that are even more shocking than the rest.
- Returning a direct reference to a local
  - partially resolved by Simpler implicit move
- Immediate dangling
- Dangling Constants
Making an instance global is a legitimate fix to dangling.
C++ Core Guidelines F.43: Never (directly or indirectly) return a pointer or a reference to a local object
Reason To avoid the crashes and data corruption that can result from the use of such a dangling pointer.
…
Note This applies only to non-static local variables. All static variables are (as their name indicates) statically allocated, so that pointers to them cannot dangle.
While making an instance global doesn't fix all dangling in the language, it is the only resolution that can fix all three of the most shocking types of dangling, provided the instance in question is a constant. It is also the best fix for these instances.
Since constexpr was added to the language in C++11, the number of temporary instances that could be turned into global constants has grown. ROMability was in part the motivation for constexpr, but it was never made a requirement. Even if a C++ architecture doesn't support ROM, the language still requires it to support static storage duration and const. In fact, due to the immutable nature of constant-initialized constant expressions, these expressions/instances are constant for the entire program, even if just logically, even though at present they don't have static storage duration. The need is greater now that more types are getting constexpr constructors. Since C++20, even types that would normally only be dynamically allocated, such as string and vector, can be constexpr. This has opened the door wide for many more types to be constructed at compile time.
Motivating Examples
Before diving into the examples, let's discuss what exactly is being asked for. There are two features: one implicit and the other explicit.
implicit constant initialization
If a temporary argument is constant-initialized (7.7 Constant expressions [expr.const]), its argument/instance type is a LiteralType, and its parameter/local/member type is const and not mutable, then the instance is implicitly created with constant initialization. As such, it has static storage duration and can't dangle.
explicit constant initialization
The constinit specifier can be applied to temporaries. Applying it asserts that the temporary was const-initialized, that the argument type is a LiteralType, and that its parameter/local/member type is const and not mutable. This explicitly gives the temporary static storage duration.
While implicit constant initialization automatically fixes dangling, constinit allows programmers to manually and explicitly fix some dangling. The former is better for programmers and the language, while the latter favors code reviewers, or programmers who copy an example and want the compiler to verify, momentarily, whether it is correct.
So what sorts of dangling does this fix for us? Besides fixing some dangling, this also fixes some inconsistencies between string literals (5.13.5 String literals [lex.string]) and other literal types.
std::string_view sv1 = "hello world";            // OK today: a string literal has static storage duration
std::string_view sv2 = "hello world"s;           // dangles today; fixed by implicit constant initialization
std::string_view sv3 = constinit "hello world"s; // proposed explicit form
This is reasonable given how programmers reason about constants: immutable variables and temporaries which are known at compile time and do not change for the life of the program. This also works with plain old references.
struct X
{
int a, b;
};
const int& get_a(const X& x)
{
return x.a;
}
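const int& a = get_a({4, 2}); // dangles today: the temporary X dies at the end of the full-expression
a;                            // reads through a dangling reference today; fixed by implicit constant initialization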
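For comparison, the only remedy available today is to give each constant a name with static storage duration by hand. The following is a minimal sketch of that manual workaround for the two examples above; the names hello_world, four_two and use_constants are hypothetical and not part of the proposal.

```cpp
#include <cassert>
#include <string>
#include <string_view>

struct X
{
    int a, b;
};

const int& get_a(const X& x)
{
    return x.a;
}

// Hypothetical named constants: at namespace scope they have static storage
// duration, which is what the proposal would give the anonymous temporaries.
const std::string hello_world = "hello world";
constexpr X four_two{4, 2};

void use_constants()
{
    std::string_view sv = hello_world;  // views a static object, so it cannot dangle
    const int& a = get_a(four_two);     // refers to four_two.a, so it cannot dangle
    assert(sv == "hello world");
    assert(a == 4);
}
```

The price of this workaround is the extra names, which is exactly the naming the proposal aims to make unnecessary.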
“Such a feature would also help to … fix several bugs we see in practice:”
“Consider we have a function returning the value of a map element or a default value if no such element exists without copying it:”
const V& findOrDefault(const std::map<K,V>& m, const K& key, const V& defvalue);
“then this results in a classical bug:”
std::map<std::string, std::string> myMap;
const std::string& s = findOrDefault(myMap, key, "none"); // dangles today if key is absent: the temporary std::string dies at the end of this statement
Is this really a bug? With this proposal, it isn't! Here is why. The function findOrDefault expects a const string& for its third parameter. Since C++20, string's constructor is constexpr, so it CAN be constructed as a constant expression. Since all the arguments passed to this constexpr constructor are constant expressions, in this case "none", the temporary string defvalue IS also constant-initialized (7.7 Constant expressions [expr.const]). This paper proposes that if a non-mutable const variable or temporary is constant-initialized, it undergoes constant initialization (6.9.3.2 Static initialization [basic.start.static]). In other words, it has implicit static storage duration. The temporary would actually cease to be a temporary. As such, this usage of findOrDefault CAN'T dangle.
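Until then, the bug can only be avoided by hand. A minimal sketch, assuming a hypothetical implementation of the quoted findOrDefault signature specialized to std::string keys and values, plus a hypothetical lookup helper that names the default with static storage duration:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical implementation of the quoted signature, with K = V = std::string.
const std::string& findOrDefault(const std::map<std::string, std::string>& m,
                                 const std::string& key,
                                 const std::string& defvalue)
{
    auto it = m.find(key);
    // Returns a reference either into the map or to the caller's default.
    return it != m.end() ? it->second : defvalue;
}

// Today's manual fix: a static local gives the default static storage
// duration, so the returned reference outlives the call.
const std::string& lookup(const std::map<std::string, std::string>& m,
                          const std::string& key)
{
    static const std::string none = "none";
    return findOrDefault(m, key, none);
}
```

Note that the static local here is dynamically initialized on first use; under the proposal, the anonymous "none" temporary would instead be constant-initialized.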
The pain of immediate dangling associated with temporaries is especially felt when working with other anonymous language features of C++, such as lambda functions and coroutines.
Lambda functions
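Whenever a lambda function captures a reference to a temporary, it immediately dangles before there is an opportunity to call it, unless it is an immediately invoked lambda/function expression.
// note: today both init-captures are rejected, because &c1 = e declares c1
// like auto& c1 = e, and a non-const lvalue reference cannot bind to a temporary
[&c1 = "hello"s](const std::string& s)
{
    return c1 + " "s + s;
}("world"s); // immediately invoked: c1 is used before the temporary is destroyed
auto lambda = [&c1 = "hello"s](const std::string& s)
{
    return c1 + " "s + s;
};
lambda("world"s); // called later: c1 would refer to a destroyed temporary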
This problem is resolved if the temporary has static storage duration instead of lasting only until the end of the containing expression, provided c1 resolves to a const std::string& and was constant-initialized. The constinit specifier could ensure this.
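Today a stored closure can only be made safe by capturing something that already outlives it. A minimal sketch, assuming a hypothetical namespace-scope constant hello and a hypothetical factory make_greeter that returns the closure, so that it is called after the function that created it has returned:

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;

// Hypothetical named constant with static storage duration.
const std::string hello = "hello"s;

auto make_greeter()
{
    // The init-capture behaves like auto& c1 = hello, so c1 is a
    // const std::string& to a static object and cannot dangle.
    return [&c1 = hello](const std::string& s)
    {
        return c1 + " "s + s;
    };
}

void greet_example()
{
    auto lambda = make_greeter();
    assert(lambda("world"s) == "hello world");
}
```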
Coroutines
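Similarly, whenever a coroutine is constructed with a reference to a temporary, it immediately dangles before there is an opportunity for it to be co_awaited upon.
#include <generator>
#include <print>
#include <string>

std::generator<char> each_char(const std::string& s) {
    for (char ch : s) {
        co_yield ch;
    }
}
int main() {
    // the temporary std::string bound to s dies at the end of this statement
    auto ec = each_char("hello world");
    for (char ch : ec) {
        std::print("{}", ch); // reads through the dangling reference s
    }
}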
This specific immediately dangling example is fixed by implicit constant initialization, since the parameter s expects a const std::string& and the argument was constant-initialized.
It should also be noted that the current rules for temporaries discourage the use of temporaries because of the dangling they introduce. However, if the lifetime of temporaries were increased to a reasonable degree, then programmers would use temporaries more. This would reduce dangling further, because there would be fewer named variables that could be propagated outside of their containing scope. This would also improve code clarity by reducing the number of lines of code, allowing any remaining dangling to be seen more clearly.
Proposed Wording
6.7.5.4 Automatic storage duration [basic.stc.auto]
1 Variables that belong to a block or parameter scope and are not explicitly declared static, thread_local, or extern, and have not undergone implicit constant initialization (6.9.3.2), have automatic storage duration. The storage for these entities lasts until the block in which they are created exits.
…
6.9.3.2 Static initialization [basic.start.static]
…
2 Constant initialization is performed explicitly if a variable or temporary object with static or thread storage duration is constant-initialized (7.7). Constant initialization is performed implicitly if a non mutable const variable or non mutable const temporary object is constant-initialized (7.7). If constant initialization is not performed, a variable with static storage duration (6.7.5.2) or thread storage duration (6.7.5.3) is zero-initialized (9.4). Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. All static initialization strongly happens before (6.9.2.2) any dynamic initialization.
…
9.2.7 The constinit specifier [dcl.constinit]
1 If the constinit specifier is applied to a temporary, it gives the temporary static storage duration, asserts that the argument is a LiteralType and asserts that the parameter type is not mutable and is const; otherwise the constinit specifier shall be applied only to a declaration of a variable with static or thread storage duration. If the specifier is applied to any declaration of a variable, it shall be applied to the initializing declaration. No diagnostic is required if no constinit declaration is reachable at the point of the initializing declaration.
…
NOTE: The wording still needs to capture that these temporaries are no longer temporaries and that their value category is lvalue.
In Depth Rationale
There is a general expectation across programming languages that constants, or more specifically constant literals, are “immutable values which are known at compile time and do not change for the life of the program”. In most programming languages, or rather the most widely used ones, constants do not dangle. Constants are so simple, so trivial (English-wise), that it is shocking to even have to be conscious of dangling. This is shocking to C++ beginners, to expert programmers from other programming languages who come over to C++, and at times even to experienced C++ programmers.
There is already significant interest in this type of feature from programmers. Just look at C23 as an example. For instance, the papers Introduce storage-class specifiers for compound literals and The 'constexpr' specifier allow C programmers to specify static, constexpr and thread_local as storage-class specifiers on their compound literals. The C++ equivalent of compound literals is temporaries of LiteralType. This paper reuses our existing keyword constinit rather than static because of what we all know from the C++ Core Guidelines.
I.2: Avoid non-const global variables
Reason Non-const global variables hide dependencies and make the dependencies subject to unpredictable changes.
I also did not choose constexpr, though it may be better for greater C compatibility, since I wrote my proposals before I saw the C paper. Also, constinit better matches what these features are doing, in the context of the existing C++ terminology of constant initialization. Further, constexpr currently means different things in C++ and in C.
It should also be noted that these concepts are already in the standard, just not fully exposed in the language. For instance, string literals already have static storage duration, and attempting to modify one is undefined behavior.
Working Draft, Standard for Programming Language C++
“5.13.5 String literals [lex.string]”
“9 Evaluating a string-literal results in a string literal object with static storage duration (6.7.5). Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.”
“[Note 4: The effect of attempting to modify a string literal object is undefined. — end note]”
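Because a string literal object has static storage duration, a function can already return a pointer or a view to one without dangling. A small sketch; greeting_ptr, greeting_view and literal_example are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <string_view>

// The literal outlives every call, so the returned pointer and view stay valid.
const char* greeting_ptr() { return "hello world"; }
std::string_view greeting_view() { return "hello world"; }

// By contrast, returning a view of a std::string temporary would dangle:
// std::string_view bad() { return std::string("hello world"); }

void literal_example()
{
    assert(std::string_view(greeting_ptr()) == "hello world");
    assert(greeting_view().size() == 11);
}
```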
Further, this behavior happens all the time with evaluations of constant expressions but unfortunately we can’t enjoy all the benefits thereof.
Working Draft, Standard for Programming Language C++
“6.9.3.2 Static initialization [basic.start.static]”
“1 Variables with static storage duration are initialized as a consequence of program initiation. Variables with thread storage duration are initialized as a consequence of thread execution. Within each of these phases of initiation, initialization occurs as follows.”
“2 Constant initialization is performed if a variable or temporary object with static or thread storage duration is constant-initialized (7.7). …”
…
“3 An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that …”
These ROM-able instances do not dangle as globals, but from our code's perspective they currently look like dangling locals. This causes false positives, both in our static analyzers and in programmers themselves. If we would just admit, from a language standpoint, that these are indeed constants, then not only do we fix some dangling but also our mental model. The same reference also says “constant initialization is performed if a … temporary object with static … storage duration is constant-initialized”. Programmers can't fully utilize this scenario because at present we can only use static on class members and locals, but not on temporary arguments. Since the code identified by this paper is already subject to constant initialization, there is no real chance of these changes causing any breakage.
Value Categories
If some temporaries can be changed to have global scope, then how does that affect their value categories? Currently, a string literal is an lvalue and has global scope. All the other literals tend to be prvalues and have statement scope.
          movable    unmovable
named     xvalue     lvalue
unnamed   prvalue    ?
From the programmer's perspective, global temporaries are just anonymously named variables. When they are passed as arguments, they live beyond the life of the function they are given to. As such, the expression is not movable. Therefore, the desired behavior described throughout this paper is that they are lvalues, which makes sense from an anonymously named standpoint. However, it must be said that technically they are unnamed, which places them into the value category that C++ currently does not have: the unmovable unnamed. The point is, this is simple whether it is worded as an lvalue or as an unambiguous new value category that behaves like an lvalue. Regardless of which, there are some advantages that must be pointed out.
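The current categories in the table can be checked at compile time: decltype applied to a parenthesized expression yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue. A small sketch; the category helper and the variable named are hypothetical:

```cpp
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical helper mapping decltype((expr)) to a value category name.
template <class T> constexpr std::string_view category = "prvalue";
template <class T> constexpr std::string_view category<T&> = "lvalue";
template <class T> constexpr std::string_view category<T&&> = "xvalue";

int named = 0;

static_assert(category<decltype(("hello"))> == "lvalue");          // string literal
static_assert(category<decltype((5))> == "prvalue");                // other literals
static_assert(category<decltype((named))> == "lvalue");             // named, unmovable
static_assert(category<decltype((std::move(named)))> == "xvalue");  // named, movable
```

Under the proposal, a constant-initialized temporary such as the "hello"s argument would move from the prvalue row into the lvalue-like cell.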
Avoids superfluous moves
The proposal avoids superfluous moves. Copying pointers and lvalue references is cheaper than performing a move, which is cheaper than performing any non-trivial value copy.
Undo forced naming
The proposal makes types that delete their rvalue reference constructor easier to use. For instance, std::reference_wrapper can't be created/reassigned with an rvalue reference, i.e. a temporary. Rather, it must be created/reassigned with an lvalue reference created on a separate line. This requires superfluous naming, which increases the chances of dangling. Further, the C++ Core Guidelines advise developers to do the following:
- ES.5: Keep scopes small
- ES.6: Declare names in for-statement initializers and conditions to limit scope
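std::reference_wrapper<int> rwi1(5);      // ill-formed today: rvalues are rejected
int value1 = 5;
std::reference_wrapper<int> rwi2(value1);
if(randomBool())
{
    int value2 = 7;
    rwi2 = std::ref(value2);  // refers to value2, which dies at the end of this block
    rwi2 = std::ref(7);       // ill-formed today: std::ref of an rvalue is deleted
    rwi2 = 7;                 // ill-formed today: rvalues are rejected
}
else
{
    int value3 = 9;
    rwi2 = std::ref(value3);  // refers to value3, which dies at the end of this block
    rwi2 = std::ref(9);       // ill-formed today
    rwi2 = 9;                 // ill-formed today
}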
Since the variables value2 and value3 are likely to be created manually at block scope instead of variable scope, they can accidentally introduce more dangling. Constructing and reassigning with a global-scoped lvalue temporary avoids these common dangling possibilities, along with simplifying the code.
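The restriction, and today's workaround of naming the constants, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses std::reference_wrapper<const int> because the wrapped values are constants, and the names five, seven, nine and pick are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Today a temporary cannot be wrapped: rvalues are rejected, lvalues accepted.
static_assert(!std::is_constructible_v<std::reference_wrapper<const int>, int>);
static_assert(std::is_constructible_v<std::reference_wrapper<const int>, const int&>);

// Hypothetical named constants with static storage duration.
constexpr int five = 5;
constexpr int seven = 7;
constexpr int nine = 9;

int pick(bool randomBool)
{
    std::reference_wrapper<const int> rw = std::cref(five);
    if (randomBool)
        rw = std::cref(seven);  // seven outlives this block, so rw stays valid
    else
        rw = std::cref(nine);
    return rw.get();
}
```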
There are at least three ways to provide a non-dangling, global-like constant.
- ROM, i.e. hardware
- const and static, i.e. C++ language
- assembly opcode with inline constant, i.e. machine code level
While the first two are addressable, the last one isn't.
In the next three examples, the same assembly is produced regardless of whether the literal 5 was provided via a native literal, a constexpr function or a const global. The following results were produced in Compiler Explorer using both “x86-64 clang (trunk) -std=c++20 -O3” and “x86-64 gcc (trunk) -std=c++20 -O3”.
values that are [logically] global constants
local constant but logically a global constant
int main()
{
return 5;
}
main: # @main
mov eax, 5
ret
constant expression i.e. logically a global constant
constexpr int return5()
{
return 5;
}
int main()
{
return return5();
}
main: # @main
mov eax, 5
ret
an actual global
const int GLOBAL = 5;
int main()
{
return GLOBAL;
}
main: # @main
mov eax, 5
ret
The point is that all three are logically non-dangling global constants. Now let's look at reference examples.
Not only do all three of the following examples produce the exact same assembly, they also produce the exact same assembly as the previous three examples. They are all essentially global constants from the assembly and programmer standpoint, but the current standard treats two of the three as automatic-storage locals that could dangle, unnecessarily.
local constant but logically a global constant
int main()
{
const int& reflocal = 5;
return reflocal;
}
main: # @main
mov eax, 5
ret
int main()
{
const int local = 5;
const int& reflocal = local;
return reflocal;
}
main: # @main
mov eax, 5
ret
constant expression i.e. logically a global constant
constexpr int return5()
{
return 5;
}
int main()
{
const int& reflocal = return5();
return reflocal;
}
main: # @main
mov eax, 5
ret
an actual global
const int GLOBAL = 5;
int main()
{
const int& reflocal = GLOBAL;
return reflocal;
}
main: # @main
mov eax, 5
ret
indirect dangling of caller’s local
Similarly, the next three examples produce the same assembly in the 3 clang cases and in 2 of the gcc cases. GCC would have produced the same result in its 2nd case had it treated the const-expected evaluation of a constant expression as a global constant, as it did in its third case.
local constant but logically a global constant
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int local = 5;
const int& reflocal = potential_dangler(local);
return reflocal;
}
potential_dangler(int const&): # @potential_dangler(int const&)
mov rax, rdi
ret
main: # @main
mov eax, 5
ret
constant expression i.e. logically a global constant
constexpr int return5()
{
return 5;
}
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int& reflocal = potential_dangler(return5());
return reflocal;
}
x86-64 clang (trunk) -std=c++20 -O3
potential_dangler(int const&): # @potential_dangler(int const&)
mov rax, rdi
ret
main: # @main
mov eax, 5
ret
x86-64 gcc (trunk) -std=c++20 -O3
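NOTE: xor eax, eax makes main return 0 rather than 5. Since reflocal is read after the temporary it refers to has been destroyed, the behavior is undefined and GCC is free to produce any value. However, if GCC had treated the resolved constant expression, which is required to be const, as a const global as in the next example, then the results would have been the same.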
potential_dangler(int const&):
mov rax, rdi
ret
main:
xor eax, eax
ret
an actual global
const int GLOBAL = 5;
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int& reflocal = potential_dangler(GLOBAL);
return reflocal;
}
potential_dangler(int const&): # @potential_dangler(int const&)
mov rax, rdi
ret
main: # @main
mov eax, 5
ret
In all these logically global constant cases, no instance was actually stored as a global; each was perfectly inlined as an assembly opcode constant. So, the worst-case performance of this proposal would be a single upfront load-time cost. Contrast that with the current potential cost of local constants: constantly creating and destroying instances, even multiple times concurrently in different threads. The proposed cost can even go from 1 to 0, while the current non-global local could result in superfluous dynamic allocations, since std::string and std::vector are now constexpr.
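The cost contrast can be illustrated today with a hypothetical instrumented type (Counted, length, with_temporary, with_static and cost_example are illustrative names). A temporary argument is constructed and destroyed on every call, while a named static object is constructed only once. Because Counted has a side-effecting constructor, the static object in this sketch is dynamically initialized on first use; a constant-initialized instance, as proposed, would need no run-time construction at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical instrumented type that counts its constructions.
struct Counted
{
    static inline int constructions = 0;  // C++17 inline static member
    std::string value;
    explicit Counted(const char* v) : value(v) { ++constructions; }
};

std::size_t length(const Counted& c) { return c.value.size(); }

std::size_t with_temporary()
{
    return length(Counted("none"));     // constructed and destroyed on every call
}

std::size_t with_static()
{
    static const Counted none("none");  // constructed once, on the first call
    return length(none);
}

void cost_example()
{
    Counted::constructions = 0;
    for (int i = 0; i < 3; ++i) with_temporary();
    assert(Counted::constructions == 3);

    Counted::constructions = 0;
    for (int i = 0; i < 3; ++i) with_static();
    assert(Counted::constructions == 1);
}
```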
Microsoft’s compiler and existing dangling detection
Things really get interesting when we factor Microsoft’s compiler into the equation and contrast its dangling detection between optimized configurations.
x64 msvc v19.latest
indirect dangling of caller’s local
temporary constant but logically a global constant
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int& reftemp = potential_dangler(5);
return reftemp;
}
Without optimizations:
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov QWORD PTR [rsp+8], rcx
mov rax, QWORD PTR passthrough$[rsp]
ret 0
int const & potential_dangler(int const &) ENDP
$T1 = 32
reftemp$ = 40
main PROC
$LN3:
sub rsp, 56
mov DWORD PTR $T1[rsp], 5
lea rcx, QWORD PTR $T1[rsp]
call int const & potential_dangler(int const &)
mov QWORD PTR reftemp$[rsp], rax
mov rax, QWORD PTR reftemp$[rsp]
mov eax, DWORD PTR [rax]
add rsp, 56
ret 0
main ENDP
/Ox optimizations (favor speed):
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov rax, rcx
ret 0
int const & potential_dangler(int const &) ENDP
main PROC
mov eax, 5
ret 0
main ENDP
local constant but logically a global constant
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int local = 5;
const int& reflocal = potential_dangler(local);
return reflocal;
}
Without optimizations:
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov QWORD PTR [rsp+8], rcx
mov rax, QWORD PTR passthrough$[rsp]
ret 0
int const & potential_dangler(int const &) ENDP
local$ = 32
reflocal$ = 40
main PROC
$LN3:
sub rsp, 56
mov DWORD PTR local$[rsp], 5
lea rcx, QWORD PTR local$[rsp]
call int const & potential_dangler(int const &)
mov QWORD PTR reflocal$[rsp], rax
mov rax, QWORD PTR reflocal$[rsp]
mov eax, DWORD PTR [rax]
add rsp, 56
ret 0
main ENDP
/Ox optimizations (favor speed):
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov rax, rcx
ret 0
int const & potential_dangler(int const &) ENDP
main PROC
mov eax, 5
ret 0
main ENDP
constant expression i.e. logically a global constant
constexpr int return5()
{
return 5;
}
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int& reflocal = potential_dangler(return5());
return reflocal;
}
Without optimizations:
int return5(void) PROC
mov eax, 5
ret 0
int return5(void) ENDP
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov QWORD PTR [rsp+8], rcx
mov rax, QWORD PTR passthrough$[rsp]
ret 0
int const & potential_dangler(int const &) ENDP
$T1 = 32
reflocal$ = 40
main PROC
$LN3:
sub rsp, 56
call int return5(void)
mov DWORD PTR $T1[rsp], eax
lea rcx, QWORD PTR $T1[rsp]
call int const & potential_dangler(int const &)
mov QWORD PTR reflocal$[rsp], rax
mov rax, QWORD PTR reflocal$[rsp]
mov eax, DWORD PTR [rax]
add rsp, 56
ret 0
main ENDP
/Ox optimizations (favor speed):
int return5(void) PROC
mov eax, 5
ret 0
int return5(void) ENDP
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov rax, rcx
ret 0
int const & potential_dangler(int const &) ENDP
main PROC
mov eax, 5
ret 0
main ENDP
an actual global
const int GLOBAL = 5;
const int& potential_dangler(const int& passthrough)
{
return passthrough;
}
int main()
{
const int& reflocal = potential_dangler(GLOBAL);
return reflocal;
}
Without optimizations:
int const GLOBAL DD 05H
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov QWORD PTR [rsp+8], rcx
mov rax, QWORD PTR passthrough$[rsp]
ret 0
int const & potential_dangler(int const &) ENDP
reflocal$ = 32
main PROC
$LN3:
sub rsp, 56
lea rcx, OFFSET FLAT:int const GLOBAL
call int const & potential_dangler(int const &)
mov QWORD PTR reflocal$[rsp], rax
mov rax, QWORD PTR reflocal$[rsp]
mov eax, DWORD PTR [rax]
add rsp, 56
ret 0
main ENDP
/Ox optimizations (favor speed):
passthrough$ = 8
int const & potential_dangler(int const &) PROC
mov rax, rcx
ret 0
int const & potential_dangler(int const &) ENDP
main PROC
mov eax, 5
ret 0
main ENDP
In all four cases, when optimizations (favor speed) were turned on, the Microsoft compiler produced the same non-dangling code regardless of whether it was an actual global, a local constant, a temporary constant or a constant expression evaluation. This is also the same code that GCC and Clang generated. To the msvc compiler's credit, it detects not only functions that can potentially dangle but also executions that could dangle. In all cases, it was a warning instead of an error. While the temporary constant and the constant expression evaluation truly dangle when not optimized, neither was a compiler error. Further, the global example was incorrectly flagged as potentially dangling, even though the compiler knew it was a global. Regardless, the optimized compilation fixed the dangling and removed the potentially dangling flag.
This proposal advocates standardizing an optimization that compilers are already doing and have been doing since before C++ got constexpr. Fixing this type of dangling in this fashion is the best possible way, because potentially invalid code becomes valid with no programmer intervention; it produces no errors, it is faster, uses less memory and produces smaller executables. In short, the compiler/language already has all it needs to fix dangling constants. Compilers are already doing this, but there is currently no verbiage in the standard stating that anonymous constants don't dangle because they are logically global constants. Adopting this proposal ensures programmers do not have to fix something that was never dangling in the first place, even though the current language needlessly makes it look like it is.
There are a couple of tooling opportunities, especially with respect to the constinit specifier.
- A command line and/or IDE tool could analyze the code for const, constexpr/LiteralType and constant-initialized, and if the conditions match, automatically add the constinit specifier for code reviewers.
- Another command line and/or IDE tool could strip the constinit specifier from any temporaries for programmers.
Combined, they would form a constinit toggle, which wouldn't be all that different from the whitespace and special character toggles already found in many IDEs.
Summary
The advantages to C++ of adopting this proposal are manifold.
- Safer
  - Eliminate dangling of what should be constants
  - Reduce immediate dangling when the instance is a constant
  - Reduce returning direct reference dangling when the instance is a constant
  - Reduce returning indirect reference dangling when the instance is a constant and was provided as an argument
  - Reduce indirect dangling that can occur in the body of a function
  - Reduce uninitialized and delayed initialization errors
  - Increase safety by avoiding data races
- Simpler
  - Encourage the use of temporaries
    - Reduce lines of code
    - Reduce naming; fewer names to return dangle
  - Increase anonymously named lvalues and decrease rvalues in the code
    - Reduce lines of code
    - Reduce naming; fewer names to return dangle
  - Make constexpr literals less surprising for new and old developers alike
  - Reduce the gap between C++ and C99 compound literals
  - Improve the potential contribution of C++'s dangling resolutions back to C
  - Make string literals and C++ literals more consistent with one another
  - Take a step closer to reducing undefined behavior in string literals
  - Simplify the language to match existing practice
    - Consequently, a “cleanup”, i.e. adoption of simpler, more general rules/guidelines
- Faster & More Memory Efficient
  - Reduce unnecessary heap allocations
  - Increase and improve upon the utilization of ROM and the benefits that entails
Frequently Asked Questions
What about locality of reference?
It is true that globals can be slower than locals because they are farther in memory from the code that uses them. So let me clarify: when I say static storage duration, I really mean logically static storage duration. If a type is a PODType/TrivialType or LiteralType, then there is nothing preventing the compiler from copying the global to a local that is closer to the executing code. Rather, the compiler must ensure that the instance is always available; effectively static storage duration.
Consider this from a processor and assembly/machine language standpoint. A processor usually has instructions that work with memory. Whether that memory is ROM, or is logically so because the program never writes to it, we have constants.
mov <register>,<memory>
A processor may also have specialized versions of common instructions where a constant value is taken as part of the instruction itself. This too is a constant. However, this constant is guaranteed to be closer to the code because it is physically a part of it.
mov <register>,<constant>
mov <memory>,<constant>
What is more interesting is that these two kinds of constants have different value categories, since the ROM version is addressable and the instruction-only version, clearly, is not. It should also be noted that the latter, unnamed/unaddressable version physically can't dangle.
Won’t this break a lot of existing code?
NO, little if any. On the contrary, code that is broken is now fixed. Code that would be invalid is now valid, makes sense and can be rationally explained. Let me summarize:
This feature changes not only the point of destruction but also the point of construction. Instances that were of automatic storage duration are now of static storage duration. Instances that were temporaries are no longer temporaries. Surely, something must be broken! As noted in the earlier section “Present”, subsection “C Standard Compound Literals”, even the C++ standard recognizes that there are other opportunities for constant initialization.
Working Draft, Standard for Programming Language C++
“6.9.3.2 Static initialization [basic.start.static]”
“3 An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically …”
So, what is the point? For the instances that would benefit from implicit constant initialization, there are currently NO guarantees about their lifetime, and as such it is indeterminate. With this portion of the proposal, a guarantee is given, and that which was indeterminate becomes determinate.
It should also be noted that while this enhancement is applied implicitly, programmers have opted into it up to three times.
- The programmer of the type must have provided a means for the type to be constructed at compile time, likely by giving it a constexpr constructor.
- The programmer of the variable or function parameter must have stated that they want a const.
- The end programmer must have const-initialized the variable or argument.
Having expressed constant requirements three times, it is pretty certain that the end programmer wanted a constant, even if it is anonymous.
Who would even use these features? There isn't sufficient use to justify these changes.
Everyone … Quite a bit, actually
Consider all the examples littered throughout our history; these are what get fixed.
- dangling reported on normal use of the STL
- dangling examples reported in the C++ standard
- real world dangling reported in NAD (not a defect) reports
This doesn’t even include the countless examples, found in numerous articles comparing C++ with other nameless programming languages, which would be fixed. However, the best proof can be found in our usage and other proposals.
C++ Core Guidelines
F.16: For “in” parameters, pass cheaply-copied types by value and others by reference to const
In C++, we use const parameters a lot. This is the first of three requirements of implicit constant initialization. What about the use of types that can be constructed at compile time?
- C++20: std::pair, std::tuple, std::string, std::vector
- C++23: std::optional, std::variant, std::unique_ptr
As there was sufficient use to justify making the constructors of each of the types listed above constexpr, there would be sufficient use of the implicit constant initialization feature, which would use them all, as this satisfies its second and third requirements: that the instances be constructible at compile time and constant-initialized.
Why not just use a static analyzer?
Typically a static analyzer doesn’t fix code. Instead, it just produces warnings and errors. It is the programmer’s responsibility to fix code by deciding whether each incident was a false positive and making the corresponding code changes. This proposal does fix some dangling, but other dangling goes unresolved and unidentified. As such, this proposal and static analyzers are complementary: combined, this proposal can fix some dangling and a static analyzer can be used to identify what remains. Those who still ask “why not just use a static analyzer?” might really be saying that this proposal’s language enhancements might break their static analyzer. To which I say, the standard dictates the analyzer, not the other way around. That is true for all tools. However, let’s explore the potential impact of this proposal on static analyzers.
The C++ language is complex. It stands to reason that our tools would have some degree of complexity, since they need to take some subset of our language’s rules into consideration. With any proposal, mine included, a fix to some dangling turns the incidents that a static analyzer reports and that the proposal now fixes into false positives. These false positives join those that a static analyzer already has for not taking existing language rules into consideration, just as it would for any new language rules.
With implicit constant initialization, existing static analyzers would need to be enhanced to track the constness of variables and parameters, whether or not the types of variables and parameters can be constructed at compile time, and whether or not instances were constant-initialized. Until that happens, an existing dangling incident reported by a static analyzer will just be a false positive. The total number of incidents remains the same, and the programmer just needs to recognize that it was a false positive, which should be easy to do since constants are trivial and these rules are simple.
Can this even be implemented?
C++ already provides a static storage duration guarantee for instances of one type and allows it for many others.
- native string literals already have static storage duration
- compilers have long been free to promote compile-time constructed instances to static storage duration
- any LiteralType instances that are constant-initialized are already prime candidates for compilers to promote to static storage duration
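The first two points above can be illustrated with a small sketch in current C++ (C++17 or later). The names default_name, defaults and default_limit are hypothetical. The string literal needs no help to outlive the function. The int constant only gets the same guarantee because it is explicitly named with static storage duration.

```cpp
// A string literal has static storage duration, so a pointer to it stays
// valid after the function returns.
constexpr const char* default_name() { return "none"; }

// Hypothetical holder: a non-string constant gets the same guarantee today
// only when it is named with static storage duration.
struct defaults {
    static constexpr int limit = 5;  // inline variable (C++17), static storage duration
};

// Returning a reference to it cannot dangle.
constexpr const int& default_limit() { return defaults::limit; }

// By contrast, this would dangle: the temporary materialized from 5 is bound
// to a local reference and destroyed when the function returns.
// const int& bad_limit() { const int& r = 5; return r; }
```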
Doesn’t the implicit constant initialization feature make it harder for programmers to identify dangling and thus harder to teach?
If there were no dangling, then there would be nothing to teach with respect to dangling. Even the whole standard is not taught. So the more dangling we fix in the language, the less dangling has to be taught to beginners. Consider the following example: do the new features make it easier or harder to identify dangling?
f({1,2});
int i = 1;
f({i, 2});
It is plain to see that {1,2} is constant-initialized, as it is composed entirely of LiteralTypes. It is also plain to see that {i,2} is modifiable, as its initialization is variable and dynamic due to the variable i. So the real questions are as follows:
- Is the first parameter to the function f const?
- Is the type of the first parameter to the function f a LiteralType?
The fact is that some programmer had to have known the answer to both questions in order to have written f({1,2}) in the first place. The case could be made that it would be nice to be able to use the constinit keyword on temporary arguments, f(constinit {1, 2}), as this would allow those who don’t write the code, such as code reviewers, to quickly validate it. Even the programmer would benefit somewhat if the code was copied. However, constinit would be mostly superfluous if the “temporaries are just anonymously named variables” feature is added. As such, constinit should be optional. Consequently, any negative impact upon identifying and teaching dangling is negligible.
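The compiler can already check the difference between the f({1,2}) and f({i, 2}) calls discussed above. In this sketch, f is a hypothetical stand-in taking a const std::pair<int, int>&. It is declared constexpr only so that the constant case can be verified with static_assert; the f in the example need not be constexpr.

```cpp
#include <utility>

// Hypothetical stand-in for f: its first parameter is a reference to const and
// its type, std::pair<int, int>, is a literal type.
constexpr int f(const std::pair<int, int>& p) { return p.first + p.second; }

// {1, 2} is made only of constants, so the whole call is a constant expression.
static_assert(f({1, 2}) == 3);

// {i, 2} depends on the run-time value of i, so it is not constant-initialized.
int call_with_variable(int i) {
    // constexpr std::pair<int, int> bad{i, 2};  // error: i is not a constant expression
    return f({i, 2});
}
```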
Yet both the implicit and the explicit constant initialization features, by themselves, make it easier to identify and teach dangling.
C++ Core Guidelines
F.43: Never (directly or indirectly) return a pointer or a reference to a local object
…
Note This applies only to non-static local variables. All static variables are (as their name indicates) statically allocated, so that pointers to them cannot dangle.
Instances that have static storage duration can’t dangle. Currently in C++, instances that don’t immediately dangle can still dangle later, such as by being returned. Using static storage duration short-circuits the dangling identification process: an instance, once identified as static, doesn’t need to be factored into any additional dangling decision making. Using more static storage duration therefore speeds up the dangling identification process. This would also benefit static analyzers that go through a similar thought process.
Doesn’t this make C++ harder to teach?
Until the day that all dangling gets fixed, any incremental fix to dangling still requires programmers to be able to identify the remaining dangling and know how to fix it for the given scenario, as there are multiple solutions. Since dangling occurs even for things as simple as constants, and immediate dangling is so easy to produce, dangling resolution still has to be taught, even to beginners. As this proposal fixes these types of dangling, it makes teaching C++ easier because it makes C++ easier.
So, what do we teach now, and what bearing do these teachings, the C++ standard and this proposal have on one another?
C++ Core Guidelines
F.42: Return a T* to indicate a position (only)
Note Do not return a pointer to something that is not in the caller’s scope; see F.43.
Returning references to something in the caller’s scope is only natural. It is a part of our reference delegating programming model. A function, when given a reference, does not know how the instance was created, and it doesn’t care as long as the instance is good for the life of the function call (and beyond). Unfortunately, scoping temporary arguments to the statement instead of the containing block doesn’t just create immediate dangling; it also hands functions references to instances that are near death. These instances are almost dead on arrival. Having the ability to return a reference to a caller’s instance, or a sub-instance thereof, assumes, correctly, that a reference from the caller’s scope would still be alive after the function call. The fact that the temporary rules shortened the lifetime to the statement is at odds with what we teach. This proposal restores to some temporaries the lifetime of anonymously named constants, which is not only natural but also consistent with what programmers already know. It is also in line with what we teach, as codified in the C++ Core Guidelines. One such guideline is as follows:
C++ Core Guidelines
F.43: Never (directly or indirectly) return a pointer or a reference to a local object
Reason To avoid the crashes and data corruption that can result from the use of such a dangling pointer.
Other than turning some of these locals into globals, this proposal does not solve nor contradict this teaching. If anything, by cleaning up the simple dangling it makes the remaining more visible.
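The pass-through pattern described above can be sketched as follows. The function pick and the constant fallback_name are hypothetical. Binding the result to a named constant with static storage duration is safe under current rules. Passing a temporary std::string built from "none" instead leaves the returned reference dangling at the end of the full-expression. The proposal aims to make such a constant argument non-dangling, but the sketch can only show today’s behavior.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pass-through function: returns a reference to one of the
// caller's objects instead of a copy.
const std::string& pick(const std::string& preferred, const std::string& fallback,
                        bool use_preferred) {
    return use_preferred ? preferred : fallback;
}

// A named constant with static storage duration: references to it cannot dangle.
const std::string fallback_name = "none";

std::size_t fallback_length(const std::string& preferred) {
    // Dangles today (not compiled here): in
    //   const std::string& bad = pick(preferred, "none", false);
    // the temporary std::string built from "none" is destroyed at the end of the
    // full-expression, so bad would refer to a destroyed object.

    // Safe today: fallback_name outlives every call.
    const std::string& r = pick(preferred, fallback_name, false);
    return r.size();
}
```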
Further, what is proposed is easy to teach because we already teach it, and it makes C++ even easier to teach.
- We already teach that native string literals don’t dangle because they have static storage duration. This proposal just extends the concept to constants, as expected. This increases consistency and removes a bifurcation that is currently taught.
All of this can be done without adding any new keywords or any new attributes. We just use constant concepts that beginners are already familiar with. In fact, we would be working in harmony with all that we already teach about globals in the C++ Core Guidelines.
I.2: Avoid non-const global variables
I.22: Avoid complex initialization of global objects
F.15: Prefer simple and conventional ways of passing information
F.16: For “in” parameters, pass cheaply-copied types by value and others by reference to const
F.43: Never (directly or indirectly) return a pointer or a reference to a local object
R.5: Prefer scoped objects, don’t heap-allocate unnecessarily
R.6: Avoid non-const global variables
CP.2: Avoid data races
CP.24: Think of a thread as a global container
CP.32: To share ownership between unrelated threads use shared_ptr
How do these specifiers propagate?
These specifiers apply to the temporary immediately to the right of said specifier and to any child temporaries. It does not impact any parent or sibling temporaries. Consider these examples:
f({1, { {2, 3}, 4}, {5, 6} });
f({1, { {2, 3}, constinit 4}, {5, 6} });
f({1, { constinit {2, 3}, 4}, {5, 6} });
f({1, constinit { {2, 3}, 4}, {5, 6} });
f(constinit {1, { {2, 3}, 4}, {5, 6} });
f({1, { {2, 3}, 4}, {constinit 5, 6} });
References
Jarrad J. Waterloo <descender76 at gmail dot com>
constant dangling
Table of contents
Changelog
R0
temporary storage class specifiers
[1] andimplicit constant initialization
[2] proposals.Abstract
This paper proposes the standard adds anonymous global constants to the language with the intention of automatically fixing a shocking type of dangling which occurs when constants or that which should be constants dangle. This is shocking because constant like instances should really have constant-initialization meaning that they should have static storage duration and consequently should not dangle. This trips up beginner code requiring teaching dangling on day one. It is annoying to non beginners. Constants are used as defaults in production code. Constants are also frequently used in test and example code. Further, many instances of dangling used by non
C++
language comparisons frequently use constants as examples.Motivation
There are multiple resolutions to dangling in the
C++
language.Simpler implicit move
[3]Fix the range-based for loop, Rev2
[4]Get Fix of Broken Range-based for Loop Finally Done
[5]This proposal
All are valid resolutions and individually are better than the others, given the scenario. This proposal is focused on the third option, which is to fix by making the instance global.
Dangling the stack is shocking because is violates our trust in our compilers and language, since they are primarily responsible for the stack. However, there are three types of dangling that are even more shocking than the rest.
Simpler implicit move
[3:1]Making an instance global is a legitimate fix to dangling.
C++ Core Guidelines
F.43: Never (directly or indirectly) return a pointer or a reference to a local object [6]
Reason To avoid the crashes and data corruption that can result from the use of such a dangling pointer. [6:1]
…
Note This applies only to non-static local variables. All static variables are (as their name indicates) statically allocated, so that pointers to them cannot dangle. [6:2]
While making an instance global doesn’t fix all dangling in the language, it is the only resolution that can fix all three most shocking types of dangling provided the instance in question is a constant. It is also the best fix for these instances.
Since
constexpr
was added to the language inC++11
there has been an increase in the candidates of temporary instances that could be turned into global constants. ROMability was in part the motivation forconstexpr
but the requirement was never made. Even if aC++
architecture doesn’t support ROM, it is still required by language to supportstatic storage duration
andconst
. Matter of fact, due to the immutable nature of constant-initialized constant expressions, these expressions/instances are constant for the entire program even though they, at present, don’t havestatic storage duration
, even if just logically. There is a greater need now that more types are getting constexpr constructors. Also types that would normally only be dynamically allocated, such as string and vector, sinceC++20
, can also beconstexpr
. This has opened up the door wide for many more types being constructed at compile time.Motivating Examples
Before diving into the examples, let’s discuss what exactly is being asked for. There are two features; one implicit and the other explicit.
implicit constant initialization
If a temporary argument is constant-initialized [7] (7.7 Constant expressions [expr.const]) and its argument/instance type is a
LiteralType
and its parameter/local/member type isconst
and notmutable
then the instance is implicitly created withconstant initialization
.As such it has
static storage duration
and can’t dangle.explicit constant initialization
The
constinit
specifier can be applied to temporaries. Applying it asserts that the temporary wasconst-initialized
, that the argument type is aLiteralType
and its parameter/local/member type isconst
and notmutable
. This explicitly gives the temporarystatic storage duration
.While
implicit constant initialization
automatically fixes dangle,constinit
allows the programmers to manually and explicitly fix some dangling. The former is better for programmers and the language, while the later favors code reviewers or programmers who copy an example and want to have the compiler, momentarily, verify whether it is correct.So what sorts of dangling does this fix for us. Besides fixing some dangling, this also fixes some inconsistencies between string literals [7:1] (5.13.5 String literals [lex.string]) and other literal types.
This is reasonable based on how programmers reason about constants being immutable variables and temporaries which are known at compile time and do not change for the life of the program. This also works with plain old references.
“Such a feature would also help to … fix several bugs we see in practice:” [8]
“Consider we have a function returning the value of a map element or a default value if no such element exists without copying it:” [8:1]
“then this results in a classical bug:” [8:2]
Is this really a bug? With this proposal, it isn’t! Here is why. The function
findOrDefault
expects aconst
string&
for its third parameter. SinceC++20
, string’s constructor isconstexpr
. It CAN be constructed as a constant expression. Since all the arguments passed to thisconstexpr
constructor are constant expressions, in this case"none"
, the temporarystring
defvalue
IS alsoconstant-initialized
[7:2] (7.7 Constant expressions [expr.const]). This paper advises that if you have a nonmutable
const
that it isconstant-initialized
, that the variable or temporary undergoesconstant initialization
[7:3] (6.9.3.2 Static initialization [basic.start.static]). In other words it has implicitstatic storage duration
. The temporary would actually cease to be a temporary. As such this usage offindOrDefault
CAN’T dangle.The pain of immediate dangling associated with temporaries are especially felt when working with other anonymous language features of
C++
such as lambda functions and coroutines.Lambda functions
Whenever a lambda function captures a reference to a temporary it immediately dangles before an opportunity is given to call it, unless it is a immediately invoked lambda/function expression.
This problem is resolved when the scope of temporaries has
static storage duration
instead of the containing expression providedc1
resolves to aconst std::string&
sincec1
was constant-initialized. Theconstinit
specifier could ensure this.Coroutines
Similarly, whenever a coroutine gets constructed with a reference to a temporary it immediately dangles before an opportunity is given for it to be
co_await
ed upon.This specific immediately dangling example is fixed by implicit constant initialization since the parameter
s
expects aconst std::string&
and it was constant-initialized.It should be noted too that the current rules of temporaries discourages the use of temporaries because of the dangling it introduces. However, if the lifetime of temporaries was increased to a reasonable degree than programmers would use temporaries more. This would reduce dangling further because there would be fewer named variables that could be propagated outside of their containing scope. This would also improve code clarity by reducing the number of lines of code allowing any remaining dangling to be more clearly seen.
Proposed Wording
6.7.5.4 Automatic storage duration [basic.stc.auto]
1 Variables that belong to a block or parameter scope and are not explicitly declared static, thread_local,
orextern or had not underwent implicit constant initialization (6.9.3.2) have automatic storage duration. The storage for these entities lasts until the block in which they are created exits.…
6.9.3.2 Static initialization [basic.start.static]
…
2 Constant initialization is performed explicitly if a variable or temporary object with static or thread storage duration is constant-initialized (7.7). Constant initialization is performed implicitly if a non mutable const variable or non mutable const temporary object is constant-initialized (7.7). If constant initialization is not performed, a variable with static storage duration (6.7.5.2) or thread storage duration (6.7.5.3) is zero-initialized (9.4). Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. All static initialization strongly happens before (6.9.2.2) any dynamic initialization.
…
9.2.7 The constinit specifer [dcl.constinit]
1 If the constinit specifer is applied to a temporary, it gives the temporary static storage duration, asserts that the argument is a
LiteralType
and asserts that the parameter type is notmutable
andconst
otherwise the constinit specifer shall be applied only to a declaration of a variable with static or thread storage duration. If the specifer is applied to any declaration of a variable, it shall be applied to the initializing declaration. No diagnostic is required if no constinit declaration is reachable at the point of the initializing declaration.…
NOTE: Wording still need to capture that these temporaries are no longer temporaries and that their value category is
lvalue
.In Depth Rationale
There is a general expectation across programming languages that constants or more specifically constant literals are “immutable values which are known at compile time and do not change for the life of the program”. [9] In most programming languages or rather the most widely used programming languages, constants do not dangle. Constants are so simple, so trivial (English wise), that it is shocking to even have to be conscience of dangling. This is shocking to
C++
beginners, expert programmers from other programming languages who come over toC++
and at times even shocking to experiencedC++
programmers.There is already significant interest in this type of feature from programmers. Just look at
C23
as an example. For instance, theIntroduce storage-class specifiers for compound literals
[10] andThe 'constexpr' specifier
[11] allowsC
programmers to specifystatic
,constexpr
andthread_local
as storage class specifiers on their compound literals. The compound literals equivalent inC++
isLiteralType
and temporaries. This paper reuses our existing keywordconstinit
overstatic
because of what we all know from theC++ Core Guidelines
[12].I.2: Avoid non-const global variables[12:1]
Reason Non-const global variables hide dependencies and make the dependencies subject to unpredictable changes.[12:2]
I also did not choose
constexpr
, though that may be better for greaterC
compatibility, since I wrote my proposals before my seeing theC
paper. Alsoconstinit
better matches that which these features are doing in the context of existingC++
terminology of constant initialization. Further, there are differences in whatconstexpr
means toC++
andC
, at present.It should also be noted that these concepts are already in the standard just not fully exposed in the language. For instance, strings literals already have static storage duration and attempting to modify one is undefined.
Working Draft, Standard for Programming Language C++
[7:4]“5.13.5 String literals [lex.string]”
“9 Evaluating a string-literal results in a string literal object with static storage duration (6.7.5). Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecifed.”
“[Note 4: The effect of attempting to modify a string literal object is undefined. — end note]”
Further, this behavior happens all the time with evaluations of constant expressions but unfortunately we can’t enjoy all the benefits thereof.
Working Draft, Standard for Programming Language C++
[7:5]“6.9.3.2 Static initialization [basic.start.static]”
“1 Variables with static storage duration are initialized as a consequence of program initiation. Variables with thread storage duration are initialized as a consequence of thread execution. Within each of these phases of initiation, initialization occurs as follows.”
“2 Constant initialization is performed if a variable or temporary object with static or thread storage duration is constant-initialized (7.7). …”
…
“3 An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically, provided that …”
These ROM-able instances do not dangle as globals but from our code perspective they currently look like dangling locals. This causes false positive with our static analyzers and the programmer’s themselves. If we would just admit from a language standpoint that these are indeed constants than not only do we fix some dangling but also our mental model. This same reference also says “constant initialization is performed if a … temporary object with static … storage duration is constant-initialized”. Programmers can’t fully utilize this scenario because at present we can only use
static
on class members and locals but not temporary arguments. Since the code identified by this paper is already subject to constant initialization than there is no real chance of these changes causing any breakage.Value Categories
If some temporaries can be changed to have global scope than how does it affect their value categories? Currently, if the literal is a string than it is a
lvalue
and it has global scope. For all the other literals, they tend to be aprvalue
and have statement scope.movable
unmovable
named
unnamed
From the programmers perspective, global temporaries are just anonymously named variables. When they are passed as arguments, they have life beyond the life of the function that it is given to. As such the expression is not movable. As such, the desired behavior described throughout the paper is that they are
lvalues
which makes sense from a anonymously named standpoint. However, it must be said that technically they are unnamed which places them into the value category thatC++
currently does not have; the unmovable unnamed. The point is, this is simple whether it is worded as alvalue
or an unambiguous new value category that behaves like alvalue
. Regardless of which, there are some advantages that must be pointed out.Avoids superfluous moves
The proposed avoids superfluous moves. Copying pointers and lvalue references are cheaper than performing a move which is cheaper than performing any non trivial value copy.
Undo forced naming
The proposed makes using types that delete their
rvalue
reference constructor easier to use. For instance,std::reference_wrapper
can’t be created/reassigned with arvalue
reference, i.e. temporaries. Rather, it must be created/reassigned with alvalue
reference created on a seperate line. This requires superfluous naming which increases the chances of dangling. Further, according to theC++ Core Guidelines
, it is developers practice to do the following:Since the variable
value2
andvalue3
is likely to be created manually at block scope instead of variable scope, it can accidentally introduce more dangling. Constructing and reassigning with aglobal scoped
lvalue
temporary avoids these common dangling possibilities along with simplifying the code.Performance Considerations
There are at least three ways to provide a non dangling globalish constant.
const
andstatic
i.e.C++
languageWhile the first two are addressable, the last one isn’t.
In the next three examples, the same assembly is produced regardless of whether the literal
5
was provided via a native literal, aconstexpr
or aconst
global. The following results were produced in Compiler Explorer using both “x86-64 clang (trunk) -std=c++20 -O3
” and “x86-64 gcc (trunk) -std=c++20 -O3
”.values that are [logically] global constants
local constant but logically a global constant
main: # @main mov eax, 5 ret
constant expression i.e. logically a global constant
main: # @main mov eax, 5 ret
an actual global
main: # @main mov eax, 5 ret
The point is all three are logically non dangling, constant global. Now let’s look at reference examples.
immediate dangling
Not only do all three following examples produce the exact same assembly, they also provide the exact same assembly as the previous three examples. They are all essentially global constants from the assembly and programmer standpoint but the current standard says two of the three dangle, unnecessarily.
local constant but logically a global constant
main: # @main mov eax, 5 ret
main: # @main mov eax, 5 ret
constant expression i.e. logically a global constant
main: # @main mov eax, 5 ret
an actual global
main: # @main mov eax, 5 ret
indirect dangling of caller’s local
Similarly to, the next three examples produce the same assembly in the 3
clang
cases and 2 of thegcc
cases.GCC
would have produced the same result in its 2nd case had it had treated theconst
expected evaluation of a constant expression as a global constant as its third case did.local constant but logically a global constant
potential_dangler(int const&): # @potential_dangler(int const&) mov rax, rdi ret main: # @main mov eax, 5 ret
constant expression i.e. logically a global constant
x86-64 clang (trunk) -std=c++20 -O3
potential_dangler(int const&): # @potential_dangler(int const&) mov rax, rdi ret main: # @main mov eax, 5 ret
x86-64 gcc (trunk) -std=c++20 -O3
NOTE: Can’t really say what GCC is doing with the
xor
. However, if GCC had treated the resolved constant expression which is const required as a const global as in the next example than the results would have been the same.potential_dangler(int const&): mov rax, rdi ret main: xor eax, eax ret
an actual global
potential_dangler(int const&): # @potential_dangler(int const&) mov rax, rdi ret main: # @main mov eax, 5 ret
In all these logically global constant cases, no instance was actually stored global but was perfectly inlined as an assembly opcode constant. So, the worst case performance of this proposal would be a single upfront load time cost. Contrast that with the current potential local constant cost of constantly creating and destroying instances, even multiple times concurrently in different threads. Even the proposed cost can go from 1 to 0 while the current non global local could result in superfluous dynamic allocations since
std::string
andstd::vector
are nowconstexpr
.Microsoft’s compiler and existing dangling detection
Things really get interesting when we factor Microsoft’s compiler into the equation and contrast its dangling detection between optimized configurations.
x64 msvc v19.latest
indirect dangling of caller’s local
temporary constant but logically a global constant
/Ox optimizations (favor speed)
passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov QWORD PTR [rsp+8], rcx mov rax, QWORD PTR passthrough$[rsp] ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP $T1 = 32 reftemp$ = 40 main PROC $LN3: sub rsp, 56; 00000038H mov DWORD PTR $T1[rsp], 5 lea rcx, QWORD PTR $T1[rsp] ; potential_dangler call int const & potential_dangler(int const &) mov QWORD PTR reftemp$[rsp], rax mov rax, QWORD PTR reftemp$[rsp] mov eax, DWORD PTR [rax] add rsp, 56; 00000038H ret 0 main ENDP
passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov rax, rcx ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP main PROC mov eax, 5 ret 0 main ENDP
local constant but logically a global constant
/Ox optimizations (favor speed)
passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov QWORD PTR [rsp+8], rcx mov rax, QWORD PTR passthrough$[rsp] ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP local$ = 32 reflocal$ = 40 main PROC $LN3: sub rsp, 56; 00000038H mov DWORD PTR local$[rsp], 5 lea rcx, QWORD PTR local$[rsp] ; potential_dangler call int const & potential_dangler(int const &) mov QWORD PTR reflocal$[rsp], rax mov rax, QWORD PTR reflocal$[rsp] mov eax, DWORD PTR [rax] add rsp, 56; 00000038H ret 0 main ENDP
passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov rax, rcx ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP main PROC mov eax, 5 ret 0 main ENDP
constant expression i.e. logically a global constant
/Ox optimizations (favor speed)
int return5(void) PROC; return5, COMDAT mov eax, 5 ret 0 int return5(void) ENDP; return5 passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov QWORD PTR [rsp+8], rcx mov rax, QWORD PTR passthrough$[rsp] ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP $T1 = 32 reflocal$ = 40 main PROC $LN3: sub rsp, 56; 00000038H call int return5(void); return5 mov DWORD PTR $T1[rsp], eax lea rcx, QWORD PTR $T1[rsp] ; potential_dangler call int const & potential_dangler(int const &) mov QWORD PTR reflocal$[rsp], rax mov rax, QWORD PTR reflocal$[rsp] mov eax, DWORD PTR [rax] add rsp, 56; 00000038H ret 0 main ENDP
int return5(void) PROC; return5, COMDAT mov eax, 5 ret 0 int return5(void) ENDP; return5 passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov rax, rcx ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP main PROC mov eax, 5 ret 0 main ENDP
an actual global
/Ox optimizations (favor speed)
int const GLOBAL DD 05H; GLOBAL passthrough$ = 8 ; potential_dangler int const & potential_dangler(int const &) PROC mov QWORD PTR [rsp+8], rcx mov rax, QWORD PTR passthrough$[rsp] ret 0 ; potential_dangler int const & potential_dangler(int const &) ENDP reflocal$ = 32 main PROC $LN3: sub rsp, 56; 00000038H lea rcx, OFFSET FLAT:int const GLOBAL ; potential_dangler call int const & potential_dangler(int const &) mov QWORD PTR reflocal$[rsp], rax mov rax, QWORD PTR reflocal$[rsp] mov eax, DWORD PTR [rax] add rsp, 56; 00000038H ret 0 main ENDP
passthrough$ = 8
int const & potential_dangler(int const &) PROC   ; potential_dangler
        mov     rax, rcx
        ret     0
int const & potential_dangler(int const &) ENDP   ; potential_dangler

main    PROC
        mov     eax, 5
        ret     0
main    ENDP
In all four cases, with optimizations (favor speed) turned on, the Microsoft compiler produced the same non-dangling code, regardless of whether the argument was an actual global, a local constant, a temporary constant or the result of a constant expression evaluation. GCC and Clang were generating the same code. To the MSVC compiler's credit, it detects not only functions that can potentially dangle but also calls that could. In every case, the diagnostic was a warning rather than an error: although the temporary constant and the constant expression evaluation truly dangle when not optimized, neither was a compile error. Further, the global example was incorrectly flagged as potentially dangling even though the compiler knew it was a global. Regardless, the optimized compilation fixed the dangling and removed the potential-dangling warning.
This proposal advocates standardizing an optimization that compilers are already doing, and were doing before C++ got constexpr. Fixing this type of dangling in this fashion is the best possible way because potentially invalid code becomes valid with no programmer intervention. It produces no errors, it is faster, it uses less memory and it produces smaller executables. In short, the compiler and language already have all they need to fix dangling constants. Compilers already do this, but there is currently no verbiage in the standard stating that anonymous constants don't dangle because they are logically global constants. Adopting this proposal ensures programmers do not needlessly have to fix something that was never dangling in the first place, even though the current language makes it look as if it is.
Tooling Opportunities
There are a couple of tooling opportunities, especially with respect to the constinit specifier.
- A tool could check a temporary argument for const, constexpr/LiteralType and constant-initialization and, if the conditions match, automatically add the constinit specifier for code reviewers.
- A tool could likewise remove the constinit specifier from any temporaries for programmers.
Combined, they would form a constinit toggle, which wouldn't be all that different from the whitespace and special-character toggles already found in many IDEs.
Summary
The advantages to C++ of adopting this proposal are manifold.
- … lvalues and decreases rvalues in the code.
- … C++ and C99 compound literals
- … C++'s dangling resolutions back to C
- … C++ literals more consistent with one another
Frequently Asked Questions
What about locality of reference?
It is true that globals can be slower than locals because they are farther in memory from the code that uses them. So let me clarify: when I say static storage duration, I really mean logically static storage duration. If a type is a PODType/TrivialType or LiteralType, then nothing prevents the compiler from copying the global to a local that is closer to the executing code. Rather, the compiler must ensure that the instance is always available; that is, it must have effectively static storage duration.
Consider this from a processor and assembly/machine language standpoint. A processor usually has instructions that work with memory. Whether that memory is physically ROM, or only logically so because the program never writes to it, what it holds are constants.
A processor may also have specialized versions of common instructions in which a constant value is encoded as part of the instruction itself. This too is a constant. However, this constant is guaranteed to be close to the code because it is physically part of it.
What is more interesting is that these two kinds of constants have different value categories: the ROM version is addressable, while the instruction-only version clearly is not. It should also be noted that the latter, unnamed and unaddressable, version physically can't dangle.
Won’t this break a lot of existing code?
No, little if any. On the contrary, code that is broken is now fixed. Code that would be invalid is now valid, makes sense and can be rationally explained. Let me summarize:
This feature changes not only the point of destruction but also the point of construction. Instances that were of automatic storage duration are now of static storage duration. Instances that were temporaries are no longer temporaries. Surely, something must be broken! See the earlier section “Present”, subsection “C Standard Compound Literals”. Even the C++ standard recognizes that there are other opportunities for constant initialization.
Working Draft, Standard for Programming Language C++ [7:6]
“6.9.3.2 Static initialization [basic.start.static]”
“3 An implementation is permitted to perform the initialization of a variable with static or thread storage duration as a static initialization even if such initialization is not required to be done statically …”
So, what is the point? For the instances that would benefit from implicit constant initialization, there are currently NO guarantees about their lifetime, which is therefore indeterminate. With this portion of the proposal, a guarantee is given, and what was indeterminate becomes determinate.
It should also be noted that while this enhancement is applied implicitly, programmers have opted into it up to three times:
- The type author provided a constexpr constructor.
- The function author declared the parameter const.
- The caller constant-initialized the variable or argument.
Having expressed constant requirements three times, the end programmer almost certainly wanted a constant, even if it is anonymous.
Who would even use these features? There isn't sufficient use to justify these changes.
Everyone … Quite a bit, actually
Consider all the examples littered throughout our history; these are what get fixed.
- STL
- C++ standard
This doesn't even include the countless examples, found in numerous articles comparing C++ with other, unnamed programming languages, which would be fixed. However, the best proof can be found in our own usage and in other proposals.
C++ Core Guidelines [13]
F.16: For “in” parameters, pass cheaply-copied types by value and others by reference to const
In C++, we use const parameters a lot. This is the first of three requirements of implicit constant initialization. What about the use of types that can be constructed at compile time?
- C++20: std::pair, std::tuple, std::string, std::vector
- C++23: std::optional, std::variant, std::unique_ptr
As there was sufficient use to justify making the constructors of each of the types listed above constexpr, there would be sufficient use of the implicit constant initialization feature, which would use them all. These types satisfy its second and third requirements: that the instances be constructible at compile time and constant-initialized.
Why not just use a static analyzer?
Typically, a static analyzer doesn't fix code. Instead, it just produces warnings and errors. It is the programmer's responsibility to fix code, by deciding whether the incident was a false positive and making the corresponding code changes. This proposal does fix some dangling, but other dangling goes unresolved and unidentified. As such, this proposal and static analyzers are complementary: this proposal can fix some dangling, and a static analyzer can identify what remains. Those who still ask “why not just use a static analyzer?” might really be saying that this proposal's language enhancements might break their static analyzer. To which I say, the standard dictates the analyzer, not the other way around. That is true for all tools. However, let's explore the potential impact of this proposal on static analyzers.
The C++ language is complex. It stands to reason that our tools would have some degree of complexity, since they need to take some subset of our language's rules into consideration. In any proposal, mine included, fixing any dangling turns some incidents that a static analyzer reports into false positives, wherever the analyzer's findings overlap with said proposal. These false positives would join those an analyzer already has for not factoring existing language rules into consideration, just as it would for any new language rules.
With implicit constant initialization, existing static analyzers would need to be enhanced to track the constness of variables and parameters, whether or not the types of variables and parameters can be constructed at compile time, and whether or not instances were constant-initialized. Until that happens, such a dangling incident reported by a static analyzer will just be a false positive. The total number of incidents remains the same, and the programmer just needs to recognize that it was a false positive. That should be easy to do, since constants are trivial and these rules are simple.
Can this even be implemented?
C++ already provides a static storage duration guarantee for instances of one type and allows it for many others. LiteralType instances that are constant-initialized are already prime candidates for compilers to promote to static storage duration.
Doesn't the implicit constant initialization feature make it harder for programmers to identify dangling and thus harder to teach?
If there were no dangling, then there would be nothing to teach with respect to any dangling feature. Even the whole standard is not taught. So the more dangling we fix in the language, the less dangling has to be taught to beginners. Consider the following example: do the new features make it easier or harder to identify dangling?
It is plain to see that {1,2} is constant-initialized, as it is composed entirely of LiteralType(s). It is also plain to see that {i,2} is modifiable, as its initialization is variable and dynamic due to the variable i. So the real questions are as follows:
- Is the parameter of f const?
- Is the parameter of f a LiteralType?
The fact is, some programmer had to have known the answers to both questions in order to have written f({1,2}) in the first place. The case could be made that it would be nice to be able to use the constinit keyword on temporary arguments, f(constinit {1, 2}). This would allow those who don't write the code, such as code reviewers, to validate it quickly. Even the programmer would benefit somewhat if the code were copied. However, constinit would mostly be superfluous if the “temporaries are just anonymously named variables” feature is added. As such, constinit should be optional. Consequently, any negative impact on identifying and teaching dangling is negligible.
Yet each of the implicit and explicit constant initialization features, by itself, makes it easier to identify and teach dangling.
C++ Core Guidelines
F.43: Never (directly or indirectly) return a pointer or a reference to a local object [6:3]
…
Note This applies only to non-static local variables. All static variables are (as their name indicates) statically allocated, so that pointers to them cannot dangle. [6:4]
Instances that have static storage duration can't dangle. Currently in C++, instances that don't immediately dangle can still dangle later, such as when returned. Using static storage duration short-circuits the dangling identification process: an instance, once identified as static, doesn't need to be factored into any further dangling decisions. Using more static storage duration therefore speeds up the dangling identification process. This would also benefit static analyzers that go through a similar thought process.
Doesn't this make C++ harder to teach?
Until the day that all dangling gets fixed, any incremental fix to dangling still requires programmers to identify any remaining dangling. They must also know how to fix it for the given scenario, as there are multiple solutions. Dangling occurs even for things as simple as constants, and immediate dangling is so easy to produce. Dangling resolution therefore still has to be taught, even to beginners. As this proposal fixes these types of dangling, it makes teaching C++ easier because it makes C++ easier.
So, what do we teach now, and what bearing do these teachings, the C++ standard and this proposal have on one another?
C++ Core Guidelines
F.42: Return a T* to indicate a position (only) [12:3]
Note Do not return a pointer to something that is not in the caller’s scope; see F.43. [6:5]
Returning references to something in the caller’s scope is only natural. It is a part of our reference-delegating programming model. A function, when given a reference, does not know how the instance was created, and it doesn’t care as long as the instance is good for the life of the function call (and beyond). Unfortunately, scoping temporary arguments to the statement instead of the containing block doesn't just create immediate dangling. It also hands functions references to instances that are near death. These instances are almost dead on arrival. The ability to return a reference to a caller’s instance, or a sub-instance thereof, assumes, correctly, that references from the caller’s scope will still be alive after the function call. The fact that the temporary rules shortened the lifetime to the statement is at odds with what we teach. This proposal restores to some temporaries the lifetime of anonymously named constants. That lifetime is not only natural but also consistent with what programmers already know. It is also in line with what we teach, as codified in the C++ Core Guidelines. One such guideline is as follows:
C++ Core Guidelines
F.43: Never (directly or indirectly) return a pointer or a reference to a local object [6:6]
Reason To avoid the crashes and data corruption that can result from the use of such a dangling pointer. [6:7]
Other than turning some of these locals into globals, this proposal neither solves nor contradicts this teaching. If anything, by cleaning up the simple dangling, it makes the remaining dangling more visible.
Further, what is proposed is easy to teach because we already teach it, and it makes C++ even easier to teach.
All of this can be done without adding any new keywords or any new attributes. We just use constant concepts that beginners are already familiar with. In fact, we would be working in harmony with all that we already teach about globals in the C++ Core Guidelines [14].
- I.2: Avoid non-const global variables [12:4]
- I.22: Avoid complex initialization of global objects [12:5]
- F.15: Prefer simple and conventional ways of passing information [12:6]
- F.16: For “in” parameters, pass cheaply-copied types by value and others by reference to const [12:7]
- F.43: Never (directly or indirectly) return a pointer or a reference to a local object [6:8]
- R.5: Prefer scoped objects, don’t heap-allocate unnecessarily [12:8]
- R.6: Avoid non-const global variables [12:9]
- CP.2: Avoid data races [12:10]
- CP.24: Think of a thread as a global container [12:11]
- CP.32: To share ownership between unrelated threads use shared_ptr [12:12]
How do these specifiers propagate?
These specifiers apply to the temporary immediately to the right of the specifier and to any child temporaries. They do not affect any parent or sibling temporaries. Consider these examples:
References
1. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2658r0.html
2. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2623r2.html
3. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2266r3.html
4. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2012r2.pdf
5. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2644r0.pdf
6. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f43-never-directly-or-indirectly-return-a-pointer-or-a-reference-to-a-local-object
7. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/n4910.pdf
8. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0936r0.pdf
9. https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/constants
10. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3038.htm
11. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2917.pdf
12. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#glossary
13. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-in
14. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines