| Document number | P2878R0 |
| Date | 2023-5-10 |
| Reply-to | Jarrad J. Waterloo <descender76 at gmail dot com> |
| Audience | SG23 Safety and Security |
Reference checking
Table of contents
Abstract
This paper proposes that we allow programmers to provide explicit lifetime dependence information to the compiler for the following reasons:
- Standardize the documentation of lifetimes of API(s) for developers
- Standardize the specification of lifetimes for proposals
- Greatly reduce the dangling of the stack for references
This paper is NOT about the following:
- Fixing dangling of the stack that can only be discovered at runtime
- Directly fixing dangling of the heap when not using RAII
- Fixing dangling that can occur when using pointers and pointer like types
Rather it is about making those instances of dangling references to the stack which are always bad code, detectable as errors in the language, instead of warnings.
Motivational Example
Disclaimer: I am not a Rust expert. Any Rust examples provided here are meant to illustrate this feature as a language feature instead of an attribute. This proposal will repeatedly refer to this as an attribute, but there is no preference for either form; what matters is gaining the functionality.
What is being asked for is similar to, but not exactly like, Rust’s feature called explicit lifetimes.
Example taken from Why are explicit lifetimes needed in Rust?
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
x
}
Similar but better functionality has been requested in a variety of C++ proposals.
Bind Returned/Initialized Objects to the Lifetime of Parameters, Rev0
const unsigned int& foo(const unsigned int& x lifetimebound, const unsigned int& y) {
return x;
}
indirect dangling identification
[[parameter_dependency(dependent{"return"}, providers{"x"})]]
const unsigned int& foo(const unsigned int& x, const unsigned int& y) {
return x;
}
Towards memory safety in C++
[[dependson(x)]] const unsigned int& foo(const unsigned int& x, const unsigned int& y) {
return x;
}
Rust also allows providing explicit lifetimes on struct(s) but that is NOT being asked for in this proposal.
Example taken from Why are explicit lifetimes needed in Rust?
struct Foo<'a> {
x: &'a i32,
}
fn main() {
let f : Foo;
{
let n = 5;
let y = &n;
f = Foo { x: y };
};
println!("{}", f.x);
}
Motivation
Having these checks in the language is highly desirable because they are highly effective. Other proposals advocate for a separate tool, whether that be a static analysis tool or a runtime tool. Those solutions, while needed, are less effective because they are at best only PPE, personal protective equipment that only works if you have it installed, turned on, configured, used and acted upon. This proposal, while limited, is highly effective because it eliminates these instances of safety issues as illustrated on the [exponential] safety scale.
This proposal is also in line with what the C++ community believes and teaches.
“In.force: Enforcement”
…
“This adds up to quite a few dilemmas. We try to resolve those using tools. Each rule has an Enforcement section listing ideas for enforcement. Enforcement might be done by code review, by static analysis, by compiler, or by run-time checks. Wherever possible, we prefer ‘mechanical’ checking (humans are slow, inaccurate, and bore easily) and static checking. Run-time checks are suggested only rarely where no alternative exists; we do not want to introduce ‘distributed bloat’.”
…
P.5: Prefer compile-time checking to run-time checking
C++ Core Guidelines, Bjarne Stroustrup, Herb Sutter
Other Proposals
Before going into the technical details, it would be good to consider this in light of other safety related proposals.
Impact on general lifetime safety
This proposal does not conflict with Lifetime safety: Preventing common dangling. To the contrary, it is complementary.
“1.1.4 Function calls”
“Finally, since every function is analyzed in isolation, we have to have some way of reasoning about function calls when a function call returns a Pointer. If the user doesn’t annotate otherwise, by default we assume that a function returns values that are derived from its arguments.”
Lifetime safety: Preventing common dangling, Herb Sutter
Just like the Lifetime safety: Preventing common dangling paper, this proposal analyzes each function in isolation. While that proposal is focused on when programmers do not annotate/document their code, this proposal is focused on when programmers do document their code. That proposal “assume that a function returns values that are derived from its arguments” and because of this assumption has to produce warnings. This proposal makes no assumption at all and consequently can produce errors from the library author’s documented intentions. The two proposals are complementary and independent, and both are desired by the C++ community. Since the two features are independent, there is no reason why the compile checks proposal couldn’t be added before the runtime checks proposal.
Impact on Contracts
This proposal also does not conflict with contracts. As a matter of fact, this proposal could be described in terms of contracts, just applied to existing language features. Since none of the current contract proposals allow applying contracts to existing and future non-functional language features such as return and do return, this proposal is complementary.
“7.1 LIFETIME”
“These sources of undefined behavior pertain to accessing an object outside its lifetime or validity of a pointer. By their very nature, they are not directly syntactic. The approach suggested in this proposal is to prohibit the use of certain syntactic constructs which might – under the wrong circumstances – lead to undefined behavior. Those restrictions are syntactic, so clearly will prohibit cases that someone might find useful.”
Contracts for C++: Prioritizing Safety, Gabriel Dos Reis
The Contracts for C++: Prioritizing Safety proposal envisions a predicate called object_address that could be applied via contracts to functions. In contract-like terms, this proposal would be advocating for is_global_object_address, is_local_object_address, is_temporary_object_address and is_not_global_local_temporary_object_address, and then applying these predicates to return, reference dereference and the points of reference use. Since all of this is compile-time information and the places where it is applied would always produce errors, there is no need for the programmer to add such checks anywhere because the compiler can do it automatically. Further, as this proposal only serves to identify code that is definitely bad, it does not “prohibit cases that someone might find useful”.
Technical Details
In order to make this proposal work, 2 bits of information are needed per reference at compile time. The 2 bits represent an enumeration of 4 possible lifetime values.
- Is global
- Is local
- Is temporary
- Is all other, i.e. not global, local or temporary: unknown, nullptr and dynamic
This lifetime enumeration gets associated with each reference at the point of construction, since references have to be initialized, can’t be nullptr and can’t be rebound.
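const int GLOBAL = 42;
void f(int* ip, int& ir)
{
    int local = 42;
    int& r1 = *ip;          // all other: unknown
    int& r2 = ir;           // all other: unknown
    const int& r3 = GLOBAL; // global (const is needed to bind to a const object)
    int& r4 = local;        // local
}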
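One way to picture the two bits of metadata is as a four-value enumeration that a compiler could attach to each reference when it is initialized. The following is only a model: `Lifetime`, `fits_in_two_bits` and `example` are hypothetical names chosen for illustration and are not part of this proposal's wording or of any compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-reference metadata (names are assumptions).
enum class Lifetime : std::uint8_t {
    Global    = 0,  // refers to a global
    Local     = 1,  // refers to a local variable
    Temporary = 2,  // refers to a temporary
    Other     = 3   // unknown, nullptr or dynamic
};

// True when the enumerator's value can be stored in two bits.
constexpr bool fits_in_two_bits(Lifetime value) {
    return static_cast<unsigned>(value) <= 3u;
}

// Usage: every enumerator is representable in two bits.
inline void example() {
    assert(fits_in_two_bits(Lifetime::Global));
    assert(fits_in_two_bits(Lifetime::Local));
    assert(fits_in_two_bits(Lifetime::Temporary));
    assert(fits_in_two_bits(Lifetime::Other));
}
```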
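The next step is that copying a reference copies its lifetime metadata.
const int GLOBAL = 42;
void f(int* ip, int& ir)
{
    int local = 42;
    int& r1 = *ip;          // all other: unknown
    int& r2 = ir;           // all other: unknown
    const int& r3 = GLOBAL; // global (const is needed to bind to a const object)
    int& r4 = local;        // local
    int& r5 = r1;           // copied from r1: unknown
    int& r6 = r2;           // copied from r2: unknown
    const int& r7 = r3;     // copied from r3: global
    int& r8 = r4;           // copied from r4: local
}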
1st check: Returning a reference to a local produces an error.
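const int GLOBAL = 42;
int& f(int* ip, int& ir)
{
    int local = 42;
    int& r1 = *ip;          // all other: unknown
    int& r2 = ir;           // all other: unknown
    const int& r3 = GLOBAL; // global (const is needed to bind to a const object)
    int& r4 = local;        // local
    int& r5 = r1;           // copied from r1: unknown
    int& r6 = r2;           // copied from r2: unknown
    const int& r7 = r3;     // copied from r3: global
    int& r8 = r4;           // copied from r4: local
    return r8;              // 1st check: error, returns a reference to a local
}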
This error doesn’t give programmers much, as C++ addressed this in C++23 with Simpler implicit move. However, since Simpler implicit move was framed in terms of value categories, any error message would also be in terms of value categories. This proposal advises that such an error be expressed in terms of dangling, which is more human readable for programmers.
Things get really interesting when programmers are allowed to provide explicit lifetime dependence information to the compiler. Unlike Rust’s explicit lifetimes, this feature, explicit lifetime dependence, allows a reference to be tied to the lifetimes of multiple other references. In these cases, the resulting lifetime is the most constrained one: temporary is more constrained than local, which is more constrained than global.
global > local > temporary
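The ordering above can be modelled as taking the minimum over the lifetimes a reference depends on. The following is a minimal sketch, assuming a hypothetical `Lifetime` enumeration and a hypothetical `most_constrained` helper; the fourth "all other" value is left out because the text does not rank it.

```cpp
#include <cassert>

// Hypothetical ordering: a smaller value means a more constrained lifetime,
// mirroring "global > local > temporary".
enum class Lifetime { Temporary = 0, Local = 1, Global = 2 };

// Combines two dependencies by keeping the more constrained of the two.
constexpr Lifetime most_constrained(Lifetime a, Lifetime b) {
    return a < b ? a : b;
}

// Usage: the combinations mirror calls such as f1(local, GLOBAL) or f1(local, 42).
inline void example() {
    assert(most_constrained(Lifetime::Local, Lifetime::Global) == Lifetime::Local);
    assert(most_constrained(Lifetime::Local, Lifetime::Temporary) == Lifetime::Temporary);
    assert(most_constrained(Lifetime::Global, Lifetime::Temporary) == Lifetime::Temporary);
    assert(most_constrained(Lifetime::Global, Lifetime::Global) == Lifetime::Global);
}
```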
This is also the time to add two more checks.
2nd check: Returning a reference to a temporary produces an error.
3rd check: Using a reference to a temporary after it has been assigned, i.e. on another line of code which is not the full-expression, produces an error.
const int GLOBAL = 42;
[[dependson(left, right)]]
const int& f1(const int& left, const int& right)
{
if(randomBool())
{
return left;
}
else
{
return right;
}
}
const int& f2()
{
int local = 42;
const int& r1 = f1(local, local);   // local
const int& r2 = f1(GLOBAL, GLOBAL); // global
const int& r3 = f1(42, 42);         // temporary
const int& r4 = f1(local, GLOBAL);  // local
const int& r5 = f1(local, 42);      // temporary
const int& r6 = f1(GLOBAL, 42);     // temporary
if(randomBool())
{
return r1;
}
if(randomBool())
{
return r2;
}
if(randomBool())
{
return r3;
}
if(randomBool())
{
return r4;
}
if(randomBool())
{
return r5;
}
if(randomBool())
{
return r6;
}
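int x1 = r3 + 43; // 3rd check: r3 is classified as temporary
int x2 = r5 + 44; // 3rd check: r5 is classified as temporary
int x3 = r6 + 45; // 3rd check: r6 is classified as temporary
return f1(f1(GLOBAL, 4), f1(local, 2)); // 2nd check: the result depends on temporaries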
}
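The first three checks can be summarized as two predicates over the lifetime classification of a reference. This is a sketch under stated assumptions, not the proposal's specification: `Lifetime`, `may_return` and `may_use_after_full_expression` are hypothetical names, and the "all other" value is treated as passing because the proposal only rejects code that definitely dangles.

```cpp
#include <cassert>

// Hypothetical classification of what a reference refers to.
enum class Lifetime { Temporary, Local, Global, Other };

// 1st and 2nd checks: returning a reference to a local or a temporary is an error.
constexpr bool may_return(Lifetime lifetime) {
    return lifetime != Lifetime::Local && lifetime != Lifetime::Temporary;
}

// 3rd check: a reference to a temporary may not be used after the
// full-expression that created the temporary has ended.
constexpr bool may_use_after_full_expression(Lifetime lifetime) {
    return lifetime != Lifetime::Temporary;
}

// Usage: r1 (local), r2 (global) and r3 (temporary) from the example above.
inline void example() {
    assert(!may_return(Lifetime::Local));
    assert(may_use_after_full_expression(Lifetime::Local));
    assert(may_return(Lifetime::Global));
    assert(!may_return(Lifetime::Temporary));
    assert(!may_use_after_full_expression(Lifetime::Temporary));
}
```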
Besides fixing indirect dangling of a local, this also fixes indirect dangling of temporaries which causes immediate dangling.
6.7.7 Temporary objects [class.temporary]
…
4 When an implementation introduces a temporary object of a class that has a non-trivial constructor (11.4.5.2, 11.4.5.3), it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be called for a temporary with a non-trivial destructor (11.4.7). Temporary objects are destroyed as the last step in evaluating the full-expression (6.9.1) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception. The value computations and side effects of destroying a temporary object are associated only with the full-expression, not with any specific subexpression.
…
(6.12) - A temporary bound to a reference in a new-initializer (7.6.2.8) persists until the completion of the full-expression containing the new-initializer.
[Note 7: This might introduce a dangling reference. - end note]
Working Draft, Standard for Programming Language C++
If we fixed nothing else identified in this proposal, that would be a welcome reprieve. However, much more can be done simply. Let’s say, instead of adding this lifetime metadata to each reference, we add it to each instance. For references, lifetime metadata would still say that the reference refers to an instance with a particular lifetime, but for non-reference and non-pointer instances, lifetime metadata would indicate that the instance is dependent upon another instance. Let’s see what type of dangling this would mitigate.
Structs and Classes
Before adding this metadata to instances in general, consider a struct that contains references. This shows that the composition and decomposition of references can be handled by the compiler provided the reference is accessible, such as public.
struct S { const int& first; const int& second; };
const int GLOBAL = 42;
[[dependson(left, right)]]
const int& f1(const int& left, const int& right)
{
if(randomBool())
{
return left;
}
else
{
return right;
}
}
const int& f2()
{
int local = 42;
S s1{GLOBAL, local};
S s2{local, f1(GLOBAL, 24)};
const int& r1 = s1.first;
const int& r2 = s1.second;
const int& r3 = s2.first;
const int& r4 = s2.second;
if(randomBool())
{
return r1;
}
if(randomBool())
{
return r2;
}
if(randomBool())
{
return r3;
}
if(randomBool())
{
return r4;
}
int x = r4 + 43;
return 42;
}
S f3()
{
int local = 42;
S s1{GLOBAL, local};
S s2{local, f1(GLOBAL, 24)};
if(randomBool())
{
return s1;
}
return s2;
}
auto lambda()
{
int local = 42;
const int& ref_temporary = f1(GLOBAL, 24);
return [&local, &ref_temporary]() -> const int&
{
if(randomBool())
{
return local;
}
return ref_temporary;
};
}
auto coroutine()
{
int local = 42;
const int& ref_temporary = f1(GLOBAL, 24);
return [&local, &ref_temporary]() -> generator<const int&>
{
if(randomBool())
{
co_return local;
}
co_return ref_temporary;
};
}
Is there any way we can mitigate dangling when the reference has been hidden by abstractions, such as reference-like classes with protected or private access and public accessors? If lifetime metadata is also applied to instances in general, then constructors, conversion operators and factory functions could be annotated to say that a returned reference-like type is dependent upon another type. Consider std::string_view and std::string.
GIVEN
constexpr std::string::operator [[dependson(this)]] std::basic_string_view<CharT, Traits>() const noexcept;
std::string_view sv = "hello world"s;
sv.size();
This would also work with other reference-like types such as std::span and function_ref. Just as this proposal does not address pointers because they are more run time than compile time, this proposal would not address std::reference_wrapper since it is rebindable. Nor would this proposal address std::shared_ptr and std::unique_ptr, which like pointers are nullable and rebindable, even though they already have runtime safeties built in.
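For the std::string_view case above, the way to avoid the dangling in today's C++ is to give the std::string a name so that it outlives the view. The following is a small sketch; `greeting_size` is a hypothetical function used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the size of a view whose owner is still alive when the view is used.
inline std::size_t greeting_size() {
    using namespace std::string_literals;
    std::string owner = "hello world"s;  // named object outlives the view
    std::string_view sv = owner;         // the view depends on `owner`, not on a temporary
    return sv.size();                    // safe: `owner` is still in scope here
}
```

With the annotation proposed here, the version that initializes the view directly from the temporary would be rejected at compile time, while this version would be accepted.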
Even though this proposal does not address pointers and pointer-like types, it is still useful for non-owning versions of these constructs when they are const constructed, because they would need to be non-null initialized to be usable and wouldn’t be rebindable because they are const. For instance, std::reference_wrapper is rebindable like a pointer so it tends to be more runtime than compile time. However, a const std::reference_wrapper can’t be rebound, so it would make sense if a programmer bound its lifetime to another instance.
[[dependson(left, right)]]
const std::reference_wrapper<const int> f(const int& left, const int& right)
{
if(randomBool())
{
return std::cref(left);
}
else
{
return std::cref(right);
}
}
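The claim that a const std::reference_wrapper cannot be rebound can be checked with standard type traits. In the sketch below, the alias `Ref` and the function `rebinding_changes_target` are hypothetical names; the traits and `std::cref` are standard library facilities.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

using Ref = std::reference_wrapper<const int>;

// A non-const wrapper can be assigned to, which rebinds it.
static_assert(std::is_copy_assignable_v<Ref>, "non-const wrapper is rebindable");
// A const wrapper cannot be assigned to, so its target is fixed at construction.
static_assert(!std::is_copy_assignable_v<const Ref>, "const wrapper is not rebindable");

// Shows that assignment changes which object a non-const wrapper refers to.
inline bool rebinding_changes_target() {
    const int a = 1;
    const int b = 2;
    Ref r = std::cref(a);
    r = std::cref(b);        // rebinds r to b; a is untouched
    return &r.get() == &b;
}
```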
This is also the time to mention our final check.
4th check: You can’t use a temporary in a new expression if the type being instantiated will become dependent upon that temporary.
6.7.7 Temporary objects [class.temporary]
…
4 … (11.4.7). Temporary objects are destroyed as the last step in evaluating the full-expression (6.9.1) …
…
(6.12) - A temporary bound to a reference in a new-initializer (7.6.2.8) persists until the completion of the full-expression containing the new-initializer.
[Note 7: This might introduce a dangling reference. - end note]
[Example 5:
struct S { int mi; const std::pair<int,int>& mp; };
S a { 1, {2,3} };
S* p = new S{ 1, {2,3} };
– end example]
Working Draft, Standard for Programming Language C++
In the previous example an instance of type S became dependent upon the temporary, {2, 3}, because it retained a reference to the temporary. This is detectable because the member mp is publicly available since the type S is a struct.
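The difference between the two initializations in the quoted example can be observed in today's C++ by counting destructor calls. The sketch below uses hypothetical types `Tracker` and `Holder` in place of std::pair and S; it never reads the dangling member, so it stays within defined behavior.

```cpp
#include <cassert>

// Hypothetical helper type: counts how many instances have been destroyed.
struct Tracker {
    static inline int destroyed = 0;
    ~Tracker() { ++destroyed; }
};

// Aggregate with a public reference member, mirroring S in the quoted example.
struct Holder {
    int mi;
    const Tracker& mp;
};

// Returns the destroyed count observed while a local aggregate is still alive.
inline int destroyed_while_local_alive() {
    Tracker::destroyed = 0;
    Holder a{1, Tracker{}};  // the temporary's lifetime is extended to match `a`
    (void)a;
    return Tracker::destroyed;
}

// Returns the destroyed count observed right after the new-expression.
inline int destroyed_after_new() {
    Tracker::destroyed = 0;
    Holder* p = new Holder{1, Tracker{}};  // the temporary dies with the full-expression
    int seen = Tracker::destroyed;         // p->mp already dangles, so it is never read
    delete p;
    return seen;
}
```

Some compilers already warn about the new-expression form; the 4th check would turn that case into an error.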
For a class where members mi and mp are abstracted away, the dependence can be expressed in its constructor.
class S
{
public:
[[parameter_dependency(dependent{"this"}, providers{"mp"})]]
S(int mi, const std::pair<int,int>& mp);
private:
int mi;
const std::pair<int,int>& mp;
};
S a { 1, {2,3} };
S* p = new S{ 1, {2,3} };
Impact on Pattern Matching
This feature can be enhanced further to work with potential future C++ language features such as pattern matching.
WARNING: If pattern matching allows propagating references from inner scopes to containing scopes, then there will be a new category of dangling added to C++ and consequently a weakening of the safety of references.
This paper references the do expressions paper for the relevant portion of pattern matching because it is more explicit. If the C++ standard goes with an implicit syntax, it can still be an issue if it allows propagating references from inner scopes to containing scopes.
The simplest example is as follows.
int x = do { do return 42; };
The simplest example we have to be on guard against is as follows.
int& x = do
{
do return 42;
};
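Since do expressions are not part of C++ today, the by-value case can only be approximated. The closest existing analog is an immediately invoked lambda, used here purely as a stand-in under that assumption; `value_from_block` is a hypothetical name.

```cpp
#include <cassert>

// Analog of `int x = do { do return 42; };` using an immediately invoked lambda.
inline int value_from_block() {
    int x = [] { return 42; }();  // the result is copied out, so nothing dangles
    return x;
}
```

The reference-returning form has no safe analog: a lambda declared to return `const int&` and returning `42` would hand back a reference to a temporary, which is the case the checks in this paper are meant to reject.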
This paper can similarly handle the indirect cases as was performed for returns.
const int& x = do
{
int local = 42;
const int& r1 = f1(local, local);
const int& r2 = f1(GLOBAL, GLOBAL);
const int& r3 = f1(42, 42);
const int& r4 = f1(local, GLOBAL);
const int& r5 = f1(local, 42);
const int& r6 = f1(GLOBAL, 42);
if(randomBool())
{
do return r1;
}
if(randomBool())
{
do return r2;
}
if(randomBool())
{
do return r3;
}
if(randomBool())
{
do return r4;
}
if(randomBool())
{
do return r5;
}
if(randomBool())
{
do return r6;
}
int x1 = r3 + 43;
int x2 = r5 + 44;
int x3 = r6 + 45;
return f1(f1(GLOBAL, 4), f1(local, 2));
};
Do expressions, i.e. pattern matching expressions, are not completely self-contained like functions. They can directly use locals from containing scopes. Do returning these locals COULD be allowed, while do returning locals declared in the same scope as the do expression should DEFINITELY be disallowed.
const int& f()
{
int local1 = 42;
const int& rint1 = do
{
int local2 = 42;
const int& rint2 = do
{
int local3 = 42;
if(randomBool())
{
do return local1;
}
if(randomBool())
{
do return local2;
}
do return local3;
};
if(randomBool())
{
do return local1;
}
if(randomBool())
{
do return local2;
}
do return rint2;
};
}
The problem with rint2 is what its lifetime should be: an error or unknown? I can think of a couple of solutions.
- Prevent do expressions, whether explicit or implicit as in some other pattern matching proposals, from do returning references to locals that are not declared before the topmost nested do. “An ounce of prevention is worth a pound of cure.”
- Do nothing, as this proposal is not meant to fix runtime dangling of the stack
Regardless of the decision that would need to be made, so much definitively bad dangling has been identified and made more transparent.
So, how does this proposal stack up against the design group’s opinion on safety for C++?
- Do not radically break backwards compatibility – compatibility is a key feature and strength of C++ compared to more modern and more fashionable languages.
  - This proposal does not break any correct code. It only produces errors for code that definitely dangles.
- Do not deliver safety at the cost of an inability to express the abstractions that are currently at the core of C++ strengths.
  - This proposal does not compromise or remove any of C++’s features, so abstractions are still there.
- Do not leave us with a “safe” subset of C that eliminates C++’s productivity advantages.
  - This proposal works with all existing code. It is purely opt-in.
- Do not deliver a purely run-time model that imposes overheads that eliminate C++’s strengths in the area of performance.
  - This proposal is completely compile time.
- Do not imply that there is exactly one form of “safety” that must be adopted by all.
  - This proposal does not conflict with existing safety designs. While most others are runtime, this proposal is purely compile time. As such they are complementary.
- Do not promise to deliver complete guaranteed type-and-resource safety for all uses.
  - This proposal only addresses dangling of the stack via references, which can be easily achieved if library authors provide the compiler a little more information that is needed by library users anyway.
- Do offer paths to gradual and partial adoption – opening paths to improving the billions of lines of existing C++ code.
  - This proposal is opt-in on a per-function basis.
- Do not imply a freeze on further development of C++ in other directions.
  - Runtime checks can be performed concurrently.
- Do not imply that equivalent-looking code written in different environments will have different semantics (exception: some environments may give some code “undefined behavior” while others give it (a single!) defined behavior).
  - Bad code is bad code regardless of the environment. This proposal makes such bad code more transparent.
DG OPINION ON SAFETY FOR ISO C++, H. Hinnant, R. Orr, B. Stroustrup, D. Vandevoorde, M. Wong
Resolution
Now that these types of dangling can be detected, there are some tools that could be provided to developers to make it easier to fix these detected instances of dangling which are NOT a part of this proposal. Let’s go back to our first example.
const int GLOBAL = 42;
[[dependson(left, right)]]
const int& f1(const int& left, const int& right)
{
if(randomBool())
{
return left;
}
else
{
return right;
}
}
const int& f2()
{
int local = 42;
const int& r1 = f1(local, local);
const int& r2 = f1(GLOBAL, GLOBAL);
const int& r3 = f1(42, 42);
const int& r4 = f1(local, GLOBAL);
const int& r5 = f1(local, 42);
const int& r6 = f1(GLOBAL, 42);
if(randomBool())
{
return r1;
}
if(randomBool())
{
return r2;
}
if(randomBool())
{
return r3;
}
if(randomBool())
{
return r4;
}
if(randomBool())
{
return r5;
}
if(randomBool())
{
return r6;
}
int x1 = r3 + 43;
int x2 = r5 + 44;
int x3 = r6 + 45;
return f1(f1(GLOBAL, 4), f1(local, 2));
}
Since locals and temporaries should not be returned from functions, most functions that possess this type of dangling may be in need of some refactoring, perhaps using movable value types. For dangling that occurs in the body of a function, locals need to be moved up in scope and temporaries need to be changed into locals and then perhaps moved up in scope. This results in more lines of code, superfluous naming and excessive refactoring. If the fixed temporary is only ever used in a constant fashion, and if it is a literal type and constant initialized, then it would likely be manually turned into a global and moved far from the point of use. All of this could be made easier for programmers with the following features.
- For temporaries that are initially constant referenced, where the type is a literal type and the instance could be constant initialized, the compiler would automatically promote them to having static storage duration.
- C23 introduced storage-class specifiers for compound literals. If C++ followed suit, then we would be able to apply static and constexpr to our temporaries. Since these two would frequently be used together, they could be shortened to constant or constinit. C++ could go even further by introducing a new specifier, perhaps called var for variable scope, that would turn the temporary into an anonymously named variable with the same lifetime as the leftmost instance in the full expression.
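Until such features exist, the promotion described in the first bullet has to be written by hand. The sketch below shows that manual form in today's C++; `pick` is a hypothetical stand-in for f1 with the random choice made explicit, and `promoted` is likewise a hypothetical name.

```cpp
#include <cassert>

const int GLOBAL = 42;

// Stand-in for f1 from the paper, with the choice passed in explicitly.
inline const int& pick(const int& left, const int& right, bool take_left) {
    return take_left ? left : right;
}

// Manual version of the promotion: the literal becomes a named static constant.
inline const int& promoted() {
    static constexpr int forty_two = 42;    // static storage duration
    return pick(GLOBAL, forty_two, false);  // both arguments outlive the call
}
```

The cost of this manual form is the extra name and the extra line, which is the overhead the proposed implicit and explicit constant features aim to remove.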
This proposal and these three resolutions all satisfy the design group’s opinion on safety for C++.
| &✓ | implicit constant | explicit constant | var | Opinion |
| ✓ | ✓ | ✓ | ✓ | Do not radically break backwards compatibility – compatibility is a key feature and strength of C++ compared to more modern and more fashionable languages. |
| ✓ | ✓ | ✓ | ✓ | Do not deliver safety at the cost of an inability to express the abstractions that are currently at the core of C++ strengths. |
| ✓ | ✓ | ✓ | ✓ | Do not leave us with a “safe” subset of C that eliminates C++’s productivity advantages. |
| ✓ | ✓ | ✓ | ✓ | Do not deliver a purely run-time model that imposes overheads that eliminate C++’s strengths in the area of performance. |
| ✓ | ✓ | ✓ | ✓ | Do not imply that there is exactly one form of “safety” that must be adopted by all. |
| ✓ | ✓ | ✓ | ✓ | Do not promise to deliver complete guaranteed type-and-resource safety for all uses. |
| ✓ | ✓ | ✓ | ✓ | Do offer paths to gradual and partial adoption – opening paths to improving the billions of lines of existing C++ code. |
| ✓ | ✓ | ✓ | ✓ | Do not imply a freeze on further development of C++ in other directions. |
| ✓ | ✓ | ✓ | ✓ | Do not imply that equivalent-looking code written in different environments will have different semantics (exception: some environments may give some code “undefined behavior” while others give it (a single!) defined behavior). |
Summary
The advantages of adopting said proposal are as follows:
- Standardize the documentation of lifetimes of API(s) for developers
- Standardize the specification of lifetimes for proposals
- Produce more meaningful return error messages that don’t involve value categories
- Empower programmers with tools to identify indirect occurrences of immediate dangling of references to the stack, simply
- Empower programmers with tools to identify indirect occurrences of return dangling of references to the stack, simply
Frequently Asked Questions
Why not pointers?
References have to be initialized, can’t be nullptr and can’t be rebound, which means the lifetime of the instance the reference points to is fixed at the moment of construction. That instance has to exist lower on the stack, i.e. prior to reference creation, which is known at compile time. This is very safe by default. Pointers and reference classes that have pointer semantics are none of these things. Since they are so dynamic, the relevant metadata would more frequently be needed at run time.
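The contrast between references and pointers described above can be seen directly in today's C++. In this small sketch, `reference_stays_bound` and `pointer_can_rebind` are hypothetical function names used only for illustration.

```cpp
#include <cassert>

// Assigning through a reference writes to the original object; it never rebinds.
inline bool reference_stays_bound() {
    int a = 1;
    int b = 2;
    int& r = a;
    r = b;                      // copies b's value into a
    return &r == &a && a == 2;
}

// A pointer, by contrast, can be rebound and can be null.
inline bool pointer_can_rebind() {
    int a = 1;
    int b = 2;
    int* p = &a;
    p = &b;                     // now points to b
    bool rebound = (p == &b);
    p = nullptr;                // and may point to nothing at all
    return rebound && p == nullptr;
}
```

Because the target of a reference is fixed at initialization, its lifetime classification can be decided once at compile time, whereas a pointer's target can change at any point during execution.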
Why not explicit lifetime dependence on struct(s)?
- References are rarely used in class definitions. Instead, programmers use std::reference_wrapper, smart pointers and plain old pointers. Since all of these are rebindable, they are more runtime than compile time and as such are not the subject of this proposal. See Why not pointers? for more information.
- Explicit lifetime dependence on struct(s)/class[es] breaks abstraction because the user of a library needs to know class/struct implementation details that are likely protected, private and internal. For public reference members of struct instances, the compiler with these enhancements can already propagate these references. Further, explicit lifetime dependence on functions allows propagating the needed information when applied to constructors, conversion operators and factory functions.
References
Jarrad J. Waterloo <descender76 at gmail dot com>
Reference checking
Table of contents
Abstract
This paper proposes that we allow programmers to provide explicit lifetime dependence information to the compiler for the following reasons:
This paper is NOT about the following:
Rather it is about making those instances of dangling references to the stack which are always bad code, detectable as errors in the language, instead of warnings.
Motivational Example
Disclaimer: I am not a RUST expert. Any RUST examples provided here is to illustrate this feature as a language feature instead of an attribute. This proposal will repeatedly refer to this as an attribute but there is no preference on which, rather it is important to gain the functionality.
What is being asked for is similar to but not exactly like Rust’s feature called
explicit lifetimes
.Example taken from
Why are explicit lifetimes needed in Rust?
[1]Similar but better functionality has been requested in a variety of C++ proposals.
Bind Returned/Initialized Objects to the Lifetime of Parameters, Rev0
[2]indirect dangling identification
[3]Towards memory safety in C++
[4]Rust also allows providing
explicit lifetimes
on struct(s) but that is NOT being asked for in this proposal.Example taken from
Why are explicit lifetimes needed in Rust?
[1:1]Motivation
Having these checks in the language is highly desirable because it is highly effective. Other proposals are advocating for a seperate tool whether that be a static analysis tool or a runtime tool. Those solutions, while needed, are less effective because they are at best only
PPE
, personal protective equipment that only works if you have it installed, turned on, configured, used and acted upon. This proposal, while limited, is highly effective because it eliminates these instances of safety issues as illustrated on the [exponential] safety scale [5].This proposal is also in line with what that C++ community believes and teaches.
“In.force: Enforcement” [6]
…
“This adds up to quite a few dilemmas. We try to resolve those using tools. Each rule has an Enforcement section listing ideas for enforcement. Enforcement might be done by code review, by static analysis, by compiler, or by run-time checks. Wherever possible, we prefer ‘mechanical’ checking (humans are slow, inaccurate, and bore easily) and static checking. Run-time checks are suggested only rarely where no alternative exists; we do not want to introduce ‘distributed bloat’.” [6:1]
…
P.5: Prefer compile-time checking to run-time checking [7]
C++ Core Guidelines [6:2]
Bjarne Stroustrup, Herb Sutter
Other Proposals
Before going into the technical details, it would be good to consider this in light of other safety related proposals.
Impact on general lifetime safety
This proposal does not conflict with
Lifetime safety: Preventing common dangling
[8]. To the contrary, it is complimentary.“1.1.4 Function calls”
“Finally, since every function is analyzed in isolation, we have to have some way of reasoning about function calls when a function call returns a Pointer. If the user doesn’t annotate otherwise, by default we assume that a function returns values that are derived from its arguments.”
Lifetime safety: Preventing common dangling [8:1]
Herb Sutter
Just like the
Lifetime safety: Preventing common dangling
[8:2] paper, this proposal analyze each function in isolation. While that proposal is focused on when programmers do not annotate/document their code, this proposal is focused on when programmers do document their code. That proposal “assume that a function returns values that are derived from its arguments” and because of this assumption has to produce warnings. This proposal makes no assumption at all and consequently can produce errors from the library authors documented intentions. This is both complimentary, independent and both proposals are desired by the C++ community. Since both related features are independent than there is no reason why the compile checks proposal couldn’t be added before the runtime checks proposal.Impact on Contracts
This proposal also does not conflict with contracts. Matter of fact this proposal could be described in terms of contracts, just applied to existing language features. Since, none of the current contract proposals allow applying contracts to existing and future non functional language features such as
return
anddo return
[9], this proposal is complimentary.“7.1 LIFETIME”
“These sources of undefined behavior pertain to accessing an object outside its lifetime or validity of a pointer. By their very nature, they are not directly syntactic. The approach suggested in this proposal is to prohibit the use of certain syntactic constructs which might – under the wrong circumstances – lead to undefined behavior. Those restrictions are syntactic, so clearly will prohibit cases that someone might find useful.” [10]
Contracts for C++: Prioritizing Safety [10:1]
Gabriel Dos Reis
The
Contracts for C++: Prioritizing Safety
[10:2] proposal envisions a predicate calledobject_address
that could be applied via contracts to functions. In contract like terms, this proposal would be advocating foris_global_object_address
,is_local_object_address
,is_temporary_object_address
,is_not_global_local_temporary_object_address
and then applying these predicates toreturn
, reference dereference and the points of reference use. Since all of this is compile time information and the places where applied would always produce errors than there is no need for the programmer to add such checks anywhere because the compiler can do it automatically. Further, as this proposal only serve to identify code that is definitely bad, then this proposal does not “prohibit cases that someone might find useful”.Technical Details
In order to make this proposal work, 2 bits of information are needed per reference at compile time. The 2 bits represent an enumeration of 4 possible lifetime values.
nullptr and dynamic
This lifetime enumeration gets associated with each reference at the point of construction since references have to be initialized, can't be nullptr and can't be rebound.
The next step is that copying a reference copies its lifetime metadata.
1st check: Returning a reference to a local produces an error.
This error doesn't give programmers much as C++ addressed this in C++23 with Simpler implicit move [11]. However, since Simpler implicit move [11:1] was framed in terms of value categories, any error message would also be in terms of value categories. This proposal advises that such an error be expressed in terms of dangling, which is more human readable for programmers.
Things get really interesting when programmers are allowed to provide explicit lifetime dependence information to the compiler. Unlike Rust's explicit lifetimes, this feature, explicit lifetime dependence, allows a reference to be tied to the lifetimes of multiple other references. In these cases, the lifetime is the most constrained, as in temporary is more constrained than local which is more constrained than global.
This is also time to add two more checks.
2nd check: Returning a reference to a temporary produces an error.
3rd check: Using a reference to a temporary after it has been assigned, i.e. on another line of code which is not the full-expression, produces an error.
Besides fixing indirect dangling of a local, this also fixes indirect dangling of temporaries which causes immediate dangling.
6.7.7 Temporary objects [class.temporary]
…
4 When an implementation introduces a temporary object of a class that has a non-trivial constructor (11.4.5.2, 11.4.5.3), it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be called for a temporary with a non-trivial destructor (11.4.7). Temporary objects are destroyed as the last step in evaluating the full-expression (6.9.1) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception. The value computations and side effects of destroying a temporary object are associated only with the full-expression, not with any specific subexpression.
…
(6.12) - A temporary bound to a reference in a new-initializer (7.6.2.8) persists until the completion of the full-expression containing the new-initializer.
[Note 7: This might introduce a dangling reference. - end note]
Working Draft, Standard for Programming Language C++ [12]
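The checks introduced so far, together with the "most constrained" rule, can be modeled outside the compiler. The following is a minimal sketch under stated assumptions, not the proposed implementation: the names lifetime, constraint_rank, most_constrained, may_return and may_use_after_full_expression are hypothetical, and only the three lifetime values that the text orders (global, local, temporary) are modeled. The fourth value of the enumeration is left out because its place in the ordering is not stated here.

```cpp
// Hypothetical model of the compile-time lifetime metadata. Every name in
// this block is an assumption made for illustration only.
enum class lifetime { global, local, temporary };

// Rank used for "most constrained": temporary > local > global.
constexpr int constraint_rank(lifetime l) {
    switch (l) {
        case lifetime::global:    return 0;
        case lifetime::local:     return 1;
        case lifetime::temporary: return 2;
    }
    return 0;  // not reached for valid enumerators
}

// A reference tied to several other references takes the most constrained
// lifetime among them.
constexpr lifetime most_constrained(lifetime a, lifetime b) {
    return constraint_rank(a) >= constraint_rank(b) ? a : b;
}

// 1st and 2nd checks: returning a reference to a local or to a temporary
// is an error, so in this three-value model only global may be returned.
constexpr bool may_return(lifetime l) {
    return l == lifetime::global;
}

// 3rd check: a reference to a temporary may not be used after the
// full-expression in which it was assigned.
constexpr bool may_use_after_full_expression(lifetime l) {
    return l != lifetime::temporary;
}

static_assert(most_constrained(lifetime::global, lifetime::local) == lifetime::local, "");
static_assert(most_constrained(lifetime::local, lifetime::temporary) == lifetime::temporary, "");
static_assert(!may_return(most_constrained(lifetime::global, lifetime::temporary)), "");
```

Because every function is constexpr, the rules are evaluated entirely at compile time, which mirrors the claim that the metadata is compile-time information.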
If we fixed nothing else identified in this proposal, that would be a welcome reprieve. However, much more can be done simply. Let’s say, instead of adding this lifetime metadata to each reference, we add it to each instance. For references, lifetime metadata would still say that the reference refers to an instance with a particular lifetime but for non reference and non pointer instances, lifetime metadata would indicate that the instance is dependent upon another instance. Let’s see what type of dangling this would mitigate.
Structs and Classes
Before adding this metadata to instances in general, consider a struct that contains references. This shows that the composition and decomposition of references can be handled by the compiler provided the reference is accessible, such as public.
Is there any way we can mitigate dangling when the reference has been hidden by abstractions, such as for reference-like classes via protected, private access and public accessors? If lifetime metadata is also applied to instances in general, then constructors, conversion operators and factory functions could be annotated to say a returned reference-like type is dependent upon another type. Consider std::string_view and std::string.
GIVEN
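As a hedged illustration of the std::string_view case, the sketch below uses only standard library facilities. The functions make_name, name_length and demo are hypothetical names introduced here. The dangling variant is kept in a comment so that the block stays well-defined; it marks where a dependence of the view on the std::string would let the 3rd check report an error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical factory returning an owning std::string by value.
inline std::string make_name() { return "temporary"; }

// Safe: the owning string is a named local that outlives the view, so the
// view's dependence on it holds for the whole function body.
inline std::size_t name_length() {
    std::string owner = make_name();
    std::string_view view = owner;  // view depends on owner
    return view.size();
}

// Dangling counterpart, shown only as a comment:
//
//   std::string_view view = make_name();  // the temporary std::string is
//                                         // destroyed at the end of this
//                                         // full-expression
//   return view.size();                   // use on a later line: 3rd check

// Usage: the safe variant is well-defined.
inline void demo() { assert(name_length() == 9); }
```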
This would also work with other reference-like types such as std::span and function_ref [13]. Just as this proposal does not address pointers because they are more run time than compile time, this proposal would not address std::reference_wrapper since it is rebindable. Nor would this proposal address std::shared_ptr and std::unique_ptr, which like pointers are nullable and rebindable, even though they already have runtime safeties built in.
Even though this proposal does not address pointers and pointer-like types, it is still useful to non-owning versions of these constructs when they are const constructed because they would need to be non-null initialized to be usable and wouldn't be rebindable because they were const. For instance, std::reference_wrapper is rebindable like a pointer so it tends to be more runtime than compile time. However, a const std::reference_wrapper can't be rebound so it would make sense if a programmer bound its lifetime to another instance.
This is also time to mention our final check.
4th check: You can't use a temporary in a new expression if the type being instantiated will become dependent upon that temporary.
6.7.7 Temporary objects [class.temporary]
…
4 … (11.4.7). Temporary objects are destroyed as the last step in evaluating the full-expression (6.9.1) …
…
(6.12) - A temporary bound to a reference in a new-initializer (7.6.2.8) persists until the completion of the full-expression containing the new-initializer.
[Note 7: This might introduce a dangling reference. - end note]
[Example 5:
– end example]
Working Draft, Standard for Programming Language C++ [12:1]
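The code of the example quoted above is not present in this text. The sketch below reconstructs its shape from the names that the surrounding discussion uses (S, mi, mp and the temporary {2, 3}); it should be checked against the Working Draft [12] rather than taken as the exact wording. The new expression is commented out so that the sketch neither leaks nor creates a dangling reference, and check_extended is a hypothetical helper.

```cpp
#include <cassert>
#include <utility>

struct S {
    int mi;
    const std::pair<int, int>& mp;  // public reference member
};

// Aggregate initialization of a named object: the temporary {2, 3} bound
// to a.mp has its lifetime extended to that of a.
S a{1, {2, 3}};

// new-initializer: the temporary {2, 3} persists only until the end of the
// full-expression, so p->mp would dangle immediately afterwards. This is
// the case the 4th check is meant to reject.
//
//   S* p = new S{1, {2, 3}};  // creates dangling reference

// Usage: reading through a.mp is well-defined because of lifetime extension.
inline void check_extended() {
    assert(a.mi == 1);
    assert(a.mp.first == 2 && a.mp.second == 3);
}
```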
In the previous example an instance of type S became dependent upon the temporary, {2, 3}, because it retained a reference to the temporary. This is detectable because the member mp is publicly available since the type S is a struct.
For a class where members mi and mp are abstracted away, the dependence can be expressed in its constructor.
Impact on Pattern Matching
This feature can be enhanced further to work with potential future C++ language features such as pattern matching.
WARNING: If pattern matching can allow propagating references from inner scopes to containing scopes, then there will be a new category of dangling added to C++ and consequently a weakening of the safety of references.
This paper references the do expressions [9:1] paper for the relevant portion of pattern matching because it is more explicit. If the C++ standard goes with an implicit syntax it can still be an issue if it allows propagating references from inner scopes to containing scopes.
The simplest example is as follows.
The simplest example we have to be on guard against is as follows.
This paper can similarly handle the indirect cases as was performed for returns.
Do expressions, i.e. pattern matching expressions, are not completely self-contained like functions. They can directly use locals from containing scopes. Do returning these locals COULD be allowed, while do returning locals in the same scope of the do expression should DEFINITELY be disallowed.
The problem on rint2 is what its lifetime response should be: an error or unknown? I could think of a couple of solutions.
do expressions whether explicit or implicit in some other pattern matching proposals from do returning references to locals that are not declared before the top most nested do. “An ounce of prevention is worth a pound of cure.”
Regardless of the decision that would need to be made, so much definitively bad dangling has been identified and made more transparent.
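Do expressions are not valid C++ today, so the distinction between locals of a containing scope and locals of the inner scope can only be approximated. The sketch below uses an immediately invoked lambda with a reference return type as a stand-in for a do expression. This is an analogy chosen for illustration, not the syntax of [9], and outer_local_is_fine is a hypothetical name.

```cpp
#include <cassert>

// Returns a reference to a local of the *containing* scope: the referent
// outlives the inner block, so this is the case that COULD be allowed.
inline int outer_local_is_fine() {
    int outer = 41;
    int& r = [&]() -> int& {
        return outer;  // declared before the inner block starts
    }();
    r += 1;            // still refers to a live object
    return outer;
}

// The counterpart that should DEFINITELY be disallowed, shown only as a
// comment so the sketch stays well-defined:
//
//   int& r = [&]() -> int& {
//       int inner = 41;
//       return inner;  // inner is destroyed when the block ends
//   }();
```

The write through r is observable in outer, which shows that the reference obtained from the inner block is bound to the containing scope's local.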
So, how does this proposal stack up to the design group's opinion on safety for C++?
DG OPINION ON SAFETY FOR ISO C++ [14:9]
H. Hinnant, R. Orr, B. Stroustrup, D. Vandevoorde, M. Wong
Resolution
Now that these types of dangling can be detected, there are some tools that could be provided to developers to make it easier to fix these detected instances of dangling which are NOT a part of this proposal. Let’s go back to our first example.
Since locals and temporaries should not be returned from functions, most functions that possess this type of dangling may be in need of some refactoring, perhaps using movable value types. For dangling that occurs in the body of a function, locals need to be moved up in scope and temporaries need to be changed into locals and then perhaps moved up in scope. This results in more lines of code, superfluous naming and excessive refactoring. If the fixed temporary is only ever used in a constant fashion and if it is a literal type and constant initialized, then it would likely be manually turned into a global and moved far from the point of use. All of this could be made easier for programmers with the following features.
static and constexpr to our temporaries. Since these two would frequently be used together it could be shortened to constant or constinit. C++ could go even farther by introducing a new specifier perhaps called var for variable scope that would turn the temporary into an anonymously named variable with the same life of the left most instance in the full expression. [17]
This proposal and these three resolutions all satisfy the design group's opinion on safety for C++.
Do not radically break backwards compatibility – compatibility is a key feature and strength of C++ compared to more modern and more fashionable languages. [14:10]
Do not deliver safety at the cost of an inability to express the abstractions that are currently at the core of C++ strengths. [14:11]
Do not leave us with a “safe” subset of C that eliminates C++’s productivity advantages. [14:12]
Do not deliver a purely run-time model that imposes overheads that eliminate C++’s strengths in the area of performance. [14:13]
Do not imply that there is exactly one form of “safety” that must be adopted by all. [14:14]
Do not promise to deliver complete guaranteed type-and-resource safety for all uses. [14:15]
Do offer paths to gradual and partial adoption – opening paths to improving the billions of lines of existing C++ code. [14:16]
Do not imply a freeze on further development of C++ in other directions. [14:17]
Do not imply that equivalent-looking code written in different environments will have different semantics (exception: some environments may give some code “undefined behavior” while others give it (a single!) defined behavior). [14:18]
Summary
The advantages of adopting said proposal are as follows:
Frequently Asked Questions
Why not pointers?
References have to be initialized, can't be nullptr and can't be rebound, which means by default the lifetime of the instance the reference points to is fixed at the moment of construction. That instance has to exist lower on the stack, i.e. prior to reference creation, which is known at compile time. This is very safe by default. Pointers and reference classes that have pointer semantics are none of these things. Since they are so dynamic, the relevant metadata would more frequently be needed at run time.
Why not explicit lifetime dependence on struct(s)?
Why not pointers? for more information.
Explicit lifetime dependence on struct(s)/class[es] breaks abstraction because the user of a library needs to know class/struct implementation details that are likely protected, private and internal. For public reference members of struct instances, the compiler with these enhancements can already propagate these references. Further, explicit lifetime dependence on functions allows propagating the needed information when applied to constructors, conversion operators and factory functions.
References
[1] https://stackoverflow.com/questions/31609137/why-are-explicit-lifetimes-needed-in-rust
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0936r0.pdf
[3] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2742r2.html
[4] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2771r0.html
[5] https://www.cdc.gov/niosh/topics/hierarchy/default.html
[6] https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#inforce-enforcement
[7] https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#p5-prefer-compile-time-checking-to-run-time-checking
[8] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1179r1.pdf
[9] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2806r1.html
[10] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2680r1.pdf
[11] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2266r3.html
[12] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/n4910.pdf
[13] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0792r14.html
[14] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2759r1.pdf
[15] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2724r1.html
[16] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3038.htm
[17] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2658r1.html