Document #: | P2748R1 |
Date: | 2023-05-15 |
Project: | Programming Language C++ |
Audience: | Evolution |
Reply-to: | Brian Bi <bbi10@bloomberg.net> |
The following code contains a bug: The code initializes a reference from an object of a different type (the programmer has forgotten that the first element of the pair is const), resulting in the creation of a temporary. As a result, the reference d_first is always dangling:
struct X {
    const std::map<std::string, int> d_map;
    const std::pair<std::string, int>& d_first;

    X(const std::map<std::string, int>& map)
    : d_map(map), d_first(*d_map.begin()) {}
};
Luckily, the above code is actually ill formed (11.9.3 [class.base.init]/8). But valid code can contain essentially the same bug:
struct Y {
    std::map<std::string, int> d_map;

    const std::pair<std::string, int>& first() const {
        return *d_map.begin();
    }
};
This code is valid, although compilers might warn. Like the first example in this paper, this code snippet always produces a dangling reference. We should make this code likewise ill formed.
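The root cause in both snippets is a type mismatch that an implicit conversion hides. The following sketch checks that mismatch at compile time and shows one way the accessor could be written so that no temporary is created. The names Map, FixedY, and demoFixedY are hypothetical and do not appear in the paper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

using Map = std::map<std::string, int>;  // hypothetical alias for brevity

// The map's element type has a const key, so it differs from the pair type
// named in the reference declarations above.
static_assert(std::is_same_v<Map::value_type, std::pair<const std::string, int>>);
static_assert(!std::is_same_v<Map::value_type, std::pair<std::string, int>>);

// The element is implicitly convertible to the mismatched pair type, which is
// why a temporary is created silently instead of the code being rejected.
static_assert(std::is_convertible_v<const Map::value_type&,
                                    std::pair<std::string, int>>);

// Hypothetical corrected variant of Y: the return type names the element type
// exactly, so the reference binds directly to the element stored in the map.
struct FixedY {
    Map d_map;

    const Map::value_type& first() const {
        return *d_map.begin();
    }
};

// Usage sketch: the returned reference aliases the stored element.
inline void demoFixedY() {
    FixedY y;
    y.d_map.emplace("a", 1);
    assert(&y.first() == &*y.d_map.begin());
    assert(y.first().second == 1);
}
```

Spelling the return type as Map::value_type (or using auto) avoids restating the pair type by hand, which is where the missing const crept in.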
A colleague recently reported another example. A program appeared to be accessing memory that was not safe to access. The bug was ultimately caused by the following function returning by reference though it should not have. This bug was difficult to find but would have been easy if the return statement were simply ill formed.
const std::string_view& getString() {
static std::string s;
return s;
}
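In this function, the operand of the return statement is a std::string, so a temporary std::string_view is created and the returned reference is bound to it. A minimal sketch of a possible fix, assuming the intent was to hand out a view of the static string, is to return the view by value. The names getStringFixed and demoGetStringFixed and the string contents are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// std::string and std::string_view are distinct types; the implicit
// conversion between them produces a prvalue std::string_view.
static_assert(!std::is_same_v<std::string, std::string_view>);
static_assert(std::is_convertible_v<const std::string&, std::string_view>);

// Hypothetical fix: return the view by value. The view is cheap to copy and
// refers to the static string, which lives until the program exits.
inline std::string_view getStringFixed() {
    static std::string s = "example";  // hypothetical contents
    return s;
}

// Usage sketch: the returned view can be read safely by the caller.
inline void demoGetStringFixed() {
    std::string_view v = getStringFixed();
    assert(v == "example");
}
```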
In [CWG1696], Richard Smith pointed out that, though binding a reference member to a temporary in a mem-initializer was explicitly called out in the Standard as one of the cases in which the lifetime of the temporary is not extended to the lifetime of the reference, no corresponding wording was offered for the case in which the expression that produces the temporary is supplied by a default member initializer.
Initially, the proposed resolution simply resolved the inconsistency in favor of explicitly specifying that brace-or-equal-initializers behave the same way as mem-initializers (i.e., neither extends lifetime). However, at the Issaquah meeting in 2014, making both ill formed was suggested. CWG appears to have accepted this suggestion without controversy. (At the Urbana-Champaign meeting later that year, Issue 1696 was given DR status.)
This change was so uncontroversial because binding a reference to a temporary, when the reference will outlive the temporary and become dangling as soon as the full-expression completes, is always a bug. In some simple cases, a novice programmer might not understand that a temporary must be materialized when binding a reference to a prvalue. On the other hand, the examples given in the introduction represent code that experienced C++ developers can easily write.
The dangling reference created by X’s constructor is always a bug, and the same is true for the dangling reference created by Y::first. In fact, one can imagine some obscure situations in which binding a reference member to a temporary in a mem-initializer could be useful to cache the result of an expensive computation, which could then be used by later mem-initializers and within the compound-statement of the constructor. In contrast, when binding a returned glvalue to a temporary, even such obscure, limited applications seem nonexistent.
I propose, therefore, to make binding a returned glvalue to a temporary likewise ill formed.
Note that recent versions of Clang, GCC, and MSVC all issue warnings that explain the creation of the dangling reference. The availability of such warnings raises the question of whether programmers should simply use compiler flags to convert those warnings into errors, thus obtaining all the benefits of this proposal with no need for a language change. However, at least in Clang and GCC, the warnings have false positives, which (as discussed in Section 5) occur because they are not as narrowly scoped as this proposal. More broadly, compiler warnings are no substitute for language rules because the warnings lack formal specification and are not portable.
Unevaluated return statements?
At the February 2023 meeting in Issaquah, EWG asked for improved wording related to unevaluated contexts. However, no such thing as an unevaluated return statement exists (at least from the core language point of view; see Section 6 for discussion of the library).
6.3 [basic.def.odr]/3 defines a conversion as potentially evaluated unless it is “an unevaluated operand, a subexpression thereof, or a conversion in an initialization or conversion sequence in such a context.” Because a return statement is not an expression statement, the only kind of expression a return statement can appear within is a lambda expression, but the statements in the body of a lambda expression are not subexpressions of the lambda expression (6.9.1 [intro.execution]/3.3), so even if the lambda expression is unevaluated, the statements in its body are still potentially evaluated.
This definition is not simply a technicality but follows from the
very nature of function definitions in C++. When the body of a lambda
expression is instantiated, a function definition is created, and a
function definition created by an instantiation triggered from an
unevaluated context is no different from any other definition of the
same function. In particular, that function may be ODR-used at some
later point, but the compiler is not expected to instantiate it a second
time, since the instantiation from an unevaluated context is as good as
any other instantiation. Attempting to carve out a narrow exemption that
applies exclusively to return statements appearing lexically within
lambda expressions that are not potentially evaluated would therefore
fail to actually prevent such return statements from being evaluated at
run time.
For this reason, my proposal does not include carving out an exemption for lambdas in unevaluated contexts. This exclusion raises the question of whether the proposal would disallow some useful metaprogramming techniques.
[P0315R2] discusses two use cases for lambdas in unevaluated contexts. In both of these use cases, the lambda is used only for the signature of its function call operator. In such cases, the return statement in the lambda could be eliminated, and the lambda could be given a trailing return type instead. Rewriting the code in this fashion is annoying but will be necessary in only the tiny fraction of cases where lambdas in unevaluated contexts currently contain return statements that would create dangling references if they were to be evaluated. The benefits of this proposal outweigh the inconvenience that would be inflicted in those very few cases.
As evidence that this situation is almost nonexistent, consider that recent versions of Clang and GCC do not distinguish return statements appearing in unevaluated lambda expressions from those that appear in any other function and will issue a warning even in cases such as the following:
std::string_view sv;
decltype( [] () -> const std::string_view& {
    static std::string s;
    return s;
} () ) svr = sv;
I searched the Clang and GCC bug trackers for reports of false positives for the -Wreturn-stack-address and -Wreturn-local-addr flags, respectively. Some false positives were reported, but they generally appear to be related to these warnings going far beyond the set of situations that this paper proposes to make ill formed; the warnings perform a flow analysis to check whether a returned pointer value might have been derived directly or indirectly from the address of a temporary or an automatic variable. GCC bug 100403 and Clang bug 44003 are representative of this class of bugs. I found no issues in which a user opined that the warning should not fire because the return statement was in a lambda expression in an unevaluated context.
std::is_convertible
As pointed out at the February 2023 meeting in Issaquah, the current definition of the std::is_convertible type trait (21.3.7 [meta.rel]/5) depends on the well-formedness of a return statement but is intended to detect implicit convertibility in general. For this reason, the proposal must ensure that the meaning of std::is_convertible does not change; for example, std::is_convertible_v<int, const double&> should continue to be true.
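The behavior that must be preserved can be stated as compile-time checks. The sketch below also shows why the conversion is harmless outside a return statement: a temporary bound to a reference parameter lives until the end of the full-expression containing the call. The helper twice is hypothetical.

```cpp
#include <type_traits>

// Converting int to const double& materializes a temporary double and binds
// the reference to it; the conversion itself is well formed.
static_assert(std::is_convertible_v<int, const double&>);

// A non-const lvalue reference cannot bind to that temporary, so this
// conversion does not exist.
static_assert(!std::is_convertible_v<int, double&>);

// Hypothetical helper: binding the parameter to a temporary is safe because
// the temporary outlives the call.
constexpr double twice(const double& d) { return 2 * d; }
static_assert(twice(21) == 42.0);
```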
Since, as discussed previously, no such thing as an unevaluated return statement exists, giving a blanket exemption for such nonexistent entities is an impractical solution to this problem. Instead, three possible approaches present themselves.
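1. Make a special exception for std::is_convertible, so that a return statement that is ill formed only because of this proposal is still considered well formed for the purposes of that trait.
2. Redefine std::is_convertible in terms of a piece of code that does not contain a return statement.
3. Redefine std::is_convertible in terms of the core language concept of implicit convertibility.

The second approach is feasible if we assume (as current implementations do) that the To type must be destructible. In that case, std::is_convertible_v<From, To> is true if all the following conditions are met.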
- To is not an array type.
- To is not a function type.
- If either To or From is cv void, then so is the other.
- If neither To nor From is cv void, then requires (void (*f)(To), From&& arg) { f(static_cast<From&&>(arg)); } is true.

However, since [LWG3400] is unresolved, the specification of std::is_convertible<From, To> could possibly be changed to exclude consideration of the destructor (which appears to imply that the implementation will require compiler magic). The second approach would therefore assign an interpretation to the current specification of std::is_convertible that would be contentious in LWG; furthermore, the effort that would be spent in LWG on codifying this approach would be wasted if LWG later decided to exclude the destructor. I am therefore not proposing adopting this approach at this time.
The third approach also suffers from similar issues. Implicit convertibility is defined by 7.3.1 [conv.general]/3 in terms of the well-formedness of a hypothetical declaration employing copy-initialization. Plainly, such a declaration is not well-formed if the destination type is not destructible, so taking this approach assumes a particular disposition for LWG3400. Expressing std::is_convertible in terms of the existence of an implicit conversion sequence (as defined by 12.2.4.2.1 [over.best.ics.general]) would assume the opposite disposition, while also subjecting the library to the unresolved issue that is the subject of [CWG2525].
Therefore, I propose the first approach.
The proposed wording is relative to [N4928].
Strike bullet (6.11) in section 6.7.7 [class.temporary]:
The lifetime of a temporary bound to the returned value in a function return statement (8.7.4) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.
Insert a new paragraph, 6, at the end of section 8.7.4 [stmt.return]:
In a function whose return type is a reference, a return statement that binds the returned reference to a temporary expression ([class.temporary]) is ill-formed.
[Example 2:
auto&& f1() {
    return 42;  // ill-formed
}
const double& f2() {
    static int x = 42;
    return x;  // ill-formed
}
auto&& id(auto&& r) {
    return static_cast<decltype(r)&&>(r);
}
auto&& f3() {
    return id(42);  // OK, but probably a bug
}
— end example]
(Note: See [CWG GitHub issue 200] regarding a possible issue with the above wording.)
Edit 21.3.7 [meta.rel]/5:
The predicate condition for a template specialization is_convertible<From, To> shall be satisfied if and only if the return expression in the following code would be well-formed, including any implicit conversions to the return type of the function:
To test() {
    return declval<From>();
}
[Note 2: This requirement gives well-defined results for reference types, array types, function types, and cv void. — end note]
For the purposes of this paragraph, a return statement that is ill-formed only because it binds the returned reference to a temporary expression ([class.temporary]) is considered to be well-formed. Access checking is performed in a context unrelated to To and From. Only the validity of the immediate context of the expression of the return statement ([stmt.return]) (including initialization of the returned object or reference) is considered.
[Note 3: The initialization can result in side effects such as the instantiation of class template specializations and function template specializations, the generation of implicitly-defined functions, and so on. Such side effects are not in the “immediate context” and can result in the program being ill-formed. — end note]
[LWG3400] Does is_nothrow_convertible consider destruction of the destination type?