1. Change history
Since [P0593R0]:
-
Paper expanded from Ville’s original call for solutions to a description of a proposed solution, based on SG12 discussion.
Since [P0593R1]:
Incorporated further SG12 feedback:
-
An explicit syntactic marker is required to indicate that objects should be created. Existing obvious markers, such as the use of
, or simply performing member access on a union, suffice.malloc -
Expand set of implicit lifetime types to require either a trivial default constructor or a trivial copy/move constructor, rather than requiring both.
-
Types with only a trivial default constructor may be suitable for member-by-member construction via class member access, even if the copy or move constructor is non-trivial.
-
Types with only a trivial copy/move constructor may be suitable for initialization by copying (for example) an on-disk representation into memory, even if the default constructor is non-trivial.
-
-
Define the C standard library
andmemcpy
functions as triggering implicit object creation.memmove -
Add description of suggested "typed" form of
.std :: bless
Since [P0593R2]:
-
Removed union member access being sufficient to implicitly create objects based on objections from an implementer.
Since [P0593R3]:
-
Incorporated EWG feedback:
-
Aggregates are considered implicit lifetime types
-
Provide a typed version of
std :: bless
-
-
Added wording.
2. Motivating examples
2.1. Idiomatic C code as C++
Consider the following natural C program:
struct X { int a , b ; }; X * make_x () { X * p = ( X * ) malloc ( sizeof ( struct X )); p -> a = 1 ; p -> b = 2 ; return p ; }
When compiled with a C++ compiler, this code has undefined behavior, because
attempts to write to an
subobject of an
object, and this
program never created either an
object nor an
subobject.
Per [intro.object]p1,
An object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created.
... and this program did none of these things.
2.2. Objects provided as byte representation
Suppose a C++ program is given a sequence of bytes (perhaps from disk or from a
network), and it knows those bytes are a valid representation of type
. How
can it efficiently obtain a
that can be legitimately used to access the
object?
Example: (many details omitted for brevity)
void process ( Stream * stream ) { unique_ptr < char [] > buffer = stream -> read (); if ( buffer [ 0 ] == FOO ) process_foo ( reinterpret_cast < Foo *> ( buffer . get ())); // #1 else process_bar ( reinterpret_cast < Bar *> ( buffer . get ())); // #2 }
This code leads to undefined behavior today: within
, no
or
object is created, and so any attempt to access a
object through the
produced by the cast at #1 would result in undefined behavior.
2.3. Dynamic construction of arrays
Consider this program that attempts to implement a type like
(with many details omitted for brevity):
template < typename T > struct Vec { char * buf = nullptr , * buf_end_size = nullptr , * buf_end_capacity = nullptr ; void reserve ( std :: size_t n ) { char * newbuf = ( char * ) :: operator new ( n * sizeof ( T ), std :: align_val_t ( alignof ( T ))); std :: uninitialized_copy ( begin (), end (), ( T * ) newbuf ); // #a :: operator delete ( buf , std :: align_val_t ( alignof ( T ))); buf_end_size = newbuf + sizeof ( T ) * size (); // #b buf_end_capacity = newbuf + sizeof ( T ) * n ; // #c buf = newbuf ; } void push_back ( T t ) { if ( buf_end_size == buf_end_capacity ) reserve ( std :: max < std :: size_t > ( size () * 2 , 1 )); new ( buf_end_size ) T ( t ); buf_end_size += sizeof ( T ); // #d } T * begin () { return ( T * ) buf ; } T * end () { return ( T * ) buf_end_size ; } std :: size_t size () { return end () - begin (); } // #e }; int main () { Vec < int > v ; v . push_back ( 1 ); v . push_back ( 2 ); v . push_back ( 3 ); for ( int n : v ) { /*...*/ } // #f }
In practice, this code works across a range of existing implementations, but according to the C++ object model, undefined behavior occurs at points #a, #b, #c, #d, and #e, because they attempt to perform pointer arithmetic on a region of allocated storage that does not contain an array object.
At locations #b, #c, and #d, the arithmetic is performed on a
, and at
locations #a, #e, and #f, the arithmetic is performed on a
. Ideally, a
solution to this problem would imbue both calculations with defined behavior.
3. Approach
The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.
3.1. Affected types
If we are going to create objects automatically, we need a bare minimum of the following two properties for the type:
1) Creating an instance of the type runs no code. For class types, having a trivially default constructible type is often the right constraint. However, we should also consider cases where initially creating an object is non-trivial, but copying it (for instance, from an on-disk representation) is trivial.
2) Destroying an instance of the type runs no code. If the type maintains invariants, we should not be implicitly creating objects of that type.
Note that we’re only interested in properties of the object itself here, not of its subobjects. In particular, the above two properties always hold for array types. While creating or destroying array elements might run code, creating the array object (without its elements) does not. For similar reasons, it also seems reasonable to permit implicit object creation for aggregate class types even if the aggregate contains an element with a non-trivial destructor.
This suggests that the largest set of types we could apply this to is:
-
Scalar types
-
Aggregate types (arrays with any element type, aggregate classes with any members)
-
Class types with a trivial destructor and a trivial constructor (of any kind)
(Put another way, we can apply this to all types other than function type,
reference type,
, and class types where all constructors are non-trivial
or where the destructor is non-trivial.)
We will call types that satisfy the above constraints implicit lifetime types.
3.2. When to create objects
In the above cases, it would be sufficient for
/
to implicitly create sufficient objects to make the examples work. Imagine
that
could "look into the future" and see how its storage would be
used, and create the set of objects that the program would eventually need.
If we somehow specified that
did this, the behavior of many C-style
use cases would be defined.
On typical implementations, we can argue that this is not only natural, it is
in some sense the status quo. Because the compiler typically does not make
assumptions about what objects are created within the implementation of
, and because object creation itself typically has no effect on the
physical machine, the compiler must generate code that would be correct if
did create that correct set of objects.
However, this is not always sufficient. An allocation from
may be
sequentially used to store multiple different types, for instance by way
of a memory pool that recycles the same allocation for multiple objects of
the same size. It should be possible to grant such cases the same power to
implicitly create objects as is de facto granted to
.
We could specify that implicit object creation happens automatically at any program point that relies on an object existing. This has a great deal of appeal: no explicit program action is ever required to create objects, and it directly describes a simple model where objects are not distinguished from the storage they occupy (this model gives the same results as C’s "effective type" model in most cases). However, it also removes much of the power of scalar type-based alias analysis. The C committee has long been struggling with the conflict between their desire to support TBAA and their version of this rule, as exemplified by C’s DR 236 ([C236]), which lists a "resolution" not reflected by the standard wording and that undesirably grants special powers to function call boundaries (this is one of at least four different and incompatible rules the C committee has at one point or another taken as the resolution to that defect). The lack of a reasonable resolution to these problems, despite them being known for nearly two decades, suggests that this is not a good path forward.
Therefore we propose the following rule:
Some operations are described as implicitly creating objects within a specified region of storage. The abstract machine creates objects of implicit lifetime types within those regions of storage as needed to give the program defined behavior. For each operation that is specified as implicitly creating objects, that operation implicitly creates zero or more objects in its specified region of storage if doing so would give the program defined behavior. If no such sets of objects would give the program defined behavior, the behavior of the program is undefined.
The coherence of the above rule hinges on a key observation: changing the set of objects that are implicitly created can only change whether a particular program execution has defined behavior, not what the behavior is.
We propose that at minimum the following operations be specified as implicitly creating objects:
-
Creation of an array of
,char
, orunsigned char
implicitly creates objects within that array.std :: byte -
A call to
,malloc
,calloc
, or any function namedrealloc
oroperator new
implicitly creates objects in its returned storage.operator new [] -
likewise implicitly creates objects in its returned storage; the allocator requirements should require other allocator implementations to do the same.std :: allocator < T >:: allocate -
A call to
behaves as if itmemmove -
copies the source storage to a temporary area
-
implicitly creates objects in the destination storage, and then
-
copies the temporary storage to the destination storage.
This permits
to preserve the types of trivially-copyable objects, or to be used to reinterpret a byte representation of one object as that of another object.memmove -
-
A call to
behaves the same as a call tomemcpy
except that it introduces an overlap restriction between the source and destination.memmove -
A new barrier operation (distinct from
, which does not create objects) should be introduced to the standard library, with semantics equivalent to astd :: launder
with the same source and destination storage. As a strawman, we suggest:memmove // Requires: [start, (char*)start + length) denotes a region of allocated // storage that is a subset of the region of storage reachable through start. // Effects: implicitly creates objects within the denoted region. void std :: bless ( void * start , size_t length );
In addition to the above, an implementation-defined set of non-standard memory
allocation and mapping functions, such as
on POSIX systems and
on Windows systems, should be specified as implicitly creating
objects.
Note that a pointer
is not considered sufficient to trigger
implicit object creation.
3.3. Type punning
We do not wish examples such as the following to become valid:
float do_bad_things ( int n ) { alignof ( int ) alignof ( float ) char buffer [ max ( sizeof ( int ), sizeof ( float ))]; * ( int * ) buffer = n ; // #1 std :: bless ( buffer , sizeof ( buffer )); return ( * float * ) buffer ; // #2 }
float do_bad_things ( int n ) { union { int n ; float f ; } u ; u . n = n ; // #1 std :: bless ( & u , sizeof ( u )); return u . f ; // #2 }
The proposed rule would permit an
object to spring into existence
to make line #1 valid (in each case), and would permit a
object to
likewise spring into existence to make line #2 valid.
However, these examples still do not have defined behavior under the proposed rule. The reason is a consequence of [basic.life]p4:
The properties ascribed to objects and references throughout this document apply for a given object or reference only during its lifetime.
Specifically, the value held by an object is only stable throughout its
lifetime. When the lifetime of the
object in line #1 ends (when
its storage is reused by the
object in line #2), its value is
gone. Symmetrically, when the
object is created, the object has
an indeterminate value ([dcl.init]p12), and therefore any attempt to
load its value results in undefined behavior.
Thus we retain the property (essential to modern scalar type-based alias analysis) that loads of some scalar type can be considered to not alias earlier stores of unrelated scalar types.
3.4. Constant expressions
Constant expression evaluation is currently very conservative with regard to object creation. There is a tension here: on the one hand, constant expression evaluation gives us an opportunity to disallow runtime program semantics that we consider undesirable or problematic, and on the other hand, users strongly desire a full compile-time evaluation mechanism with the same semantics as the base language.
Following the existing conservatism in constant expression evaluation (eg, the disallowance of changing the active member of a union), we propose that the implicit creation of objects should not be performed during such evaluation.
3.5. Pseudo-destructor calls
In the current C++ language rules, "pseudo-destructor" calls may be used in generic code to allow such code to be ambivalent as to whether an object is of class type:
template < typename T > void destroy ( T * p ) { p ->~ T (); }
When
is, say,
, the pseudo-destructor expression
is specified
as having no effect. We believe this is an error: such an expression should have
a lifetime effect, ending the lifetime of the
object. Likewise, calling a
destructor of a class object should always end the lifetime of that object,
regardless of whether the destructor is trivial.
This change improves the ability of static and dynamic analysis tools to reason about the lifetimes of C++ objects.
3.6. Practical examples
std :: vector < int > vi ; vi . reserve ( 4 ); vi . push_back ( 1 ); int * p = & vi . back (); vi . push_back ( 2 ); vi . push_back ( 3 ); int n = * p ;
Within the implementation of
, some storage is allocated to hold
an array of up to 4
s. Ignoring minor differences, there are two ways
to create implicit objects to give the execution of this program defined
behavior: within the allocated storage, either an
object or an
object is created. Both are correct interpretations of the program,
and naturally both result in the same behavior. We can choose to view the
program as being in the superposition of those two states. If we add a fourth
call to the program prior to the initialization of
, then only
the
interpretation remains valid.
unique_ptr < char [] > Stream :: read () { // ... determine data size ... unique_ptr < char [] > buffer ( new char [ N ]); // ... copy data into buffer ... return buffer ; } void process ( Stream * stream ) { unique_ptr < char [] > buffer = stream -> read (); if ( buffer [ 0 ] == FOO ) process_foo ( reinterpret_cast < Foo *> ( buffer . get ())); // #1 else process_bar ( reinterpret_cast < Bar *> ( buffer . get ())); // #2 }
Note the
implicitly creates objects within the allocated array.
In this case, the program would have defined behavior if an object of type
or
(as appropriate for the content of the incoming data) were
implicitly created prior to
populating its buffer. Therefore,
regardless of which arm of the
is taken, there is a set of implicit
objects sufficient to give the program defined behavior, and thus the behavior
of the program is defined.
3.7. Direct object creation
In some cases it is desirable to change the dynamic type of existing storage
while maintaining the object representation. If the destination type is an
implicit lifetime type, this can be accomplished by usage of
to
change the type, followed by
to acquire a pointer to the
newly-created object. However, for expressivity and optimizability, a combined
operation to create an object of implicit lifetime type in-place while
preserving the object representation may be useful. For this we propose:
// Effects: create an object of implicit lifetype type T in the storage // pointed to by T, while preserving the object representation. template < typename T > T * bless ( void * p );
Note that such an operation is not sufficient to implement
([P0083R3]) for map-like containers.
requires the
ability to take a
and permit mutation of the
portion (without destroying and recreating the
object), even when
is not an implicit lifetime type, so the above operation does not
suffice. However, we could imagine extending its semantics to also permit
conversions where each subobject of non-implicit-lifetime type in the
destination corresponds to an object of the same type (ignoring
cv-qualifications) in the source.
4. Wording
4.1. 6.6.2 Object model [intro.object]
Change in 6.6.2 [intro.object] paragraph 1:
The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition (6.1), by a new-expression (7.6.2.4), by an operation that implicitly creates objects (see below), when implicitly changing the active member of a union (10.4), or when a temporary object is created (7.3.4, 6.6.7). [...]
Add a new paragraph at the end of [intro.object]:
Some operations are described as implicitly creating objects within a specified region of storage. For each operation that is specified as implicitly creating objects, that operation implicitly creates zero or more objects of implicit lifetime types (6.7 [basic.types]) in its specified region of storage if doing so would result in the program having defined behavior. If no such sets of objects would give the program defined behavior, the behavior of the program is undefined.
Add an example following the new paragraph:
[ Example:-- end example ]#include <cstdlib>struct X { int a , b ; }; X * make_x () { // The call to std::malloc implicitly creates an object of type X // and its subobjects a and b in order to give the subsequent // class member access operations defined behavior. X * p = ( X * ) std :: malloc ( sizeof ( struct X )); p -> a = 1 ; p -> b = 2 ; return p ; }
Add another paragraph:
An operation that begins the lifetime of an array of,
char , or
unsigned char implicitly creates objects within the region of storage occupied by the array. [ Note: the array object provides storage for these objects. -- end note ] Any implicit or explicit invocation of a function named
std :: byte or
operator new implicitly creates objects in the returned region of storage. Some functions in the C++ standard library implicitly create objects (16.6.x [ptr.bless], 19.10.9.2 [allocator.traits.members], 19.10.12 [c.malloc], 20.5.3 [cstring.syn]).
operator new []
4.2. 6.6.3 Object and reference lifetime [basic.life]
No change in 6.6.3 [basic.life] paragraph 1:
[...] The lifetime of an object
of type
o ends when:
T
if
is a non-class type, the object is destroyed, or
T if
is a class type, the destructor call starts, or
T the storage which the object occupies is released, or is reused by an object that is not nested within
(6.6.2).
o
4.3. 6.7 Types [basic.types]
Change in 6.7 [basic.types] paragraph 9:
[...] Scalar types, trivial class types (10.1), arrays of such types and cv-qualified versions of these types are collectively called trivial types. Scalar types, standard-layout class types (10.1), arrays of such types and cv-qualified versions of these types are collectively called standard-layout types. Scalar types, implicit-lifetime class types ([class.prop]), array types, and cv-qualified versions of these types are collectively called implicit-lifetime types.
4.4. 7.5.4.3 Destruction [expr.prim.id.dtor]
Change in 7.5.4.3 [expr.prim.id.dtor] paragraph 2:
If the id-expression names a pseudo-destructor,
shall be a scalar type and the id-expression shall appear as the right operand of a class member access (7.6.1.4) that forms the postfix-expression of a function call (7.6.1.2). [Note: Such a call
T has no effectends the lifetime of the object ([expr.call], [basic.life]) . —end note]
4.5. 7.6.1.2 Function call [expr.call]
Change in 7.6.1.2 [expr.call] paragraph 5:
[...] If the postfix-expression names a pseudo-destructor, the postfix-expression must be a possibly-parenthesized class member access (7.6.1.4), and the function call
has no effectdestroys the object of scalar type denoted by the object expression of the class member access .
4.6. 10.1 Properties of classes [class.prop]
Add a new paragraph at the end of [class.prop]:
A class S
is an implicit-lifetime class if it
is an aggregate ([dcl.aggr]) or has
at least one trivial, non-deleted constructor and
a trivial, non-deleted destructor.
4.7. 16.6.1 Header < new >
synopsis [new.syn]
Add the following before the declaration of
:
// [ptr.bless] Implicit object creation void std :: bless ( void * start , size_t length ); template < typename T > T * std :: bless ( void * p );
Add the following subclause immediately before 16.6.4 [ptr.launder]:
4.8. 16.6.x Implicit object creation [ptr.bless]
void std :: bless ( void * start , size_t length );
Requires: [,
start ) denotes a region of allocated storage that is a subset of the region of storage reachable through
( char * ) start + length .
start
Effects: Implicitly creates objects (6.6.2 [intro.object]) within the denoted region.
template < typename T > T * std :: bless ( void * p );
Mandates: T
is an implicit lifetime type.
Requires: [,
p ) denotes a region of allocated storage that is a subset of the region of storage reachable through
( char * ) p + sizeof ( T ) .
p
Effects: Implicitly creates objects within the denoted region, including an objectof type
A whose address is
T . The object representation of
p is the contents of the storage prior to the call to
A as if by calling
bless from a copy of the original storage.
std :: memcpy
Returns: A pointer to A
.
4.9. 19.10.9.2 Static member functions [allocator.traits.members]
Add paragraph after 19.10.9.2 [allocator.traits.members] paragraph 1:
Remarks: Implicitly creates objects (6.6.2 [intro.object]) in the reused region of storage.
Add paragraph after 19.10.9.2 [allocator.traits.members] paragraph 2:
Remarks: Implicitly creates objects (6.6.2 [intro.object]) in the reused region of storage.
4.10. 19.10.12 C library memory allocation [c.malloc]
Add a new paragraph after [c.malloc] paragraph 4 in the description of
,
,
, and
:
These functions implicitly create objects (6.6.2 [intro.object]) in the returned region of storage.
4.11. 20.5.3 Header < cstring >
synopsis [cstring.syn]
Change in 20.5.3 [cstring.syn] paragraph 3:
The functions
and
memcpy are signal-safe (16.12.4). Both functions implicitly create objects (6.6.2 [intro.objects]) in the destination region of storage immediately prior to copying the sequence of characters to the destination.
memmove
4.12. C.5 C++ and ISO C++ 2017 [diff.cpp17]
Add an entry to Annex C as follows:
Affected subclause: [basic.life]
Change: A pseudo-destructor call ends the lifetime of the object to which it is applied.
Rationale: Increase consistency of the language model.
Effect on original feature: Valid ISO C++ 2017 code may be ill-formed or have undefined behavior in this International Standard.
constexpr int f () { int a = 123 ; using T = int ; a . ~ T (); return a ; // undefined behavior; previously returned 123 } static_assert ( f () == 123 ); // ill-formed; previously valid
5. Acknowledgments
Thanks to Ville Voutilainen for raising this problem, and to the members of SG12 for discussing possible solutions.