offsetof well-defined
Document #: | P3407R1 |
Date: | 2025-01-11 |
Project: | Programming Language C++ |
Audience: | EWG |
Reply-to: | Brian Bi <bbi10@bloomberg.net> |
I propose a change to the core language specification that would make it well defined to compute a pointer to the beginning of an object from a pointer to one of its data members (i.e., by subtracting the offset of the data member, as given by the offsetof macro). Such code, which is often written in C, arguably had well-defined behavior prior to C++17. The proposed change will standardize existing practice and is anticipated to have no impact on existing C++ compilers, but will eliminate the possibility of certain hypothetical (as yet unimplemented) reachability-based optimizations that were made possible by the C++17 wording.
[…] container_of. Added a table comparing the three main design alternatives discussed in this paper.
In C, an intrusive data structure, such as a doubly-linked list, must be implemented using composition, not inheritance, since C does not have inheritance. Given a pointer to a node within the data structure, accessing the rest of the object requires the use of offsetof:
#include <stddef.h>  /* offsetof */
struct ListNode {
    struct ListNode* prev;
    struct ListNode* next;
};
typedef struct {
    int data;
    struct ListNode node;
} Foo;
Foo* next_foo(Foo* foo) {
    struct ListNode* next_node = foo->node.next;
    /* Step back from the embedded node to the start of the enclosing Foo. */
    return (Foo*)((char*)next_node - offsetof(Foo, node));
}
This pattern of casting to char*, subtracting the appropriate offsetof value, and then casting to a pointer to the enclosing type is often encapsulated in a macro named container_of or similar (see, e.g., GitHub code search)¹.
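As a rough illustration of the three steps just described, the following C++ sketch wraps them in a macro. The names CONTAINER_OF and foo_from_node are hypothetical and chosen only for this example. Note that, under the C++17 wording discussed in this paper, the pointer arithmetic formally has undefined behavior, even though known implementations produce the same result as the equivalent C code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro for illustration; real projects name it container_of or similar.
// Steps: cast to char*, subtract the member's offset, cast to the enclosing type.
#define CONTAINER_OF(ptr, type, member) \
    (reinterpret_cast<type*>(reinterpret_cast<char*>(ptr) - offsetof(type, member)))

struct ListNode {
    ListNode* prev;
    ListNode* next;
};

struct Foo {
    int data;
    ListNode node;
};

// Hypothetical helper: recovers the enclosing Foo from a pointer to its node member.
// Assumption: n really points to the node member of a Foo object.
inline Foo* foo_from_node(ListNode* n) {
    assert(n != nullptr);
    return CONTAINER_OF(n, Foo, node);
}
```

Given a Foo object f, a call such as foo_from_node(&f.node) is expected to yield &f on the implementations the paper surveys.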
A C++-only project would typically make ListNode a base class. Converting a ListNode* to a Foo* could then be done easily using static_cast, and offsetof would be unnecessary. This option is not available in C. In C, the container_of pattern is the only option, unless the ListNode can be arranged to always be the first member of the enclosing struct.
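A minimal sketch of the base-class alternative, assuming every node linked into the list is the base subobject of a Foo (the member initializers are added only to keep the example self-contained):

```cpp
#include <cassert>

struct ListNode {
    ListNode* prev = nullptr;
    ListNode* next = nullptr;
};

// The node is a base class rather than a member, so no offsetof is needed.
struct Foo : ListNode {
    int data = 0;
};

// Assumption: every node linked into the list is the base subobject of a Foo;
// the static_cast is valid only under that assumption.
inline Foo* next_foo(Foo* foo) {
    assert(foo != nullptr);
    return static_cast<Foo*>(foo->next);
}
```

Here the derived-to-base relationship carries the layout information that offsetof supplies in the C version.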
Unfortunately, the operand of the return statement in next_foo has undefined behavior in C++. This incompatibility between C and C++ should be fixed, and can be fixed without changing any current C++ compilers.
At the November, 2019 WG21 meeting, EWG approved [P1839R1] in the following poll:
It should be possible to access the entire object representation through a pointer to a char-like type as a DR
Something like P1839 is certainly needed in order to allow the code given in the previous section to be valid. Currently, casting next_node to type char* does not yield a pointer that points into an array of char; therefore, subtracting any value other than 0 can only have UB (§7.6.6 [expr.add]p4.3)². To solve this problem, [P1839R7] proposes that object representations be made arrays of unsigned char (and that pointers to char also be allowed to traverse such arrays). This issue has also been pointed out by [P2883R0], which noted that, although this use of offsetof has UB in C++, every known C++ implementation “consistently produced the same behavior as the C program”.
However, EWG did not discuss the issue of reachability. Therefore, recent revisions of P1839 have been designed to preserve reachability-based restrictions that currently exist in C++. To put it another way, if P1839 is adopted, it will not change which bytes a piece of code is allowed to access, i.e., bytes that it would already be able to access by calling memcpy. In order to allow the code given in the Introduction to have well-defined behavior, we must expand the set of bytes that are considered reachable from a pointer to a data member.
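To make the memcpy remark concrete, here is a minimal sketch; the names Pair and copy_int_bytes are hypothetical. Given a pointer to one data member, memcpy may read the bytes of that member, which are reachable under the current rules. Reading the bytes of a neighboring member through the same pointer is what, as this paper describes, the current rules do not permit.

```cpp
#include <cassert>
#include <cstring>

struct Pair {
    int first;
    int second;
};

// Copies the bytes of the int that p points to into out.
// Only sizeof(int) bytes are read, all of which lie within that int.
// Assumption: out has room for at least sizeof(int) bytes.
inline void copy_int_bytes(const int* p, unsigned char* out) {
    assert(p != nullptr && out != nullptr);
    std::memcpy(out, p, sizeof(int));
}
```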
To be clear, P1839 could just make the example in the Introduction valid, but this is a separate evolutionary question from the approved direction in P1839. Therefore, the reachability issue has been made the subject of this paper instead of being added to P1839.
Reachability was introduced into C++17 by the adoption of [P0137R1]. The definition of reachability is currently given in §6.8.4 [basic.compound]p6:
A byte of storage b is reachable through a pointer value that points to an object x if there is an object y, pointer-interconvertible with x, such that b is within the storage occupied by y, or the immediately-enclosing array object if y is an array element.
The cumulative effect of all changes in P0137R1 was to make it impossible for a pointer derived from a given pointer value, p, to access bytes that are not reachable from p. The adoption of that paper therefore gave the Committee’s blessing to compiler optimizations based on the assumption that unreachable bytes cannot be accessed at all. Unfortunately, in the next_foo example given in the Introduction, the bytes constituting the foo->data member are not reachable from a pointer to foo->node.
For example, assuming ListNode and Foo have been defined as above, consider the following.
void access_node(ListNode* p);
int use_foo() {
    Foo foo;
    foo.data = 1;
    foo.node.prev = &foo.node;
    foo.node.next = &foo.node;
    access_node(&foo.node);
    return foo.data;
}
When the body of use_foo is compiled, the compiler is allowed to assume that foo.data cannot be modified by access_node, even though access_node is given a pointer to another member of the foo object. In order to allow access_node to access the data member through offsetof and pointer arithmetic, we must also take away the possibility that a conforming implementation could unconditionally optimize the return statement to return 1;.
The allowance of this particular reachability-based optimization conflicts with the more important goal of allowing the code given in the Introduction, which would have well defined behavior in C, to have the same behavior in C++. In addition, such optimizations do not seem to have been implemented in any real C++ compilers. Therefore, the changes I propose will not have any impact on existing implementations. I will give more detail in the “Provenance in C++” section below.
In the status quo (prior to the adoption of [P1839R7], if any), reachability can prevent some memory accesses even when no pointer arithmetic is involved. For example:
struct S {
    int a[2];
    int data;
};
void f1(int* p);
int f2() {
    S s;
    s.data = 1;
    f1(&s.a[0]);
    return s.data;
}
If f1 is defined as follows:
void f1(int* p) {
reinterpret_cast<S*>(reinterpret_cast<int (*)[2]>(p))->data = 2;
}
then calling f1 has undefined behavior, because the entire array s.a is not pointer-interconvertible with the element s.a[0]. The inner reinterpret_cast yields a “wrongly typed” pointer: a pointer value that is of type int (*)[2], but points to a single int, namely s.a[0]; it does not point to the array s.a. Consequently, the outer reinterpret_cast, which attempts to go from the first member of a standard-layout struct to the struct itself (allowed in C++17), cannot work; instead another wrongly typed pointer is produced: a value of type S* that points to s.a[0] (not s). Dereferencing this pointer yields an lvalue that does not refer to an S object, which renders the attempted access to data UB (§7.6.1.5 [expr.ref]p9).
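For contrast, the following sketch shows the first-member conversion in a case where it does work, because the pointer really points to the first non-static data member of a standard-layout struct. The names Header, Packet, and packet_from_header are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct Header {
    int id;
};

struct Packet {
    Header header;  // first non-static data member
    int payload;
};

static_assert(std::is_standard_layout<Packet>::value,
              "the conversion below relies on Packet being standard-layout");

// Assumption: h points to the header member of a Packet object.
inline Packet* packet_from_header(Header* h) {
    assert(h != nullptr);
    return reinterpret_cast<Packet*>(h);
}
```

The difference from f1 is that here the operand of the cast points to an object that is pointer-interconvertible with the enclosing struct, so no wrongly typed pointer is produced.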
The std::launder function, which can accept a pointer and return a different pointer value that holds the same address, does not help, because it has a reachability restriction: calling std::launder on a wrongly typed pointer picks out the object of the correct type that lives at the address that the pointer holds, but if there are bytes reachable from that object that are not reachable from the object that the original pointer points to, the behavior is undefined (§17.6.5 [ptr.launder]p2).
Therefore, the implementation can assume that the call to f1 in f2 never modifies s.data: if any attempt were made to do so, then the behavior of the program would be undefined.
In [P1839R7], I have attempted to ensure that the proposed wording is consistent with the reachability restrictions that exist in current C++, because there is no record of EWG having discussed the question of whether those restrictions should be relaxed. If the next_foo example is to be made well-defined, then some reachability-based assumptions that are currently allowed to implementations must be invalidated. This paper proposes to do just that.
The C standard does not currently have a notion of provenance, but it is widely assumed that one ought to exist. For example, in the following translation unit:
void evil(void);
int main(void) {
    int x = 1;
    evil();
    return x;
}
notwithstanding that evil might be able to “guess” the address of x based on knowledge of the platform ABI, it is widely agreed that evil should be allowed to neither read nor write the value of x, and, therefore, the compiler can eliminate x and optimize the last statement to return 1;. GCC and Clang both perform this optimization at -O1 and higher.
One can say that even if evil correctly guesses the numerical value of x’s address, casting that numerical value to int* would yield a pointer that lacks provenance and, therefore, causes UB when dereferenced. Such provenance-based restrictions on the use of pointers do not exist in the current C standard, but work is underway on a Draft Technical Specification for pointer provenance in C (referred to as the “Provenance TS” from this point onward). The latest version of the Provenance TS is [N3057].
In the Provenance TS, values of pointer-to-object type³ are augmented to include provenance, which may be empty. A non-empty provenance is the ID of a storage instance, and a pointer value whose provenance is the ID of a storage instance I can be used only to access bytes that lie within I. In the example above, a storage instance is created when x is defined. In contrast to the address that a pointer value represents, there is no way to directly change the provenance of a pointer, other than by storing into it another pointer value that has the desired provenance. That is, no cast or other operation in evil can construct a pointer value whose provenance is the ID of x. Therefore, the implementation can assume that any pointer constructed by evil that happens to represent the address of x cannot be used to access x, since the provenance of such a pointer value is either empty or a storage ID other than that of x.
Although the Provenance TS doesn’t explicitly state that subobjects have the provenance of their complete object, the definition of “storage instance” given in section 3.20 of Annex C implies that only a single storage instance is created by an object definition. A note to section 3.20 states that two subobjects within an object of structure type share a storage instance.
Therefore, under the Provenance TS, if the address of a subobject is taken, the resulting pointer’s provenance is the ID of a storage instance that contains at least the complete object⁴. Consequently, all bytes of a complete object are always reachable starting from a valid pointer to any subobject.
C++ has had a provenance-based pointer model since [P0137R1]. However, the C++ standard does not use the term “provenance”. Instead, every dereferenceable pointer in C++ has a unique object or function to which it points. But the set of bytes that an object pointer can reach is not necessarily limited to the bytes occupied by the object that the pointer points to. For example, a pointer to any element of an array can be used to access any byte of the array, including bytes that are occupied by other elements. C++ is more restrictive than the C Provenance TS: all bytes reachable from the pointer value “pointer to o” (where o is an object) lie within o’s complete object, but not all bytes of a complete object are reachable from a pointer to a subobject. In particular, as stated previously, if a pointer points to a non-static data member of a standard-layout struct other than the first non-static data member, no other members are reachable from that pointer.
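The array case mentioned above can be sketched as follows; first_element is a hypothetical helper, and it assumes that the caller passes the correct index of the element pointed to.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: elem points to element number index of an int array.
// All elements of the array are reachable from a pointer to any one of them,
// so stepping back to element 0 is well defined.
inline int* first_element(int* elem, std::size_t index) {
    assert(elem != nullptr);
    return elem - index;
}
```

The analogous step from a struct member back to a sibling member is the operation that the current reachability rules do not allow.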
To look at it from the point of view of the compiler, all provenance-based optimizations that are valid in C are also valid in C++. For example, Clang, GCC, and MSVC are all capable of performing the optimization mentioned in the previous section (i.e., that the value of x is not accessed by evil). Since C++ is stricter than C, some provenance-based optimizations that are not valid in C are valid in C++. However, I have not been able to find any cases in which C++ implementations exploit provenance-based optimizations that are not valid in C. For example, in the following translation unit:
struct S {
    int x;
    int y;
};
void f4(int* p);
int f3() {
    S s;
    s.x = 1;
    f4(&s.y);
    return s.x * s.x;
}
even at maximum optimization levels, Clang, GCC, and MSVC all generate a load of s.x and an imul instruction on x86-64; no implementation assumes that, because only the address of s.y escapes from f3, the value of s.x cannot be changed.
I believe that the reason why such optimizations are not performed is that C++ implementations wish to maintain a reasonable degree of compatibility with C. Since C code often uses the container_of idiom, which could be used to obtain a pointer to s given a pointer to s.y, implementations make allowances for the same operation to take place in a C++ program. Therefore, not only do implementations not currently perform this optimization, but it is unlikely that future versions will do so, either. Implementations are more constrained by the needs of their users, in this case, than by the availability of compiler engineers to implement the optimization.
Similarly, the function f1 defined earlier could be given the following definition in C. The offset value will always be 0 in this case, so the subtraction can be omitted without changing the meaning.
void f1(int* p) {
    /* Parenthesize the cast so that -> applies to the resulting struct S*. */
    ((struct S*)((char*)p - offsetof(struct S, a[0])))->data = 2;
}
Therefore, even in C++ mode, implementations do not assume that f1 cannot change the value of data, even though the reachability rules of the language permit optimizations based on this assumption. Clang, GCC, and MSVC all emit both a store to s.data before the call to f1 and a load after.
The overly strict reachability rules adopted in C++17 have an additional disadvantage besides limiting compatibility with C: they create a category of constructs that:
My opinion is that the Committee should not create new forms of UB that meet the above criteria, and should strongly consider removing any such UB that already exists in the language. UB that is actually exploited by compilers for optimization purposes makes the use of C++ less safe; UB that is not currently exploited still has a negative impact on the perception of how safe C++ is, and is scary to beginners, who don’t have enough context to distinguish between benign UB that is unlikely to ever be exploited and dangerous UB that may eventually result in an unbounded set of possible executions.⁵ I do not mean to suggest that all or even most UB can be removed from C++, but when the two criteria above are met, I think the cost/benefit analysis heavily favors giving the construct a defined behavior.
I believe that a better way to obtain the optimizations that such UB is meant to enable is to provide mechanisms to opt in: that is, language or library features whose sole purpose is to cause UB, which can then be used to optimize; experts can use such features to produce faster code, while beginners can easily avoid them because they cannot be used by accident while writing code that uses other C++ features. (The [[assume]] attribute is a well-known example of this genre.) It seems much more defensible to provide “sharp tools” for experts to use in order to improve performance than to build sharp edges into the most basic language constructs, making it difficult for beginners to use them safely.
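As a small illustration of an opt-in mechanism of this kind, the sketch below uses the C++23 [[assume]] attribute when the compiler reports support for it and falls back to a no-op otherwise. The macro name ASSUME and the function divide_by_positive are hypothetical, and the feature test is an assumption about how one might keep the example portable across language modes.

```cpp
#include <cassert>

// Use the C++23 [[assume]] attribute when the compiler reports support for it;
// otherwise fall back to a no-op. ASSUME is a hypothetical macro name.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
#    define ASSUME(expr) [[assume(expr)]]
#  endif
#endif
#ifndef ASSUME
#  define ASSUME(expr) ((void)0)
#endif

// The caller promises divisor > 0. Breaking that promise is undefined behavior
// when the attribute is active: the UB is requested explicitly, not incurred
// by accident through an ordinary language construct.
inline int divide_by_positive(int value, int divisor) {
    ASSUME(divisor > 0);
    return value / divisor;
}
```

Code that never writes ASSUME cannot trip over this form of UB, which is the property the paragraph above argues for.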
Consider again this example from the previous section:
struct S {
    int x;
    int y;
};
void f4(int* p);
int f3() {
    S s;
    s.x = 1;
    f4(&s.y);
    return s.x * s.x;
}
This paper proposes that f4 would have the ability to modify s.x, and that if there is sufficient interest from C++ experts in having a way to tell the compiler that s.x cannot be reached through the pointer passed to f4, a new mechanism can be added to the language. This possibility is discussed in Appendix A.
To make the C++ standard match existing practice of implementations and to bless container_of-like constructs in C++, it is necessary to permit pointer arithmetic within objects, which is already being proposed by [P1839R7], and also to relax the reachability rules in C++. However, this paper does not propose to relax the C++ reachability rules all the way to the “complete object or allocation” model proposed by the C Provenance TS because doing so is not necessary to solve the immediate problem. Instead, it suffices to allow a pointer to an object to reach all bytes of the complete object. For example, this paper does not propose to enable the use of flexible array members in C++, which are allowed by the C Provenance TS because the trailing bytes belong to the same storage instance (allocation) as the preceding members. The container_of technique was valid in C++ prior to C++17 and this paper aims to restore the status quo ante, not to propose a new feature that has never been in C++.
Because typical container_of macros in C use a cast to char* (not unsigned char*), this paper proposes that a cast to char* be allowed to yield a pointer to an object’s object representation; pointers to unsigned char and std::byte are also supported, as these types are already exempt from the strict aliasing rule (§7.2.1 [basic.lval]p11).
[…] char* that are already well-defined?
In some cases, a C-style cast to char* already has well-defined behavior in C++ that is different from producing a pointer to the object representation. One of these cases is when the operand points to an object of class type that has a conversion function to cv char*. I do not propose to change the behavior of such casts in C++; doing so would be a disastrous breaking change that is not needed for C compatibility, because C does not have conversion functions. The remaining two cases are:
1. […] const_cast because the operand has type cv char* or array of cv char. (This includes the case where no conversion is needed at all.)
2. […] reinterpret_cast followed by an optional const_cast because there is a “real” cv char (not an element of an object representation) that is located at the address represented by the operand and is pointer-interconvertible with it.
I searched GitHub for uses of container_of and uses of offsetof for the purpose of reaching an enclosing struct. In the 65 files that I analyzed manually, I found two files in which the pointer from which the offsetof value is subtracted points to an array of char. (In one of these cases, the array was a flexible array member, which is not part of standard C++, but is often accepted as an extension.) That is, the relevant details of the code are similar to:
struct S2 {
    int data;
    char buf[100];
};
int get_data(char* p) {
    return ((struct S2*)(p - offsetof(S2, buf)))->data;
}
void f5() {
    S2 s;
    // ...
    get_data(s.buf);  // s is an object, not a pointer, so use . rather than ->
    // ...
}
In C++, this code performs out-of-bounds array arithmetic, and thus exhibits UB even before the attempt to access data.
Essentially, this gives us three design options to deal with Case 1.
1. […] char* is exempt from bounds checking, just as it’s exempt from the strict aliasing rule. In other words, while a char* may point to a specific char object during constant evaluation, in all other cases it merely points to a byte of storage, and pointer arithmetic that would reach any other byte in the same complete object is permitted. In this case, cv unsigned char* would also be exempt from bounds checking (for symmetry with the strict aliasing rule). This might have a negative impact on performance relative to the status quo if compilers are currently relying on the assumption that a pointer into a char array that is a subobject cannot be used to perform pointer arithmetic outside the bounds of the array. However, I have not yet found any examples where compilers do use such assumptions for optimization. The more likely impact is on sanitizers and static analyzers: they might be forced to disable bounds checking for char* and unsigned char*, which would reduce their ability to detect UB.
2. […] char* to char* or a similar cast (as described in Case 1) sometimes changes the pointer value such that the above example would have defined behavior if p were to be cast to char* prior to the pointer arithmetic. (A similar allowance would be made for casts to unsigned char*.) In many cases in the real world, such a cast might be present because it will have been introduced by a generic container_of-like macro that is not “aware” of the fact that the pointer argument, in some particular cases, has type char* already. However, this design has two disadvantages. First, some compilers might simply ignore casts from char* to char* at some early stage of semantic analysis, so that at some later stage they are not aware that the cast is there at all, so the cast cannot achieve its purpose of giving the program defined behavior; it is not clear how much work would be required to change the implementations. Second, it would violate the current design in which a C-style cast is equivalent to trying C++-style casts in a particular order; instead, the C-style cast would have the additional power of producing a pointer to the object representation instead of performing a no-op const_cast.
3. […] char*. The example above would continue to have UB, regardless of whether an additional cast is inserted. We would still solve 99% of the problem, because in the vast majority of cases, the subobject pointer points to an object of struct or union type, not a char.
To understand the practical implications of these options, consider the following three possible definitions of get_data:
return ((struct S2*)(p - offsetof(S2, buf)))->data;                  // Case A
return ((struct S2*)((char*)p - offsetof(S2, buf)))->data;           // Case B
return ((struct S2*)((unsigned char*)p - offsetof(S2, buf)))->data;  // Case C
In the status quo, even assuming [P1839R7] is adopted, the behavior is undefined in C++ in all three cases (A, B, and C). Some or all of these cases are well defined under C⁶ and under the proposed options discussed above:
Case | ISO C with Provenance TS | Option 1 | Option 2 | Option 3 |
---|---|---|---|---|
A | Undefined | Well defined | Undefined | Undefined |
B | Undefined | Well defined | Well defined | Undefined |
C | Well defined | Well defined | Well defined | Well defined |
This paper proposes option 3 as the conservative option, without prejudice to adopting something similar to option 1 or 2 in the future. The status of Case A and Case B in ISO C is a bit unclear; if the C standard were to be clarified to make Case A and Case B definitely well defined, that specification strategy could provide inspiration for a corresponding change in C++ to preserve compatibility. Assuming such a change is not made, the small amount of C code that is similar to Case A or Case B could be rewritten so that, if the subobject has type char, then the pointer arithmetic is done using unsigned char* (Case C), and vice versa; it would then have the desired behavior in both C and C++ under option 3.
For Case 2, I also found two examples in the 65 files that I analyzed in which the subobject pointer points to a struct that is pointer-interconvertible with an unsigned char subobject. I didn’t find any examples with char, but given that examples exist that use unsigned char, I assume there are others that use char. Such code would have relevant details similar to:
struct S3 {
char a;
int b;
};
struct S4 {
char c;
struct S3 d;
};
struct S4* get_s4(struct S3* s3) {
return (struct S4*)((char*)s3 - offsetof(S4, d));
}
In current C++, the cast to char* yields a pointer to the a subobject. Note, however, that the entire example has undefined behavior because of the subsequent pointer arithmetic. If we change the rules of C++ so that the cast would be allowed to yield a pointer to the object representation of the S4 object, we could make this example well-defined when it currently is not. In order to avoid changing the behavior of any code that is already well-defined, we could say that the status quo interpretation of the cast takes precedence, and a pointer to the object representation is obtained only when the former interpretation would produce undefined behavior. This specification strategy is similar to that of implicit object creation, in which the specific objects that are created may only be determined by the details of a later operation, which would have UB other than under one particular choice of objects to create.
[…] container_of?
Standardizing container_of could provide a solution to the char*/unsigned char* problem: if the operand is of type char*, it would be cast to unsigned char*, and vice versa, thus ensuring that a pointer to the object representation can always be obtained. (In C, this behavior can be implemented using a _Generic expression.) Cases A and B in the previous subsection would remain undefined, but such undefined behavior would not be invoked as long as container_of is used. However, pursuing standardization of container_of must begin in WG14, not WG21, and in any case, such work would have to be in addition to this paper, not instead of it. Casting to the other pointer type is of no use if reachability restrictions are not also relaxed in C++.
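The type-swapping rule described above can be sketched in C++ with a type trait in place of _Generic. The alias other_byte and the helper s2_from_buf are hypothetical names, and S2 is repeated from the earlier example so that the sketch is self-contained. As the last sentence above points out, the swap alone does not make the arithmetic well defined in current C++; the sketch only shows how the character type would be selected.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Picks the character type used for the pointer arithmetic: unsigned char when
// the operand already points to char, and char otherwise.
template <class T>
using other_byte = typename std::conditional<
    std::is_same<typename std::remove_cv<T>::type, char>::value,
    unsigned char, char>::type;

static_assert(std::is_same<other_byte<char>, unsigned char>::value, "");
static_assert(std::is_same<other_byte<unsigned char>, char>::value, "");

struct S2 {
    int data;
    char buf[100];
};

// Hypothetical helper. Assumption: p points to the first element of the buf
// member of an S2 object.
inline S2* s2_from_buf(char* p) {
    assert(p != nullptr);
    auto* bytes = reinterpret_cast<other_byte<char>*>(p);  // unsigned char*
    return reinterpret_cast<S2*>(bytes - offsetof(S2, buf));
}
```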
The offsetof macro is conditionally-supported when its type argument is not a standard-layout class (§17.2.4 [support.types.layout]p1). Any struct that is valid in C will produce a standard-layout class when its definition is compiled as C++ code. Therefore, for purposes of C++ compatibility, we do not necessarily need to allow all bytes of a non-standard-layout class to be reachable from a pointer to one of its subobjects.
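The distinction can be checked at compile time. In the sketch below, CStyle, Mixed, and value_offset are hypothetical names; Mixed is non-standard-layout because its non-static data members do not all have the same access control.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A struct that is also valid C: standard-layout when compiled as C++.
struct CStyle {
    char tag;
    int value;
};

// Mixed access control makes the class non-standard-layout, so offsetof is
// only conditionally-supported for it.
class Mixed {
public:
    int a;
private:
    int b;
};

static_assert(std::is_standard_layout<CStyle>::value, "valid C struct");
static_assert(!std::is_standard_layout<Mixed>::value, "mixed access control");

// Hypothetical helper returning the offset of CStyle::value within CStyle.
inline std::size_t value_offset() {
    return offsetof(CStyle, value);
}
```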
However, limiting the changes in this paper to standard-layout classes has some downsides, and no known upsides:
- […] offsetof to reach the beginning of the class would silently acquire undefined behavior.
- […] container_of to be used in C++ would simplify the use case discussed therein. (And perhaps the C++ standard should be amended to require offsetof to be supported for non-standard-layout types that are aggregates, but that’s beyond the scope of this paper.)
This wording is a modified version of the wording in [P1839R7] and is relative to working draft [N5001].
Modify §6.7.2 [intro.object]p3 as follows:
If a complete object is created ([expr.new]) in storage associated with another object e of type “array of N unsigned char” other than a synthesized object representation ([basic.types.general]) or of type “array of N std::byte” ([cstddef.syn]), that array provides storage for the created object if […]
Modify §6.7.2 [intro.object]p4 as follows:
An object a is nested within another object b if
- a is a subobject of b, or
- b provides storage for a, or
- a and b are the object representations of two objects o1 and o2, where o2 provides storage for o1, or
- there exists an object c where a is nested within c, and c is nested within b.
Modify §6.7.2 [intro.object]p10 as follows:
Unless an object is a bit-field or a subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects with overlapping lifetimes that are not bit-fields may have the same address if
- one is nested within the other,
- at least one is a subobject of zero size and they are not of similar types ([conv.qual]), or
- at least one is a synthesized object representation or element thereof, or
- they are both potentially non-unique objects;
otherwise, they have distinct addresses and occupy distinct bytes of storage.
Modify §6.7.2 [intro.object]p14 as follows:
Except during constant evaluation, an operation that begins the lifetime of an array of unsigned char or std::byte other than a synthesized object representation ([basic.types.general]) implicitly creates objects within the region of storage occupied by the array.
Edit §6.7.4 [basic.life]p1 as follows:
[…] The lifetime of an object of type T other than an element of a synthesized object representation ([basic.types.general]) begins when
- storage with the proper alignment and size for type T is obtained, and
- if it is not a synthesized object representation, its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
except […]. The lifetime of an object o of type T other than an element of a synthesized object representation ends when:
- if T is a non-class type, the object is destroyed, or
- if T is a class type, the destructor call starts, or
- the storage which the object occupies is released, or is reused by an object that is
notneither nested within o ([intro.object]) nor nested within the object of which o is the object representation, if any ([basic.types.general]).When evaluating a new-expression, […]
[Example 1: […] end example]
A synthesized object representation is not considered to reuse the storage of any other object.
Insert a new paragraph after §6.7.4 [basic.life]p3 as follows:
The lifetime of a reference begins when its initialization is complete. The lifetime of a reference ends as if it were a scalar object requiring storage.
[Note 1: [class.base.init] describes the lifetime of base and member subobjects. —end note]
For an object o of class type, the lifetimes of the elements of the synthesized object representation begin when the construction of o begins and end when the destruction of o completes. Otherwise, the lifetimes of the elements of the synthesized object representation (if any) are the lifetime of o.
Modify §6.8.1 [basic.types.general]p4 as follows and add a paragraph after it:
The object representation of a complete object type
T
is the sequence of Nbytes taken up by aunsigned char
objectsnon-bit-fieldcomplete object of typeT
, where N equalssizeof(T)
. The value representation of a typeT
is the set of bits in the object representation ofT
that participate in representing a value of typeT
. The object and value representation of anon-bit-fieldcomplete object of type cvT
are the bytes and bits, respectively, of the object corresponding to the object and value representation of its type; the object representation is considered to be an array of N cvunsigned char
if the object occupies contiguous bytes of storage ([intro.object]). The object representation of a bit-field object is the sequence of N bits taken up by the object, where N is the width of the bit-field ([class.bit]). The value representation of a bit-field object is the set of bits in the object representation that participate in representing its value. Bits in the object representation of a type or object that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
For a complete object o with type cv T whose object representation is an array A:
- If o has type “array of cv unsigned char”, then A is o.
- Otherwise, A is said to be a synthesized object representation, and is distinct from any object that is not an object representation.
[Note: In particular, when an array B of N unsigned char provides storage for an object o of size N, the object representation of o is a different array that occupies the same storage as B. —end note]
For each element e of A:
[Note: Attempting to access an element of a synthesized object representation of a volatile object results in undefined behavior ([dcl.type.cv]). —end note]
- If e occupies the same storage as an object having type cv char, cv unsigned char, or cv std::byte that is either o or a non-bit-field subobject thereof, the value of e is the value congruent ([basic.fundamental]) to that of the subobject.
- Otherwise, for each bit b in the byte of o that corresponds to e, let b’ be the corresponding bit of e and let p(b) be the smallest subobject of o that contains b, other than an inactive union member or subobject thereof. If p(b) is a union object or is not within its lifetime or has an indeterminate value, or if b is not part of the value representation of p(b), then b’ has indeterminate value. Otherwise, if b has an erroneous value, then b’ has an erroneous value. Otherwise, b’ has an unspecified value that is neither indeterminate nor erroneous; such a bit retains its value until p(b) is subsequently modified.
[Note: An object representation is always a complete object. —end note]
Modify §6.8.4 [basic.compound]p5 as follows:
Two objects a and b are pointer-interconvertible if they have the same address and:
- they are the same object, or
- one is a union object and the other is a non-static data member of that object ([class.union]), or
- one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
- there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
- they have the same complete object, or
- the complete object of one is the object representation of the complete object of the other.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast ([expr.reinterpret.cast]).
[Note: A reinterpret_cast ([expr.reinterpret.cast]) never converts a pointer to a to a pointer to b unless a and b are pointer-interconvertible. —end note]
[Note: A standard-layout class object is pointer-interconvertible with its first non-static data member (if any) and each of its base class subobjects ([class.mem]). An array object and an object that the array provides storage for are not pointer-interconvertible. —end note]
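As a non-normative illustration of the second note, the sketch below shows the familiar case of a standard-layout class object and its first non-static data member: the two share an address, so a reinterpret_cast from a pointer to one is expected to yield a usable pointer to the other. The names Pair and first_member_via_cast are hypothetical and appear nowhere in the proposed wording.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout class, introduced only for this illustration.
struct Pair {
    int first;
    int second;
};
static_assert(std::is_standard_layout_v<Pair>, "example assumes standard layout");

// Returns a pointer to p->first by casting the class pointer itself.
// The class object and its first member have the same address.
inline int* first_member_via_cast(Pair* p) {
    assert(p != nullptr);
    return reinterpret_cast<int*>(p);
}
```

Calling first_member_via_cast(&obj) should produce the same pointer value as &obj.first.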
Modify §6.8.4 [basic.compound]p6 as follows:
A byte of storage b is reachable through a pointer value that points to an object x if
there is an object y, pointer-interconvertible with x, such that b is within the storage occupied by y, or the immediately-enclosing array object if y is an array element
b is within the storage occupied by x’s complete object.
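For illustration only, the following sketch shows the kind of code this reachability change is meant to make well defined: recovering a pointer to the enclosing Foo from a pointer to its node member. The types mirror the ListNode and Foo example from the introduction; foo_from_node is a hypothetical helper name. Under the C++17 wording the behavior is arguably undefined, although I anticipate that existing compilers already behave as the sketch assumes.

```cpp
#include <cassert>
#include <cstddef>

struct ListNode {
    ListNode* prev;
    ListNode* next;
};

struct Foo {
    int data;
    ListNode node;
};

// Hypothetical helper: step back from the `node` member to the start of the
// enclosing Foo. Every byte involved lies within the complete Foo object.
inline Foo* foo_from_node(ListNode* n) {
    assert(n != nullptr);
    char* bytes = reinterpret_cast<char*>(n);
    return reinterpret_cast<Foo*>(bytes - offsetof(Foo, node));
}
```

The helper assumes that n really points to the node member of a Foo; passing a pointer to a free-standing ListNode would remain undefined.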
Modify §7.2.1 [basic.lval]p11 as follows:
An object of dynamic type T_obj is type-accessible through a glvalue of type T_ref if T_ref is similar ([conv.qual]) to:
- T_obj,
- a type that is the signed or unsigned type corresponding to T_obj, or
- a char, or unsigned char, std::byte type, if the object is an element of an object representation ([basic.life.general]).
If a program attempts to access ([defns.access]) the stored value of an object through a glvalue through which it is not type-accessible, the behavior is undefined. […]
[Note 11: […]]
[Example 2: An element of an object representation can be accessed through a glvalue of type char, unsigned char, signed char, std::byte, or a cv-qualified version of any of these types. —end example]
Drafting note: Because we don’t guarantee that all complete objects are contiguous (see [P1945R0]), it cannot always be guaranteed that, e.g., a reinterpret_cast to unsigned char* will yield a pointer to an element of an object representation: no synthesized object representation is present at all in the discontiguous case. In those cases, we do not attempt to specify the behavior of accessing the original object through a glvalue of char-like type, so we shouldn’t claim that it’s well defined to do so.
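As a non-normative sketch of what the type-accessibility rule above is intended to cover in the contiguous case, the code below reads the object representation of a trivially copyable object one byte at a time through a glvalue of type unsigned char. The helpers bytes_of and matches_memcpy are hypothetical names; the cross-check against std::memcpy is included only because std::memcpy is already specified to copy the underlying bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy the object representation of `obj` byte by byte
// through a glvalue of type unsigned char.
template <class T>
std::array<unsigned char, sizeof(T)> bytes_of(const T& obj) {
    static_assert(std::is_trivially_copyable_v<T>, "sketch assumes trivially copyable T");
    std::array<unsigned char, sizeof(T)> out{};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out[i] = p[i];  // reads element i of the object representation
    }
    return out;
}

// Cross-check: the bytes read above should equal those copied by std::memcpy.
template <class T>
bool matches_memcpy(const T& obj) {
    unsigned char ref[sizeof(T)];
    std::memcpy(ref, &obj, sizeof(T));
    return std::memcmp(bytes_of(obj).data(), ref, sizeof(T)) == 0;
}
```

The sketch is only meaningful for types without padding bits, since padding may hold indeterminate values.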
Modify §7.3.2 [conv.lval]p3.4, as amended by the proposed resolution of [CWG2901], as follows:
- Otherwise, the object indicated by the glvalue is read ([defns.access]). Let V be the value contained in the object. If T is an integer type or cv std::byte, the prvalue result is the value of T congruent ([basic.fundamental]) to V, and V otherwise. […]
Modify §7.6.1.9 [expr.static.cast]p13 as follows:
[…] Otherwise, if the original pointer value points to an object a,
and there is an object b of type similar tolet S be the set of objects that are pointer-interconvertible with a and have type similar toT
that is pointer-interconvertible ([basic.compound]) with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.T
.
- If S contains a, the result is a pointer to a.
- Otherwise, the result is a member of S whose complete object is not a synthesized object representation if any such result would give the program defined behavior. If there are multiple possible results that would give the program defined behavior, the result is an unspecified choice among them.
- Otherwise (i.e. when there are no such members of S that would give the program defined behavior), if the object representation of a’s object is an array A, T is similar to the type of A, and A is a member of S, the result is a pointer to A.
- Otherwise, if the object representation of a’s complete object is an array and T is cv unsigned char, the result is a pointer to the element of that object representation that has the same address as a.
- Otherwise, if T is cv char or cv std::byte, or an array of one of these types, let U be the type obtained from T by replacing char or std::byte with unsigned char. If a static_cast of the operand to U* would be well-formed and would yield a pointer to an object representation or element thereof, the result of the cast to T* is that pointer value.
- Otherwise, the result is a pointer to a.
Otherwise, if the original pointer value points past the end of an object a:
- If the object representation of the complete object of a is an array A, T is similar to the type of A, and a has the same address as A, the result is &A+1.
- Otherwise, if the object representation of the complete object of a is an array A and T is cv unsigned char, the result is a pointer to the element of A (possibly the past-the-end element) that has the same address as the one represented by the operand.
- Otherwise, if T is cv char or cv std::byte, or an array of one of these types, let U be the type obtained from T by replacing char or std::byte with unsigned char. If a static_cast of the operand to U* would be well-formed and would yield a pointer value defined by one of the above cases, the result of the cast to T* is that pointer value.
- Otherwise, the result is the value of the operand.
Modify §7.6.6 [expr.add]p6 as follows:
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where, one of the following shall hold: T and the array element type are not similar, the behavior is undefined.
- T is similar to the array element type, or
- T is similar to char or std::byte and the pointer value points to a (possibly-hypothetical) element of an object representation.
Otherwise, the behavior is undefined.
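To make the second bullet concrete, here is a non-normative sketch of pointer arithmetic on a char* that points into an object representation. It forms a pointer to the y member of a struct S by adding offsetof(S, y) to the address of the enclosing object. The layout of S (two int members named x and y) and the helper name y_via_offset are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout for illustration: two int members, as in the f3 example.
struct S {
    int x;
    int y;
};

// Hypothetical helper: reach s->y by arithmetic on a char* into *s.
inline int* y_via_offset(S* s) {
    assert(s != nullptr);
    char* base = reinterpret_cast<char*>(s);  // points into the object representation
    return reinterpret_cast<int*>(base + offsetof(S, y));
}
```

The intent is that y_via_offset(&s) and &s.y designate the same object; container_of performs the same arithmetic in the opposite direction.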
Modify §9.2.9.2 [dcl.type.cv]p5 as follows:
If an attempt is made to access an element e of a synthesized object ([basic.types.general]) and e overlaps the storage occupied by a volatile object (including a subobject) that is within its lifetime, the behavior is undefined. Otherwise, the
Thesemantics of an access through a volatile glvalue are implementation-defined. If an attempt is made to access an object defined with a volatile-qualified type through the use of a non-volatile glvalue, the behavior is undefined.
The C programming language already contains an opt-in feature that can be used to tell the compiler that a pointer to part of an object cannot be used to access other parts of the same object. That feature is the restrict keyword. Using restrict, the definition of the function f3 given previously could be changed to:
int f3(void) {
    struct S s;
    s.x = 1;
    {
        struct S* restrict p = &s;
        f4(&s.y);
        return p->x * p->x;
    }
}
In the example above, if s.x is accessed through an lvalue that is based on the restricted pointer p and s.x is modified at any point during the execution of the block in which p is defined, then all accesses to s.x during that block must be through lvalues that are based on p. The first condition (that s.x is accessed through an lvalue based on p) is already met by the return statement in f3; the second condition will be met if f4 attempts to modify s.x. In that case, all accesses to s.x during the lifetime of p would need to be through lvalues based on p, but the modification in f4 could not be, so the behavior would be undefined. The compiler can assume that this scenario does not occur, and that s.x will still have the value 1 upon return from f4.
GCC does not actually perform this optimization, even with -O3. I can only speculate as to the reason: I suspect that this is not the kind of optimization that restrict was designed to enable, and that such an optimization is simply not very useful. However, let’s assume for the sake of argument that some experts would benefit from being given a tool to enable such an optimization in C++: one that (unlike the current reachability rules in C++) could actually be used by implementations without breaking compatibility with C. What might that tool look like?
restrict itself is unlikely to be added to C++. If we were to design a different feature for this purpose, we would probably want it to be in a form that could also be added to C.
For example, we could change the definition of pointer values in the
C++ standard so that, in the case of an object pointer, the value not
only identifies the object that the pointer value points to or past the
end of, but also includes a reachable range, which is a
contiguous set of bytes; a pointer could be used to access memory only
at addresses that lie within the pointer value’s reachable range. This
provenance model is the one used by CHERI, which refers to the reachable
range as the bounds of a pointer value. The CHERI C/C++
Programming Guide [CHERI] states that the subobject
bounds feature (described in Section 4.3.3), in which taking the
address of a subobject produces a pointer value whose bounds are
narrowed to the memory occupied by the subobject, is not enabled by
default, and when enabled, breaks code that uses the
“containerof pattern” (p. 16); such
code must be modified to opt out of subobject bounds. However,
CHERI aims to provide improved safety (e.g., by “[preventing] an
overflow on [an array subobject] from affecting the remainder of the
structure”); when the objective of narrowing bounds is to create
potential UB and enable additional optimizations, an opt-in mechanism is
more appropriate. Such an opt-in mechanism, which would be based on Core
wording that defines reachable ranges, might be a library function like
the following:
/// If `p1` is a null pointer, return `p1`. Otherwise, return a pointer that
/// points to or past the end of the same object `o` as `p1` but whose
/// reachable range consists of the bytes in [p2, p3). The storage occupied by
/// `o` shall be a subrange of [p2, p3), which shall be a subrange of the
/// reachable range of `p1`; otherwise, the behavior is undefined.
void* narrow_reachable_range_to(void* p1,
const void* p2,
const void* p3);
The same library function could also be available in C; for example, it could be in the <stdlib.h> header. The previously given example would then become:
int f3(void) {
    struct S s;
    s.x = 1;
    f4((int*)narrow_reachable_range_to(&s.y, &s.y, &s.y + 1));
    return s.x * s.x;
}
The C++ standard library could provide more convenient (presumably templated) facilities built on top of narrow_reachable_range_to.
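As a purely hypothetical sketch of such a facility, the code below wraps narrow_reachable_range_to in a template that narrows a pointer's reachable range to the single object it points to. Nothing here is proposed: narrow_to_object is an invented name, and because no implementation tracks reachable ranges today, the stand-in definition of narrow_reachable_range_to simply returns its first argument.

```cpp
#include <cassert>

// Stand-in for the hypothetical function described above. This placeholder
// performs no narrowing; it only returns p1 unchanged.
inline void* narrow_reachable_range_to(void* p1,
                                       const void* /*p2*/,
                                       const void* /*p3*/) {
    return p1;
}

// Hypothetical convenience wrapper for pointers to non-const objects:
// request a reachable range of exactly the bytes of *p.
template <class T>
T* narrow_to_object(T* p) {
    if (p == nullptr) {
        return p;  // avoid forming p + 1 from a null pointer
    }
    return static_cast<T*>(narrow_reachable_range_to(p, p, p + 1));
}
```

With such a wrapper, the call in f3 could be spelled f4(narrow_to_object(&s.y)), which avoids both the cast and the repeated address computations.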
This paper does not propose to add reachable ranges to the C++ standard, nor a library function similar to narrow_reachable_range_to. This Appendix merely aims to describe one possibility as to how the optimizations that the paper seeks to invalidate could be recovered by a future opt-in mechanism.
In cases where the macro’s name is precisely container_of, it appears that it usually refers to the version defined by the Linux kernel. This version uses void*, not char*; pointer arithmetic using void* is not proposed by this paper. However, char* is used in many other cases.
All citations to the Standard are to working draft N5001 unless otherwise specified.
In C, void is an object type.
Note that the Provenance TS does not state that two different complete objects always have different storage IDs. According to section 3.20, a single allocation creates a single storage instance. For example, when malloc succeeds, it returns a pointer to “the allocated storage instance” (per section 7.22.3.4).
An example of dangerous UB is reading from uninitialized variables. I’ve observed recent versions of Clang eliding branches along which uninitialized variables are read, causing unit tests to fail when Clang was upgraded. Such behavior will become (mostly) disallowed in C++26 due to the adoption of [P2795R5].
§6.2.6.1p7 in [N3057] refers to the “byte array of the storage instance”, implying that pointer arithmetic can be used to traverse the entire storage instance. However, a pointer to the first element of buf does not appear to be specified to be interchangeable with a pointer to the corresponding element of the byte array of the storage instance. The latter value must be obtained from the former through conversion, as in Case C. In Case B, §6.5.4p6 of the C23 draft would appear to apply; it states that “A cast that specifies no conversion has no effect on the type or value of the expression”. Therefore, casting from char* to char* behaves as if the cast were absent.