Document Number: | P2188R1 |
Date: | 2020-07-15 |
Author: | Anthony Williams, Just Software Solutions Ltd |
Audience: | EWG |
This paper relates to P1726: Pointer lifetime-end zap and provenance, too. It is not a "competing" paper, but provides an alternative look at the same issues.
R0 of this paper received extensive discussion on the EWG reflector. In light of that discussion, I have updated this paper to make it clear which aspects I believe are important, and how it can work with provenance-based analysis.
All the examples are based on production code that I have seen in use throughout my career. They all "work" with existing compilers, but I understand that current compilers may optimize slight variations on them in ways that stop them working.
One of the baseline goals of C++ is backwards compatibility: don't break users' code unnecessarily. I would therefore like to see a way to preserve as much of these as possible, without unnecessarily hindering optimizers, and while making sure that those that are broken break noisily, so users can more easily migrate to an alternative approach.
The standard says that pointers are scalar types and trivially copyable types. Consequently the "pointer zap" from the final sentence of [basic.stc] p4 ("Any other use of an invalid pointer value has implementation-defined behavior"), and especially note 31 ("Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault."), is clearly incompatible with pointers being trivially copyable types from [basic.types] p3, since that indicates that the value of a pointer is entirely derived from its value representation.
Consequently we need to do something to make it clear what is permitted of users and optimizers.
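The two type properties this argument rests on can be checked at compile time. The following is a minimal sketch; the variable template name is hypothetical and introduced only for illustration.

```cpp
#include <type_traits>

// Hypothetical helper (not from the standard): true if T* is both a scalar
// type and a trivially copyable type.
template <typename T>
constexpr bool pointer_is_scalar_and_trivially_copyable =
    std::is_scalar_v<T*> && std::is_trivially_copyable_v<T*>;

struct X { int i; };

// These hold for every object pointer type, whatever it points to.
static_assert(pointer_is_scalar_and_trivially_copyable<int>);
static_assert(pointer_is_scalar_and_trivially_copyable<X>);
static_assert(pointer_is_scalar_and_trivially_copyable<void>);
```

Nothing here depends on whether the pointer value is valid: the traits describe the type, which is why a runtime fault on copying an invalid pointer value sits so awkwardly with them.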
I have removed the proposed wording from this paper, as I don't think it addresses all the issues.
Pointers have value semantics rather than identity semantics: if I copy a pointer it is equivalent to the original. An important aspect of that is that if two values are equal they can be used interchangeably.
    void f(int* p,int* q){
        *q=99;
        bool same=false;
        if(p==q){
            *p=42;
            same=true;
            assert(*q==42);
        }
        assert(same?(*q==42):(*q==99));
    }
Assuming p and q are not uninitialized values, and q is known to point somewhere valid, this code should work. If it doesn't, then equality is broken for pointers.
I understand that if p and q have different provenance then the compiler might want to treat them differently. That's fine, but if they are to be treated differently then they should not compare equal.
If I compare two pointers in one place and then compare the same two pointers, or copies of them, in another place, then comparisons must yield the same result. Again, this is a fundamental aspect of value semantics.
    void g(int* p, int* q); // any function, possibly in another translation unit

    bool compare(int* const p, int* const q){
        return p==q;
    }

    void f(int* const p, int* const q){
        bool const same=(p==q);
        g(p,q);
        assert(same==(p==q));
        assert(same==compare(p,q));
    }
Assuming p and q are not uninitialized values, this code should work, irrespective of what g does. Either the values are the same, or they are not. This must work across translation units and whether or not there is inlining, otherwise equality is broken for pointers.
If a function has a local variable x that is not passed to other functions by pointer or reference, then the compiler should be able to assume that the variable is unchanged by calls to other functions.
    void g(); // any function that is not given the address of x

    void f(){
        int x=42;
        g();
        assert(x==42);
    }
The assert should never fire, and the compiler should be able to assume that to be the case.
To my mind, this is still consistent with my other points: there are no pointers to x, so any pointers used in g cannot be equal to them.
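As a contrast, the sketch below shows what changes once the address of the local variable does escape. The global `escaped` and the function names are hypothetical, chosen for this illustration only.

```cpp
#include <assert.h>

int* escaped = nullptr; // hypothetical global through which an address can leak

void g() {
    if (escaped) {
        *escaped = 99;  // writes through whatever pointer was published
    }
}

int no_escape() {
    int x = 42;
    g();                // g has no pointer to x, so x cannot change
    return x;
}

int with_escape() {
    int x = 42;
    escaped = &x;       // the address of x is now reachable from g
    g();
    escaped = nullptr;  // do not leave a dangling pointer behind
    return x;           // the compiler must re-read x after the call
}
```

In `no_escape` the compiler may fold the return value to 42; in `with_escape` it may not, because a pointer equal to `&x` was legitimately available to `g`.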
If I have a pointer p to an object which is destroyed by whatever means, such as the object it points to going out of scope, or being deleted, then dereferencing it is undefined behaviour.
Likewise, if the pointer is otherwise invalid, such as being a one-past-the-end pointer, then dereferencing it is undefined behaviour.
However, if the pointer p is compared to another pointer q then the compiler may have to integrate the provenance of q with the provenance of p if q is known to be valid, and they compare equal.
If I compare it to another pointer q which I know to be valid, and they are equal, then p must now be valid too, as per my point 1 above.
    void f(){
        int * const p=new int(42);
        delete p;
        int * const q=new int(99);
        if(p==q){
            assert(*p==99);
        }
    }
It is perfectly acceptable to me that p==q returns false, and I would encourage implementers to try to ensure that it does. However, if it returns true then p must now be assumed to point to the same object as q.
In sequential code this seems contrived and unlikely, but in concurrent code such as several of the examples from P1726 this can be important.
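To make the concurrent case concrete, here is a minimal sketch of a lock-free LIFO push, loosely modelled on the kind of code P1726 discusses. The type `Node`, the variable `top` and the function names are assumptions made for this illustration, not taken from either paper.

```cpp
#include <assert.h>
#include <atomic>

// Hypothetical node type and stack head, named for this sketch only.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> top{nullptr};

// Push one node. Between the load and a successful compare-exchange,
// another thread may have popped and deleted the node that `expected`
// points to, then pushed a new node allocated at the same address.
void push(Node* n) {
    Node* expected = top.load();
    do {
        n->next = expected; // may be a copy of a pointer to a deleted node
    } while (!top.compare_exchange_weak(expected, n));
    // On success the stored pointer matched `expected`, so n->next has to
    // be usable as a pointer to whichever node is now below n.
}

// Detach the whole list in one step; the caller owns the returned nodes.
Node* pop_all() {
    return top.exchange(nullptr);
}

// Single-threaded check of the LIFO order; it does not exercise the race.
void check_lifo_order() {
    Node a{1, nullptr}, b{2, nullptr};
    push(&a);
    push(&b);
    Node* head = pop_all();
    assert(head == &b);
    assert(head->next == &a);
    assert(head->next->next == nullptr);
}
```

If the compare-exchange succeeds after the old top node was deleted and its address reused, `n->next` is exactly the p of the example above: it compared equal to a valid pointer, so the list only works if it may now be used as one.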
memcpy on a pointer

    #include <assert.h>
    #include <string.h>

    int main() {
        int *x= new int(42);
        int *y= nullptr;
        memcpy(&y,&x,sizeof(x));
        assert(x == y);
        assert(*y==42);
    }
Here, we use memcpy to copy the bits of a pointer from one pointer to another. The second pointer is now valid and points to the same thing the original did because pointers are trivially copyable ([basic.types] p3).
memcpy via a buffer

    #include <assert.h>
    #include <string.h>

    int main() {
        int *x= new int(42);
        int *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        memcpy(&y, buffer, sizeof(x));
        assert(x == y);
        assert(*y == 42);
    }
Here, we use memcpy to copy the bits of a pointer from one pointer to a buffer, and then from that buffer to another pointer. The second pointer is now valid and points to the same thing the original did because pointers are trivially copyable ([basic.types] p2).
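The same round trip can be written once for any trivially copyable type, which makes it plain that pointers are not being given special treatment. This is a sketch only; `round_trip_through_bytes` and `check_pointer_round_trip` are hypothetical names.

```cpp
#include <assert.h>
#include <string.h>
#include <type_traits>

// Hypothetical helper: copy a trivially copyable value out to a byte
// buffer and back into a fresh object of the same type.
template <typename T>
T round_trip_through_bytes(const T& source) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only trivially copyable types may be copied bytewise");
    unsigned char buffer[sizeof(T)];
    memcpy(buffer, &source, sizeof(T));
    T result{};
    memcpy(&result, buffer, sizeof(T));
    return result;
}

// A pointer is handled exactly like any other trivially copyable value.
void check_pointer_round_trip() {
    int value = 42;
    int* original = &value;
    int* copy = round_trip_through_bytes(original);
    assert(copy == original);
    assert(*copy == 42);
}
```

The `static_assert` is the only gate: if the type trait says bytewise copying is allowed, the helper performs it.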
reinterpret_cast to an integer

    #include <assert.h>
    #include <string.h>
    #include <stdint.h>

    int main() {
        int *x= new int(42);
        int *y= nullptr;
        uintptr_t temp= reinterpret_cast<uintptr_t>(x);
        y= reinterpret_cast<int *>(temp);
        assert(x == y);
        assert(*y == 42);
    }
Here we rely on the provision of [expr.reinterpret.cast] p5 that a pointer may be cast to an integer and back and retain its value.
memcpy with modifications

    #include <assert.h>
    #include <string.h>

    int main() {
        int *x= new int(42);
        int *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        for(auto &c : buffer) {
            c^= 0x55;
        }
        for(auto &c : buffer) {
            c^= 0x55;
        }
        memcpy(&y, buffer, sizeof(x));
        assert(x == y);
        assert(*y == 42);
    }
Now we take the memcpy-via-a-buffer example a step further: we perform a reversible modification on the bits in the buffer after the first memcpy, then reverse that modification and memcpy it back. Since the bits in the buffer now hold their original values, we can copy them to a pointer, which will have the same value, because pointers are trivially copyable.
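In production code the two halves of such a modification are usually far apart, with the scrambled bytes stored in between. A minimal sketch of that shape follows; `ScrambledPointer`, `scramble` and `unscramble` are hypothetical names, and the scrambled form is deliberately kept in a byte array so that no pointer object ever holds the modified bits.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical wrapper: holds the scrambled bytes of a T*, never a T* itself.
template <typename T>
struct ScrambledPointer {
    unsigned char bytes[sizeof(T*)];
};

// Copy the pointer's bytes out and XOR each one with the key.
template <typename T>
ScrambledPointer<T> scramble(T* p, unsigned char key) {
    ScrambledPointer<T> s;
    memcpy(s.bytes, &p, sizeof(p));
    for (auto& b : s.bytes) { b ^= key; }
    return s;
}

// Undo the XOR on a copy of the bytes and copy them back into a pointer.
template <typename T>
T* unscramble(ScrambledPointer<T> s, unsigned char key) {
    for (auto& b : s.bytes) { b ^= key; }
    T* p = nullptr;
    memcpy(&p, s.bytes, sizeof(p));
    return p;
}

void check_scramble_round_trip() {
    int value = 42;
    ScrambledPointer<int> hidden = scramble(&value, 0x55);
    int* recovered = unscramble(hidden, 0x55);
    assert(recovered == &value);
    assert(*recovered == 42);
}
```

Whether `recovered` is guaranteed to be usable is the question this paper raises; the sketch relies on the bytes being restored exactly before they are copied back.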
memcpy and write to file

    #include <assert.h>
    #include <string.h>
    #include <stdio.h>

    int main() {
        int *x= new int(42);
        int *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        auto file= fopen("tempfile", "wb");
        auto written= fwrite(buffer, 1, sizeof(buffer), file);
        assert(written == sizeof(buffer));
        fclose(file);
        memset(buffer, 0, sizeof(buffer));
        file= fopen("tempfile", "rb");
        auto read= fread(buffer, 1, sizeof(buffer), file);
        assert(read == sizeof(buffer));
        fclose(file);
        memcpy(&y, buffer, sizeof(x));
        assert(x == y);
        assert(*y == 42);
    }
This time we are copying the pointer to a buffer, writing our bytes to a file, clearing the buffer and reading the bytes back from the file, then copying the bytes back to the pointer. If our file is unmodified then the buffer will have the same contents after reading as it did before writing, so copying the buffer back to the pointer yields the same value, and the pointer is again valid and points to the same object.
    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include <new>

    struct X {
        int i;
    };

    int main() {
        X *x= new X{42};
        X *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        x->~X();
        new(x) X{99};
        memcpy(&y, buffer, sizeof(x));
        assert(x == y);
        assert(y->i == 99);
        assert(x->i == 99);
    }
This time, we destroy the pointed-to object and recreate a new object with a new value at the same memory location.
The pointer x still holds the same bit pattern, and still points to a valid object, so both the original pointer x and the newly constructed copy y point to the new object, and all is well by [basic.life] p8.
delete and new the object

    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include <new>

    struct X {
        int i;
    };

    int main() {
        X *x= new X{42};
        X *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        delete x;
        y= new X{99};
        unsigned char buffer2[sizeof(x)];
        memcpy(buffer2, &y, sizeof(x));
        if(memcmp(buffer, buffer2, sizeof(x))) {
            printf("Different address\n");
            return 0;
        }
        memcpy(&x, buffer2, sizeof(x));
        assert(x == y);
        assert(y->i == 99);
        assert(x->i == 99);
    }
This time, we destroy the pointed-to object with delete and recreate a new object with a new value with new.
We then copy the new pointer into a buffer and compare the buffers. If the buffers are different, then the pointers are clearly different and our test doesn't work, so we stop.
If the buffers are the same, then we copy the new buffer (which is a copy of our new pointer) into the old pointer.
x is now a copy of the raw bits of our new pointer, so everything must work.
delete and new the object again

    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include <new>

    struct X {
        int i;
    };

    int main() {
        X *x= new X{42};
        X *y= nullptr;
        unsigned char buffer[sizeof(x)];
        memcpy(buffer, &x, sizeof(x));
        delete x;
        y= new X{99};
        unsigned char buffer2[sizeof(x)];
        memcpy(buffer2, &y, sizeof(x));
        if(memcmp(buffer, buffer2, sizeof(x))) {
            printf("Different address\n");
            return 0;
        }
        assert(x == y);
        assert(y->i == 99);
        assert(x->i == 99);
    }
This is the same as example 7, except we don't copy the raw bits from the new buffer over our old pointer.
We know that the bits of x and the bits of y are the same because we compared them with memcmp. Since the pointers are trivially copyable, the value of the pointer is determined by the value representation, which is the set of bits of the object representation. Since we know the object representation is the same, the value representation must be the same, so the pointers must have the same value.
Since the pointers must have the same value, x must be equal to y, and must point to the same object, and all is well.
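The comparison this reasoning relies on can be packaged as a small helper. This is a sketch with a hypothetical name, `same_object_representation`, and the check uses only valid pointers so that it runs without touching the contested case.

```cpp
#include <assert.h>
#include <string.h>
#include <type_traits>

// Hypothetical helper: true if two objects of a trivially copyable type
// have identical object representations, compared bytewise with memcmp.
template <typename T>
bool same_object_representation(const T& a, const T& b) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "bytewise comparison is only meaningful for such types");
    return memcmp(&a, &b, sizeof(T)) == 0;
}

void check_same_object_representation() {
    int first = 1;
    int second = 2;
    int* p = &first;
    int* q = &first;  // a copy of the same pointer value
    int* r = &second; // a pointer to a different object
    assert(same_object_representation(p, q));
    assert(!same_object_representation(p, r));
}
```

The argument in the text runs from a true result of this helper to equality of the pointer values; the sketch only shows the helper agreeing with == where both pointers are valid.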
std::atomic to hold the pointer

    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include <new>
    #include <atomic>

    struct X {
        int i;
    };

    int main() {
        X *x= new X{42};
        X *y= nullptr;
        std::atomic<X *> p(x);
        delete x;
        y= new X{99};
        X *temp= y;
        if(!p.compare_exchange_strong(temp, y)) {
            printf("Different address\n");
            return 0;
        }
        assert(x == y);
        assert(y->i == 99);
        assert(x->i == 99);
    }
This is the same as example 8, except instead of using memcmp to determine the equivalence, we use compare_exchange_strong, which compares pointers as if with memcmp.
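For readers less familiar with compare_exchange_strong, the sketch below shows its two outcomes using only valid pointers. The function name `check_compare_exchange` is hypothetical.

```cpp
#include <assert.h>
#include <atomic>

struct X {
    int i;
};

void check_compare_exchange() {
    X first{1};
    X second{2};
    std::atomic<X*> stored(&first);

    // Mismatch: the stored pointer is left alone and `expected` is
    // overwritten with the value that was actually stored.
    X* expected = &second;
    bool swapped = stored.compare_exchange_strong(expected, &second);
    assert(!swapped);
    assert(expected == &first);
    assert(stored.load() == &first);

    // Match: `expected` now holds the stored value, so the exchange succeeds.
    swapped = stored.compare_exchange_strong(expected, &second);
    assert(swapped);
    assert(stored.load() == &second);
}
```

The update of `expected` on failure matters when reading these examples: after a failed exchange the caller's pointer is no longer the one it passed in.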
std::atomic to hold the pointer, comparison the other way round

    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include <new>
    #include <atomic>

    struct X {
        int i;
    };

    int main() {
        X *x= new X{42};
        X *y= nullptr;
        delete x;
        y= new X{99};
        std::atomic<X *> p(y);
        if(!p.compare_exchange_strong(x, nullptr)) {
            printf("Different address\n");
            return 0;
        }
        assert(x == y);
        assert(y->i == 99);
        assert(x->i == 99);
    }
This is the same as example 9, except that rather than comparing the temp value copied from y with our stored pointer, we store the new value in the atomic, and compare it to our original x. This still works because the compare_exchange_strong compares as if using memcmp, so we are comparing the object representation of x against the object representation of the copy of y stored in p: if the pointers have the same object representation then they have the same value representation, so must be the same and point to the same object.
All standard references are to the C++ working draft from the 2020-04 mailing: N4861.
Thanks to Richard Smith, Hubert Tong, Peter Sewell, Martin Uecker, Peter Dimov, Jens Gustedt, Hans Boehm, Jens Maurer, Roger Orr, Ville Voutilainen, Bronek Kozicki, Balog Pal, Andrey Erokhin, Niall Douglas, Gabriel dos Reis, Olivier Giroux, Caleb Sunstrum, Alisdair Meredith, Nathan Myers, Bjarne Stroustrup, Nevin Liber, JF Bastien, Paul McKenney, Maged Michael, and others for their comments on the first revision of this paper.