<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Fri, 9 Feb 2018 at 13:08, Myria <<a href="mailto:myriachan@gmail.com">myriachan@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is it worth mentioning that an implementation may have other<br>
mechanisms that create storage in the manner of malloc? For example,<br>
it'd make sense for VirtualAlloc or mmap to create implicit objects<br>
just like the standard malloc functions.<br></blockquote><div><br></div><div>Sounds like a good change to me.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
In terms of pointer arithmetic, what becomes defined as a result?<br>
Does the following evil code work?<br>
<br>
struct X { int a; int b; int c; };<br>
int does_this_return_4() {<br>
alignas(X) std::byte s[sizeof(X)];<br>
X *x = reinterpret_cast&lt;X *&gt;(s);<br>
x->c = 4;<br>
return (&x->a)[2];<br>
}<br></blockquote><div><br></div><div>No objects are created between the line referencing x->c and the line referencing x->a. For the (&x->a)[2] line to be valid, there must be an array of ints within its lifetime containing at least three ints starting at &x->a. For the x->c line to be valid, there must be an X object within its lifetime whose member c is the int object named by x->c. So, if the function has defined behavior, we can conclude that the X object and the int[3] object have overlapping storage and lifetime, which means one of those objects must be nested within the other. But we know that neither can be a subobject of the other (both objects only have subobjects of type `int`), and neither provides storage for the other (neither transitively contains an array of char-like type). So we arrive at a contradiction and can conclude that the behavior must be undefined.</div><div><br></div><div>I'd note that if we want optimizations similar to GCC's and Clang's path-sensitive type-based alias analysis (TBAA) to be valid (in particular, if we want to conclude that a store to a "c member of X" cannot alias a load of "index >= 2 of int[]"), the above must be UB. As usual, there's a balance to be had here between allowing evil-but-potentially-meaningful code and allowing useful-but-potentially-overly-aggressive optimizations.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
The below is a silly idea I had that is admittedly extreme, but would<br>
preserve a lot of type-based alias analysis.<br>
<br>
A simpler memory model that might work is defining memory as scattered<br>
arrays of bytes: Each byte has metadata specifying either "none" or a<br>
non-byte (non-char/unsigned char/std::byte) scalar type and an offset<br>
into that scalar type. Writing to a non-byte scalar changes the<br>
metadata for those bytes to the type that is written. Reading scalars<br>
as a non-byte type requires that all bytes have either "none" or a<br>
"compatible" type with incrementally-increasing offsets starting at 0.<br>
Writing byte types sets the type of those bytes to "none". Any byte<br>
may be read as a byte type. Pointer arithmetic would be valid so long<br>
as the pointer does not cross the byte array that was allocated,<br>
except that a pointer may point one past the end of such a byte array.<br>
<br>
Classes in this model would not factor into the type system at all;<br>
instead, for purposes of the memory model, members of a class would<br>
just be offsets into the byte array representing the class instance.<br>
This would preserve such semantics as reading sockaddr::sa_family from<br>
what was written as sockaddr_in6::sin6_family. It would also allow a<br>
lot of other shenanigans we probably don't want to encourage.<br></blockquote><div><br></div><div>That certainly seems to allow all the programs I could imagine wanting to allow, and does still allow simple scalar TBAA (but not path-sensitive TBAA, nor more sophisticated optimizations such as narrowing the accessible portion of an object based on access path). The above is spiritually pretty similar to C's "effective type" rule.</div><div><br></div><div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">My general goal with this sequence of papers (p0137 and now p0593) has been to try to reduce the grey area between "clearly valid" and "clearly UB", into which both well-intentioned programs and well-intentioned optimizers and static and dynamic analysis tools often tread, down to a much finer dividing line that can be reasonably explained and understood, with escape hatches where necessary so people can still express what they need to express. 
But I think the above approach strays a bit too far towards permissiveness -- most people don't write evil code that needs that kind of rule most of the time, and setting the rule up that way means that most code will be paying for flexibility it doesn't need, violating a fundamental tenet of C++.</div></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">I think we probably do want additional language support to make sockaddr's shenanigans work (leaving this to the implementation to sort out doesn't seem like the best approach, although it might be tempting). I personally don't have a solid idea of what shape that would take, though. And I think it is reasonable to expect the definition of sockaddr to be changed to support this (perhaps adding some annotation or attribute).</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Melissa<br>
<br>
On Fri, Feb 9, 2018 at 11:52 AM, Richard Smith <<a href="mailto:richardsmith@google.com" target="_blank">richardsmith@google.com</a>> wrote:<br>
> Hi all,<br>
><br>
> Please find attached a revised version of P0593 based on the excellent<br>
> discussion and feedback at the Albuquerque meeting. Please let me know if<br>
> you have any comments; I believe our plan was to discuss this again at<br>
> Jacksonville, and all being well, to forward it to EWG at that meeting.<br>
><br>
> Best regards,<br>
> Richard<br>
><br>
> _______________________________________________<br>
> ub mailing list<br>
> <a href="mailto:ub@isocpp.open-std.org" target="_blank">ub@isocpp.open-std.org</a><br>
> <a href="http://www.open-std.org/mailman/listinfo/ub" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/ub</a><br>
><br>
</blockquote></div></div>
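<div>Returning to the does_this_return_4 example discussed above: as a rough sketch (not taken from P0593 itself), here are a few ways the same value could be read without requiring an X object and an int[3] object to share storage. The function names via_member, via_bytes and via_array are made up for illustration, and the explicit placement new is an assumption chosen so that the sketch does not depend on implicit object creation.</div>

```cpp
#include <cstddef>  // std::byte, offsetof
#include <cstring>  // std::memcpy
#include <new>      // placement new

struct X { int a; int b; int c; };

// Variant 1: an X object lives in the buffer, and c is reached
// through the X, never by indexing from &x->a.
int via_member() {
    alignas(X) std::byte s[sizeof(X)];
    X* x = ::new (static_cast<void*>(s)) X{};
    x->c = 4;
    return x->c;
}

// Variant 2: an X object lives in the buffer, and the bytes of c are
// copied out of the underlying byte array at the offset of the member.
int via_bytes() {
    alignas(X) std::byte s[sizeof(X)];
    X* x = ::new (static_cast<void*>(s)) X{};
    x->c = 4;
    int out;
    std::memcpy(&out, s + offsetof(X, c), sizeof out);
    return out;
}

// Variant 3: the object really is an int[3], so indexing from the
// first element stays within one array.
int via_array() {
    int a[3] = {};
    a[2] = 4;
    return (&a[0])[2];
}
```

<div>Each variant commits to a single answer to "which object lives here?", which, as far as I can tell, is what the nesting argument above requires. All three are expected to return 4.</div>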