<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 12, 2019, at 8:09 AM, Peter Sewell &lt;<a href="mailto:Peter.Sewell@cl.cam.ac.uk" class="">Peter.Sewell@cl.cam.ac.uk</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi all,<br class=""><br class="">perhaps I can reset this discussion, which got bogged down in largely<br class="">independent questions about pointer equality, by summarising the<br class="">basic provenance proposal. Try this below, extracted from n2363.<br class="">Comments on this?<br class=""><br class="">best,<br class="">Peter<br class=""><br class=""><br class=""><br class="">C pointer values are typically represented at runtime as simple<br class="">concrete numeric values, but mainstream compilers routinely exploit<br class="">information about the "provenance" of pointers to reason that they<br class="">cannot alias, and hence to justify optimisations. This is<br class="">long-standing practice, but exactly what it means (what programmers<br class="">can rely on, and what provenance-based alias analysis is allowed to<br class="">do) has never been nailed down. That's what the proposal does.<br class=""><br class=""><br class="">The basic idea is to associate a *provenance* with every pointer<br class="">value, identifying the original storage instance (or allocation, in<br class="">other words) that the pointer is derived from. 
In more detail:<br class=""><br class="">- We take abstract-machine pointer values to be pairs (pi,a), adding a<br class=""> provenance pi, either @i where i is a storage instance ID, or the<br class=""> *empty* provenance, to their concrete address a.<br class=""><br class="">- On every storage instance creation (of objects with static, thread,<br class=""> automatic, and allocated storage duration), the abstract machine<br class=""> nondeterministically chooses a fresh storage instance ID i (unique<br class=""> across the entire execution), and the resulting pointer value<br class=""> carries that single storage instance ID as its provenance @i.<br class=""><br class="">- Provenance is preserved by pointer arithmetic that adds or subtracts<br class=""> an integer to a pointer.<br class=""><br class="">- At any access via a pointer value, its numeric address must be<br class=""> consistent with its provenance, with undefined behaviour<br class=""> otherwise. In particular:<br class=""><br class=""> -- access via a pointer value which has provenance a single storage<br class=""> instance ID @i must be within the memory footprint of the<br class=""> corresponding original storage instance, which must still be<br class=""> live.<br class=""><br class=""> -- all other accesses, including those via a pointer value with<br class=""> empty provenance, are undefined behaviour.<br class=""><br class="">Regarding such accesses as undefined behaviour is necessary to make<br class="">optimisation based on provenance alias analysis sound: if the standard<br class="">did define behaviour for programs that make provenance-violating<br class="">accesses, e.g. by adopting a concrete semantics, optimisation based on<br class="">provenance-aware alias analysis would not be sound. 
In other words,<br class="">the provenance lets one distinguish a one-past pointer from a pointer<br class="">to the start of an adjacently-allocated object, which otherwise are<br class="">indistinguishable.<br class=""><br class="">All this is for the C abstract machine as defined in the standard:<br class="">compilers might rely on provenance in their alias analysis and<br class="">optimisation, but one would not expect normal implementations to<br class="">record or manipulate provenance at runtime (though dynamic or static<br class="">analysis tools might).<br class=""><br class=""><br class="">Then, to support low-level systems programming, C provides many other<br class="">ways to construct and manipulate pointer values:<br class=""><br class="">- casts of pointers to integer types and back, possibly with integer<br class=""> arithmetic, e.g. to force alignment, or to store information in<br class=""> unused bits of pointers;<br class=""><br class="">- copying pointer values with memcpy;<br class=""><br class="">- manipulation of the representation bytes of pointers, e.g. via user<br class=""> code that copies them via char* or unsigned char* accesses;<br class=""><br class="">- type punning between pointer and integer values;<br class=""><br class="">- I/O, using either fprintf/fscanf and the %p format, fwrite/fread on<br class=""> the pointer representation bytes, or pointer/integer casts and<br class=""> integer I/O;<br class=""><br class="">- copying pointer values with realloc; and<br class=""><br class="">- constructing pointer values that embody knowledge established from<br class=""> linking, and from constants that represent the addresses of<br class=""> memory-mapped devices.<br class=""><br class=""><br class="">A satisfactory semantics has to address all these, together with the<br class="">implications on optimisation. 
We've explored several, but our main<br class="">proposal is "PNVI-ae-udi" (provenance not via integers,<br class="">address-exposed, user-disambiguation).<br class=""><br class="">This semantics does not track provenance via integers. Instead, at<br class="">integer-to-pointer cast points, it checks whether the given address<br class="">points within a live object that has previously been *exposed* and, if<br class="">so, recreates the corresponding provenance.<br class=""><br class="">A storage instance is deemed exposed by a cast of a pointer to it to<br class="">an integer type, by a read (at non-pointer type) of the representation<br class="">of the pointer, or by an output of the pointer using %p.<br class=""><br class="">The user-disambiguation refinement adds some complexity but supports<br class="">roundtrip casts, from pointer to integer and back, of pointers that<br class="">are one-past a storage instance.<br class=""></div></div></blockquote><div><br class=""></div><div>Thanks for that summary.</div><div><br class=""></div><div>I _think_ that that also covers XOR linked lists, right? 
(<a href="https://en.wikipedia.org/wiki/XOR_linked_list" class="">https://en.wikipedia.org/wiki/XOR_linked_list</a>)</div><div><br class=""></div><div><span class="Apple-tab-span" style="white-space:pre">        </span>Daveed</div><br class=""><blockquote type="cite" class=""><div class=""><div class=""><br class=""><br class=""><br class=""><br class="">On 02/04/2019, Peter Sewell &lt;<a href="mailto:Peter.Sewell@cl.cam.ac.uk" class="">Peter.Sewell@cl.cam.ac.uk</a>&gt; wrote:<br class=""><blockquote type="cite" class="">Dear UB folk,<br class=""><br class="">continuing the discussion from last year at EuroLLVM, the GNU Tools<br class="">Cauldron,<br class="">and elsewhere, we (the WG14 C memory object model study group) now<br class="">have a detailed proposal for pointer provenance semantics, refining<br class="">the "provenance not via integers (PNVI)" model presented there.<br class="">This will be discussed at the ISO WG14 C standards committee at the<br class="">end of April, and comments from the community before then would<br class="">be very welcome. The proposal reconciles the needs of existing code<br class="">and the behaviour of existing compilers as well as we can, but it doesn't<br class="">exactly match any of the latter, so we'd especially like to know whether<br class="">it would be feasible to implement - our hope is that it would only require<br class="">minor changes. It's presented in three documents:<br class=""><br class="">N2362 Moving to a provenance-aware memory model for C: proposal for C2x<br class="">by the memory object model study group. Jens Gustedt, Peter Sewell,<br class="">Kayvan Memarian, Victor B. F. 
Gomes, Martin Uecker.<br class="">This introduces the proposal and gives the proposed change to the standard<br class="">text, presented as change-highlighted pages of the standard<br class="">(though one might want to read the N2363 examples before going into that).<br class=""><a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf" class="">http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf</a><br class=""><br class="">N2363 C provenance semantics: examples.<br class="">Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin<br class="">Uecker.<br class="">This explains the proposal and its design choices with discussion of a<br class="">series of examples.<br class="">http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf<br class=""><br class="">N2364 C provenance semantics: detailed semantics.<br class="">Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.<br class="">This gives a detailed mathematical semantics for the proposal<br class="">http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf<br class=""><br class="">In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an<br class="">executable version of the semantics, with a web interface that<br class="">allows one to explore and visualise the behaviour of small test<br class="">programs, stepping through and seeing the abstract-machine<br class="">memory state including provenance information. 
N2363 compares<br class="">the results of this for the example programs with gcc, clang, and icc<br class="">results, though the tests are really intended as tests of the semantics<br class="">rather than compiler tests, so one has to interpret this with care.<br class=""><br class="">best,<br class="">Peter (for the study group)<br class=""><br class=""></blockquote>_______________________________________________<br class="">ub mailing list<br class=""><a href="mailto:ub@isocpp.open-std.org" class="">ub@isocpp.open-std.org</a><br class="">http://www.open-std.org/mailman/listinfo/ub<br class=""></div></div></blockquote></div><br class=""></body></html>