[ub] Justification for < not being a total order on pointers?

Jeffrey Yasskin jyasskin at google.com
Tue Aug 27 17:48:50 CEST 2013


Thanks for chiming in!

On Tue, Aug 27, 2013 at 1:00 AM, Xavier Leroy <Xavier.Leroy at inria.fr> wrote:
>> [Semantics of pointer comparisons]
>> Xavier Leroy, the author of CompCert, and I have had long discussions
>> about this about a year ago.  He is on the list, so he will add
>> clarifications or corrections when he gets a chance.  Last time (last
>> Spring) we talked again about this, he said he has made some adjuments,
>> but I don't recall the details.
>
> Yes, I'll chime in.  Just a bit of context for the members of this
> list: as part of the formal verification of the CompCert compiler, I
> and others have been giving formal semantics to a large subset of ISO
> C, and (with Tahina Ramananandro and Gabriel Dos Reis) to some aspects
> of C++ too.
>
> >From the standpoint of formal semantics, it is easier to account for
> undefined behaviors (when everything can happen, incl. crashing) than
> for defined-but-unspecified behaviors (when several but not all
> outcomes are possible).  And of course accounting for
> defined-and-uniquely-specified behaviors is easier too.
>
> The example that Gabriel was alluding to is equality tests between a
> pointer "one past" the end of an object and a pointer to the beginning
> of another object.  In C and C++, if I'm not mistaken, this is defined
> behavior but can return true or false at the whim of the
> implementation.  This nondeterminism was difficult to account for in
> CompCert's semantics, so in the end we treat it as undefined behavior,
> while still defining comparisons involving pointers "one past" as long
> as the pointers are within the same object.
>
> Coming back to the proposal of making "<" a total (but unspecified
> order on pointers):
>
> - As a semanticist, I think it would be challenging to formalize, for
>   the general reasons given above.

How do you model std::less<T*>, which already requires this total but
unspecified order?

> - As a programmer, I see how this could be very useful to implement
>   various generic data structures, or to choose a "canonical"
>   representative between several semantically-equivalent data structures.
>
> - As a compiler writer, I can see a few useful optimizations that this
>   change would invalidate.  Here is an example:
>
>       int x[2];
>       int * p;
>       ...
>       if (p <= x + 1) {...} else {...}
>
>   With the current semantics, "p <= x + 1" is only defined if p points
>   within x, so it is equivalent to "p != x + 2", which gives useful
>   aliasing information in the two branches of the conditional.
>   This is a contrived example, though.
>
> - As an amateur of garbage-collected languages, I see a problem with
>   relocating garbage collection.  (C++ doesn't require a GC, but it is
>   my understanding that the C++ committee is trying to make C++
>   compatible with the use of a GC.)  A relocating GC can move objects
>   around, changing their absolute addresses.  So, the unspecified
>   total order on pointers could change during the execution of the
>   program: before a GC, we have "p < q"; then q is moved, and then
>   "p > q".  This would make the total order unusable to
>   e.g. order binary search trees...


More information about the ub mailing list