[ub] Type punning to avoid copying

Ion Gaztañaga igaztanaga at gmail.com
Tue Jul 30 01:03:04 CEST 2013


On 28/07/2013 18:44, Gabriel Dos Reis wrote:

> We shouldn't be doing anything in rash.

Of course, it was just an idea to see whether it pointed in the right 
direction. What I propose probably won't make sense to anyone with some 
knowledge of how compilers work and of the impact this kind of 
suggestion could have on optimizers. Please take my comments as 
questions from a library writer who wants to understand the problem 
from the compiler's point of view.

> Also, I think we shouldn't be doing anything that attempts to circumvent
> constructors.  This issue has very subtle aspects.

Although I know I'm repeating myself, I still can't see how constructors 
will solve the problem, at least if we want to standardize the behavior 
all current compilers have in such use cases. For example, today we can 
(even at maximum optimization levels) construct an object in shared 
memory in one process, map it in another, and just work with it. Is 
there a need for an explicit "pseudo-constructor" (one that shouldn't 
touch the bytes of the object representation) in the second process when 
the "real" constructor was already called in the first process?
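
To make the scenario concrete, here is a minimal single-process sketch 
of what I understand to be the portable alternative today: copying the 
bytes out with std::memcpy instead of casting the storage pointer. The 
names Message, Segment, publish and read_copy are hypothetical, and the 
byte array only stands in for a real shared-memory mapping. This is the 
copy that the cast is meant to avoid.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical payload type (not from the original post). It is trivially
// copyable, so copying its object representation with memcpy is well-defined.
struct Message {
    std::uint32_t id;
    std::uint32_t length;
};
static_assert(std::is_trivially_copyable<Message>::value,
              "memcpy requires a trivially copyable type");

// Stand-in for a shared-memory segment. Assumption: a real program would
// obtain these bytes from an OS mapping facility instead.
struct Segment {
    alignas(Message) unsigned char bytes[sizeof(Message)];
};

// "First process": runs the real constructor inside the raw storage.
inline void publish(Segment& seg, std::uint32_t id, std::uint32_t length) {
    // The storage must be suitably aligned for placement new.
    assert(reinterpret_cast<std::uintptr_t>(seg.bytes) % alignof(Message) == 0);
    ::new (static_cast<void*>(seg.bytes)) Message{id, length};
}

// "Second process": copies the bytes into a local Message rather than
// reinterpreting the storage, at the cost of one copy.
inline Message read_copy(const Segment& seg) {
    Message out;
    std::memcpy(&out, seg.bytes, sizeof out);
    return out;
}
```

Calling publish(seg, 7, 128) and then read_copy(seg) yields a Message 
with id 7 and length 128. For a large object this extra copy is exactly 
the cost a library writer would like to skip.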

I suggested "reinterpret_cast" because semantically it's close to what 
we want to achieve: an indication to the compiler that it should 
"reinterpret" that storage as a new, fully working object, bypassing 
standard constructor/destructor lifetime semantics. The object was 
"externally" modified.

Note that the object can be destroyed without the compiler's knowledge 
(another process just calls the destructor on the storage), and another 
type of object could be constructed in that storage. A compiler can't 
know how that object was initialized or when it was destroyed; only the 
programmer knows those details.

Even if (according to my poor suggestion) a reinterpret_cast of 
*pointers* changes the lifetime of *objects* (from the compiler's point 
of view; from the whole system's point of view, the lifetime of the 
object could be different), that shouldn't be a problem if the rules are 
clear for both the compiler and the programmer:

///...

1) reinterpret_cast<MyClass*>(&raw_storage);

///Some code
///...

2) reinterpret_cast<MyClass2*>(&raw_storage);

The compiler is instructed so that, starting from line 2), it should 
behave as if the object initialized in 1) was destroyed and another 
object was created in the same place. The precise time this happened is 
not important for the compiler, as the programmer will take care not to 
access the object through the pointer obtained in 1) while the 
destruction+mutation operation is being performed. This can be achieved 
with just the usual flow control guarantees (like in the network 
processing example, when calling something like "get_packet" in the 
loop) or, in the case of shared memory, using synchronization primitives.
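
For comparison, this is a sketch of how the same lifetime transition is 
spelled explicitly within one process today: a destructor call followed 
by placement new. The definitions of MyClass and MyClass2 and the helper 
names RawStorage, begin_as_first and switch_to_second are assumptions 
made only for illustration. Unlike the cast I suggest, the placement new 
here does write the bytes of the new object.

```cpp
#include <cassert>
#include <new>

// Hypothetical definitions; the text above only names MyClass and MyClass2.
struct MyClass  { int value; };
struct MyClass2 { float ratio; };

// Raw storage large enough and aligned for either type.
struct RawStorage {
    alignas(MyClass) alignas(MyClass2)
    unsigned char bytes[sizeof(MyClass) > sizeof(MyClass2) ? sizeof(MyClass)
                                                           : sizeof(MyClass2)];
};

// Corresponds to step 1): begin the lifetime of a MyClass in the storage.
inline MyClass* begin_as_first(RawStorage& s, int v) {
    return ::new (static_cast<void*>(s.bytes)) MyClass{v};
}

// Corresponds to step 2): end the MyClass lifetime and begin a MyClass2 in
// the same bytes. The pointer from step 1) must not be used after this call.
inline MyClass2* switch_to_second(RawStorage& s, MyClass* old, float r) {
    // The old object is expected to live at the start of the storage.
    assert(static_cast<void*>(old) == static_cast<void*>(s.bytes));
    old->~MyClass();
    return ::new (static_cast<void*>(s.bytes)) MyClass2{r};
}
```

The compiler sees both lifetime boundaries here because they happen in 
the code it compiles. The open question is how to give it the same 
information when the destruction and construction happen elsewhere.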

If the reinterpret_cast were repeated to obtain a pointer to the same 
"object":

///...

1) reinterpret_cast<const MyClass*>(&raw_storage);

///Some code
///...

///Note that the same cast as 1) is applied again
2) reinterpret_cast<const MyClass*>(&raw_storage);

///Some other code
///...

The compiler would think that the object in 1) has been destroyed and 
rebuilt in 2) even if externally no object was modified. That could hurt 
performance a bit, as the compiler must be conservative and can't assume 
that the value of the object has not changed. Programmers writing such 
code can avoid redundant reinterpret_casts if lifetime semantics are 
clear.

> Again, I would plead that we approach the issue and possible
> resolutions, not as an arms race against optimizers.  Today we are
> worrying about program transformations that can dispense with certain
> operations ("optimizations"), tomorrow we will worry about safety;
> after tomorrow, security; etc.

I agree. However, if it's possible, legalizing current practice (or at 
least a subset that guarantees current performance + usability) should 
be tried. I hope compiler experts in this group could at least explain 
why the current UB practice still works as expected and what kind of 
rules compilers follow to detect such UB practice.

Best,

Ion

