ISO/IEC JTC1 SC22 WG21 N3445 = 12-0135 - 2012-09-23
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Introduction
Background
Problem
Approaches
Solution
Either Or
Undefined Behavior
Compiler Dealiasing
Discussion
References
Efficiency and expressiveness are hallmarks of C++, but sometimes those hallmarks conflict in ways that force programmers to compromise on one or the other. The progammer's choice of passing a given argument by const reference or by value is one of those compromises. That compromise has become more of a problem with modern calling conventions.
In this paper, we describe the problem, discuss possible approaches to reduce the problem, and explore a solution that introduces a new language feature.
Consider the following code for some class type
.
extern type va1(type input);
extern type va2(type input);
void vf1(type& output, type input) {
output += va1(input);
output += va2(input);
}
Passing class types by value is expensive, as the compiler must often
In addition to the expenses enumerated above, there is additional expense in reading the bytes because aliasing prevents eliminating some of those indirect accesses, and because there is increased memory cache pressure.
To avoid this expense, the traditional response in C is to pass the type "by pointer". However, in C++, the type can be passed by const reference, which provides both the syntactic convenience of pass-by-value and the efficiency of pass-indirectly.
The performance difference, coupled with the convenience,
has resulted in an automatic tendency of programmers
to pass classes by const reference.
The class template complex
(26.4 Complex numbers [complex.numbers])
is a prime example of this tendency.
Unfortunately, the semantics are not the same, as there is now the potential for a parameter to alias with other objects. In the following rewrite of the above example, the first statement may modify the input parameter, thus introducing an error into the code at the second statement.
extern type ra1(const type& input);
extern type ra2(const type& input);
void rf1(type& output, const type& input) {
output += ra1(input);
output += ra2(input);
}
Programmers generally have three responses.
- Ignore aliasing.
As actual aliasing is uncommon, programmers often forget to consider it, or decide that it will "never happen".
- Document aliasing.
The programmer of a function may document when arguments may not alias. In C, the
restrict
type qualifier [C11restrict] provides this documentation. In essence, this approach gives up on trying to match the operational semantics of primitive scalar types.- Overcome aliasing.
Programmer of a function may write extra code to overcome aliasing. This response eases the burden on the calling programmer at the expense of some loss in efficiency.
The simplest technique to overcoming aliasing is to copy the potentially aliasing parameter.
void rf2(type& output, const type& input) {
type temp = input;
output += rf1(temp);
output += rf2(temp);
}
This technique is both more complex and less efficient than simply passing the parameter by value. While the technique can be useful when dealing with legacy interfaces, it should not be a primary technique.
The next technique is to detect the aliasing and copy the parameter only when necessary.
void rf3(type& output, const type& input) {
if ( &output == &input ) {
type temp = input;
output += ra1(temp);
output += ra2(temp);
} else {
output += ra1(input);
output += ra2(input);
}
}
This technique introduces a comparison and may introduce an instruction pipeline bubble. For small classes, this overhead may exceed the cost of simply copying the class.
The next technique is to write the function in two phases. The first phase reads from parameters and writes only to temporaries, not potentially aliasable objects. The second phase writes results. This technique may not be possible if subordinate function calls mutate arguments.
void rf4(type& output, const type& input) {
type temp1 = ra1(input);
type temp2 = ra2(input);
output += temp1;
output += temp2;
}
This technique requires allocating more local storage, which puts pressure on the memory cache.
One can also mix the last two techniques.
void rf5(type& output, const type& input) {
if ( &output == &input ) {
type temp1 = ra1(input);
type temp2 = ra2(input);
output += temp1;
output += temp2;
} else {
output += ra1(input);
output += ra2(input);
}
}
The value in this technique depends on several attributes of the platform.
The latest generation of calling conventions will pass small classes in registers, provided that those classes have a trivial copy constructor and trivial destructor. (This convention was often introduced with the introduction of 64-bit addresses, such as AMD64 [AMD64abi], IA-64 [IA64abi], and SPARC V9 [SparcV9abi].) Thus small classes can have value parameter performance near that of primitive scalar types. Just as importantly, aliasing for such parameters is not possible, and thus they are safer.
The problem is that a programmer writing portable software (or defining widely used interfaces) cannot know whether the software will run on an old calling convention or on a new one. Furthermore, the programmer cannot know what constitutes 'small'. Thus, the programmer has no choice but to pessimize some platforms.
Small classes are quite common. They are used for handles, numbers, coordinates, etc. These classes are often critical to overall application performance. Being unable to write efficient portable programs for these classes is a significant problem.
There are at least three approaches to reducing the problem.
- Pass small classes by value.
We can change our habits to pass trivially copyable classes of sixteen bytes or less by value. With this approach, complex numbers would primarily be passed by value, not by reference as they are in the class template
complex
(26.4 Complex numbers [complex.numbers]). This approach avoids aliasing issues, enables removing some indirect references when accessing parameters, but may introduce copying on older platforms. The net performance change is unclear.- Add the
restrict
qualifier to C++.The
restrict
qualifier would at least enable the compiler to remove some indirect references when accessing reference parameters. This approach effectively requires the calling programmer to avoid aliasing. It also does not take full advantage of modern calling conventions.- Add another language feature.
Adding a more carefully targeted language feature may make code less sensitive to the platform and enable compilers to better optimize the program.
As the first two approaches are relatively well understood, we consider only the third approach in detail.
Our solution is to introduce a syntax for 'input' parameters. These parameters give the compiler the choice of passing the parameter by reference or by value. The Ada programming languages uses this solution for its parameter modes [AdaLRMparam] [AdaRDparam].
Pragmatically, the choice of how to pass such paramters will be defined by some combination of the platform ABI [AMD64abi] [IA64abi] [SparcV9abi] and the C++ ABI [ItaniumC++abi] [SparcC++abi].
As a strawman, we propose a ptr-operator of a declarator of the form
- ptr-operator::
|
declaration attribute-specifier-seqopt
but with the additional constraints
that the const
qualifier must be present
and that the operator may only be used on parameters.
Example vf1
becomes:
extern type oa1(const type| input);
extern type oa2(const type| input);
void of1(type& output, const type| input) {
output += oa1(input);
output += oa2(input);
}
This example works well when the 'input' parameter is actually passed by value. However, when passed by reference, aliasing issues arise and the feature needs detailed semantics for aliasing. We discuss various choices in these semantics below.
One semantics choice is that the parameter is either exactly const reference or exactly (const) value. The programmer may assume only the intersection of guarantees.
Under this choice,
overcoming aliasing with the technique
used in rf3
has the virtue that if pass-by-value is chosen,
then the condition becomes statically false,
and the condition and corresponding dead code may be eliminated.
There is no loss in efficiency for the fast case.
Another choice is to make aliasing undefined behavior. This approach has been used, for example, in Fortran66 [Fortran66args], Ada83 [AdaLRMparam] [AdaRDparam], and C11 [C11restrict]. Taking this approach in C++ would not be novel.
In practice, read-read aliasing causes no trouble, so we only need to make write-write and read-write aliasing undefined. Going further, writes to mutable members that do not change the abstract state of an object produce well-defined behavior.
Fortran, Ada, and C make the prohibition on aliases a global program property, which makes it an incomputable property. In the worst case, this approach could lead to many latent bugs. However, the approach has proven workable in practice because programmers need only follow a few rules to avoid the problem. Calling programmers ensure that arguments do not alias each other. Called programmers ensure that the functions do not access objects accessible by callers.
While in this choice programmer are ultimately responsible for avoiding aliasing, static analysis tools and runtime checks can programmers substantially.
We can choose to require that the compiler dealias the arguments. This dealiasing is really only feasible for aliasing between parameters. We must rely on programmers to avoid aliases on non-local objects.
Only this choice tries to preserve the illusion that operations on classes are the same as operations on primitive types.
There are a few strategies in implementing the dealiasing.
- Detect in Callers
The compiler does alias analysis at the call site and copies arguments that it cannot prove are unaliased. Many aliases can be eliminated a-priori because temporary values cannot be aliased. The down side is that alias checking is potentially O(n^2).
- Detect in Callees
The compiler does conflict analysis in the function implementation and copies parameters that it cannot prove are do not conflict. This approach limits dealiasing to the function body, rather than the more numerous function call sites. The down side is that alias checking is potentially O(n^2).
- Leak Information
The compiler can leak information about conflicts from the callee to the caller, enabling a joint detection in both callers and callee. Then the caller can make minimal copies, only those at the intersection of aliases and conflicts. The down side is that alias checking is potentially O(n^2). While this strategy works well in source-based environments, it works less well on systems exploiting ABIs. (Thanks to Chandler Carruth).
- Multiple Implementations
The compiler can produce multiple instances of the function, each representing protecting against some subset of conflicts. At the call site, the compiler matches the aliases against the versions of the callee and chooses to call the one needing the least number of dealiasing copies. (Thanks to Chandler Carruth).
Suppose we choose to adopt both of the first two approaches above —
pass small classes by value and add the restrict
qualifier.
What do we then loose by
not adopting the 'input' parameter solution presented?
restrict