Pass by Const Reference or Value

ISO/IEC JTC1 SC22 WG21 N3445 = 12-0135 - 2012-09-23

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

Introduction
Background
Problem
Approaches
Solution
    Either Or
    Undefined Behavior
    Compiler Dealiasing
Discussion
References

Introduction

Efficiency and expressiveness are hallmarks of C++, but sometimes those hallmarks conflict in ways that force programmers to compromise on one or the other. The progammer's choice of passing a given argument by const reference or by value is one of those compromises. That compromise has become more of a problem with modern calling conventions.

In this paper, we describe the problem, discuss possible approaches to reduce the problem, and explore a solution that introduces a new language feature.

Background

Consider the following code for some class type.

extern type va1(type input);
extern type va2(type input);

void vf1(type& output, type input) {
    output += va1(input);
    output += va2(input);
}

Passing class types by value is expensive, as the compiler must often

In addition to the expenses enumerated above, there is additional expense in reading the bytes because aliasing prevents eliminating some of those indirect accesses, and because there is increased memory cache pressure.

To avoid this expense, the traditional response in C is to pass the type "by pointer". However, in C++, the type can be passed by const reference, which provides both the syntactic convenience of pass-by-value and the efficiency of pass-indirectly.

The performance difference, coupled with the convenience, has resulted in an automatic tendency of programmers to pass classes by const reference. The class template complex (26.4 Complex numbers [complex.numbers]) is a prime example of this tendency.

Unfortunately, the semantics are not the same, as there is now the potential for a parameter to alias with other objects. In the following rewrite of the above example, the first statement may modify the input parameter, thus introducing an error into the code at the second statement.

extern type ra1(const type& input);
extern type ra2(const type& input);

void rf1(type& output, const type& input) {
    output += ra1(input);
    output += ra2(input);
}

Programmers generally have three responses.

Ignore aliasing.

As actual aliasing is uncommon, programmers often forget to consider it, or decide that it will "never happen".

Document aliasing.

The programmer of a function may document when arguments may not alias. In C, the restrict type qualifier [C11restrict] provides this documentation. In essence, this approach gives up on trying to match the operational semantics of primitive scalar types.

Overcome aliasing.

Programmer of a function may write extra code to overcome aliasing. This response eases the burden on the calling programmer at the expense of some loss in efficiency.

The simplest technique to overcoming aliasing is to copy the potentially aliasing parameter.

void rf2(type& output, const type& input) {
    type temp = input;
    output += rf1(temp);
    output += rf2(temp);
}

This technique is both more complex and less efficient than simply passing the parameter by value. While the technique can be useful when dealing with legacy interfaces, it should not be a primary technique.

The next technique is to detect the aliasing and copy the parameter only when necessary.

void rf3(type& output, const type& input) {
    if ( &output == &input ) {
        type temp = input;
        output += ra1(temp);
        output += ra2(temp);
    } else {
        output += ra1(input);
        output += ra2(input);
    }
}

This technique introduces a comparison and may introduce an instruction pipeline bubble. For small classes, this overhead may exceed the cost of simply copying the class.

The next technique is to write the function in two phases. The first phase reads from parameters and writes only to temporaries, not potentially aliasable objects. The second phase writes results. This technique may not be possible if subordinate function calls mutate arguments.

void rf4(type& output, const type& input) {
    type temp1 = ra1(input);
    type temp2 = ra2(input);
    output += temp1;
    output += temp2;
}

This technique requires allocating more local storage, which puts pressure on the memory cache.

One can also mix the last two techniques.

void rf5(type& output, const type& input) {
    if ( &output == &input ) {
        type temp1 = ra1(input);
        type temp2 = ra2(input);
        output += temp1;
        output += temp2;
    } else {
        output += ra1(input);
        output += ra2(input);
    }   
}

The value in this technique depends on several attributes of the platform.

Problem

The latest generation of calling conventions will pass small classes in registers, provided that those classes have a trivial copy constructor and trivial destructor. (This convention was often introduced with the introduction of 64-bit addresses, such as AMD64 [AMD64abi], IA-64 [IA64abi], and SPARC V9 [SparcV9abi].) Thus small classes can have value parameter performance near that of primitive scalar types. Just as importantly, aliasing for such parameters is not possible, and thus they are safer.

The problem is that a programmer writing portable software (or defining widely used interfaces) cannot know whether the software will run on an old calling convention or on a new one. Furthermore, the programmer cannot know what constitutes 'small'. Thus, the programmer has no choice but to pessimize some platforms.

Small classes are quite common. They are used for handles, numbers, coordinates, etc. These classes are often critical to overall application performance. Being unable to write efficient portable programs for these classes is a significant problem.

Approaches

There are at least three approaches to reducing the problem.

Pass small classes by value.

We can change our habits to pass trivially copyable classes of sixteen bytes or less by value. With this approach, complex numbers would primarily be passed by value, not by reference as they are in the class template complex (26.4 Complex numbers [complex.numbers]). This approach avoids aliasing issues, enables removing some indirect references when accessing parameters, but may introduce copying on older platforms. The net performance change is unclear.

Add the restrict qualifier to C++.

The restrict qualifier would at least enable the compiler to remove some indirect references when accessing reference parameters. This approach effectively requires the calling programmer to avoid aliasing. It also does not take full advantage of modern calling conventions.

Add another language feature.

Adding a more carefully targeted language feature may make code less sensitive to the platform and enable compilers to better optimize the program.

As the first two approaches are relatively well understood, we consider only the third approach in detail.

Solution

Our solution is to introduce a syntax for 'input' parameters. These parameters give the compiler the choice of passing the parameter by reference or by value. The Ada programming languages uses this solution for its parameter modes [AdaLRMparam] [AdaRDparam].

Pragmatically, the choice of how to pass such paramters will be defined by some combination of the platform ABI [AMD64abi] [IA64abi] [SparcV9abi] and the C++ ABI [ItaniumC++abi] [SparcC++abi].

As a strawman, we propose a ptr-operator of a declarator of the form

ptr-operator::
| declaration attribute-specifier-seqopt

but with the additional constraints that the const qualifier must be present and that the operator may only be used on parameters.

Example vf1 becomes:

extern type oa1(const type| input);
extern type oa2(const type| input);

void of1(type& output, const type| input) {
    output += oa1(input);
    output += oa2(input);
}

This example works well when the 'input' parameter is actually passed by value. However, when passed by reference, aliasing issues arise and the feature needs detailed semantics for aliasing. We discuss various choices in these semantics below.

Either Or

One semantics choice is that the parameter is either exactly const reference or exactly (const) value. The programmer may assume only the intersection of guarantees.

Under this choice, overcoming aliasing with the technique used in rf3 has the virtue that if pass-by-value is chosen, then the condition becomes statically false, and the condition and corresponding dead code may be eliminated. There is no loss in efficiency for the fast case.

Undefined Behavior

Another choice is to make aliasing undefined behavior. This approach has been used, for example, in Fortran66 [Fortran66args], Ada83 [AdaLRMparam] [AdaRDparam], and C11 [C11restrict]. Taking this approach in C++ would not be novel.

In practice, read-read aliasing causes no trouble, so we only need to make write-write and read-write aliasing undefined. Going further, writes to mutable members that do not change the abstract state of an object produce well-defined behavior.

Fortran, Ada, and C make the prohibition on aliases a global program property, which makes it an incomputable property. In the worst case, this approach could lead to many latent bugs. However, the approach has proven workable in practice because programmers need only follow a few rules to avoid the problem. Calling programmers ensure that arguments do not alias each other. Called programmers ensure that the functions do not access objects accessible by callers.

While in this choice programmer are ultimately responsible for avoiding aliasing, static analysis tools and runtime checks can programmers substantially.

Compiler Dealiasing

We can choose to require that the compiler dealias the arguments. This dealiasing is really only feasible for aliasing between parameters. We must rely on programmers to avoid aliases on non-local objects.

Only this choice tries to preserve the illusion that operations on classes are the same as operations on primitive types.

There are a few strategies in implementing the dealiasing.

Detect in Callers

The compiler does alias analysis at the call site and copies arguments that it cannot prove are unaliased. Many aliases can be eliminated a-priori because temporary values cannot be aliased. The down side is that alias checking is potentially O(n^2).

Detect in Callees

The compiler does conflict analysis in the function implementation and copies parameters that it cannot prove are do not conflict. This approach limits dealiasing to the function body, rather than the more numerous function call sites. The down side is that alias checking is potentially O(n^2).

Leak Information

The compiler can leak information about conflicts from the callee to the caller, enabling a joint detection in both callers and callee. Then the caller can make minimal copies, only those at the intersection of aliases and conflicts. The down side is that alias checking is potentially O(n^2). While this strategy works well in source-based environments, it works less well on systems exploiting ABIs. (Thanks to Chandler Carruth).

Multiple Implementations

The compiler can produce multiple instances of the function, each representing protecting against some subset of conflicts. At the call site, the compiler matches the aliases against the versions of the callee and chooses to call the one needing the least number of dealiasing copies. (Thanks to Chandler Carruth).

Discussion

Suppose we choose to adopt both of the first two approaches above — pass small classes by value and add the restrict qualifier. What do we then loose by not adopting the 'input' parameter solution presented?

References

[AdaLRMparam]
Ada '83 Language Reference Manual, Section 6.2 Formal Parameter Modes, http://archive.adaic.com/standards/83lrm/html/lrm-06-02.html#6.2
[AdaRDparam]
Rationale for the Design of the Ada Programming Language, Section 8.2 Parameter Modes http://archive.adaic.com/standards/83rat/html/ratl-08-02.html#8.2
[AMD64abi]
System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.6, http://www.x86-64.org/documentation/abi.pdf, Michael Matz, Jan Hubička, Andreas Jaeger, Mark Mitchell, July 2012, Chapter 3: Low-Level System Information, 3.2 Function Calling Sequence
[C11restrict]
ISO/IEC 9899:2011 Programming languages -- C, http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf, Section 6.7.3.1 Formal definition of restrict
[Fortran66args]
USA Standard FORTRAN (USAS X3.9-1966), (also known as Fortran 66), ftp://ftp.nag.co.uk/sc22wg5/ARCHIVE/Fortran66.pdf, section 8.3.2 Referencing External Functions, section 8.4.2 Referencing Subroutines
[IA64abi]
Itanium Software Conventions and Runtime Architecture Guide, http://www.intel.com/content/dam/www/public/us/en/documents/guides/itanium-software-runtime-architecture-guide.pdf, Section 8.5 Parameter Passing
[ItaniumC++abi]
Itanium C++ ABI, http://mentorembedded.github.com/cxx-abi/abi.html
[SparcC++abi]
The C++ Applicatio Binary Interface, SPARC Processor Supplement Sun Microsystems, Inc., December 1995, Section 3.3: Function Calling Sequence
[SparcV9abi]
SPARC Compliance Definition 2.4.1 http://www.sparc.org/standards/64.psabi.1.34.ps.Z, SPARC Internatlional, Inc., July 1999, Chapter 3: Low-Level System Information, Low-Level System Information (64-bit psABI), Function Calling Sequence