JTC1/SC22/WG21
N1827
N1827=05-0087 An Explicit Override Syntax for C++
2005-08-29
Chris Uzdavinis [cuzdav@gmail.com]
Alisdair Meredith [wg21@alisdairm.net]
I. Table of Contents
Table of Contents
Motivation and Scope
Impact On the Standard
Design Decisions
Proposed Text for the Standard
II. Motivation and Scope
This proposal suggests the inclusion of two new, related, facilities
to the C++ language regarding the declaration of virtual functions.
First, the ability to explicitly declare a function to override
another. Second, the ability to explicitly declare that a function
does not override another. The compiler will be able to detect when
these intentions are not met and issue the appropriate diagnostic.
,----
| Why is this important? What kinds of problems does it address, and
| what kinds of programmers is it intended to support? Is it based on
| existing practice?
`----
Virtual functions are a cornerstone of traditional C++
object-oriented programming. They are well understood, are easily
learned by novices, are very powerful, and have widespread use.
However, they could be improved. Certain subtle yet significant
errors continue to affect programmers who write and override virtual
functions. Simple solutions exist that would prevent these errors,
but they require compiler support. The problems are categorized
below.
1) When intending to override a virtual function but accidentally
declaring a new function instead, silent errors can result.
Since the derived function is not overriding the base class
function when this was intended, the program is faulty and will
run incorrectly. Unfortunately, there is currently no way to
automatically detect this problem, which shows up with different
flavors:
* misspelling the base class function name, thereby creating a
new function in the derived class
* providing a mismatching argument list, either by
different arity or type
2) When intending to declare a new virtual function but accidentally
overriding a base function (one that happens to share its
signature), the results are almost always wrong. The function may
not do what the base class author requires, and may be called at
the wrong times since it accidentally overrode a virtual function.
Unfortunately, there is currently no way to automatically detect
this problem.
3) Modifying a base class virtual function signature affects all
derived classes that override it. Failing to update the derived
class signatures accordingly is hard or in some cases impossible
to detect, especially if the base class is from a third-party
library. After such a change, the derived class that previously
overrode the base function no longer does. Behavior has thus
changed silently and subtly. What used to be the overrider is now
a declaration for a new, unrelated function. Derived class
authors may not notice this change, and again, there is currently
no way to automatically detect this problem.
4) Adding a new base class virtual function that happens to match
the signature of a pre-existing derived class function can change
the meaning of the derived function. Previously, the derived
class's function was part of only the derived class's interface,
but suddenly without touching the derived class, it became an
overrider for the new base class function. Quite likely it does
not implement the intended semantics the base class author had,
and certainly was not the intent of the derived class author.
Unfortunately, there is no way to automatically detect this
problem either.
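As an illustration only, categories 1 and 3 can be reproduced in today's C++. The class and function names below are hypothetical and are not part of this proposal. The sketch shows a misspelled name and a mismatched parameter type, both of which compile but leave the base class functions without an overrider.

```cpp
#include <string>

// Hypothetical names throughout; this is standard C++ without any of
// the proposed specifiers.
struct Shape {
    virtual ~Shape() {}
    virtual std::string label() const { return "shape"; }
    virtual int scale(int factor) { return factor; }
};

struct Circle : Shape {
    // Intended to override Shape::label(), but the name is misspelled,
    // so this declares a new, unrelated virtual function (category 1).
    virtual std::string lable() const { return "circle"; }

    // Intended to override Shape::scale(int), but the parameter type
    // differs, as it would after a base class signature change
    // (category 3).
    virtual int scale(long factor) { return static_cast<int>(factor) * 2; }
};

// Calls made through the base interface never reach Circle's functions.
std::string label_via_base(const Shape& s) { return s.label(); }
int scale_via_base(Shape& s) { return s.scale(10); }
```

Passing a Circle to either helper runs the Shape implementation, and no diagnostic is required from the compiler.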
All of these problems could easily be detected at compile time if
the programmer were able to indicate the intent for writing a
virtual function. For example, if the base class declaration had a
facility to declare a virtual function to be "new" in some way, that
would indicate two things: first, that the programmer explicitly
wishes to introduce a new function, and second, that he also wishes
to disallow that function from ever overriding another. Such an
ability would completely eliminate category 2 and 4 problems (above) for all
code using that facility, because the compiler could catch the
problem at compile-time and issue a diagnostic when it detects the
specification being violated.
Similarly, when writing a function in a derived class, if the
compiler knew that the programmer intended the function to override
a base member, it could easily issue a diagnostic if the function
does not in fact override anything. Such a compile-time error would
completely eliminate categories 1 and 3 for all code using that
facility, by transforming subtle runtime errors into simple compile
time errors.
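The accidental override of categories 2 and 4 can likewise be sketched in today's C++. The names Widget and Counter are hypothetical, and the scenario assumes that the library added reset() after Counter had already been written.

```cpp
// Hypothetical library base class. Suppose reset() was added in a
// later release, after Counter below had already been written.
struct Widget {
    virtual ~Widget() {}
    virtual void reset() {}        // library hook: "return to idle"
    void shutdown() { reset(); }   // the library calls its own hook
};

// User class whose reset() was meant as an unrelated, class-specific
// operation. Because the signature matches, it silently becomes the
// overrider of Widget::reset().
struct Counter : Widget {
    int count;
    Counter() : count(42) {}
    virtual void reset() { count = 0; }   // "discard the tally"
};

// Shutting down through the base interface discards the tally, which
// the author of Counter never asked for.
int count_after_shutdown() {
    Counter c;
    Widget& w = c;
    w.shutdown();
    return c.count;
}
```

Under the proposal, declaring Counter::reset() as "new" would presumably turn this silent change into a compile-time error once Widget::reset() appears.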
C++ has many precedents for introducing new features to enable the
programmer to more precisely indicate their intentions (such as C++
casts, the "explicit" keyword, the "mutable" keyword, and so on).
It is well understood that such practices result in better, less
error-prone code, and simultaneously, more readable code since the
programmer's intent appears in the code itself.
Thus, we propose a new syntax to allow a programmer to help
eliminate these problems, by allowing a better indication of intent
when writing virtual function declarations. These extra specifiers
are optional, meaning they do not have to be used and existing code
will continue to work as it is, but once introduced into a function
signature, the extra specifications are required in all derived
classes.
,----
| Is there a reference implementation?
`----
The Microsoft specification for their ECMA C++/CLI bindings proposes
these same concepts for "ref" classes, though we differ in both the
syntax and precise semantics.
The idea of explicitly overriding virtual functions is common in
other object-oriented languages, including C#, Delphi, and Java
(since version 1.5).
III. Impact On the Standard
,----
| What does it depend on, and what depends on it? Is it a pure
| extension, or does it require changes to standard components? Can it
| be implemented using today's compilers, or does it require language
| features that will only be available as part of C++0x?
`----
This proposal introduces language features that would only be
available as part of C++0x. It depends on extending some syntax for
declaring virtual functions (in a backwards-compatible way). The
end result (when using our new syntax) is that the compiler will
have additional reasons to reject code that previously would
compile, since the compiler can detect discrepancies between the
programmer-specified intent and "reality".
Since nothing currently depends on these new features, and because
the proposed syntax is currently invalid (and thus not present in
any code), no existing program can possibly be affected. New code,
however, that takes advantage of the proposed new features will
enjoy additional compiler-enforced restrictions. The runtime
behavior of the virtual functions remains completely unchanged in
every way.
IV. Design Decisions
Override Specifiers
-------------------
The semantics for explicit override specifiers in a derived class
are straightforward: if the overriding function uses explicit
override specifiers, it must explicitly list all the immediate base
classes which contain the virtual function signature it is
overriding. That is, if any immediate base contains a function being
overridden but is not listed, the program is ill-formed.
A virtual function declared "new" in a base class may only be
overridden by a derived class function that uses explicit override
specifiers. If a derived virtual function declaration would
override an explicitly "new" base function, but does not use
explicit override specifiers, the program is ill-formed.
Thus, once a function is explicitly "new", it can only be overridden
using explicit override specifiers.
[Example:
struct A {
virtual void f(); // derived may explicitly override
virtual void g(); // derived may explicitly override
virtual void h() : new; // derived _must_ explicitly override
};
struct B : A {
virtual void f(); // OK, backward compatibility
virtual void g() : A; // OK, optionally explicit
virtual void h(); // Error: requires overrider list
};
--end example]
As shown above, the author of a derived class may opt to explicitly
override a base function that is not explicitly "new". This still
allows the compiler to report problems such as A::g() later being
removed, or B::g() failing to match the signature of the base class
function (due to programmer error).
,----
| Why did you choose the specific design that you did? What
| alternatives did you consider, and what are the tradeoffs? What are
| the consequences of your choice, for users and implementers? What
| decisions are left up to implementers?
`----
SEMANTIC DESIGN DECISIONS
Overall, we wanted to solve all four of the categories of problems
listed in section II. We wanted to make the syntax easy to use,
easy to understand, easy to work with in the simple cases
(e.g. single inheritance), and not too hard in the complicated
cases (complex multiple inheritance.)
The semantics could have taken several directions, while still
remaining true to the overall goal of introducing finer-grained
virtual function declarations. The main alternate semantics we
considered are those proposed by Microsoft in their Ecma C++/CLI
proposal. Behaving exactly as proposed in Microsoft's Ecma proposal
has a lot of merit for the simple fact that Microsoft is behind it
and they have a lot of standing. Also, some people will learn it
their way using C++/CLI and assume it applies to "real" C++.
However, while we like the general idea, and do not object to
alternate specifications, we still had to consider many of them and
chose differing behaviors from the C++/CLI proposal:
* Microsoft's C++/CLI proposal chose "new" to hide functions
in the base class, while we propose that "new" functions do
not hide a base function of the same signature and instead
result in a compile-time error.
We believe that introducing "new" virtual functions in a
derived class that have the same name, arity, and argument and
return types as a base virtual function, yet are unrelated to
it, can result in hard-to-read code, subtle errors, and
unexpected runtime results. Hiding names is problem enough in
C++ today and we do not believe the language would benefit
from adding more ways to hide names.
* They introduce a new keyword ("override"), and we do not.
We are reluctant to introduce new keywords because there
are many people who object to doing so. (Though we don't
think that adding a new keyword is as traumatic as some
like to claim.) Our proposed syntax, when used, adds a
colon followed by a comma-separated list of base classes
the function overrides, or the keyword "new".
* They allow overriding functions with different names; we do
not. We believe that this complicates code, confuses
entry-level programmers, yet adds little significant
benefit that cannot be handled in other ways without the
feature.
* We added a requirement that derived classes must use the
explicit overrider syntax when any base function being
overridden is explicitly declared to be "new". This will
affect future code only, however, since no existing code
has any "new" virtual functions today.
We realize that not every possible problem can be solved.
Introducing compile-time errors, even at the programmer's
request, can potentially cause its own set of problems.
SYNTAX DESIGN DECISIONS
The syntax is straightforward. To specify a virtual function to be
"new", the function declaration includes a colon followed by the
keyword "new", after the cv-qualifiers and before the pure
specifier (if present). To make a virtual function explicitly
override its base, insert a colon followed by a comma-separated
list of the (immediate) base classes that it overrides.
[Example:
struct A {
virtual void f() : new = 0; // explicit new
};
struct B : A {
virtual void f() : A; // explicit overrider
};
--end example]
We have discussed numerous alternate syntaxes, and judged them for
several categories that are important but subjective, including
readability, flow, proximity of the specifiers to that which they
affect, similarity to other C++ syntactic designs, and overall
adherence to the "spirit" of the C++ syntax. However, we feel that
the concept of explicitly marking virtual functions is more
important than the final syntax chosen to represent them. While we
believe that the syntax we present is good, there are countless
alternate possible syntaxes and virtually any would be an
improvement over not having them at all.
We decided to use the colon to introduce the explicit override
syntax as it nicely separates out the cv-qualifiers, and because it
has a similar appearance and semantics to the base-specifier-list
used to indicate class inheritance (which is a comma-separated list
of base classes introduced by a colon). The parallels between
inheritance and overriding virtual functions, in C++ at least, are obvious.
Additionally, the pure-specifiers flow nicely when the explicit
overriders are listed before them. First some rejected examples for
the explicit "new" syntax:
struct B1 { virtual void a(); };
struct B2 { virtual void a(); };
struct B3 { virtual void a(); };
struct X : B1, B2, B3 {
// "new" virtual function alternatives
virtual void f() = 0 new; // rejected
virtual void g() = 0, new; // rejected
virtual void h() = 0 : new; // rejected
virtual void i() new = 0; // rejected
};
This is obviously subjective, but briefly, here is why we discarded
the above alternatives: f() appears to be initializing the function
to "0 new", which is misleading. Therefore, the new must appear
before the pure specifier. g() has a similar problem, where "0, new"
raises needless questions surrounding the comma operator and reads
badly. h() isn't horrible, but the new specifier seems too far from
the function to which it refers. i() isn't bad, but we felt some
separation from the rest of the declaration would be clearer,
especially considering cv-qualifiers like X::a().
For the explicit override syntax, there were more considerations, as
there can be a list of base classes whose functions are being
overridden. We considered dozens of syntaxes, some even bizarre.
struct X {
virtual void b() : new;
};
struct Y : X {
virtual void b() = X::b; // 1 like C++/CLI
virtual void b() = X; // 2 C++/CLI "lite"
virtual = X void b(); // 3 associate override with virtual
virtual<X> void b(); // 4 template-like syntax
virtual void b() <X>; // 5 other template-like syntax
virtual void b() const X; // 6 bad with cv-qualifiers
virtual void b() const : X; // 7 our choice
};
We like #1 as a base, since Microsoft has already written a standard
that includes similar features. However, writing some example code
with it started to feel quite redundant and repetitive. The
natural successor would be the "lite" variation, as seen in #2.
This has the appearance, as does #1, of assigning functions, which
portrays (in our opinion) absolutely the wrong mental picture.
Thus, we tried associating the specifiers with the virtual keyword
(#3), as a way to group them with the feature responsible for it
all, and hated it. This had a strange appearance that made the
function declaration harder to read; it did not appear to improve
anything, and actually made matters worse. #4 and #5 were a short
experiment with template syntax, but it failed mainly because these
features are not template related in any way and the code is harder
to read. #6 was an attempt to get rid of the equal-sign for the
function list, since we're neither assigning nor initializing
anything, but the cv-qualifiers bleed into the base class list. It
needed separation.
Eventually we decided the syntax should parallel that of
inheritance, since that is so intimately related to virtual
functions (at least in the C++ object model). The colon, as it is
used to introduce the list of base classes in a class declaration,
was a natural selection to introduce the list of base classes
containing the function being overridden. It associates the
specifiers with the function without placing incorrect meaning on
it, (as an equal-sign does), and is visually pleasing even with pure
specifiers. Since this indication only affects compile-time rules,
and in no way affects the signature of the function, we felt
separating this indicator from the CV-qualifiers with some character
was important. Thus, we chose #7.
SEMANTIC DECISIONS
New Function Specifiers
The main choice for the "new" keyword semantics was what should
happen when the specification is violated. That is, if a base class
indeed has a same-signature virtual function, what should be done?
One choice hides the base and introduces a new function anyway. The
other choice (which we took) rejects the code at compile time.
Name hiding is known to be associated with bugs, and its details are
not widely understood by many programmers. We believe that name
hiding in general is poorly understood by the community at large,
even with today's C++ rules, and that the subtle problems additional
name hiding would introduce with virtual functions in a hierarchy
would have more drawbacks in the long term. Thus we hesitate to
introduce more name hiding when it is not necessary.
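For readers less familiar with the existing rules, the following sketch shows name hiding as it already behaves in today's C++. The class names are hypothetical and the example is only an illustration of current behavior, not of the proposed feature.

```cpp
struct Base {
    virtual ~Base() {}
    virtual int f(double) { return 1; }
};

struct Derived : Base {
    // Declares a different function named f. Base::f(double) is hidden
    // inside Derived's scope rather than overridden.
    virtual int f(int) { return 2; }
};

int call_on_derived() {
    Derived d;
    // Only Derived::f(int) is found by name lookup, so the argument
    // is converted from double to int.
    return d.f(3.5);
}

int call_through_base() {
    Derived d;
    Base& b = d;
    // Base::f(double) is called; nothing in Derived overrides it.
    return b.f(3.5);
}
```

The same object answers differently depending on the static type used for the call, which is the kind of surprise that additional name hiding would multiply.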
One important goal is for the compiler to find and alert the
programmer to potential problems. Allowing a program to compile and
run in this situation (a derived class function hides a base that it
otherwise would override) is dangerous. For example, if a 3rd party
supplies your base class, and in an upgrade they add a new virtual
function that your derived class already declares as being "new",
there are serious problems that need addressing and should not
simply be ignored. If the code compiles, then the derived class is
ignoring the recently-added function, and the base and derived
classes are operating in conflicting ways regarding the meaning of
that particular function. The derived class author may not even
realize that the base has changed and that such a problem exists.
Also, people may expect certain polymorphic behavior that they will
not get. Worse, when you author a most-derived class, you must
investigate the entire hierarchy to determine which base class's
virtual function you're actually overriding. In particular, if you
intend to override the most-base class version, you have to ensure
that no intermediate base redeclares the function as being "new".
On the other hand, given a compile-time error when this happens, the
library customer has immediate awareness of the problem. If a
library interface changes, the client may very well need to be aware
of such a major thing, and must resolve it. While it is certainly
more convenient to "just compile", that does nothing for
correctness, and finding problems at compile time is generally
preferable to finding them at runtime, all else being equal.
The additional overhead on the programmer, in re-specifying all the
base classes which contain functions that are being overridden, is
not taken lightly. One could argue for specifying only that a
function "overrides something" and leaving it at that. The notation
certainly would be a little easier (in the case of multiple
inheritance with multiple simultaneous overrides), but it would be
at the cost of not being able to detect accidental overrides due to
base class changes (either inserting new virtual functions or
changing the interface of an existing function, so that the derived
class silently overrides another function). Further, since the
common case is single inheritance, the syntax is simple. This point
is arguable on either side, and the simplicity of notation may very
well be worth sacrificing the compiler's ability to detect this
problem. (Either way, override specifiers in some form are still
considered to be superior to none at all.) We chose to maximize
potential to detect errors.
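The "multiple simultaneous overrides" case mentioned above can be seen in today's C++. In this sketch (hypothetical helper names, standard C++ only), a single declaration in D overrides the functions of both bases.

```cpp
struct B1 {
    virtual ~B1() {}
    virtual int f() { return 1; }
};

struct B2 {
    virtual ~B2() {}
    virtual int f() { return 2; }
};

// In today's C++ a single declaration overrides both B1::f and B2::f.
// Had B2::f been added after D was written, D::f would have started
// overriding it without any change to D.
struct D : B1, B2 {
    virtual int f() { return 3; }
};

int via_b1(B1& b) { return b.f(); }
int via_b2(B2& b) { return b.f(); }
```

A bare "overrides something" marker on D::f would be satisfied both before and after B2::f is introduced, whereas a list naming only B1 would no longer match.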
V. Proposed Text for the Standard
Section 9.2 (and Annex A), modification to the grammar:
member-declarator:
declarator override-constraint-specifier(opt) pure-specifier(opt)
declarator constant-initializer(opt)
identifier(opt) : constant-expression
override-constraint-specifier:
: override-constraint
override-constraint:
new
overridden-name-list
overridden-name-list:
id-expression
overridden-name-list , id-expression
9.2 p7
The decl-specifier-seq is omitted in constructor, destructor, and
conversion function declarations only. The member-declarator-list
can be omitted only after a class-specifier, an enum-specifier, or
a decl-specifier-seq of the form friend elaborated-type-specifier.
! An override-constraint-specifier and/or a pure-specifier shall be
used only in the declaration of a virtual function (10.3).
10.3/p2
(BEFORE)
If a virtual member function vf is declared in a class Base and in a
class Derived, derived directly or indirectly from Base, a member
function vf with the same name and same parameter list as Base::vf
is declared, then Derived::vf is also virtual (whether or not it is
so declared) and it overrides Base::vf. For convenience we say
that any virtual function overrides itself. Then in any well-formed
(AFTER)
If a virtual member function vf is declared in a class Base
with either no override-constraint-specifier or with the
override-constraint of 'new', and in a class Derived, derived
directly or indirectly from Base, a member function vf with the same
name and same parameter list as Base::vf is declared, then
Derived::vf is also virtual (whether or not it is so declared) and
it overrides Base::vf if either 1) it lists Base in its
overridden-name-list, or 2) neither Base::vf nor Derived::vf has an
override-constraint-specifier. If Derived overrides a base function
with an override-constraint of 'new' then Derived::vf must have an
overridden-name-list, and it must list every immediate base class
containing the function signature being overridden. If Derived::vf
specifies an override-constraint of 'new', but would otherwise
override any base function, the program is ill-formed. Destructors
may not have an override-constraint-specifier. [Example:
class B {
public:
virtual void e() = 0;
virtual void f() = 0;
virtual void g() : new;
virtual void h() : new;
};
class B2 {
public:
virtual void g() : new;
virtual void h();
};
class D : public B {
virtual void e(); // OK, current C++
virtual void f() : B; // OK, explicit override
virtual void g(); // Error, B::g is new, and
// requires explicit
// override specifications
virtual void i() : B; // Error: no B::i
};
class D2 : public B, public B2 {
virtual void g() : B, B2; // OK
virtual void h() : B; // Error, omits listing B2
};
--end example]
When multiple base classes offer the same function for a derived
class to override, if the derived class uses explicit override
syntax it must list all the base classes offering the function. The
overridden-name-list must always be complete (specifying all bases it
can possibly override) and accurate (listing only bases it can
override). [Example:
class B1 {
public:
virtual void f() = 0;
};
class B2 {
public:
virtual void f() = 0;
};
struct D : public B1, public B2 {
virtual void f() : B1, B2; // OK: explicitly override
// both base functions
};
struct D2 : public B1, public B2 {
virtual void f() : B1; // error: does not list B2
};
-- end example]
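The checks described in this section can be modelled, very roughly, as a predicate over the immediate bases of a class. The sketch below is not proposed wording and not an implementation strategy. The types BaseInfo and Constraint and the function well_formed are hypothetical names introduced only for illustration, and the model assumes the compiler already knows which immediate bases contain a matching virtual function.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of one immediate base class, reduced to the
// facts the rules need. None of these names come from the proposal.
struct BaseInfo {
    std::string name;  // immediate base class name
    bool has_vf;       // base contains a matching virtual function
    bool vf_is_new;    // that function was declared ": new"
};

enum Constraint { NoConstraint, New, NameList };

// Returns true when a derived declaration is well-formed under the
// rules as summarized in the text above.
bool well_formed(const std::vector<BaseInfo>& bases,
                 Constraint constraint,
                 const std::vector<std::string>& listed) {
    bool any_match = false;
    bool any_new = false;
    for (const BaseInfo& b : bases) {
        if (b.has_vf) {
            any_match = true;
            if (b.vf_is_new) any_new = true;
        }
    }
    // Implicit overriding stays legal unless a base function is "new".
    if (constraint == NoConstraint) return !any_new;
    // A "new" function must not override anything.
    if (constraint == New) return !any_match;

    // NameList: the grammar requires at least one name.
    if (listed.empty()) return false;
    // Complete and accurate: a base is listed exactly when it
    // contains the function being overridden.
    for (const BaseInfo& b : bases) {
        bool is_listed =
            std::find(listed.begin(), listed.end(), b.name) != listed.end();
        if (b.has_vf != is_listed) return false;
    }
    // Every listed name must be one of the immediate bases.
    for (const std::string& n : listed) {
        bool found = false;
        for (const BaseInfo& b : bases) {
            if (b.name == n) found = true;
        }
        if (!found) return false;
    }
    return true;
}
```

Fed with the facts from the examples above, this model accepts D::e, D::f and D2::g, and rejects D::g, D::i and D2::h.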