            ISO:       WG21/N0369
            ANSI:      94-0009
            Author:    John Max Skaller
            Date:      27 January 1994
            Reply to:  maxtal@suphys.physics.su.oz.au


The "ONE DEFINITION RULE"
-------------------------

A "definition" means a sequence of tokens parsed as a definition.
[Editor to use italics]

The "One Definition Rule" is a constraint on the writer
of source files that are pre-processed into translation units.

"Diagnosable" in this paper means the language processor
is required to issue a diagnostic.

Parts of the ODR are not diagnosable, allowing the language translator
to assume the programmer obeyed the ODR without checking.
This means the programmer is responsible for following the rules
of the ODR, or must accept undefined behaviour.

Other parts, and some associated rules, are diagnosable because they
can be. These constrain the language processor
to check for a violation and issue a diagnostic.

The "category" of an entity in this paper is either

    (1) namespace,
    (2) function,
    (3) variable,
    (4) type,
    (5) enumerator, or
    (6) template,

and refers to all the things which can be declared, defined,
or constructed and given some form of user defined name in C++.

A function, variable, or enumerator has a type.
Namespaces don't have types.
Classes and typedef names are types.
Class templates are parameterised types.
Function templates have parameterised types.

The "type" of an entity in this paper is its category,
and, if it has a type, its type,
or, if it is a type, that type.

The exact proposal for the ODR is given below. The structure
of the proposal is:

   A) ODR:
   The ODR specifies how many definitions entities with external
   linkage may have, and where. The basic rule
   is: only one definition is permitted except for classes,
   inline functions, and templates, which may have one
   per translation unit, provided they're token equivalent.

   B) EQUIVALENT DEFINITIONS:
   Token equivalence is defined as an identical sequence of
   tokens, with each identifier that refers to an entity
   outside the definition and which does not have external
   linkage required to be equivalently defined in each
   translation unit (that is, the specification is recursive).


   C) LINKAGE SPECIFICATIONS:
   Everything to which the ODR ought to apply is deemed
   to have external linkage. This includes all member functions
   of a class with external linkage. Inline extern globals
   are permitted as a side effect. It includes static local
   variables of functions with external linkage.

   However, the ODR may cause functions with internal linkage
   to be constrained to be equivalent. In this case the
   form of the definition is enhanced by additional
   constraints: inline extern functions may not refer to
   static variables with internal linkage.

   D) TEMPLATE NAME BINDING.
   The exact rules for name binding of templates are, or should be,
   specified in the Working Paper on templates.

   E) DECLARATION MATCHING.
   When a name binds to an entity with external linkage,
   the declarations must match (there is no issue of
   equivalent definitions: if the definitions are duplicated,
   the ODR applies independently, but we may only see a
   declaration, and token equivalence does not apply to declarations).

   F) UNNAMED TYPES.
   Some types in classes with external linkage are unnamed.
   Matching rules that work by linkage are required if these
   are to be supported. Like names in the unnamed namespace,
   we synthesise a unique name that the user can't actually
   use: the entities have external linkage.


Problem - Functions
-------------------

In C, a function may have only one definition in a program.
Duplicate definitions can be detected by the linker.
If the definition of a function is visible, calling the function can
sometimes be optimised by inlining.
If a function is to be used in more than one translation unit,
then optimisation opportunities are lost if the definition
cannot be given in each translation unit.

Therefore, C++ allows multiple definitions of functions in a program.
It is clearly not the intent of allowing multiple definitions that the
definitions define completely different functions and so the question
arises what constitutes equivalent definitions in principle,
what rules should be specified in the Standard to constrain duplicate
definitions, and which of these rules require a diagnostic to be issued
if the rule is broken.

Problem - Variables
-------------------

In C, whole variables may only have one definition in a program.
Again, duplicate definitions can be detected by the linker.
C++ follows the C rules for ordinary variables, however the advent
of templates introduces complications in that the need for instantiation
of variables may arise in more than one translation unit.
Note there is no issue of "duplicate" definitions here:
variables created by instantiating a template do not have a definition
in the precise, grammatic sense used in this paper.

Problem - Types
---------------

In C, types in different translation units are distinct.
However, types must be used in declaring functions and variables,
and linkage across translation unit boundaries must require some form
of compatibility of the types involved so that code generated in
a translation unit with access only to a declaration does not corrupt memory.

In C, this problem is addressed by introducing the notion of compatible types.
These rules allow structurally equivalent declarations to be considered
equivalent, and permit structurally equivalent structure definitions in
distinct translation units to be associated.

[These rules are flawed, since the relation so defined is not
an equivalence relation. In particular, it's not transitive,
since A in translation unit 1 may be structurally equivalent
to B in translation unit 2 which is structurally equivalent to C in
translation unit 1 which is not structurally equivalent to A.
It's easy to show that A is both structurally equivalent
to itself and not, a contradiction rendering structural
equivalence inconsistent.]

C++ has a completely different strategy.
Instead of utilising some notion of structural compatibility,
C++ requires name equivalence.

This operates in two ways: first, the structural equivalence of declarators
is the same as for C, but such declarators are considered to be type names.
For example: int* is equivalent to int*, not because they are structurally
equivalent, but because 'int*' is deemed a name. For this reason,
a typedef name is not the name of a type, but the name of an alias.
There are rules for name formation that 'factor out' any typedefs,
the phrase used to describe names identical after typedefs have
been replaced is 'modulo typedefs'.

Name identity is exactly a form of equivalence, for example
given

  typedef const int CI;

the names

  const int*
  int const*
  CI*
  CI const*

are identical despite being "spelled" differently. It's important
to note this case: the rules for equivalence of definitions
given below, namely token equivalence, are stricter than
the rules for name identity: none of the above names
would be accepted as alternatives for the others in token
equivalent definitions.
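
The identity of these names can be checked mechanically. The sketch
below is only an illustration: it relies on std::is_same from
<type_traits>, a library facility that postdates this paper, and the
helper name all_spellings_name_one_type is hypothetical.

```cpp
#include <type_traits>

typedef const int CI;

// Each assertion compares two spellings of what is claimed to be one name.
static_assert(std::is_same<const int*, int const*>::value,
              "const int* and int const* name the same type");
static_assert(std::is_same<const int*, CI*>::value,
              "the typedef is only an alias");
static_assert(std::is_same<const int*, CI const*>::value,
              "a redundant const introduced through a typedef is ignored");

// Hypothetical helper so the same fact can be observed at run time.
bool all_spellings_name_one_type() {
    return std::is_same<const int*, int const*>::value
        && std::is_same<int const*, CI*>::value
        && std::is_same<CI*, CI const*>::value;
}
```

Note that this shows name identity only. Under token equivalence the
four spellings remain distinct token sequences.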

Secondly, class types in C++ may have external linkage, so that the use of
a type name -- or type name suitably composed from a type name with
external linkage -- in two translation units denotes the same type.
It is clear that the definitions or declarations of types
with external linkage must be compatible in some sense similar to that
used by C.

The change in orientation to use of names rather than structural equivalence
for types is a consequence of a corresponding weakening of the
C rules for functions:
Not only may several functions have the same name, which is called
overloading, but inlining allows duplicate definitions of the same function.
Thus the requirements on types are strengthened.
Indeed, functions are identified by a combination of their fully
qualified name and signature, and the signature constitutes type information.
The difficulty with types is not merely a matter of optimisation, however:
duplicate definitions of types are mandatory since they are required
even for mere declaration of variables and functions.

Problem - enumerators and typedefs
----------------------------------

In C, an enumerator is simply a named constant integer value.
The enumerator name is always translation unit local.

In C++, the status of enumerators is not clear.
First, there is no need to speak of equivalence for enumerators:
two enumerators either have the same value or they don't, the values
are known at compile time.

Second, there is no particular reason that enumerators should
have external linkage, or in any way require that the same
enumerator name have the same value in distinct translation units.

However, the token equivalent formulation of the ODR is based
on sequences of tokens of declarations and definitions
being equal, and identifiers binding to equivalent definitions.

Such a formulation requires equivalence to be defined for every
category of user defined name, so typedefs and enumerators must
be included in the list.


Problem - templates
-------------------

A function or class template has a definition, and these
are considered equivalent or not depending on the same
general token equivalence rules as for other definitions.

There are a couple of problems with templates. One is
that it isn't clear if or when template names have external linkage.
If they don't, the ODR doesn't apply to them.

The second problem is specifying the linkage of template
instances. For classes, the usual rules apply, but for
functions that is not adequate. It's necessary to know,
because functions with external linkage may not have argument
types depending on types with internal linkage, and because
the address of a function with internal linkage cannot
be the same as that of a function with the same name
and signature in another translation unit,
whereas the address of a function with external linkage
is required to be the same in all translation units.

However, there are several additional special,
and severe problems with templates.

First, it's necessary to note that the ODR and the definition
of token equivalence are constraints applied to the user.
So there is no issue of generated functions being equivalent
in the sense of the ODR: if behavioural equivalence is required
for consistency of the language, it must be guaranteed by
other means.

Typically, a mechanism for specifying the
semantics of generated entities is specified by an algorithm
which is context sensitive in ways that matter only if the
ODR is already broken. For example, the generated copy constructor
is defined to copy members and bases, and the behaviour of
such a constructor is the same independently of when, where, or
how it's generated: the ODR is not required and cannot apply.
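
As a small illustration of the copy constructor case, the following
sketch declares no copy constructor, so one is generated. The classes
Base, Member and Widget and the checking function are hypothetical
names chosen for this example only.

```cpp
struct Base {
    int id;
    explicit Base(int i) : id(i) {}
};

struct Member {
    int value;
};

struct Widget : Base {
    Member m;
    Widget(int i, int v) : Base(i), m() { m.value = v; }
    // No copy constructor is declared here: one is generated.
};

// Returns true if the generated copy constructor copied both the
// base subobject and the member subobject.
bool generated_copy_copies_bases_and_members() {
    Widget original(7, 42);
    Widget copy(original);   // uses the generated copy constructor
    return copy.id == 7 && copy.m.value == 42;
}
```

There are no user-written tokens for the copy constructor, so there is
nothing for token equivalence to compare.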

The problem that arises is with specialisations.
A specialisation, in some sense, is a deliberate hijacking
of the semantics of what would otherwise be a generated
definition. More than one specialisation is in error,
and the ordinary ODR applies to specialisations with
external linkage. But the ODR can't help prevent hijacking,
because there is no "definition" of the unspecialised
template instance.
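
A minimal sketch of what is meant by hijacking, using a hypothetical
function template twice: the instance for int is generated from the
template, while the instance for char is replaced by a specialisation
with unrelated behaviour.

```cpp
// Primary template: instances are generated on demand, and there is
// no user-written definition of any individual instance.
template <class T>
int twice(T x) { return static_cast<int>(x + x); }

// Explicit specialisation: replaces the instance that would otherwise
// have been generated for T = char.
template <>
int twice<char>(char) { return -1; }

int generated_result()   { return twice(21); }    // generated from the template
int specialised_result() { return twice('a'); }   // uses the specialisation
```

If the specialisation were visible in one translation unit and not in
another, the two would disagree about twice<char> although only one
user-written definition of that instance exists.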


Problem - Declaration matching
------------------------------

The attributes of declarations in a single translation unit,
and the corresponding attributes of associated declarations
or definitions need not match exactly.
Certain attributes must be specified in each declaration or definition
in order that the declarations and definitions be bound together,
so as to be considered to name the same entity
(in addition to rules about the scopes of the declarations and definitions).

These attributes are the fully qualified name in the case of a variable or type,
or the name and signature in the case of a function.
In addition, other attributes must have non-conflicting specifications
if the identifying attributes match. These include any linkage
specification, the type of a variable, the return type of a function,
and default arguments (those of a prior declaration must be
duplicated in subsequent ones).
In addition to rules for intra-translation unit consistency,
there are rules for consistency across translation unit boundaries.
These rules must apply if the declarations or context nominate
the same identifying information and specify external linkage.
Declaration matching is not covered in this paper.

Equivalent Definitions
----------------------

Given that two declarations with external linkage match,
any definitions must also be equivalent in some sense to ensure consistency.

Two definitions are semantically equivalent if they may be used
interchangeably without any detectable behavioural differences:
this does not include memory use or execution speed.

Unfortunately, this condition is inherently undiagnosable:
the problem is equivalent to the halting problem.

What we need is to specify a condition that is both mechanically
testable and for which any two definitions that compare equivalent
are also semantically equivalent.

If certain semantically equivalent definitions turn out to
not compare equivalent by the mechanical tests, that is
merely a consequence of the undecidable nature of semantic
equivalence.

What we aim for is to cover a large class of common cases.
In particular, the intended method of sharing definitions
between translation units is by #including header files.


The Solution: the "One Definition Rule"
----------------------------------------

When the ODR permits more than one definition of a function, type,
or variable in different translation units, the understanding
is that the entity has external linkage.

Entities with internal linkage are translation unit local,
and may never have more than one definition---the issue of multiple
definitions and their equivalence can only arise across translation
unit boundaries, and then only when the definitions are
associated by use of the same external name.

In particular, we're talking about multiple definitions of the _same_ entity.
In the following, the word "class" generally means "class", "struct"
or "union".

Templates have external linkage; this permits multiple definitions
of templates.
The address of a function or variable with external linkage is unique,
whereas the addresses of functions with internal linkage are distinct.
This is because entities with the same external name are the same entity,
and so must have the same address, whereas entities with internal linkage
are not associated across translation unit boundaries,
and being distinct entities cannot have the same address.

In the following, a name "interior" to a definition is one defined
in the body of the definition, other entities are "exterior" to the
definition. For example, local static variables are
interior to the definitions of their containing functions,
as are local classes.

PROPOSAL: EQUIVALENCE
---------------------

For the purpose of the ODR, two definitions or expressions are equivalent
if they are token equivalent. The use of "token equivalent" indicates the
particular form of equivalence described below,
and also indicates the context (since equivalent has many meanings).

Two definitions or expressions are said to be "token equivalent"
if the following conditions are met.
Each rule is introduced by first indicating a problem and then the solution.


----------------------------------------------------------------------------
TE1) "same tokens"

Problem: Consider the example

  // EX1
  // file 1
  struct X {
    static int f() { return 1; }
  };

  // file 2
  struct X {
    int a;
    static int f() { return 2; }
  };

There are two classes and two functions both with external linkage
but the definitions are not semantically equivalent.

Solution: The rule

  +----------------------------------------------------------------+
  |                                                                |
  | Equivalent definitions or expressions consist of the same      |
  | sequence of tokens.                                            |
  |                                                                |
  +----------------------------------------------------------------+

Example: The sequence

  // EX2
  int f() { return 1; }

given in two translation units is token equivalent and equivalent.

Resolution: EX1 doesn't obey TE1.

----------------------------------------------------------------------------
TE2) "recursive application"

Problem: In the two files given below the definitions of f()
obey TE1 but are not semantically equivalent:

  // EX3
  // file 1
  enum {one};
  static int x;
  int f() { return x+one; }

  // file 2
  enum {two, one};
  static float x;
  int f() { return x+one; }

Rule:

  +----------------------------------------------------------------+
  |                                                                |
  | If a token in one definition or expression is bound to a       |
  | single entity with internal linkage outside the definition     |
  | or expression,                                                 |
  | then the corresponding token in the other definition           |
  | shall be bound to a single entity with the same type           |
  | outside the definition, and the definitions of these two       |
  | entities shall be equivalent;                                  |
  |                                                                |
  | except that in the case of an enumerator the definition        |
  | of the enumeration types shall be equivalent and the           |
  | values of the enumerators equal.                               |
  |                                                                |
  +----------------------------------------------------------------+

Resolution: The f() in EX3 does not obey TE2.

Comment: If "no linkage" is introduced, then the above rule must
be modified to say "internal or no linkage".

Comment: External linkage is not covered by this rule.
That is because there need not be a visible definition of an entity
with external linkage, and because the ODR will independently require
equivalence of any definitions for that name anyhow.
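
The effect in EX3 can be observed directly. A single source file
cannot contain two translation units, so the sketch below models
file 1 and file 2 as namespaces file1 and file2. That modelling is an
assumption made only for illustration.

```cpp
namespace file1 {
    enum { one };                 // one == 0
    static int x;                 // zero-initialised
    int f() { return x + one; }   // same tokens as file2::f
}

namespace file2 {
    enum { two, one };            // one == 1
    static float x;               // zero-initialised, but a different type
    int f() { return x + one; }   // same tokens as file1::f
}
```

The two definitions of f() are the same sequence of tokens, yet
file1::f() yields 0 and file2::f() yields 1, because x and one are
bound to different entities in each.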

----------------------------------------------------------------------------
TE3) "identity of externals"

Problem: In the example the definitions of f() obey
TE1-2 but are not equivalent.

  //EX4
  // file 1
  namespace X { int g() { return 1; } }
  using namespace X;
  int f() { return g(); }

  // file 2
  namespace Y { int g() { return 1; } }
  using namespace Y;
  int f() { return g(); }

Rule:

  +----------------------------------------------------------------+
  |                                                                |
  | If a token in one definition or expression is bound to an      |
  | entity with external linkage, then the corresponding token     |
  | in the other definition or expression                          |
  | shall be bound to the same entity.                             |
  |                                                                |
  +----------------------------------------------------------------+

Resolution: The two definitions of f() in EX4 bind to two distinct g(),
so EX4 doesn't obey TE3.

----------------------------------------------------------------------------
TE4) "implicitly evaluated default arguments are equivalent"

Problem: The f() in EX5 below obey TE1-3 but are not equivalent:

  // EX5
  // file 1
  extern int g(int x = 1);
  int f() {return g(); }

  // file 2
  extern int g(int x = 2);
  int f() {return g(); }

Rule:

  +----------------------------------------------------------------+
  |                                                                |
  | If corresponding function calls in equivalent definitions      |
  | or expressions                                                 |
  | cause a default argument to be evaluated the argument          |
  | expressions shall be equivalent.                               |
  |                                                                |
  +----------------------------------------------------------------+

Resolution: f() in EX5 does not obey TE4.
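
The situation in EX5 can be reproduced in one source file, because
declarations of a function in different scopes carry separate sets of
default arguments. In the sketch below the block-scope declarations
stand in for the declarations seen in file 1 and file 2. The names
f_file1 and f_file2 are hypothetical.

```cpp
int g(int x) { return x; }    // one entity, defined once, no default here

int f_file1() {
    int g(int x = 1);         // the declaration as file 1 wrote it
    return g();               // evaluates the default argument 1
}

int f_file2() {
    int g(int x = 2);         // the declaration as file 2 wrote it
    return g();               // evaluates the default argument 2
}
```

Both calls are the tokens g() and both bind to the same g, but the
default argument evaluated differs, so the results are 1 and 2.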


----------------------------------------------------------------------------
TE5) "No address taking of internal linkage"

Problem: The f() in EX6 are not equivalent:

  //EX6
  // file 1
  static int x;
  int *f() {return &x; }

  // file 2
  static int x;
  int *f() {return &x; }

Rule:

  +----------------------------------------------------------------+
  |                                                                |
  | The address of a variable or function with internal linkage,   |
  | or local reference bound to one, shall not be taken in two     |
  | equivalent definitions or expressions.                         |
  |                                                                |
  +----------------------------------------------------------------+

Comment: The reason I permit local references
bound to variables with internal linkage is explained after the next rule.
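
The difference in EX6 is observable by comparing the returned
pointers. As an assumption made only for illustration, the sketch
below models the two files as namespaces file1 and file2. The function
addresses_differ is a hypothetical helper.

```cpp
namespace file1 {
    static int x;
    int* f() { return &x; }
}

namespace file2 {
    static int x;
    int* f() { return &x; }
}

// The two x are distinct objects, so the token-identical definitions
// of f() return different addresses.
bool addresses_differ() { return file1::f() != file2::f(); }
```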

----------------------------------------------------------------------------
TE6) "Limited use of variables with internal linkage"

Problem: f()

  //EX7
  // file 1
  static int x;
  const int y = 1;
  int f() {return x; } // true

  // file 2
  static int x;
  int f() {return x; } // false

  +----------------------------------------------------------------+
  |                                                                |
  | In two equivalent definitions or expressions                   |
  | a variable with internal linkage or a reference bound to one   |
  | shall not be used in an expression, or be passed to a function,|
  | unless it is const and not volatile and has no mutable         |
  | components and has equivalent initializers                     |
  | and has a type with external linkage and any default arguments |
  | of the constructor are equivalent.                             |
  |                                                                |
  +----------------------------------------------------------------+

For example:

  //EX8: equivalent
  const int x = 1;
  int f() {return x;}

  //EX9: inequivalent
  static int x =1;
  int f() {return x;}

Comment: The explanation of TE5 and TE6 is as follows.
Taking the address of a variable with internal linkage
yields a pointer which may be dynamically manipulated
so that its use cannot be easily detected, so we do not
permit the address taking itself.

Similarly, passing a variable with internal linkage by reference
to a function allows that function to take the address.

Disallowing that permits us to assume that any parameters
of the function have external linkage, and so taking
the address of a parameter is allowed.

However, binding a local reference to an entity with internal
linkage is permitted, because it's known when this occurs
statically, and the use of the reference itself is checked,
as if it were a variable with internal linkage.

In-class references must be treated as parameters of functions
because they're bound by mem-initialisers.

The set of conditions that form the exception to the rule may
be described as "constants". Provided the constants have
equivalent initialisers and can't be changed in any way,
they have the same value in both translation units,
and so there is no reason to prohibit their use.

There is, however, every reason to allow them:

  // EX10
  const int n = 10;
  int f() { return n; }

is sure to be common usage. The same applies to inline
functions: functions are inherently "const". So there is
no reason to exclude calling a function with internal linkage.
But there is every reason to allow them:

  // EX11
  inline int max(int x, int y) { return x>y ? x : y; }
  int f(int x, int y) { return max(x,y); }

is sure to be common usage.


----------------------------------------------------------------------------
TE7) "Equivalence of enumerations"

  +----------------------------------------------------------------+
  |                                                                |
  |  If an enumerator is used, in equivalent definitions or        |
  |  expressions, then the enumeration types shall be equivalent.  |
  |                                                                |
  +----------------------------------------------------------------+

Comment: Enumerators don't have "definitions", only the enumeration
types in which they are assigned a value. Requiring the definitions
to be equivalent is necessary for two reasons: you can do overload
resolution based on an enumerator argument, so the type is
important, and the size of the enumerator itself is determined only
by the definition of the enumeration.


----------------------------------------------------------------------------
TE8)

  +----------------------------------------------------------------+
  |                                                                |
  | The following are explicitly permitted in equivalent           |
  | definitions or expressions                                     |
  |                                                                |
  | 1) any use of a variable, function or type with external       |
  |     linkage,                                                   |
  |    since these independently obey the ODR                      |
  |    and the definitions will thus be equivalent                 |
  |                                                                |
  | 2) calls that cause default arguments to be evaluated,         |
  |    provided the arguments are equivalent                       |
  |                                                                |
  | 3) calling functions with internal linkage provided they are   |
  |    equivalent                                                  |
  |                                                                |
  | 4) use of a constant with equivalent value and type            |
  |                                                                |
  |                                                                |
  +----------------------------------------------------------------+


NOTE: so far we have only defined "token equivalence". We have
not specified any rules for C++.

===========================================================================


Token equivalence can only be checked at link time,
and only then if the associated translation unit source code is available.
However, assuming token equivalence, the other restrictions
on inline functions and classes can be checked.

PROPOSAL: INLINE EXTERN FUNCTION RULES
--------------------------------------

IN1)

  +----------------------------------------------------------------+
  |                                                                |
  | An inline extern function shall not refer to a variable        |
  | with internal linkage unless it is const, not volatile,        |
  | has no mutable components, and is initialised by default       |
  | or with a constant initializer.                                |
  |                                                                |
  +----------------------------------------------------------------+
  [Diagnosable]

Comment: This rule relates to conditions TE5 and TE6.
My argument for requiring this rule is that it is in fact
diagnosable and there is no purpose in declaring a function
inline if it is NOT to be defined in more than one
translation unit. Therefore, by assuming an inline definition
will be used in another translation unit, we can diagnose
an important class of potential violations of the ODR.

We do so at the expense of diagnosing an otherwise legitimate
use of an inline function in a single place: however,
if it's not to be shared, it could have been given
internal linkage.

Comment: There already exist inline extern functions: static
class functions have external linkage by prescription but
may also be inline.


IN2)

  +----------------------------------------------------------------+
  |                                                                |
  |  By prescription a local static member of                      |
  |  an inline function with external linkage has external linkage.|
  |                                                                |
  +----------------------------------------------------------------+
  [Prescriptive]

This is necessary so that inline extern functions access the
same local statics in all translation units, and the
single static is initialised only once.

By deeming external linkage, the ODR applies to the initialiser.


===========================================================================

PROPOSAL: RULES FOR NUMBER (and nature) OF DEFINITIONS (ODR)
------------------------------------------------------------

These rules describe the One Definition Rule proper.

----------------------------------------------------------------------------
OD1) "No more than one definition per translation unit"

  +----------------------------------------------------------------+
  |                                                                |
  | There shall be no more than one definition of each variable,   |
  | function, named class, named enumeration type or template      |
  | given in a translation unit.                                   |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

----------------------------------------------------------------------------
OD2) "You can't change the linkage"

  +----------------------------------------------------------------+
  |                                                                |
  | The definition of a function or variable shall not specify     |
  | different linkage than a declaration it is bound to.           |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

Example:
  extern int x;
  static int x; // ill-formed


----------------------------------------------------------------------------
OD3) "You can't change the (return) type"

  +----------------------------------------------------------------+
  |                                                                |
  | The definition of a function shall specify the same return     |
  | type as a declaration it is bound to.                          |
  |                                                                |
  | The definition of a variable shall specify the same type as    |
  | a declaration it is bound to.                                  |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

Example:

  extern const int f();
  int f(); // ill-formed
  extern int x;
  float x; // ill-formed

----------------------------------------------------------------------------
OD4) "Functions must be defined in a program if they're used"

  +----------------------------------------------------------------+
  |                                                                |
  | If a function is explicitly declared and                       |
  | (a) a call exists, or                                          |
  | (b) its address is taken, or                                   |
  | (c) it is one of                                               |
  |   an impure virtual member function, or                        |
  |   a pure or impure virtual destructor, or                      |
  |   a member delete operator,                                    |
  | of a class for which a whole object, base subobject,           |
  | member subobject, or array element is constructed,             |
  | then there shall be at least one definition of the function    |
  | in the program,                                                |
  | otherwise a definition for the function need not be given.     |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

Comment: In theory, nothing needs to be defined unless it's used.
For example, there's no need to define a function whose address
is taken unless the pointer is dereferenced: even use in
comparisons doesn't require a definition.

In practice, supporting such dynamism is not in the spirit of
C++ and we should not grant implementations such freedom to
fail to diagnose potential errors.

Requiring virtual functions to be defined if an object is
instantiated protects programmers: only really smart
implementations could detect whether a virtual function
is called or not.

The reason is the separate compilation and linkage model
and archaic linker technology which C++ is designed to
support.

Had C++ been designed for modern computing, without
compatibility restrictions, there would be no requirement
for the sort of complex equivalence rules presented here.
The issue would not and does not arise in other languages
which allow pre-compiled interfaces to be used directly
and require the environment to provide this support.

For this reason I specify that virtual functions ought
to be defined at the point of construction. The rule is just
the same as the rule preventing abstract classes being
instantiated: there is no reason for that restriction
if the pure virtuals are not called.

The reason for including member delete operators is that
these are called via a virtual destructor, and so act
as if they were virtual functions.
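
The cases in OD4 can be sketched in present-day C++ as follows. This is
only an illustration under my own assumptions: the names never_used,
Shape, Square and sides_of_square are hypothetical and do not come from
the proposal. The sketch shows a function that is declared but never
used (no definition given), and a virtual function that is defined
because an object of its class is constructed.

```cpp
#include <cassert>

// Declared but never called and never has its address taken:
// OD4 requires no definition, and none is given anywhere.
int never_used(int);

struct Shape {
    virtual ~Shape() {}         // virtual destructor, defined in class
    virtual int sides() const;  // impure virtual, defined below
};

// OD4(c): a definition is given because Shape subobjects are
// constructed below, whether or not sides() is called on a Shape.
int Shape::sides() const { return 0; }

struct Square : Shape {
    int sides() const { return 4; }  // overrides Shape::sides
};

// Constructs a Square (and so a Shape base subobject) and calls
// the virtual function through a reference to the base.
int sides_of_square() {
    Square sq;
    const Shape& s = sq;
    int n = s.sides();
    assert(n == 4);
    return n;
}
```

On typical implementations, deleting the definition of Shape::sides is
reported by the linker rather than the compiler, which is the kind of
failure the rule is meant to make diagnosable.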

----------------------------------------------------------------------------
OD5) "Variables must be defined in a program if they're used"

  +----------------------------------------------------------------+
  |                                                                |
  |  Exactly one definition in a program shall be given for        |
  |  a non-local variable with static storage class,               |
  |  unless it is ignorable and unused,                            |
  |  in which case no definition is required:                      |
  |  for the purposes of this rule only                            |
  |    (a) A type is ignorable if it is a builtin type, or         |
  |    (b) it is an aggregate of ignorable types                   |
  |  and                                                           |
  |     a variable is used if it is used in an expression,         |
  |     other than as an argument of the [sizeof] operator         |
  |     or [typeid] operator.                                      |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

Example:

  extern int x;
  extern struct S { int a; } y;
  int main() {
    sizeof(x); // no definition required
    typeid(y); // no definition required
  }

Comment:
There is no rule requiring local variables to have a single definition.
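
A compilable sketch of the "ignorable and unused" case, assuming a
present-day compiler. The names undefined_counter, Pair, undefined_pair
and check_sizes are hypothetical. Neither variable is defined anywhere,
and the program still links because the operand of sizeof is not
evaluated.

```cpp
#include <cassert>
#include <cstddef>

// Declared, never defined anywhere in the program.
extern int undefined_counter;

// An aggregate of builtin types, also declared but never defined.
struct Pair { int a; double b; };
extern Pair undefined_pair;

// The operand of sizeof is not evaluated, so neither variable is
// "used" in the sense of OD5 and no definition is needed.
std::size_t size_of_counter() { return sizeof(undefined_counter); }
std::size_t size_of_pair()    { return sizeof(undefined_pair); }

void check_sizes() {
    assert(size_of_counter() == sizeof(int));
    assert(size_of_pair() == sizeof(Pair));
}
```

Reading or writing either variable in an ordinary expression would be a
use, and a definition would then be required.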

----------------------------------------------------------------------------
OD6) "Types must be defined in a translation unit if they're used"

  +----------------------------------------------------------------+
  |                                                                |
  | At least one definition of a class or enumeration shall be     |
  | given in a translation unit if the class or enumeration        |
  | is used in such a way that the size of the type must be known, |
  | a member function called, or a component or enumerator used.   |
  |                                                                |
  | A definition of a type T is not required                       |
  | for any of the following                                       |
  |                                                                |
  | (a) Passing a T by value                                       |
  | (b) declaring a function type                                  |
  | (c) instantiation of a pointer to T or a qualified version     |
  |     thereof, recursively                                       |
  | (d) initialisation of a reference of type T or qualified       |
  |     version thereof or a reference to any pointer type         |
  |     as listed in (c) above                                     |
  | (e) cast to or from void*                                      |
  |                                                                |
  | A definition of a type T is required for any of the following  |
  |                                                                |
  | (a) argument of the sizeof or typeid operator                  |
  | (b) instantiation of an object                                 |
  | (c) call of a member function or access to a data member       |
  | (d) formation of a temporary including return by value         |
  | (e) declaration of a non-static data member                    |
  |                                                                |
  |                                                                |
  +----------------------------------------------------------------+
  (Diagnosable)

  Example:
  struct X;      // declare X is a struct type
  struct X* x1;  // use X in pointer formation
  X* x2;         // use X in pointer formation
  enum E* e;     // use E in pointer formation

Comment: The two lists above are intended to be exhaustive.
They're probably not. Every use should be categorised one way or the
other in accordance with the descriptive subclause.
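
The two lists can be illustrated with a sketch like the one below,
assuming present-day C++. Widget and the functions around it are
hypothetical names. Everything above the definition of Widget needs
only the declaration; everything below it needs the size or a member.
Note that the sketch only declares consume: in present-day C++ a call
passing a Widget by value would need the complete type.

```cpp
#include <cassert>

struct Widget;                       // declaration only: incomplete here

Widget* make_widget(int value);      // pointer to incomplete type
int     read_widget(const Widget&);  // reference to incomplete type
void    consume(Widget by_value);    // function type only; never called

// Casts to and from void* need no definition either.
void*   erase(Widget* w) { return w; }
Widget* restore(void* p) { return static_cast<Widget*>(p); }

// From here on the size and members are needed, so the
// definition must be visible.
struct Widget { int value; };

Widget* make_widget(int value) {
    Widget* w = new Widget;          // instantiation of an object
    w->value = value;                // access to a data member
    return w;
}

int read_widget(const Widget& w) { return w.value; }

int round_trip(int value) {
    Widget* w = make_widget(value);
    int result = read_widget(*restore(erase(w)));
    delete w;
    assert(result == value);
    return result;
}
```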

----------------------------------------------------------------------------
OD7) "Duplicate type definitions are OK if they're equivalent"

  +----------------------------------------------------------------+
  |                                                                |
  | There may be more than one definition of a named enumeration,  |
  | named class type, or template in a program provided the        |
  | definitions are token equivalent.                              |
  |                                                                |
  +----------------------------------------------------------------+
  (Not diagnosable)

Comment: This clause is a major goal of the paper.
Unnamed types are covered separately.

----------------------------------------------------------------------------
OD8) "Duplicate function definitions are OK if they're equivalent and inline"

  +----------------------------------------------------------------+
  |                                                                |
  | There may be more than one definition of a function in a       |
  | program provided the definitions are equivalent,               |
  | and provided each definition of the function is explicitly     |
  | marked inline, or an in-class definition is given.             |
  |                                                                |
  +----------------------------------------------------------------+
  (Not diagnosable)

Comment: it's important to understand the implied assumption
in this clause. The little word 'a' in "a function" is the key
word in the clause (and the previous one). The implication is that
it's a single function we're talking about that may have more than
one definition. It's clear we can't be talking about distinct functions
having more than one definition which must be equivalent:
two functions do not constitute "a function".

The implied requirement is that the functions must have external
linkage because that is the way in which its possible
to require the function names refer to the same function in
different translation units.

The existence of member functions which have internal linkage
and yet must have equivalent definitions is, in my opinion,
inconsistent; by definition such functions are distinct because
they are translation unit local.

No linkage is even more absurd: there is no question that
the integrity of a class across translation units requires
all members to be associated by name, which is the
meaning of external linkage in C++.

The ARM specification of internal linkage for inline members
is based on a misconception and hangover from C.
C does not attribute linkage to types, but C++ does: that's
because in C++ types are name based.

The ARM, however, does not support "no linkage" for any global
names: all names either have internal or external linkage.
Attributing internal linkage to inlines is based on the
idea that separate instances are created in each translation unit:
copies of the function.

The consequence is that no requirement of equivalence follows,
but must be explicitly specified: this is unrelated to
the One Definition Rule.

The problem is that an inconsistency is almost guaranteed if the
addresses of such instances are taken. This clearly detects
that the definitions are not equivalent, yet they
are required to be.

The addresses of two entities with internal linkage in
different translation units are required to be distinct.

The address of a member function, on the other hand,
is required to be unique: by definition of token
equivalence, taking the address of a member with internal
linkage from any function subject to the equivalence requirement
ensures that requirement is broken: in effect,
the address of a member with internal linkage can only
be taken by another function with internal linkage,
and only then if that function is never called by a
function with external linkage, directly or indirectly.

However the ARM is reasonably clear that the inline
keyword, and in-class definitions, are only hints
to the compiler and do not have such gross effects on
semantics: in particular there is no restriction
on taking the address of an inline function.

I don't believe any alternatives exist other than:

  a) require member functions to have external linkage if the class does
  b) disallow taking the address of inline functions

I vote for (a). Please note again that if member functions have
linkage, then taking their address must have one of
two consequences

  a) External linkage:
     the address is program unique and translation unit independent

  b) Internal linkage:
     the address is distinct in each translation unit.

There is no option that the addresses in (b) may or may not
be distinct: they're required to be by specification of
internal linkage. The only way to prevent such translation unit
locality leaking into the semantics is not to take the
address: specifying (b) effectively prevents most address
taking.

It's possible that non-static member functions can be specified
to have no linkage. After all, conceptually a member is simply
an offset into a structure: linkage is not required to
associate such offsets, structural equivalence is enough,
and token equivalence is a stronger condition.

----------------------------------------------------------------------------
OD9) "Typeinfo tells the type"

  +----------------------------------------------------------------+
  |                                                                |
  | The typeinfo objects referenced by the typeid operator         |
  | shall compare equal for the same variable.                     |
  |                                                                |
  +----------------------------------------------------------------+
  (Requirement on typeid operator)

Comment: Uwe suggests this condition is not strong enough, and that
the references should be to the same typeinfo object. That may
be the case and should be decided by core.

For the purposes of the ODR, such a stricter requirement is not
necessary.

This innocent looking clause is pivotal to the understanding of
the meaning of linkage for types. A type has an address
which can be compared with another type's address, and
these addresses shall be equal if and only if the types
are the same type: this is the behavioural, testable definition
of the phrase "same type". The address of a type is the value
of the typeid operator.

The typeid operator is useless if comparing a single variable
with itself yields an inequality. Yet exactly this is mandatory
if the type of a variable with external linkage has internal
linkage. This situation is untenable: variables with external
linkage must have types with external linkage.


The point of this clause is as follows:

  // file 1
  class T { .. } x;
  T* px = &x;
  typeinfo& tx = typeid(px);

  // file 2
  class T { .. };
  extern T x;
  extern typeinfo& tx;
  int main()
  {
    return typeid(&x) == tx;
  }

OD9 is unequivocal: the program must return 1.
If it doesn't, then the ODR is violated, though not necessarily
clause OD9: it may be the two definitions of T are not
equivalent.

One purpose of introducing this rule is to be found in the next
clause.
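
A single translation unit cannot show the cross-file case, but the
behaviour OD9 asks for can be sketched as below, assuming a present-day
compiler. The paper's typeinfo is spelled std::type_info in standard
C++; the member payload and the function names are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

class T { public: int payload; };

T x;
T* px = &x;

// Type information obtained once, through the pointer variable.
const std::type_info& tx = typeid(px);

// What OD9 demands: the same variable yields equal type
// information however it is reached.
bool same_variable_same_type() {
    return typeid(&x) == tx && typeid(x) == typeid(*px);
}

// Distinct types must compare unequal.
bool pointer_differs_from_object() {
    return typeid(px) != typeid(x);
}

void check_od9() {
    assert(same_variable_same_type());
    assert(pointer_differs_from_object());
}
```

The comparison uses operator== on the type information objects rather
than comparing their addresses, which matches the weaker form of the
requirement discussed above.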

----------------------------------------------------------------------------

OD10) "Unnamed types."

  +----------------------------------------------------------------+
  |                                                                |
  | Two unnamed type definitions denote distinct types;            |
  | unless the tokens of their definitions are in corresponding    |
  | positions in two definitions of the same class or function,    |
  | in which case they denote the same type.                       |
  |                                                                |
  +----------------------------------------------------------------+


Comment: all members of classes with external linkage
have external linkage, including nested types, and including
unnamed types.

The problem of unnamed types in classes is that
instances of these types with external linkage may exist,
and OD9 requires the type of a variable to be unique
in a testable way: how then can an unnamed type have
external linkage?

Sean Corfield and Jerry Schwarz suggested the answer:
the external name of the type is synthesised by assigning
it a unique position number within a class,
function or translation unit.

This is possible for classes and functions because they are
encapsulated and obey the token equivalence rule:
the sequence number of the first token of the definition
would do.

It's not possible to do this for namespaces: two definitions
of an unnamed type with identical token sequence in different
translation units are distinct, no matter what namespace
they are in, because the same trick is used on a per translation
unit basis, and the same UNIQUE translation unit qualifier
used in the external name of entities with external linkage
in the unnamed namespace is used in the external name.

For example:

  // file 1
  struct X {
    enum {zero} z;
  };
  X x;
  typeinfo &tz = typeid(x.z);

  // file 2
  struct X {
    enum {zero} z;
  };
  extern X x;
  extern typeinfo& tz;
  int main()
  {
    return tz == typeid(x.z);
  }

The main() function must return 1 or OD9 is violated.
The type of a variable can't be allowed to differ or
inconsistency results.

The alternative of banning unnamed types in classes would
break too much code.
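
Both halves of OD10 can be observed inside one translation unit. The
sketch below assumes a present-day compiler; x1, x2 and the function
names are hypothetical. It shows that the unnamed enumeration nested in
X is a single type for every X, while two separately written unnamed
structs with identical tokens are distinct types.

```cpp
#include <cassert>
#include <typeinfo>

struct X {
    enum { zero } z;   // unnamed enumeration nested in a named class
};

X x1;
X x2;

// Members declared by the one unnamed enum share a single type.
bool nested_unnamed_is_one_type() {
    return typeid(x1.z) == typeid(x2.z);
}

// Two separate unnamed definitions denote distinct types, despite
// having the same token sequence.
bool separate_unnamed_are_distinct() {
    struct { int a; } p = { 0 };
    struct { int a; } q = { 0 };
    return typeid(p) != typeid(q);
}

void check_od10() {
    assert(nested_unnamed_is_one_type());
    assert(separate_unnamed_are_distinct());
}
```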


----------------------------------------------------------------------------
PROPOSAL: LINKAGE SPECIFICATIONS
--------------------------------

These specifications are more or less needed for the ODR
as I have characterised it to operate.

LK1) Named members of a class with external linkage have external linkage.
     (Prescriptive)

LK2) Friends of a class with external linkage have external linkage.
     (Prescriptive)

LK3) Non-member functions may both have external linkage and be inline.
     (Prescriptive)

LK4) Local static variables of functions with
     external linkage have external linkage.
     (Prescriptive)

LK5) Local classes of functions with external linkage have external linkage.
     (Prescriptive)


Alternative.
------------

This alternative is slightly more radical, but also more consistent.
The idea is that all entities with linkage have external linkage.
Each entity with linkage has an external name. If the name
contains the translation unit local UNIQUE name, then the names
can't link across translation units, and we can say the entity
also has internal linkage: which does not mean it doesn't have
external linkage.

I think this is an important simplification. It's a matter
of optimisation alone that the compiler does not need to
put external names with internal linkage into object files,
because the linker could never link them.

In fact, that may not be true, now that we have RTTI.

This alternative attributes linkage to every name, including
"int" and "int*". Therefore, most of the linkage issues
simply evaporate.

Not all linkage issues evaporate, but their form becomes
simpler and more manageable: the issues all have identical
form:

     What is the external name of this entity?

That's it, no other questions about linkage have any meaning,
since they're all resolved automatically and trivially
by this formulation.

Specifying external names for entities corresponds to existing
practice (mangled names). It also solves a major problem in
the Working Paper on an unrelated issue: what is a name,
and how can we refer to "a name".

In terms of this proposal, the answer is: name lookup
associates a short name with an external name.
In the Working Paper the word "name" usually means the external
name, even though the short name is the one visible in the code
example being described: the binding to the external name
is assumed unless the issue being discussed is the
name lookup algorithm itself.

----------------------------------------------------------------------------

DECLARATION MATCHING
--------------------

Declaration matching rules have two parts. The first part
establishes when two declarations refer to the same entity.
The second part establishes whether the declarations are consistent.

We need some terminology.

Definition: Object Indicator
----------------------------

An "object indicator" is one of:

  a) unqualified
  b) const
  c) volatile
  d) const volatile
  e) constructor
  f) destructor
  g) static

Definition: Family name.
-----------------------

The name of a set of overloaded functions is called the family
name and denotes a single entity sometimes known as a family
of functions.

The name of a class type is also a family name whose members
are the constructors of that type.

The name of a destructor also constitutes a family name consisting
of a ~ token followed by a class name: the family consists of
at most one function.

The name of an operator constitutes a family name and consists of
several tokens: the keyword "operator" followed by the operator
symbol, by new [ ], or by delete [ ].

The family name of a user defined conversion operator consists of
the keyword operator and the name of the type: the family
consists of at most one function.

Definition: Short name
----------------------

The short name of an entity is its name as declared and
is the shortest unique name identifying the entity
in its declarative region.


Definition: Full name.
----------------------

The full name of an entity is a program wide unique
name of an entity consisting of the pair of the short name with
the fully qualified namespace name of the innermost
declarative region containing the name's declaration, followed
by the containing function name if necessary, followed by
synthesised names uniquely representing containing blocks.

In the case of nonstatic base class subobject members the full path
in the lattice is given: these entities are quite distinct
from members, just as a base class is not a class.

Comment: the "Full name" is a fiction that enables us to
uniquely identify an entity.

Definition: function signature
------------------------------

The signature of a function is a finite sequence of
parameter type signatures.

A parameter type signature is an equivalence class of function
parameter declarators. The equivalence relation which
specifies this class is that two function parameter
declarators are equivalent if the unqualified versions of the
types they name are the same, or are both the ellipsis (...).

The signature of a function consists of the sequence of parameter
type signatures of a declaration of the function.

Definition: individual function name
------------------------------------

An individual function name is a triple consisting of the family
name of the function, an object indicator and the individual
function signature.

The object indicator of a global function or static member function
is static, whereas that of a non-static member function corresponds
to the cv-qualification of the this pointer, except that
constructors and destructors have the appropriate indicators.
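
The triple can be modelled as a small data structure. The sketch below
is my own illustration, not part of the proposal: the type and function
names are hypothetical, and parameter type signatures are represented
simply as strings that are assumed to have their top-level
cv-qualifiers already removed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Object indicators as listed in the definition above.
enum class ObjectIndicator {
    Unqualified, Const, Volatile, ConstVolatile,
    Constructor, Destructor, Static
};

// A parameter type signature: the unqualified type, or "..." for
// the ellipsis. Assumed to be normalised by the caller.
typedef std::string ParameterTypeSignature;

struct IndividualFunctionName {
    std::string family;                             // e.g. "f", "~X"
    ObjectIndicator indicator;
    std::vector<ParameterTypeSignature> signature;
};

// Two declarations in the same declarative region name the same
// function exactly when all three components agree.
bool operator==(const IndividualFunctionName& a,
                const IndividualFunctionName& b) {
    return a.family == b.family
        && a.indicator == b.indicator
        && a.signature == b.signature;
}

bool example_matches() {
    // void f(int);  and  void f(const int);  at namespace scope
    IndividualFunctionName f1{"f", ObjectIndicator::Static, {"int"}};
    IndividualFunctionName f2{"f", ObjectIndicator::Static, {"int"}};
    // void g(int);  and  void g(int) const;  in the same class
    IndividualFunctionName g1{"g", ObjectIndicator::Unqualified, {"int"}};
    IndividualFunctionName g2{"g", ObjectIndicator::Const, {"int"}};
    assert(f1 == f2);
    return f1 == f2 && !(g1 == g2);
}
```

The return type is deliberately absent from the model: it plays no part
in deciding which entity a declaration names.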


Definition: Function template family name
-----------------------------------------

The name of a set of overloaded function templates is called
the family name and denotes a single entity sometimes known
as a family of overloaded function templates.

The name of an operator function template constitutes a family name
and consists of several tokens: the keyword operator followed
by the operator symbol, new [ ] or delete [ ].

Definition: Individual function template name
---------------------------------------------

An individual function template name is a triple consisting of the
family name of the function template, a template formal signature
and a parameterised function signature.

The template formal signature consists of a finite sequence
of formal parameter signatures.

A formal parameter signature is either a parameter type signature
or the keyword class. [Proposed extension: or namespace,
or an individual class template name, or some sort of name
denoting functions .. I don't know what is correct here.
To be resolved at San Diego.]

A parameterized function signature is a finite sequence of parameterized
function parameter type signatures. [Gak .. these names are long -:]

A parameterized function parameter type signature is an equivalence
class of parameterized function parameter declarators. The equivalence
relation which specifies this class is that two parameterized function
parameter declarators are equivalent if all substitutions of the
same formal parameters by position yield equivalent parameter
type signatures.

[My fingers hurt. An ascription clause is required here.]

RULE 1 Matching declarations
----------------------------

If two declarations declare a type or class template in the same
declarative region then they declare the same entity if the same
name is declared.

If two declarations declare a function in the same namespace
or class, or in any local scope nested directly or indirectly in a
local scope nested in the same namespace or class, then
they declare the same entity if they specify the same individual
function name.

If two declarations declare a variable in the same namespace or class
or in any local scope nested directly or indirectly in a local scope
nested in the same namespace or class, then they declare the same entity
if they declare the same name.

If two declarations declare a function template then they declare the
same entity if they specify the same individual function template name.

If a declaration in a class has the same individual function name
as a virtual function in a direct or indirect base then
the function overrides the base function.
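
How these matching rules look in present-day C++ can be sketched as
follows. The names f, Base, Derived and call_through_base are
hypothetical. The sketch shows parameter declarators that differ only
in top-level cv-qualification naming one function, and a member whose
object indicator differs being a separate function that does not
override.

```cpp
#include <cassert>
#include <type_traits>

// Declarators differing only in top-level cv-qualification give
// the same parameter type signature.
static_assert(std::is_same<int(int), int(const int)>::value,
              "top-level const does not change the signature");

// Qualification below the top level does change it.
static_assert(!std::is_same<int(int*), int(const int*)>::value,
              "pointer to const is a different parameter type");

int f(int);                               // declaration
int f(const int);                         // same entity, redeclared
int f(const int n) { return n + 1; }      // its one definition

struct Base {
    virtual ~Base() {}
    virtual int id(int) const { return 1; }
};

struct Derived : Base {
    int id(int) const { return 2; }  // same individual name: overrides
    int id(int)       { return 3; }  // different indicator: separate
};

int call_through_base(const Base& b) { return b.id(0); }

void check_matching() {
    Derived d;
    assert(f(1) == 2);
    assert(call_through_base(d) == 2);
    assert(d.id(0) == 3);
}
```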

COMMENT.

These rules cover all declarations, including definitions,
except for template specialisations. This needs to be
resolved at San Diego. It's not possible to match
specialisations in the usual sense, because there is nothing
to match to.

In addition, overriding rules are required to determine
when a virtual function overrides another.

RULE 2: Same linkage
--------------------

If two declarations of a variable, type, or individual
function declare the same entity then the declarations
shall not specify different linkage.

RULE 3: Same return type
------------------------

If two declarations of a variable declare the same entity,
then the type must be named and have the same name in both declarations.

If two declarations of an individual function declare the same entity,
then the return type must be named and have the same name
in both declarations.

If two declarations of a function template declare the same entity,
then the parameterized return types must have the same parameterized
function parameter type signature.

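If a virtual member function overrides another, then the
overriding function must have the same return type, or,
if both functions return a pointer or both return a reference,
the overridden function may return a pointer or reference
(respectively) to an accessible base of the class named in the
overriding function's return type, provided the overriding
function's return type is not more cv-qualified.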
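
The relaxation for pointers and references in the last paragraph can be
sketched like this, assuming a present-day compiler. Animal, Dog and
legs_of_clone are hypothetical names. The overriding function returns
Dog* where the overridden one returns Animal*, and Animal is an
accessible base of Dog.

```cpp
#include <cassert>

struct Animal {
    virtual ~Animal() {}
    virtual Animal* clone() const { return new Animal(*this); }
    virtual int legs() const { return 0; }
};

struct Dog : Animal {
    // The return type differs from Animal::clone, but Animal is an
    // accessible base of Dog and no cv-qualification is added, so
    // this still overrides.
    Dog* clone() const { return new Dog(*this); }
    int legs() const { return 4; }
};

int legs_of_clone(const Animal& a) {
    Animal* copy = a.clone();   // dispatches to Dog::clone for a Dog
    assert(copy != 0);
    int n = copy->legs();
    delete copy;
    return n;
}
```

Calling clone directly on a Dog yields a Dog* with no cast, which is the
practical benefit of allowing the return types to differ.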


----------------------------------------------------------------------------

Acknowledgments
---------------
Bill Gibbons, Andrew Koenig, Josee Lajoie, Bjarne Stroustrup,
Erwin Unruh; Jerry Schwarz, Sean Corfield,
the usual apologies to others I can't remember,
and the usual claim that all the mistakes are mine.