ISO: WG21/N0369
ANSI: 94-0009
Author: John Max Skaller
Date: 27 January 1994
Reply to: maxtal@suphys.physics.su.oz.au

The "ONE DEFINITION RULE"
-------------------------

A "definition" means a sequence of tokens parsed as a definition.
[Editor to use italics]

The "One Definition Rule" is a constraint on the writer of source
files that are pre-processed into translation units.

"Diagnosable" in this paper means the language processor is required
to issue a diagnostic.

Parts of the ODR are not diagnosable, allowing the language translator
to assume the programmer obeyed the ODR without checking. This means
the programmer is responsible for following the rules of the ODR, or
must accept undefined behaviour.

Other parts or associated rules are diagnosable because they can be
checked, and these constrain the language processor to check for a
violation and issue a diagnostic.

The "category" of an entity in this paper is either
(1) namespace, (2) function, (3) variable, (4) type, (5) enumerator,
or (6) template, and refers to all the things which can be declared,
defined, or constructed and given some form of user defined name
in C++.

A function, variable, or enumerator has a type. Namespaces don't have
types. Classes and typedef names are types. Class templates are
parameterised types. Function templates have parameterised types.

The "type" of an entity in this paper is its category, and, if it has
a type, its type, or, if it is a type, that type.

The exact proposal for the ODR is given below. The structure of the
proposal is:

A) ODR: The ODR specifies how many definitions entities with external
linkage may have, and where. The basic rule is: only one definition is
permitted, except for classes, inline functions, and templates, which
may have one per translation unit, provided they're token equivalent.

B) EQUIVALENT DEFINITIONS: Token equivalence is defined as an identical
sequence of tokens, with each identifier that refers to an entity
outside the definition and which does not have external linkage
required to be equivalently defined in each translation unit (that is,
the specification is recursive).

C) LINKAGE SPECIFICATIONS: Everything to which the ODR ought to apply
is deemed to have external linkage. This includes all member functions
of a class with external linkage. Inline extern globals are permitted
as a side effect. It also includes static local variables of functions
with external linkage. However, the ODR may cause functions with
internal linkage to be constrained to be equivalent; in this case the
form of the definition is subject to additional constraints: inline
extern functions may not refer to static variables with internal
linkage.

D) TEMPLATE NAME BINDING: The exact rules for name binding of templates
are, or should be, specified in the Working Paper on templates.

E) DECLARATION MATCHING: When a name binds to an entity with external
linkage, the declarations must match. (There is no issue of equivalent
definitions: if the definitions are duplicated, the ODR applies
independently, but we may only see a declaration, and token
equivalence does not apply to declarations.)

F) UNNAMED TYPES: Some types in classes with external linkage are
unnamed. Matching rules that work by linkage are required if these are
to be supported. As with names in the unnamed namespace, we synthesise
a unique name that the user can't actually use: the entities have
external linkage.

Problem - Functions
-------------------

In C, a function may have only one definition in a program. Duplicate
definitions can be detected by the linker.

If the definition of a function is visible, calling the function can
sometimes be optimised by inlining.
If a function is to be used in more than one translation unit, then
optimisation opportunities are lost if the definition cannot be given
in each translation unit. Therefore, C++ allows multiple definitions
of functions in a program.

It is clearly not the intent of allowing multiple definitions that the
definitions define completely different functions, and so the
questions arise of what constitutes equivalent definitions in
principle, what rules should be specified in the Standard to constrain
duplicate definitions, and which of these rules require a diagnostic
to be issued if the rule is broken.

Problem - Variables
-------------------

In C, whole variables may have only one definition in a program.
Again, duplicate definitions can be detected by the linker.

C++ follows the C rules for ordinary variables; however, the advent of
templates introduces complications, in that the need for instantiation
of variables may arise in more than one translation unit.

Note there is no issue of "duplicate" definitions here: variables
created by instantiating a template do not have a definition in the
precise, grammatical sense used in this paper.

Problem - Types
---------------

In C, types in different translation units are distinct. However,
types must be used in declaring functions and variables, and linkage
across translation unit boundaries must require some form of
compatibility of the types involved, so that code generated in a
translation unit with access only to a declaration does not corrupt
memory.

In C, this problem is addressed by introducing the notion of
compatible types. These rules allow structurally equivalent
declarations to be considered equivalent, and permit structurally
equivalent structure definitions in distinct translation units to be
associated.

[These rules are flawed, since the relation so defined is not an
equivalence relation.
In particular, it's not transitive, since A in translation unit 1 may
be structurally equivalent to B in translation unit 2, which is
structurally equivalent to C in translation unit 1, which is not
structurally equivalent to A. It's easy to show that A is both
structurally equivalent to itself and not, a contradiction rendering
structural equivalence inconsistent.]

C++ has a completely different strategy. Instead of utilising some
notion of structural compatibility, C++ requires name equivalence.
This operates in two ways: first, the structural equivalence of
declarators is the same as for C, but such declarators are considered
to be type names. For example:

    int*

is equivalent to

    int*

not because they are structurally equivalent, but because 'int*' is
deemed a name. For this reason, a typedef name is not the name of a
type, but the name of an alias. There are rules for name formation
that 'factor out' any typedefs; the phrase used to describe names that
are identical after typedefs have been replaced is 'modulo typedefs'.

Name identity is exactly a form of equivalence. For example, given

    typedef const int CI;

the names

    const int*
    int const*
    CI*
    CI const*

are identical despite being "spelled" differently. It's important to
note this case: the rules for equivalence of definitions given below,
namely token equivalence, are stricter than the rules for name
identity: none of the above names would be accepted as an alternative
for the others in token equivalent definitions.

Secondly, class types in C++ may have external linkage, so that the
use of a type name -- or a type name suitably composed from a type
name with external linkage -- in two translation units denotes the
same type.

It is clear that the definitions or declarations of types with
external linkage must be compatible in some sense similar to that used
by C.
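
The name identity of the four pointer spellings discussed above can be
checked mechanically. The following is a minimal sketch only: it
assumes a compiler supporting the later <type_traits> library, which
postdates this paper, and the function name spellings_name_one_type is
hypothetical.

```cpp
#include <cassert>
#include <type_traits>

typedef const int CI;

// Hypothetical helper: returns true when all four spellings from the
// text denote one and the same type once typedefs are factored out.
bool spellings_name_one_type() {
    return std::is_same<const int*, int const*>::value
        && std::is_same<const int*, CI*>::value
        && std::is_same<const int*, CI const*>::value;
}
```

The four spellings are interchangeable as type names here, yet they
remain different token sequences, which is why token equivalence is
the stricter relation.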

The change in orientation to the use of names rather than structural
equivalence for types is a consequence of a corresponding weakening of
the C rules for functions: not only may several functions have the
same name, which is called overloading, but inlining allows duplicate
definitions of the same function. Thus the requirements on types are
strengthened. Indeed, functions are identified by a combination of
their fully qualified name and signature, and the signature
constitutes type information.

The difficulty with types is not merely a matter of optimisation,
however: duplicate definitions of types are mandatory, since they are
required even for mere declaration of variables and functions.

Problem - enumerators and typedefs
----------------------------------

In C, an enumerator is simply a named constant integer value. The
enumerator name is always translation unit local.

In C++, the status of enumerators is not clear. First, there is no
need to speak of equivalence for enumerators: two enumerators either
have the same value or they don't, and the values are known at compile
time. Second, there is no particular reason that enumerators should
have external linkage, or in any way require that the same enumerator
name have the same value in distinct translation units.

However, the token equivalent formulation of the ODR is based on
sequences of tokens of declarations and definitions being equal, and
identifiers binding to equivalent definitions. Such a formulation
requires equivalence to be defined for every category of user defined
name, so typedefs and enumerators must be included in the list.

Problem - templates
-------------------

A function or class template has a definition, and these are
considered equivalent or not depending on the same general token
equivalence rules as for other definitions.

There are a couple of problems with templates. One is that it isn't
clear if or when template names have external linkage. If they don't,
the ODR doesn't apply to them.
The second problem is specifying the linkage of template instances.
For classes, the usual rules apply, but for functions that is not
adequate. It's necessary to know, because functions with external
linkage may not have argument types depending on a type with internal
linkage, and because the address of a function with internal linkage
cannot be the same as that of a function with the same name and
signature in another translation unit, whereas the address of a
function with external linkage is required to be the same in all
translation units.

However, there are several additional, special, and severe problems
with templates. First, it's necessary to note that the ODR and the
definition of token equivalence are constraints applied to the user.
So there is no issue of generated functions being equivalent in the
sense of the ODR: if behavioural equivalence is required for
consistency of the language, it must be guaranteed by other means.

Typically, the semantics of generated entities are specified by an
algorithm which is context sensitive in ways that matter only if the
ODR is already broken. For example, the generated copy constructor is
defined to copy members and bases, and the behaviour of such a
constructor is the same independently of when, where, or how it's
generated: the ODR is not required and cannot apply.

The problem that arises is with specialisations. A specialisation, in
some sense, is a deliberate hijacking of the semantics of what would
otherwise be a generated definition. More than one specialisation is
an error, and the ordinary ODR applies to specialisations with
external linkage. But the ODR can't help prevent hijacking, because
there is no "definition" of the unspecialised template instance.

Problem - Declaration matching
------------------------------

The attributes of declarations in a single translation unit, and the
corresponding attributes of associated declarations or definitions
need not match exactly.
Certain attributes must be specified in each declaration or definition
in order that the declarations and definitions be bound together, so
as to be considered to name the same entity (in addition to rules
about the scopes of the declarations and definitions). These
attributes are the fully qualified name in the case of a variable or
type, or the name and signature in the case of a function.

In addition, other attributes must have non-conflicting specifications
if the identifying attributes match. These include any linkage
specification, the type of a variable, the return type of a function,
and default arguments of a prior declaration must be duplicated in
subsequent ones.

In addition to rules for intra-translation unit consistency, there are
rules for consistency across translation unit boundaries. These rules
must apply if the declarations or context nominate the same
identifying information and specify external linkage.

Declaration matching is not covered in this paper.

Equivalent Definitions
----------------------

Given that two declarations with external linkage match, any
definitions must also be equivalent in some sense to ensure
consistency.

Two definitions are semantically equivalent if they may be used
interchangeably without any detectable behavioural differences: this
does not include memory use or execution speed.

Unfortunately, this condition is inherently undiagnosable: the problem
is equivalent to the halting problem.

What we need is to specify a condition that is both mechanically
testable and for which any two definitions that compare equivalent are
also semantically equivalent.

If certain semantically equivalent definitions turn out to not compare
equivalent by the mechanical tests, that is merely a consequence of
the undecidable nature of semantic equivalence. What we aim for is to
cover a large class of common cases. In particular, the intended
method of sharing definitions between translation units is by
#including header files.

The Solution: the "One Definition Rule"
----------------------------------------

When the ODR permits more than one definition of a function, type, or
variable in different translation units, the understanding is that the
entity has external linkage. Entities with internal linkage are
translation unit local, and may never have more than one definition:
the issue of multiple definitions and their equivalence can only arise
across translation unit boundaries, and then only when the definitions
are associated by use of the same external name. In particular, we're
talking about multiple definitions of the _same_ entity.

In the following, the word "class" generally means "class", "struct"
or "union".

Templates have external linkage; this permits multiple definitions of
templates.

The address of a function or variable with external linkage is unique,
whereas the addresses of functions with internal linkage are distinct.
This is because entities with the same external name are the same
entity, and so must have the same address, whereas entities with
internal linkage are not associated across translation unit
boundaries, and being distinct entities cannot have the same address.

In the following, a name "interior" to a definition is one defined in
the body of the definition; other entities are "exterior" to the
definition. For example, local static variables are interior to the
definitions of their containing functions, as are local classes.

PROPOSAL: EQUIVALENCE
---------------------

For the purpose of the ODR, two definitions or expressions are
equivalent if they are token equivalent. The use of "token equivalent"
indicates the particular form of equivalence described below, and also
indicates the context (since "equivalent" has many meanings).

Two definitions or expressions are said to be "token equivalent" if
the following conditions are met. Each rule is introduced by first
indicating a problem and then the solution.

----------------------------------------------------------------------------
TE1) "same tokens"

Problem: Consider the example

    // EX1
    // file 1
    struct X {
        static int f() { return 1; }
    };

    // file 2
    struct X {
        int a;
        static int f() { return 2; }
    };

There are two classes and two functions, all with external linkage,
but the definitions are not semantically equivalent.

Solution: The rule

+----------------------------------------------------------------+
|                                                                |
| Equivalent definitions or expressions consist of the same      |
| sequence of tokens.                                            |
|                                                                |
+----------------------------------------------------------------+

Example: The sequence

    // EX2
    int f() { return 1; }

given in two translation units is token equivalent and semantically
equivalent.

Resolution: EX1 doesn't obey TE1.

----------------------------------------------------------------------------
TE2) "recursive application"

Problem: In the two files given below the definitions of f() obey TE1
but are not semantically equivalent:

    // EX3
    // file 1
    enum {one};
    static int x;
    int f() { return x+one; }

    // file 2
    enum {two, one};
    static float x;
    int f() { return x+one; }

Rule:

+----------------------------------------------------------------+
|                                                                |
| If a token in one definition or expression is bound to a       |
| single entity with internal linkage outside the definition     |
| or expression,                                                 |
| then the corresponding token in the other definition           |
| shall be bound to a single entity with the same type           |
| outside the definition, and the definitions of these two       |
| entities shall be equivalent;                                  |
|                                                                |
| except that in the case of an enumerator the definition        |
| of the enumeration types shall be equivalent and the           |
| values of the enumerators equal.                               |
|                                                                |
+----------------------------------------------------------------+

Resolution: The f() in EX3 does not obey TE2.

Comment: If "no linkage" is introduced, then the above rule must be
modified to say "internal or no linkage".

Comment: External linkage is not covered by this rule.
That is because there need not be a visible definition of an entity
with external linkage, and because the ODR will independently require
equivalence of any definitions for that name anyhow.

----------------------------------------------------------------------------
TE3) "identity of externals"

Problem: In the example the definitions of f() obey TE1-2 but are not
equivalent.

    // EX4
    // file 1
    namespace X { int g() { return 1; } }
    using namespace X;
    int f() { return g(); }

    // file 2
    namespace Y { int g() { return 1; } }
    using namespace Y;
    int f() { return g(); }

Rule:

+----------------------------------------------------------------+
|                                                                |
| If a token in one definition or expression is bound to an      |
| entity with external linkage, then the corresponding token     |
| in the other definition or expression                          |
| shall be bound to the same entity.                             |
|                                                                |
+----------------------------------------------------------------+

Resolution: The f() in EX4 bind to two distinct g(), so EX4 doesn't
obey TE3.

----------------------------------------------------------------------------
TE4) "implicitly evaluated default arguments are equivalent"

Problem: The f() in EX5 below obey TE1-3 but are not equivalent:

    // EX5
    // file 1
    extern int g(int x = 1);
    int f() {return g(); }

    // file 2
    extern int g(int x = 2);
    int f() {return g(); }

Rule:

+----------------------------------------------------------------+
|                                                                |
| If corresponding function calls in equivalent definitions      |
| or expressions                                                 |
| cause a default argument to be evaluated, the argument         |
| expressions shall be equivalent.                               |
|                                                                |
+----------------------------------------------------------------+

Resolution: f() in EX5 does not obey TE4.
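
The effect TE4 guards against can be reproduced inside a single
translation unit, because declarations of a function in different
scopes carry separate sets of default arguments. The following is a
sketch only: the two function bodies stand in for "file 1" and
"file 2" of EX5, and the names f_file1 and f_file2 are hypothetical.

```cpp
#include <cassert>

int g(int x) { return x; }   // the single g, with external linkage

// Stand-in for "file 1": this scope sees g with default argument 1.
int f_file1() {
    int g(int x = 1);        // block-scope declaration with its own default
    return g();
}

// Stand-in for "file 2": the same call tokens, but the default is 2.
int f_file2() {
    int g(int x = 2);
    return g();
}
```

Both bodies contain the same tokens "return g();" and both bind to
the same g, yet they return different values, since the default
argument is supplied by the declaration visible at the call.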

----------------------------------------------------------------------------
TE5) "No address taking of internal linkage"

Problem: The f() in EX6 are not equivalent:

    // EX6
    // file 1
    static int x;
    int *f() {return &x; }

    // file 2
    static int x;
    int *f() {return &x; }

Rule:

+----------------------------------------------------------------+
|                                                                |
| The address of a variable or function with internal linkage,   |
| or local reference bound to one, shall not be taken in two     |
| equivalent definitions or expressions.                         |
|                                                                |
+----------------------------------------------------------------+

Comment: The reason I permit local references bound to variables with
internal linkage is explained after the next rule.

----------------------------------------------------------------------------
TE6) "Limited use of variables with internal linkage"

Problem: f()

    // EX7
    // file 1
    static int x;
    const int y = 1;
    int f() {return x; } // true

    // file 2
    static int x;
    int f() {return x; } // false

+----------------------------------------------------------------+
|                                                                |
| In two equivalent definitions or expressions                   |
| a variable with internal linkage or a reference bound to one   |
| shall not be used in an expression, or be passed to a function,|
| unless it is const and not volatile and has no mutable         |
| components and has equivalent initializers                     |
| and has a type with external linkage and any default arguments |
| of the constructor are equivalent.                             |
|                                                                |
+----------------------------------------------------------------+

For example:

    // EX8: equivalent
    const int x = 1;
    int f() {return x;}

    // EX9: inequivalent
    static int x = 1;
    int f() {return x;}

Comment: The explanation of TE5 and TE6 is as follows. Taking the
address of a variable with internal linkage yields a pointer which may
be dynamically manipulated so that its use cannot be easily detected,
so we do not permit the address taking itself.
Similarly, passing a variable with internal linkage by reference to a
function allows that function to take the address. Disallowing that
permits us to assume that any parameters of the function have external
linkage, and so taking the address of a parameter is allowed.

However, binding a local reference to an entity with internal linkage
is permitted, because it's known statically when this occurs, and the
use of the reference itself is checked, as if it were a variable with
internal linkage.

In-class references must be treated as parameters of functions because
they're bound by mem-initialisers.

The set of conditions that form the exception to the rule may be
described as "constants". Provided the constants have equivalent
initialisers and can't be changed in any way, they have the same value
in both translation units, and so there is no reason to prohibit their
use. There is, however, every reason to allow them:

    // EX10
    const int n = 10;
    int f() { return n; }

is sure to be common usage.

The same applies to inline functions: functions are inherently
"const". So there is no reason to exclude calling a function with
internal linkage. But there is every reason to allow them:

    // EX11
    inline int max(int x, int y) { return x>y ? x : y; }
    int f(int x, int y) { return max(x,y); }

is sure to be common usage.

----------------------------------------------------------------------------
TE7) "Equivalence of enumerations"

+----------------------------------------------------------------+
|                                                                |
| If an enumerator is used in equivalent definitions or          |
| expressions, then the enumeration types shall be equivalent.   |
|                                                                |
+----------------------------------------------------------------+

Comment: Enumerators don't have "definitions", only the enumeration
types in which they are assigned a value.
Requiring the definitions to be equivalent is necessary for two
reasons: you can do overload resolution based on an enumerator
argument, so the type is important, and the size of the enumerator
itself is determined only by the definition of the enumeration.

----------------------------------------------------------------------------
TE8)

+----------------------------------------------------------------+
|                                                                |
| The following are explicitly permitted in equivalent           |
| definitions or expressions                                     |
|                                                                |
| 1) any use of a variable, function or type with external       |
|    linkage,                                                    |
|    since these independently obey the ODR                      |
|    and the definitions will thus be equivalent                 |
|                                                                |
| 2) calls that cause default arguments to be evaluated,         |
|    provided the arguments are equivalent                       |
|                                                                |
| 3) calling functions with internal linkage provided they are   |
|    equivalent                                                  |
|                                                                |
| 4) use of a constant with equivalent value and type            |
|                                                                |
+----------------------------------------------------------------+

NOTE: so far we have only defined "token equivalence". We have not
specified any rules for C++.

===========================================================================

Token equivalence can only be checked at link time, and only then if
the associated translation unit source code is available. However,
assuming token equivalence, the other restrictions on inline functions
and classes can be checked.

PROPOSAL: INLINE EXTERN FUNCTION RULES
--------------------------------------

IN1)

+----------------------------------------------------------------+
|                                                                |
| An inline extern function shall not refer to a variable        |
| with internal linkage unless it is const, not volatile,        |
| has no mutable components, and is initialised by default       |
| or with a constant initializer.                                |
|                                                                |
+----------------------------------------------------------------+
[Diagnosable]

Comment: This rule relates to conditions TE5 and TE6.
My argument for requiring this rule is that it is in fact diagnosable,
and there is no purpose in declaring a function inline if it is NOT to
be defined in more than one translation unit. Therefore, by assuming
an inline definition will be used in another translation unit, we can
diagnose an important class of potential violations of the ODR. We do
so at the expense of diagnosing an otherwise legitimate use of an
inline function in a single place: however, if it's not to be shared,
it could have been given internal linkage.

Comment: There already exist inline extern functions: static class
functions have external linkage by prescription but may also be
inline.

IN2)

+----------------------------------------------------------------+
|                                                                |
| By prescription a local static member of                       |
| an inline function with external linkage has external linkage. |
|                                                                |
+----------------------------------------------------------------+
[Prescriptive]

This is necessary so that inline extern functions access the same
local statics in all translation units, and the single static is
initialised only once. By deeming external linkage, the ODR applies to
the initialiser.

===========================================================================

PROPOSAL RULES FOR NUMBER (and nature) OF DEFINITIONS (ODR)
-----------------------------------------------------------

These rules describe the One Definition Rule proper.

----------------------------------------------------------------------------
OD1) "No more than one definition per translation unit"

+----------------------------------------------------------------+
|                                                                |
| There shall be no more than one definition of each variable,   |
| function, named class, named enumeration type or template      |
| given in a translation unit.                                   |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

----------------------------------------------------------------------------
OD2) "You can't change the linkage"

+----------------------------------------------------------------+
|                                                                |
| The definition of a function or variable shall not specify     |
| different linkage than a declaration it is bound to.           |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

Example:

    extern int x;
    static int x; // ill-formed

----------------------------------------------------------------------------
OD3) "You can't change the (return) type"

+----------------------------------------------------------------+
|                                                                |
| The definition of a function shall specify the same return     |
| type as a declaration it is bound to.                          |
|                                                                |
| The definition of a variable shall specify the same type as    |
| a declaration it is bound to.                                  |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

Example:

    extern const int f();
    int f(); // ill-formed

    extern int x;
    float x; // ill-formed

----------------------------------------------------------------------------
OD4) "Functions must be defined in a program if they're used"

+----------------------------------------------------------------+
|                                                                |
| If a function is explicitly declared and                       |
|   (a) a call exists, or                                        |
|   (b) its address is taken, or                                 |
|   (c) it is one of                                             |
|         an impure virtual member function, or                  |
|         a pure or impure virtual destructor, or                |
|         a member delete operator,                              |
|       of a class for which a whole object, base subobject,     |
|       member subobject, or array element is constructed,       |
| then there shall be at least one definition of the function    |
| in the program,                                                |
| otherwise a definition for the function need not be given.     |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

Comment: In theory, nothing needs to be defined unless it's used.
For example, there's no need to define a function whose address is
taken unless the pointer is dereferenced: even use in comparisons
doesn't require a definition. In practice, supporting such dynamism is
not in the spirit of C++ and we should not grant implementations such
freedom to fail to diagnose potential errors.

Requiring virtual functions to be defined if an object is instantiated
protects programmers: only really smart implementations could detect
whether a virtual function is called or not. The reason is the
separate compilation and linkage model and archaic linker technology
which C++ is designed to support.

Had C++ been designed for modern computing, without compatibility
restrictions, there would be no requirement for the sort of complex
equivalence rules presented here. The issue would not and does not
arise in other languages which allow pre-compiled interfaces to be
used directly and require the environment to provide this support.

For this reason I specify that virtual functions ought to be defined
at the point of construction. The rule is just the same as the rule
preventing abstract classes being instantiated: there is no reason for
that restriction if the pure virtuals are not called.

The reason for including member delete operators is that these are
called via a virtual destructor, and so act as if they were virtual
functions.
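
The point about member delete operators can be illustrated as follows.
This is a sketch under stated assumptions: Base, Derived, the counter
derived_delete_calls and the helper delete_through_base are all
hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static int derived_delete_calls = 0;   // hypothetical counter

struct Base {
    virtual ~Base() {}
};

struct Derived : Base {
    // Member allocation functions are implicitly static.
    void* operator new(std::size_t n) {
        void* p = std::malloc(n);
        if (!p) throw std::bad_alloc();
        return p;
    }
    void operator delete(void* p) {
        ++derived_delete_calls;
        std::free(p);
    }
};

// Destroys a Derived through a Base*: the expression names only Base,
// yet Derived::operator delete is the function that runs.
void delete_through_base() {
    Base* b = new Derived;
    delete b;
}
```

Nothing in the delete expression mentions Derived::operator delete,
so an implementation cannot tell from that expression alone whether a
definition is needed; constructing a Derived object is the observable
event that the rule ties the requirement to.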

----------------------------------------------------------------------------
OD5) "Variables must be defined in a program if they're used"

+----------------------------------------------------------------+
|                                                                |
| Exactly one definition in a program shall be given for         |
| a non-local variable with static storage class,                |
| unless it is ignorable and unused,                             |
| in which case no definition is required:                       |
| for the purposes of this rule only                             |
|   (a) A type is ignorable if it is a builtin type, or          |
|   (b) it is an aggregate of ignorable types                    |
| and                                                            |
| a variable is used if it is used in an expression,             |
| other than as an argument of the [sizeof] operator             |
| or [typeid] operator.                                          |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

Example:

    extern int x;
    struct { int a; } y;
    main() {
        sizeof(x); // no definition required
        typeid(y); // no definition required
    }

Comment: There is no rule requiring local variables to have a single
definition.

----------------------------------------------------------------------------
OD6) "Types must be defined in a translation unit if they're used"

+----------------------------------------------------------------+
|                                                                |
| At least one definition of a class or enumeration shall be     |
| given in a translation unit if the class or enumeration        |
| is used in such a way that the size of the type must be known, |
| a member function called, or a component or enumerator used.   |
|                                                                |
| A definition of a type T is not required                       |
| for any of the following                                       |
|                                                                |
| (a) Passing a T by value                                       |
| (b) declaring a function type                                  |
| (c) instantiation of a pointer to T or a qualified version     |
|     thereof, recursively                                       |
| (d) initialisation of a reference of type T or qualified       |
|     version thereof or a reference to any pointer type         |
|     as listed in (c) above                                     |
| (e) cast to or from void*                                      |
|                                                                |
| A definition of a type T is required for any of the following  |
|                                                                |
| (a) argument of the sizeof or typeid operator                  |
| (b) instantiation of an object                                 |
| (c) call of a member function or access to a data member       |
| (d) formation of a temporary including return by value         |
| (e) declaration of a non-static data member                    |
|                                                                |
+----------------------------------------------------------------+
(Diagnosable)

Example:

    struct X;     // declare X is a struct type
    struct X* x1; // use X in pointer formation
    X* x2;        // use X in pointer formation
    enum E* e;    // use E in pointer formation

Comment: The two lists above are intended to be exhaustive. They're
probably not. Every use should be categorised one way or the other in
accordance with the descriptive subclause.

----------------------------------------------------------------------------
OD7) "Duplicate type definitions are OK if they're equivalent"

+----------------------------------------------------------------+
|                                                                |
| There may be more than one definition of a named enumeration,  |
| named class type, or template in a program provided the        |
| definitions are token equivalent.                              |
|                                                                |
+----------------------------------------------------------------+
(Not diagnosable)

Comment: This clause is a major goal of the paper. Unnamed types are
covered separately.
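
As a rough illustration of the "same tokens" condition (TE1) on which
OD7 relies, the following sketch compares two definitions token by
token. It is not a conforming C++ lexer: the tokeniser is deliberately
crude and hypothetical, treating runs of letters, digits and '_' as
one token and every other non-space character as a token of its own.
It checks TE1 only, not the name binding conditions TE2 onwards.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical crude tokeniser (assumption: no comments, string
// literals or multi-character operators need special handling).
std::vector<std::string> tokens(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_'))
                ++j;
            out.push_back(src.substr(i, j - i));   // identifier or number
            i = j;
        } else {
            out.push_back(std::string(1, src[i])); // single punctuator
            ++i;
        }
    }
    return out;
}

// TE1 only: the two definitions consist of the same sequence of tokens.
bool same_tokens(const std::string& a, const std::string& b) {
    return tokens(a) == tokens(b);
}
```

Differences in white space do not matter, but a definition that is
semantically equivalent while written differently, such as one
returning 2-1 instead of 1, does not compare equal. That is the
intended trade-off of a mechanically testable condition.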
----------------------------------------------------------------------------
OD8) "Duplicate function definitions are OK if they're equivalent and inline"

+----------------------------------------------------------------+
|                                                                |
| There may be more than one definition of a function in a       |
| program provided the definitions are equivalent,               |
| and provided each definition of the function is explicitly     |
| marked inline, or an in-class definition is given.             |
|                                                                |
+----------------------------------------------------------------+
(Not diagnosable)

Comment: it's important to understand the implied assumption in this
clause. The little word 'a' in "a function" is the key word in the clause
(and the previous one). The implication is that it's a single function
we're talking about that may have more than one definition. It's clear we
can't be talking about distinct functions having more than one definition
which must be equivalent: two functions do not constitute "a function".

The implied requirement is that the functions must have external linkage,
because that is the way in which it's possible to require the function
names to refer to the same function in different translation units.

The existence of member functions which have internal linkage and yet must
have equivalent definitions is, in my opinion, inconsistent; by definition
such functions are distinct because they are translation unit local. No
linkage is even more absurd: there is no question that the integrity of a
class across translation units requires all members to be associated by
name, which is the meaning of external linkage in C++.

The ARM specification of internal linkage for inline members is based on a
misconception and a hangover from C. C does not attribute linkage to types,
but C++ does: that's because in C++ types are name based. The ARM, however,
does not support "no linkage" for any global names: all names have either
internal or external linkage.

Attributing internal linkage to inlines is based on the idea that separate
instances are created in each translation unit: copies of the function.
The consequence is that no requirement of equivalence follows, but must be
explicitly specified: this is unrelated to the One Definition Rule.

The problem is that an inconsistency is almost guaranteed if the address
of such an instance is taken. This clearly detects that the definitions are
not equivalent, yet they are required to be. The addresses of two entities
with internal linkage in different translation units are required to be
distinct. The address of a member function, on the other hand, is required
to be unique: by definition of token equivalence, taking the address of a
member with internal linkage from any function subject to the equivalence
requirement ensures that the requirement is broken: in effect, the address
of a member with internal linkage can only be taken by another function
with internal linkage, and only then if that function is never called by a
function with external linkage, directly or indirectly.

However, the ARM is reasonably clear that the inline keyword, and in-class
definitions, are only hints to the compiler and do not have such gross
effects on semantics: in particular there is no restriction on taking the
address of an inline function.

I don't believe any alternatives exist other than:

  a) require member functions to have external linkage if the class does
  b) disallow taking the address of inline functions

I vote for (a).

Please note again that if member functions have linkage, then taking their
address must have one of two consequences:

  a) External linkage: the address is program unique and translation
     unit independent
  b) Internal linkage: the address is distinct in each translation unit.

There is no option that the addresses in (b) may or may not be distinct:
they're required to be by specification of internal linkage.
The only way to prevent such translation unit locality leaking into the
semantics is not to take the address: specifying (b) effectively prevents
most address taking.

It's possible that non-static member functions can be specified to have no
linkage. After all, conceptually a member is simply an offset into a
structure: linkage is not required to associate such offsets, structural
equivalence is enough, and token equivalence is a stronger condition.

----------------------------------------------------------------------------
OD9) "Typeinfo tells the type"

+----------------------------------------------------------------+
|                                                                |
| The typeinfo objects referenced by the typeid operator         |
| shall compare equal for the same variable.                     |
|                                                                |
+----------------------------------------------------------------+
(Requirement on typeid operator)

Comment: Uwe suggests this condition is not strong enough, and that the
references should be to the same typeinfo object. That may be the case and
should be decided by core. For the purposes of the ODR, such a stricter
requirement is not necessary.

This innocent looking clause is pivotal to the understanding of the meaning
of linkage for types. A type has an address which can be compared with
another type's address, and these addresses shall be equal if and only if
the types are the same type: this is the behavioural, testable definition
of the phrase "same type". The address of a type is the value of the typeid
operator.

The typeid operator is useless if comparing a single variable with itself
yields an inequality. Yet exactly this is mandatory if the type of a
variable with external linkage has internal linkage: this situation is
untenable, so variables with external linkage must have types with
external linkage.

The point of this clause is as follows:

    // file 1
    class T { .. } x;
    T* px = &x;
    typeinfo& tx = typeid(px);

    // file 2
    class T { .. };
    extern T x;
    extern typeinfo& tx;
    main() { return typeid(&x) == tx; }

OD9 is unequivocal: the program must return 1. If it doesn't, then the ODR
is violated, though not necessarily clause OD9: it may be that the two
definitions of T are not equivalent.

One purpose of introducing this rule is to be found in the next clause.

----------------------------------------------------------------------------
OD10) "Unnamed types."

+----------------------------------------------------------------+
|                                                                |
| Two unnamed type definitions denote distinct types;            |
| unless the tokens of their definitions are in corresponding    |
| positions in two definitions of the same class or function,    |
| in which case they denote the same type.                       |
|                                                                |
+----------------------------------------------------------------+

Comment: all members of classes with external linkage have external
linkage, including nested types, and including unnamed types.

The problem of unnamed types in classes is that instances of these types
with external linkage may exist, and OD9 requires the type of a variable to
be unique in a testable way: how then can an unnamed type have external
linkage?

Sean Corfield and Jerry Schwarz suggested the answer: the external name of
the type is synthesised by assigning it a unique position number within a
class, function or translation unit.

This is possible for classes and functions because they are encapsulated
and obey the token equivalence rule: the sequence number of the first token
of the definition would do.

It's not possible to do this for namespaces: two definitions of an unnamed
type with identical token sequence in different translation units are
distinct, no matter what namespace they are in, because the same trick is
used on a per translation unit basis, and the same UNIQUE translation unit
qualifier used in the external name of entities with external linkage in
the unnamed namespace is used in the external name.
For example:

    // file 1
    struct X { enum {zero} z; };
    X x;
    typeinfo &tz = typeid(x.z);

    // file 2
    struct X { enum {zero} z; };
    extern X x;
    extern typeinfo& tz;
    main() { return tz == typeid(x.z); }

The main() function must return 1 or OD9 is violated. The type of a
variable can't be allowed to differ or inconsistency results. The
alternative of banning unnamed types in classes would break too much code.

----------------------------------------------------------------------------
PROPOSAL: LINKAGE SPECIFICATIONS
--------------------------------

These specifications are more or less needed for the ODR as I have
characterised it to operate.

LK1) Named members of a class with external linkage have external linkage.
     (Prescriptive)

LK2) Friends of a class with external linkage have external linkage.
     (Prescriptive)

LK3) Non-member functions may also both have external linkage and be
     inline. (Prescriptive)

LK4) Local static variables of functions with external linkage have
     external linkage. (Prescriptive)

LK5) Local classes of functions with external linkage have external
     linkage. (Prescriptive)

Alternative.
------------

This alternative is slightly more radical, but also more consistent. The
idea is that all entities with linkage have external linkage. Each entity
with linkage has an external name. If the name contains the translation
unit local UNIQUE name, then the names can't link across translation units,
and we can say the entity also has internal linkage, which does not mean it
doesn't have external linkage.

I think this is an important simplification. It's a matter of optimisation
alone that the compiler does not need to put external names with internal
linkage into object files, because the linker could never link them. In
fact, that may not be true, now that we have RTTI.

This alternative attributes linkage to every name, including "int" and
"int*". Therefore, most of the linkage issues simply evaporate.
Not all linkage issues evaporate, but their form becomes simpler and more
manageable: the issues all have identical form:

    What is the external name of this entity?

That's it: no other questions about linkage have any meaning, since they're
all resolved automatically and trivially by this formulation.

Specifying external names for entities corresponds to existing practice
(mangled names). It also solves a major problem in the Working Paper on an
unrelated issue: what is a name, and how can we refer to "a name"?

In terms of this proposal, the answer is: name lookup associates a short
name with an external name. The word "name" usually means external name in
the Working Paper, even though the short name is the one visible in the
code example being described: the binding to the external name is assumed
unless the issue being discussed is the name lookup algorithm itself.

----------------------------------------------------------------------------
DECLARATION MATCHING
--------------------

Declaration matching rules have two parts. The first part establishes when
two declarations refer to the same entity. The second part establishes
whether the declarations are consistent. We need some terminology.

Definition: Object Indicator
----------------------------
An "object indicator" is one of:

  a) unqualified
  b) const
  c) volatile
  d) const volatile
  e) constructor
  f) destructor
  g) static

Definition: Family name.
------------------------
The name of a set of overloaded functions is called the family name and
denotes a single entity sometimes known as a family of functions. The name
of a class type is also a family name whose members are the constructors of
that type. The name of a destructor also constitutes a family name
consisting of a ~ token followed by a class name: the family consists of at
most one function. The name of an operator constitutes a family name and
consists of several tokens: the keyword "operator" followed by the operator
symbol, by [ ] delete, or by new [ ].
The family name of a user defined conversion operator consists of the
keyword operator and the name of the type: the family consists of at most
one function.

Definition: Short name
----------------------
The short name of an entity is its name as declared and is the shortest
unique name identifying the entity in its declarative region.

Definition: Full name.
----------------------
The full name of an entity is a program wide unique name of an entity
consisting of the pair of the short name with the fully qualified namespace
name of the innermost declarative region containing the name's declaration,
followed by the containing function name if necessary, followed by
synthesised names uniquely representing containing blocks. In the case of
nonstatic base class subobject members the full path in the lattice is
given: these entities are quite distinct from members, just as a base class
is not a class.

Comment: the "Full name" is a fiction that enables us to uniquely identify
an entity.

Definition: function signature
------------------------------
The signature of a function is a finite sequence of parameter type
signatures. A parameter type signature is an equivalence class of function
parameter declarators. The equivalence relation which specifies this class
is that two function parameter declarators are equivalent if the
unqualified versions of the types they name are the same, or are both the
ellipsis (...). The signature of a function consists of the sequence of
parameter type signatures of a declaration of the function.

Definition: individual function name
------------------------------------
An individual function name is a triple consisting of the family name of
the function, an object indicator and the individual function signature.
The object indicator of a global function or static member function is
static, whereas that of a non-static member function corresponds to the
cv-qualification of the this pointer, except that constructors and
destructors have the appropriate indicators.

Definition: Function template family name
-----------------------------------------
The name of a set of overloaded function templates is called the family
name and denotes a single entity sometimes known as a family of overloaded
function templates. The name of an operator function template constitutes a
family name and consists of several tokens: the keyword operator followed
by the operator symbol, new [ ] or [ ] delete.

Definition: Individual function template name
---------------------------------------------
An individual function template name is a triple consisting of the family
name of the function template, a template formal signature and a
parameterised function signature.

The template formal signature consists of a finite sequence of formal
parameter signatures. A formal parameter signature is either a parameter
type signature or the keyword class. [Proposed extension: or namespace, or
an individual class template name, or some sort of name denoting functions
.. I don't know what is correct here. To be resolved at San Diego.]

A parameterised function signature is a finite sequence of parameterised
function parameter type signatures. [Gak .. these names are long -:]

A parameterised function parameter type signature is an equivalence class
of parameterised function parameter declarators. The equivalence relation
which specifies this class is that two parameterised function parameter
declarators are equivalent if all substitutions of the same formal
parameters by position yield equivalent parameter type signatures.
[My fingers hurt. An ascription clause is required here.]
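
Illustration (not part of the proposal): the definitions of function
signature and individual function name above can be sketched in present-day
C++. The names twice, Buffer and check_individual_names are hypothetical.
The sketch assumes that the "object indicator" corresponds to what C++
expresses with static and with the cv-qualifier written after a member
function's parameter list.

```cpp
#include <cassert>

// Same parameter type signature: the unqualified parameter type is int in
// both declarators, so the second line defines the function declared by
// the first rather than introducing an overload.
int twice(int n);
int twice(const int n) { return 2 * n; }

// Same family name "twice", different signature (double): a distinct
// individual function in the same family.
double twice(double x) { return 2.0 * x; }

struct Buffer {
    int data[4];

    // Same family name "at" and same signature (int). The two functions
    // differ only in the object indicator: unqualified versus const.
    int& at(int i) { return data[i]; }
    const int& at(int i) const { return data[i]; }

    // A static member function: its object indicator is static.
    static int capacity() { return 4; }
};

// Hypothetical helper that exercises each individual function name once.
void check_individual_names()
{
    assert(twice(3) == 6);         // selects (twice, static, (int))
    assert(twice(1.5) == 3.0);     // selects (twice, static, (double))

    Buffer b = Buffer();           // value-initialised: all elements zero
    b.at(1) = 5;                   // selects (at, unqualified, (int))
    const Buffer& cb = b;
    assert(cb.at(1) == 5);         // selects (at, const, (int))
    assert(Buffer::capacity() == 4);
}
```

Each call resolves to exactly one triple of family name, object indicator
and signature, which is the sense in which the triple names an individual
function.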

RULE 1: Matching declarations
-----------------------------
If two declarations declare a type or class template in the same
declarative region then they declare the same entity if the same name is
declared.

If two declarations declare a function in the same namespace or class, or
in any local scope nested directly or indirectly in a local scope nested in
the same namespace or class, then they declare the same entity if they
specify the same individual function name.

If two declarations declare a variable in the same namespace or class, or
in any local scope nested directly or indirectly in a local scope nested in
the same namespace or class, then they declare the same entity if they
declare the same name.

If two declarations declare a function template then they declare the same
entity if they specify the same individual function template name.

If a declaration in a class has the same individual function name as a
virtual function in a direct or indirect base then the function overrides
the base function.

COMMENT. These rules cover all declarations, including definitions, except
for template specialisations. This needs to be resolved at San Diego. It's
not possible to match specialisations in the usual sense, because there is
nothing to match to. In addition, overriding rules are required to
determine when a virtual function overrides another.

RULE 2: Same linkage
--------------------
If two declarations of a variable, type, or individual function declare the
same entity then the declarations shall not specify different linkage.

RULE 3: Same return type
------------------------
If two declarations of a variable declare the same entity, then the type
must be named and have the same name in both declarations.

If two declarations of an individual function declare the same entity, then
the return type must be named and have the same name in both declarations.

If two declarations of a function template declare the same entity, then
the parameterised return types must have the same parameterised function
parameter type signature.

If a virtual member function overrides another, then the overriding
function must have the same return type, or if the overridden function
returns a pointer or reference, it may be a pointer or reference
(respectively) to an accessible base which is not more cv-qualified.

----------------------------------------------------------------------------
Acknowledgments
---------------
Bill Gibbons, Andrew Koenig, Josee Lajoie, Bjarne Stroustrup, Erwin Unruh,
Jerry Schwarz, Sean Corfield, the usual apologies to others I can't
remember, and the usual claim that all the mistakes are mine.