Doc. no.   N2230=07-0090
Date:        2007-04-29
Project:     Programming Language C++
Reply to:   Beman Dawes <bdawes@acm.org>

POD's Revisited; Resolving Core Issue 568 (Revision 3)

Introduction
Summary of proposed changes
Features and benefits of POD types
Motivating examples
    std::pair example
    Endian example
    Two structs example
Coupling between POD's and aggregates
Rationale for changes
Proposed changes to the Working Paper
Open issues
Impact on existing code
Impact on existing ABI's
Interactions with other proposals
Revision history
Acknowledgements
References

Introduction

This paper proposes a resolution for Core Issue 568, Definition of POD is too strict, submitted by Matt Austern.

POD's as defined in the current version of the standard have several problems:

Overly strict requirements. This forces users to make unwise design choices, such as reliance on undefined behavior. See Motivating examples.
Coupling between POD's and aggregates. The current definition of POD depends by reference on the definition of aggregate, causing several difficulties. See Coupling between POD's and aggregates.
Coupling between trivial special member function requirements and layout requirements. The Standard describes requirements as POD or non-POD in many places where trivial construction, copy, assignment, and destruction are the only actual requirements. In a few places, POD or non-POD requirements are used to specify layout requirements. Separating trivial special member requirements from layout requirements would result in a cleaner specification.

Summary of proposed changes

POD's are now defined in terms of two new categories of types; trivial types and standard-layout types.
The definition of these types no longer depends on the definition of aggregate.
POD's and trivial classes are now allowed to have constructors, as long as trivial default constructor, copy constructor, copy assignment, and destructor are available.
POD's, trivial types, and standard-layout types are now allowed to have base classes. The base classes are not allowed to have virtual members or virtual bases, but one base may have non-static data members if the most-derived class doesn't have non-static data members.
POD's and standard-layout types are now allowed to have access control. All non-static data members must have the same access control.
Most uses of POD in the WP were found to actually be concerned with only trivial or standard-layout types. Such uses have been changed accordingly.
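
The combined effect of these changes can be sketched in code. The following is an illustration only, not part of the proposal: the names `arithmetic_ops`, `meters`, `tagged`, `tagged_view`, and `tagged_more` are hypothetical, the `= default` syntax is the one proposed in N2210, and the checks assume a later compiler that provides `std::is_trivial` and `std::is_standard_layout` in `<type_traits>` with meanings close to the categories proposed here.

```cpp
#include <type_traits>

// Hypothetical empty base class: no virtual members, no data members.
struct arithmetic_ops {};

// A class with a base class, a user-written constructor, and private data.
// All non-static data members have the same access control.
class meters : public arithmetic_ops {
public:
    meters() = default;                        // default constructor stays trivial
    explicit meters(double v) : value_(v) {}   // extra constructor is allowed
    double value() const { return value_; }
private:
    double value_;
};

// One base class may hold data if the most-derived class adds none.
struct tagged { int tag; };
struct tagged_view : tagged { int get() const { return tag; } };
struct tagged_more : tagged { int extra; };    // data in both base and derived

static_assert(std::is_trivial<meters>::value, "meters is trivial");
static_assert(std::is_standard_layout<meters>::value, "meters is standard-layout");
static_assert(std::is_standard_layout<tagged_view>::value,
              "data only in the base class");
static_assert(!std::is_standard_layout<tagged_more>::value,
              "data in base and derived class");
```

Under these assumptions `meters` satisfies both new categories and would therefore be a POD, while `tagged_more` would not be standard-layout.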

Features and benefits of POD types

Features	Benefits
Byte copyable guarantees [3.9 ¶2-3, basic.types]	Programs can safely apply coding optimizations, particularly `std::memcpy`.
C layout-compatibility guarantees, byte copyable guarantees [9.2 ¶14-17, class.mem], initialization rules.	C++ programs can interoperate with functions written in C and other languages. C++ programs can, after considering compiler, alignment, and data type constraints, perform binary I/O such that files interoperate with other languages and platforms. C language compatibility.
Static initialization guarantees [3.6.2, basic.start.init]	Programs can avoid order-of-initialization issues. Multi-threaded programs can avoid data races during initialization.
Are aggregates	Brace-enclosed initializer lists allowed.
Various rules for non-POD's	Compilers apply data layout optimizations to non-POD's. Compilers assume non-aliasing, allowing code generation optimizations for non-POD's.

Motivating examples

std::pair example

If a program has two arrays of type std::pair<int,int>, then it is natural to expect that memcpy(A2,A1,sizeof(A2)) would be safe. Programmers have trouble imagining any implementation in which a byte-for-byte copy of std::pair<int,int> wouldn't do the right thing. Unfortunately, that's not what the language standard says. It says that byte-for-byte copies are guaranteed to work only for PODs. std::pair<T,U> isn't a class aggregate, since it has a user-defined constructor, and that means it also isn't a POD.

std::pair has a user-defined constructor essentially for syntactic reasons: because in some cases it looks nicer to write "std::pair<int,int> p(1,2);" than to write "std::pair<int,int> p = {1,2};". It seems a shame that this syntactic change caused the loss of the important semantic property of PODness. It's especially a shame because it means something formally doesn't work when on all real-world implementations it actually does work. It also encourages programmers to rely on undefined behavior, which is something the standard should not encourage.

With the proposed resolution, the std::pair<int,int> example is still not a POD because its default constructor has effects. With the proposal in place, however, it becomes possible to turn std::pair into a POD by removing the default constructor's effects. To avoid breaking existing code, that can be done under control of an additional template parameter. The intent is to propose such an addition to the LWG as the core language POD proposal moves forward.
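
As a rough sketch of what the proposal would make well-defined, consider a hypothetical `int_pair` that stands in for a pair type whose default constructor has no effects. It is not `std::pair`, whose definition is up to the library. The helper names `copy_pairs` and `demo_copy_pairs` are also illustrative, and the `std::is_trivial` check assumes a later compiler that provides `<type_traits>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a pair whose default constructor has no effects.
struct int_pair {
    int first;
    int second;
    int_pair() = default;                          // trivial (N2210 syntax)
    int_pair(int a, int b) : first(a), second(b) {}
};

static_assert(std::is_trivial<int_pair>::value,
              "byte-wise copies are covered by the proposed rules");

// Copies n elements from src to dst with a single memcpy call.
inline void copy_pairs(int_pair* dst, const int_pair* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, n * sizeof(int_pair));
}

// Returns true if the byte-wise copy reproduced every element.
inline bool demo_copy_pairs() {
    int_pair a1[3] = { int_pair(1, 2), int_pair(3, 4), int_pair(5, 6) };
    int_pair a2[3];
    copy_pairs(a2, a1, 3);
    for (int i = 0; i < 3; ++i) {
        if (a2[i].first != a1[i].first || a2[i].second != a1[i].second) {
            return false;
        }
    }
    return true;
}
```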

Endian example

Beman Dawes provided this example:

Here is an example of something in development for Boost, based on classes used in industrial applications for many years. The fact that it is a template partial specialization isn't material to this discussion and can be ignored.

template <typename T, std::size_t n_bits>
class endian< big, T, n_bits, unaligned > : cover_operators< endian< big, T, n_bits >, T >
{
  BOOST_STATIC_ASSERT( (n_bits/8)*8 == n_bits );
public:
  typedef T value_type;
  endian() {}
  endian(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); }
  operator T() const { return detail::load_big_endian<T, n_bits/8>(bytes); }
private:
  char bytes[n_bits/8];
};

But it isn't a POD, so it won't work at all in unions, and uses such as binary I/O rely on undefined behavior. Since the primary rationale for the existence of endian is to do binary I/O, forcing the user to rely on undefined behavior is unfortunate to say the least.

Here is what would have to be done to make it a POD:

Remove the constructors. But that makes initialization painful, so boosters are proposing to add an ugly and unintuitive static init function, and an operator= from the value_type. Those are partial workarounds, but not really what the designers, Beman Dawes and Darin Adler, wanted.

Make the data member public. But this encourages a poor design practice.

Eliminate the base class. But the only way to do that without the highly error-prone duplication of the functions provided by the base class is to introduce a lengthy macro. Enough said.

In other words, making this class a POD under current language rules would do serious damage to interface ease-of-use and to code quality, and would encourage poor design practices. Yet the only data member in the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.

With the proposed resolution, the class can be made into a POD by making the default constructor trivial (with N2210 the syntax would be endian()=default), resolving all the issues.
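
A much simplified sketch of that outcome follows. The class `big_uint32`, the union `record_field`, and the helper functions are hypothetical stand-ins written for this illustration, not the Boost code shown above. The `static_assert` checks assume a later compiler with `<type_traits>`, and the size check reflects common platforms rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 32-bit big-endian integer stored as raw bytes.
class big_uint32 {
public:
    big_uint32() = default;                       // trivial default constructor
    big_uint32(std::uint32_t v) {
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) | (std::uint32_t(bytes_[1]) << 16) |
               (std::uint32_t(bytes_[2]) << 8) | std::uint32_t(bytes_[3]);
    }
private:
    unsigned char bytes_[4];
};

static_assert(std::is_trivial<big_uint32>::value, "trivial");
static_assert(std::is_standard_layout<big_uint32>::value, "standard-layout");
static_assert(sizeof(big_uint32) == 4, "expected on common platforms");

// With a trivial default constructor the class may be a union member.
union record_field {
    big_uint32 number;
    std::uint32_t native;
};

// Returns the first byte of the object representation.
inline unsigned char first_stored_byte(big_uint32 v) {
    unsigned char out[sizeof v];
    std::memcpy(out, &v, sizeof v);
    return out[0];
}

// Example use: the value round-trips and the most significant byte is
// stored first, whatever the byte order of the host.
inline void demo_big_uint32() {
    record_field f;
    f.number = big_uint32(0x01020304u);
    assert(static_cast<std::uint32_t>(f.number) == 0x01020304u);
    assert(first_stored_byte(f.number) == 0x01);
}
```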

Two structs example

Matt Austern provided this example in Core DR 568:

It’s silly for the standard to make layout and memcpy guarantees for this class:

struct A {
  int n;
};

but not for this one:

struct B {
  int n;
  B(int n_) : n(n_) { }
};

With either A or B, it ought to be possible to save an array of those objects to disk with a single call to Unix’s write(2) system call or the equivalent. At present the standard says that it’s legal for A but not B, and there isn’t any good reason for that distinction.

With the proposed resolution, the class can be easily changed (by adding B()=default) to become a POD, solving all the issues.
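
The change can be sketched as follows. This is an illustration under stated assumptions: a `std::stringstream` stands in for the file descriptor passed to `write(2)` so that the sketch stays portable, the helper name `round_trip` is hypothetical, and the type trait checks assume a later compiler with `<type_traits>`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <type_traits>

struct B {
    int n;
    B() = default;               // added: trivial default constructor
    B(int n_) : n(n_) { }
};

static_assert(std::is_trivial<B>::value && std::is_standard_layout<B>::value,
              "B is expected to be a POD under the proposed rules");

// Writes an array of B as raw bytes and reads the bytes back into out.
// Returns true if the full byte count was read back.
inline bool round_trip(const B* in, B* out, std::size_t count) {
    assert(in != nullptr && out != nullptr);
    std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
    const std::streamsize bytes =
        static_cast<std::streamsize>(count * sizeof(B));
    stream.write(reinterpret_cast<const char*>(in), bytes);
    stream.read(reinterpret_cast<char*>(out), bytes);
    return stream.gcount() == bytes;
}
```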

Coupling between POD's and aggregates

POD's provide object representation guarantees, layout-compatibility guarantees, memory contiguity guarantees, and memory copy-ability guarantees for fairly simple types, yet leave compilers much latitude in such matters for more complicated types.

Aggregates provide well-defined initialization from initializer-clauses.

The two concepts are at most tangential, if not completely orthogonal. Thus to define POD in terms of aggregates creates an unnecessary and confusing dependency. It makes otherwise straightforward changes to the Standard POD and aggregate sections much more difficult because of the need to analyze a potential change for impact on both POD's and aggregates. The coupling is confusing to users, causing them to make mistaken assumptions about POD's. The coupling may be part of the reason even committee members cannot accurately remember the full rules for POD-ness.
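
The independence of the two concepts can be illustrated with two hypothetical classes, `named_value` and `counter`. The sketch assumes a later compiler with `<type_traits>` and the `= default` syntax of N2210. It is an illustration, not part of the proposed wording.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// An aggregate that is not a POD: brace initialization works, but the
// std::string member makes copying and destruction non-trivial.
struct named_value {
    std::string name;
    int value;
};

// Trivial and standard-layout under the proposal, but not an aggregate,
// because it declares a constructor.
struct counter {
    int count;
    counter() = default;
    explicit counter(int c) : count(c) {}
};

static_assert(!std::is_trivial<named_value>::value, "aggregate, yet not trivial");
static_assert(std::is_trivial<counter>::value, "trivial");
static_assert(std::is_standard_layout<counter>::value, "standard-layout");

inline void demo_orthogonal() {
    named_value nv = { "answer", 42 };   // initializer-clause for an aggregate
    assert(nv.name == "answer" && nv.value == 42);
    counter c(3);                        // constructor call, not an aggregate
    assert(c.count == 3);
}
```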

Rationale for changes

The proposed changes decompose the current POD requirements into trivial type requirements and standard-layout type requirements, and remove the dependency on the definition of aggregates. Because these decomposed requirements are somewhat less restrictive than the requirements for aggregates, the effect is to make POD's more broadly useful and solve the problems identified in the Introduction and Motivating examples. It also opens up the possibility of designing useful classes that meet one or the other, but not both, of the new trivial and standard-layout requirements.

As a consequence of allowing members of any access control in standard-layout types, the current requirement that POD data members have no intervening access-specifiers is changed to require only that such data members have the same access control. This change is believed to also be more in line with programmer expectations than the current requirements.
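
A short sketch of the access control rule, using the hypothetical classes `two_ints` and `repeated_specifier`. It assumes a later compiler whose `std::is_standard_layout` follows the same-access-control rule described here.

```cpp
#include <cassert>
#include <type_traits>

// Every data member is private, so all share the same access control.
class two_ints {
public:
    two_ints() = default;
    // True when the later-declared member has the higher address.
    bool later_member_is_higher() const { return &first_ < &second_; }
private:
    int first_;
    int second_;
};

// Repeating an access-specifier does not change the access control of the
// members. This fails the current "no intervening access-specifier" rule
// but meets the proposed one.
struct repeated_specifier {
public:
    int a;
public:
    int b;
};

static_assert(std::is_standard_layout<two_ints>::value, "all private");
static_assert(std::is_standard_layout<repeated_specifier>::value, "all public");

inline void demo_member_order() {
    two_ints t;
    assert(t.later_member_is_higher());
    repeated_specifier r;
    assert(&r.a < &r.b);   // same access control: declaration order holds
}
```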

Changes are not proposed that would allow POD's to have base classes with non-static data members. There was no apparent way to allow these cases without putting undue restrictions on how compilers allocate base class data in relation to derived class data. Note: this may be contentious; some committee members would like to allow one base class with non-static data members.

Proposed changes to the Working Paper

Added text is shown in green and underlined. Deleted text is shown in ~~red with strikethrough~~.

Commentary is shown in italics. Commentary is not part of the proposed WP changes.

Since issue 538 is currently in review status, changes to clause 9 paragraph 4 are shown relative to 538's proposed wording.

The following table lists all uses of POD, and related topics, in the current working paper, with proposed changes. Because the change to clause 9, paragraph 4, is critical to understanding the other changes, it is presented first.

Working Paper Text

9 ¶4 Classes [class]

A structure is a class defined with the class-key struct; its members and base classes (clause 10) are public by default (clause 11). A union is a class defined with the class-key union; it holds only one data member at a time (9.5). [Note: aggregates of class type are described in 8.5.1. —end note]

A trivial-class is a class that:

— has a trivial default constructor (12.1), and
— a trivial copy constructor (12.8), and
— a trivial copy assignment operator (13.5.3, 12.8), and
— a trivial destructor (12.4).

[Note: In particular, it excludes virtual functions and virtual base classes. --end note]

A standard-layout-class is a class that:

— has no non-static data members of type non-standard-layout-class (or array of such types) or reference, and
— has no virtual functions (10.3) and no virtual base classes (10.1), and
— has the same access control (clause 11) for all non-static data members, and
— has no non-standard-layout base classes, and
— either has no non-static data member in the most-derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
— has no base classes of the same type as the first non-static data member. ^[footnote]

^[footnote]This ensures that two subobjects that have the same class type and that belong to the same most-derived object are not allocated at the same address ([expr.eq]).

A standard-layout-struct is a standard-layout class defined with the class-key struct or the class-key class. A standard-layout-union is a standard-layout class defined with the class-key union.

[Note: Standard-layout classes are useful for communicating with code written in other programming languages. The layout is specified in 9.2. -- end note]

A POD-struct is ~~an aggregate~~ a class that is both a trivial class and standard-layout class, and has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference~~, and has no user-declared copy assignment operator and no user-declared destructor~~. Similarly, a POD-union is ~~an aggregate~~ a union that is both a trivial class and standard-layout class, and has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference~~, and has no user-declared copy assignment operator and no user-declared destructor~~. A POD class is a class that is either a POD-struct or a POD-union.

[Example:

struct N { // neither trivial nor standard-layout
    int i;
    int j;
    virtual ~N();
};

struct T { // trivial but not standard-layout
    int i;
private: 
    int j;
};

struct SL { // standard-layout but not trivial
    int i;
    int j;
    ~SL();
};

struct POD { // both trivial and standard-layout
    int i;
    int j;
};

-- end example]
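
As an illustration that is not part of the proposed wording, the four classifications in the example can be checked mechanically. The sketch assumes a later compiler whose `<type_traits>` provides `std::is_trivial` and `std::is_standard_layout` with meanings close to the definitions above.

```cpp
#include <type_traits>

struct N {      // neither trivial nor standard-layout
    int i;
    int j;
    virtual ~N();
};

struct T {      // trivial but not standard-layout
    int i;
private:
    int j;
};

struct SL {     // standard-layout but not trivial
    int i;
    int j;
    ~SL();
};

struct POD {    // both trivial and standard-layout
    int i;
    int j;
};

static_assert(!std::is_trivial<N>::value && !std::is_standard_layout<N>::value, "N");
static_assert(std::is_trivial<T>::value && !std::is_standard_layout<T>::value, "T");
static_assert(!std::is_trivial<SL>::value && std::is_standard_layout<SL>::value, "SL");
static_assert(std::is_trivial<POD>::value && std::is_standard_layout<POD>::value, "POD");
```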

1.8 ¶5 The C++ object model [intro.object]

Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of ~~POD type (3.9)~~ trivial or standard-layout (clause 3.9) type shall occupy contiguous bytes of storage.

3.6.2 ¶1 Initialization of non-local objects [basic.start.init]

Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static storage duration and an object of ~~POD~~ trivial type with static storage duration can be initialized with a constant expression (5.19); this is called constant initialization. Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. Static initialization shall be performed before any dynamic initialization takes place. Dynamic initialization of an object is either ordered or unordered. Definitions of explicitly specialized class template static data members have ordered initialization. Other class template static data members (i.e., implicitly or explicitly instantiated specializations) have unordered initialization. Other objects defined in namespace scope have ordered initialization. Objects defined within a single translation unit and with ordered initialization shall be initialized in the order of their definitions in the translation unit. The order of initialization is unspecified for objects with unordered initialization and for objects defined in different translation units. [ Note: 8.5.1 describes the order in which aggregate members are initialized. The initialization of local static objects is described in 6.7. —end note ]

3.8 ¶2 Object Lifetime [basic.life]

[ Note: the lifetime of an array object or of an object of ~~POD~~ trivial type (3.9) starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array or object occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. —end note ]

3.8 ¶5 Object Lifetime [basic.life]

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated^[39] or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a ~~non-POD~~ non-trivial class type, the program has undefined behavior if:

— the pointer is used to access a non-static data member or call a non-static member function of the object, or

— the pointer is implicitly converted (4.10) to a pointer to a base class type, or

— the pointer is used as the operand of a static_cast (5.2.9) (except when the conversion is to void*, or to void* and subsequently to char*, or unsigned char* )

— the pointer is used as the operand of a dynamic_cast (5.2.7).

3.8 ¶6 Object Lifetime [basic.life]

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any lvalue which refers to the original object may be used but only in limited ways. Such an lvalue refers to allocated storage (3.7.3.2), and using the properties of the lvalue which do not depend on its value is well-defined. If an lvalue-to-rvalue conversion (4.1) is applied to such an lvalue, the program has undefined behavior; if the original object will be or was of a ~~non-POD~~ non-trivial class type, the program has undefined behavior if:

— the lvalue is used to access a non-static data member or call a non-static member function of the object, or

— the lvalue is implicitly converted (4.10) to a reference to a base class type, or

— the lvalue is used as the operand of a static_cast (5.2.9) (except when the conversion is ultimately to cv char& or cv unsigned char&), or

— the lvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.

3.9 ¶2 Types [basic.types]

For any object (other than a base-class subobject) of ~~POD~~ trivial type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.^[41] If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

3.9 ¶3 Types [basic.types]

For any ~~POD~~ trivial type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the value of obj1 is copied into obj2, using the std::memcpy library function, obj2 shall subsequently hold the same value as obj1.

3.9 ¶4 Types [basic.types]

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For ~~POD~~ trivial types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.^[42]

3.9 ¶10 Types [basic.types]

Arithmetic types (3.9.1), enumeration types, pointer types, and pointer to member types (3.9.2), and cv-qualified versions of these types (3.9.3) are collectively called scalar types.

Scalar types, POD-struct types, POD-union types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called POD types.

Scalar types, trivial-class types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called trivial types.

Scalar types, standard-layout-class types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types.

3.9 ¶11 Types [basic.types]

If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. [ Note: Layout-compatible enumerations are described in 7.2. Layout-compatible ~~POD-structs~~ standard-layout-structs and ~~POD-unions~~ standard-layout-unions are described in 9.2. —end note ]

5.2 ¶7 Postfix expressions [expr.post]

When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.8). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a ~~non-POD~~ non-trivial class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.
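
As an illustration that is not part of the proposed wording, the revised rule would cover passing a class that has a user-written constructor through the ellipsis, provided the class is trivial. The names `point` and `sum_x` are hypothetical, and the sketch assumes the `= default` syntax of N2210 and a later compiler with `<type_traits>`.

```cpp
#include <cassert>
#include <cstdarg>
#include <type_traits>

// Has a user-written constructor, yet is trivial under the proposed rules.
struct point {
    int x;
    int y;
    point() = default;
    point(int x_, int y_) : x(x_), y(y_) {}
};

static_assert(std::is_trivial<point>::value, "point is trivial");

// Sums the x members of `count` point arguments passed through the ellipsis.
inline int sum_x(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        point p = va_arg(args, point);   // retrieves one point argument
        total += p.x;
    }
    va_end(args);
    return total;
}
```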

5.3.4 ¶16 New [expr.new]

A new-expression that creates an object of type T initializes that object as follows:
— If the new-initializer is omitted:
    — If T is a (possibly cv-qualified) ~~non-POD~~ non-trivial class type (or array thereof), the object is default-initialized (8.5). If T is a const-qualified type, the underlying class type shall have a user-declared default constructor.
    — Otherwise, the object created has indeterminate value. If T is a const-qualified type, or a (possibly cv-qualified) ~~POD~~ trivial class type (or array thereof) containing (directly or indirectly) a member of const-qualified type, the program is ill-formed;
— If the new-initializer is of the form (), the item is value-initialized (8.5);
— If the new-initializer is of the form (expression-list) and T is a class type, the appropriate constructor is called, using expression-list as the arguments (8.5);
— If the new-initializer is of the form (expression-list) and T is an arithmetic, enumeration, pointer, or pointer-to-member type and expression-list comprises exactly one expression, then the object is initialized to the (possibly converted) value of the expression (8.5);
— Otherwise the new-expression is ill-formed.

5.9 ¶7 Relational operators [expr.rel]

Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
...
— If two pointers point to non-static data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members ~~are not separated by an access-specifier label (11.1)~~ have the same access control (clause 11) and provided their class is not a union.

— If two pointers point to non-static data members of the same object ~~separated by an access-specifier label (11.1)~~ with different access control (clause 11) the result is unspecified.

See rationale.

5.19 ¶4 Constant expressions [expr.const]

If N2235, Generalized Constant Expressions (Rev 5), or a successor, is to be applied to the working paper, apply it first and then change the following reworded 5.19p2 bullet as indicated:

"a class member access (5.2.5) unless its postfix-expression is of ~~POD~~ trivial type or literal type or of pointer to ~~POD~~ trivial type or literal type".

Otherwise, apply the following change as indicated:

An address constant expression is a pointer to an lvalue designating an object of static storage duration, a string literal (2.13.4), or a function. The pointer shall be created explicitly, using the unary & operator, or implicitly using a non-type template parameter of pointer type, or using an expression of array (4.2) or function (4.3) type. The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An expression that designates the address of a subobject of a ~~non-POD~~ non-trivial class object (clause 9) is not an address constant expression (12.7). Function calls shall not be used in an address constant expression, even if the function is inline and has a reference return type.

5.19 ¶5 Constant expressions [expr.const]

A reference constant expression is an lvalue designating an object of static storage duration, a non-type template parameter of reference type, or a function. The subscripting operator [], the class member access . and -> operators, the & and * unary operators, and reference casts (except those invoking user-defined conversion functions (12.3.2) and except dynamic_casts (5.2.7)) can be used in the creation of a reference constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An lvalue expression that designates a member or base class of a ~~non-POD~~ non-trivial class object (clause 9) is not a reference constant expression (12.7). Function calls shall not be used in a reference constant expression, even if the function is inline and has a reference return type.

6.7 ¶3 Declaration statement [stmt.dcl]

It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps^[82] from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has ~~POD~~ trivial type (3.9) and is declared without an initializer (8.5).
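
As an illustration that is not part of the proposed wording, the revised rule would let a jump bypass the declaration of a variable whose class has a user-written constructor, as long as the class is trivial and no initializer is given. The names `sample_t` and `classify` are hypothetical, and the sketch assumes the `= default` syntax of N2210.

```cpp
#include <cassert>

// Trivial under the proposed rules despite the extra constructor.
struct sample_t {
    int v;
    sample_t() = default;
    sample_t(int x) : v(x) {}
};

inline int classify(int selector) {
    assert(selector >= 0);
    switch (selector) {
    case 0:
        sample_t scratch;    // trivial type, declared without an initializer
        scratch.v = 10;
        return scratch.v;
    case 1:                  // this jump bypasses the declaration of scratch
        scratch.v = 20;
        return scratch.v;
    default:
        return -1;
    }
}
```

Had `scratch` been declared with an initializer, or had `sample_t` lacked a trivial default constructor, the jumps to `case 1` and `default` would be ill-formed.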

6.7 ¶4 Declaration statement [stmt.dcl]

The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of ~~POD~~ trivial type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.

8.5 ¶5 Initializers [dcl.init]

To zero-initialize an object of type T means:

— if T is a scalar type (3.9), the object is set to the value 0 (zero), taken as an integral constant expression, converted to T;^[93]
— if T is a non-union class type, each non-static data member and each base-class subobject is zero-initialized;
— if T is a union type, the object’s first named data member^[94] is zero-initialized;
— if T is an array type, each element is zero-initialized;
— if T is a reference type, no initialization is performed.

To default-initialize an object of type T means:

— if T is a ~~non-POD~~ non-trivial class type (clause 9), the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
— if T is an array type, each element is default-initialized;
— otherwise, the object is zero-initialized.

To value-initialize an object of type T means:

— if T is a class type (clause 9) with a user-declared constructor (12.1), then the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
— if T is a non-union class type without a user-declared constructor, then every non-static data member and base-class component of T is value-initialized;^[95]
— if T is an array type, then each element is value-initialized;
— otherwise, the object is zero-initialized.

8.5 ¶9 Initializers [dcl.init]

If no initializer is specified for an object, and the object is of (possibly cv-qualified) ~~non-POD~~ non-trivial class type (or array thereof), the object shall be default-initialized; if the object is of const-qualified type, the underlying class type shall have a user-declared default constructor. Otherwise, if no initializer is specified for a non-static object, the object and its subobjects, if any, have an indeterminate initial value^[97]; if the object or any of its subobjects are of const-qualified type, the program is ill-formed.

8.5 ¶14 Initializers [dcl.init]

When an aggregate with static storage duration is initialized with a brace-enclosed initializer-list, if all the member initializer expressions are constant expressions, and the aggregate is a ~~POD~~ trivial type, the initialization shall be done during the static phase of initialization (3.6.2); otherwise, it is unspecified whether the initialization of members with constant expressions takes place during the static phase or during the dynamic phase of initialization.

8.5.1 ¶1 Aggregates [dcl.init.aggr]

An aggregate is an array or a class (clause 9) with ~~no user-declared constructors (12.1)~~ trivial constructors, no private or protected non-static data members (clause 11), no base classes with non-static data members (clause 10), and no virtual functions (10.3).

Portland: "no user-declared constructors" wording unchanged at request of CWG. Others have requested that "no user-declared constructors" be changed to "a trivial default constructor".

9.2 ¶12 Class members [class.mem]

Nonstatic data members of a (non-union) class ~~declared without an intervening access-specifier~~ with the same access control (clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members ~~separated by an access-specifier~~ with different access control is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

See rationale.

9.2 ¶15-18 Class members [class.mem]

15 Two ~~POD-struct~~ standard-layout-struct (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).

16 Two ~~POD-union~~ standard-layout-union (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in any order) have layout-compatible types (3.9).

17 If a ~~POD-union~~ standard-layout-union contains two or more ~~POD-structs~~ standard-layout-structs that share a common initial sequence, and if the ~~POD-union~~ standard-layout-union object currently contains one of these ~~POD-structs~~ standard-layout-structs, it is permitted to inspect the common initial part of any of them. Two ~~POD-structs~~ standard-layout-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

18 A pointer to a ~~POD-struct~~ standard-layout-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a ~~POD-struct~~ standard-layout-struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]
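
As an illustration that is not part of the proposed wording, paragraph 18 would now also apply to a class with private data and constructors. The class `header` and the helper functions are hypothetical, and the `static_assert` assumes a later compiler with `<type_traits>`.

```cpp
#include <cassert>
#include <type_traits>

// Private data and constructors, but standard-layout under the proposal.
class header {
public:
    header() = default;
    explicit header(int id) : id_(id), length_(0.0) {}
    const int* id_address() const { return &id_; }
    double length() const { return length_; }
private:
    int id_;          // initial member
    double length_;
};

static_assert(std::is_standard_layout<header>::value, "standard-layout");

// Returns true if a pointer to the object, converted with reinterpret_cast,
// points to the initial member.
inline bool points_to_initial_member(const header& h) {
    const int* via_cast = reinterpret_cast<const int*>(&h);
    return via_cast == h.id_address();
}

inline void demo_initial_member() {
    header h(17);
    assert(points_to_initial_member(h));
    assert(*reinterpret_cast<const int*>(&h) == 17);
}
```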

9.5 ¶1 Unions [class.union]

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. [ Note: one special guarantee is made in order to simplify the use of unions: If a ~~POD-union~~ standard-layout-union contains several ~~POD-structs~~ standard-layout-structs that share a common initial sequence (9.2), and if an object of this ~~POD-union~~ standard-layout-union type contains one of the ~~POD-structs~~ standard-layout-structs, it is permitted to inspect the common initial sequence of any of ~~POD-struct~~ standard-layout-struct members; see 9.2. —end note ] The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct. A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a class with a non-trivial default constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects. If a union contains a static data member, or a member of reference type, the program is ill-formed.
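Illustration (not proposed wording): a sketch of the common initial sequence guarantee described in the note. Circle, Square, Shape, and kind_of are hypothetical names; the two structs share the initial member kind.

```cpp
#include <cassert>

// Hypothetical standard-layout structs sharing a common initial sequence.
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

union Shape {
    Circle circle;
    Square square;
};

// Inspects the common initial part through a member that need not be active.
int kind_of(const Shape& s) { return s.square.kind; }

// Usage: circle is the active member, yet kind is read through square.
void union_example() {
    Shape s;
    s.circle = Circle{1, 2.0};
    assert(kind_of(s) == 1);
}
```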

11.1 ¶3 Access Specifiers [class.access.spec]

~~The order of allocation of data members with separate access-specifier labels is unspecified (9.2).~~

The effect of access control on the order of allocation of data members is described in [class.mem].

12.6.2 ¶4 Initializing bases and members [class.base.init]

If a given non-static data member or base class is not named by a mem-initializer-id (including the case where there is no mem-initializer-list because the constructor has no ctor-initializer), then

— If the entity is a non-static data member of (possibly cv-qualified) class type (or array thereof) or a base class, and the entity class is a ~~non-POD~~ non-trivial class the entity is default-initialized (8.5). If the entity is a non-static data member of a const-qualified type, the entity class shall have a user-declared default constructor.

— Otherwise, the entity is not initialized. If the entity is of const-qualified type or reference type, or of a (possibly cv-qualified) ~~POD~~ trivial class type (or array thereof) containing (directly or indirectly) a member of a const-qualified type, the program is ill-formed.

After the call to a constructor for class X has completed, if a member of X is neither specified in the constructor’s mem-initializers, nor default-initialized, nor value-initialized, nor given a value during execution of the body of the constructor, the member has indeterminate value.
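Illustration (not proposed wording): a sketch of how the two bullets above treat members that are not named by a mem-initializer-id. Raw and Widget are hypothetical types.

```cpp
#include <cassert>
#include <string>

struct Raw { int value; };   // hypothetical trivial class

struct Widget {
    std::string name;   // non-trivial class type: default-initialized to ""
    Raw scratch;        // trivial class type, not named below: left uninitialized
    Raw counter;        // named in the mem-initializer-list: value-initialized
    Widget() : counter() {}
};

// Usage: only the initialized members are read.
void init_example() {
    Widget w;
    assert(w.name.empty());
    assert(w.counter.value == 0);
    // w.scratch.value is indeterminate here and must not be read.
}
```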

12.7 ¶1 Construction and destruction [class.cdtor]

For an object of ~~non-POD~~ non-trivial class type (clause 9) before the constructor begins execution and after the destructor finishes execution, referring to any non-static member or base class of the object results in undefined behavior. [ Example:

struct X { int i; };                 
struct Y : X { Y(); }; // non-trivial                   
struct A { int a; };                 
struct B : public A { int j; Y y; }; // non-trivial

extern B bobj;
B* pb = &bobj;         // OK
int* p1 = &bobj.a;     // undefined, refers to base class member
int* p2 = &bobj.y.i;   // undefined, refers to member’s member

A* pa = &bobj;         // undefined, upcast to a base class type
B bobj;                // definition of bobj

extern X xobj;
int* p3 = &xobj.i;     // OK, X is a ~~POD~~ trivial class
X xobj;

17.1.3 character container type [defns.character.container]

a class or a type used to represent a character (17.1.2). It is used for one of the template parameters of the string and iostream class templates. A character container class shall be a POD (3.9) type.

No change proposed; there is no known motivation for making a change.

18.1 ¶4 Types [support.types]

The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a ~~POD structure or a POD union~~ standard-layout-struct or a standard-layout-union (clause 9), the results are undefined.189) The expression offsetof(type, member-designator) is never type-dependent (14.6.2.2) and it is value-dependent (14.6.2.3) if and only if type is dependent. The result of applying the offsetof macro to a field that is a static data member or a function member is undefined.
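Illustration (not proposed wording): a sketch showing that the revised requirement concerns layout only. The hypothetical struct Handle has a user-provided destructor, so it is not trivial, but it is standard-layout and offsetof applies to it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type: standard-layout, but not trivial.
struct Handle {
    int      fd;
    unsigned flags;
    Handle() : fd(-1), flags(0) {}
    ~Handle() {}   // user-provided, so Handle is not trivial
};

static_assert(std::is_standard_layout<Handle>::value,
              "Handle is expected to be standard-layout");
static_assert(!std::is_trivial<Handle>::value,
              "Handle is expected to be non-trivial");

std::size_t flags_offset() { return offsetof(Handle, flags); }

// Usage: the initial member sits at offset zero, later members after it.
void offsetof_example() {
    assert(offsetof(Handle, fd) == 0);
    assert(flags_offset() >= sizeof(int));
}
```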

20.4 type traits

To 20.4.2, Header <type_traits> synopsis [lib.meta.type.synop], type properties, add:

template <class T> struct is_trivial;
template <class T> struct is_standard_layout;

To 20.4.5.3 Type properties [lib.meta.unary.prop], Type Property Predicates table, add:

Template | Condition | Preconditions

template <class T> struct is_trivial; | T is a trivial type ([basic.types]) | T shall be a complete type.

template <class T> struct is_standard_layout; | T is a standard-layout type ([basic.types]) | T shall be a complete type.
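Illustration (not proposed wording): a sketch of how the two traits vary independently. The three structs and the helper is_pod_like are hypothetical; the helper simply combines the two proposed traits.

```cpp
#include <type_traits>

struct Both        { int x; int y; };               // trivial and standard-layout
struct TrivialOnly { int x; private: int y; };      // differing access control
struct LayoutOnly  { int x; ~LayoutOnly() {} };     // user-provided destructor

// Hypothetical helper: a type is POD-like when both properties hold.
template <class T>
struct is_pod_like
    : std::integral_constant<bool,
          std::is_trivial<T>::value && std::is_standard_layout<T>::value> {};

static_assert(std::is_trivial<TrivialOnly>::value, "trivial");
static_assert(!std::is_standard_layout<TrivialOnly>::value, "not standard-layout");
static_assert(!std::is_trivial<LayoutOnly>::value, "not trivial");
static_assert(std::is_standard_layout<LayoutOnly>::value, "standard-layout");
static_assert(is_pod_like<Both>::value, "both properties hold");
```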

21 ¶1 Strings library [strings]

This clause describes components for manipulating sequences of “characters,” where characters may be of any POD (3.9) type. In this clause such types are called char-like types, and objects of char-like types are called char-like objects or simply “characters.”

No change. Users expect c_str() and data() to return pointers to POD types.

25.4 ¶4 C library algorithms [alg.c.library]

The function signature:

qsort(void *, size_t, size_t, int (*)(const void *, const void *));

is replaced by the two declarations:

extern "C" void qsort(void* base , size_t nmemb , size_t size, int (*compar )(const void*, const void*));

extern "C++" void qsort(void* base , size_t nmemb , size_t size, int (*compar )(const void*, const void*));

both of which have the same behavior as the original declaration. The behavior is undefined unless the objects in the array pointed to by base are of ~~POD~~ trivial type.
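Illustration (not proposed wording): a sketch of qsort applied to an array of a trivial type. Point, compare_by_x, and sort_points are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

struct Point { int x; int y; };   // hypothetical trivial type

static_assert(std::is_trivial<Point>::value,
              "qsort is only defined for arrays of trivial type");

// Three-way comparison on the x coordinate, with C linkage to match
// the extern "C" declaration of qsort.
extern "C" int compare_by_x(const void* lhs, const void* rhs) {
    const Point* a = static_cast<const Point*>(lhs);
    const Point* b = static_cast<const Point*>(rhs);
    return (a->x > b->x) - (a->x < b->x);
}

void sort_points(Point* pts, std::size_t n) {
    std::qsort(pts, n, sizeof(Point), compare_by_x);
}

// Usage: sorts three points by ascending x.
void qsort_example() {
    Point pts[] = {{3, 0}, {1, 1}, {2, 2}};
    sort_points(pts, 3);
    assert(pts[0].x == 1 && pts[1].x == 2 && pts[2].x == 3);
}
```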

Open issues

Dependence on N2210 or equivalent

For classes with constructors other than the default constructor, there must be a way to tell the compiler to generate a trivial default constructor or treat a user-supplied default constructor with no effects as being trivial.

In N2210, Defaulted and Deleted Functions, Lawrence Crowl proposes explicit syntax for requesting that the compiler supply trivial special member functions, particularly the default constructor. This has the considerable advantage of letting programs express intent directly, rather than relying on an apparently useless definition to signal that a default constructor is meant to be trivial.

If N2210 or equivalent is not accepted, some other way of marking a default constructor as trivial must be specified, such as changing 12.1, Constructors:

A default constructor is trivial if it is implicitly-declared or if defined in the class definition and having no effects, and if:

— its class has no virtual functions (10.3) and no virtual base classes (10.1), and

— all the direct base classes of its class have trivial default constructors, and

— for all the non-static data members of its class that are of class type (or array thereof), each such class has a trivial default constructor.
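For comparison with the fallback wording above, the explicit route might look like the following sketch. It assumes the `= default` spelling, which is how this facility appears in C++11; Celsius and Kelvin are hypothetical types. Note that under the C++11 rules a user-provided default constructor is non-trivial even when it has no effects.

```cpp
#include <type_traits>

struct Celsius {
    double degrees;
    Celsius() = default;                        // compiler-generated, trivial
    explicit Celsius(double d) : degrees(d) {}  // extra constructor is allowed
};

struct Kelvin {
    double degrees;
    Kelvin() {}                                 // user-provided, so not trivial
    explicit Kelvin(double d) : degrees(d) {}
};

static_assert(std::is_trivial<Celsius>::value,
              "defaulted default constructor keeps the class trivial");
static_assert(!std::is_trivial<Kelvin>::value,
              "user-provided default constructor makes the class non-trivial");
```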

Allow base class with non-static data members

Some committee members would like to allow one base class with non-static data members.

Supply concepts

If concepts are accepted, add Trivial and StandardLayout concepts, and change the POD concept as needed.

Possible std::pair and std::tuple changes

Consider changes or additions to make std::pair a POD, such as a special template overload.

Howard Hinnant suggests that we can give std::tuple the desired POD semantics without a special template overload, since we would not be breaking existing code.

Impact on existing code

The proposed changes will cause some existing non-POD's to become POD's. This may result in less optimization being performed. The problem can be eliminated by making the class non-POD again, for example, by adding a user-defined do-nothing destructor.

Adding a user-defined do-nothing destructor to existing code to leave POD-ness unchanged is simple enough that it could be done programmatically. If a compiler vendor felt this was a serious concern for their platform and user base, they might wish to provide such a program. Alternatively, compilers may wish to issue warnings during a transition period if the new rules change a non-POD into a POD.
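A minimal sketch of the do-nothing destructor technique, using hypothetical classes Legacy and Pinned and expressing POD-ness as the conjunction of the two proposed traits:

```cpp
#include <type_traits>

struct Base {};

// Non-POD under C++03 (it has a base class); POD under the proposed rules.
struct Legacy : Base { int a; int b; };

// The same class with a do-nothing destructor added: non-POD under both.
struct Pinned : Base { int a; int b; ~Pinned() {} };

static_assert(std::is_trivial<Legacy>::value &&
              std::is_standard_layout<Legacy>::value,
              "Legacy becomes a POD under the proposal");
static_assert(!std::is_trivial<Pinned>::value,
              "the user-defined destructor keeps Pinned non-trivial");
```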

Acceptance of Lawrence Crowl's N2210, Defaulted and Deleted Functions, or some equivalent proposal, will reduce the likelihood of existing non-POD's becoming POD's. Assuming such a proposal is accepted, a non-POD in existing code will change to a POD only if all of the following conditions are met:

The class has no user-declared constructors, user-declared copy assignment, user-declared destructor, virtual functions, or virtual base classes, and
At most one of the class or its base classes has non-static data members, and
All non-static data members have the same access control, and
No base classes have the same type as the first non-static data member, and
All of the above conditions are met by the bases and members of the class.
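The conditions above can be sketched with one class that meets all of them and several that each fail one. All names are hypothetical, and pod_under_proposal is a helper that combines the two proposed traits.

```cpp
#include <type_traits>

// Hypothetical helper: POD-ness as defined by the proposal.
template <class T>
constexpr bool pod_under_proposal() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

struct Empty {};
struct Data { int x; };

struct Meets : Empty { int x; int y; };           // satisfies every condition
struct HasCtor { int x; HasCtor() : x(0) {} };    // user-declared constructor
struct TwoWithData : Data { int y; };             // class and base both hold data
class MixedAccess { public: int x; private: int y; };  // differing access control
struct SameAsFirst : Empty { Empty e; int x; };   // base has type of first member
struct WrapsCtor { HasCtor member; };             // a member fails a condition

static_assert(pod_under_proposal<Meets>(), "changes from non-POD to POD");
static_assert(!pod_under_proposal<HasCtor>(), "stays non-POD");
static_assert(!pod_under_proposal<TwoWithData>(), "stays non-POD");
static_assert(!pod_under_proposal<MixedAccess>(), "stays non-POD");
static_assert(!pod_under_proposal<SameAsFirst>(), "stays non-POD");
static_assert(!pod_under_proposal<WrapsCtor>(), "stays non-POD");
```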

Impact on existing ABI's

Allowing standard-layout classes to have base classes forces compilers to implement the empty base optimization for standard-layout classes, and this could break a compiler's application binary interface (ABI). See 9.2/18 above.
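A minimal sketch of why 9.2/18 implies the empty base optimization, using hypothetical classes Empty and Derived: the address of a Derived object has to coincide with the address of its initial member, so the empty base cannot occupy storage in front of it.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};
struct Derived : Empty { int value; };

static_assert(std::is_standard_layout<Derived>::value,
              "Derived is expected to be standard-layout");

// True when the object and its initial member share an address.
bool object_address_is_first_member(Derived& d) {
    return static_cast<void*>(&d) == static_cast<void*>(&d.value);
}

// Usage: holds on any implementation that follows 9.2/18.
void ebo_example() {
    Derived d;
    d.value = 5;
    assert(object_address_is_first_member(d));
}
```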

This is believed not to be a concern for modern compilers, except possibly in the case of multiple inheritance. Since multiple inheritance is not central to this proposal, allowing standard-layout classes or their bases to use multiple inheritance will be eliminated from the proposal if it proves contentious.

Interactions with other proposals

N2210, Defaulted and Deleted Functions, Lawrence Crowl. This proposal provides syntax to explicitly mark a user-declared default constructor declaration as a request for a compiler-generated trivial default constructor. Since this POD's Revisited proposal requires the ability to do just that, acceptance of Lawrence's proposal or equivalent is assumed. Note that with such explicit syntax, the impact of the POD's Revisited proposal on existing code is markedly reduced.

N2215, Initializer lists (Rev. 3), B. Stroustrup, G. Dos Reis. The authors of the Initializer lists proposal and this POD's Revisited proposal are committed to working together to ensure the two proposals stay in sync. It does not appear that the two proposals currently modify the same working paper text, so no difficulties are anticipated.

Core issue 538, Definition and usage of structure, POD-struct, POD-union, and POD class. This issue, currently in review status, clarifies POD related terminology throughout the working paper. Since it makes changes to the same text modified by this proposal, care must be taken to ensure the two proposals do not diverge.

Revision history

Revision 3

Removed "default constructor with no effects" as a way to identify a trivial default constructor in the presence of other constructors. N2210 makes this hack unnecessary.
Applied several small corrections and clarifications at the direction of the Core Working Group.
Provided alternate wording for 5.19 para 4 in case Generalized Constant Expressions gets applied to the WP first.

Revision 2 - N2172

An additional bullet item was added to standard-layout requirements to prevent two subobjects of the same type from getting the same address:

struct S {};
struct X: S {
  S s;
};

An additional bullet item was added to 8.5 to cover the case where a trivial class has a default constructor.
A footnote was added to 8.5 paragraph 14 pointing out that trivial types with user-declared constructors are not aggregates, and so are not required to be statically initialized.
Minor additional corrections to proposed wording.
The C/C++ interoperability example was removed. It involves issues well beyond the scope of this paper.

Revision 1 - N2102

Review and refinement of all wording.
Changed name from byte-copyable-class to trivial-class, to reflect properties rather than uses.
Added standard-layout class definition, in response to use cases that were legitimately non-copyable, but otherwise met POD requirements.
Changed proposed wording to be relative to issue 538's proposed wording.
Made corrections based on discussions with Core and Evolution working groups at the Portland committee meeting.
Added section discussing ABI issues.

Initial version - N2062

Acknowledgements

Initial Version - Matt Austern, Greg Colvin, Alisdair Meredith, and Clark Nelson provided helpful comments during preparation of this proposal. Our cat Jane woke me up in the middle of the night, provoking this proposal as an alternative to counting sheep (or cats).

Revision 1 - Greg Colvin and Lawrence Crowl provided legitimately non-copyable use cases. Alberto Ganesh Barbati pointed out that the proposed resolution should be relative to the 538 proposed resolution. Martin Sebor pointed out the need for clarification of 11.1, p3. The EWG and CWG in Portland reviewed a draft of revision 1 and made many helpful comments and suggestions. Clark Nelson is facilitating progress through Core. A suggestion was made that trivial types be renamed inert POD's, or IPOD's. Mike Miller suggested that a pod_cast operation be provided to ensure interoperability between POD's and IPOD's.

Revision 2 - Lawrence Crowl identified overly restrictive requirements in standard-layout classes. Daveed Vandevoorde provided the nasty subobject example noted in the revision history and Lawrence Crowl pointed out that the subobject problem could cross several levels of inheritance. Alisdair Meredith pointed out that default initialization now has an additional special case. Alisdair also originated the idea of tagging a class as trivial, standard-layout, or POD to allow diagnostics. Daniel Krügler asked questions resolved by adding a footnote to 8.5 paragraph 14.

Revision 3 - Lawrence Crowl's N2210 was the motivation for revision 3.

References

N2210 Defaulted and Deleted Functions, Lawrence Crowl, www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2210.html

N1824 Extending Aggregate Initialization, Alisdair Meredith, www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1824.htm

Core issue 538, Definition and usage of structure, POD-struct, POD-union, and POD class, www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#538

Core issue 568, Definition of POD is too strict, www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#568