Doc. no. N2230=07-0090
Date: 2007-04-29
Project: Programming Language C++
Reply to: Beman Dawes <bdawes@acm.org>
Introduction
Summary of proposed changes
Features and benefits of POD types
Motivating examples
std::pair example
Endian example
Two structs example
Coupling between POD's and aggregates
Rationale for changes
Proposed changes to the Working Paper
Open issues
Impact on existing code
Impact on existing ABI's
Interactions with other proposals
Revision history
Acknowledgements
References
This paper proposes a resolution for Core Issue 568, Definition of POD is too strict, submitted by Matt Austern.
POD's as defined in the current version of the standard have several problems:
Features | Benefits
Byte copyable guarantees [3.9 ¶2-3, basic.types] |
C layout-compatibility guarantees, byte copyable guarantees [9.2 ¶14-17, class.mem], initialization rules. |
Static initialization guarantees [3.6.2, basic.start.init] |
Are aggregates |
Various rules for non-POD's |
std::pair example
Matt Austern provided this example:
If a program has two arrays of type std::pair<int,int>, then it is natural to expect that memcpy(A2,A1,sizeof(A2)) would be safe. Programmers have trouble imagining any implementation in which a byte-for-byte copy of std::pair<int,int> wouldn't do the right thing.
Unfortunately, that's not what the language standard says. It says that
byte-for-byte copies are guaranteed to work only for PODs. std::pair<T,U>
isn't a class aggregate, since it has a user-defined constructor, and that means
it also isn't a POD.
std::pair has a user-defined constructor essentially for syntactic reasons: because in some cases it looks nicer to write "std::pair<int,int> p(1,2);" than to write "std::pair<int,int> p = {1,2};". It seems a shame that this syntactic change caused the loss of the important semantic property of PODness. It's especially a shame because it means something formally doesn't work when on all real-world implementations it actually does work. It also encourages programmers to rely on undefined behavior, which is something the standard should not encourage.
With the proposed resolution, the std::pair<int,int> example is still not a POD because its default constructor has effects. With the proposal in place, however, it becomes possible to turn std::pair into a POD by removing the default constructor's effects. To avoid breaking existing code, that can be done under control of an additional template parameter. The intent is to propose such an addition to the LWG as the core language POD proposal moves forward.
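As a minimal sketch of the intended outcome, the following uses a hypothetical pair-like struct, int_pair, rather than std::pair itself. It assumes the "= default" syntax of N2210 and the is_trivial and is_standard_layout traits proposed below are available, as they are in C++11 and later compilers. The names int_pair and memcpy_preserves_values are illustrative only.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for std::pair<int,int>; not the library type.
struct int_pair {
    int first;
    int second;
    int_pair() = default;                            // trivial default constructor
    int_pair(int a, int b) : first(a), second(b) {}  // convenience constructor
};

// Under the proposed rules the convenience constructor does not cost PODness.
static_assert(std::is_trivial<int_pair>::value, "int_pair should be trivial");
static_assert(std::is_standard_layout<int_pair>::value,
              "int_pair should be standard-layout");

// Copies one array into another byte for byte and reports whether
// every element of the destination matches the source.
bool memcpy_preserves_values() {
    int_pair a1[3] = { int_pair(1, 2), int_pair(3, 4), int_pair(5, 6) };
    int_pair a2[3];
    assert(sizeof(a1) == sizeof(a2));
    std::memcpy(a2, a1, sizeof(a2));
    for (int i = 0; i < 3; ++i) {
        if (a2[i].first != a1[i].first || a2[i].second != a1[i].second)
            return false;
    }
    return true;
}
```

The two-argument constructor keeps the "p(1,2)" syntax that motivated std::pair's constructor, while the defaulted default constructor keeps the type trivial.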
Beman Dawes provided this example:
Here is an example of something in development for Boost, based on classes used in industrial applications for many years. The fact that it is a template partial specialization isn't material to this discussion and can be ignored.
  template <typename T, std::size_t n_bits>
  class endian< big, T, n_bits, unaligned >
    : cover_operators< endian< big, T, n_bits >, T >
  {
    BOOST_STATIC_ASSERT( (n_bits/8)*8 == n_bits );
  public:
    typedef T value_type;
    endian() {}
    endian(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); }
    operator T() const { return detail::load_big_endian<T, n_bits/8>(bytes); }
  private:
    char bytes[n_bits/8];
  };
But it isn't a POD, so it won't work at all in unions and uses such as binary I/O rely on undefined behavior. Since the primary rationale for the existence of endian is to do binary I/O, forcing the user to rely on undefined behavior is unfortunate to say the least.
Here is what would have to be done to make it a POD:
Remove the constructors. But that makes initialization painful, so boosters are proposing to add an ugly and unintuitive static init function, and an operator= from the value_type. Those are partial workarounds, but not really what the designers, Beman Dawes and Darin Adler, wanted.
Make the data member public. But this encourages a poor design practice.
Eliminate the base class. But the only way to do that without the highly error-prone duplication of the functions provided by the base class is to introduce a lengthy macro. Enough said.
In other words, making this class a POD under current language rules would do serious damage to interface ease-of-use and to code quality, and would encourage poor design practices. Yet the only data member in the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.
With the proposed resolution, the class can be made into a POD by making the default constructor trivial (with N2210 the syntax would be endian()=default), resolving all the issues.
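The effect can be sketched with a much simplified class. The following is not the Boost endian class: big_u32, to_wire and from_wire are hypothetical names, the base class and template parameters are omitted, and the code assumes a compiler that accepts the N2210 "= default" syntax and provides the proposed traits (C++11 or later).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified 32-bit big-endian holder; not the Boost class.
class big_u32 {
public:
    big_u32() = default;  // trivial: no effects
    big_u32(std::uint32_t v) {
        bytes[0] = static_cast<unsigned char>(v >> 24);
        bytes[1] = static_cast<unsigned char>(v >> 16);
        bytes[2] = static_cast<unsigned char>(v >> 8);
        bytes[3] = static_cast<unsigned char>(v);
    }
    operator std::uint32_t() const {
        return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
               (std::uint32_t(bytes[2]) << 8)  |  std::uint32_t(bytes[3]);
    }
private:
    unsigned char bytes[4];  // private data member, as in the original design
};

static_assert(std::is_trivial<big_u32>::value, "big_u32 should be trivial");
static_assert(std::is_standard_layout<big_u32>::value,
              "big_u32 should be standard-layout");
static_assert(sizeof(big_u32) == 4, "sketch assumes no padding");

// Byte-wise copy out of the object, standing in for a binary write.
void to_wire(const big_u32& v, unsigned char* out) {
    std::memcpy(out, &v, sizeof v);
}

// Byte-wise copy into a fresh object, standing in for a binary read.
big_u32 from_wire(const unsigned char* in) {
    big_u32 v;
    std::memcpy(&v, in, sizeof v);
    return v;
}

// Writes a known value, checks the byte order, and reads it back.
bool wire_round_trip() {
    unsigned char buf[4];
    to_wire(big_u32(0x01020304u), buf);
    assert(buf[0] == 0x01 && buf[3] == 0x04);  // most significant byte first
    return std::uint32_t(from_wire(buf)) == 0x01020304u;
}
```

The constructors and the private data member stay in place, and only the default constructor changes.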
Matt Austern provided this example in Core DR 568:
It’s silly for the standard to make layout and memcpy guarantees for this class:
struct A { int n; };
but not for this one:
struct B { int n; B(int n_) : n(n_) { } };
With either A or B, it ought to be possible to save an array of those objects to disk with a single call to Unix’s write(2) system call or the equivalent. At present the standard says that it’s legal for A but not B, and there isn’t any good reason for that distinction.
With the proposed resolution, the class can be easily changed (by adding B()=default) to become a POD, solving all the issues.
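The difference between the two structs, and the effect of adding the defaulted default constructor, can be checked with the proposed traits. This is a sketch under the assumption of a C++11 or later compiler. B2 is a hypothetical name for B with the added B()=default, and a byte buffer stands in for the file written by write(2).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct A  { int n; };
struct B  { int n; B(int n_) : n(n_) { } };                    // no default constructor
struct B2 { int n; B2() = default; B2(int n_) : n(n_) { } };   // hypothetical revised B

static_assert(std::is_trivial<A>::value && std::is_standard_layout<A>::value,
              "A is a POD");
static_assert(!std::is_trivial<B>::value,
              "B has no trivial default constructor");
static_assert(std::is_standard_layout<B>::value, "B is standard-layout");
static_assert(std::is_trivial<B2>::value && std::is_standard_layout<B2>::value,
              "B2 is a POD under the proposed rules");

// Saves an array to a byte buffer and restores it, as a stand-in for
// writing the array to disk and reading it back.
bool bytes_round_trip() {
    B2 src[2] = { B2(7), B2(9) };
    unsigned char disk[sizeof(src)];
    std::memcpy(disk, src, sizeof(src));
    B2 dst[2];
    assert(sizeof(dst) == sizeof(disk));
    std::memcpy(dst, disk, sizeof(dst));
    return dst[0].n == 7 && dst[1].n == 9;
}
```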
POD's provide object representation guarantees, layout-compatibility guarantees, memory contiguity guarantees, and memory copy-ability guarantees for fairly simple types, yet leave compilers much latitude in such matters for more complicated types.
Aggregates provide well-defined initialization from initializer-clauses.
The two concepts are at most tangential, if not completely orthogonal. Thus to define POD in terms of aggregates creates an unnecessary and confusing dependency. It makes otherwise straightforward changes to the Standard POD and aggregate sections much more difficult because of the need to analyze a potential change for impact on both POD's and aggregates. The coupling is confusing to users, causing them to make mistaken assumptions about POD's. The coupling may be part of the reason even committee members cannot accurately remember the full rules for POD-ness.
The proposed changes decompose the current POD requirements into trivial type requirements and standard-layout type requirements, and remove the dependency on the definition of aggregates. Because these decomposed requirements are somewhat less restrictive than the requirements for aggregates, the effect is to make POD's more broadly useful and solve the problems identified in the Introduction and Motivating examples. It also opens up the possibility of designing useful classes that meet one or the other, but not both, of the new trivial and standard-layout requirements.
As a consequence of allowing members of any access control in standard-layout types, the current requirement that POD data members have no intervening access-specifiers is changed to require only that such data members have the same access control. This change is believed to also be more in line with programmer expectations than the current requirements.
Changes are not proposed that would allow POD's to have base classes with non-static data members. There was no apparent way to allow these cases without putting undue restrictions on how compilers allocate base class data in relation to derived class data. Note: this may be contentious; some committee members would like to allow one base class with non-static data members.
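The access-control and base-class rules described above can be illustrated with the proposed is_standard_layout trait. This is a sketch assuming a C++11 or later compiler. All class names and the standard_layout helper are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// All data members private: the same access control throughout.
class same_access {
public:
    same_access() = default;
    int sum() const { return a + b; }
private:
    int a;
    int b;
};

// An intervening access-specifier, but both members are still public.
struct repeated_specifier {
public:
    int a;
public:
    int b;
};

// Data members with different access control.
class mixed_access {
public:
    int a;
    int total() const { return a + b; }
private:
    int b;
};

// Base class and derived class both contribute non-static data members.
struct data_base { int a; };
struct data_derived : data_base { int b; };

// Shorthand for the trait, for use in the checks below.
template <class U>
constexpr bool standard_layout() { return std::is_standard_layout<U>::value; }

static_assert(standard_layout<same_access>(), "same access control is allowed");
static_assert(standard_layout<repeated_specifier>(),
              "an intervening specifier with the same access is allowed");
static_assert(!standard_layout<mixed_access>(), "mixed access control is not");
static_assert(!standard_layout<data_derived>(),
              "data members in both base and derived are not allowed");
```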
Added text is shown in green and underlined. Deleted text is shown in red with strikethrough.
Commentary is shown in italics. Commentary is not part of the proposed WP changes.
Since issue 538 is currently in review status, changes to clause 9 paragraph 4 are shown relative to 538's proposed wording.
The following table lists all uses of POD, and related topics, in the current working paper, with proposed changes. Because the change to clause 9, paragraph 4, is critical to understanding the other changes, it is presented first.
Working Paper Text
9 ¶4 Classes [class] A structure is a class defined with the class-key A trivial-class is a class that:
A standard-layout-class is a class that:
[footnote] This ensures that two subobjects that have the same class type and that belong to the same most-derived object are not allocated at the same address ([expr.eq]).
A standard-layout-struct is a standard-layout class defined with the class-key struct or the class-key class. A standard-layout-union is a standard-layout class defined with the class-key union.
[Note: Standard-layout classes are useful for communicating with code written in other programming languages. The layout is specified in 9.2. -- end note]
A POD-struct is
[Example:
  struct N { // neither trivial nor standard-layout
    int i;
    int j;
    virtual ~N();
  };
  struct T { // trivial but not standard-layout
    int i;
  private:
    int j;
  };
  struct SL { // standard-layout but not trivial
    int i;
    int j;
    ~SL();
  };
  struct POD { // both trivial and standard-layout
    int i;
    int j;
  };
-- end example]
|||||||||
1.8 ¶5
The C++ object model [intro.object]
Unless it is a bit-field (9.6), a most derived object shall have a non-zero
size and shall occupy one or more bytes of storage. Base class subobjects
may have zero size. An object of |
|||||||||
3.6.2 ¶1 Initialization of non-local objects
[basic.start.init]
Objects with static storage duration (3.7.1) shall be zero-initialized (8.5)
before any other initialization takes place. A reference with static storage
duration and an object of
|
|||||||||
3.8 ¶2 Object Lifetime [basic.life] [ Note: the lifetime of an array object or of an object of
|
|||||||||
3.8 ¶5 Object Lifetime [basic.life] Before the lifetime
of an object has started but after the storage which the object will occupy
has been allocated39) or, after the lifetime of an object has ended and
before the storage which the object occupied is reused or released, any
pointer that refers to the storage location where the object will be or was
located may be used but only in limited ways. Such a pointer refers to
allocated storage (3.7.3.2), and using the pointer as if the pointer were of
type void*, is well-defined. Such a pointer may be dereferenced but the
resulting lvalue may only be used in limited ways, as described below. If
the object will be or was of a class type with a non-trivial destructor, and
the pointer is used as the operand of a delete-expression, the
program has undefined behavior. If the object will be or was of a
— the pointer is used to access a non-static data member or call a non-static member function of the object, or
— the pointer is implicitly converted (4.10) to a pointer to a base class type, or
— the pointer is used as the operand of a static_cast (5.2.9) (except when the conversion is to void*, or to void* and subsequently to char*, or unsigned char*)
— the pointer is used as the operand of a dynamic_cast (5.2.7).
|||||||||
3.8 ¶6 Object Lifetime [basic.life] Similarly, before the
lifetime of an object has started but after the storage which the object
will occupy has been allocated or, after the lifetime of an object has ended
and before the storage which the object occupied is reused or released, any
lvalue which refers to the original object may be used but only in limited
ways. Such an lvalue refers to allocated storage (3.7.3.2), and using the
properties of the lvalue which do not depend on its value is well-defined.
If an lvalue-to-rvalue conversion (4.1) is applied to such an lvalue, the
program has undefined behavior; if the original object will be or was of a
— the lvalue is used to access a non-static data member or call a non-static member function of the object, or
— the lvalue is implicitly converted (4.10) to a reference to a base class type, or
— the lvalue is used as the operand of a static_cast (5.2.9) (except when the conversion is ultimately to cv char& or cv unsigned char&), or
— the lvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
|||||||||
3.9 ¶2 Types [basic.types] For any object (other than a base-class subobject) of
|
|||||||||
3.9 ¶3 Types [basic.types] For any |
|||||||||
3.9 ¶4 Types [basic.types] The object representation of an object of type T is the
sequence of N unsigned char objects taken up by the object of type T, where
N equals sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For |
|||||||||
3.9 ¶10 Types [basic.types] Arithmetic types (3.9.1), enumeration types, pointer types, and pointer to member types (3.9.2), and cv-qualified versions of these types (3.9.3) are collectively called scalar types. Scalar types, POD-struct types, POD-union types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called POD types. Scalar types, trivial-class types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called trivial types. Scalar types, standard-layout-class types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types. |
|||||||||
3.9 ¶11 Types [basic.types] If two types T1 and T2 are the
same type, then T1 and T2 are layout-compatible types. [ Note:
Layout-compatible enumerations are described in 7.2. Layout-compatible
|
|||||||||
5.2 ¶7 Postfix expressions [expr.post] When there is no
parameter for a given argument, the argument is passed in such a way that
the receiving function can obtain the value of the argument by invoking
va_arg (18.8). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and
function-to-pointer (4.3) standard conversions are performed on the argument
expression. After these conversions, if the argument does not have
arithmetic, enumeration, pointer, pointer to member, or class type, the
program is ill-formed. If the argument has a
|
|||||||||
5.3.4 ¶16 New [expr.new] A new-expression
that creates an object of type T initializes that object as follows: |
|||||||||
5.9 ¶7 Relational operators [expr.rel]
Pointers to objects or functions of the same type (after pointer
conversions) can be compared, with a result defined as follows:
|
|||||||||
5.19 ¶4 Constant expressions If N2235, Generalized Constant Expressions (Rev 5), or a successor, is to be applied to the working paper, apply it first and then change the following reworded 5.19p2 bullet as indicated:
Otherwise, apply the following change as indicated: An address
constant expression is a pointer to an lvalue designating an object of
static storage duration, a string literal (2.13.4), or a function. The
pointer shall be created explicitly, using the unary & operator, or
implicitly using a non-type template parameter of pointer type, or using an
expression of array (4.2) or function (4.3) type. The subscripting operator
[] and the class member access . and -> operators, the & and * unary
operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in
the creation of an address constant expression, but the value of an object
shall not be accessed by the use of these operators. If the subscripting
operator is used, one of its operands shall be an integral constant
expression. An expression that designates the address of a subobject of a
|
|||||||||
5.19 ¶5 Constant expressions [expr.const] A reference
constant expression is an lvalue designating an object of static storage
duration, a non-type template parameter of reference type, or a function.
The subscripting operator [], the class member access . and -> operators,
the & and * unary operators, and reference casts (except those invoking
user-defined conversion functions (12.3.2) and except dynamic_casts (5.2.7))
can be used in the creation of a reference constant expression, but the
value of an object shall not be accessed by the use of these operators. If
the subscripting operator is used, one of its operands shall be an integral
constant expression. An lvalue expression that designates a member or base
class of a |
|||||||||
6.7 ¶3 Declaration statement [stmt.dcl] It is possible
to transfer into a block, but not in a way that bypasses declarations with
initialization. A program that jumps82) from a point where a local variable
with automatic storage duration is not in scope to a point where it is in
scope is ill-formed unless the variable has |
|||||||||
6.8 ¶4 Ambiguity resolution [stmt.ambig] The
zero-initialization (8.5) of all local objects with static storage duration
(3.7.1) is performed before any other initialization takes place. A local
object of |
|||||||||
8.5 ¶5 Initializers [dcl.init] To zero-initialize an object of type T means:
To default-initialize an object of type T means:
To value-initialize an object of type T means:
|
|||||||||
8.5 ¶9 Initializers [dcl.init] If no initializer is
specified for an object, and the object is of (possibly cv-qualified)
|
|||||||||
8.5 ¶14 Initializers [dcl.init] When an aggregate with
static storage duration is initialized with a brace-enclosed
initializer-list, if all the member initializer expressions are constant
expressions, and the aggregate is a |
|||||||||
8.5.1 ¶1 Aggregates [dcl.init.aggr]
An
aggregate
is an array or a class (clause 9) with
| |||||||||
9.2 ¶12 Class members [class.mem]
Nonstatic data members of a (non-union) class
|
|||||||||
9.2 ¶15-18 Class members [class.mem]
15 Two 16 Two 17 If a 18 A pointer to a |
|||||||||
9.5 ¶1 Unions [class.union] In a union, at most one of the
data members can be active at any time, that is, the value of at most one of
the data members can be stored in a union at any time. [ Note: one special
guarantee is made in order to simplify the use of unions: If a
|
|||||||||
11.1 ¶3 Access Specifiers [class.access.spec]
The effect of
access control on the order of allocation of data members is described in [class.mem]. |
|||||||||
12.6.2 ¶4 Initializing bases and members
[class.base.init] If a
given non-static data member or base class is not named by a mem-initializer-id
(including the case where there is no mem-initializer-list because the
constructor has no ctor-initializer), then |
|||||||||
12.7 ¶1 Construction and destruction
[class.cdtor]
For an object of
  struct X { int i; };
  struct Y : X { Y(); }; // non-trivial
  struct A { int a; };
  struct B : public A { int j; Y y; }; // non-trivial
  extern B bobj;
  B* pb = &bobj; // OK
  int* p1 = &bobj.a; // undefined, refers to base class member
  int* p2 = &bobj.y.i; // undefined, refers to member’s member
  A* pa = &bobj; // undefined, upcast to a base class type
  B bobj; // definition of bobj
  extern X xobj;
  int* p3 = &xobj.i; //OK, X is a
|||||||||
17.1.3 character container type [defns.character.container] a class or a
type used to represent a character (17.1.2). It is used for one of the
template parameters of the string and iostream class templates. A character
container class shall be a POD (3.9) type.
|
|||||||||
18.1 ¶4 Types [support.types] The macro offsetof(type,
member-designator) accepts a restricted set of type arguments in this
International Standard. If type is not a |
|||||||||
20.4 type traits
To 20.4.2, Header <type_traits> synopsis [lib.meta.type.synop], type properties, add:
  template <class T> struct is_trivial;
  template <class T> struct is_standard_layout;
To 20.4.5.3 Type properties [lib.meta.unary.prop], Type Property Predicates table, add:
|
|||||||||
21 ¶1 Strings library [strings] This clause describes
components for manipulating sequences of “characters,” where characters may
be of any POD (3.9) type. In this clause such types are called char-like
types, and objects of char-like types are called char-like objects or simply
“characters.”
|
|||||||||
25.4 ¶4 C library algorithms [alg.c.library] The function
signature: |
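The four classes from the example in 9 ¶4 can be checked against the two traits proposed for 20.4. This is a sketch assuming a library that provides is_trivial and is_standard_layout as in C++11. The destructors are given empty inline bodies here so the sketch links, and is_pod_like is a hypothetical helper that combines the two traits.

```cpp
#include <cassert>
#include <type_traits>

struct N {             // neither trivial nor standard-layout
    int i;
    int j;
    virtual ~N() {}
};
struct T {             // trivial but not standard-layout
    int i;
private:
    int j;
};
struct SL {            // standard-layout but not trivial
    int i;
    int j;
    ~SL() {}
};
struct POD {           // both trivial and standard-layout
    int i;
    int j;
};

// Hypothetical helper: a POD is a type that is both trivial and standard-layout.
template <class U>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<U>::value &&
                                   std::is_standard_layout<U>::value> {};

static_assert(!std::is_trivial<N>::value && !std::is_standard_layout<N>::value, "N");
static_assert(std::is_trivial<T>::value && !std::is_standard_layout<T>::value, "T");
static_assert(!std::is_trivial<SL>::value && std::is_standard_layout<SL>::value, "SL");
static_assert(is_pod_like<POD>::value, "POD");
```

Classes such as T and SL show the decomposition at work: each keeps one set of guarantees without having to meet the other.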
For classes with constructors other than the default constructor, there must be a way to tell the compiler to generate a trivial default constructor or treat a user-supplied default constructor with no effects as being trivial.
In N2210, Defaulted and Deleted Functions, Lawrence Crowl proposes explicit syntax to request the compiler supply trivial special member functions, particularly the default constructor. This would have the considerable advantage that it allows programs to express intent directly rather than relying on an apparently useless definition to tell the compiler of intent that a default constructor be trivial.
If N2210 or equivalent is not accepted, some other way of marking a default constructor as trivial must be specified, such as changing 12.1, Constructors:
A default constructor is trivial if it is implicitly-declared or if defined in the class definition and having no effects, and if:
— its class has no virtual functions (10.3) and no virtual base classes (10.1), and
— all the direct base classes of its class have trivial default constructors, and
— for all the non-static data members of its class that are of class type (or array thereof), each such class has a trivial
default constructor.
Some committee members would like to allow one base class with non-static data members.
If concepts are accepted, add Trivial and StandardLayout concepts, and change the POD concept as needed.
Consider changes or additions to make std::pair a POD, such as a special template overload.
Howard Hinnant suggests that we can give std::tuple the desired POD semantics without a special template overload, since we would not be breaking existing code.
The proposed changes will cause some existing non-POD's to become POD's. This may result in less optimization being performed. The problem can be eliminated by making the class non-POD again, for example, by adding a user-defined do-nothing destructor.
Adding a user-defined do-nothing destructor to existing code to leave POD-ness unchanged is simple enough that it could be done programmatically. If a compiler vendor felt this was a serious concern for their platform and user-base, they might wish to provide such a program. Alternately, compilers may wish to issue warnings during a transition period if the new rules change a non-POD into a POD.
Acceptance of Lawrence Crowl's N2210, Defaulted and Deleted Functions, or some equivalent proposal, will reduce the likelihood of existing non-POD's becoming POD's. Assuming such a proposal is accepted, the only cases where a non-POD in existing code will change to a POD are if all of the following conditions are met:
Allowing standard-layout classes to have base classes forces compilers to implement the empty base optimization for standard-layout classes, and this could break a compiler's application binary interface (ABI). See 9.2/18 above.
This is believed not to be a concern for modern compilers, except possibly in the case of multiple inheritance. Since multiple inheritance is not central to this proposal, allowing standard-layout classes or their bases to use multiple inheritance will be eliminated from the proposal if it proves contentious.
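The layout consequences discussed here can be sketched as follows, assuming a C++11 or later compiler. The names empty_base, derived and the two check functions are hypothetical. S and X are the classes from the subobject example in the revision history. The first check relies on the 9.2 guarantee that a pointer to a standard-layout struct, suitably converted, points to its initial member. The second relies on the footnoted rule that two subobjects of the same class type in one most-derived object do not share an address.

```cpp
#include <cassert>
#include <type_traits>

// A standard-layout class with an empty base class.
struct empty_base {};
struct derived : empty_base {
    int first;
    int second;
};
static_assert(std::is_standard_layout<derived>::value,
              "an empty base does not prevent standard layout");

// The base class is empty, so the first data member sits at the
// address of the object itself.
bool first_member_at_object_address() {
    derived d;
    d.first = 1;
    d.second = 2;
    return static_cast<void*>(&d) == static_cast<void*>(&d.first);
}

// The subobject example: base class and first member have the same type.
struct S {};
struct X : S { S s; };

// The base-class subobject and the member subobject must be distinct.
bool same_type_subobjects_have_distinct_addresses() {
    X x;
    S* base = &x;       // implicit conversion to the base-class subobject
    S* member = &x.s;
    assert(sizeof(X) >= sizeof(S));
    return base != member;
}
```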
N2210, Defaulted and Deleted Functions, Lawrence Crowl. This proposal provides syntax to explicitly mark a user-declared default constructor declaration as a request for a compiler-generated trivial default constructor. Since this POD's Revisited proposal requires the ability to do just that, acceptance of Lawrence's proposal or equivalent is assumed. Note that with such explicit syntax, the impact of the POD's Revisited proposal on existing code is markedly reduced.
N2215, Initializer lists (Rev. 3), B. Stroustrup, G. Dos Reis. The authors of the Initializer lists proposal and this POD's Revisited proposal are committed to working together to ensure the two proposals stay in sync. It does not appear that the two proposals currently modify the same working paper text, so no difficulties are anticipated.
Core issue 538, Definition and usage of structure, POD-struct, POD-union, and POD class. This issue, currently in review status, clarifies POD related terminology throughout the working paper. Since it makes changes to the same text modified by this proposal, care must be taken to ensure the two proposals do not diverge.
Revision 3
Revision 2 - N2172
struct S {}; struct X: S { S s; };
Revision 1 - N2102
Initial version - N2062
Initial Version - Matt Austern, Greg Colvin, Alisdair Meredith, and Clark Nelson provided helpful comments during preparation of this proposal. Our cat Jane woke me up in the middle of the night, provoking this proposal as an alternative to counting sheep (or cats).
Revision 1 - Greg Colvin and Lawrence Crowl provided legitimately
non-copyable use cases. Alberto Ganesh Barbati pointed out that the proposed
resolution should be relative to the 538 proposed resolution. Martin Sebor
pointed out the need for clarification of 11.1, p3. The EWG and CWG in Portland
reviewed a draft of revision 1 and made many helpful comments and suggestions.
Clark Nelson is facilitating progress through Core. A suggestion was made that
trivial types be renamed inert POD's, or IPOD's. Mike Miller suggested that a pod_cast operation be provided to ensure interoperability between POD's and IPOD's.
Revision 2 - Lawrence Crowl identified overly restrictive requirements in standard-layout classes. Daveed Vandevoorde provided the nasty subobject example noted in the revision history and Lawrence Crowl pointed out that the subobject problem could cross several levels of inheritance. Alisdair Meredith pointed out that default initialization now has an additional special case. Alisdair also originated the idea of tagging a class as trivial, standard-layout, or POD to allow diagnostics. Daniel Krügler asked questions resolved by adding a footnote to 8.5 paragraph 14.
Revision 3 - Lawrence Crowl's N2210 was the motivation for revision 3.
N2210 Defaulted and Deleted Functions, Lawrence Crowl, www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2210.html
N1824 Extending Aggregate Initialization, Alisdair Meredith, www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1824.htm
Core issue 538, Definition and usage of structure, POD-struct, POD-union, and POD class, www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#538
Core issue 568, Definition of POD is too strict, www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#568