Doc. no. WG21/N2062=06-0132
Date: 2006-09-06
Project: Programming Language C++
Reply to: Beman Dawes <bdawes@acm.org>
Introduction
Features and benefits of POD types
Motivating examples
std::pair example
Endian example
Two structs example
Atomic example
Coupling between POD's and aggregates
Rationale for changes
Proposed changes to the Working Paper
POD in the Standard, with changes
Impact on existing code
Interactions with other proposals
Acknowledgements
References
This paper proposes resolutions for Core Issue 568, Definition of POD is too strict, submitted by Matt Austern.
The current working paper has several problems with POD's:
Features and benefits of POD types
Features | Benefits |
Byte copyable guarantees [3.9 ¶2-3, basic.types] |
|
C layout-compatible guarantees, including byte copyable, and [9.2 ¶14-17, class.mem] |
|
C code compatibility guarantees, including byte copyable, C layout compatible, and numerous initialization rules |
|
Static initialization guarantees [3.6.2, basic.start.init] |
|
Various rules for non-POD's |
|
Motivating examples
std::pair example
Matt Austern provided this example:
If a program has two arrays A1 and A2 of type std::pair<int,int>, then it
is natural to expect that memcpy(A2,A1,sizeof(A2)) would be safe.
Programmers have trouble imagining any implementation in which a byte-for-byte
copy of std::pair<int,int> wouldn't do the right thing.
Unfortunately, that's not what the language standard says. It says that
byte-for-byte copies are guaranteed to work only for PODs. std::pair<T,U>
isn't a class aggregate, since it has a user-defined constructor, and that means
it also isn't a POD.
std::pair has a user-defined constructor essentially for syntactic
reasons: because in some cases it looks nicer to write "std::pair<int,int> p(1,2);"
than to write "std::pair<int,int> p = {1,2};". It
seems a shame that this syntactic change caused the loss of the important
semantic property of PODness. It's especially a shame because it means something
formally doesn't work when on all real-world implementations it actually does
work. It also encourages programmers to rely on undefined behavior, which is
something the standard should not encourage.
With the proposed wording, the example pair becomes a POD, solving the issue.
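A minimal sketch of the point, assuming a C++11 or later compiler, where the byte-copyable idea survives roughly as "trivially copyable" (std::is_trivially_copyable). To avoid depending on how any particular library implements std::pair, it uses a hypothetical int_pair whose constructors exist only for notation; copy_pairs is likewise a hypothetical helper.

```cpp
#include <cassert>      // assert, for checking the results of a copy
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable (C++11)

// Hypothetical stand-in for std::pair<int,int>: the constructors exist only
// for notational convenience, which is the situation described above.
struct int_pair {
    int first;
    int second;
    int_pair() : first(0), second(0) {}
    int_pair(int a, int b) : first(a), second(b) {}
};

// User-provided constructors do not affect copying, so the class is
// trivially copyable and a byte-for-byte copy is well defined.
static_assert(std::is_trivially_copyable<int_pair>::value,
              "int_pair should be byte-copyable");

// memcpy(A2, A1, sizeof(A2)) from the example, for arrays of three pairs.
// (Hypothetical helper, not part of any library.)
void copy_pairs(int_pair (&A2)[3], const int_pair (&A1)[3]) {
    std::memcpy(A2, A1, sizeof(A2));
}
```

After copy_pairs(A2, A1), each element of A2 holds the same first and second values as the corresponding element of A1; the static_assert documents the property that the copy relies on.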
Endian example
Beman Dawes provided this example:
Here is an example of something in development for Boost, based on classes used in industrial applications for many years. The fact that it is a template partial specialization isn't material to this discussion and can be ignored.
template <typename T, std::size_t n_bits>
class endian< big, T, n_bits, unaligned >
  : cover_operators< endian< big, T, n_bits >, T >
{
    BOOST_STATIC_ASSERT( (n_bits/8)*8 == n_bits );  // whole bytes only
  public:
    typedef T value_type;
    endian() {}
    // store i as n_bits/8 big-endian bytes
    endian(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); }
    // reassemble the stored bytes into a T
    operator T() const { return detail::load_big_endian<T, n_bits/8>(bytes); }
  private:
    char bytes[n_bits/8];  // the only data member
};
But it isn't a POD, so it can't be used in unions at all, and some uses, such as binary I/O, rely on undefined behavior. Since the whole rationale for having endian is to do binary I/O, forcing the user to rely on undefined behavior is unfortunate, to say the least.
Here is what would have to be done to make it a POD:
Remove the constructors. But that makes initialization painful, so boosters are proposing to add an ugly and unintuitive static init function, and an operator= from the value_type. Those are partial workarounds, but not really what the designers, Beman Dawes and Darin Adler, want.
Make the data member public. But this encourages a poor design practice.
Eliminate the base class. But the only way to do that without the highly error-prone duplication of the functions provided by the base class is to introduce a lengthy macro. Enough said.
In other words, making this class a POD under current language rules would do serious damage to interface ease-of-use and to code quality, and would encourage poor design practices. Yet the only data member of the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.
With the proposed wording, the class becomes a POD, solving all the issues.
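The following is a much-simplified, hypothetical analogue of the endian class, not the Boost code: big_u32 stores a 32-bit value as four big-endian bytes in a private array, with user-provided constructors and a conversion operator. It assumes C++11 and uses its type traits to check that such a class is trivially copyable and standard-layout, the two properties binary I/O relies on. It uses unsigned char rather than char to keep the byte arithmetic free of sign issues, and to_bytes is a hypothetical stand-in for a binary write.

```cpp
#include <cassert>      // assert, for checking the results
#include <cstdint>      // std::uint32_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable, std::is_standard_layout

// Hypothetical, simplified analogue of endian<big, uint32_t, 32, unaligned>.
class big_u32 {
public:
    big_u32() {}
    big_u32(std::uint32_t v) {
        for (int i = 3; i >= 0; --i) {          // least significant byte goes last
            bytes_[i] = static_cast<unsigned char>(v & 0xFFu);
            v >>= 8;
        }
    }
    operator std::uint32_t() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)             // most significant byte comes first
            v = (v << 8) | bytes_[i];
        return v;
    }
private:
    unsigned char bytes_[4];                    // the only (private) data member
};

static_assert(std::is_trivially_copyable<big_u32>::value, "byte-copyable");
static_assert(std::is_standard_layout<big_u32>::value, "C-compatible layout");
static_assert(sizeof(big_u32) == 4, "no padding expected on mainstream implementations");

// Hypothetical stand-in for binary output: copy the object representation.
void to_bytes(const big_u32& x, unsigned char (&out)[sizeof(big_u32)]) {
    std::memcpy(out, &x, sizeof x);
}
```

Applied to big_u32(0x12345678u), to_bytes yields the bytes 0x12, 0x34, 0x56, 0x78 in that order. C++11 still does not classify big_u32 as a POD, because its default constructor is user-provided; the copy and layout guarantees come from the trivially copyable and standard-layout categories instead.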
Two structs example
Matt Austern provided this example in Core DR 568:
It’s silly for the standard to make layout and memcpy guarantees for this class:
struct A { int n; };
but not for this one:
struct B { int n; B(int n_) : n(n_) { } };
With either A or B, it ought to be possible to save an array of those objects to disk with a single call to Unix’s write(2) system call or the equivalent. At present the standard says that it’s legal for A but not B, and there isn’t any good reason for that distinction.
With the proposed wording, B also becomes a POD, so the same guarantees apply to both classes.
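A sketch of the "single call" save, assuming C++11: it repeats A and B from the example (with B's constructor parameter given the type int) and uses an in-memory byte buffer in place of write(2). The save and load templates are hypothetical helpers; their static_assert uses std::is_trivially_copyable as the stand-in for the property the standard would need to guarantee.

```cpp
#include <cassert>      // assert
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable
#include <vector>       // std::vector

struct A { int n; };
struct B { int n; B(int n_) : n(n_) { } };

static_assert(std::is_trivially_copyable<A>::value, "A is byte-copyable");
static_assert(std::is_trivially_copyable<B>::value, "B is byte-copyable too");

// Hypothetical stand-in for one write(2) call: dump a whole array at once.
template <class T, std::size_t N>
std::vector<unsigned char> save(const T (&arr)[N]) {
    static_assert(std::is_trivially_copyable<T>::value, "needs a byte-copyable type");
    std::vector<unsigned char> bytes(sizeof arr);
    std::memcpy(bytes.data(), arr, sizeof arr);
    return bytes;
}

// Hypothetical stand-in for the matching read(2): restore the array.
template <class T, std::size_t N>
void load(T (&arr)[N], const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value, "needs a byte-copyable type");
    assert(bytes.size() == sizeof arr);
    std::memcpy(arr, bytes.data(), sizeof arr);
}
```

With the buffer replaced by a file descriptor, save corresponds to one write(2) of sizeof arr bytes, and the same code works unchanged for arrays of A and arrays of B.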
Atomic example
Lawrence Crowl provided this example.
Consider a class providing atomic operations. Among other requirements, it should:
For best C++ coding practice, the data should be private. But that would make the class a non-POD under current rules. Under the proposed rules, it is allowable for the data members to be private, as long as all are private.
Under both the current and proposed rules, there doesn't seem to be any way to make a POD non-copyable.
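The access-control rule can be illustrated with type traits, assuming C++11, whose standard-layout classes adopt essentially the same "all non-static data members have the same access control" condition. counter_cell and mixed_cell are hypothetical, and counter_cell is deliberately not atomic; it only shows that making every data member private need not cost the layout and copy guarantees.

```cpp
#include <cassert>      // assert
#include <type_traits>  // std::is_standard_layout, std::is_trivially_copyable

// Hypothetical cell in the spirit of the atomic example. It is NOT atomic;
// it only illustrates the access-control rule. All data members are private.
class counter_cell {
public:
    counter_cell() : value_(0), writes_(0) {}
    int load() const { return value_; }
    void store(int v) { value_ = v; ++writes_; }
    unsigned writes() const { return writes_; }
private:
    int value_;
    unsigned writes_;
};

// Hypothetical class mixing public and private data members, which the rule excludes.
struct mixed_cell {
    int visible;
    int hidden() const { return hidden_; }
private:
    int hidden_;
};

static_assert(std::is_standard_layout<counter_cell>::value, "all private: OK");
static_assert(std::is_trivially_copyable<counter_cell>::value, "byte-copyable");
static_assert(!std::is_standard_layout<mixed_cell>::value, "mixed access: excluded");
```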
Coupling between POD's and aggregates
POD's provide object representation guarantees, layout-compatibility guarantees, memory contiguity guarantees, and memory copy-ability guarantees for fairly simple types, yet leave compilers much latitude in such matters for more complicated types.
Aggregates provide well-defined initialization from initializer-clauses.
The two concepts are at most tangential, if not completely orthogonal. Thus to define POD in terms of aggregates creates an unnecessary and confusing dependency. It makes otherwise straightforward changes to the Standard POD and aggregate sections much more difficult because of the need to analyze a potential change for impact on both POD's and aggregates. The coupling is confusing to users, causing them to make mistaken assumptions about POD's. The coupling may be part of the reason even committee members cannot accurately remember the full rules for POD-ness.
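A small sketch of the orthogonality, assuming C++11 traits: labelled (hypothetical) is an aggregate but cannot be byte-copied, while point (hypothetical) is not an aggregate because of its constructor, yet is trivially copyable and standard-layout.

```cpp
#include <cassert>      // assert
#include <string>       // std::string
#include <type_traits>  // std::is_trivially_copyable, std::is_standard_layout

// Hypothetical aggregate (no constructors, all members public) that is not
// byte-copyable, because std::string has a non-trivial copy constructor and destructor.
struct labelled {
    std::string label;
    int value;
};

// Hypothetical non-aggregate (it has a user-provided constructor) that is
// nevertheless trivially copyable and standard-layout.
struct point {
    int x, y;
    point(int x_, int y_) : x(x_), y(y_) {}
};

static_assert(!std::is_trivially_copyable<labelled>::value, "aggregate, not byte-copyable");
static_assert(std::is_trivially_copyable<point>::value, "byte-copyable, not an aggregate");
static_assert(std::is_standard_layout<point>::value, "C-compatible layout");
```

Brace initialization such as labelled l = { "answer", 42 }; compiles only because labelled is an aggregate, whereas point must be initialized through its constructor.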
Rationale for changes
The proposed changes separate the byte-copyability requirement from the larger set of POD requirements, and the definition of POD no longer depends on the definition of aggregate. Instead, the additional POD requirements are tailored to the needs of POD's. Because these requirements are somewhat less restrictive than the requirements for aggregates, the effect is to make POD's more broadly useful and to solve the problems identified in the Introduction and Motivating examples.
Changes are not proposed that would allow POD's to be non-copyable. There was no apparent way to provide syntax for this without more complexity than is justified by a need judged to be fairly minor.
Changes are not proposed that would allow POD's to have base classes with non-static data members. There was no apparent way to allow these cases without putting undue restrictions on how compilers lay out base class data in relation to derived class data.
Proposed changes to the Working Paper
Added text is shown in green and underlined. Deleted text is shown in red with strikethrough.
Change 9 [class] paragraph 4 as indicated:
A structure is a class defined with the class-key struct; its members and base classes (clause 10) are public by default (clause 11). A union is a class defined with the class-key union; its members are public by default and it holds only one data member at a time (9.5). [ Note: aggregates of class type are described in 8.5.1. —end note ] A byte-copyable-class is a class that has a trivial copy constructor (12.8), a trivial copy assignment operator (13.5.3, 12.8), and a trivial destructor (12.4). [Note: Among other requirements, that precludes virtual functions, virtual bases, and members or bases with non-trivial copy constructors, copy assignments, or destructors. --end note] A POD-struct is [deleted: an aggregate] [added: a byte-copyable] class that:
— has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and
— all non-static data members have the same access control (clause 11), and
— has no non-POD base classes, and no base classes with data members.
Similarly, a POD-union is [deleted: an aggregate] [added: a] union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. A POD class is a class that is either a POD-struct or a POD-union. [Note: virtual functions and base classes are prohibited in unions (9.5). -- end note.]
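The proposed definition of a byte-copyable class maps directly onto C++11 type traits. The trait below, is_byte_copyable, is a hypothetical sketch that checks exactly the three listed conditions; it does not check the remaining POD-struct conditions (member types, access control, base classes). The test classes are also hypothetical.

```cpp
#include <string>       // std::string
#include <type_traits>  // std::integral_constant and the is_trivially_* traits (C++11)

// Hypothetical trait mirroring the proposed "byte-copyable class":
// trivial copy constructor, trivial copy assignment, trivial destructor.
template <class T>
struct is_byte_copyable
    : std::integral_constant<bool,
          std::is_trivially_copy_constructible<T>::value &&
          std::is_trivially_copy_assignable<T>::value &&
          std::is_trivially_destructible<T>::value> {};

struct with_ctor    { int n; with_ctor(int n_) : n(n_) {} };   // byte-copyable
struct with_virtual { virtual ~with_virtual() {} int n; };     // not: virtual function
struct with_string  { std::string s; };                        // not: member with non-trivial copy
struct with_dtor    { int n; ~with_dtor() {} };                // not: user-provided destructor

static_assert(is_byte_copyable<with_ctor>::value, "a constructor does not matter");
static_assert(!is_byte_copyable<with_virtual>::value, "virtual functions are precluded");
static_assert(!is_byte_copyable<with_string>::value, "non-trivial members are precluded");
static_assert(!is_byte_copyable<with_dtor>::value, "a non-trivial destructor is precluded");
```

C++11's std::is_trivially_copyable is the closest standard counterpart; it additionally takes move operations into account.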
Change other WP text as indicated in the POD in the Standard table below.
POD in the Standard, with changes
The following table lists uses of POD in the current working paper, with proposed changes.
Working Paper Text | Proposal |
1.8 ¶5 [intro.object] An object of POD5) type (3.9) shall occupy contiguous bytes of storage. |
No change |
3.6.2 ¶1 Initialization of non-local objects Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static storage duration and an object of POD type with static storage duration can be initialized with a constant expression (5.19); this is called constant initialization. Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. |
No change |
3.8 ¶2 Object Lifetime [ Note: the lifetime of an array object or of an object of POD type (3.9) starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array or object occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. —end note ] |
No change |
3.8 ¶5 Object Lifetime Restrictions on pointers to partially constructed non-POD types. |
No change |
3.8 ¶6 Object Lifetime Restrictions on l-values of partially constructed non-POD types. |
No change |
3.9 ¶2 Types For any object (other than a base-class subobject) of
|
Change as indicated |
3.9 ¶3 Types For any |
Change as indicated |
3.9 ¶4 Types The object representation of an object of type T is the
sequence of N unsigned char objects taken up by the object of type T, where
N equals sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For |
Change as indicated |
3.9 ¶10 Types Arithmetic types (3.9.1), enumeration types, pointer types, and pointer to member types (3.9.2), and cv-qualified versions of these types (3.9.3) are collectively called scalar types. Scalar types, POD-struct types, POD-union types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called POD types. |
No change |
3.9 ¶11 Types If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. [ Note: Layout-compatible enumerations are described in 7.2. Layout-compatible POD-structs and POD-unions are described in 9.2. —end note ] |
No change |
5.2 ¶7 Postfix expressions When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.8). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions. |
No change |
5.3.4 ¶16 New A new-expression
that creates an object of type T initializes that object as follows: |
No change |
5.19 ¶4 Constant expressions An address constant expression is a pointer to an lvalue designating an object of static storage duration, a string literal (2.13.4), or a function. The pointer shall be created explicitly, using the unary & operator, or implicitly using a non-type template parameter of pointer type, or using an expression of array (4.2) or function (4.3) type. The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An expression that designates the address of a subobject of a non-POD class object (clause 9) is not an address constant expression (12.7). Function calls shall not be used in an address constant expression, even if the function is inline and has a reference return type. |
No change |
5.19 ¶5 Constant expressions A reference constant expression is an lvalue designating an object of static storage duration, a non-type template parameter of reference type, or a function. The subscripting operator [], the class member access . and -> operators, the & and * unary operators, and reference casts (except those invoking user-defined conversion functions (12.3.2) and except dynamic_casts (5.2.7)) can be used in the creation of a reference constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An lvalue expression that designates a member or base class of a non-POD class object (clause 9) is not a reference constant expression (12.7). Function calls shall not be used in a reference constant expression, even if the function is inline and has a reference return type. |
No change |
6.7 ¶3 Declaration statement It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps82) from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5). |
No change |
6.8 ¶4 Ambiguity resolution The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of POD type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined. |
No change |
8.5 ¶5 Initializers
|
No change |
8.5 ¶9 Initializers If no initializer is specified for an object, and the object is of (possibly cv-qualified) non-POD class type (or array thereof), the object shall be default-initialized; if the object is of const-qualified type, the underlying class type shall have a user-declared default constructor. Otherwise, if no initializer is specified for a non-static object, the object and its subobjects, if any, have an indeterminate initial value97); if the object or any of its subobjects are of const-qualified type, the program is ill-formed. |
No change |
8.5 ¶14 Initializers When an aggregate with static storage duration is initialized with a brace-enclosed initializer-list, if all the member initializer expressions are constant expressions, and the aggregate is a POD type, the initialization shall be done during the static phase of initialization (3.6.2); otherwise, it is unspecified whether the initialization of members with constant expressions takes place during the static phase or during the dynamic phase of initialization. |
No change |
9 ¶4 Classes [class] A structure is a class defined with the class-key struct; its members and base classes (clause 10) are public by default (clause 11). A union is a class defined with the class-key union; its members are public by default and it holds only one data member at a time (9.5). [ Note: aggregates of class type are described in 8.5.1. —end note ] A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. A POD class is a class that is either a POD-struct or a POD-union. |
See proposed change above. |
9.2 ¶15-18 Class members [class.mem] 15 Two POD-struct (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in order) have layout-compatible types (3.9). 16 Two POD-union (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in any order) have layout-compatible types (3.9). 17 If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members. 18 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ] |
No change |
9.5 ¶1 Unions In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. [ Note: one special guarantee is made in order to simplify the use of unions: If a POD-union contains several POD-structs that share a common initial sequence (9.2), and if an object of this POD-union type contains one of the POD-structs, it is permitted to inspect the common initial sequence of any of POD-struct members; see 9.2. —end note ] The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct. A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a class with a non-trivial default constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects. If a union contains a static data member, or a member of reference type, the program is ill-formed. |
No change |
12.6.2 ¶4 Initializing bases and members If a
given non-static data member or base class is not named by a mem-initializer-id
(including the case where there is no mem-initializer-list because the
constructor has no ctor-initializer), then |
No change |
12.7 ¶1 Construction and destruction For an object of non-POD class type (clause 9) before the constructor begins execution and after the destructor finishes execution, referring to any non-static member or base class of the object results in undefined behavior. [ Example:
|
Change as indicated |
17.1.3 character container type a class or a type used to represent a character (17.1.2). It is used for one of the template parameters of the string and iostream class templates. A character container class shall be a POD (3.9) type. |
No change. Users expect characters involved in I/O to be C-layout-compatible, and thus POD types. |
18.1 ¶4 Types The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a POD structure or a POD union (clause 9), the results are undefined.189) The expression offsetof(type, member-designator) is never type-dependent (14.6.2.2) and it is value-dependent (14.6.2.3) if and only if type is dependent. The result of applying the offsetof macro to a field that is a static data member or a function member is undefined. |
No change |
20.4 Type traits The specification of is_pod has many uses of POD. Most of those uses will clearly remain unchanged. Uses in other type traits need to be reviewed. | TODO |
21 ¶1 Strings library This clause describes components for manipulating sequences of “characters,” where characters may be of any POD (3.9) type. In this clause such types are called char-like types, and objects of char-like types are called char-like objects or simply “characters.” |
No change. Users expect c_str() and data() to return pointers to C-layout-compatible, and thus POD, types. |
25.4 ¶4 C library algorithms The function
signature: |
Change as indicated |
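The 9.2 and 18.1 guarantees listed in the table are what a class like B from the motivating examples would gain. As a sketch assuming C++11, where these guarantees are tied to standard-layout classes, the hypothetical sample class below has a constructor yet supports offsetof and the reinterpret_cast to its initial member; initial_member is a hypothetical helper.

```cpp
#include <cassert>      // assert
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// Hypothetical class with a constructor: not a POD under the current rules,
// but a standard-layout (and trivially copyable) class in C++11 terms.
struct sample {
    int id;
    double weight;
    sample(int i, double w) : id(i), weight(w) {}
};

static_assert(std::is_standard_layout<sample>::value, "offsetof is supported");

// A pointer to the object, suitably converted with reinterpret_cast,
// points to its initial member (the 9.2 guarantee quoted in the table).
int* initial_member(sample& s) {
    return reinterpret_cast<int*>(&s);
}
```

For sample s(7, 2.5), offsetof(sample, id) is 0 and initial_member(s) compares equal to &s.id.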
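Impact on existing code
The proposed changes will cause some existing non-POD's to become POD's. This may result in less optimization being performed, since compilers have less latitude with POD's than with non-POD's. The problem can be eliminated by adding a user-defined do-nothing destructor, which keeps the class from being byte-copyable and therefore from being a POD.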
Adding a user-defined do-nothing destructor to existing code to leave POD-ness unchanged is simple enough that it could be done programmatically. If a compiler vendor felt this was a serious concern for their user base, they might wish to provide such a program. Alternatively, compilers may wish to issue warnings during a transition period if the new rules change a non-POD into a POD. |
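A sketch of the suggested workaround, assuming C++11, with std::is_trivially_copyable standing in for the proposal's byte-copyable: legacy and legacy_kept are hypothetical, and the only difference between them is the do-nothing destructor.

```cpp
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical class that is a non-POD today only because of its constructor;
// under the proposed rules it becomes byte-copyable (and a POD).
struct legacy {
    int n;
    legacy(int n_) : n(n_) {}
};

// The same class with a user-provided do-nothing destructor. The destructor
// is no longer trivial, so the class is not byte-copyable and stays a non-POD.
struct legacy_kept {
    int n;
    legacy_kept(int n_) : n(n_) {}
    ~legacy_kept() {}
};

static_assert(std::is_trivially_copyable<legacy>::value, "becomes byte-copyable");
static_assert(!std::is_trivially_copyable<legacy_kept>::value, "stays non-byte-copyable");
```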
Interactions with other proposals
See N1824, Extending Aggregate Initialization. Whichever proposal is accepted first, the other will have to be reviewed, and possibly revised, accordingly.
Acknowledgements
Matt Austern, Greg Colvin, Alisdair Meredith, and Clark Nelson provided helpful comments during preparation of this proposal. Our cat Jane woke me up in the middle of the night, provoking this proposal as an alternative to counting sheep (or cats).
References
N1824, Extending Aggregate Initialization, Alisdair Meredith, www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1824.htm
Core Issue 568, Definition of POD is too strict, Matt Austern, www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#568