Document number: ISO/IEC/JTC1/SC22/WG21/P1605R0

Date: 2020-02-21

Reply-to: Rene Rivera, grafikrobot@gmail.com

Audience: WG21

1. Abstract

This paper proposes adding a core language facility to control the layout order of class data members without otherwise impacting class definitions.

2. Changes

2.1. R0 (Initial)

Initial content without standard wording.

3. Introduction

In many domains where C++ thrives there is tension between the desire for optimal data and code and the desire for clear definitions. The member layout rules in C++ make it hard to satisfy both. Developers must choose between well grouped, relevant information in class definitions with suboptimal memory use, or optimal memory use with incoherent class definitions. This proposal aims to add a facility that reconciles both goals of class design. It hopes to achieve these goals:

  • Fine-grained member layout control.

  • Keep member access control for access control only, not as a layout tool.

To solve the problem we first need to see the problem. We can use a working example to work through what we need to address. Let's start with a common use case: a class with flags and values that enable or disable different features of it (highly abstracted):

class A
{
    public:

    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].

    bool feature_a_enabled;
    unsigned int feature_a_value;

    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].

    bool feature_b_enabled;
    unsigned int feature_b_value;
};

We can also look at a likely function that uses the members to do a calculation. In this case we’ll look at a minimal function:

unsigned int A_q(A const & a)
{
    return a.feature_a_value + a.feature_b_value;
}

As expected the resulting assembly code for this is minimal and efficient:

A_q(A const&):
  mov eax, DWORD PTR [rdi+4]
  add eax, DWORD PTR [rdi+12]
  ret

The problem comes in when we look at the data size of the class:

sizeof(A) == 16
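
The 16 bytes can be traced to alignment padding after each bool. The following is a minimal sketch, not part of the proposal, that inspects the same member sequence with offsetof. It assumes a platform where unsigned int is 4 bytes with 4-byte alignment (as on x86-64); the helper name padding_in_A is hypothetical.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Same members, in the same order, as class A above (comments omitted).
struct A
{
    bool feature_a_enabled;
    unsigned int feature_a_value;
    bool feature_b_enabled;
    unsigned int feature_b_value;
};

// Bytes of A that hold no member data, i.e. alignment padding.
// (Hypothetical helper, for illustration only.)
constexpr std::size_t padding_in_A()
{
    return sizeof(A) - (2 * sizeof(bool) + 2 * sizeof(unsigned int));
}

// Declaration order fixes address order, so the offsets strictly increase.
static_assert(offsetof(A, feature_a_enabled) == 0,
              "the first member sits at offset 0");
static_assert(offsetof(A, feature_a_value) < offsetof(A, feature_b_enabled),
              "feature_a_value precedes feature_b_enabled");
static_assert(offsetof(A, feature_b_enabled) < offsetof(A, feature_b_value),
              "feature_b_enabled precedes feature_b_value");

// Each unsigned int must start on a suitably aligned offset, which is
// what forces the padding after each bool.
static_assert(offsetof(A, feature_a_value) % alignof(unsigned int) == 0,
              "feature_a_value is aligned");
static_assert(offsetof(A, feature_b_value) % alignof(unsigned int) == 0,
              "feature_b_value is aligned");
```

Under the stated assumption, padding_in_A() accounts for 6 of the 16 bytes: 3 bytes after each bool.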

Sixteen bytes might seem small. But if we are dealing with a large number of items the data size becomes a serious consideration:

sizeof(A[1024*1024]) == 16777216

When faced with that result and the constraints of some systems, say an embedded system with only 64MiB of total RAM, having one data structure take up 1/4 of your memory is not acceptable. Programmers have used various techniques to ameliorate such waste, the most common being to rearrange members to minimize the alignment padding. For our example we can place the bool members last and together to allow all the members to be packed:

class A
{
    public:

    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].

    unsigned int feature_a_value;

    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].

    unsigned int feature_b_value;

    bool feature_a_enabled;
    bool feature_b_enabled;
};

With that arrangement we still have minimal, optimal access for our prototypical A_q function:

A_q(A const&):
  mov eax, DWORD PTR [rdi]
  add eax, DWORD PTR [rdi+4]
  ret

But more importantly we’ve reduced the overall size of the structure.

Before reordering:

sizeof(A) == 16
sizeof(A[1024*1024]) == 16777216

After reordering:

sizeof(A) == 12
sizeof(A[1024*1024]) == 12582912
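
The drop from 16 to 12 bytes follows from simple alignment arithmetic. As a sketch only, the function below models the common rule of placing each member at the next suitably aligned offset and rounding the total up to the strictest alignment. The names Member, align_up and modeled_size are hypothetical, the sizes and alignments (1 for bool, 4 for unsigned int) are assumptions matching x86-64, and real ABIs have further rules this model ignores. It requires C++14 for the loop in a constexpr function.

```cpp
#include <cstddef>  // std::size_t

// Size and alignment of one member, in bytes.
struct Member
{
    std::size_t size;
    std::size_t align;
};

// Rounds offset up to the next multiple of align.
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
    return (offset + align - 1) / align * align;
}

// Models the size of a class whose members are laid out in the given
// order, each at the next aligned offset, with tail padding added so
// that the size is a multiple of the strictest member alignment.
template <std::size_t N>
constexpr std::size_t modeled_size(const Member (&members)[N])
{
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (std::size_t i = 0; i < N; ++i)
    {
        offset = align_up(offset, members[i].align) + members[i].size;
        if (members[i].align > max_align)
            max_align = members[i].align;
    }
    return align_up(offset, max_align);
}

// bool, unsigned int, bool, unsigned int: the documented order.
constexpr Member declared_order[] = {{1, 1}, {4, 4}, {1, 1}, {4, 4}};

// unsigned int, unsigned int, bool, bool: the hand-reordered version.
constexpr Member reordered[] = {{4, 4}, {4, 4}, {1, 1}, {1, 1}};

static_assert(modeled_size(declared_order) == 16, "matches sizeof(A) == 16");
static_assert(modeled_size(reordered) == 12, "matches sizeof(A) == 12");
```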

Reordering members by hand only works when we restrict ourselves to following the member ordering rule of [class.mem] [1], which is not always possible and almost always undesirable. We can go further in our space saving though. We can turn the data members into bit-fields since we know the numerical limits of all our data members. With some trial and error, and some knowledge of the compiler and system we are supporting, we can optimize not just the size but also minimize the impact this has on the generated code. We can therefore do the following:

class A
{
    public:

    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].


    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].

    unsigned int feature_b_value:16;
    unsigned int feature_a_value:14;
    bool feature_a_enabled:1;
    bool feature_b_enabled:1;
};

The generated code for A_q becomes:

A_q(A const&):
  movzx eax, WORD PTR [rdi+2]
  movzx edx, WORD PTR [rdi]
  and eax, 16383
  add eax, edx
  ret

Even though we’ve added some instructions to deal with the bit field we are still rather optimal in our access. What do we gain in terms of size?

Original:

sizeof(A) == 16
sizeof(A[1024*1024]) == 16777216

Bit-field version:

sizeof(A) == 4
sizeof(A[1024*1024]) == 4194304

This is now in the palatable range. We are tracking 1,048,576 objects in 4MiB. This, of course, comes at a price. We have now entirely detached the documentation in the class from the members it refers to. Even worse, the members appear randomly arranged to the casual observer. This is ripe for causing all kinds of future maintenance problems for whoever is trying to understand this code.
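
The bit-field widths used above follow from the documented value ranges: [0,15000] fits in 14 bits and [0,60000] fits in 16 bits, which together with the two 1-bit flags gives exactly 32 bits. The sketch below checks that arithmetic. The helper names bits_needed and max_values_round_trip are hypothetical, and the sketch deliberately makes no claim about sizeof, since bit-field allocation is implementation-defined.

```cpp
#include <cstdint>  // std::uint64_t

// Smallest number of bits able to hold every value in [0, max_value].
// (Hypothetical helper, for illustration only.)
constexpr unsigned bits_needed(std::uint64_t max_value)
{
    unsigned bits = 1;
    while (bits < 64 && (max_value >> bits) != 0)
        ++bits;
    return bits;
}

static_assert(bits_needed(15000) == 14, "feature_a_value needs 14 bits");
static_assert(bits_needed(60000) == 16, "feature_b_value needs 16 bits");
static_assert(bits_needed(15000) + bits_needed(60000) + 1 + 1 == 32,
              "all four members total 32 bits");

// The bit-field members from the example above (comments omitted).
struct A
{
    unsigned int feature_b_value : 16;
    unsigned int feature_a_value : 14;
    bool feature_a_enabled : 1;
    bool feature_b_enabled : 1;
};

// Confirms that the largest documented values survive being stored.
constexpr bool max_values_round_trip()
{
    A a{};
    a.feature_a_value = 15000;
    a.feature_b_value = 60000;
    return a.feature_a_value == 15000 && a.feature_b_value == 60000;
}

static_assert(max_values_round_trip(), "documented maxima fit their fields");
```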

There has been at least one previous attempt to solve this problem. P1112 [2] proposes a class level attribute to classify the kind of member layout to apply.

4. Proposal

We propose adding an optional layout: labeled section to class definitions wherein one lists the order of members, already declared, in the class. The layout: section would:

  • List the names of any members one wishes to specify the order of.

  • Members listed would come first in the class member layout.

  • Members not listed would follow with the existing layout rules.

  • Member layout order does not alter initialization.
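
The last point matters because, in C++ today, non-static data members are initialized in declaration order, so reordering declarations by hand to save space also changes the initialization order. The sketch below demonstrates that existing behavior; all names in it (Step, DocumentedOrder, SizeOrder and the two functions) are hypothetical and only serve the illustration. It requires C++14.

```cpp
// Records the position at which a member was initialized.
struct Step
{
    int position;
    constexpr explicit Step(int& counter) : position(counter++) {}
};

// Members declared in the order that reads best.
struct DocumentedOrder
{
    Step enabled;
    Step value;
    constexpr explicit DocumentedOrder(int& counter)
        : enabled(counter), value(counter) {}
};

// The same members, reordered by hand to reduce padding.
struct SizeOrder
{
    Step value;
    Step enabled;
    constexpr explicit SizeOrder(int& counter)
        : value(counter), enabled(counter) {}
};

// Position at which 'enabled' is initialized in each class.
constexpr int enabled_position_documented()
{
    int counter = 0;
    DocumentedOrder object(counter);
    return object.enabled.position;
}

constexpr int enabled_position_size_order()
{
    int counter = 0;
    SizeOrder object(counter);
    return object.enabled.position;
}

static_assert(enabled_position_documented() == 0,
              "'enabled' is initialized first when declared first");
static_assert(enabled_position_size_order() == 1,
              "hand reordering moved the initialization of 'enabled'");
```

Under the proposal, the declarations would stay in the documented order and only the layout: section would change, so the initialization order would remain that of DocumentedOrder.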

To continue with our example from above, the new class declaration using this feature could be:

class A
{
    public:

    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].

    bool feature_a_enabled:1;
    unsigned int feature_a_value:14;

    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].

    bool feature_b_enabled:1;
    unsigned int feature_b_value:16;

    layout:

    // This layout gives us a 4 byte
    // structure size with minimal
    // additional access instructions.
    // When compiling with x86-64
    // gcc 9.2 with -O3.

    feature_b_value;
    feature_a_value;
    feature_a_enabled;
    feature_b_enabled;
};

The generated code and the sizes are the same as for the hand-ordered bit-field version:

A_q(A const&):
  movzx eax, WORD PTR [rdi+2]
  movzx edx, WORD PTR [rdi]
  and eax, 16383
  add eax, edx
  ret

sizeof(A) == 4
sizeof(A[1024*1024]) == 4194304

As we can see, the effect of optimizing the layout for the application use case is preserved, but the drawbacks of the optimization are removed. The layout: section now contains the members of the class in the order we require them to be in.

Features of this proposal:

  • Puts the control of member layout where it matters, in the user’s hands, where the particular tradeoffs of memory vs. performance can be made.

  • The layout can’t be ignored by the compiler and hence provides ABI stability across compiler versions and possibly across compilers.

  • Coexists with the existing #pragma pack compiler feature as it makes the ordering orthogonal to the packing.

  • Doesn’t override alignment and addressing requirements, for example those from the use of alignas, again because the ordering control is orthogonal.

  • Simple, minimal, and clear syntax makes it easy to understand intent and effect.

  • Allows control of individual bit-field members within the same syntax as other members.

  • The layout declarations can be easily documented to provide rationales for users of the class.

  • Does not, by definition, force an ordering on all members and hence allows for minimal targeted optimizations.

5. Proposed Wording

To be determined.

6. Design Decisions and Considerations

6.1. Why not have algorithmic layouts?

P1112 [2] proposes a mechanism to have "smart" algorithmic layout control. It proposes adding a [[layout(?)]] attribute to the class to select from a set of algorithmic layouts such as smallest, declorder, cacheline, and so on. A key problem with an algorithmic approach is the increased risk of ABI violations, as pointed out in P1112 [2]. Dealing with the C++ ABI is difficult enough as it is. We would like to avoid adding to the uncertainty and complexity of the C++ ABI.

6.2. Should alignment control be allowed in the layout declarations?

We need to consider whether other member data specifications that affect the size of the class should be consolidated, i.e. allowed, in the layout declaration section. For example, alignas could be allowed as follows:

class A
{
    public:

    bool a_f;
    int a;
    bool b_f;
    int b;

    layout:

    a;
    alignas(16) b;
    a_f;
    b_f;
};
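
The layout: syntax cannot be compiled today, but the effect such a declaration might have can be approximated by declaring the members by hand in the listed order. The sketch below is an assumption about the intended meaning, not proposed semantics, and the class name A_hand_ordered is hypothetical.

```cpp
#include <cstddef>  // offsetof

// Members declared directly in the order given by the layout: section,
// with alignas applied to b as in the example above.
struct A_hand_ordered
{
    int a;
    alignas(16) int b;
    bool a_f;
    bool b_f;
};

// The alignment request on one member raises the alignment of the class.
static_assert(alignof(A_hand_ordered) >= 16,
              "class alignment is at least that of b");

// b must start on a 16-byte boundary, which inserts padding after a.
static_assert(offsetof(A_hand_ordered, b) % 16 == 0,
              "b is 16-byte aligned within the class");

// The size is rounded up so that array elements stay aligned.
static_assert(sizeof(A_hand_ordered) % alignof(A_hand_ordered) == 0,
              "size is a multiple of the class alignment");
```

On a platform with a 4-byte int this typically places b at offset 16 and gives a 32-byte class, which shows how much a single alignas in the layout section could cost.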

6.3. Should layout order be reflect-ed?

There needs to be some thought and consideration given to how layout ordering can or should be available through reflection.

6.4. Should we additionally specify padding options?

It would be interesting to consider adding syntax to formalize both general and specific inter-member padding. I.e. it could be of benefit to extend this proposal to formalize and improve the common #pragma pack feature.
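
For comparison, this is how packing is commonly controlled today. The sketch relies on #pragma pack, a non-standard extension accepted by GCC, Clang and MSVC, and the class names are hypothetical. It shows that packing removes padding without changing member order, which is the sense in which the two controls are orthogonal.

```cpp
#include <cstddef>  // offsetof

// Default packing: padding follows each bool.
struct DeclaredOrder
{
    bool feature_a_enabled;
    unsigned int feature_a_value;
    bool feature_b_enabled;
    unsigned int feature_b_value;
};

// Same members and order, packed to 1-byte alignment.
// Assumption: the compiler supports the #pragma pack extension.
#pragma pack(push, 1)
struct DeclaredOrderPacked
{
    bool feature_a_enabled;
    unsigned int feature_a_value;
    bool feature_b_enabled;
    unsigned int feature_b_value;
};
#pragma pack(pop)

// Packing removes all padding: the size is the sum of the member sizes.
static_assert(sizeof(DeclaredOrderPacked)
                  == 2 * sizeof(bool) + 2 * sizeof(unsigned int),
              "no padding remains");
static_assert(sizeof(DeclaredOrderPacked) <= sizeof(DeclaredOrder),
              "packing never grows the class");

// Packing leaves the member order untouched.
static_assert(offsetof(DeclaredOrderPacked, feature_a_value) == sizeof(bool),
              "feature_a_value directly follows feature_a_enabled");
static_assert(offsetof(DeclaredOrderPacked, feature_a_value)
                  < offsetof(DeclaredOrderPacked, feature_b_enabled),
              "declaration order is preserved");
```

The packed members may be misaligned, so access can be slower or unsupported on some targets; that tradeoff is separate from the ordering question.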

7. Acknowledgements

Thanks to Michael Shoell who, through varied lunch conversations, provided the impetus for this proposal.


[1] [class.mem] p19: Non-static data members of a (non-union) class with the same access control and non-zero size ([intro.object]) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified. Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions ([class.virtual]) and virtual base classes ([class.mi]).
[2] P1112: Language support for class layout control, Pal Balog