Types and sizes

Jens Gustedt, INRIA, France

org: ISO/IEC JCT1/SC22/WG14 document: N2838
target: IS 9899:2023, TS 6010:2023 version: 1
date: 2021-10-11 license: CC BY

Revision history

Paper number Title Changes
N2820 Types and sizes Initial version
N2838 Types and sizes v1 Supersedes N2820
don’t use the term declarator
insist that cast expressions may lead to UB

Introduction

In 6.5.2 of the C standard, sizes are primarily defined for types. Although this is not stated explicitly, it is commonly assumed that such sizes cannot exceed SIZE_MAX. Sizes of objects (in contrast to storage instances as of TS 6010) are only a deduced property that is in most cases defined through the type that an object has. This proposal attempts to make this approach consistent throughout the standard, and to reduce the number of marginal cases where the interpretation of sizeof is different between C and C++.

For the latter, observe that code as in

int const n = 23;
int const m = 24;
double A[n][m];
int j = 0;
printf("sizeof is %zu\n", sizeof A[++j]);

is interpreted much differently in C and C++ since both languages have quite different definitions for integer constant expressions. For both the declaration of A is valid, but for C++ it is an array with compile-time fixed lengths, whereas for C it is a VLA. Therefore the sizeof operator may evaluate the increment operator in C, but not in C++.

Changes

Change in 6.2.5 p25

A complete type shall have a size that is less than or equal to SIZE_MAX. A type has known constant size if the type is not incomplete it is complete and is not a variable length array type.

Rationale:

In view of our recent discussion about overflow in calloc, we did some search into existing implementations and asked on the WG14 reflector and some other media if there could be objects defined that with a size that exceeds SIZE_MAX. It turned out that all interpret the current standard that huge objects make the behavior implicitly undefined. This change here makes that explicit. In particular, it makes it explicit that even requesting such a huge object has no defined behavior.

Impact:

This change is only a clarification and should not have an impact on existing programs or implementations.

Change in 6.5.3.4 p2

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a type name representing a variable length array type (possibly enclosed in a nested set of typeof declarators), the operand is evaluated;FNT1 otherwise, the operand is not evaluated and the behavior is undefined if the operand is an expression for which the size is not known before the sizeof operator is met. The result is an integer constant if the type has a known constant size, see 6.2.5..

FNT1 The evaluation of such declarators is specified in 6.7.6.2, below

Rationale:

This changes the specification when the operand of sizeof is not an lvalue of VLA type. Such an object cannot be declared as a compound literal, so it must either

sizeof (*(int (*)[some_function()])p)

For the first, the type of such an object and by that its size is already fixed before the execution reads the sizeof expression. The second already implicitly defines the type and can better be expressed with a type name variant

sizeof (int[some_function()])

that avoids the evaluation and dereferencing of the variable p, both operations potentially having undefined behavior.

Therefore we think that the evaluation should be restricted to the case where the operand is a type name for which the evaluation makes a difference in the type, namely to the case of the type name of a VLA that either occurs directly in the sizeof or with intermediate typeof.

Impact:

Consider the the following code snippet

double A[n][m];
int j = 0;
int i = 0;
printf("sizeof is %zu\n", sizeof A[++j][++i]);
printf("now j is %d, i is %d\n", j, i);
printf("sizeof is %zu\n", sizeof A[++j]);
printf("now j is %d\n", j);                 // what is printed here?

where n and m are supposed to be some integer variables with values greater than 2. In a non-representative survey among C and C++ enthusiasts we asked for the value that is printed for j. Their answers were distributed as follows:

j %
0 37.8
1 7.8
2 54.3

So over 90% had a wrong answer. For us this clearly shows the need to reform that marginal property of the sizeof operator.

With the current standard, the first two increment operators are not evaluated because the expression has type double (so i stays 0, for example) but would increment the last because the expression has type double[*], a VLA. As a consequence, j is 1 for the printf in line 7. With the proposed change, none of the increment operators would be evaluated.

By that, this proposal changes the status of some sizeof expressions, namely those that concern lvalues of VLA type.

This change also has an impact on the new typeof feature (N2724) which follows the same strategie as currently for sizeof. If this change here is agreed, we will coordinate for a similar change for typeof with the author.

Questions to WG14

  1. Shall we integrate Change 3.1 into C23?

  2. Shall we integrate Change 3.2 into C23?

  3. Shall we integrate the same changes into TS 6010?

Acknowledgements

We like to thank Joseph Myers for his feedback on the first version of this paper. The idea for Change 3.2 came from Tomasz Stanislawski.