sizeof expressions
2023-12-10
document number | date | comment |
---|---|---|
n3187 | 202312 | this paper, original proposal |
CC BY, see https://creativecommons.org/licenses/by/4.0
sizeof expressions?
With C23, the behavior of certain expressions that involve variable array lengths and `sizeof` has changed from prescribed behavior to implementation-defined behavior.
The problem is an innocent-looking phrase in 6.5.3.4 p2 (for `sizeof`):
… If the type of the operand is a variable length array type, the operand is evaluated; …
which of course states a necessity: to determine the size of a VLA, an evaluation of the hidden state is mandatory. But the generality of that phrase implies a probably unwanted “side effect”: expressions that use casts can be of VLA type and still use other operators with side effects outside the VLA type expression itself. An example is a `sizeof` expression of the form `sizeof(*(double(*)[n])++p)`.
Here, `p` is thought to be a pointer to some complete type, and `n` is an identifier with integer type and value; the pointer value is cast to a pointer to array type, which is then indirected to an array type. So the type of the operand is `double[n]`, which may or may not be a constant length or variable length array.
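The difference between the two cases can be observed at run time. The following is a minimal sketch, assuming an implementation that supports VLAs; the function names, the buffer, and the enumeration constant `fixed_len` are hypothetical and only serve the illustration. It applies `sizeof` to an operand of type `double[n]` that also increments a pointer, once with a run-time length and once with an enumeration constant as length.

```c
#include <assert.h>
#include <stddef.h>

enum { fixed_len = 4 };   /* hypothetical constant length */

/* Operand type is double[n] with n a run-time value: a VLA type, so the
   operand is evaluated and ++p takes effect. Returns how far p moved. */
ptrdiff_t advance_with_variable_length(size_t n, size_t *size_out) {
    assert(n > 0 && n < 8);
    double buf[8] = { 0 };
    double *p = buf;
    *size_out = sizeof(*(double(*)[n])++p);
    return p - buf;
}

/* Operand type is double[fixed_len]: a constant length array type, so the
   operand is not evaluated and p stays where it was. */
ptrdiff_t advance_with_constant_length(size_t *size_out) {
    double buf[8] = { 0 };
    double *p = buf;
    *size_out = sizeof(*(double(*)[fixed_len])++p);
    return p - buf;
}
```

For a length such as a `const`-qualified variable, which of the two behaviors applies is exactly what C23 leaves implementation-defined; compilers may also warn about the side effect in the unevaluated operand.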
For C17, the only clear case where the array could be a constant length array was when `n` designated an enumeration constant. For `n` the name of a variable, it was perhaps debatable whether the standard allowed it to be a constant expression, but the common interpretation excluded this, and WG14 in fact voted to clarify it in this sense, see below. So the above `sizeof` expression was commonly treated as not being a constant expression. Thus the operand had to be evaluated and the side effect of the increment operator had to be applied, too.
GCC has followed this line of argument since early versions; in particular, it has `const`-qualified variables that it accepts as a “constant expression” (for example for the initialization of `static` variables) but not as an “integer constant expression”.
Since the specification didn’t seem clear enough, during the elaboration of C23 we first (in Jan 2022) constrained that property for `n` by making it explicit that a variable (and other similar forms of constant expressions of integer type) cannot constitute an integer constant expression, and that in consequence `++p` had to be evaluated. Later (Jun 2023), we reversed that position and decided to leave the possibility of having such a variable as an integer constant expression to the discretion of the implementation. Namely, in C23 we now have the following:
14 An implementation may accept other forms of constant expressions; however, it is implementation-defined whether they are an integer constant expression.
which changes the circumstances in which `++p` is evaluated from mandatory to implementation-defined.
So C23 introduced a normative change here that now makes programs containing such code non-portable: the standard does not propose any feature test for this property.
We think that this new portability problem is merely artificial and should not be blamed on the concept of VLA. It is only there because we did not sufficiently constrain the expressions permitted in `sizeof` operands. As a relatively simple solution we propose to ban operators with side effects from all array length expressions and from all `sizeof` expressions where the operand has array type. Choosing all array types here is deliberate, because we want a constraint to trigger for all code, regardless of which side of the implementation-defined barrier the particular development platform is found on.
With this choice of not having side effects, some implementations will still treat a specific array as constant length and others as variable length. Nevertheless, other than for some very special `_Generic` expressions, this distinction alone does not have many implications for real-life programs. For the variable length case there could be, in principle, an lvalue conversion of some internal state, but that state does not change during the lifetime of the type. So in reality, applications will not notice any difference whichever way their implementation goes.
Another option would be to ban these operators in all `sizeof` expressions, not only for array types. The proposed text could easily be changed. Nevertheless we thought that this might have too big an impact on code that has nothing to do with VLA or array length expressions in general, and should thus be avoided at first.
By going through the standard with this problem in mind we found another inconsistency that we think should be improved at the same time, namely the status of parameters of variably modified type. For these there are basically two sets of rules:

- A function that is only declared and not defined has all array length information removed for the prototype and replaced by the token `*`. Thereby no bounds check on the caller side of such a function is imposed on implementations.
- A function definition must have concrete values for all array lengths, such that within the body of the function all index calculations may be performed consistently.

These specifications leave a gap (or two, depending on the point of view), namely what the prototype of a function that only has a definition is when it is called.
void foo(size_t n, char buf[n][n]) {
...
// what type for foo for recursive call?
}
void bar(void) {
// what type for foo on call?
}
Here, it is clear that for any call one array dimension is rewritten to a pointer. But what about the second? In principle, inside the function itself it could be as if we had a declaration that somehow kept the other dimension tied to the variable `n`. But that would mean that `foo` is somehow an external symbol with a VM type in its type description.
There is currently no text for this: the text for declaration-only functions clearly doesn’t apply to either case, and the text for definitions would imply a VM type for the parameters, but one that would live in file scope.
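The two sets of rules can be seen side by side in a small sketch. It assumes an implementation that supports variably modified parameter types; the function `count_nonzero` is hypothetical. The declaration uses `*` for every length, as the rules for function prototype scope prescribe, while the definition names concrete lengths that are used for the index calculations in the body.

```c
#include <stddef.h>

/* Declaration only: no length information survives in the prototype. */
size_t count_nonzero(size_t n, char buf[*][*]);

/* Definition: inside the body both lengths are known to be n, even though
   the outermost dimension is rewritten to a pointer for the parameter. */
size_t count_nonzero(size_t n, char buf[n][n]) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            count += (buf[i][j] != 0);
    return count;
}
```

A caller sees the same function type in both cases; only the body of the definition can rely on the inner length being `n`.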
Currently the standard badly mixes terminology when talking about arrays, namely it talks of array size, length and bound when referring basically to the same concept. This is quite confusing, but could be fixed relatively easily. We propose to consistently talk about array length when referring to the number of elements of an array.
For arrays this is consistent with the use in variable length array.
Clause 6.7.6.2 is very confusing, because for example it talks about a “size” which refers to the assignment-expression in the syntax, but which is exactly not that, a size, but the number of array elements. It also leaves some evaluations of length expressions and the visibility of their side effects to the mercy of the implementation, and the visible type of a function definition after the function body ends is obscure.
The current specification reads:
6.7.6.2 Array declarators
Constraints
1 In addition to optional type qualifiers and the keyword `static`, the `[` and `]` may delimit an expression or a `*`. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword `static` shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
2 …
3 …
4 If the size is not present, the array type is an incomplete type. If the size is `*` instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array size expression in one of those declarators¹⁷³⁾; such arrays are nonetheless complete types. If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)
5 If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by `*`; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a typeof or `sizeof` operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated. Where a size expression is part of the operand of an `alignof` operator, that expression is not evaluated.
We propose to add terminology to talk consistently of “array length” and to better distinguish arrays with variable length and constant length. Then we propose a normative change by clearly defining rules for the evaluation of array length expressions. Normative changes are marked like this, non-normative changes are not marked specially.
1 In addition to optional type qualifiers and the keyword `static`, the `[` and `]` may delimit an assignment expression or `*`. If they delimit an assignment expression, it shall have an integer type and it or any subexpression shall not use assignment, increment and decrement operators and shall not apply the indirection and array subscript operators to a pointer to `volatile`-qualified target type. If the assignment expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword `static` shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
2 …
3 …
4 If present, the assignment expression or the `*` punctuator is called the length of the declarator. If the length is not present, the array type is an incomplete type. If the length is `*` instead of being an assignment expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array length expression in one of those declarators¹⁷³⁾; such arrays are nonetheless complete types. If the length is an integer constant expression and the element type has a known constant size, the array type is a constant length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)
5 If the length is an assignment expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by `*`. Otherwise, each time the length is evaluated it shall have a value greater than zero and the evaluation shall have no side effects. Nevertheless, for a function `f` that has such a definition with variably modified types, the identifier `f` has a type as if only declared (and not defined) and the same replacement rules as for function prototype scope apply for the purpose of determining the type of `f`; this notwithstanding, within their visibility scope, function parameters with variably modified type have a known length as described previously. The length of each instance of a variable length array type is stored in a hidden state of the execution as if an object of `const`-qualified but not `volatile`-qualified integer type is declared in the same scope as the declared array type; it does not change during the lifetime of the array type. Where a length expression is part of the operand of an `alignof` operator, that expression is not evaluated.
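Code that currently places a side effect inside an array length expression can be adapted by hoisting the side effect into its own statement. The following is a minimal sketch of that rewrite, assuming an implementation that supports VLAs with automatic storage duration; the function `take_slots` and its counter parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reserves *counter scratch elements and bumps the counter afterwards.
   Returns the number of elements of the scratch array. */
size_t take_slots(size_t *counter) {
    /* A length with an increment, as in
           double scratch[(*counter)++];
       would violate the proposed constraint. */
    size_t len = *counter;   /* value computation only */
    ++*counter;              /* the side effect gets its own statement */
    assert(len > 0);         /* a length shall be greater than zero */
    double scratch[len];     /* length expression without side effects */
    return sizeof scratch / sizeof scratch[0];
}
```

With the length held in a separate object, the array declaration only performs a value computation, which is what the proposed wording requires.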
The new constraints can’t cover function calls, because their side effects (or lack thereof) are not directly visible. We propose to add a recommended practice after p6:
Recommended practice
6’ If an array length expression contains a function call, it is recommended that the called function is unsequenced such that under no circumstances side effects may occur. In addition, it is recommended that, as far as possible, array length expressions that produce side effects are diagnosed.
The changes proposed above have the disadvantage that they introduce new undefined behavior into the standard, in particular for side effects that would be hidden in function calls or that would be triggered by floating point computations. With C23 we do have the possibility to better describe what we expect and to place the prohibition completely into the constraint:
1 In addition to optional type qualifiers and the keyword `static`, the `[` and `]` may delimit an assignment expression or `*`. If they delimit an assignment expression, it shall have an integer type and it or any subexpression shall not use assignment, increment and decrement operators, shall not apply the indirection and array subscript operators to a pointer to `volatile`-qualified target type, shall not perform evaluation or arithmetic of floating point type, and shall not evaluate function call expressions unless the corresponding function pointer has a type that has the `[[unsequenced]]` attribute. If the assignment expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword `static` shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
2 …
3 …
4 If present, the assignment expression or the `*` punctuator is called the length of the declarator. If the length is not present, the array type is an incomplete type. If the length is `*` instead of being an assignment expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array length expression in one of those declarators¹⁷³⁾; such arrays are nonetheless complete types. If the length is an integer constant expression and the element type has a known constant size, the array type is a constant length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)
5 If the length is an assignment expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by `*`. Otherwise, each time the length is evaluated it shall have a value greater than zero. Nevertheless, for a function `f` that has such a definition with variably modified types, the identifier `f` has a type as if only declared (and not defined) and the same replacement rules as for function prototype scope apply for the purpose of determining the type of `f`; this notwithstanding, within their visibility scope, function parameters with variably modified type have a known length as described previously. The length of each instance of a variable length array type is stored in a hidden state of the execution as if an object of `const`-qualified but not `volatile`-qualified integer type is declared in the same scope as the declared array type; it does not change during the lifetime of the array type. Where a length expression is part of the operand of an `alignof` operator, that expression is not evaluated.
NOTE: The exclusion of operations in the constraints ensures that array length expressions will only perform value computations but never initiate side effects, see 5.1.2.3.
No recommended practice would be necessary for this variant.
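To illustrate the function call part of this variant, the following sketch uses a helper in an array length expression and marks it with the C23 `[[unsequenced]]` attribute where the compiler reports support for it. The names `LEN_PURE`, `padded_length` and `buffer_bytes` are hypothetical, and the fallback to an empty macro is an assumption made so that the sketch also compiles on implementations without the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use [[unsequenced]] only where the implementation announces it. */
#if defined(__has_c_attribute)
#  if __has_c_attribute(unsequenced)
#    define LEN_PURE [[unsequenced]]
#  endif
#endif
#ifndef LEN_PURE
#  define LEN_PURE /* attribute unavailable: expands to nothing */
#endif

/* Depends only on its argument and has no side effects: rounds n up to
   the next multiple of 8. */
size_t padded_length(size_t n) LEN_PURE {
    return (n + 7u) / 8u * 8u;
}

/* The length expression contains a call, which this variant of the
   constraint permits only for functions with the unsequenced attribute. */
size_t buffer_bytes(size_t n) {
    assert(n > 0);
    unsigned char buf[padded_length(n)];
    return sizeof buf;
}
```

On an implementation without the attribute the call would not satisfy the proposed constraint; the empty fallback only keeps the sketch compilable there.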
sizeof and alignof operators
Here, the current text reads:
6.5.3.4 The sizeof and alignof operators
Constraints
1 The `sizeof` operator shall not be applied to an expression that has function type or an incomplete type, to the parenthesized name of such a type, or to an expression that designates a bit-field member. The `alignof` operator shall not be applied to a function type or an incomplete type.
Semantics
2 The `sizeof` operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
This text has several problems, marked with strike through.
… `sizeof` expression by a number token with integer type is intended, here.
We propose to replace that text with the following; again, only normative changes are marked:
6.5.3.4 The sizeof and alignof operators
Constraints
1 The `sizeof` operator shall not be applied to an expression that has function type or an incomplete type, to a parenthesized type name, or to an expression that designates a bit-field member; if the type is an array type the operand shall not use assignment, increment and decrement operators and shall not apply the indirection and array subscript operators to a pointer to a `volatile`-qualified target type. The `alignof` operator shall not be applied to a function type or an incomplete type.
Semantics
2 The `sizeof` operator yields the size (in bytes) of its operand; it is determined from the type of the operand. If that type is a variable length array type the `sizeof` expression is said to be variable. In that case, the operand is evaluated and the `sizeof` expression is not an integer constant expression; the evaluation shall not produce side effects. Otherwise, the result is determined at translation time, the operand is not evaluated, and the `sizeof` expression is said to be constant and is an integer constant expression.
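The distinction between a constant and a variable `sizeof` can be sketched as follows, assuming an implementation that supports VLAs with automatic storage duration; the names `cols`, `fixed_bytes` and `row_bytes` are hypothetical. The constant case is used where an integer constant expression is required, here as the value of an enumeration constant, while the variable case is only known during execution.

```c
#include <assert.h>
#include <stddef.h>

enum { cols = 4 };

/* Constant sizeof: the operand has a constant length array type, so the
   expression is an integer constant expression. */
enum { fixed_bytes = sizeof(double[cols]) };

/* Variable sizeof: the operand has a variable length array type, so the
   result is determined during execution. */
size_t row_bytes(size_t n) {
    assert(n > 0);
    double row[n];
    return sizeof row;
}
```

Neither operand contains an operator with side effects, so both uses would remain valid under the proposed constraint.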
We also propose to add a recommended practice, here.
Recommended practice
If the operand of a `sizeof` expression has an array type and contains a function call, it is recommended that the function has the unsequenced property.
The confusion in terminology is also present in the example of p8, where a comment uses the term “execution time `sizeof`”, which has not been introduced. We propose to change the term to “variable `sizeof`”.
A similarly problematic denomination is also present in example 5 of 6.7.2.5 p10 for the typeof operators. A similar change should be applied there.
As above for array length, we may also have a variant that puts all restrictions into the constraint section.
6.5.3.4 The sizeof and alignof operators
Constraints
1 The `sizeof` operator shall not be applied to an expression that has function type or an incomplete type, to a parenthesized type name, or to an expression that designates a bit-field member; if the type is an array type the operand shall not use assignment, increment and decrement operators, shall not apply the indirection and array subscript operators to a pointer to a `volatile`-qualified target type, shall not perform evaluation or arithmetic of floating point type, and shall not evaluate function call expressions unless the corresponding function pointer has a type that has the `[[unsequenced]]` attribute. The `alignof` operator shall not be applied to a function type or an incomplete type.
Semantics
2 The `sizeof` operator yields the size (in bytes) of its operand; it is determined from the type of the operand. If that type is a variable length array type the `sizeof` expression is said to be variable. In that case, the operand is evaluated and the `sizeof` expression is not an integer constant expression. Otherwise, the result is determined at translation time, the operand is not evaluated, and the `sizeof` expression is said to be constant and is an integer constant expression.
As a consequence, we propose the following changes in terminology, to be applied throughout the document.
C23 | C2y |
---|---|
array bound | array length |
array size (expression) | array length (expression) |
non-variable length array | constant length array |
execution time sizeof | variable sizeof |
bounds-checking | length-checking |
Among others, this implies changes in Annex K (which becomes “Length-checking interfaces”) but not in Annex L where the term bound is used in a more generic way.