Chasing Ghosts I: constant expressions

Jens Gustedt, INRIA and ICube, France

2025-01-04

target

integration into IS ISO/IEC 9899:202y

document history

document number   date     comment
n3447             202501   Original proposal

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Motivation

The recent campaign for slaying daemons has revealed that some of the undefined behavior (UB) in the current C standard doesn’t even exist: some of the situations listed in J.2 that would in principle result in UB cannot be triggered at all. The reason is misformulations in the normative text that seem to indicate UB where in fact there are only constraint violations or unspecified behavior.

We say that a semantic non-constraint requirement is a ghost-UB if no conforming program in any execution may ever violate it.

The present paper deals with ghost-UB that is attributed to constant expressions, namely J.2 (50) to (53) in the current counting. The principal observation here is that the term “constant expression” is a syntax term and not a semantic term. So either an expression is a constant expression or it isn’t, and the difference between the two is not behavior (semantics) but syntax alone.

In fact, all uses of the term in the standard fall into one of two groups: either a constraint requires a constant expression, so that anything else is a constraint violation (such as for array designators in initializers), or the question of whether an expression is a constant expression distinguishes different categories (such as VLA and arrays of known constant size). In none of these cases does it make sense to speak of UB.
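The two groups can be illustrated with a minimal sketch. The names `N`, `fixed`, `fixed_len` and `vla_len` are hypothetical and chosen only for this illustration, and the sketch assumes an implementation that supports variable length arrays.

```c
#include <assert.h>   /* static_assert: macro in C11, keyword in C23 */
#include <stddef.h>

enum { N = 4 };       /* enumeration constant, hence a named constant */

/* N * 2 is an integer constant expression, so this declares an array
 * of known constant size. */
static int fixed[N * 2];

/* static_assert requires an integer constant expression. A nonconstant
 * operand here would be a constraint violation that is diagnosed at
 * translation time; no execution is involved. */
static_assert(sizeof fixed / sizeof fixed[0] == 8, "fixed has 8 elements");

size_t fixed_len(void) {
    return sizeof fixed / sizeof fixed[0];
}

/* n is not a constant expression, so the same declarator syntax now
 * declares a variable length array. The caller must pass n > 0. */
size_t vla_len(int n) {
    int vla[n];
    return sizeof vla / sizeof vla[0];
}
```

In both functions the program is well defined: the status of the size expression only selects which kind of array is declared.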

Additionally, the standard does not intend to leave constant expressions as a UB extension point: 6.6 p14 explicitly states that the use of extensions to the concept of constant expressions is implementation-defined.
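The example that the standard itself gives for this extension point, the declaration of `arr_or_vla`, can be wrapped into a small sketch. The function name `arr_or_vla_bytes` is hypothetical, and the sketch assumes an implementation that supports variable length arrays in case it does not treat the size expression as an extended integer constant expression.

```c
#include <stddef.h>

size_t arr_or_vla_bytes(void) {
    /* 1.0 is not the immediate operand of the cast (the unary + is in
     * between), so (int)+1.0 is not an integer constant expression as
     * defined by the standard. Whether an implementation accepts it as
     * an extended integer constant expression is implementation-defined;
     * accordingly this is either an array of known constant size or a
     * variable length array. */
    int arr_or_vla[(int)+1.0];
    return sizeof arr_or_vla;
}
```

Under either choice the array has one element and the function returns `sizeof(int)`; the difference lies only in the category of the array type.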

2 Approach

Once convinced that we have ghost-UB, the easiest way to deal with the situation would be to just remove the useless listings (50) to (53) in J.2. We think that this alone would not be very user-friendly and that users would still trip over the many “shall” that are confusingly applied in the text.

In fact, most of the text in 6.6 is even placed in the wrong category. The definitions made there are purely syntactical and convey no semantics beyond that. Thus we propose to reorder the text such that most of it appears under “Description”, and to leave in “Constraints” and “Semantics” only those entries that must remain there. These deal with cases where the evaluated value does not fit:

Interestingly, for the latter the implied UB has not even made it into J.2’s list yet, so we also add it there.

Additionally:

No normative change is intended by this paper.

3 Suggested Wording

New text is underlined green, removed text is struck-out red.

3.1 Clause 6.6, Constant expressions

We propose to reorder this clause completely and to replace most “shall” by plain factual description. This means that the following is a complete replacement of the corresponding section (but subclause 6.6.1 should remain as it is now).

6.6 Constant expressions
Syntax

constant-expression: conditional-expression

Description
2 The fact that a given conditional expression forms a constant expression is detected at translation time. In most of the cases the value of the constant expression is also determined at translation time, but values of address constants (or values that are derived from them) are possibly only determined during linking or program startup. An expression that evaluates to a constant is required in several contexts, the most general form appears in initializers for objects for which the value is determined at translation time or program startup such as objects with static storage duration or with the constexpr specifier.
3 A constant expression is a conditional expression that does not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is by definition an integer constant expression.115) Additionally, such a constant expression is, or evaluates to, a null pointer constant (6.3.3.3) or one of the categories that are described in this clause:
4 A compound literal with storage-class specifier constexpr is a compound literal constant, as is a postfix expression that applies the . member access operator to a compound literal constant of structure or union type, even recursively. A compound literal constant is a constant expression with the type and value of the unnamed object.
5 An identifier that is:
is a named constant, as is a postfix expression that applies the . member access operator to a named constant of structure or union type, even recursively. For enumeration and predefined constants, their value and type are defined in the respective clauses; for constexpr objects, such a named constant is a constant expression with the type and value of the declared object.
6 An integer constant expression116) has integer type and only has operands that are integer literals, named and compound literal constants of integer type, character literals, sizeof or _Lengthof expressions whose results are integer constant expressions, alignof expressions, and floating, named, or compound literal constants of arithmetic type that are the immediate operands of casts. Cast operators in an integer constant expression only convert arithmetic types to integer types, except as part of an operand to the typeof operators, sizeof operator, _Lengthof operator, or alignof operator.
7 An arithmetic constant expression has arithmetic type and only has operands that are integer literals, floating literals, named or compound literal constants of arithmetic type, character literals, sizeof and _Lengthof expressions whose results are integer constant expressions, and alignof expressions. Cast operators in an arithmetic constant expression only convert arithmetic types to arithmetic types, except as part of an operand to the typeof operators, sizeof operator, _Lengthof operator, or alignof operator.
8 An address constant is a null pointer,117) a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it is created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly using an expression of array or function type. The array-subscript [] and member-access -> operator, the address & and indirection * unary operators, and pointer casts can be used in the creation of an address constant, if the value of an object is not accessed by use of these operators.118)
9 A structure or union constant is a named constant or compound literal constant with structure or union type, respectively. Starting from a structure or union constant, the member-access . operator can be used to form a named constant or compound literal constant as described previously in this subclause; here, for a union constant, only the member that is initialized by the union constant’s initializer can be used.
10 An implementation may accept other forms of constant expressions, called extended constant expressions. It is implementation-defined whether extended constant expressions are usable in the same manner as the constant expressions defined in this document, including whether or not extended integer constant expressions are considered to be integer constant expressions.119)
Constraints
11 Each constant expression shall evaluate to a constant that is in the range of representable values for its type.
Semantics
12 If a floating expression is evaluated in the translation environment, the arithmetic range and precision shall be at least as great as if the expression were being evaluated in the execution environment.120)
13 The semantic rules for the evaluation of a constant expression are the same as for nonconstant expressions.121)
Forward references: array declarators (6.7.7.3), initialization (6.7.11).

The footnotes are as follows:

115) The typeof and alignof operators and many instances of sizeof and _Lengthof operators are by definition integer constant expressions (6.7.3.6) and thus their operands are not evaluated (6.5.4.5).
116) An integer constant expression is required in contexts such as the size of a bit-field member of a structure, the value of an enumeration constant, and the size of a non-variable length array. Specific rules to determine the value and specific constraints that apply to the integer constant expressions used in conditional-inclusion preprocessing directives are discussed in 6.10.2.
117) A named constant or compound literal constant of integer type and value zero is a null pointer constant. A named constant or compound literal constant with a pointer type and a value null or the constant nullptr cast to a pointer type are null pointers but not null pointer constants; they can only be used to initialize a pointer object at program or thread startup if its type implicitly converts to the target type.
118) Named constants or compound literal constants with arithmetic type, including names of constexpr objects, are valid in offset computations such as array subscripts or in pointer casts, as long as the expressions in which they occur form integer constant expressions. In contrast, names of other objects, even if const-qualified and with static storage duration, are not valid.
119) For example, in the declaration int arr_or_vla[(int)+1.0];, while possible to be computed by some implementations as an array with a size of one, it is implementation-defined whether this results in a variable length array declaration or a declaration of an array of known constant size of automatic storage duration. The choice depends on whether (int)+1.0 is an extended integer constant expression.
120) The use of evaluation formats as characterized by FLT_EVAL_METHOD and DEC_EVAL_METHOD also applies to evaluation in the translation environment.
121) Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.
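As an illustration only, and not as part of the proposed wording, footnotes 118 and 121 can be sketched together. The names `OFFSET`, `table`, `third`, `third_value` and `short_circuit_value` are hypothetical.

```c
enum { OFFSET = 2 };   /* named constant of integer type */

static int table[4] = { 10, 20, 30, 40 };

/* Address constant: the subscript is an integer constant expression and
 * the value of table[OFFSET] is not accessed, only its address is formed. */
static int *const third = &table[OFFSET];

/* Not valid according to footnote 118, because idx is the name of an
 * ordinary object even though it is const-qualified and static:
 *
 *     static const int idx = 2;
 *     static int *bad = &table[idx];
 */

/* Footnote 121: the right operand of || is not evaluated, so the
 * initializer is an integer constant expression with value one. */
static int i = 2 || 1 / 0;

int third_value(void) {
    return *third;
}

int short_circuit_value(void) {
    return i;
}
```

Here `third_value()` reads the element at index 2 and `short_circuit_value()` returns the value of `i`.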

3.2 J.2, Undefined behavior

Remove the following four entries

(50​) An expression that is required to be an integer constant expression does not have an integer type; has operands that are not integer literals, named constants, compound literal constants, enumeration constants, character literals, predefined constants, sizeof or _Lengthof expressions whose results are integer constant expressions, alignof expressions, or immediately-cast floating literals; or contains casts (outside operands to sizeof, _Lengthof and alignof operators) other than conversions of arithmetic types to integer types (6.6).
(51​) A constant expression in an initializer is not, or does not evaluate to, one of the following: a named constant, a compound literal constant, an arithmetic constant expression, a null pointer constant, an address constant, or an address constant for a complete object type plus or minus an integer constant expression (6.6).
(52​) An arithmetic constant expression does not have arithmetic type; has operands that are not integer literals, floating literals, named and compound literal constants of arithmetic type, character literals, predefined constants, sizeof or _Lengthof expressions whose results are integer constant expressions, or alignof expressions; or contains casts (outside operands to sizeof or alignof operators) other than conversions of arithmetic types to arithmetic types (6.6).
(53​) The value of an object is accessed by an array-subscript [], member-access . or ->, address &, or indirection * operator or a pointer cast in creating an address constant (6.6).

Add one new entry

(50′) The value of a floating expression as determined in the translation environment in the context of the evaluation of a constant expression is outside the arithmetic range or has less precision than if it were evaluated in the execution environment (6.6).
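The requirement behind this new entry is addressed to the implementation, so a program can only observe its effect. The following sketch compares a value determined in the translation environment with the same division performed during execution; all names are hypothetical, and the bound of one `DBL_EPSILON` is an assumption made for this illustration, not something the standard prescribes.

```c
#include <float.h>

/* Static storage duration: the initializer is an arithmetic constant
 * expression and its value is determined before execution starts. */
static const double third_at_translation = 1.0 / 3.0;

/* The volatile operands prevent the compiler from folding the division,
 * so it is performed in the execution environment. */
double third_at_execution(void) {
    volatile double one = 1.0;
    volatile double three = 3.0;
    return one / three;
}

/* Absolute difference between the two results. */
double evaluation_gap(void) {
    double d = third_at_translation - third_at_execution();
    return d < 0 ? -d : d;
}

/* Nonzero if both results agree to within DBL_EPSILON. */
int within_one_epsilon(void) {
    return evaluation_gap() <= DBL_EPSILON;
}
```

On an implementation that evaluates both in the same format the gap is expected to be zero; a translation-time evaluation with greater precision may differ slightly from the execution-time result.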

4 Note to the editors and other interested parties

There is a branch on WG14’s gitlab that reflects the proposed changes:

https://gitlab.gwdg.de/iso-c/draft/-/tree/ce-UB

Acknowledgments

Thanks to Martin Uecker for review and discussions.