Reproducible expressions

Jens Gustedt, INRIA and ICube, France

2024-12-01

target

integration into IS ISO/IEC 9899:202y

document history

document number	date	comment
n3392	202411	Original proposal
n3415	202412	Detailed argumentation, add recommended practice

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Motivation

To determine whether an array is a VLA, the status of the array length expression of its array declarator is decisive. If it is an integer constant expression, the array is not a VLA and all its properties are determined at translation time. Otherwise, it is a VLA and the type expression is evaluated whenever size information is needed. In particular, a type expression of a VM type whose array length expression has side effects may (or may not) be evaluated each time it is reached during execution, and thus the side effects may or may not take place.
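The difference between the two situations can be observed directly. The following minimal sketch uses the hypothetical helpers vla_sizes and pointer_size. It first declares a VLA whose array length expression increments a counter, so the side effect takes place each time the declaration is reached. It then uses a size expression that only appears in the operand of sizeof and cannot affect the result. In that case the C standard leaves it unspecified whether the expression is evaluated at all.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size array: the length is an integer constant expression,
   so the size is known at translation time. */
static double fixed[4];
_Static_assert(sizeof fixed / sizeof fixed[0] == 4, "fixed size");

/* Hypothetical helper: declares a VLA whose length expression has a
   side effect. The expression is evaluated each time the declaration
   is reached, so *counter is incremented once per loop iteration. */
static size_t vla_sizes(unsigned *counter, unsigned rounds) {
    size_t total = 0;
    for (unsigned r = 0; r < rounds; ++r) {
        double A[++*counter];              /* VLA, length evaluated here */
        total += sizeof A / sizeof A[0];   /* element count of this instance */
    }
    return total;
}

/* Hypothetical helper: the size of a pointer does not depend on the
   array length, so whether ++*counter is evaluated is unspecified. */
static size_t pointer_size(unsigned *counter) {
    return sizeof(double (*)[++*counter]);
}
```

Suppose the counter starts at 0. Then vla_sizes(&c, 3) creates arrays of 1, 2 and 3 elements and leaves c at 3. After pointer_size(&d), d may be 0 or 1, depending on the implementation.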

We think that this feature is actually a bug in the specification. It seems that real-world examples where side effects appear in array length expressions are rare and most often erroneous. One particular code pattern may add hidden modifications to the program state, namely when the array length expression is a macro invocation or a function call and the extent to which side effects appear in these is not properly mastered.

Additionally, array length expressions with side effects have a specific caveat: evaluation order. In the following

unsigned i = 45;
double A[++i][++i]; // error, unsequenced evaluations, undefined behavior

the evaluations of the two ++i expressions are unsequenced, and thus the behavior of that declaration is undefined. If such a side effect is hidden in a macro or function call

double B[dimension(1)][dimension(2)]; // side effects in dimension()?

not much can be said about the array by inspecting the declaration without prior knowledge of dimension:

If dimension is a macro, something like dimension(1) could resolve to an integer constant expression, and the array is then of known constant size. Otherwise, it is a VLA.
We don’t know whether side effects are hidden, and thus whether there is a state change each time this declaration is reached.
If there are side effects, we don’t know whether they need to be sequenced, and thus we cannot tell from simple inspection whether the code has undefined behavior.
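To make the ambiguity concrete, the sketch below gives two hypothetical definitions for the same purpose. DIMENSION is a macro that expands to an integer constant expression. dimension is a function that silently counts its calls. The sketch also shows one way to write the declaration of A above with well-defined behavior: the increments are sequenced in separate statements before the declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: expands to an integer constant expression,
   so an array declared with it has a known constant size. */
#define DIMENSION(k) (10 * (k))

/* Hypothetical function: same purpose, but with a hidden side effect. */
static unsigned dimension_calls = 0;
static size_t dimension(size_t k) {
    ++dimension_calls;          /* state change on every call */
    return 10 * k;
}

/* Not a VLA: the size is fixed at translation time. */
static double B_fixed[DIMENSION(1)][DIMENSION(2)];
_Static_assert(sizeof B_fixed == 10 * 20 * sizeof(double), "constant size");

/* A VLA: both calls are made each time the declaration is reached. */
static size_t vla_bytes(void) {
    double B[dimension(1)][dimension(2)];
    return sizeof B;
}

/* Well-defined variant of A[++i][++i]: each increment is a separate
   full expression, so the two side effects are sequenced. */
static size_t sequenced_elements(unsigned i) {
    unsigned const n1 = ++i;    /* original i + 1 */
    unsigned const n2 = ++i;    /* original i + 2 */
    double A[n1][n2];
    return sizeof A / sizeof A[0][0];
}
```

Both declarations of B look the same in the source. Only the second one is a VLA, and only that one changes state each time it is reached.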

Unfortunately, the following features are often confused:

the syntactic position of an expression,
the point in time during execution when an expression is reached,
the point in time when an expression is evaluated, in particular whether an expression is evaluated
- at translation time (e.g. integer constant expressions),
- at link time (e.g. address constants), or
- during the execution of the program when it is reached,
and whether or not an evaluation has a side effect.

Note that, to determine whether a type is a VM type, the distinction between link time and execution time is in general not relevant, so in the following we do not distinguish these two cases. We are left with the following cascade of properties of array length expressions:

It is an integer constant expression.
Otherwise, it is an expression of integer type but never has side effects.
Otherwise, it is an expression of integer type but only has side effects that result in the same reproducible state.
Otherwise, it is an expression of integer type that in some executions may have a side effect with changes that are not reproducible.
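The four cases can be illustrated with hypothetical length expressions. In the sketch below, cached_width only ever stores the same value into its cache, so repeated calls leave the state as it was after the first call. next_width, in contrast, changes its state on every call. The names WIDTH, cached_width, next_width and cascade are all introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { WIDTH = 8 };

/* Case 3 (hypothetical): a side effect that always leads to the same
   state; every call stores the same value into the cache. */
static size_t cache = 0;
static size_t cached_width(void) {
    cache = WIDTH;
    return cache;
}

/* Case 4 (hypothetical): each call changes the state differently. */
static size_t ticks = 0;
static size_t next_width(void) {
    return ++ticks;
}

static size_t cascade(size_t n) {
    double a1[WIDTH];           /* 1: integer constant expression, not a VLA */
    double a2[n + 1];           /* 2: no side effects */
    double a3[cached_width()];  /* 3: side effect with a reproducible state */
    double a4[next_width()];    /* 4: side effect that is not reproducible */
    return sizeof a1 / sizeof a1[0] + sizeof a2 / sizeof a2[0]
         + sizeof a3 / sizeof a3[0] + sizeof a4 / sizeof a4[0];
}
```

Calling cascade(3) twice yields 21 and then 22. Only the fourth array changes between the calls.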

2 Approach

The goal of this paper is to constrain the definition of array declarators such that the last case never happens, but where exactly to draw the line, and how, has yet to be determined.

The proposed changes are normative.

The idea is to restrict possible array length expressions syntactically as far as possible. The technique is similar to the one already used for “constant expression”: the term is derived from “conditional expression” and then constrained further as necessary:

The term “conditional expression” already excludes assignment expressions at the top level, so an array length expression that is itself an assignment is ruled out by the syntax alone.
Otherwise, the only arithmetic expressions that may have side effects are assignments (for example within parentheses), increments and decrements, so we disallow these.
Another possible side effect is a read operation on a volatile lvalue, so we ban these, too.
Then, side effects could still occur during function calls.

Only the last point cannot always be detected at translation time: the called function may be the result of evaluating a modifiable function pointer, and may thus change each time the function call expression is reached. Therefore this last requirement cannot easily be expressed as a constraint, and if we just ban “functions with side effects”, a violation possibly leads to undefined behavior.
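The sketch below collects forms that are valid C today. All names are hypothetical. In excluded_forms, each array length expression uses one of the constructs excluded above, so under the proposed approach each would be diagnosed at translation time. In elements, the array length expression calls through the modifiable pointer length_of. Whether that call has a side effect therefore depends on the value the pointer holds when the declaration is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Valid C today, but each array length expression uses one of the
   excluded constructs. Returns the total number of elements. */
static size_t excluded_forms(size_t n) {
    volatile size_t v = 2;
    double a[n = 3];            /* assignment: not a conditional-expression */
    double b[(n += 1)];         /* parenthesized assignment */
    double c[n++];              /* increment */
    double d[--n];              /* decrement */
    double e[v];                /* read of a volatile lvalue */
    return sizeof a / sizeof a[0] + sizeof b / sizeof b[0]
         + sizeof c / sizeof c[0] + sizeof d / sizeof d[0]
         + sizeof e / sizeof e[0];
}

/* Function calls: whether a call has side effects may depend on the
   value of a modifiable function pointer at run time. */
static size_t twice(size_t k) { return 2 * k; }          /* no side effects */
static size_t counted = 0;
static size_t twice_counted(size_t k) { ++counted; return 2 * k; }

static size_t (*length_of)(size_t) = twice;

static size_t elements(size_t k) {
    double A[length_of(k)];     /* callee unknown at translation time */
    return sizeof A / sizeof A[0];
}
```

The declaration in elements stays the same whether length_of designates twice or twice_counted. Only the second choice changes the program state.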

In the following, we assume that consensus can be reached on the first points above, so that we only have to decide whether we want to allow function calls and, if so, whether we want to restrict these function calls in any way, syntactically or semantically.

There is a multitude of choices that could be made to improve the situation.

Ban function calls from array length expressions (variant 1).
Allow function calls for functions with certain properties, namely
- restrict to unsequenced functions (variant 2), or
- restrict to reproducible functions (variant 3).
If we want to restrict the possible functions, do we impose
- a syntactic feature, namely that we only accept functions or function pointers that are annotated by a certain attribute, or
- a semantic feature, namely that we only accept functions or function pointers that have certain properties?

The first option has the following properties:

It is easy to implement, even simpler than constant expressions: implementations just have to exclude certain operators (including function calls) from array length expressions and disallow reading volatile lvalues.
It introduces an asymmetry between macro invocations and function calls: whether the declaration of B above is valid may depend on whether dimension is a macro or a function.
This option is the one that impacts existing code the most. All code that uses function calls to determine array lengths would become invalid, and users would have to work around this restriction.

In essence this is an implementation-friendly but user-hostile option.
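Under the first option, a typical workaround would move the call out of the array declarator into a separate object definition. The sketch below assumes a hypothetical function dimension. Any side effects then happen in an ordinary full expression, where they are visible and sequenced, and the array length expressions are plain identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function whose properties are not visible at the call site. */
static size_t dimension(size_t k) { return 4 * k; }

static size_t hoisted(void) {
    /* The function calls are moved out of the array declarator. */
    size_t const rows = dimension(1);
    size_t const cols = dimension(2);
    double B[rows][cols];       /* length expressions are plain identifiers */
    return sizeof B / sizeof B[0][0];
}
```

This rewrite is mechanical. It still has to be applied by hand to every affected declaration, which is the burden on users described above.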

For the decision between the other possibilities, we note that none of these choices precludes a later change: variant 3 could later be strengthened to variant 2, and a semantic requirement could later be captured by syntax.

For this proposal, we choose the option that is the least restrictive for both implementations and users, namely variant 3. It allows calls to reproducible functions but leaves the responsibility for checking this property where it is now, with the user. Apart from banning direct side effects, the difference from the current situation is that implementations get guidance on the expected behavior and may start to diagnose suspicious behavior where they can.

So we restrict the possible function calls to the properties that are collected for the [[reproducible]] attribute, namely being effectless and idempotent, together with not having pointer parameters that give access to mutable state. This is the minimal combination of features that is necessary for the desired properties:

Effectless ensures that no changes to non-local state variables are implied.
Idempotent ensures that no other observable changes to the local state of the function occur.
Only allowing pointer parameters to const-qualified base type ensures that not even modifications to state that is passed into the function may occur.
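As an illustration of these three properties, consider the hypothetical function count_positive below. It only reads through a pointer to const-qualified double, stores only to its own local objects, and returns the same result when called twice with the same arguments. It is annotated with [[reproducible]] only where the implementation reports support for that attribute. REPRODUCIBLE is a helper macro introduced for this sketch. The hypothetical sum_positive then uses a call to count_positive as an array length expression.

```c
#include <assert.h>
#include <stddef.h>

/* Use the C23 attribute only where the implementation supports it. */
#define REPRODUCIBLE
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L && defined(__has_c_attribute)
# if __has_c_attribute(reproducible)
#  undef REPRODUCIBLE
#  define REPRODUCIBLE [[reproducible]]
# endif
#endif

/* Effectless, idempotent, and only reads through a const-qualified pointer. */
static size_t count_positive(size_t n, double const v[]) REPRODUCIBLE {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += v[i] > 0;
    return count;
}

/* Copies the positive entries of v into a VLA sized by a reproducible
   call, then returns their sum. */
static double sum_positive(size_t n, double const v[]) {
    double pos[count_positive(n, v) + 1];   /* + 1 keeps the length > 0 */
    size_t m = 0;
    for (size_t i = 0; i < n; ++i)
        if (v[i] > 0) pos[m++] = v[i];
    double sum = 0;
    for (size_t j = 0; j < m; ++j)
        sum += pos[j];
    return sum;
}
```

A function that wrote through its pointer parameter, for example by incrementing v[0], would not have the third property, even if it were effectless with respect to all other objects.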

3 Suggested additions and changes to the wording

New text is underlined green, removed text is ~~struck-out red~~.

3.1 Add the term reproducible expression

Add a new clause 6.5’ before 6.6 (Constant expressions)

6.5’ Reproducible expressions

Syntax

reproducible-expression: conditional-expression

Description

A reproducible expression can be evaluated in any place without changing the observable program state.

Constraints

A reproducible expression shall not be or contain

an assignment operator,

an increment operator,

a decrement operator,

a conversion of an lvalue with volatile type.

Semantics

If a reproducible expression is evaluated and contains a function call expression, the called function shall be effectless and idempotent and no object that is pointed to by an argument of the call shall be modified.^FNT)

^FNT) That is, the function pointer expression of the call can be converted in place to a function pointer type with a [[reproducible]] attribute, where all pointer parameters, if any, are restrict-qualified and have a const-qualified base type, without changing the semantics of the program.

Recommended practice

In contexts that require reproducible expressions with function calls, it is recommended to use functions that are annotated with [[reproducible]] and that have pointer parameters with const-qualified base types. Where this is possible, it is recommended that implementations diagnose if a function call in a reproducible expression is not effectless, not idempotent or if it modifies an object referred to by one of the arguments to the call.

3.2 Constrain the syntax of array declarations and array type names

Replace the grammar term ~~assignment-expression~~ used in

array-declarator (6.7.7.1, Declarators, General)
6.7.7.3, Array declarators, p3
array-abstract-declarator (6.7.8, Type names)

by reproducible-expression.
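For orientation, the sketch below marks the syntactic positions affected by this replacement. The function names trace and demo are hypothetical. These positions are an array declarator in an object definition, an array declarator in a parameter declaration (including the static form), and array abstract declarators in type names for sizeof and casts. All length expressions shown are plain identifiers, so they would remain valid as reproducible expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter declaration: array-declarator, including the static form. */
static double trace(size_t n, double const m[static n][n]) {
    double t = 0;
    for (size_t i = 0; i < n; ++i)
        t += m[i][i];
    return t;
}

static double demo(size_t n) {
    double m[n][n];                             /* array-declarator */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            m[i][j] = (i == j) ? 1.0 : 0.0;
    size_t bytes = sizeof(double[n][n]);        /* array-abstract-declarator in sizeof */
    double (*p)[n] = (double (*)[n])m;          /* type name in a cast */
    assert(bytes == sizeof m);
    return trace(n, (double const (*)[n])p);
}
```

With n equal to 3, demo builds a 3 by 3 identity matrix and returns its trace, 3.0.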

4 Interaction with other proposals

If n3414 is accepted concurrently, the additions of the word “assignment” there should instead read “reproducible”.