Reproducible expressions

Jens Gustedt, INRIA and ICube, France

2024-12-01

target

integration into IS ISO/IEC 9899:202y

document history

document number   date     comment
n3392             202411   Original proposal
n3415             202412   Detailed argumentation, add recommended practice

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Motivation

To determine whether an array is a VLA, the status of the array length expression of the array declarator is important. If it is an integer constant expression, the array is not a VLA and all its properties are determined at translation time. Otherwise, it is a VLA and the type expression is evaluated whenever size information is needed. In particular, a type expression of a VM type whose array length expression has side effects may (or may not) be evaluated each time it is reached during execution, and thus the side effects may or may not take place.
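The distinction can be sketched as follows. The function names fixed_bytes and vla_bytes are hypothetical and only serve the illustration; the sketch assumes an implementation that supports VLAs.

```c
#include <stddef.h>

// Hypothetical helper: the length is an integer constant expression,
// so A is not a VLA and sizeof A is determined at translation time.
size_t fixed_bytes(void) {
    enum { len = 4 };
    double A[len];
    return sizeof A;
}

// Hypothetical helper: the length is not an integer constant expression,
// so A is a VLA and sizeof A is determined during execution.
// n is assumed to be greater than zero.
size_t vla_bytes(size_t n) {
    double A[n];
    return sizeof A;
}
```

Here vla_bytes(3) yields the size of three double objects, computed when the call is executed, whereas the result of fixed_bytes is already known to the translator.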

We think that this feature is actually a bug of the specification. It seems that real world examples where side effects appear in array length expressions are rare and most often erroneous. There is a particular code pattern that may add hidden modifications to state, that is when the array length expression is a macro invocation or a function call, and when the extent to which side effects appear in these is not properly mastered.

Additionally, array length expressions with side effects have a specific caveat: evaluation order. In the following

unsigned i = 45;
double A[++i][++i]; // error, unsequenced evaluations, undefined behavior

the evaluations of the two ++i expressions are unsequenced, and thus the behavior of that declaration is undefined. If such a side effect is hidden in a macro or function call

double B[dimension(1)][dimension(2)]; // side effects in dimension()?

not much can be said about the array by inspecting the declaration without prior knowledge of dimension:
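For the ++i example, one possible way to obtain a well-defined declaration is to move each side effect into its own full expression before the declarator. This is only a sketch of that workaround, not part of the proposed wording; the names sequenced_elements, rows and cols are hypothetical.

```c
#include <stddef.h>

// Hypothetical helper: each increment is a separate full expression,
// so the two side effects on i are sequenced.
size_t sequenced_elements(void) {
    unsigned i = 45;
    unsigned const rows = ++i;   // i becomes 46
    unsigned const cols = ++i;   // i becomes 47
    // The array length expressions have no side effects.
    // A is still a VLA, because rows and cols are not
    // integer constant expressions.
    double A[rows][cols];
    return sizeof A / sizeof(double);
}
```

With this formulation the number of elements is 46 times 47 in every execution, and the declaration can be understood by inspecting it alone.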

Unfortunately, the following features are often confused

Note that to determine if a type is a VM type or not, in general the distinction between link time and execution is not relevant, so in the following we will not distinguish these two cases. We are left with the following cascade of properties of array length expressions:

  1. It is an integer constant expression.
  2. Otherwise, it is an expression of integer type but never has side effects.
  3. Otherwise, it is an expression of integer type but only has side effects that result in the same reproducible state.
  4. Otherwise, it is an expression of integer type that in some executions may have a side effect with changes that are not reproducible.
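The four cases can be illustrated by the following sketch. All identifiers are hypothetical, and the classification of cached_len under case 3 reflects our reading of "the same reproducible state" (a cache that always ends up with the same value), not wording from the standard.

```c
#include <stddef.h>

// Case 1: an integer constant expression.
enum { fixed_len = 8 };

// Case 2: integer type, never has side effects.
size_t doubled(size_t n) { return 2 * n; }

// Case 3: has a side effect (filling a cache), but the state
// after the call is the same however often it is called.
size_t cached_len(void) {
    static size_t cache = 0;
    if (!cache) cache = 16;
    return cache;
}

// Case 4: the side effect changes state in a way that is not
// reproducible; each call returns a different value.
size_t next_len(void) {
    static size_t counter = 0;
    return ++counter;
}

// Each function declares an array with the corresponding kind of
// length expression and returns its number of elements.
size_t count1(void) { double a[fixed_len];    return sizeof a / sizeof a[0]; }
size_t count2(void) { double a[doubled(3)];   return sizeof a / sizeof a[0]; }
size_t count3(void) { double a[cached_len()]; return sizeof a / sizeof a[0]; }
size_t count4(void) { double a[next_len()];   return sizeof a / sizeof a[0]; }
```

Only count4 returns a different result each time it is called, which is the situation described by the last case.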

2 Approach

The goal of this paper is to constrain the definition of array declarators such that the last case never happens, but exactly where and how to make the change has yet to be determined.

The proposed changes are normative.

The idea is to restrict possible array length expressions syntactically, as far as that is possible. The technique is similar to the one already used for “constant expression”. Namely, the term is derived from “conditional expression” and then constrained further as necessary:

Only the last point cannot always be detected at translation time; the called function may be the result of the evaluation of a modifiable function pointer, and may thus change each time the function call expression is reached. Thus this last requirement cannot easily be expressed as a constraint and possibly leads to undefined behavior if we just ban “functions with side effects”.

In the following we will assume that we would reach consensus on the first points above, and that we only have to solve the problem of whether or not we want to allow function calls, and, if so, whether we want to restrict these function calls in any way, syntactically or semantically.

There is a multitude of choices that could be made to improve the situation.

The first option has the following properties:

In essence this is an implementation-friendly but user-hostile option.

For the decision between the other possibilities, we note that none of these are exclusive in time: variant 3. could later be strengthened to variant 2., and a semantic feature could later be captured by syntax.

For this proposal, we choose the option that is the least restrictive for both implementations and users. Namely, we choose variant 3., allowing function calls that are reproducible, but leave the responsibility to check them where it is now, namely on the user side. In addition to banning direct side effects, the difference from the situation as it was before is then that implementations have guidance on the behavior that is expected, and that they may start to diagnose suspicious behavior where they can.

So we restrict the possible function calls to the properties that are collected for the [[reproducible]] attribute, namely of being effectless, idempotent and not having pointer parameters that give access to mutable state. This is the minimum combination of features that is necessary for the desired properties:
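As an illustration of a function call with these properties, consider the following sketch. The macro REPRODUCIBLE and the functions row_length and elements are hypothetical; the macro only expands to [[reproducible]] where the implementation reports C23 support for that attribute, and to nothing otherwise.

```c
#include <stddef.h>

// Assumption: use the attribute only if the implementation claims
// C23 and reports support for it; otherwise expand to nothing.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L && defined(__has_c_attribute)
# if __has_c_attribute(reproducible)
#  define REPRODUCIBLE [[reproducible]]
# endif
#endif
#ifndef REPRODUCIBLE
# define REPRODUCIBLE
#endif

// Hypothetical helper: effectless and idempotent. The pointer
// parameter is restrict-qualified with a const-qualified base type,
// so the call does not modify the object it points to.
size_t row_length(size_t const* restrict dims) REPRODUCIBLE {
    return dims[0] + 1;
}

// Both array length expressions call row_length. Since the calls
// do not change any state, their relative order does not matter.
// n is assumed to be small enough for A to fit in automatic storage.
size_t elements(size_t n) {
    size_t const dims[1] = { n };
    double A[row_length(dims)][row_length(dims)];
    return sizeof A / sizeof(double);
}
```

The attribute is placed after the parameter list because it applies to the function type. Whether a given implementation accepts or diagnoses it depends on its level of C23 support.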

3 Suggested additions and changes to the wording

New text is underlined green, removed text is struck-out red.

3.1 Add the term reproducible expression

Add a new clause 6.5’ before 6.6 (Constant expressions)

6.5’ Reproducible expressions
Syntax
reproducible-expression: conditional-expression
Description
A reproducible expression can be evaluated in any place without changing the observable program state.
Constraints
A reproducible expression shall not be or contain
Semantics
If a reproducible expression is evaluated and contains a function call expression, the called function shall be effectless and idempotent and no object that is pointed to by an argument of the call shall be modified.FNT)
FNT) That is, the function pointer expression of the call can be converted in place to a function pointer type with a [[reproducible]] attribute and where all pointer parameters, if any, are restrict-qualified and have a const-qualified base type without changing the semantics of the program.
Recommended practice
In contexts that require reproducible expressions with function calls, it is recommended to use functions that are annotated with [[reproducible]] and that have pointer parameters with const-qualified base types. Where this is possible, it is recommended that implementations diagnose if a function call in a reproducible expression is not effectless, not idempotent or if it modifies an object referred to by one of the arguments to the call.

3.2 Constrain the syntax of array declarations and array type names

Replace the grammar term assignment-expression used in

by reproducible-expression.

4 Interaction with other proposals

If n3414 is accepted concurrently, the additions of the word “assignment” there should instead read “reproducible”.