Array subscripting without decay

Javier A. Múgica, Spain

Jens Gustedt, INRIA and ICube, France

2024-09-30

target

integration into IS ISO/IEC 9899:202y

document history

document number	date	comment
n3311	202408	Original proposal
n3335	202409	Fixes to the constraint based on the length of the array. Added the informal proposal for discarded code. Introduced the terms top level fixed/variable length array.
n3352	202409	Integrate feedback from the reflector: handle the case where the subscript is the same as the length, amend text for &, amend constant expressions for compound literals, amend text for address constants
n3360	202409	Adds the `[]` operator to named constants. Puts the upper bound on the length of the array also on named and compound literal constants. Limits ICE inside [] to make a constant expression. Some wording fixes.

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Discussion

Traditionally, the definition of array subscripting goes through conversion of the array to a pointer. Thus, E[n] is defined as (*((E)+(n))), where E is converted to a pointer to its first element. At first sight it may seem there is no semantic difference between this and saying that “E[n] denotes the n-th element of the array”. But indeed there is; the paragraph on conversion of an array to a pointer says

Except when […], an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

Therefore, subscripting an array precludes its declaration with register. This seems an artificial restriction, existing only because of the way E[n] is defined. Implementations such as gcc have lifted this restriction for decades.

A similar restriction is present for their use as integer constant expressions (ICE). Consider

constexpr int x[3]= {0,10,20};
float y[x[1]];

This code is not valid because x[1] is not an integer constant expression. Since the whole object x is a constant expression, it seems each of its members should be usable as ICE.
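Until an element of such an array can serve as an ICE, the length has to be named in some other way. The following is a minimal sketch of one common workaround, not part of this proposal: the enumeration constant X1 and the function names are hypothetical, and plain const stands in for constexpr so that the code also compiles before C23.

```c
#include <stddef.h>

/* Hypothetical workaround: name the value with an enumeration constant,
   which is an integer constant expression in every C version. */
enum { X1 = 10 };

/* const stands in for constexpr so that pre-C23 compilers accept this. */
static const int x[3] = {0, X1, 20};

/* y has X1 elements and is an ordinary (non-variable-length) array. */
size_t length_of_y(void)
{
    float y[X1];
    return sizeof y / sizeof y[0];
}

/* The element itself is only usable as a run-time value today. */
int second_element_of_x(void)
{
    return x[1];
}
```

With the proposed change the detour through X1 would no longer be needed, since x[1] itself could appear as the array length.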

It was also noted recently on the reflector that the expression *(E+n) produces an lvalue in instances where a non-lvalue would be expected for E[n], as in

struct {const int i; int arr[1];} func();
func().i;           //not an lvalue
func().arr[0];      //equivalent to the following
*(func().arr+0);    //lvalue
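The situation can be sketched in a self-contained form under the current rules, assuming C11 or later, where the returned structure has temporary lifetime. The tag S, the stored values and the two reader functions are ours, chosen for illustration only.

```c
struct S { const int i; int arr[1]; };

/* Returns a structure by value; the call expression is not an lvalue. */
static struct S func(void)
{
    return (struct S){ .i = 1, .arr = { 42 } };
}

/* Reads the element through the subscript operator. */
int read_by_subscript(void)
{
    return func().arr[0];
}

/* Reads the same element through the pointer form, which is an lvalue today. */
int read_by_indirection(void)
{
    return *(func().arr + 0);
}
```

Both functions read the same element; the difference discussed here is only in whether the expression is an lvalue.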

Other problems arose when studying the extension of the subscripting operation to allow range selections.

1.1 Constraint on the subscript

In an expression E[N], where E is an array, we propose to require N ≥ 0, and to make this requirement a constraint when N is an integer constant expression. This is not imposed if E has pointer type, neither as a constraint nor as UB. In particular, for a pointer p the common idiom p[-1] remains valid. If E is an array this was already invalid, since, from the definition of arrays in C, the element E[-1] does not exist, even if E were to decay to a pointer. Thus:

int A[3][3];
A[1][-1];  // A[1] is an array of three elements; say, B. B[-1] does not exist.
           // In *(B-1), the pointer B (once B decays to a pointer) points to
           // the first element of an array of 3 elements. B-1 is not a valid pointer.
           // Hence, *(A[1]-1), hence A[1][-1], has UB as per the current standard.

An implementation may very well define this behavior and allow it. More likely, an implementation may just compute the address A[1]-1 and have this code work without defining the behavior (it just works). Programmers relying on that can rewrite A[1][-1] (which henceforth will raise a diagnostic) to *(A[1]-1) in order to avoid the diagnostic.

Nevertheless, it is unclear to us whether optimizers use that UB to make assumptions about subscripts, so relying on this UB for arrays is inherently dangerous. Therefore we propose to promote this from UB to a constraint in situations that are easily detectable at translation time, namely when the subscript is an ICE.
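The distinction between pointer and array operands can be sketched as follows. The function name and the stored values are ours. Only the pointer form is exercised, since the array form A[1][-1] has undefined behavior.

```c
/* p points into the middle of a row, so p[-1] names an existing element. */
int element_before(void)
{
    int A[3][3] = { {0, 1, 2}, {10, 11, 12}, {20, 21, 22} };
    int *p = &A[1][1];   /* pointer operand: the constraint would not apply */
    return p[-1];        /* same element as A[1][0] */
}
```

Here the negative subscript is applied to a pointer that does not point to the start of its array, which is the idiom that remains valid.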

1.1.1 Drawback

The following code is valid today and would require a diagnostic if the constraint is introduced also for an out-of-bounds access beyond the array length:

#define SAFE_ACCESS(a, x) (((x) < ARRAY_SIZE(a)) ? a[x] : 0)
int a[10];
int b = SAFE_ACCESS(a, 10);

This macro is superseded by the constraint, but is still useful for VLAs. A user of this macro would have to replace all uses for fixed length arrays by a direct access or change the macro so that the subscript is not an ICE even if x is.

Here the out-of-bounds index is valid because the expression is not evaluated. However, writing “if the expression is evaluated” in the constraint is not adequate because being evaluated is a runtime property. There are contexts where the expression is known not to be evaluated already at translation time. It is for these contexts that an exception can be made in the constraint. However, this concept is not developed in the standard, and its introduction falls outside the scope of this proposal.
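One conceivable way to keep such a macro usable is to subscript a pointer instead of the array, since the constraint is proposed for array operands only. The following is a sketch under that assumption and not a recommendation of this paper: ARRAY_SIZE is given a conventional definition, and SAFE_ACCESS_PTR, table and the two function names are hypothetical.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* &(a)[0] is a pointer, so the subscript [x] applies to a pointer operand. */
#define SAFE_ACCESS_PTR(a, x) (((x) < ARRAY_SIZE(a)) ? (&(a)[0])[x] : 0)

static int table[10] = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};

/* Subscript inside the bounds: the element is read. */
int get_in_range(void)
{
    return SAFE_ACCESS_PTR(table, 3);
}

/* Subscript equal to the length: the access is not evaluated, 0 is returned. */
int get_out_of_range(void)
{
    return SAFE_ACCESS_PTR(table, 10);
}
```

The behavior at run time is the same as for the original macro; only the kind of operand that is subscripted changes.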

1.2 Options

We provide two options:

1.2.1 Option 1

In Option 1 we understand that the text implies that the address is never taken, just as member designation says

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object.

and it is understood that the object’s address is not taken.

1.2.2 Option 2

In Option 2 we make the address not taken only if the subscript is an ICE.

In the rewording we also forced, in E[n], E to be the array or pointer and n to be the integer. This can be undone.
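Under the current definition the two operands of [] are interchangeable, because E1[E2] is defined through the commutative binary +. A small sketch of what is valid today follows; the function names are ours, and the swapped form is the one that the rewording would no longer accept.

```c
/* Returns 1 if the three spellings designate the same element today. */
int spellings_agree(void)
{
    int a[4] = {10, 20, 30, 40};
    return a[2] == *(a + 2) && 2[a] == a[2];
}

/* The swapped form: integer first, array inside the brackets. */
int swapped_subscript(void)
{
    int a[4] = {10, 20, 30, 40};
    return 2[a];
}
```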

1.3 Top level fixed and variable length array

We propound, independently of the main proposal, the introduction of that concept, the need for which has become apparent of late, for this and other array related proposals.

The paper N3327 proposes the changing of the term “variable length” to “variable size array”. That is independent from the term proposed here, though ultimately, if the change proposed in that paper is adopted, the term variable length array may eventually come to stand for what we call here top level variable length array. But even if the proposed change in that paper is adopted soon, the “re-meaning” of VLA will have to wait a few years, to prevent confusion with its current meaning.

2 Wording

New text is underlined green, removed text is ~~stroke-out red~~. Possible reorganization of the paragraphs is left to the discretion of the editors.

2.1 Array-to-pointer decay

6.3.2.1 Lvalues, arrays, and function designators

3 Except when it is the operand of the sizeof operator, or typeof operators, or the unary & operator, or the postfix expression of an array subscripting operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is implementation-defined.

2.2 Postfix operators

2.2.1 Option 1

6.5.3 Postfix operators

6.5.3.2 Array subscripting

Constraints

1 ~~One of the expressions~~The postfix expression shall have type “pointer to complete object type” ~~, the other expression shall have integer type, and the result has type “type”.~~ or “array of type”. The expression within square brackets, called the subscript, shall have integer type. If the postfix expression is an array and the subscript is an integer constant expression its value shall not be negative.

Semantics

2 A postfix expression followed by an expression in square brackets [ ] is a subscripted designation of an element of an array ~~object~~. ~~The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).~~ Let the expression be E[N] and let E be pointer to, or array of, T. The expression has type T. If E has pointer type the expression is equivalent to *((E)+(N)) and is an lvalue. If E has array type, let n be the value of N, E[N] designates the n-th element of E, counting from zero, it is an lvalue if E is an lvalue, and n shall not be negative and shall be less than the length of the array or equal to it; it shall only equal the length of the array if the [] operator is followed by zero or more [] operators with subscripts equal to zero and the resulting postfix expression is the operand of the unary & operator or is converted to an expression with pointer type as described in 6.3.2.1.^xxx)

^xxx) If E is a named constant of array type and the subscript is an integer constant expression with value inside the bounds of the array, the expression is again a named constant. If in addition T is an integer type or an arithmetic type, the expression is an integer constant expression or arithmetic constant expression respectively (6.6).

3 Successive subscript operators designate an element of a multidimensional array ~~object~~. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ⋯ × k, then E[N] ~~(used as other than an lvalue) is converted to a pointer to~~ denotes an (n − 1)-dimensional array with dimensions j × ⋯ × k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue. It follows from this that arrays are stored in row-major order (last subscript varies fastest).

4 EXAMPLE ~~The following snippet has an array object defined by the declaration:~~Consider the arrays defined by the declarations

int x[3][5];
constexpr int y[3] = {3, 6, 9};
float z[y[1]];

Here x is a 3 × 5 array of objects of type int; more precisely, x is an array of three element objects, each of which is an array of five objects of type int. ~~In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five objects of type int. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five objects of type int. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the objects of type int, so x[i][j] yields an int.~~ The expression x[1] designates the second element of array x, which is itself an array of five objects of type int. Then x[1][2] designates the third element thereof, which is an int. It is the 7-th stored element of the two-dimensional array x (counting from 0). z is not a variable length array, but an array of 6 float, since y[1] is an integer constant expression.
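As an informal check that is not part of the proposed wording, the position of x[1][2] stated in the example can be computed with a small sketch. The function name is ours; the offset is taken through character pointers into the whole object x.

```c
#include <stddef.h>

/* Index of x[1][2] among the stored int objects of x, counting from 0. */
size_t stored_index(void)
{
    int x[3][5] = {{0}};
    size_t bytes = (size_t)((char *)&x[1][2] - (char *)x);
    return bytes / sizeof(int);
}
```

The result is 1 × 5 + 2, in agreement with row-major storage.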

2.2.2 Option 2

Add the following paragraph after p. 2 and move the footnote at the end of p. 2 to the end of this paragraph.

3 If E has array type and the subscript is an integer constant expression the corresponding element of the array is accessed without taking the address of the array. Note that such an array subscripting operation is valid even if the array has register storage class.

6.5.4 Unary operators

6.5.4.3 Address and indirection operators

neither the & operator nor the unary * nor the access to the value that is implied by the [] is evaluated

2.3 Constant expressions

6.6 Constant expressions

6 A compound literal with storage-class specifier constexpr is a compound literal constant, as is a postfix expression that applies the . member access operator to a compound literal constant of structure or union type or the [] array subscripting operator to a compound literal constant of array type with an integer constant expression less than the length of the array as subscript, even recursively. A compound literal constant is a constant expression with the type and value of the unnamed object.

7 … is a named constant, as is a postfix expression that applies the . member access operator to a named constant of structure or union type or the [] array subscripting operator to a named constant of array type with an integer constant expression less than the length of the array as subscript, even recursively. …

12 Any constant expression can be used as operand in the creation of an address constant. Additionally, a non-constant expression formed with the array-subscript [] and member-access -> operator, the address & and indirection * unary operators, and pointer casts can be used ~~in the creation of an address constant~~, but then the value of an object shall not be accessed by use of these operators.

2.4 Storage class register

6.7.2 Storage-class specifiers

Remove the last sentence in the following footnote

¹²⁷⁾ The implementation can treat any register declaration simply as an auto declaration. However, whether or not addressable storage is used, the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.4.3) or implicitly (by converting an array name to a pointer as discussed in 6.3.3.1). ~~Thus, the only operator that can be applied to an array declared with storage-class specifier register is sizeof and the typeof operators.~~

3 Top level fixed/variable length array (Wording)

3.1 Array declarators

6.7.7.3 Array declarators

Insert the following paragraph after paragraph 5:

6 If the size is an integer constant expression the array is a top level fixed length array. If the size is * or is an expression which is not an integer constant expression the array is a top level variable length array.
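The two terms can be illustrated with a short sketch, as we read the paragraph above. It assumes an implementation that supports variable length arrays and a value n greater than zero; the function and variable names are ours.

```c
#include <stddef.h>

/* Number of elements at the top level of c; the size there is the ICE 3. */
size_t top_level_length_fixed(size_t n)
{
    int c[3][n];   /* top level fixed length, although c is a VLA today */
    return sizeof c / sizeof c[0];
}

/* Number of elements at the top level of d; the size there is n. */
size_t top_level_length_variable(size_t n)
{
    int d[n][3];   /* top level variable length */
    return sizeof d / sizeof d[0];
}
```

The array c shows why a separate term is useful: its type is a variable length array type in the current sense, yet the number of its elements is known at translation time.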

4 Future extensions

We’d like a constraint on the subscript so as not to exceed the length of the array:

If in addition the array is complete and is a top level fixed length array, the subscript shall be less than or equal to the length of the array. It can only equal the length of the array if the [] operator is followed by zero or more [] operators with subscripts equal to zero and the resulting postfix expression is the operand of the unary & operator or is converted to an expression with pointer type as described in 6.3.2.1.

As explained in the discussion, an exception needs to be observed for code which is known not to be evaluated. We intend to introduce this concept in a future proposal. First, a term should be adopted for that code. Say, discarded.

Secondly, the text should deem certain code as discarded: code of which the text already says that it is not evaluated, the statements skipped because a controlling expression is an ICE, the right operands of the || and && operators when the left operand is an ICE equal to 1 or 0 respectively, and others.

With that term introduced, the proposed constraint above would begin thus:

If in addition the expression is not discarded, the array is complete and …

5 Acknowledgment

We’d like to thank Martin Uecker and Joseph Myers for their suggestions.