2024-08-21

integration into IS ISO/IEC 9899:202y

document number | date | comment |
---|---|---|
N3311 | 202408 | original proposal |

**CC BY**, see https://creativecommons.org/licenses/by/4.0

Traditionally, the definition of array subscripting goes through conversion of the array to a pointer. Thus, `E[n]` is defined as `(*((E)+(n)))`, where `E` is converted to a pointer to its first element. At first sight it may seem there is no semantic difference between this and saying that “`E[n]` denotes the `n`-th element of the array”. But there is one: the paragraph on conversion of an array to a pointer says

Except when […], an expression that has type “array of *type*” is converted to an expression with type “pointer to *type*” that points to the initial element of the array object and is not an lvalue. If the array object has `register` storage class, the behavior is undefined.

Therefore, subscripting an array precludes its declaration with `register`. This seems an artificial restriction, existing only because of the way `E[n]` is defined. Implementations such as gcc lifted this restriction decades ago.

A similar restriction applies to the use of array elements in integer constant expressions (ICE). Consider

This code is not valid because `x[1]` is not an integer constant expression. Since the whole object `x` is a constant expression, it seems each of its members should be usable as an ICE.
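The following sketch illustrates the current situation under stated assumptions: it uses a `static const` array rather than a named constant so that it compiles without C23 `constexpr`, and it assumes VLA support. The names `x`, `count_vla`, `X1`, and `count_fixed` are hypothetical. Because `x[1]` is not an integer constant expression, an array whose length is `x[1]` is a variable length array (permitted at block scope only), whereas an enumeration constant does yield an integer constant expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array: its element x[1] is not an integer constant expression. */
static const int x[] = { 1, 2, 3 };

/* y is a variable length array, so sizeof y is evaluated at run time.
   The same declaration at file scope would violate a constraint. */
size_t count_vla(void)
{
    int y[x[1]];
    return sizeof y / sizeof y[0];
}

/* Workaround available today: an enumeration constant is an ICE,
   so z is an ordinary fixed-length array. */
enum { X1 = 2 };

size_t count_fixed(void)
{
    int z[X1];
    _Static_assert(sizeof z == X1 * sizeof(int), "z has a constant size");
    return sizeof z / sizeof z[0];
}
```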

It was also noted recently on the reflector that the expression `*(E+n)` produces an lvalue in instances where a non-lvalue would be expected for `E[n]`, as in

```
struct {const int i; int arr[1];} func(void);

void example(void)
{
    func().i;           // not an lvalue
    func().arr[0];      // equivalent to the following
    *(func().arr+0);    // lvalue
}
```

Other problems arose when studying the extension of the subscripting operation to allow range selections.

In an expression `E[N]`, where `E` is an array, we require `N` to be ≥ 0, and make this a constraint if `N` is an integer constant expression. This is not imposed if `E` has pointer type, neither as a constraint nor as UB. In particular, for a pointer `p` the common idiom `p[-1]` remains valid. If `E` is an array, this was already invalid since, from the definitions of arrays in C, the element `E[-1]` does not exist, even if `E` were to decay to a pointer. Thus:

```
int A[3][3];
A[1][-1]; // A[1] is an array of three elements; say, B. B[-1] does not exist.
// In *(B-1), the pointer B (once B decays to a pointer) points to
// the first element of an array of 3 elements. B-1 is not a valid pointer.
// Hence, *(A[1]-1), hence A[1][-1], has UB as per the current standard.
```
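By contrast, subscripting a pointer with a negative value is well defined whenever the resulting position still lies within the same array. A minimal sketch, with the hypothetical names `previous` and `demo_previous`:

```c
#include <assert.h>

/* Returns the element before *p. Valid as long as p points at least
   one element past the start of an array. */
int previous(const int *p)
{
    return p[-1];               /* same as *(p - 1) */
}

int demo_previous(void)
{
    int B[3] = { 10, 20, 30 };
    const int *p = &B[1];       /* p points into the middle of B */
    /* B[-1] would read before the start of B: UB today, and a
       constraint violation under the proposal since -1 is an ICE. */
    return previous(p);         /* reads B[0]: well defined */
}
```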

An implementation may very well define this behavior and allow it. More likely, an implementation simply computes the address `A[1]-1` and has this code work without defining the behavior (it just works). Programmers relying on that can rewrite `A[1][-1]` (which under this proposal requires a diagnostic) to `*(A[1]-1)` in order to avoid the diagnostic.

Nevertheless, it is unclear to us whether optimizers use that UB to make assumptions about subscripts, so relying on this UB for arrays is inherently dangerous. Therefore we propose to promote this from UB to a constraint in situations that are easily detectable at translation time, namely when the subscript is an ICE.

Similarly, we propose to introduce a constraint for out-of-bounds accesses beyond the array length that can be diagnosed at translation time.
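The following sketch shows, for a hypothetical function `sum_edges`, which subscripts of a non-VLA array the proposed constraints would affect. The accesses that would be diagnosed are kept in comments so that the code compiles under the current rules.

```c
#include <assert.h>
#include <stddef.h>

int sum_edges(size_t n)
{
    int a[4] = { 1, 2, 3, 4 };
    int r = a[0] + a[3];    /* ICE subscripts within 0..3: fine            */
    /* a[4];   ICE equal to the length: proposed constraint violation      */
    /* a[-1];  negative ICE: proposed constraint violation                 */
    if (n < 4)
        r += a[n];          /* not an ICE: no constraint; staying in bounds
                               remains the programmer's responsibility     */
    return r;
}
```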

We provide two options:

In Option 1 we understand the text to imply that the address is never taken, just as for member designation, where the standard says

A postfix expression followed by the `.` operator and an identifier designates a member of a structure or union object.

and it is understood that the object’s address is not taken.

In Option 2 we make the address not taken only if the subscript is an ICE.

In the rewording we also require, in `E[n]`, that `E` be the array or pointer and `n` the integer, so the form `n[E]` is no longer accepted. This can be undone.
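Under the current definition the two operands of `[]` may be swapped, because `E1[E2]` is `(*((E1)+(E2)))` and `+` is commutative. A small sketch (the function name `current_rules` is hypothetical) of what the rewording would no longer permit:

```c
#include <assert.h>

int current_rules(void)
{
    int a[3] = { 7, 8, 9 };
    assert(a[2] == *(a + 2));   /* E1[E2] is (*((E1)+(E2)))               */
    assert(2[a] == a[2]);       /* operands swapped: valid today, but a
                                   constraint violation under the rewording */
    return 2[a];
}
```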

New text is underlined green, removed text is ~~struck-through red~~. Possible reorganization of the paragraphs is left to the discretion of the editors.

6.3.2.1 Lvalues, arrays, and function designators

3 Except when it is the operand of the `sizeof` operator, or `typeof` operators, or the unary `&` operator, or the postfix expression of an array subscripting operator, or is a string literal used to initialize an array, an expression that has type “array of *type*” is converted to an expression with type “pointer to *type*” that points to the initial element of the array object and is not an lvalue. If the array object has `register` storage class, the behavior is implementation-defined.

6.5.3 Postfix operators

6.5.3.2 Array subscripting

Constraints

1 ~~One of the expressions~~ The postfix expression shall have type “pointer to complete object *type*”~~, the other expression shall have integer type, and the result has type “*type*”.~~ or “array of *type*”. The expression within square brackets, called the *subscript*, shall have integer type. If the postfix expression is an array and the subscript is an integer constant expression, its value shall not be negative. If in addition the array is not a variable length array, the subscript shall be less than the length of the array.

Semantics

2 A postfix expression followed by an expression in square brackets `[ ]` is a subscripted designation of an element of an array ~~object~~. ~~The definition of the subscript operator `[]` is that `E1[E2]` is identical to `(*((E1)+(E2)))`. Because of the conversion rules that apply to the binary `+` operator, if `E1` is an array object (equivalently, a pointer to the initial element of an array object) and `E2` is an integer, `E1[E2]` designates the `E2`-th element of `E1` (counting from zero).~~ Let the expression be `E[N]` and let `E` be pointer to, or array of, *T*. The expression has type *T*. If `E` has pointer type, the expression is equivalent to `*((E)+(N))` and is an lvalue. If `E` has array type, `E[N]` designates the `N`-th element of `E`, counting from zero; it is an lvalue if `E` is an lvalue, and `N` shall not be negative and shall be smaller than the length of the array.

3 If `E` is a named constant of array type and `N` is a constant expression, the result is a constant expression. If furthermore `N` is an integer constant expression and *T* is an integer type or an arithmetic type, the result is an integer constant expression or arithmetic constant expression, respectively.

4 Successive subscript operators designate an element of a multidimensional array ~~object~~. If `E` is an *n*-dimensional array (*n* ≥ 2) with dimensions *i* × *j* × ⋯ × *k*, then `E[N]` ~~(used as other than an lvalue) is converted to a pointer to~~ denotes an (*n* − 1)-dimensional array with dimensions *j* × ⋯ × *k*. ~~If the unary `*` operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (*n* − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue.~~ It follows from this that arrays are stored in row-major order (last subscript varies fastest).
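The following sketch is not part of the proposed wording. It illustrates the reduction from *n* to *n* − 1 dimensions with a hypothetical three-dimensional array `m` and a hypothetical helper `extent`, measuring each level with `sizeof`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2 x 3 x 4 array (n = 3). */
static int m[2][3][4];

/* Element count at each level: m has 2 elements, m[1] (a 3 x 4 array)
   has 3, and m[1][2] (a one-dimensional array of int) has 4. */
size_t extent(int level)
{
    switch (level) {
    case 0:  return sizeof m / sizeof m[0];
    case 1:  return sizeof m[1] / sizeof m[1][0];
    default: return sizeof m[1][2] / sizeof m[1][2][0];
    }
}
```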

5 EXAMPLE ~~The following snippet has an array object defined by the declaration:~~ Consider the arrays defined by the declarations

Here `x` is a 3 × 5 array of objects of type `int`; more precisely, `x` is an array of three element objects, each of which is an array of five objects of type `int`. ~~In the expression `x[i]`, which is equivalent to `(*((x)+(i)))`, `x` is first converted to a pointer to the initial array of five objects of type `int`. Then `i` is adjusted according to the type of `x`, which conceptually entails multiplying `i` by the size of the object to which the pointer points, namely an array of five `int` objects. The results are added and indirection is applied to yield an array of five objects of type `int`. When used in the expression `x[i][j]`, that array is in turn converted to a pointer to the first of the objects of type `int`, so `x[i][j]` yields an `int`.~~ The expression `x[1]` designates the second element of array `x`, which is itself an array of five objects of type `int`. Then `x[1][2]` designates the third element thereof, which is an `int`. It is the 7-th stored element of the two-dimensional array `x` (counting from 0). `z` is not a variable length array, but an array of 6 `float`, since `y[1]` is an integer constant expression.
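The following sketch is not part of the proposed wording. It checks that `x[1][2]` is the 7-th stored element by using byte offsets, which stay within the object `x` and so avoid out-of-bounds pointer arithmetic. It assumes the declaration `int x[3][5];` from the current standard's version of this example; `stored_index` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

static int x[3][5];   /* assumed declaration of the 3 x 5 array */

/* Position of x[i][j] among the stored int objects of x (i < 3, j < 5).
   Character pointers may range over the whole object x, so the
   subtraction is well defined. Row-major order gives i * 5 + j. */
size_t stored_index(size_t i, size_t j)
{
    ptrdiff_t bytes = (const char *)&x[i][j] - (const char *)&x[0][0];
    return (size_t)bytes / sizeof x[0][0];
}
```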

Replace the indicated p 3 by

3 If `E` has array type and `N` is an integer constant expression, the corresponding element of the array is accessed without taking the address of the array. The result is a constant expression, integer constant expression, or arithmetic constant expression if `E` is a named constant of array type and, for the latter two, *T* is an integer type or an arithmetic type, respectively. Note that such an array subscripting operation is valid even if the array has `register` storage class.

6.7.2 Storage-class specifiers

Remove the last sentence in the following footnote

127) The implementation can treat any `register` declaration simply as an `auto` declaration. However, whether or not addressable storage is used, the address of any part of an object declared with storage-class specifier `register` cannot be computed, either explicitly (by use of the unary `&` operator as discussed in 6.5.4.3) or implicitly (by converting an array name to a pointer as discussed in 6.3.3.1). ~~Thus, the only operator that can be applied to an array declared with storage-class specifier `register` is `sizeof` and the `typeof` operators.~~