2024-10-14
integration into IS ISO/IEC 9899:202y
document number | date | comment |
---|---|---|
n3311 | 202408 | Original proposal |
n3335 | 202409 | Fixes to the constraint based on the length of the array. Added the informal proposal for discarded code. Introduced the terms top level fixed/variable length array. |
n3352 | 202409 | Adds the [] operator to compound literals. Tweaks the wording of the unary & operator. Modifies the condition on when the index can equal the length of the array. |
n3360 | 202409 | Adds the [] operator to named constants. Puts the upper bound on the length of the array on named and compound literal constants. |
n3380 | 202410 | Do not remove index[array] yet, but deprecate it. Do not deal with named constants yet. |
question | yes | no | abstain | result |
---|---|---|---|---|
Does WG14 want something along the lines of N3360 into C2y? | 12 | 2 | 5 | direction |
Does WG14 object to breaking index[array] as in N3360? | 8 | 5 | 6 | no consensus |
Does WG14 want to deprecate index[array] as in N3360 in C2y? | 10 | 1 | 8 | direction |
Does WG14 want to make a constraint violation out of negative integer constant expressions used as subscripts of an array (not a pointer) as in n3360 in C2y? | 14 | 1 | 5 | direction |
CC BY, see https://creativecommons.org/licenses/by/4.0
Traditionally, the definition of array subscripting goes through conversion of the array to a pointer. Thus, E[m] is defined as (*((E)+(m))), where E is converted to a pointer to its first element. At first sight it may seem there is no semantic difference between this and saying that “E[m] denotes the m-th element of the array”. But there is one: the paragraph on conversion of an array to a pointer says
Except when […], an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Therefore, subscripting an array precludes its declaration with register. This seems an artificial restriction, existing only because of the way E[m] is defined. Implementations such as gcc have lifted this restriction for decades.
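The traditional definition can be sketched in a few lines. This is only an illustration under our own assumptions: the function name third_element and the sample values are invented here. It shows that the subscript form, the pointer form, and the index[array] form all designate the same element in C as it stands today.

```c
#include <assert.h>

/* Hypothetical helper: reads the same element through three spellings. */
int third_element(void)
{
    int a[4] = {10, 20, 30, 40};   /* sample data, chosen for illustration */

    int by_subscript = a[2];       /* conventional form */
    int by_pointer   = *(a + 2);   /* the form used by the traditional definition */
    int by_swapped   = 2[a];       /* index[array]: valid today, proposed as obsolescent */

    /* All three name the same object, so the addresses and values agree. */
    assert(&a[2] == a + 2);
    assert(by_subscript == by_pointer && by_pointer == by_swapped);
    return by_subscript;
}
```

Note that a is not declared register here; with the current wording that would make the conversion to a pointer undefined.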
It was also noted that the expression *(E+m) produces an lvalue in instances where a non-lvalue would be expected for E[m], as in
struct {const int i; int arr[1];} func();
func().i; // not lvalue
func().arr[0]; // equivalent to the following
*(func().arr+0); // non-const, immutable lvalue with temporary lifetime
This paper changes this, so the provision of the standard that introduces a temporary object for values with an array member will only trigger in the rare case that the array member is used where a pointer value is expected. With the proposed changes we have
struct {const int i; int arr[1];} func();
func().i; // not lvalue, no temporary object instantiated
func().arr[0]; // not lvalue, no temporary object instantiated
*(func().arr+0); // non-const, immutable lvalue with temporary lifetime
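A compilable variant of the snippet above may help to see what a program can do with such a value today. The names struct pair, make_pair and read_member_of_result are our own and stand in for the anonymous struct and func; the sketch assumes a C11 or later compiler, where the returned value has temporary lifetime.

```c
#include <assert.h>

/* Hypothetical stand-in for the anonymous struct of the example. */
struct pair { const int i; int arr[1]; };

static struct pair make_pair(void)
{
    struct pair p = { .i = 1, .arr = { 42 } };
    return p;
}

int read_member_of_result(void)
{
    /* Reading an array element of the returned value is valid;
       both spellings yield the same value. */
    int v = make_pair().arr[0];
    assert(v == *(make_pair().arr + 0));
    return v;
}
```

Only reads are shown: attempting to modify the temporary object through the second spelling is undefined.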
A similar restriction is present for the use as integer constant expressions (ICE). Consider
This code is not valid because x[1] is not an integer constant expression. Since the whole object x is a constant expression, it seems each of its members should be usable as ICE. Previous versions of this paper tried to change this, but now we delay this to a follow-up paper.
Other problems arose when studying the extension of the subscripting operation to allow range selections.
In an expression E[M], where E is an array, we propose that M be ≥ 0, and that this be a constraint if M is an integer constant expression. This is not imposed if E has pointer type, neither as a constraint nor as UB. In particular, for a pointer p the common idiom p[-1] remains valid. If E is an array this was already invalid since, from the definitions of arrays in C, the element E[-1] does not exist, even if E were to decay to a pointer. Thus:
int A[3][3];
A[1][-1]; // A[1] is an array of three elements; say, B. B[-1] does not exist.
// In *(B-1), the pointer B (once B decays to a pointer) points to
// the first element of an array of 3 elements. B-1 is not a valid pointer.
// Hence, *(A[1]-1), hence A[1][-1], has UB as per the current standard.
An implementation may very well define this behavior and allow it. More likely, an implementation may just compute the address A[1]-1 and have this code work without defining the behavior (it just works). Programmers relying on that can rewrite A[1][-1] (which henceforth will raise a diagnostic) to *(A[1]-1) in order to avoid the diagnostic.
Nevertheless, it is unclear to us whether optimizers exploit this UB to make assumptions about subscripts, so relying on it for arrays is inherently dangerous. Therefore we propose to promote this from UB to a constraint in situations that are easily detectable at translation time, namely when the subscript is an ICE.
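The distinction between pointers and arrays can be illustrated with a short sketch. The function name previous_element and the data are assumptions made for this illustration only. A negative subscript applied to a pointer that points into the middle of an array stays valid, whereas the same subscript applied to the array itself would name an element that does not exist.

```c
#include <assert.h>

/* Hypothetical helper: uses the p[-1] idiom on a pointer, not on an array. */
int previous_element(void)
{
    int a[3] = {10, 20, 30};
    int *p = &a[1];            /* p points into the middle of a */

    /* p has pointer type, so p[-1] is *(p - 1), which designates a[0]. */
    assert(&p[-1] == &a[0]);

    /* By contrast, a[-1] has an array operand and a negative ICE as
       subscript; it is deliberately not written here. */
    return p[-1];
}
```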
The following code is valid today and would require a diagnostic if the constraint is introduced also for an out-of-bounds access beyond the array length:
This macro is superseded by the constraint, but is still useful for VLAs. A user of this macro would have to replace all uses for fixed length arrays by a direct access, or change the macro so that the subscript is not an ICE even if x is.
Here the out-of-bounds index is valid because the expression is not evaluated. However, writing “if the expression is evaluated” in the constraint is not adequate, because being evaluated is a runtime property. There are contexts where the expression is already known not to be evaluated at translation time. It is for these contexts that an exception can be made in the constraint. However, this concept is not developed in the standard, and its introduction falls outside the scope of this proposal.
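The kind of code at stake can be sketched as follows. The macros ARRAY_LENGTH and ELEMENT_OR and the function checked_reads are hypothetical names invented for this illustration; they are not taken from this paper. The sketch assumes a bounds-checked access in which an out-of-bounds subscript appears textually but sits in a branch that is never evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros, introduced only for this illustration. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))
#define ELEMENT_OR(a, i, fallback) \
    ((size_t)(i) < ARRAY_LENGTH(a) ? (a)[i] : (fallback))

int checked_reads(void)
{
    int a[3] = {10, 20, 30};

    int in_range = ELEMENT_OR(a, 1, -1);      /* evaluates a[1] */
    int out_of_range = ELEMENT_OR(a, 5, -1);  /* a[5] appears but is never evaluated */

    assert(in_range == 20);
    return out_of_range;
}
```

The second use is valid today because the subscript expression lies in the branch that is not taken. A constraint on subscripts beyond the array length would require a diagnostic for it, since the subscript 5 is an ICE.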
New text is underlined green, removed text is struck out in red. Possible reorganization of the paragraphs is left to the discretion of the editors.
6.3.2.1 Lvalues, arrays, and function designators
3 Except when it is the operand of the sizeof operator, or typeof operators, or the unary & operator, or one of the two expressions of an array subscripting operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is implementation-defined.
6.5.3 Postfix operators
6.5.3.2 Array subscripting
Description
1 A postfix expression followed by an expression in square brackets [ ] is a subscripted designation of an element of an array. The use of this operator with the postfix expression of integer type is an obsolescent feature.
Constraints
2 One of the expressions shall have type “pointer to complete object type” or “array of type”, the other expression, called the subscript, shall have integer type, and the result has type “type”. If one of the two expressions has array type and the subscript is an integer constant expression, the value of the latter shall not be negative.
Semantics
3 [removed: A postfix expression followed by an expression in square brackets [ ] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).] If either expression has pointer type the expression E1[E2] is equivalent to *((E1)+(E2)) and is an lvalue. Otherwise, let E be the expression of array type and let m be the value of the subscript; the array subscript expression designates the m-th element of the array designated by E, counting from zero, it is an lvalue if E is an lvalue, and m shall not be negative and shall be less than the length of the array or equal to it; it can only equal the length of the array if the [] operator is followed by zero or more [] operators with subscripts equal to zero and the resulting postfix expression is the operand of the unary & operator or is converted to an expression with pointer type as described in 6.3.2.1.
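The case where the subscript equals the length of the array can be illustrated by the familiar end-pointer idiom. This is a minimal sketch under our own assumptions; the function name sum_with_end_pointer is invented for it. The subscript 3 equals the length of a, and the resulting postfix expression is the operand of the unary & operator, so no element is accessed.

```c
#include <assert.h>

/* Hypothetical helper: iterates up to a one-past-the-end address. */
int sum_with_end_pointer(void)
{
    int a[3] = {1, 2, 3};
    int *end = &a[3];          /* subscript equals the length; operand of unary & */
    int sum = 0;

    for (int *p = a; p != end; ++p)
        sum += *p;             /* end itself is never dereferenced */

    assert(end - a == 3);
    return sum;
}
```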
4 Successive subscript operators designate an element of a multidimensional array [removed: object]. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ⋯ × k, then E[N] [removed: (used as other than an lvalue) is converted to a pointer to] denotes an (n − 1)-dimensional array with dimensions j × ⋯ × k. [removed: If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue.] It follows from this that arrays are stored in row-major order (last subscript varies fastest).
5 EXAMPLE The following snippet has an array object defined by the declaration:
int x[3][5];
Here x is a 3 × 5 array of objects of type int; more precisely, x is an array of three element objects, each of which is an array of five objects of type int. [removed: In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five objects of type int. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five objects of type int. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the objects of type int, so x[i][j] yields an int.] The expression x[1] designates the second element of array x, which is itself an array of five objects of type int. Then x[1][2] designates the third element thereof, which is an int. It is the 7-th stored element of the two-dimensional array x (counting from 0).
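The position claimed in the example can be checked with a small sketch. The function name stored_position is an assumption of this illustration. It measures the byte distance of x[1][2] from the start of x and divides by the size of an int, which should give 1 × 5 + 2 under row-major storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: position of x[1][2] among the stored int objects. */
size_t stored_position(void)
{
    int x[3][5] = {{0}};

    /* Byte offset from the start of x, converted to a count of int objects. */
    size_t offset = (size_t)((char *)&x[1][2] - (char *)x) / sizeof(int);

    assert(offset == 1 * 5 + 2);
    return offset;
}
```

Character pointers are used for the subtraction so that the arithmetic stays within the single object x rather than crossing between its rows.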
6.5.4 Unary operators
6.5.4.3 Address and indirection operators
neither the & operator [removed: nor the unary *] nor the access to the value that is implied by the [] is evaluated
6.7.2 Storage-class specifiers
Remove the last sentence in the following footnote
127) The implementation can treat any register declaration simply as an auto declaration. However, whether or not addressable storage is used, the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.4.3) or implicitly (by converting an array name to a pointer as discussed in 6.3.3.1). [removed: Thus, the only operator that can be applied to an array declared with storage-class specifier register is sizeof and the typeof operators.]
6.11 Future language directions
6.11.4a Postfix operators
The use of the array subscripting operator with a postfix expression of integer type followed by an expression of pointer or array type enclosed in square brackets is an obsolescent feature.
We’d like to thank Martin Uecker and Joseph Myers for their suggestions.