2023-12-13
document number | date | comment |
---|---|---|
n3188 | 202312 | this paper, original proposal |
CC BY, see https://creativecommons.org/licenses/by/4.0
This paper builds on n3187 where we clarify the evaluation model of VM Types. In essence we reduce the evaluation that is necessary to obtain array length or size information to a simple lvalue conversion of a hidden state variable that holds the array length. Because that state variable can be assumed to be immutable (it doesn’t change during the lifetime of any instance of an object of VM type), such an evaluation is completely harmless and can be effected by the compiler whenever the corresponding object or type definition has been met.
The goal of this paper here is to allow user code to name that otherwise hidden state variable and use it directly to declare VM Types. By that we introduce a simple mechanism to track array length
and all of that without changing any function call ABI. The only new feature that has to be taught compilers is how to compute size information of arrays when the array length is not in a hidden state but at a specific location, that is either a specific variable (with automatic or static duration and with or without linkage) or a structure member at a specific offset from the array.
Note though, that even the existing model already has some subtlety that comes from the fact that for now these state variables are always assumed to be automatic. Although VM type declarators are limited to block scope, they have not necessarily automatic storage duration, pointers to VLA may be declared with static
. This opens the possibility to declare static objects that have a different type each time the definition is met. Even worse, if such an object is found in a recursive function, the same object can simultaneously have several types. Valid pointer values that point to the static VM but use these different types may be propagated along recursive calls and so the type-based aliasing rule might simply break down. So some care is in order to not make this situation even worse.
In this paper we so propose to add an annotation to bind an array length (constexpr
or const
) to a const
-qualified integer variable that holds the length. By allowing the identifier of such a variable to act as a forward declaration, this generalizes the current concept of incomplete arrays to VLA and allows to improve cross TU declarations of array objects.
To illustrate the simplicity of the proposed concept suppose that for C23 we have for example
in a header. Users of this header need implicit knowledge to compute the size of array
(which has an incomplete type) as sizeof(double[arrayLength])
, but nothing in the code indicates this relationship and no array length checking compiler can track this size and diagnose out-of-bounds accesses.
The defining translation unit in C23 could then look like this
#ifdef TOTO
# define MYDAILYDOSE 20
#else
# define MYDAILYDOSE 60
#endif
size_t const arrayLength = MYDAILYDOSE;
double array[MYDAILYDOSE] = {};
Which, again, has to ensure the implicit relationship between the value of arrayLength
and the effective length with which array
is defined. So this is a fragile relation that can only be ensured by documentation, and not by language constructs.
With our proposal this code becomes much simpler. In the header
Here, if arrayLength
has no other declaration it is implicitly declared to have type size_t const
and external linkage. Any user of this header can (in block scope) simply query for the size information by using sizeof array
much as for any other VLA. Note that thereby this sizeof
expression is not an integer constant expression, and indeed it can’t be, because the necessary information (the value of arrayLength
) is not available at translation time.
The defining TU now becomes
#ifdef TOTO
size_t const arrayLength = 20;
#else
size_t const arrayLength = 60;
#endif
double array[.arrayLength] = {};
To ensure that the array can always be constructed during translation (and not only during linking) we impose that arrayLength
must be located in the same TU as array
and that thus at the end of the translation of the TU, all information for the code generation of the array is present. In particular, the array is considered a VLA, and so only default initialization as above is allowed, we don’t have to deal with specific initialization at this point.
The proposal extends this simple idea to other usages of arrays, namely to
This new feature is intended to provide an easy to use tool that helps our users to annotate array declarations with array length information, that put simple sizeof
expressions at their disposal to access this information, and that also can be used by analyzing compilers to track out-of-bounds accesses to a much larger extend than is currently possible.
With this approach, it is also possible to improve a bit on the side of those C library interfaces that traditionally have integer parameters that encode array lengths but that come after these in the parameter list.
The syntax that we propose for an array length specifier is to have
in places where otherwise could be an expression in an array length declarator. This is much similar to member designators in initializers and should be easy to grasp for C programmers, even if they are not yet familiar with that notation. This notation has the advantage to disambiguate clearly from expression that would be evaluated. But any other notation that ensures this distinction would be fine.
All of this is meant that it does not introduce ABI changes. Code that doesn’t use these features can continue to interact with code that uses them. Even the new feature for static arrays that have lengths that are not integer constant expressions do not need much more overhead than current static arrays: the only novelty here is that they may take the length parameter from a static const
variable definition. (This might be novel to C, but is commonly used in C++).
Note also, that the imposed constraints for flexible array members are relatively rigid. In particular, to be used in the declaration of the flexible array member an integer member must be const
-qualified. Otherwise a reliable evaluation with sizeof
(and reliable static analysis) would not be possible and the undefined behavior would open a new attack surface. With the usual tools of the language it would even be difficult to create (or maintain) such structure objects: the const
-qualified integer member can never be written to. But this aspect of creating and reallocating structure objects with flexible array members and const
-qualified members is more general than that and independent of the proposal at hand; a solution that always views reallocated structures as new objects (similar as done by realloc
) is proposed in n3186.
The syntax should be added to array-declarator in 6.7.6.
direct-declarator [ type-qualifier-listopt . identifier ]
and to array-abstract-declarator in 6.7.7
direct-abstract-declaratoropt [ type-qualifier-listopt . identifier ]
Note that the syntax with a dot ensures that this construct cannot be confounded with expression syntax, and that this thereby does not constitute an evaluation of the given identifier. Other syntax might be possible for this feature, the important thing is that a distinction between the use of the identifier as a reference to the name of an object and to the use as an lvalue expression should always be easily detectable.
Modify 6.7.6.2 p1 in the “Constraints”, reorganization of the text into new paragraphs is left to the editors.
1 In addition to optional type qualifiers and the keyword
static
, the[
and]
may delimit an expression, a.
followed by an identifier (called a length designator), or*
. If there is neither an assignment expression nor a length designator, the array length specifier shall be the outermost length specifier in the declaration; such an array type without expression or length designator is incomplete and shall not be used as the element type of another array type.If they delimit an expression (which specifies the size of an array), the expression shall have an integer type.If the expression is a constant expression, it shall have a value greater than zero. If it is an expression and that expression is not an integer constant expression, the declaration shall not appear in file scope and not be a member declaration; if the declaration declares a variable length array it shall not have static or thread storage duration. The element type shall not be an incomplete type or function type. The optional type qualifiers and the keywordstatic
shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
2
LetIf an identifier is declared as having a variably modified type, it shall be an ordinary identifier (as defined in 6.2.3), have no linkage, and have either block scope or function prototype scope. If an identifier is declared to be an object with static or thread storage duration, it shall not have a variable length array type.ID
be an identifier that is used as a length designator in the declarator or abstract declarator of a typeXT
(including implicitly in the declaration of an object or the definition of type). IfID
has been declared prior toXT
, it shall have integer type. IfID
is not declared when it is used as a length designator but then later declared in the same scope or structure, the type shall besize_t
. If there is a visible declaration ofID
, it shall have aconst
-qualified type unless the declaration appears in function prototype scope. In any redeclaration ofXT
, the corresponding declarator or abstract declarator shall use a*
(if admited) or the same identifierID
;ID
shall refer to the same parameter in the parameter list, to the same structure member or to the same instance of the same object if it is another kind of integer object, respectively.
The following constraints apply to the lexical positions ofXT
and a declaration ofID
.
XT
shall not appear in the declaration of a union member.- If
XT
appears in the declaration of a structure member,ID
shall be a member of the declared structure type; if additionallyXT
appears in the declaration of an array member that member shall be a flexible array member.- If
XT
appears in the declaration of a parameter,ID
shall be the name of a parameter in the same parameter list.- If
XT
appears in file scope,ID
shall also be declared in file scope.- Otherwise, if
XT
is associated to a blockB
and the scope ofID
is associated to a block that containsB
, the definition ofID
shall be visible forXT
.If a declaration declares an identifier
A
and uses a length designatorID
: ifA
has linkage and there is any declaration ofID
it shall also have linkage; additionally, if the linkage ofA
is external the linkage ofID
shall also be external. If the declaration is the definition of an array with static storage duration,ID
shall also have static storage duration and be defined in the same translation unit asA
.
For the “Semantics” section
4 If the
sizearray length is not present, the array type is an incomplete type. Otherwise, if thesizelength is*
instead of being an expression, the array type is a variable length array type of unspecifiedsizelength, which can only be used in declarations or type names with function prototype scope172) and in the declarator of a function return type; such arrays are nonetheless complete types. If thesizelength is an integer constant expression or if the length designator isconstexpr
and if the element type has a known constant size, the array type is a constant length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)
5 If the
sizelength is an expression that is not an integer constant expression or it is a length designator that is not previously declared asconstexpr
: if it appears in a declaration at function prototype scope or in the return type of a function declaration that is not a definition, it is treated as if it were replaced by*
. Otherwise, each timeitan array length expression is evaluated it shall have a value greater than zero and if it is a length designator.ID
,ID
shall have value greater than or equal to zero. If an object or type with a length specifier with value zero is accessed the behavior is undefined. If a length designatorID
is not declared in the current scope it behaves as if a declaration within the same scope asA
as followsFNT1) is visible and the declaration ofA
is nonetheless complete.FNT2)
extern size_t const ID;
5¹The
sizelength of each instance of a variable length array type does not change during its lifetime.Where a size expression is part of the operand of a typeof orWhere asizeof
operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.sizelength expression is part of the operand of analignof
operator, that expression is not evaluated; if it is a length designator, that designator is ignored.
FNT1) All constraints for a subsequent declaration or definition of such an object
ID
apply.
FNT2) That is, in that case.ID
acts as the forward declaration of an object of typesize_t const
. To form a valid programID
has an explicit definition with an non-zero initializer in this or another translation unit. Although the size of the array type may not be known at translation time, the array type is still complete andsizeof(A)
is valid (in block scope) and possibly evaluatesID
and is thus not an integer constant expression. If the definition ofID
is located in a different translation unit and that translation unit has no compatible definition ofA
with length designatorID
, the behavior is undefined.
Amend EXAMPLE 2 by
The first declares
x
to be a pointer toint
; the second declaresy
to be an array ofint
of unknown size (an incomplete type), the storage for which is defined elsewhere. In contrast to that,z
has a complete variable length array type with a length designator as if declared as
extern size_t const n;
The translation unit that definesz
then also contains a file-scope definition ofn
that initializesn
to a positive value.
Here, the order of the two definitions may also be inverted. Even thoughn
is constant (and it is implementation-defined if the expressionn
is an integer constant expression)z
is a variable length array. An explicit element-wise initialization is a constraint violation. In block scope, the arrayz
has a less-constrained use thany
.
Replace Example 4
EXAMPLE 4 Objects declared with linkage that have a VM type declare the length designator explicitly and that length designator has static storage duration and also has linkage; if the linkage is external the linkage of the length designator is also external unless the length designator is also defined by the definition in which case it is fixed and has internal linkage. Variable length arrays (VLA) are more restricted because the definition needs to compute the total size when the lifetime of the instance of the VLA begins and eventually during the lifetime when requested with a sizeof
operator. Therefore the lifetime of any length designator (explicit or implicit) does not have a lifetime that is shorter than the lifetime of the VLA. Finally, all identifiers declared with a VM type that are structure members, have a length designator that is another member of the same structure; if additionally that member with length designator is an array, it is a flexible array member that possibly extends beyond the size of the structure. In the following, “invalid” indicates a constraint violation.
int B[100]; // valid, file scope but not VLA
extern int const n;
static int const j = 8;
static int H[.j]; // valid, internal linkage
int A1[n]; // invalid, not an integer constant expression
extern int (*p2)[n]; // invalid, not an integer constant expression
void fvla(int m, int C[m][m]); // valid, VLA with prototype scope
void fvlb(int[m][m], size_t m); // valid, VLA with prototype scope, misleading
void fvla(int m, int C[m][m]) // valid, C adjusted to auto pointer to VLA
{
if (m <= 0) unreachable(); // undefined behavior if m is not positive
typedef int T[m][m]; // valid, block scope typedef of VLA
struct tag {
int (*y)[n]; // invalid, not an integer constant expression
int z[n]; // invalid, not an integer constant expression
};
int D[m] = { }; // valid, auto VLA, initialized per default
static int E[m]; // invalid, not an integer constant expression
extern int F[m]; // invalid, not an integer constant expression
int (*s)[m]; // valid, auto pointer to VLA
extern int (*r)[m]; // invalid, not an integer constant expression
if (m > 100) unreachable();// undefined behavior, if m is too big
static int (*q)[m] = &B; // valid if m≤100, q has no linkage
}
int A2[.n]; // invalid, no definition of n in the TU
extern int A3[.n]; // valid, declaration of an array with external linkage
extern int (*p3)[.n]; // valid, pointer to array
void fvla(int m, int[.m][.m]); // valid, no need for const
void fvlb(int[.m][.m], size_t m); // valid, no need for const
void fvlb(int C[.m][.m], size_t const m) // valid, C adjusted to auto pointer to VLA
{
if (m == 0) unreachable(); // undefined behavior if m is not positive
typedef int T[.m][.m]; // valid, m has no linkage and automatic duration
struct tag {
int (*y)[.k]; // valid, y not an array
size_t const k;
int z[.n]; // invalid, n not a member
};
int A4[.n]; // valid, auto VLA object but static array length
int D[.m] = { }; // valid, auto VLA with auto array length, initialized per default
static int E[.m]; // invalid, static VLA definition with auto length
extern int F[.m]; // invalid, static VLA declaration, auto VLA type
extern int G[.j]; // invalid, G external and j internal linkage
extern int H[.j]; // valid, H and j have internal linkage
static size_t const i = 7;
extern int K[.i]; // invalid, i has no linkage
int (*s)[.m]; // valid, auto pointer to VLA
extern int (*r)[.m]; // invalid, static object, auto array length
extern int (*t)[.n]; // valid, local scope, static array length with external linkage
if (m > 100) unreachable();// undefined behavior, if m is too big
static int (*q)[.m] = &B; // valid if m≤100, q and m have no linkage
if (n > 100) unreachable();// undefined behavior, if n is too big
static int (*p)[.n] = &B; // valid if n≤100, q has no linkage, n extern
}
Since we also want to be able to treat flexible array members of structures as VLA, we have to enable these as members for them. Change 6.7.2.1:
11 A member of a
structure orunion may have any complete object type other than a variably modified type.145) In addition, a member may be declared to consist of a specified number of bits (including a sign bit, if any). Such a member is called a bit-field;146) its width is preceded by a colon.
Amend p20, flexible array members:
20 As a special case, the last member of a structure with more than one named member may have an incomplete array type or a variable length array type with a length specifier; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.
However,When a.
(or->
) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member:
- If the flexible array member has an incomplete array type, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed.
- If the flexible array member has a length designator
ID
for an integer memberID
, for each instance of the structure type the flexible array member is a variable length array with a length that is equal to the value of theID
member of that instance. The size of the underlying storage shall be at least the size of the structure and such that an array of the corresponding size fits at the offset of the flexible array member.
The offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements (because no element fits into the allocated space for the instance or because
ID
happens to be zero), it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
For VLA parameters that are adjusted to pointers, a length specifier plays a similar role to a length expression that has a static
specification. Namely it establishes a contract between the caller of the function and the function implementation about the minimal length of the array that is represented by the adjusted pointer argument.
6 A declaration of a parameter as “array of type” shall be adjusted to “qualified pointer to type”, where the type qualifiers (if any) are those specified within the
[
and]
of the array type derivation. If the length is given by a specifierID
or the keywordstatic
also appears within the[
and]
of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the value ofID
for the call or the value of the size expression.
A lot of C library interfaces receive length information through parameters that could profit from the new length specifier. It should be discussed on a per interface base which of these actually already have formulated restrictions and for which of these that would be a semantic change.
We propose to make these annotations always via the length specifier. Doing it for the few cases where it would be possible to use a [static]
annotation would have issues for backwards compatibility and for compatibility with C++.
The length parameters here don’t need to be const
-qualified, because these are not the definitions but only declarations. Implementations would have to change their function definitions with const
to be able to track the length internally for the purpose of the function in question itself.
Note also:
rsize_t
only work here because, per definition, rsize_t
is the same type as size_t
.[static n]
construct, a call for a function with a [.n]
parameter where n
happens to be zero is defined, as long as the rewritten pointer parameter is not accessed. In consequence, functions that allow zero sizes to indicate special circumstances only have to test n
for non-zero before any access to the rewritten array parameter. Thus, the proposed prototype for snprintf
below has the same semantics as for C23.int snprintf(char s[restrict .n], size_t n, const char * restrict format, ...);
int vsnprintf(char s[restrict .n], size_t n, const char * restrict format, va_list arg);
int snprintf_s(char s[restrict .n], rsize_t n, const char * restrict format, ...);
int sprintf_s(char s[restrict .n], rsize_t n, const char * restrict format, ...);
int vsnprintf_s(char s[restrict .n], rsize_t n, const char *restrict format, va_list arg);
int vsprintf_s(char s[restrict .n], rsize_t n, const char * restrict format, va_list arg);
char *gets_s(char s[.n], rsize_t n);
int strfromd(char s[restrict .n], size_t n, const char *restrict format, double fp);
int strfromf(char s[restrict .n], size_t n, const char *restrict format, float fp);
int strfroml(char s[restrict .n], size_t n, const char *restrict format, long double fp);
int mblen(const char s[.n], size_t n);
int mbtowc(wchar_t * restrict pwc, const char s[restrict .n], size_t n);
size_t mbstowcs(wchar_t * restrict pwcs, const char s[restrict .n], size_t n);
size_t wcstombs(char * restrict s, const wchar_t pwcs[restrict .n], size_t n);
int strfromd32(char s[restrict .n], size_t n, const char*restrict format, _Decimal32 fp);
int strfromd64(char s[restrict .n], size_t n, const char*restrict format, _Decimal64 fp);
int strfromd128(char s[restrict .n], size_t n, const char*restrict format, _Decimal128 fp);
char *strncpy(char s1[restrict .n], const char s2[restrict .n], size_t n);
char *strncat(char s1[restrict .n], const char s2[restrict .n], size_t n);
int strncmp(const char s1[.n], const char s2[.n], size_t n);
size_t strxfrm(char s1[restrict .n], const char s2[restrict .n], size_t n);
errno_t strncpy_s(char pwcs[restrict .s1max], rsize_t s1max, const char s2[restrict .n], rsize_t n);
errno_t strncat_s(char pwcs[restrict .s1max], rsize_t s1max, const char s2[restrict .n], rsize_t n);
size_t mbrtoc8(char8_t * restrict pc8, const char s[restrict .n], size_t n, mbstate_t * restrict ps);
size_t mbrtoc16(char16_t * restrict pc16, const char s[restrict .n], size_t n, mbstate_t * restrict ps);
size_t mbrtoc32(char32_t * restrict pc32, const char s[restrict .n], size_t n, mbstate_t * restrict ps);
int swprintf(wchar_t s[restrict .n], size_t n, const wchar_t * restrict format, ...);
int vswprintf(wchar_t s[restrict .n], size_t n, const wchar_t * restrict format, va_list arg);
wchar_t *wcsncpy(wchar_t* restrict s1, const wchar_t s2[restrict .n], size_t n);
wchar_t *wmemcpy(wchar_t* restrict s1, const wchar_t s2[restrict .n], size_t n);
wchar_t *wmemmove(wchar_t *s1, const wchar_t s2[.n], size_t n);
wchar_t *wcsncat(wchar_t* restrict s1, const wchar_t s2[restrict .n], size_t n);
int wcsncmp(const wchar_t *s1, const wchar_t s2[.n], size_t n);
size_t wcsxfrm(wchar_t * restrict s1, const wchar_t s2[restrict .n], size_t n);
int wmemcmp(const wchar_t *s1, const wchar_t s2[.n], size_t n);
QWchar_t *wmemchr(QWchar_t s[.n], wchar_t c, size_t n);
wchar_t *wmemset(wchar_t s[.n], wchar_t c, size_t n);
size_t mbrlen(const char s[restrict .n], size_t n, mbstate_t * restrict ps);
size_t mbrtowc(wchar_t * restrict pwc, const char s[restrict .n], size_t n, mbstate_t * restrict ps);
int snwprintf_s(wchar_t s[restrict .n], rsize_t n, const wchar_t * restrict format, ...);
int swprintf_s(wchar_t s[restrict .n], rsize_t n, const wchar_t * restrict format, ...);
int vsnwprintf_s(wchar_t s[restrict .n], rsize_t n, const wchar_t *restrict format, va_list arg);
int vswprintf_s(wchar_t s[restrict .n], rsize_t n, const wchar_t *restrict format, va_list arg);
errno_t wcsncpy_s(wchar_t pwcs[restrict .s1max], rsize_t s1max, const wchar_t s2[restrict .n], rsize_t n);
errno_t wmemcpy_s(wchar_t pwcs[restrict .s1max], rsize_t s1max, const wchar_t s2[restrict .n], rsize_t n);
errno_t wcsncat_s(wchar_t pwcs[restrict .s1max], rsize_t s1max, const wchar_t s2[restrict .n], rsize_t n);