N3441: _Generic and VLA Realignment and Improvement

1. Changelog

1.1. Revision 1 - December 23rd, 2024

Rework due to direction from September meeting.
Expansion on incomplete types and the powers granted therein.
- _Generic uses the same syntax.
- Cannot complete variable length arrays (what does that even mean for initialization??).
- _BitInt()/signed _BitInt()/unsigned _BitInt() becomes the natural syntax for an incomplete _BitInt type.
_Generic uses a new way to match types for generic associations, called the "associated type".
- Incomplete _BitInt matches all bit-precise integer types (of the same signedness).
- Arrays have more powerful associations than they do under "compatible" so they can provide significant, safe differentiation without any undefined behavior.
Parameters cannot have any of these powers. Unfortunately.
- The body of pre-existing function calls using [] and [N] is massive; we cannot possibly provide better matching/meaning under these circumstances.
- Undefined behavior must remain to allow for size mismatches during run-time. Lying about the size of storage is, unfortunately, a very common C pastime.
- Blame K&R and, subsequently, everyone else after them for not fixing how arrays are handled as parameters or returns 🤷‍♀️.

1.2. Revision 0 - September 9th, 2024

Initial release. ✨

2. Introduction and Motivation

There are several strange hiccups and problems with _Generic as it concerns constant sized arrays, variable length arrays, _BitInt, and other (potentially new) feature sets. Aaron Ballman’s [N3260] provided a way to do direct type matching, which eliminated some of these concerns when using a type as the controlling indicator for generic selection. But this was not enough to eliminate the few situations where implementation quirks are exposed, nor did it address the inadequacy of type compatibility as the matching rule for the compile-time, zero run-time cost feature that _Generic was meant to be.

2.1. Unusual Array Behavior

Consider the following:

int main () {
  int arr[10] = {};
  int result = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  return result;
}

This works just fine: constant sized arrays are considered compatible only with arrays of the same type and constant size. The above program compiles, runs, returns 0 dependably, and exits. Consider the same program, however, with a variable length array for the controlling type of the generic selection expression:

int main () {
  int n = 20;
  int vla[n] = {};
  int result = _Generic(typeof(vla),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  return result;
}

This program simply does not compile. Every non-default branch of this generic selection is considered a match, because every variable length array is compatible with every other kind of array insofar as the compatibility rules are concerned. This provokes a constraint violation, in that only one branch of a generic selection expression may match the controlling type (or none, in which case there must be a default branch).

[N3290] exacerbates this problem: it not only leaves the compatibility rules around this matter unresolved, but also introduces variable length array types as a plausible branch for generic selection by stripping the constraint out:

int main () {
  int n = 20;
  int vla[n] = {};
  int result = _Generic(typeof(vla),
    int[10]: 0,
    int[11]: 1,
    int[ n] : 2, // VLA matches?
    default: 3
  );
  return result;
}

Unfortunately, this too results in the same problem: all of these branches are considered compatible with one another under the changes and direction that [N3290] presents. Even if one went back to matching on constant sized arrays for the controlling type, this code would still not compile because the VLA branch is considered compatible with all of the other type branches in the list: the compiler would reject the code still as no two generic selection branches may contain a compatible type, either:

int main () {
  int n = 20;
  int arr[20] = {};
  int result = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[ n] : 2, // compiler error: non-distinct branch from
                 // every other generic selection branch
    default: 3
  );
  return result;
}

This continues to deteriorate when using int[] to match "any-sized integer array" and the proposed int[*]; both of them are compatible with one another and they both match on arrays that are either variable length arrays or constant sized arrays. Nominally, this might not be a problem, except there is a further issue: the compatibility rules themselves have Bad Behavior on them even if you strip out all of the compatible match branches and only have one:

int main () {
  int n = 10;
  int arr[n] = {}; // variable length array
  int result = _Generic(typeof(arr),
    int[11] : 0, // this matches for some reason???
    default: 1
  );
  return result;
}

This program returns 0, which makes no sense. The sizes do not match, but because we defined this in terms of compatibility (and all constant sized and variable length arrays are compatible with one another) we have introduced undefined behavior here. Worse, this gives the impression that the array has the size 11 when it clearly does not. This is easily spotted in these simple toy programs, but is far harder to spot when applied to larger programs with much more complex control flow and no ahead-of-time knowable constant values.
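
As a minimal sketch of the compatibility hole described above (the function names are hypothetical and not taken from any implementation), the following compiles today without any required diagnostic because int[n] and int[10] are compatible types. It assumes an implementation that supports variable length arrays, and it only stays within defined behavior because the caller happens to pass n == 10:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the implementation supports VLAs (__STDC_NO_VLA__ is not defined). */
size_t elements_seen_as_ten(size_t n, int (*vla)[n]) {
  /* int[n] is compatible with int[10], so this initialization is accepted. */
  int (*fixed)[10] = vla; /* undefined behavior if n != 10 at run time */
  return sizeof(*fixed) / sizeof((*fixed)[0]);
}

/* Hypothetical caller that keeps the sizes in agreement. */
size_t demo_matching_size(void) {
  int storage[10] = {0};
  return elements_seen_as_ten(10, &storage);
}
```

Passing any other value for n would still be accepted by the compiler, which is the kind of silent mismatch this section is concerned with.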

[N3290] makes this situation worse by allowing variable length arrays to be put inside of _Generic as well, leading to a situation where variable length arrays can easily match array types that are not the same.

int main () {
  int n = 10;
  int arr[20] = {};
  int result = _Generic(typeof(arr),
    int[n] : 0, // fixed-size arrays now match against any variable length array
    default: 1
  );
  return result;
}

All in all, this user experience can be misleading and sets programmers up for failure from the holes in the type compatibility. [N3290] provides multi-dimensional matching but does nothing to actually improve the situation with regards to compatibility, and standardizes adding variable length arrays to more places without consideration for either the original feature motivation (a compile-time selection criterion that carries no run-time cost) or the apparent failure modes.

It is costly to C as a whole to add features "just because the syntax should work" when those features come with undefined behavior and have questionable behaviors to start with. Critically, these features can be added to C with just a little bit more care that would prevent or outright eliminate the vast majority of these clear code violations. C may be a language where you can do "whatever needs to be done", but there is no reason to greenlight designs which do not improve the situation and expand the domain of possible misleading code.

We should focus on improving type compatibility in general, for initialization, and for _Generic.

2.2. Implementation Quirks from Complex Expressions and `_Generic(type-name, ...)`

This program produces the same constraint violation on all platforms:

typedef struct meow { const int i; } meow;

static_assert(_Generic((meow){0}.i, const int: 1, int: 0), "what in the world?!");

The following snippet produces different results in GCC versus Clang:

typedef struct meow { const int i; } meow;

static_assert(_Generic(typeof((meow){0}.i), const int: 1, int: 0), "what in the world?!");

This will trigger a constraint violation (an error) on some platforms, while letting translation (typical compilation) proceed just fine on others. But it’s hard to know that: first we have to check "am I using the right kind of matching syntax?". Then, we have to check "is it returning the answer I expect for this?". While there’s a "what actually is the type of this?" question for GCC, Clang, and other vendors under the C standard, there is an interesting background issue shown by this: type-based matching exposes differences between implementations. While the code in this case produces a (very loud) compilation error, there is other code with _Generic that will simply silently choose the wrong function designator, or produce the wrong result.

This also affects arrays, for which the type-based matching has stronger and better powers than the expression-based matching. For example, this code produces a constraint violation:

int main () {
  int arr[20] = {};
  int result = _Generic(arr,
    int[]: 0,
    default: 1
  );
  return result;
}

But this code does not:

int main () {
  int arr[20] = {};
  int result = _Generic(typeof(arr),
    int[]: 0,
    default: 1
  );
  return result;
}

Of course, as [N3290] notes, this does not extend to multiple dimensions (which is the core point of [N3290] before it tackles the problem of VLAs):

int main () {
  int multi_arr[20][10] = {};
  int result = _Generic(typeof(multi_arr),
    int[][]: 0, // array of incomplete type, immediate error
    default: 1
  );
  return result;
}

This incongruence -- especially in the face of arrays -- is not a complete design. Aaron Ballman’s [N3260] being accepted into C2y drastically and materially improved the situation around this area by granting greater control and power, but more tweaks are needed to make the behavior consistent, usable, and fully powerful for both ends of the behavior.

The answer to this question changing based on which form of _Generic matching is deployed has turned the feature incongruent; the underlying lack of synchronization between implementations is an important issue but not one we are tackling in this paper. The simple contention is that this is something that exposed how much the feature is in need of harmonization, alongside all of the other observed issues.

Therefore, we propose a general overhaul and a new phrase-of-power that we are going to term translation compatibility, that would be applied to both type-style generic selection and expression-style generic selection. The specification would aim to both enhance and clarify all of the cases, while enabling variable length array matching and multidimensional array matching without adding new ways to invoke undefined behavior.

3. Design

There are, effectively, two (or three) distinct root problems identified in the source code examples above:

incomplete types for array dimensions, and the completion rules around them, are problematic in a wide variety of cases where it would make sense;
_Generic’s rules for matching on arrays using type compatibility are insufficient for a translation-time (compile-time)-driven feature, which is what necessitated Aaron Ballman’s _Generic changes previously;
and, the array dimension ABI/API rules for parameters do not match expectations.

The larger, more holistic solution here is, then, two-fold:

rework the definition of incomplete array types in C into an advanced terminology to allow for multiple dimensions to be completed at once for both VLAs and constant sized arrays;
and, provide a way to match on constant sized arrays (T[N]), incomplete arrays (T[]), and "variably incomplete arrays" (T[*]) in both _Generic and during initialization.

These two steps bring us to a place that allows both T[20][] and T[][20] as array types wherein both can be completed. It also provides a framework in which we can provide a similar syntax for other types that need parameterization over a size, e.g. _BitInt(N) and -- perhaps in a future -- _Vec(T, N) or other size-variable types. That is, both _BitInt() and _Vec(T) can be considered incomplete types similar to T[] and T[][]: the syntax for this would be _BitInt() and _Vec(float) (with no integer constant expression within).

Unfortunately, absolutely nothing can be done for parameters due to the ABI constraints (i.e., that every parameter is a pointer). This means that the changes we make here cannot be applied fully globally; the only effect of the wording on parameters will be to allow T[][] as a parameter type (previously it was not allowed).
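
The parameter adjustment behind this constraint can be observed in today's C. The sketch below uses hypothetical function names and assumes only C11 _Generic and static_assert; it shows that the declared array bound of a parameter leaves no trace in the function's type:

```c
#include <assert.h>

/* Hypothetical declarations; no definitions are needed because _Generic
   does not evaluate its controlling expression. */
void takes_ten(int values[10]);
void takes_twenty(int values[20]);
void takes_unsized(int values[]);

/* 1 if FUNC has the type "function taking int * and returning void". */
#define TAKES_INT_POINTER(FUNC) \
  _Generic(&(FUNC), void (*)(int *): 1, default: 0)

static_assert(TAKES_INT_POINTER(takes_ten), "int[10] is adjusted to int *");
static_assert(TAKES_INT_POINTER(takes_twenty), "int[20] is adjusted to int *");
static_assert(TAKES_INT_POINTER(takes_unsized), "int[] is adjusted to int *");
```

Since all three declarations produce the same function type, any matching rule keyed on the written bound would have nothing to inspect once the declaration has been processed.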

3.1. Why `_BitInt`??

Allowing us to be generic over all _BitInt of signed or unsigned type means we can write a library version to find the number of bits in a bit-precise integer type:

#define BITINT_WIDTH_CHARBIT(A, S) _Generic(typeof(A),     \
  _BitInt(S * (CHAR_BIT) + 0): S * (CHAR_BIT) + 0,         \
  _BitInt(S * (CHAR_BIT) + 1): S * (CHAR_BIT) + 1,         \
  _BitInt(S * (CHAR_BIT) + 2): S * (CHAR_BIT) + 2,         \
  _BitInt(S * (CHAR_BIT) + 3): S * (CHAR_BIT) + 3,         \
  _BitInt(S * (CHAR_BIT) + 4): S * (CHAR_BIT) + 4,         \
  _BitInt(S * (CHAR_BIT) + 5): S * (CHAR_BIT) + 5,         \
  _BitInt(S * (CHAR_BIT) + 6): S * (CHAR_BIT) + 6,         \
  _BitInt(S * (CHAR_BIT) + 7): S * (CHAR_BIT) + 7,         \
  unsigned _BitInt(S * (CHAR_BIT) + 0): S * (CHAR_BIT) + 0,\
  unsigned _BitInt(S * (CHAR_BIT) + 1): S * (CHAR_BIT) + 1,\
  unsigned _BitInt(S * (CHAR_BIT) + 2): S * (CHAR_BIT) + 2,\
  unsigned _BitInt(S * (CHAR_BIT) + 3): S * (CHAR_BIT) + 3,\
  unsigned _BitInt(S * (CHAR_BIT) + 4): S * (CHAR_BIT) + 4,\
  unsigned _BitInt(S * (CHAR_BIT) + 5): S * (CHAR_BIT) + 5,\
  unsigned _BitInt(S * (CHAR_BIT) + 6): S * (CHAR_BIT) + 6,\
  unsigned _BitInt(S * (CHAR_BIT) + 7): S * (CHAR_BIT) + 7 \
)


#define BITINT_WIDTH(...) _Generic(typeof(__VA_ARGS__),               \
  _BitInt(): BITINT_WIDTH_CHARBIT(typeof(__VA_ARGS__),                \
    (sizeof(__VA_ARGS__) - 1)),                                       \
  unsigned _BitInt(): BITINT_WIDTH_CHARBIT(typeof(__VA_ARGS__),       \
    (sizeof(__VA_ARGS__) - 1))                                        \
)

This is not fully complete, as e.g. the size of a _BitInt(...) is not mandated to be exactly filled to the CHAR_BIT boundary. But, this is a decent start to getting a macro going, and further tweaking would allow for a greatly improved experience (though this might be something better suited for a built-in).

This would also allow an implementation that wanted to match all _BitInts but only select a certain criterion to do so, which might aid in the expansion of the <stdbit.h> utilities.

3.2. Initialization Changes

The core changes affect initialization by allowing for an array of incomplete types to be formed (the new type formed will also be an incomplete type rather than a constraint violation). It will allow the following kinds of code and also comes with general principles that can be extended to future code:

int a[] = {1, 2, 3};                 // int[3] gets completed, works today with no changes
int b[][] = {{1,2}, {2,3}, {3,4}};   // int[3][2] gets completed, this proposal
int c[][3] = {{1,2}, {2,3}, {3,4}};  // int[3][3] gets completed, works today
                                     // (extra entries {}-init)
int d[2][] = {{1,2,3}, {2,3,4}};     // int[2][3] gets completed, this proposal
int e[3][] = {{1,2}, {2,3}, {3,4}};  // int[3][2] gets completed, this proposal
int f[3][4] = {{1,2}, {2,3}, {3,4}}; // int[3][4], works today with no changes
_BitInt(3) abi = 2wb; // _BitInt(3), works today with no changes
_BitInt() bbi = 2wb; // _BitInt(3), completed (2wb has type _BitInt(3))

// NOT included in this proposal:
// hypothetical future feature integrations, if made available
_Vec(float, 3) va = {2.0f, 3.0f, 4.0f}; // _Vec(float, 3) potential future extension
_Vec(float) vb = {2.0f, 3.0f, 4.0f};    // _Vec(float, 3) potential future extension
_Vec(float, 4) vc = {2.0f, 3.0f, 4.0f}; // _Vec(float, 4) potential future extension
_Vec(float, *) v_sva = { some_vla };    // _Vec(float, runtime-n) potential future extension
                                        // for Scalable Vector Extensions (SVE)
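
For reference, the completion behavior that already exists today can be checked at translation time. This is a minimal sketch assuming only C11 static_assert; the object and macro names are hypothetical. The outermost bound is taken from the number of initializers, and any inner elements without an initializer are zero-initialized:

```c
#include <assert.h>

int one_dim[] = {1, 2, 3};                   /* completed to int[3] */
int two_dim[][3] = {{1, 2}, {2, 3}, {3, 4}}; /* completed to int[3][3] */

/* Element counts of the outermost and the next-inner dimension. */
#define OUTER_COUNT(ARR) (sizeof(ARR) / sizeof((ARR)[0]))
#define INNER_COUNT(ARR) (sizeof((ARR)[0]) / sizeof((ARR)[0][0]))

static_assert(OUTER_COUNT(one_dim) == 3, "three initializers, three elements");
static_assert(OUTER_COUNT(two_dim) == 3, "three brace-enclosed rows");
static_assert(INNER_COUNT(two_dim) == 3, "inner bound comes from the declarator");
```

The proposal extends this existing mechanism so that more than one missing bound can be deduced from the same initializer.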

There is the question of allowing T[*] for initialization, as with the _Generic changes for variable length arrays. The problem is that there’s no meaningful definition for it: the point of T[*] is to let the initializer define the size of the variable length array. But there’s absolutely no scenario in which this makes sense:

extern int n;
extern int m;

int arr0[n];
int arr1[m];

int two_vlas[*][] = { arr0, arr1 }; // ???

Does this copy two variable length arrays into two_vlas, or is it just a pointer copy? This is part of the problem with array parameters, array returns, and similar: C does not have a consistent or reasonable definition for this syntax and its definitions. There is one place where this can help immensely, however, and that’s variable length array pointers (e.g., variably modified types).

3.2.1. Initialization for Pointer Declarators

A possible way to extend just array completion is to also perform array completion within pointers. This is particularly helpful for variable length arrays and variably modified types, where expressions for doing simple tasks can become very involved:

#include <stdlib.h>

const char (*substr(size_t n, const char (*p)[n],
  size_t start, size_t end))[n > end - start ? end - start : n];

extern size_t m;

int main (int argc, const char* argv[]) {
  const char str[m] = {};
  // initialize str
  // ...

  // [*] is completed to the appropriate size
  const char (*sub)[*] = substr(sizeof(str), &str, 4, m - 4);

  return 0;
}

Trying to figure out the right way to calculate the size for sub is a miniature nightmare of its own making. It can also be extremely redundant, which gets harder when not using literals like 4 and simple identifiers: functions which compute values may not be idempotent or reproducible, making unspecified/undefined behavior problems worse. The blessing of using a later-completed variably modified type such as T(*p)[*] is helpful in making these types more palatable without the use of macros. (Another potential extension to make the declaration of functions using variably modified types better is trailing function return types, but that’s another problem for another proposal.)

Unfortunately, this faces some troubles. Getting the behavior to be consistent between T(*p)[*] and T(*p)[] is difficult because T(*p)[] is also valid syntax with somewhat perplexing and confusing behaviors:

int main () {
  int a[] = { 1, 2 };
  int (*ptr_a)[2] = &a;
  int (*incomplete_ptr_a)[] = &a;
  return
    a[0] + *ptr_a[0] + (*incomplete_ptr_a)[0]
    + a[0] + ptr_a[0][0] + (*incomplete_ptr_a)[0]
    // the below errors???
    + a[0] + *ptr_a[0] + incomplete_ptr_a[0][0]
  ;
}

The errors exhibited in this snippet, even if they are standards-blessed, are confusing to say the least. One finds out that incomplete_ptr_a, unlike a itself, is not completed. Technically, this code is legal today even if it produces weird errors during the uses of the variable; a better way to do this would be to actually prevent incomplete_ptr_a from being incomplete when created in this manner on initialization. This is, technically, a change to existing code. We do not know of any code that intentionally uses pointers to incomplete arrays and then specifically relies on the ability to change the declaration, like so:

int main () {
  int a[] = { 1, 2 };
  int a2[] = { 1, 2, 3 };
  int (*ptr_a)[2] = &a;
  int (*incomplete_ptr_a)[] = &a; // incomplete pointer
  incomplete_ptr_a = &a2; // change what is pointed at, still incomplete
  return
    a[0] + *ptr_a[0] + (*incomplete_ptr_a)[0]
    + a[0] + ptr_a[0][0] + (*incomplete_ptr_a)[0]
    // the below errors???
    + a[0] + *ptr_a[0] + incomplete_ptr_a[0][0]
  ;
}

Because it also has other implications, this proposal is not yet proposing changing the behavior and intends to poll WG14 about this.
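
To make the current behavior concrete, here is a minimal sketch in today's C (the names are hypothetical) of what a pointer to an incomplete array type permits and what it rejects. The pointer can be re-aimed at arrays of different sizes precisely because it is never completed:

```c
#include <assert.h>

int pair[] = {1, 2};
int triple[] = {1, 2, 3};

int first_through_incomplete(void) {
  int (*p)[] = &pair; /* pointer to int[]: the pointee type stays incomplete */
  int total = (*p)[0];
  p = &triple;        /* accepted: int (*)[] is compatible with int (*)[3] */
  total += (*p)[2];
  /* sizeof(*p) and p[0][0] would both be rejected here, because *p has an
     incomplete type and pointer arithmetic on p needs its size. */
  return total;
}
```

Completing such pointers at initialization would make the second assignment ill-formed, which is why the question is being put to WG14 rather than decided here.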

3.3. `T[]`, `T[*]`, and `T[CONSTANT]` Rules

The goal is to utilize the preexisting syntax of T[] in conjunction with T[*] and T[CONSTANT] to allow previously undefined behavior to become well-defined and intuitive. The exact type-based matching [N3260] afforded us, briefly, the ability to match on incomplete types. This allowed T[] to work, but fell down on its face later due to T[][] being an incomplete array to an incomplete array (violating other rules, elsewhere, which put it back into banned status).

The general rules above allow us to use nested arrays in _Generic by treating nested incomplete arrays as just normal incomplete types. This solves the major hurdle when _Generic(typeof(foo), ...) is deployed for multidimensional arrays. The only thing to do is to apply a new set of general-purpose array compatibility rules.

3.3.1. The Purpose of Having Both `T[]` and `T[*]`

The reasons for having both T[] and T[*] — and giving one greater priority to match variable length arrays versus the other without getting a constraint violation — are two-fold.

First, it’s important to follow the design we were already given previously. Even if we personally do not like T[*] in argument lists for functions (because it serves very little purpose and is just a third or fourth way to make an argument that, at the ABI level, is effectively required to be a T* object), the design is already there. It is a stand-in for eventual variable length array parameters. That is its association, and that is how it must stay. [N3290] blends this difference away by deviating it from its original purpose, making it match on all array types. We believe this is a disservice to the design and makes it confusing: if [*] is meant to designate variable length arrays in one context, why does it become a catch-all in another? This is why it is scoped to this one use case for this proposal.

Secondly, T[] is already the universally-understood "any array" indicator. While we lack initialization syntax for "specifically make this a variable length array" (save for an empty initializer with a variable length array compound literal, perhaps), for both arguments and single-level _Generic selection for arrays, this works out just fine. Simply granting it special permissions in _Generic is sufficient to continue to make it a catch-all, rather than only having it work in one narrow case completely by accident of a few rules coming together from recent changes.

Together, these two allow users to specifically accept variable length arrays in certain places (and ignore constant sized arrays as e.g. an enforcement mechanism), but also the opposite:

#define IS_VARIABLE_ARRAY_OF(TYPE, ...) _Generic(typeof(__VA_ARGS__), \
  TYPE[*]: 1, default: 0 \
)
#define IS_CONSTANT_ARRAY_OF(TYPE, ...) _Generic(typeof(__VA_ARGS__), \
  TYPE[*]: 0, TYPE[]: 1, default: 0 \
)
#define IS_ARRAY_OF(TYPE, ...) _Generic(typeof(__VA_ARGS__), \
  TYPE[]: 1, default: 0 \
)

This has been something that has been requested before, and in particular can aid in increasing type safety when invoking other macros or generating code. It is also notable here that, under the design of this proposal, TYPE could be int[] which makes more code simply work as-expected.

Being able to separate at compile-time the difference between a variable length array and a constant sized array is critical for programmers who wish to either provoke errors when handed a source of one type or another, or separate approaches for the sake of code generation.

In general, the core driving reason to wanting to be capable of observing the difference between the two is fairly simple: C implementations, despite great advancements in the last 40 years, cannot fully improve the code generation around variable length arrays for fundamental design reasons. "The size of this type is only known at run-time" hides a lot of useful information from a compiler! While smart compilers can break these sorts of things down given enough optimizer power and inlined code, at its further reaches variable length arrays take operations that can be computed during compilation/translation and effectively delay them to execution. This means that while variable length arrays can save on the overall run-time memory used for a program, it comes at the cost of increased codegen to handle specific cases, especially since such a wide variety of their behavior and allocation is left completely unspecified and up to the implementation. (There is some work going into improving this situation.)
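
For comparison, one workaround available today leans on the null pointer constant rule rather than on array types at all. The sketch below is not part of this proposal; the macro and function names are hypothetical, and it assumes an implementation with VLA support. If sizeof(ARR) is an integer constant expression, the cast of zero is a null pointer constant and the conditional expression has type int *; otherwise it has type void *:

```c
#include <assert.h>

/* 1 if sizeof(ARR) is an integer constant expression, 0 otherwise. */
#define HAS_CONSTANT_SIZE(ARR) \
  _Generic((1 ? (void *)((long)sizeof(ARR) * 0l) : (int *)0), \
    int *: 1, void *: 0)

int probe_fixed(void) {
  int fixed[8];
  return HAS_CONSTANT_SIZE(fixed);
}

int probe_vla(int n) {
  int vla[n]; /* size only known at run time */
  return HAS_CONSTANT_SIZE(vla);
}
```

This distinguishes the two cases but is opaque to read and cannot name the element type, which is part of why a direct TYPE[*] association is attractive.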

3.3.2. Array Usage Examples: Constant Sized Input

Here is an example of expected behavior from matching on a constant sized array with the whole gamut of different types deployed:

#include <assert.h>

int main () {
  int arr[10] = {};
	
  int result_constants = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  assert(result_constants == 0);

  int result_constant_and_incomplete = _Generic(typeof(arr),
    int[10]: 0,
    int[]: 1,
    default: 2
  );
  assert(result_constant_and_incomplete == 0);

  int result_incomplete = _Generic(typeof(arr),
    int[]: 0,
    default: 1
  );
  assert(result_incomplete == 0);

  int result_incomplete_and_vla = _Generic(typeof(arr),
    int[]: 0,
    int[*]: 1,
    default: 2
  );
  assert(result_incomplete_and_vla == 0);

  int result_incomplete_constant_and_vla = _Generic(typeof(arr),
    int[10]: 0,
    int[]: 1,
    int[*]: 2,
    default: 3
  );
  assert(result_incomplete_constant_and_vla == 0);
	
  return 0;
}

3.3.3. Array Usage Examples: Variable Length Input

Here is a similar example, but with the input array being a VLA:

#include <assert.h>

int main () {
  int n = 10;
  int vla[n] = {};
	
  int result_constants = _Generic(typeof(vla),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  assert(result_constants == 3);

  int result_constant_and_incomplete = _Generic(typeof(vla),
    int[10]: 0,
    int[]: 1,
    default: 2
  );
  assert(result_constant_and_incomplete == 1);

  int result_incomplete = _Generic(typeof(vla),
    int[]: 0,
    default: 1
  );
  assert(result_incomplete == 0);

  int result_incomplete_and_vla = _Generic(typeof(vla),
    int[]: 0,
    int[*]: 1,
    default: 2
  );
  assert(result_incomplete_and_vla == 1);

  int result_incomplete_constant_and_vla = _Generic(typeof(vla),
    int[10]: 0,
    int[]: 1,
    int[*]: 2,
    default: 3
  );
  assert(result_incomplete_constant_and_vla == 2);
	
  return 0;
}

3.3.4. What About Array Parameters?

Right now, parameters to functions can have VLA and array types. But all of them decay to pointers and, today, they match against pointers, not arrays. We have no intention of changing this behavior with this proposal: this example will continue to work as expected:

void foo (int arg[10]) {
  static_assert(_Generic(typeof(arg), int[10]: 0, int[]: 0, int*: 1), "oh.");
  // same behavior under this proposal
  static_assert(_Generic(typeof(arg), int[10]: 0, int[]: 0, int[*]: 0, int*: 1), "oh.");
}

The only way to change this behavior would be to deprecate constant sized and variable length array parameters, leave them deprecated for 20 years, and then restore array behavior to these function types.

3.4. Harmonizing Between Type-based and Expression-based `_Generic`

Making all of the prior-displayed changes to arrays would be very awkward if it then stopped working for controlling expression-based _Generic. For example, due to the rule about incomplete types (and, in general, not being able to produce an incomplete type as a value from an expression), using T[] as a match would go back to being illegal in expressions. Therefore, as part of this proposal, we are also going to be advocating for a simple change to both the type-based _Generic, and the expression-based generic. Namely:

first, both will perform direct type-based matching (with the above array rules);
then, both will fall back to l-value converted, compatibility-based matching.

Together, this fallback mechanism paired with the current exact type-based matching will allow generic selection expressions to have normal behavior for both librarians and users. It, thankfully, poses no risk to existing code using _Generic today. No expression-based generic was capable of matching on an e.g. const int versus an int: there doesn’t exist a _Generic today where the first one was being selected over the second one, and implementations have been warning about that being unreachable/unmatchable code for some time now. Similarly, the type-based code will continue to work as-expected if it was already written correctly in the few months since the extension has been standardized for C2y.
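
The l-value conversion that the fallback step relies on is already observable with expression-based _Generic today. The following is a minimal sketch assuming C11; the object and macro names are hypothetical. A top-level const is dropped from the controlling expression, while a const behind a pointer is preserved:

```c
#include <assert.h>

const int sample = 0;

/* Classifies an expression by its l-value converted type. */
#define KIND_OF(EXPR) _Generic((EXPR), \
  int: 1,                              \
  const int *: 2,                      \
  int *: 3,                            \
  default: 0)

static_assert(KIND_OF(sample) == 1, "top-level const is dropped");
static_assert(KIND_OF(&sample) == 2, "const behind a pointer is kept");
```

Under the harmonized rules, a const int association would be tried first by exact matching, and the int association shown here would remain reachable through the fallback.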

Ultimately, this allows both versions to have identical behavior. While enabling the power of both was nice, doing one type of matching for type-based versions and one type of matching for expression-based versions would ultimately end up being a legacy mistake from the perspective.

4. Prior Art

There is no prior art for this. We intend to gauge Committee reaction, refine the paper, and then submit patches to existing implementations for this behavioral improvement.

5. Wording

The wording is relative to the latest Working Draft.

5.1. Intent

The intent of this wording is to provide:

a change in the definition of incomplete types for arrays, which makes it so an incomplete constant sized array specification with T[] and incomplete variable size T[*] can complete their respective constant and variable length array types;
move the array types wording and concentrate it into §6.2.5;
improvements to how type completion works for both categories of incomplete array types, allowing multiple type derivations to be completed at once.

This wording is broken up into two major sections:

the incomplete type changes and array matching changes for both _Generic and initialization in § 5.2 Wording: Incomplete Types, Initialization, and Arrays;
and, the wording that allows _Generic to start with "strict matching" and then fall back to "l-value conversion" matching in § 5.3 Wording: Generic Strict Matching and Fallback Matching Harmonization.

These are the ways in which the changes are separable. If WG14 so desires, these two sections can be voted on separately using the paper’s section numbers to indicate the section currently being voted on.

5.2. Wording: Incomplete Types, Initialization, and Arrays

5.2.1. General Standard: Change "unspecified size" to "unknown size" globally

There are a handful of places where describing a VLA refers to an unspecified size, where the term of art should be an unknown size. There are far more uses of "unknown size" than "unspecified size".

5.2.2. Modify §6.2.5 Types

²⁵ Any number of derived types can be constructed from the object and function types, as follows:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. ~~The element type shall be complete whenever the array type is specified.~~ If the element type is an incomplete type, then the array type is an incomplete type. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called "array of T". The construction of an array type from an element type is called "array type derivation".

…

…

^✨ A signed or unsigned bit-precise integer type with no specified width, _BitInt(), signed _BitInt() or unsigned _BitInt(), is an incomplete type called a bit-precise incomplete integer type. It is completed, for an identifier of that type, by specifying the width in a later declaration (with internal or external linkage) or during initialization.

^✨ An array type of known size is either a constant array type or a variable length array type. If the array type has an integer constant expression and the element type has a known constant size, the array type is a constant array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations may support; see 6.10.10.4.)

²⁷ An array type of unknown size is an incomplete type. An unknown size for an array type is one where the size is not specified or the size is specified as *. An unknown size of * means the array type is a variably incomplete array type. An unknown size that is not specified means the array type is an unspecified incomplete array type. It is completed, for an identifier of that type, by specifying ~~the size~~ the constant size, for an unspecified incomplete array type, or the non-constant size, for a variably incomplete array type, in a later declaration (with internal or external linkage) or during initialization.

^✨ Array types have a rank, which is the number of dimensions of that array. An array type with a non-array element type has a rank of 1. For every element type of an array type that is itself also an array, the rank of the array increases by 1. Each dimension of a potentially nested array has an index from 0 (outermost) to rank - 1 (innermost). Incomplete array types still have a rank and still have an index that identifies that dimension.

^✨ NOTE Contrary to the lexicographic order in a multidimensional array type, the set of balanced square brackets seen first is the outermost dimension, with each subsequent set of balanced square brackets denoting inner array sizes. For a declaration int arr[3][4][5];, 3 is the size for the outermost array dimension and 5 is the size for the innermost array dimension in arr.
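The dimension indexing in the NOTE above can be checked in current C with sizeof. The following is a minimal sketch, not part of the proposed wording; the DIM macro and the dim0/dim1/dim2 helpers are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in one array dimension. Only valid for true arrays,
   not for pointers. DIM is a hypothetical helper for this sketch. */
#define DIM(a) (sizeof(a) / sizeof((a)[0]))

static int arr[3][4][5]; /* rank 3 under the wording above */

/* Index 0 is the outermost dimension, index rank - 1 the innermost. */
size_t dim0(void) { return DIM(arr); }       /* first brackets:  3 */
size_t dim1(void) { return DIM(arr[0]); }    /* second brackets: 4 */
size_t dim2(void) { return DIM(arr[0][0]); } /* third brackets:  5 */

static_assert(DIM(arr) == 3, "outermost dimension has size 3");
static_assert(DIM(arr[0][0]) == 5, "innermost dimension has size 5");
```

Each subscript peels off one dimension from the outside in, which is why the first set of brackets in the declaration corresponds to index 0.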

5.2.3. Modify §6.5.2.1 Generic selection

Constraints

² A generic selection shall have no more than one default generic association. The type name in a generic association shall specify a type, other than a variably modified type that is not a variably incomplete array type. No two generic associations in the same generic selection shall specify ~~compatible~~ associated types, as defined later in this subclause. If the generic controlling operand is an assignment expression, the controlling type of the generic selection expression is the type of the assignment expression as if it had undergone an lvalue conversion, array to pointer conversion, or function to pointer conversion. Otherwise, the controlling type of the generic selection expression is the type designated by the type name. The controlling type shall be compatible with at most one of the types named in the generic association list. If a generic selection has no default generic association, its controlling type shall be ~~compatible~~ associated with exactly one of the types named in its generic association list.
^✨ Given an input type and a target type, the input type is associated with the target type under the following conditions.

If the input type is a bit-precise integer type:

If the target type is a bit-precise integer type of the same signedness and same width, then the two types are associated.

Otherwise, if the target type is a bit-precise incomplete integer type of the same signedness, then the two types are associated.

If the input type is an array type and it has a constant size:

If the target type is an array type with both an element type that is associated with the input array type’s element type and a constant size that is equivalent to the input array type’s constant size, then the two types are associated.

Otherwise, if the target type is an array type with both an element type that is associated with the input array type’s element type and is an empty incomplete array type, then the two types are associated.

Otherwise, if the input type is an array type and it has a non-constant size:

If the target type is an array type with both an element type that is associated with the input array type’s element type and is a variably incomplete array type, then the two types are associated.

Otherwise, if the target type is an array type with both an element type that is associated with the input array type’s element type and is an empty incomplete array type, then the two types are associated.

Otherwise, if the input type is an array and it is an incomplete type:

If the target type is an array type with both an element type that is associated with the input array type’s element type and is a variably incomplete array type, then the two types are associated.

Otherwise, if the target type is an array type with both an element type that is associated with the input array type’s element type and is an empty incomplete array type, then the two types are associated.

Otherwise, the two types are associated if they are compatible.
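For types that are neither bit-precise integers nor arrays, association falls back to compatibility, and an expression operand is still converted before matching, as the constraint paragraph states. The sketch below shows that baseline in current C; the KIND macro and the kind_of_* functions are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Classify a controlling *expression* under the current rules. */
#define KIND(x) _Generic((x), \
    int:     1,               \
    int *:   3,               \
    default: 0)

/* Plain int matches the int association directly. */
int kind_of_int(void) { int i = 0; return KIND(i); }

/* Lvalue conversion drops the const qualifier, so this is also int. */
int kind_of_const_int(void) { const int ci = 0; return KIND(ci); }

/* Array to pointer conversion turns int[10] into int *. */
int kind_of_array(void) { int a[10] = {0}; return KIND(a); }
```

The array case is the one the associated-type rules are meant to improve on: once the operand has decayed to int *, its size is no longer visible to the selection.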

Semantics

³ The generic controlling operand is not evaluated. If a generic selection has a generic association with a type name that is ~~compatible~~ associated with the controlling type, then the result expression of the generic selection is the expression in that generic association. Otherwise, the result expression of the generic selection is the expression in the default generic association. None of the expressions from any other generic association of the generic selection is evaluated.
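The evaluation rules in this paragraph can be observed in current C. A minimal sketch follows, with a hypothetical call counter (calls, bump) standing in for any side effect.

```c
#include <assert.h>

static int calls = 0;

/* Hypothetical side-effecting helper: counts how often it runs. */
static int bump(void) { return ++calls; }

int probe(void) {
    /* bump() is the controlling operand: only its type (int) is used.
       The default association is not selected, so its bump() call is
       not evaluated either. */
    return _Generic(bump(),
        int:     10,
        default: (bump(), 20));
}

/* Runs the selection once and reports how many times bump() ran. */
int calls_after_probe(void) {
    probe();
    return calls;
}
```

Only the result expression of the matching association is ever evaluated, so the counter stays at zero.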

...

^✨ EXAMPLE Generic selection can match multidimensional arrays by using incomplete array types.

int main () {
  int result = _Generic(int[20][10],
    int[][]: 0,
    default: 1
  );
  return result; // return 0
}

Constant array types are associated with other constant array types before they are associated with unspecified incomplete array types.

int main () {
  int a = _Generic(int[20][10],
    int[][]: 0,
    default: 1
  );
  int b = _Generic(int[20][10],
    int[][]:   1,
    int[20][]: 0,
    default:   1
  );
  int c = _Generic(int[20][10],
    int[][]:     1,
    int[20][]:   1,
    int[20][10]: 0,
    default:     1
  );
  return a + b + c; // return 0
}
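For contrast, current C can keep both dimensions visible to a generic selection only by matching pointer-to-array types with exact sizes. The following sketch assumes nothing beyond C11; grid, GRID_SHAPE, and the shape_of_* functions are hypothetical names.

```c
#include <assert.h>

static int grid[20][10];

/* &grid has type int (*)[20][10]. The operand of unary & does not
   undergo array to pointer conversion, so both dimensions survive. */
#define GRID_SHAPE(p) _Generic((p), \
    int (*)[20][10]: 0,             \
    int (*)[10][20]: 1,             \
    default:         2)

int shape_of_grid(void) { return GRID_SHAPE(&grid); }

/* Without &, grid decays to int (*)[10] and only default matches. */
int shape_of_decay(void) { return GRID_SHAPE(grid); }
```

Every size must be spelled out in this style, which is the limitation that partially specified associations such as int[20][] are intended to remove.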

Multiple types associated with the controlling operand is a constraint violation.

int main () {
  int a = _Generic(int[20][10],
    int[][]:   1,
    int[][10]: 0,
    int[][*]:  1,
    default:   1
  );
  // okay, a is 0
	
  _Generic(int,
    int:     0,
    int:     0, // constraint violation
    default: 1
  );
  _Generic(int[20][10],
    int[][]: 0,
    int[][]: 0, // constraint violation
    default: 1
  );
  _Generic(int[20][10],
    int[][]:  0,
    int[*][]: 1, // constraint violation
    int[*][]: 1, // constraint violation
    default:  1
  );
}

^✨ EXAMPLE A variable length array is associated with variably incomplete array types before unspecified incomplete array types.

int main () {
  int n = 10;
  int vla[n] = {};
	
  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[11]: 0,
    int[20]: 0,
    default: 1
  ));

  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[]:   1,
    default: 0
  ));

  static_assert(_Generic(typeof(vla),
    int[]:   0,
    default: 1
  ));

  static_assert(_Generic(typeof(vla),
    int[]:   0,
    int[*]:  1,
    default: 0
  ));

  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[]:   0,
    int[*]:  1,
    default: 0
  ));
}

^✨ EXAMPLE A bit-precise incomplete integer type can match on multiple kinds of bit int.

int main () {
  _BitInt() bits = 2wb; // type is _BitInt(3): two value bits plus a sign bit
	
  static_assert(_Generic(typeof(bits),
    signed _BitInt(): 1,
    unsigned _BitInt(): 0,
    default: 0
  ));

  static_assert(_Generic(typeof(3uwb),
    _BitInt(): 0,
    unsigned _BitInt(): 1,
    default: 0
  ));
}

5.2.4. Modify §6.7.3 Type specifiers/§6.7.3.1 General

Syntax

type-specifier:

…

_BitInt ( constant-expression )

_BitInt ( )

…

5.2.5. Modify §6.7.7.3 Array declarators

¹ In addition to optional type qualifiers and the keyword static, the [ and ] can delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be a ~~n incomplete or~~ function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

…

⁴ If the size is not present the array type is an incomplete type. If the size is * instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array size expression in one of those declarators;¹⁶³⁾ such arrays are nonetheless complete types. If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations may support; see 6.10.10.4.)
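The existing use of * in a parameter declaration can be sketched as follows. This assumes an implementation that supports variably modified parameter types; sum_rows and demo are hypothetical names.

```c
#include <assert.h>

/* Prototype: [*] marks a variable length array of unspecified size.
   It is only permitted in a declaration that is not a definition. */
int sum_rows(int rows, int cols, int m[*][*]);

/* Definition: the sizes are named here instead. */
int sum_rows(int rows, int cols, int m[rows][cols]) {
    int total = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            total += m[r][c];
    return total;
}

/* Passes a constant-sized array; it decays to int (*)[3]. */
int demo(void) {
    int m[2][3] = { {1, 2, 3}, {4, 5, 6} };
    return sum_rows(2, 3, m);
}
```

The prototype and the definition declare compatible types even though only the definition names the sizes.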

…

5.2.6. Modify §6.7.11 Initialization

Constraints

…

⁷ For an array, if the element type of an array is itself an incomplete array type, the initializer shall be a brace-enclosed list of initializers. Otherwise, the initializer ~~The initializer for an array~~ shall be either a string literal, optionally enclosed in braces, or a brace-enclosed list of initializers for the elements. An array initialized by character string literal or UTF-8 string literal shall have a character type as element type. An array initialized with a wide string literal shall have element type compatible with a qualified or unqualified wchar_t, char16_t, or char32_t, and the string literal shall have the corresponding encoding prefix (L, u, or U, respectively).

…

Semantics

…

²⁵ If an ~~array of unknown size~~ incomplete array type is initialized, its size is determined by the largest indexed element with an explicit initializer. The array type is completed at the end of its initializer list.
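How an initializer completes an array of unknown size can be checked in current C. A minimal sketch follows; the COUNT macro and the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (hypothetical helper). */
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static int  plain[]      = { 1, 2, 3 };   /* completed as int[3] */
static int  designated[] = { [4] = 1, 2 }; /* indices 4 and 5: int[6] */
static char text[]       = "abc";          /* three chars and a null: char[4] */

static_assert(COUNT(plain) == 3, "largest initialized index is 2");
static_assert(COUNT(designated) == 6, "largest initialized index is 5");
static_assert(COUNT(text) == 4, "string literal includes its terminator");

size_t count_plain(void)      { return COUNT(plain); }
size_t count_designated(void) { return COUNT(designated); }
size_t count_text(void)       { return COUNT(text); }
```

In the designated case the size comes from the largest index that receives an explicit initializer, not from the number of initializers written.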

^✨ If a bit-precise incomplete integer type is initialized, its width is determined by the smallest bit-precise integer type needed to hold a value of the type of the initialization.
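When the initializer is a bit-precise integer constant, its type already follows the C23 rule for the wb and uwb suffixes, so the completed width can be predicted with simple arithmetic. The sketch below models that rule in portable C under the assumption that the initializer is such a constant; value_bits, wb_width, and uwb_width are hypothetical helpers, not part of any library.

```c
#include <assert.h>

/* Bits needed to represent v as an unsigned value (0 needs none). */
static unsigned value_bits(unsigned long long v) {
    unsigned n = 0;
    while (v != 0) {
        ++n;
        v >>= 1;
    }
    return n;
}

/* Width of a non-negative wb constant: value bits plus one sign bit,
   and never less than 2. */
unsigned wb_width(unsigned long long v) {
    unsigned n = value_bits(v) + 1;
    return n < 2 ? 2 : n;
}

/* Width of a uwb constant: value bits only, and never less than 1. */
unsigned uwb_width(unsigned long long v) {
    unsigned n = value_bits(v);
    return n < 1 ? 1 : n;
}
```

Under this model, an initializer of 2wb has type _BitInt(3) and an initializer of 3uwb has type unsigned _BitInt(2).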

…

5.3. Wording: Generic Strict Matching and Fallback Matching Harmonization

5.3.1. Modify §6.5.2.1 Generic selection

² A generic selection shall have no more than one default generic association. The type name in a generic association shall specify a type other than a variably modified type. No two generic associations in the same generic selection shall specify compatible types. If the generic controlling operand is an assignment expression, the controlling type of the generic selection expression is the type of the assignment expression as if it had undergone an lvalue conversion, array to pointer conversion, or function to pointer conversion. Otherwise, the controlling type of the generic selection expression is the type designated by the type name. The generic controlling operand specifies the controlling type of the generic expression, which is:

first, the type specified by the provided type name or assignment expression;
otherwise, if none of the generic associations are compatible with that type (excluding default generic associations), the type specified by the provided type name or assignment expression as if it had undergone an lvalue conversion, array to pointer conversion, or function to pointer conversion.

The controlling type shall be compatible with at most one of the types named in the generic association list. If a generic selection has no default generic association, its controlling type shall be compatible with exactly one of the types named in its generic association list.

N3441
`_Generic` and VLA Realignment and Improvement

Published Proposal, 2024-12-23
