1. Changelog
1.1. Revision 0 - September 4th, 2024
- Initial release 🎉!
2. Introduction and Motivation
A common annoyance amongst C developers has been the ephemeral nature of the following code snippet:
    int main() {
        const int n = 1 + 2;
        const int a[n];
        return sizeof(a);
    }
Does this create a VLA type all the time, or is this a valid constant expression that produces a translation-time (AKA compile-time) sized array with an extent of `3`? Will `sizeof` be executed at compile-time, or will it be run at execution time (AKA run-time) and pull the value from somewhere in the binary? Furthermore, if an implementation defines `__STDC_NO_VLA__`, is this supposed to compile? All of these questions, and more that revolve around this issue, were brought up in N2713. N2713 was accepted into C23, and subsequently forced the above code to resolve with `a` being a VLA, even if the implementation could ascertain this was a constant expression and treat it as a constant expression at compile-time. This allowed all implementations to have the same semantic frontend errors and/or warnings, while letting them optimize things as necessary during typical linking and code generation/lowering. (E.g., never using `alloca` with a dynamic value and instead just sizing the stack appropriately to accommodate the array directly for a binary implementation.)
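To make the question concrete, the following is a minimal sketch that contrasts the form in the snippet above with an enumeration constant, which is an integer constant expression in every revision of C. The names `n_enum`, `size_with_enum_extent`, and `size_with_const_extent` are illustrative only and do not come from the proposal. The second function assumes an implementation that either supports VLAs or folds the extent on its own; it is not expected to compile where VLAs are rejected outright.

```c
#include <assert.h>
#include <stddef.h>

/* An enumeration constant is an integer constant expression, so an array
   sized with it is an ordinary fixed-length array. */
enum { n_enum = 1 + 2 };

/* This check happens entirely during translation. */
static_assert(sizeof(int[n_enum]) == 3 * sizeof(int),
              "extent is known at translation time");

size_t size_with_enum_extent(void) {
    int a[n_enum];
    return sizeof(a);
}

/* The form from the snippet above. Under C23, `n` is a const object rather
   than a constant expression, so `a` is formally a VLA even when an
   implementation folds the extent itself. */
size_t size_with_const_extent(void) {
    const int n = 1 + 2;
    int a[n];
    return sizeof(a);
}
```

Both functions return the same value on implementations that accept them; the difference lies in whether the standard guarantees that the extent is a constant expression.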
However, during National Body (NB) comment processing, an NB comment pointed out that a lot of code relied on this being treated as a plain, compile-time C array, not just by the backend with its optimizations, but by the frontend of many compilers. This was formalized in N3138, which presented cases similar to the above. It also presented various other constant expressions to make it clear that there is a wide breadth of existing practice, beyond just MSVC, GCC, and Clang, of accepting many additional forms of constant expressions in many different situations. However, the array case remains the one with the most significant impact, since it affects the most existing code. N3138 promised that a potential future version of C should look into the impact of changing constant expressions one way or another again.
This paper introduces a change for a portion of constant expressions in the opposite direction of N2713, by asking that `const` integer-typed declarations that are also immediately initialized with an integer constant expression are implicitly declared `constexpr`.
3. Prior Art
This is existing practice on a wide variety of compilers both large and small, ranging from SDCC all the way up to much more powerful compilers like ICC (Intel), Clang, and GCC. The snippet in § 2 Introduction and Motivation compiles and runs on many implementations with no run-time execution, even on their intentionally weakest optimization settings (where applicable for a compiler with such settings). It also runs on many implementations even where VLAs are not allowed (e.g. where VLA support is absent, or where a VLA diagnostic is combined with a warnings-as-errors setting).
Furthermore, C++ has a similar feature for all `const`-declared integer types. However, rather than modeling this after the C++ wording and C++ feature, we instead focus on solidifying and cleaning up the existing practice of implementations' C modes (for implementations with shared C and C++ modes) and of existing purely C compilers. Most importantly, we do not apply the full "manifestly constant evaluated" or "constantly evaluated" powers that C++ has adopted, and instead focus exclusively on what follows from the existing practice of existing C codebases and C implementations.
4. Design
The design of this feature is such that it requires a declaration that is the first declaration of its identifier, without external linkage, and is immediately initialized. It also only applies to declarations that are `const`-qualified and whose storage-class specifiers, if any, are limited to `static`, `auto`, or `register`. (If the storage-class specifier is already `constexpr`, then this proposal effects no change to the declaration at all.) This means that, under this proposal, of the following declarations:
    int file_d0 = 1;
    _Thread_local int file_d1 = 1;
    extern int file_d2;
    static int file_d3 = 1;
    _Thread_local static int file_d4 = 1;
    const int file_d5 = 1;
    constexpr int file_d6 = 1;
    static const int file_d7 = 1;
    int file_d2 = 1;

    int main(int argc, char* argv[]) {
        int block_d0 = 1;
        extern int block_d1;
        static int block_d2 = 1;
        _Thread_local static int block_d3 = 1;
        const int block_d4 = 1;
        const int block_d5 = file_d6;
        const int block_d6 = block_d4;
        static const int block_d7 = 1;
        static const int block_d8 = file_d5;
        static const int block_d9 = file_d6;
        constexpr int block_d10 = 1;
        static constexpr int block_d11 = 1;
        int block_d12 = argc;
        const int block_d13 = argc;
        const int block_d14 = block_d0;
        const volatile int block_d15 = 1;
        return 0;
    }

    int block_d1 = 1;
A handful of these declarations become `constexpr`, as indicated by the table below, which explains the changes for the above code snippet:
Declaration | `constexpr` Before? | `constexpr` After? | Comment |
---|---|---|---|
file_d0 | ❌ | ❌ | no change; implicitly `extern`, non-`const` |
file_d1 | ❌ | ❌ | no change; `_Thread_local`, implicitly `extern`, non-`const` |
file_d2 | ❌ | ❌ | no change; explicitly `extern`, non-`const` |
file_d3 | ❌ | ❌ | no change; non-`const` |
file_d4 | ❌ | ❌ | no change; `_Thread_local`, non-`const` |
file_d5 | ❌ | ❌ | no change; implicitly `extern` |
file_d6 | ✅ | ✅ | no change; explicitly `constexpr` |
file_d7 | ❌ | ✅ | `static` and `const`, initialized by constant expression `1` |
block_d0 | ❌ | ❌ | no change; non-`const` |
block_d1 | ❌ | ❌ | no change; explicitly `extern`, non-`const` |
block_d2 | ❌ | ❌ | no change; non-`const`, `static` |
block_d3 | ❌ | ❌ | no change; `_Thread_local`, `static`, non-`const` |
block_d4 | ❌ | ✅ | `const`; initialized with literal `1` |
block_d5 | ❌ | ✅ | `const`; initialized with other variable `file_d6` |
block_d6 | ❌ | ✅ | `const`, initialized by other constant expression `block_d4` |
block_d7 | ❌ | ✅ | `static` and `const`, initialized with literal `1` |
block_d8 | ❌ | ❌ | no change; non-constant expression initializer |
block_d9 | ❌ | ✅ | `static` and `const`, initialized by constant expression `file_d6` |
block_d10 | ✅ | ✅ | no change; explicitly `constexpr` |
block_d11 | ✅ | ✅ | no change; explicitly `constexpr` |
block_d12 | ❌ | ❌ | no change; non-`const`, non-constant expression initializer `argc` |
block_d13 | ❌ | ❌ | no change; non-constant expression initializer |
block_d14 | ❌ | ❌ | no change; non-constant expression initializer |
block_d15 | ❌ | ❌ | no change; `volatile` |
This matches the existing practice that occurs today.
4.1. Changes in Existing Code
Besides what is enumerated above for given declarations, some typical consequences on existing code are:
- Implementation-defined Variable-Length Arrays (VLAs) in many cases are promoted to standard-guaranteed Fixed-Length Arrays (typical "C arrays"). This change is anticipated and wanted, and is part of the original motivation for this proposal.
- Some `_Generic(…)` expressions are currently not constant expressions. This follows from the concrete rules about generic selection, where if the selected expression is a constant expression, then the `_Generic` expression itself is a constant expression. A small class of these become guaranteed to be constant expressions now, since the use of such integer-typed declarations now counts. This is intended, and simply a side-effect of the pass-through nature of `_Generic`’s selection process. It does not negatively impact existing code in any appreciable way.
Otherwise, all the effects of this proposal are for newly written code that can confidently take advantage of such now rather than leave it implementation-defined.
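The pass-through behavior of generic selection described above can be sketched as follows. Because the proposed promotion is not yet standard, this sketch uses an enumeration constant as a stand-in for the kind of named constant the proposal would create; `lane_count`, `LANES_FOR`, `lanes`, and `lanes_length` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a promoted declaration: an enumeration constant is already
   an integer constant expression today. */
enum { lane_count = 4 };

/* The generic selection picks the `int` association for an `int` operand.
   Since the selected expression is a constant expression, so is the whole
   generic selection. */
#define LANES_FOR(x) _Generic((x), int: lane_count, default: 1)

/* Both uses below require an integer constant expression. */
static_assert(LANES_FOR(0) == 4, "selected operand is a constant expression");

static int lanes[LANES_FOR(0)];

size_t lanes_length(void) {
    return sizeof(lanes) / sizeof(lanes[0]);
}
```

Under the proposal, a declaration such as `static const int lane_count = 4;` would be expected to work in place of the enumeration, though that depends on the final wording being adopted.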
4.2. What if Someone Takes the Address of a const Declaration that has been Promoted to constexpr?
This is fine. Naked `constexpr` variables are already implicitly `const`, and taking the address of one produces a pointer to a `const`-qualified type, consistent with having a pointer to a variable that cannot be modified. A compiler may be robbed of a constant expression optimization (e.g., doing literal computation replacement and removing the existence of the variable inside of the program) by such a move, but it is fine and behaves perfectly in line with the expected semantics of having a `const` integer. Modification of such an object by casting away its `const`-ness is, as it is throughout the C standard, Undefined Behavior and should not be done. This proposal does not change anything in the way these values have been used to date in either C or C++.
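As a small illustration of the point above, the sketch below takes the address of a declaration that meets the proposal's criteria. It is valid C today and is expected to remain valid if the declaration is treated as `constexpr`. The names `max_retries`, `max_retries_address`, and `read_through_pointer` are made up for this example.

```c
#include <assert.h>

/* Internal linkage, const-qualified, initialized with an integer constant
   expression: the kind of declaration the proposal would promote. */
static const int max_retries = 5;

/* Taking the address yields a pointer to const int, so the object needs
   storage and cannot be replaced purely by its literal value. */
const int *max_retries_address(void) {
    return &max_retries;
}

int read_through_pointer(void) {
    const int *p = max_retries_address();
    assert(p == &max_retries); /* the same object is observed every time */
    return *p;                 /* reading is fine; writing through a cast is undefined */
}
```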
4.3. Why Not More Than Integer Types?
We limit this proposal to integer types (including enumerations) because that is the widest-spread existing practice and easiest to compute. `constexpr` serves as not just a marker, but as a way to let an implementation know that no matter how complex the initializer or its contained expressions become, it must be evaluated at compile-time. This represents a contract between the user and the compiler, and also serves as a courtesy so that the compiler can be appropriately prepared when processing the declaration.
Conversely, this is an implicit promotion. To ensure compilers are not unduly burdened, we capture what is already existing practice on the vast majority of existing compilers: integer types. If, in the future, implementations process many more declarations at compile-time, then such expansions can be made easily.
5. Wording
The following wording is relative to the latest draft standard of C.
📝 Editor’s Note: The ✨ characters are intentional. They represent stand-ins to be replaced by the editor.
5.1. Add a new paragraph to 6.7 "Declarations", just after paragraph 12 and before "EXAMPLE 3"
…
If one of a declaration’s init declarators matches the second form (a declarator followed by an equal sign `=` and an initializer) and meets the following criteria:
— it is the first visible declaration of the identifier;
— it contains no other storage-class specifiers except `static`, `auto`, or `register`;
— it does not declare the identifier with external linkage;
— its type is an integer type or an enumeration type that is `const`-qualified but not otherwise qualified, and is non-atomic;
— and, its initializer is an integer constant expression (6.6);
then it behaves as if a `constexpr` storage-class specifier is implicitly added for that declarator specifically. The declared identifier is then a named constant and is valid in all contexts where a named constant of the corresponding type is valid to form a constant expression of that specific kind (6.6).
…