Proposal for C2y
WG14 3354
Title: Case range expressions, v3
Author, affiliation: Alex Celeste, Perforce
Date: 2024-09-18
Proposal category: New feature
Target audience: Compiler implementers, users
GCC supports an expression syntax to allow the user to describe a contiguous range of integer constant expression in certain contexts. We propose adopting this expression kind and the simplest use case supported by GCC, along with comments on its integration and future directions for expansion of the feature.
Reply-to: Alex Celeste (aceleste@perforce.com)
Document No: N3354
Revises: N3269
Date: 2024-09-18
- ...
- rebase against N3301
- rebase wording on C2y draft
- just use the GCC syntax, scrap the arrows and
::
- remove right-open ranges
- remove arithmetic conversion (conflicted with GCC behavior)
- original proposal
The GNU C dialect includes an operator which can be used to describe a range of integer constant expressions, which is valid in two contexts where the kind of repetition implied makes sense:
switch (n) {
case 1 ... 10:
something ();
break;
}
which is equivalent to writing
if (n >= 1 && n <= 10) {
something ();
}
except that n
is only evaluated once; and
int arr[] = {
[3 ... 6] = n,
};
which is equivalent to writing
int arr[] = {
[3] = n,
[4] = n,
[5] = n,
[6] = n,
};
except that again, n
is only evaluated once.
A range expression is defined as:
integer-constant-expression ellipsis integer-constant-expression
and does not describe a kind of value that otherwise exists in the language;
in GNU C, the case
and designator syntactic constructs can accept either a
single integer constant expression, or a range expression, and the evaluation
or interpretation of the range does not result in a value that can be copied or
passed around further.
Range expressions are very useful for both conciseness of expression, and also
to avoid repeated evaluations in the designator case (by avoiding the need for
a temporary variable). If a switch
has a large number of possible ranges it
can be expressed much more easily by a "wide" range than by stacking the case
labels (e.g. case 1 ... 1000:
is quite plausible but would not be readable if
written out normally). While any switch
can be expressed as an if
, the
readability of the code may be affected and the choice between the two is
important when deciding how best to convey the author's intent.
There are some complexities in the designator case that arise from the fact that designators within an initializer do not need to be unique, and the lack of specification for evaluation order and effects in the existing initializer semantics. Therefore, while we propose introducing range expressions as a general reusable construct, we do not propose adding them for designators until the order of evaluation issues with existing initializer syntax are cleaned up.
We propose that a constant-range-expression is added as a top-level expression kind alongside constant-expression and expression for use by selected syntactic forms.
We propose that the case
label construct be allowed to admit either a
constant-expression or a constant-range-expression as its operand. The
constraints this range would be subject to are largely the same as those which
currently apply to the constant-expression operand.
The constant-range-expression can be reused in other contexts as and when they
emerge, so we do not believe this should be a property locked to case
itself
specifically. There are several other compelling use cases for ranges, including
GCC's own use to describe designator ranges.
Extending the constraints on the current case
label form is relatively simple
as there are no repeated-side effects to consider.
void foo (void);
void use (int n)
{
switch (n) {
case 1:
foo ();
break;
// case 4 : // error, overlaps 2 ... 5
// foo ();
// break;
case 2 ... 5:
foo ();
break;
case 6 ... 6: // OK (but questionable)
foo ();
break;
case 8 ... 7: // not a GCC error
foo ();
break;
case 10 ... 4: // not a GCC error despite overlap
foo ();
break;
}
}
case 4
and case 2 ... 5
clearly and unambiguously overlap. This should be
treated as a violation of 6.8.5.3 paragraph 3, which says "no two of the case
constant expressions ... shall have the same value". The order of the two labels
produces a slightly different error message in tools, but this is a UX issue and
does not make the code more or less valid either way - we do not propose
that single-expression cases should be able to override parts of a range
expression, as this is not existing practice and seems highly likely to lead to
user confusion. (Subjectively, such forms would seem to invalidate the argument
that range-based case
labels would be clearer, too.)
Because a range describes a sequence of value from low to high, GCC does
not consider 8 ... 7
to be an error, but an empty range which cannot match
n
. We optionally propose making this a constraint violation.
Because 10 ... 4
also describes an empty range, tools will usually ignore the
apparent overlap with 2 ... 5
. This does not need any special constraint if
the reversed range is already a constraint violation its own right.
Tools allow 6 ... 6
because the range contains at least one value. We believe
this is fine, as the resulting case
label will have a value that can match.
Tools should probably warn for this situation.
Although the use of the ...
operator is existing practice in GCC and compatible
compilers, it does have a minor usability problem in practice:
case 11...12: // syntax error, PP-number with too many dots
foo ();
break;
A space is needed between the low and high values and the range operator,
because a dot is a valid part of a PP-number in the preprocessor syntax and
therefore the above form 11...12
is lexed by most tools as a single number,
which then fails translation into a valid C-language token.
This is slightly inconsistent:
enum {
ELEVEN = 11,
TWELVE = 12,
};
case ELEVEN...TWELVE: // this is fine 🍵🔥
foo ();
break;
The above snippet parses, because dots do not form part of an identifier, but is fragile, and may break if the code is edited by hand.
Essentially this operator is making the a similar mistake that compound assignment
did in the original B language with the =+
, =-
, =*
operators.
The problem is not as serious since it only results in a syntax error and not in
valid code that silently does something different.
The problem with ...
is not unresolvable and it would be fairly simple to
add a conversion from "PP-number containing ...
" to a sequence of two or three
C-language tokens; but this still seems confusing as it breaks the 1-1 between
preprocessor and C tokens; and anyway neither Clang nor GCC have done so.
The opinion of the Committee at the January 2024 meeting was that the existing practice for operator spelling should be adopted directly, and to treat any problems parsing adjacent tokens as a QoI issue (it is the responsibility of an implementation to preprocess identifier and ellipsis tokens separately and a tool that merged them into a single PP-number would already be non-conforming).
Alternative operators were proposed but would lack the advantage of working on
any existing code. The proposed ::
and ->
tokens also caused ambiguity
problems at the grammar level, which ...
does not suffer from.
In light of the fact that this problem can only manifest as an error, it was not considered an obstacle to direct adoption.
The GCC feature describes a closed interval (in the range expression low ... high
,
the values in the range include both low
and high
).
case 1 ... 4:
// exactly equivalent to
case 1:
case 2:
case 3:
case 4:
A right-open interval is often also intuitive, especially for future uses in array contexts, as it can refer to the element count as the right hand operator:
case 1 ~> 4: // pretend ~> operator exists for right-open
// would be equivalent to
case 1:
case 2:
case 3:
// 4 is past-the-end
However, at the January 2024 meeting the Committee decided that the right-open range would not be useful enough to standardize as part of this proposal.
This is a core GNU C language feature and is supported by most compilers that claim any level of GCC compatibility. GCC itself, as well as Clang, and third party tools emulating GNU C such as QAC, all support this feature.
Experience implementing the feature in Helix QAC suggests that the burden on
implementers is minimal for both current usages. The case
feature is essentially
trivial as the relevant case
labels can, if necessary, be expanded out just
after parsing-time without even needing the backend to be aware of them (though
in practice we found that large ranges expanded this way could potentially
cause surprising slowdowns in an optimizing or dataflow backend, and that it was
better to integrate a backend-aware form of support; this was not complicated but
will depend on the tool architecture, if it can handle if
style expressions in
the branches).
The designator feature is not complicated to implement at all, but does raise
potential user confusion issues as the designators can allow repetition, the
overriding of side effects, etc. These are issues that already exist within the
Standard initializer rules. GCC also surprisingly allows empty ranges to compile
in this context, which it rejects for case
labels; these introduce some more
potential for user confusion as there is now another way for side effects to be
silently eliminated even without the user overriding an initializer. As a result
we think the syntax should be reserved but addition of this part of the feature
held off until the existing issues are cleaned up.
A very related use to designator ranges, which initialize a subrange of an array, is slices, which read from an initialized object (or perhaps describe an lvalue). A number of languages provide this feature and use a range-like expression as the operand to the subscript operator or its equivalent in order to create a slice value (whatever value category that is in the respective language).
We do not propose standardizing anything like this at this time, but by making
the range expression its own expression kind instead of baking it directly into
the syntax of case
specifically, we leave the door open to reuse a single
syntax for all range- and slice-related features that may be introduced in
future. This will keep the language as consistent as possible.
In languages which support slices, one or both operands to the range operator can be omitted and implicitly refer to the maximum extent of the range. Since we are not currently proposing slices, we do not include this in the current proposal. It may prove useful to add in future.
The proposed changes are based on the latest public draft of C2y, which is N3301. Bolded text is new text when inlined into an existing sentence.
Add a new section within 6.6 "Constant expressions":
Syntax
constant-range-expression:
constant-expression...
constant-expressionDescription
A constant range expression is a special form to describe a sequence of constant expressions with contiguous incrementing values, from a low value on the left, to a high value on the right.
The resulting sequence cannot be used as a value expression and is only permitted in specific contexts.
Constraints
The left-hand value and the right-hand value shall be integer constant expressions.
(optionally) The right-hand value shall not have a value less than the left-hand value.
Semantics
A range expression describes a sequence of integer constant expressions without listing each intermediate value explicitly. This is permitted as the operand to a
case
label to indicate that a single label matches multiple values.The sequence described by the
...
operator is a closed range, which contains all integer values in sequence starting from and including the left, low value, up to and including the right, high valuefootnote).footnote) a range is not itself usable as a value and therefore does not have any specific type or representation, or perform any type conversion.
(optional, only if the optional constraint is not included) If the right-hand value of the closed range is less than the left-hand value, the sequence described by the range is empty.
Recommended practice
Implementations are encouraged to emit a diagnostic message when a closed range only describes a single value, that is, when the low and high value are the same.
(optional, only if the optional constraints are not included) Implementations are encouraged to emit a diagnostic message when a range is empty.
EXAMPLE 1 Range expressions may be used as the operand to a
case
label to concisely describe a sequence of matching values in one label:switch (n) { case 2 ... 5: // equivalent to // case 2: // case 3: // case 4: // case 5: f (); break; }
EXAMPLE 2 Because a range expression describes a closed range, it is possible to match past-the-end values such as the size of an array:
int arr[N]; switch (i) { case 0 ... N: // matches the past-the-end range of arr f (arr[i]); // not OK, will dereference arr[N] g (&arr[i]); // may be OK depending on purpose break; } switch (i) { case 0 ... N - 1: // only matches the valid element range of arr f (arr[i]); // OK break; }
EXAMPLE 3 This range expression describes a range with elements that have different types. The implementation will create a range with the values from -10 to +10 in whatever representation it needs in order to preserve the meaning of the expressed range:
case -10 ... 10ul:
Forward references:
switch
statements (6.8.5.3)
We prefer to include the optional constraints instead of the recommended practice.
Within 6.8.2 "Labeled statements", Syntax, paragraph 1 (the grammar), add a new production to label:
label:
attribute-specifier-sequence opt identifier:
attribute-specifier-sequence optcase
constant-range-expression:
attribute-specifier-sequence optcase
constant-expression:
attribute-specifier-sequence optdefault
:
Constraints are added to the description of the range expression itself and are therefore not needed in this section.
Add a new paragraph after paragraph 4:
A
case
label specifies the value of the following constant expression, if followed by a single constant expression, or each value in the sequence described by the following range, if followed by a constant range expression.
Within 6.8.5.3 "The switch
statement", modify paragraph 3:
The expression of each
case
label shall be an integer constant expression or a constant range expression and no two of thecase
constant expressions associated to the sameswitch
statement shall specify the same value after conversion. There may be at most onedefault
label associated to aswitch
statement. (Any enclosedswitch
statement may have adefault
label orcase
-specified values that duplicatecase
-specified values in the enclosingswitch
statement.)
Add a new paragraph after paragraph 3:
The arithmetic values specified by a constant range expression shall not change as a result of conversion to the promoted type of the controlling expression.
(This used to be permitted by GCC and Clang with a warning, but has since been made a hard error. There is no reason to specify a behaviour for what would always be an unintentional value change here.)
Modify existing paragraph 4:
...and on the presence of a
default
label and the specified values of anycase
labels on or in theswitch
body...
Modify paragraph 5:
The integer promotions are performed on the controlling expression. The values specified by each
case
label are converted to the promoted type of the controlling expression. If a converted value matches that of the promoted controlling expression, control jumps to the statement or declaration following the matchedcase
label. Otherwise, if there is adefault
label, control jumps to the statement or declaration following thedefault
label. If no convertedcase
value matches and there is nodefault
label, no part of theswitch
body is executed.
Optionally (if the constraint against empty ranges was not added), add a new sentence to the end of paragraph 5:
A
case
label with a range expression that describes an empty range (which occurs when the low value has a value greater than the high value) never matches any value of the controlling expression footnote.footnote) even if the controlling expression has the same value as one of the operands to the empty range expression.
Modify paragraph 6:
As discussed in 5.3.5.2, the implementation may limit the number of
case
labels in aswitch
statement.
Add a new example after paragraph 7:
EXAMPLE 2 A value specified as part of the sequence in a range expression argument to
case
shall not be repeated by a different label:switch (expr) { case 1 ... 5: // ... break; case 6: // OK, 6 has not been specified // ... case 4: // constraint violation: // ... // 4 was specified by 1 ... 5 break; case 3 ... 8: // constraint violation: // ... // 3, 4, 5, 6 were specified above break; }
Does WG14 want to add constant-range-expression as a toplevel expression kind?
Does WG14 want to include the constraint that the low value must be less than or equal to the high value (i.e. that a range cannot be empty)?
Would WG14 like to see further work on ranges for designators?
Huge thanks to Joseph Myers for detailed and thorough review of previous versions.
C2y public draft
GCC case ranges
GCC designator ranges
Helix QAC
Wikipedia examples of array slicing
Clang case
implementation details