Proposal for C2y
WG14 3194
Title:               Case range expressions
Author, affiliation: Alex Celeste, Perforce
Date:                2023-12-06
Proposal category:   New feature
Target audience:     Compiler implementers, users
Abstract
GCC supports an expression syntax to allow the user to describe a contiguous range of integer constant expression in certain contexts. We propose adopting this expression kind and the simplest use case supported by GCC, along with comments on its integration and future directions for expansion of the feature.
Case range expressions
Reply-to:     Alex Celeste (aceleste@perforce.com)
Document No:  N3194
Revises:      (n/a)
Date:         2023-12-06
Summary of Changes
N3194
- original proposal
Introduction
The GNU C dialect includes an operator which can be used to describe a range of integer constant expressions, which is valid in two contexts where the kind of repetition implied makes sense:
switch (n) {
  case 1 ... 10:
    something ();
    break;
}
which is equivalent to writing
if (n >= 1 && n <= 10) {
  something ();
}
except that n is only evaluated once; and
int arr[] = {
  [3 ... 6] = n,
};
which is equivalent to writing
int arr[] = {
  [3] = n,
  [4] = n,
  [5] = n,
  [6] = n,
};
except that again, n is only evaluated once.
A range expression is defined as:
integer-constant-expression  ellipsis  integer-constant-expression
and does not describe a kind of value that otherwise exists in the language;
in GNU C, the case and designator syntactic constructs can accept either a
single integer constant expression, or a range expression, and the evaluation
or interpretation of the range does not result in a value that can be copied or
passed around further.
Range expressions are very useful for both conciseness of expression, and also
to avoid repeated evaluations in the designator case (by avoiding the need for
a temporary variable). If a switch has a large number of possible ranges it
can be expressed much more easily by a "wide" range than by stacking the case
labels (e.g. case 1 ... 1000: is quite plausible but would not be readable if
written out normally). While any switch can be expressed as an if, the
readability of the code may be affected and the choice between the two is
important when deciding how best to convey the author's intent.
There are some complexities in the designator case that arise from the fact that designators within an initializer do not need to be unique, and the lack of specification for evaluation order and effects in the existing initializer semantics. Therefore, while we propose introducing range expressions as a general reusable construct, we do not propose adding them for designators until the order of evaluation issues with existing initializer syntax are cleaned up.
Proposal
We propose that a constant-range-expression is added as a toplevel expression kind alongside constant-expression and expression for use by selected syntactic forms.
We propose that the case label construct be allowed to admit either a
constant-expression or a constant-range-expression as its operand. The
constraints this range would be subject to are largely the same as those which
currently apply to the constant-expression operand.
The constant-range-expression can be reused in other contexts as and when they
emerge, so we do not believe this should be a property locked to case itself
specifically. There are several other compelling use cases for ranges, including
GCC's own use to describe designator ranges.
Case constraints
Extending the constraints on the current case label form is relatively simple
as there are no repeated-side effects to consider.
void foo (void);
void use (int n)
{
  switch (n) {
    case 1:
      foo ();
      break;
    // case 4 : // error, overlaps 2 ... 5
    //   foo ();
    //   break;
    case 2 ... 5:
      foo ();
      break;
    case 6 ... 6: // OK (but questionable)
      foo ();
      break;
    case 8 ... 7: // not a GCC error
      foo ();
      break;
    case 10 ... 4: // not a GCC error despite overlap
      foo ();
      break;
  }
}
case 4 and case 2 ... 5 clearly and unambiguously overlap. This should be
treated as a violation of 6.4.8.2 paragraph 3, which says "no two of the case
constant expressions ... shall have the same value". The order of the two labels
produces a slightly different error message in tools, but this is a UX issue and
does not make the code more or less valid either way - we do not propose
that single-expression cases should be able to override parts of a range
expression, as this is not existing practice and seems highly likely to lead to
user confusion. (Subjectively, such forms would seem to invalidate the argument
that range-based case labels would be clearer, too.)
Because a range describes a sequence of value from low to high, GCC does
not consider 8 ... 7 to be an error, but an empty range which cannot match
n. We optionally propose making this a constraint violation.
Because 10 ... 4 also describes an empty range, tools will usually ignore the
apparent overlap with 2 ... 5. This does not need any special constraint if
the reversed range is already a constraint violation its own right.
Tools allow 6 ... 6 because the range contains at least one value. We believe
this is fine, as the resulting case label will have a value that can match.
Tools should probably warn for this situation.
Operator syntax
Although the use of the ... operator is existing practice in GCC and compatible
compilers, it does have a significant problem in practice:
case 11...12: // syntax error, PP-number with too many dots
  foo ();
  break;
A space is needed between the low and high values and the range operator,
because a dot is a valid part of a PP-number in the preprocessor syntax and
therefore the above form 11...12 is lexed by most tools as a single number,
which then fails translation into a valid C-language token.
Worse, this is inconsistent:
enum {
  ELEVEN = 11,
  TWELVE = 12,
};
case ELEVEN...TWELVE: // this is fine 🍵🔥
  foo ();
  break;
The above snippet parses, because dots do not form part of an identifier, but is fragile and will break if the code is edited. For instance, if the named constants were implemented as macros:
#define ELEVEN 11
#define TWELVE 12
the source file would parse correctly on a normal translation pass (where the
preprocessor would correctly split the macro name and ellipsis first), but be
broken if the user attempted to generate and use a preprocessed output file
(-E in GCC).
Essentially this operator is making the same mistake that compound assignment
did in the original B language with the =+, =-, =* operators.
We therefore suggest that some other operator may be a better choice for this
syntax. :: is available in the language thanks to C23 attributes, but currently
unused for any expression purpose, and may be a good fit as it is visually
similar to the "slice" syntax used in other languages. -> is also an option,
which while not unused, is not valid with arithmetic operands. A new operator
such as ^^, ~>, etc. might also be possible.
(The problem with ... is not unresolvable and it would be fairly simple to
add a conversion from "PP-number containing ..." to a sequence of two or three
C-language tokens; but this still seems confusing as it breaks the 1-1 between
preprocessor and C tokens; and anyway neither Clang nor GCC have done so.)
For the rest of this document we assume that :: and -> have been chosen as
range operators, but are not wedded to this and welcome feedback about alternative
choices.
Open intervals
The GCC feature describes a closed interval (in the range expression low ... high,
the values in the range include both low and high).
case 1 ... 4:
// exactly equivalent to
case 1:
case 2:
case 3:
case 4:
A right-open interval is often also intuitive, especially for future uses in array contexts, as it can refer to the element count as the right hand operator. We may wish to allow this with a separate operator. This is included below as an optional part of the proposal.
case 1 ~> 4: // pretend ~> operator exists for right-open
// would be equivalent to
case 1:
case 2:
case 3:
// 4 is past-the-end
A fully-open and left-open interval seem less likely to be useful in practice and on balance it is probably not worth assigning dedicated syntax for these, but this decision could be revisited if the demand exists.
Prior Art
This is a core GNU C language feature and is supported by most compilers that claim any level of GCC compatibility. GCC itself, as well as Clang, and third party tools emulating GNU C such as QAC, all support this feature.
Experience implementing the feature in Helix QAC suggests that the burden on
implementers is minimal for both current usages. The case feature is essentially
trivial as the relevant case labels can, if necessary, be expanded out just
after parsing-time without even needing the backend to be aware of them (though
in practice we found that large ranges expanded this way could potentially
cause surprising slowdowns in an optimizing or dataflow backend, and that it was
better to integrate a backend-aware form of support; this was not complicated but
will depend on the tool architecture, if it can handle if style expressions in
the branches).
The designator feature is not complicated to implement at all, but does raise
potential user confusion issues as the designators can allow repetition, the
overriding of side effects, etc. These are issues that already exist within the
Standard initializer rules. GCC also surprisingly allows empty ranges to compile
in this context, which it rejects for case labels; these introduce some more
potential for user confusion as there is now another way for side effects to be
silently eliminated even without the user overriding an initializer. As a result
we think the syntax should be reserved but addition of this part of the feature
held off until the existing issues are cleaned up.
Slice syntax
A very related use to designator ranges, which initialize a subrange of an array, is slices, which read from an initialized object (or perhaps describe an lvalue). A number of languages provide this feature and use a range-like expression as the operand to the subscript operator or its equivalent in order to create a slice value (whatever value category that is in the respective language).
We do not propose standardizing anything like this at this time, but by making
the range expression its own expression kind instead of baking it directly into
the syntax of case specifically, we leave the door open to reuse a single
syntax for all range- and slice-related features that may be introduced in
future. This will keep the language as consistent as possible.
In languages which support slices, one or both operands to the range operator can be omitted and implicitly refer to the maximum extent of the range. Since we are not currently proposing slices, we do not include this in the current proposal. It may prove useful to add in future.
Proposed wording
The proposed changes are based on the latest public draft of C23, which is N3096. Bolded text is new text when inlined into an existing sentence.
Constant expressions
Add a new section within 6.6 "Constant expressions":
6.6.1 Constant range expressions
Syntax
constant-range-expression:
constant-expression::constant-expression
constant-expression->constant-expressionDescription
A constant range expression is a special form to describe a sequence of constant expressions with contiguous incrementing values, from a low value on the left, to a high value on the right.
The resulting sequence cannot be used as a value expression and is only permitted in specific contexts.
Constraints
The left-hand value and the right-hand value shall be integer constant expressions that each evaluate to a representable value in the converted expression type.
(optionally) When the
::operator is used, the right-hand value shall not have a value less than the left-hand value.(optionally) When the
->operator is used, the right-hand value shall not have a value less than or equal to the left-hand value.Semantics
A range expression describes a sequence of integer constant expressions without listing each intermediate value explicitly. This is permitted as the operand to a
caselabel to indicate that a single label matches multiple values.The usual arithmetic conversions are applied to the left-hand and right-hand operands. Each value in the result sequence of integer constant values has the common type of the converted values.
A range expression that uses the
::operator to describe the sequence is a closed range. The resulting sequence contains all integer values in sequence starting from and including the left, low value, up to and including the right, high value.A range expression that uses the
->operator to describe the sequence is a right-open range. The resulting sequence contains all integer values in sequence starting from and including the left, low value, up to the value one less than the right, high value, but not including the high value itself.(optional, only if the optional constraints are not included) If the right-hand value of a closed range is less than the left-hand value, or the right-hand value of a right-open range is less than or equal to the left-hand value, the sequence described by the range is empty.
Recommended practice
Implementations are encouraged to emit a diagnostic message when a closed range only describes a single value, that is, when the low and high value are the same.
(optional, only if the optional constraints are not included) Implementations are encouraged to emit a diagnostic message when a range is empty.
EXAMPLE 1 Range expressions may be used as the operand to a
caselabel to concisely describe a sequence of matching values in one label:switch (n) { case 2::5: // equivalent to // case 2: // case 3: // case 4: // case 5: f (); break; case 6->10: // equivalent to // case 6: // case 7: // case 8: // case 9: f (); break; }EXAMPLE 2 The right-open range is useful when a value describes a past-the-end index, such as the size of an array:
int arr[N]; switch (i) { case 0->N: // matches the range of arr f (arr[i]); break; }EXAMPLE 3 The type of the values in the range is the arithmetic converted type of the left and right operands, so the elements of this range all have the type
unsigned int:case 1 :: 10u:EXAMPLE 4 This range expression describes a range with elements that have the type
unsigned longand where the low value is greater than the high value after conversion to the unsigned type,
(optionally) which is a constraint violation:
(alternatively) resulting in an empty range:case -10 :: 10ul:Forward references:
switchstatements (6.8.4.2)
We prefer to include the optional constraints instead of the recommended practice.
Case
Within 6.8.1 "Labeled statements", Syntax, paragraph 1 (the grammar), add a new production to label:
label:
attribute-specifier-sequence opt identifier:
attribute-specifier-sequence optcaseconstant-range-expression:
attribute-specifier-sequence optcaseconstant-expression:
attribute-specifier-sequence optdefault:
Constraints are added to the description of the range expression itself and are therefore not needed in this section.
Add a new paragraph after paragraph 4:
A
caselabel specifies the value of the following constant expression, if followed by a single constant expression, or each value in the sequence described by the following range, if followed by a constant range expression.
Switch
Within 6.8.4.2 "The switch statement", modify paragraph 3:
The expression of each
caselabel shall be an integer constant expression or a constant range expression and no two of thecaseconstant expressions associated to the sameswitchstatement shall specify the same value after conversion. There may be at most onedefaultlabel associated to aswitchstatement. (Any enclosedswitchstatement may have adefaultlabel orcase-specified values that duplicatecase-specified values in the enclosingswitchstatement.)
Modify paragraph 4:
...and on the presence of a
defaultlabel and the specified values of anycaselabels on or in theswitchbody...
Modify paragraph 5:
The integer promotions are performed on the controlling expression. The values specified by each
caselabel are converted to the promoted type of the controlling expression. If a converted value matches that of the promoted controlling expression, control jumps to the statement or declaration following the matchedcaselabel. Otherwise, if there is adefaultlabel, control jumps to the statement or declaration following thedefaultlabel. If no convertedcasevalue matches and there is nodefaultlabel, no part of theswitchbody is executed.
Optionally (if the constraint against empty ranges was not added), add a new sentence to the end of paragraph 5:
A
caselabel with a range expression that describes an empty range (which occurs when the low value has a value greater than the high value, or equal to the high value for a right-open range) never matches any value of the controlling expression footnote.footnote) even if the controlling expression has the same value as one of the operands to the empty range expression.
Modify paragraph 6:
As discussed in 5.2.4.1, the implementation may limit the number of
caselabels in aswitchstatement.
Add a new example after paragraph 7:
EXAMPLE 2 A value specified as part of the sequence in a range expression argument to
caseshall not be repeated by a different label:switch (expr) { case 1 :: 5: // ... break; case 6: // OK, 6 has not been specified // ... case 4: // constraint violation: // ... // 4 was specified by 1 :: 5 break; case 3 :: 8: // constraint violation: // ... // 3, 4, 5 were specified above break; }
Questions for WG14
Does WG14 want to add constant-range-expression as a toplevel expression kind?
Would WG14 like to allow a second toplevel range expression that does not require its operands to be constant? (not proposed here)
Does WG14 want to include the constraint that the low value must be less than or equal to the high value (i.e. that a range cannot be empty)?
Does WG14 prefer to use some other operators than :: and -> for the range
expressions?
Would WG14 like to see further work on ranges for designators?