Proposal for C2y
WG14 3474

Title:               Simplified lexical scope for labels
Author, affiliation: Alex Celeste, Perforce
                     Jan Schultke
                     Sarah Quiñones
Date:                2025-02-14
Proposal category:   Feature refinement
Target audience:     Compiler implementers, users

Abstract

Named loops were accepted into C2Y with consensus from WG14 at the Fall 2024 meeting. However, the feature as adopted remains contentious because of some problems caused by reusing the label: feature in this way. This paper aims to show how the major scoping problems of labels can be resolved without needing to move away from label syntax, which would deviate from existing practice in the field. It also aims to maintain consistency between C and a corresponding C++ feature that is being discussed by WG21.


Simplified lexical scope for labels

Reply-to:     Alex Celeste (aceleste@perforce.com)
Document No:  N3474
Revises:      N/A
Date:         2025-02-14

Summary of Changes

N3474

Introduction

After adoption of N3355 into C2Y, which provides the core "named loop" functionality, various potential issues were identified and laid out by Keane et al. in N3377 "Named Loops Should Name Their Loops".

For a discussion of the motivations for adopting the functionality itself, which has already been accepted into C2Y, please refer to N3355.

Scoping Issues

The most important issue identified in that document is that N3355's proposed syntax reuses labels, which are currently defined in C to have "function scope". This is a single, non-nesting, lexically-unordered scope that spans the entire body of a function definition, in order to allow code like this:

void f (int x, int y)
{
  if (x == y) {
    goto mylabel;  // looks "into" two braced scopes on subsequent lines
  }                // to find the declaration of the label to jump to
  
  for (int z = 0; z < y; ++ z) {
    if (z > x) {
      mylabel:
      g ();
    }
  }
}

...as well as forward jumps within the same level of bracing (used for early-exits), jumps backward and potentially into nested braces, and so on. This is useful functionality and core to how goto works in C.
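As a minimal sketch of the other jump shapes mentioned above, the following function uses forward early-exit jumps and one backward jump, all resolved through function scope. The function name sum_until_negative and its logic are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical illustration: sums values[0..n) but stops at the first
   negative element. Returns -1 if values is a null pointer. */
int sum_until_negative (const int *values, size_t n)
{
  int total = -1;
  size_t i = 0;

  if (values == NULL) {
    goto done;          // forward jump: the label is defined further down
  }

  total = 0;
again:
  if (i < n) {
    if (values[i] < 0) {
      goto done;        // forward jump out of two nested braced scopes
    }
    total += values[i];
    ++ i;
    goto again;         // backward jump to a label defined earlier
  }

done:
  return total;
}
```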

A consequence of this is that, without extensions such as GCC's locally declared labels, it is not possible to reuse the same label name twice within the same function body:

#define SomeWork(X) ({          \
  if (bad (X)) goto failblock;  \
  ...work                       \
failblock:                      \
  result;                       \
})

// cannot be used this way:
void f (int x, int y) {
  y = SomeWork(x);
  y = SomeWork(x);  // error - duplicate 'failblock'
}

GCC adds locally declared labels (just) to make this possible:

#define MoreWork(X) ({          \
  __label__ failblock;          \
  if (bad (X)) goto failblock;  \
  ...work                       \
failblock:                      \
  result;                       \
})

void g (int x, int y) {
  y = MoreWork(x);
  y = MoreWork(x);  // OK
}

The __label__ feature declares an identifier as having block scope, which can later be used to define a goto target. The advantage of this for GCC is that it does not interfere with common use of labels in non-GNU parts of the code.

For named loops, this manifests as every loop in a function having to have a distinct name, even if the loop was introduced by a macro. Keane et al. provide an example of an update macro which expands to a named loop, but the problem generalizes to:

void h (int x, int y) {
  loop: for (int z = 0; z < x; ++ z) {
    if (z == 10) break loop;
  }
  
  loop: for (int z = 0; z < y; ++ z) {
    if (z == 10) break loop;
  }
}

It is clear enough to a human reader what this means - the name is associated with the block, and there is only one reasonable target for the break in each loop - so reusing the name loop ought to make exactly as much sense as reusing the name z for the loop counter. But C's label scoping rules do not allow this.
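For comparison, here is a sketch of how the same shape is written in C23 using goto in place of a named break. Each exit needs a label whose name is unique within the function. The names h_c23, done_x and done_y, and the iteration counter returned for observability, are hypothetical additions for illustration.

```c
/* Hypothetical C23 rendering of h: returns the total number of loop
   iterations completed, so that the behaviour can be observed. */
int h_c23 (int x, int y)
{
  int iterations = 0;

  for (int z = 0; z < x; ++ z) {
    if (z == 10) goto done_x;   // stands in for "break loop"
    ++ iterations;
  }
done_x:

  for (int z = 0; z < y; ++ z) {
    if (z == 10) goto done_y;   // a second, distinct name is required
    ++ iterations;
  }
done_y:

  return iterations;
}
```

Reusing a single label name for both exits here would be rejected, for the same reason as the duplicate failblock in the macro example.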

Other issues

N3377 also raises a stylistic concern about whether the loop label properly implies the jump target. We believe this is ultimately a subjective matter and do not address it here.

Any syntax will become intuitive if a user makes use of it enough (see also: declaration-reflects-use, ternary operators, [static], ...).

N3377 raises a concern that because labels can be freely positioned within a block in C23 (and in many implementations, already could), an implementation is not required to associate a label with the entity it "names". We do not believe this is a real barrier to implementability; from experience (Helix QAC already internally treats labels as separate statements, "associating" them with the next block item), this is easy enough to work with.

Proposed solution

Whereas Keane et al. propose a new kind of syntax for label names, which adds new scoping rules for the new form, we propose simply relaxing the scoping rules around labels, for the scenarios where the code makes intuitive sense.

Instead of contorting the concept of "scope" to use it to define labels in a way that goto can then use, we propose that the special lookup property becomes associated with the goto statement specifically, rather than with labels.

Labels are not objects and do not have a value (outside of extensions which we do not propose to adopt); they have no storage, and no storage duration, and cannot be used in arbitrary expressions (again outside of extensions). Therefore, instead of treating them as having a single consistent scope, or kind of scope, we suggest that their lookup should be made entirely context dependent:

The effect of this is implicit: labelled break and continue "just work", because the jump target is already defined in terms of the containing named control structure ("...the jump exits the switch or iteration statement named by the label ..."; "... the jump is to the loop-continuation of the iteration statement named by the label ..."). Specifying that lookup refers to an "innermost enclosing" control structure is more direct than using scope as a tool of abstraction here.

The goto statement also defines its action in terms of "a label located somewhere in the enclosing function" and does not need to make reference to any kind of scope for this to work.

All existing valid C code making use of goto continues to be valid with these relaxed constraints. Additionally, the desirable code above, where the same identifier can name more than one loop or switch in the same function body, works in the intuitive and expected way without needing additional text.

Impact

The GCC Labels as Values extension allows label names to appear within arbitrary expressions, and therefore uses a more traditional form of lookup.

Since this extension is used with goto *, we suggest that the label-address operator && should impose the same constraint as goto itself on any identifier that appears as its operand.

This should mean all existing code using && continues to work.

Currently, the same label (not just the same identifier) can be used to both name a control structure, and as the target of a goto. This should continue to work without change so long as only one control structure has the given name. If two control structures share the same name, the user should not expect goto to work because the target is ambiguous. There is no special overlap between constraints here.

Alternatives

N3377

A number of potential issues with the alternative syntax proposed in N3377 were also identified by Schultke et al. in P3568 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3568r0.html#opposition-to-n3377).

N3377 introduces a new kind of label, increasing complexity of the language for users and for tooling. It occupies a new position in the syntax, which rules out certain extensions that aim to put context-sensitive keywords in that place. Subjectively, it may be more difficult to read as it adopts a "middle-endian" convention, and does not fix the subjective issues with label placement at the beginning of a loop. It is not symmetric with the rest of the language syntax and would not permit extensibility such as block-breaking (not currently proposed for C, but logically consistent and present in other languages).

Further discussion in greater detail is provided in P3568.

GNU locally declared labels

Locally declared labels can be used effectively to solve the specific problem of designing macros that are meant to be self-contained and not clash with the user scope or other macros within it. The feature is already widely-used in order to provide this capability, especially in combination with statement expressions. However, outside of macros, writing it out is subjectively ugly, and raises complexity because it requires an explicit opt-in to the "right" semantics by way of an extra declaration. It requires a new keyword and introduces more complicated scoping concepts instead of simplifying them.

The interactions between locally declared labels, regular labels, and other GNU C extensions are also not necessarily entirely clear: by changing the scope discipline, the power of goto is also changed to implicitly allow emergent functionalities like jumping out of nested function invocations.

gensym

A gensym macro could be provided by the implementation or built atop other features like __COUNTER__. This could be used to create new identifiers at the point of expansion, which would allow every expansion of a macro relying on label names to be unique.

This also requires the user to opt in to the correct or intuitive behaviour, and is similarly subjectively ugly and potentially quite difficult to use correctly, depending on how the expansion rules are defined. This seems unnecessarily complicated compared to the other available options, and cannot be used for the non-macro case at all.
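A minimal sketch of the __COUNTER__ approach follows, assuming an implementation that provides __COUNTER__ as an extension. All macro and function names here (PASTE, GENSYM, DOUBLE_UNLESS_NEGATIVE, twice_doubled) are hypothetical. It also shows one of the usability hazards: the generated name has to be threaded through a helper macro so that the goto and its label share a single expansion.

```c
/* Two-step paste so that __COUNTER__ is expanded before ## is applied. */
#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)

/* Hypothetical gensym: yields a fresh identifier at each expansion. */
#define GENSYM(prefix) PASTE(prefix, __COUNTER__)

/* The generated name is passed down as an argument so that the goto
   and the label it targets refer to the same identifier. */
#define DOUBLE_UNLESS_NEGATIVE_(X, LABEL) do { \
  if ((X) < 0) goto LABEL;                     \
  (X) *= 2;                                    \
LABEL: ;                                       \
} while (0)

#define DOUBLE_UNLESS_NEGATIVE(X) \
  DOUBLE_UNLESS_NEGATIVE_(X, GENSYM(skip_))

int twice_doubled (int v)
{
  DOUBLE_UNLESS_NEGATIVE (v);
  DOUBLE_UNLESS_NEGATIVE (v);  // OK: each expansion gets its own label
  return v;
}
```

Writing GENSYM(skip_) directly at both the goto and the label would produce two different identifiers, which is the kind of mistake this approach invites.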

C++ Compatibility

A similar proposal has been presented to WG21 as P3568.

WG21 voted to strongly support the feature itself, but did not express a strong preference between the N3355 syntax and the N3377 syntax.

WG21 expects to adopt whichever syntax WG14 prefers.

Prior Art

Named loops also have the distinct advantage of substantial prior art across multiple other programming languages. C is not bound by any other language, but having a control feature behave in exactly the same way as the precedent set by the wider community is extremely good for readability and lowers the surprise factor. The idiom has been proven to work well in practice, and there is no good reason for C to diverge from a model the rest of the programming language meta-community seems to have found clearest.

For instance, in Rust:

#![allow(unreachable_code, unused_labels)]

fn main() {
    'outer: loop {
        println!("Entered the outer loop");

        'inner: loop {
            println!("Entered the inner loop");

            // This would break only the inner loop
            //break;

            // This breaks the outer loop
            break 'outer;
        }

        println!("This point will never be reached");
    }

    println!("Exited the outer loop");
}

In JavaScript (see also ECMA-262 14.8, 14.9, 14.13):

let str = '';

loop1: for (let i = 0; i < 5; i++) {
  if (i === 1) {
    continue loop1;
  }
  str = str + i;
}

In Java (see also JLS 14.7, 14.15, 14.16):

search:
for (i = 0; i < arrayOfInts.length; i++) {
    for (j = 0; j < arrayOfInts[i].length;
         j++) {
        if (arrayOfInts[i][j] == searchfor) {
            foundIt = true;
            break search;
        }
    }
}

The proposed syntax matches all three of these languages exactly (modulo Rust's slightly different syntax for the label name itself). This is therefore almost certainly the least confusing and most user-friendly option to imitate.

Identical syntax to N3355's adopted version was also used by the Cpp2 project, and very similar syntax has been present in Perl for a long time.

Teachability

(i.e. consistency with the Prior Art of C itself)

The advantage of the existing structured jump operators is that the user does not need to think about where the jump goes, whereas with goto, knowing where the target label is located is core to understanding what the jump itself will do.

We consider that the same principle continues to hold with the named jump variants:

Even with names changing which block the jump targets the end of, the direction of the jump and its local effect on execution of the currently-enclosing block is completely consistent, no matter what containing control structure is targeted. Therefore, statement-local reasoning doesn't change at all, and zooming slightly outwards, the reasoning is consistent (more consistent when considering the ability to eliminate the asymmetric behaviour of unlabeled jumps w.r.t switch).
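The asymmetry of unlabeled jumps with respect to switch can be seen in C23 today. In the following sketch (the function count_until_stop and its command encoding are hypothetical), continue inside the switch targets the enclosing loop, but break only exits the switch, so leaving the loop from a case needs a flag or a goto.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: counts the 'n' commands seen before the first
   's' (stop) command; any other character is skipped. */
int count_until_stop (const char *commands)
{
  int count = 0;
  bool stop = false;

  for (size_t i = 0; commands[i] != '\0' && !stop; ++ i) {
    switch (commands[i]) {
    case 'n':
      ++ count;
      break;        // exits the switch only; the loop keeps running
    case 's':
      stop = true;  // a plain break cannot leave the loop from here
      break;
    default:
      continue;     // targets the loop, not the switch
    }
  }
  return count;
}
```

With a named break, the 's' case could name the loop directly and the stop flag would not be needed.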

Proposed wording

The proposed changes are based on the latest public draft of C2y, which is N3467. Bolded text is new text when inlined into an existing sentence.

Delete 6.2.1 "Scopes of identifiers, type names, and compound literals", paragraph 3, and replace by:

A label name is a special kind of identifier that is not associated with a scope. A label name can be used (in a goto statement) from anywhere in the function in which it appears, including any point before its definition; and if not used by goto, the same identifier may appear as the name of more than one label.

A description of the mechanics of named break and continue, or even goto, is not needed here.

Modify 6.8.2 "Labeled statements":

Delete paragraph 3 ("shall be unique").

Modify paragraph 7 to add a number ("1") to the EXAMPLE, and add a second example after it:

EXAMPLE 2 Labels with the same identifier may name more than one statement within the same function:

loop: for (int z = 0; z < x; ++ z) {
  /* loop may be used from here, referring to the first instance */
}

loop: for (int z = 0; z < y; ++ z) {
  /* loop may be used from here, referring to the second instance */
}

Both for loops are named by an identifier loop.

Modify 6.8.7.2 "The goto statement", adding a new paragraph to "Constraints" after paragraph 1:

There shall be exactly one label in the enclosing function with the same identifier named by a goto statement.

Modify 6.8.7.3 "The continue statement", paragraph 4:

If the continue statement has an identifier operand, the jump is to the loop-continuation of the innermost enclosing iteration statement named by the label with the corresponding identifier. Otherwise, the jump is to the loop-continuation of the innermost enclosing iteration statement.

Modify 6.8.7.4 "The break statement", paragraph 4:

If the break statement has an identifier operand, the jump exits the innermost enclosing switch or iteration statement named by the label with the corresponding identifier. Otherwise, the jump exits the innermost enclosing switch or iteration statement.

Modify Annex I, adding a bullet point to I.2:

Questions for WG14

Does WG14 want to accept the proposed change to relax the scoping rules for labels, retaining the existing syntax for named loops?

Acknowledgements

Thanks to Jan Schultke, Sarah Quiñones, and Herb Sutter for supporting this counter proposal.

References

C2y public draft
P3568
N3355
N2859
N3377
Locally Declared Labels (GCC)
Labels as Values
Named loops in Rust
Named loops in Javascript
ECMAScript 2023
Named loops in Java
Java language specification
Named loops in Cpp2
Named loops in Perl
WG21 votes on named loops
gensym (Common Lisp Hyper Spec)
The __COUNTER__ predefined macro