JTC1/SC22/WG21 N3994

Document number: N3994
Date: 2014-05-22
Project: Programming Language C++, Evolution Working Group
Reply-to: Stephan T. Lavavej <stl@microsoft.com>


Range-Based For-Loops: The Next Generation (Revision 1)


I. Introduction

This paper updates N3853 (see [1]), which proposed the syntax
"for (elem : range)", by adding support for attributes and answering
additional questions.
Please see the original proposal for the rationale behind this feature,
which is not repeated here.


II. Standardese

1. In 6.5 [stmt.iter]/1 and A.5 [gram.stmt], after:

    iteration-statement:
        [...]
        for ( for-range-declaration : for-range-initializer ) statement

add:

        for ( for-range-identifier : for-range-initializer ) statement

2. In 6.5 [stmt.iter]/1 and A.5 [gram.stmt], after:

    for-range-initializer:
        expression
        braced-init-list

add:

    for-range-identifier:
        identifier attribute-specifier-seq(opt)

3. At the end of 6.5.4 [stmt.ranged], add a new paragraph:

    A range-based for statement of the form
        for ( for-range-identifier : for-range-initializer ) statement
    is equivalent to
        for ( auto&& for-range-identifier : for-range-initializer ) statement
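
As an informal illustration of this equivalence (not part of the proposed
wording), the following sketch writes out by hand the loop that the proposed
syntax would expand to.  The function name append_suffix is hypothetical, and
the proposed syntax appears only in a comment, because it is not valid C++
without this proposal.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: appends a suffix to every element in-place.
void append_suffix(std::vector<std::string>& range, const std::string& suffix) {
    // Under this proposal, the loop header could be written as:
    //     for (elem : range)
    // which would be equivalent to what is spelled out here:
    for (auto&& elem : range) {
        elem += suffix; // modifies the element in-place, without copying it
    }
}
```

Calling append_suffix(v, "!") on a vector holding "a" and "b" leaves it
holding "a!" and "b!".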


III. Questions And Answers

Q15. Has this been implemented?

A15. Yes!  David Vandevoorde and Jonathan Caves have reported that they were
able to implement N3853 in less than an hour each.  As N3853's "Q8. What about
attributes?" was an open question, Vandevoorde took the opportunity to support
them on both sides of the identifier.  This revision permits them only after
the identifier, for the following reason:

Q16. Why are attributes not permitted to appear before the identifier?

A16. Their meanings would be ambiguous to humans.  A for-range-declaration of
the form "auto&& elem" can be marked with attributes in several places:
"[[attr1]] auto [[attr2]] && [[attr3]] elem [[attr4]]".  (See the definitions
of for-range-declaration in 6.5 [stmt.iter]/1, decl-specifier-seq in
7.1 [dcl.spec]/1, ptr-operator in 8 [dcl.decl]/4, and noptr-declarator in
8 [dcl.decl]/4, respectively.)  Here's what these attributes appertain to:

attr1 appertains to elem.  6.5.4 [stmt.ranged]/1 produces
"for-range-declaration = *__begin;", and 7 [dcl.dcl]/1 says:
"The attribute-specifier-seq in a simple-declaration appertains to each of the
entities declared by the declarators of the init-declarator-list."

attr2 appertains to auto.  7.1 [dcl.spec]/1 says: "The optional
attribute-specifier-seq in a decl-specifier-seq appertains to the type
determined by the preceding decl-specifiers (8.3)."

attr3 appertains to auto&&.  8.3.2 [dcl.ref]/1 says: "The optional
attribute-specifier-seq appertains to the reference type."

attr4 appertains to elem.  8.3 [dcl.meaning]/1 says: "The optional
attribute-specifier-seq following a declarator-id appertains to the entity
that is declared."

Permitting "for ([[attrBefore]] elem : range)" could lead to confusion - does
that expand to "for ([[attrBefore]] auto&& elem : range)" (like attr1) or to
"for (auto&& [[attrBefore]] elem : range)" (like attr3)?  The Standard could
unambiguously choose a particular meaning, but programmers could still be
confused - not everyone reads the Standard for a living.

In contrast, permitting "for (elem [[attrAfter]] : range)" isn't problematic.
As one would expect, attrAfter appertains to elem because this expands to
"for (auto&& elem [[attrAfter]] : range)" (like attr4) with no potential
for confusion.
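
To make the attr1 and attr4 positions concrete, here is a sketch using
C++11's range-for syntax.  It assumes [[maybe_unused]], a standard attribute
that was added in C++17 (after this proposal was written) and is used here
only because it appertains to entities.  The function name count_elements
is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts elements without ever reading them.
std::size_t count_elements(const std::vector<int>& range) {
    std::size_t visits = 0;

    // attr1 position: before the decl-specifier-seq, appertains to elem.
    for ([[maybe_unused]] auto&& elem : range) {
        ++visits;
    }

    // attr4 position: after the declarator-id, also appertains to elem.
    // The proposed "for (elem [[maybe_unused]] : range)" would expand to this.
    for (auto&& elem [[maybe_unused]] : range) {
        ++visits;
    }

    return visits / 2; // each of the two loops visited every element once
}
```

Both loops mark the same entity, which is why only the trailing position
needs to be supported by the minimal syntax.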

In the highly unlikely event that a programmer needs to apply an attribute to
auto or auto&&, they can simply fall back to The Original Syntax of range-for.

Q17. Are you sure that you want to use auto&& (to permit modification)
instead of const auto& (to forbid modification)?

A17. Yes.  This is a common question, because most loops observe elements and
unintentional modification is dangerous.  However, some loops have to modify
elements - not just through assignments, but also through calling non-const
member functions.  The philosophy behind this proposal's minimal range-for
syntax is that programmers basically never view elements as being separate
from their containers (or ranges in general).  To avoid surprises, range-for
should "transparently" access elements.  That certainly means in-place
(instead of copying), but it also means with the same constness as the range.
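
The "same constness as the range" behavior can be checked with C++11's
range-for today.  In this sketch, the function names double_all and sum_all
are hypothetical; the static_asserts record what auto&& deduces in each case.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper: doubles every element of a non-const range in-place.
void double_all(std::vector<int>& range) {
    for (auto&& elem : range) {
        // Non-const range: auto&& deduces int&, so modification is allowed.
        static_assert(std::is_same<decltype(elem), int&>::value,
                      "expected int& for a non-const range");
        elem *= 2;
    }
}

// Hypothetical helper: sums the elements of a const range.
int sum_all(const std::vector<int>& range) {
    int total = 0;
    for (auto&& elem : range) {
        // Const range: auto&& deduces const int&, so elem cannot be modified.
        static_assert(std::is_same<decltype(elem), const int&>::value,
                      "expected const int& for a const range");
        total += elem;
    }
    return total;
}
```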

Implementation experience actually exists to guide this decision.  Visual C++
provides the non-Standard syntax "for each (Elem elem in range)".  In addition
to being more verbose and less flexible than C++11's range-for syntax (which
permits ADL customization), the implementation of "for each" adds constness
for poorly understood reasons, so "for each (Elem& elem in range)" cannot be
used to modify elements in-place.  This limitation has repeatedly confused
users, as encountered on Microsoft's internal mailing lists.

Programmers who really want to add constness when observing a non-const range
will still be able to say "for (const auto& elem : range)", but it would be
confusing and limiting if "for (elem : range)" silently added constness.
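
For comparison, here is a sketch of two ways to add constness explicitly when
observing a non-const range.  The second relies on std::as_const, a library
facility added in C++17 that is not part of this proposal; the function name
count_positive is hypothetical.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: observes a non-const range through const references.
int count_positive(std::vector<int>& range) {
    // Option 1: spell out the constness in the for-range-declaration.
    int first_count = 0;
    for (const auto& elem : range) {
        static_assert(std::is_same<decltype(elem), const int&>::value,
                      "expected const int&");
        if (elem > 0) ++first_count;
    }

    // Option 2 (C++17): view the range as const; auto&& then deduces
    // const int& on its own.
    int second_count = 0;
    for (auto&& elem : std::as_const(range)) {
        static_assert(std::is_same<decltype(elem), const int&>::value,
                      "expected const int&");
        if (elem > 0) ++second_count;
    }

    return first_count == second_count ? first_count : -1;
}
```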

If the EWG wants to make adding constness slightly more convenient,
syntax like "for (const elem : range)", "for (elem : const range)",
or "for const (elem : range)" could be considered, but isn't being
proposed here.

Q18. Are you sure that you want to use auto&& to handle prvalues instead of
decltype(auto) or something else?

A18. Yes.  N3853 considered various alternatives (see Q4), which would cause
more problems than they would solve.  In Issaquah, the EWG didn't object to
relying on auto&&, which works for proxy objects most of the time
(and compilers can freely warn about the dangers).
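
A familiar proxy case is std::vector<bool>, whose iterators return proxy
objects by value.  The sketch below (with the hypothetical function name
set_all) shows auto&& binding to such a prvalue.

```cpp
#include <vector>

// Hypothetical helper: sets every flag in a std::vector<bool>.
void set_all(std::vector<bool>& flags) {
    // "for (auto& flag : flags)" would not compile here, because a non-const
    // lvalue reference cannot bind to the proxy prvalue returned by *__begin.
    for (auto&& flag : flags) {
        flag = true; // assigns through the proxy into the container
    }
}
```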

Thiago Macieira has suggested using decltype(auto) (see [2]), which has
slightly different behavior than auto&&.  They both produce X& for lvalues
and X&& for xvalues.  For prvalues, decltype(auto) produces X, while auto&&
produces X&&.  But while decltype(auto) preserves information about the
element's value category, it doesn't work with non-copyable/non-movable types,
whereas auto&& works (as Marc Glisse observed in c++std-ext-14747 and
Richard Smith confirmed in c++std-ext-14749).  Here's Smith's example,
slightly expanded:

struct X {
    X(int) { }
    X(const X&) = delete;
};

X f() {
    return { 0 };
}

int main() {
    X&& r1 = f(); // OK: no copying
    auto&& r2 = f(); // also OK

    X x3 = f(); // error: copying
    decltype(auto) x4 = f(); // also error
}

This proposal uses auto&& because it works with anything that *__begin
can return.
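
The deduction rules described above can also be verified directly.  This
sketch uses int instead of a non-copyable type so that every line compiles;
the function names (get_lvalue, get_xvalue, get_prvalue, check_deductions)
and the variable global_value are hypothetical.

```cpp
#include <type_traits>
#include <utility>

int global_value = 0;

int& get_lvalue() { return global_value; }
int&& get_xvalue() { return std::move(global_value); }
int get_prvalue() { return global_value; }

// Hypothetical helper: compiles only if the deductions match the text.
bool check_deductions() {
    auto&& a_lvalue = get_lvalue();
    decltype(auto) d_lvalue = get_lvalue();
    static_assert(std::is_same<decltype(a_lvalue), int&>::value, "lvalue");
    static_assert(std::is_same<decltype(d_lvalue), int&>::value, "lvalue");

    auto&& a_xvalue = get_xvalue();
    decltype(auto) d_xvalue = get_xvalue();
    static_assert(std::is_same<decltype(a_xvalue), int&&>::value, "xvalue");
    static_assert(std::is_same<decltype(d_xvalue), int&&>::value, "xvalue");

    auto&& a_prvalue = get_prvalue();         // reference to a temporary
    decltype(auto) d_prvalue = get_prvalue(); // a new object of type int
    static_assert(std::is_same<decltype(a_prvalue), int&&>::value, "prvalue");
    static_assert(std::is_same<decltype(d_prvalue), int>::value, "prvalue");

    (void) a_lvalue; (void) d_lvalue; (void) a_xvalue; (void) d_xvalue;
    return a_prvalue == d_prvalue; // both hold the value of global_value
}
```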

Q19. Instead of this proposal's semantics, should the syntax
"for (elem : range)" be given the semantics of assigning to an
elem variable previously declared outside the loop?

A19. No.  This question (also raised by Macieira in [2]) is reasonable,
and related to N3853's "Q7. What about shadowing?".  The minimal syntax
"for (elem : range)" can be given only one meaning, so it should definitely
be chosen carefully.  However, "outside-element-variable" semantics would not
be useful in the vast majority of cases, would prevent this proposal from
solving the problem of unintentional copies in C++11's range-for, and would
actually encourage unnecessary copy assignments.

First, consider traditional iterator/pointer/index loops.  (For brevity,
I'll refer to iterators, but pointers and indices behave identically here.)
It's usually preferable for iterators to be scoped to their loops - usually,
but not always.  Iterators declared outside of their loops can be used to do
a couple of things: carry information into the loop, and carry information
out of the loop.  Occasionally, a function obtains an iterator inside a range,
and wants to loop over the remaining subrange, instead of starting from the
beginning.  A loop-scoped iterator could be copied from the given iterator,
but directly using the given iterator is often simpler (as it avoids
introducing an additional variable).  More importantly, longer-lived iterators
can be used to carry information outside of their loops.  After a loop with
one or more potential breaks has finished, the iterator can be inspected to
determine whether the loop ran to the end of the range, or broke out earlier.
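
Both uses of an outside iterator can be sketched as follows.  The function
names contains_negative and sum_remaining are hypothetical.

```cpp
#include <vector>

// Carrying information out: the iterator outlives the loop, so it can be
// inspected afterwards to see whether the loop broke out early.
bool contains_negative(const std::vector<int>& range) {
    auto it = range.begin();
    for (; it != range.end(); ++it) {
        if (*it < 0) {
            break;
        }
    }
    return it != range.end(); // true only if the loop broke out early
}

// Carrying information in: the loop directly uses the given iterator to
// cover the remaining subrange, instead of starting from the beginning.
int sum_remaining(std::vector<int>::const_iterator it,
                  std::vector<int>::const_iterator last) {
    int total = 0;
    for (; it != last; ++it) {
        total += *it;
    }
    return total;
}
```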

Next, observe how range-based for-loops are different.  They give elements to
users, not iterators (although they internally use iterators).  They insist
on starting at the beginning, which couldn't be affected by
"outside-element-variable" semantics, because element values don't contain
positioning information like iterator values do.  (If the EWG wants to make it
convenient to start range-based for-loops somewhere other than the beginning,
that should be accomplished via range adapters; such adapters would work
equally well with C++11's range-for and this proposal.)
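
As a rough sketch of what such a range adapter might look like, the following
wraps a pair of iterators so that range-for can start somewhere other than
the beginning.  The names IteratorRange, make_range, and sum_after_first are
hypothetical and are not proposed here.

```cpp
#include <vector>

// Hypothetical minimal adapter: exposes [first, last) as a range.
template <typename It>
struct IteratorRange {
    It first;
    It last;
    It begin() const { return first; }
    It end() const { return last; }
};

template <typename It>
IteratorRange<It> make_range(It first, It last) {
    return IteratorRange<It>{first, last};
}

// Hypothetical helper: sums every element except the first one.
int sum_after_first(const std::vector<int>& range) {
    if (range.empty()) {
        return 0;
    }
    int total = 0;
    for (auto&& elem : make_range(range.begin() + 1, range.end())) {
        total += elem;
    }
    return total;
}
```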

Most importantly, "outside-element-variable" semantics would have significant
difficulties with carrying information out of the loop:

* Observing the element that the loop was looking at when it finished
(either normally or early) doesn't provide positioning information
(i.e. where the loop finished), unlike observing an iterator.

* The case of running to completion, and the case of breaking while
observing the range's last element, produce the same observable state
for an "outside-element-variable", unlike an iterator.

* Dealing with a potentially empty loop is problematic for an
"outside-element-variable", unlike an iterator.  This is especially
problematic if there are no "sentinel values" available, i.e. values for
initializing the outside element that can be distinguished from any
elements expected in the input range.

So while outside-iterator loops are occasionally very useful,
"outside-element-variable" loops are fraught with peril (and inefficiency
due to copy assignments).
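
These difficulties can be demonstrated by emulating
"outside-element-variable" semantics with C++11's range-for.  In this sketch,
the function name last_seen and its parameters are hypothetical.

```cpp
#include <vector>

// Hypothetical emulation: elem is declared outside the loop and assigned on
// every iteration.  Returns the last value that elem held.
int last_seen(const std::vector<int>& range, int stop_at, int sentinel) {
    int elem = sentinel; // must be initialized in case the range is empty
    for (const auto& current : range) {
        elem = current; // a copy assignment on every iteration
        if (elem == stop_at) {
            break;
        }
    }
    return elem;
}
```

For the range { 1, 2, 3 }, breaking on the last element (stop_at == 3) and
running to completion (stop_at == 99) both leave elem equal to 3.  With a
sentinel of -1, an empty range and the range { -1 } are also
indistinguishable afterwards.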

N3853 argued that "In addition to reducing overall verbosity, making common
cases terse has the bonus effect of making uncommon cases stand out due to
their remaining verbosity."  For ranges, the most common case is looping over
elements in-place.  Giving the current element a name (that can be conveniently
mentioned by the body of the loop) requires initializing a reference that's
scoped to the current iteration - it has to be a reference in order to work
in-place, and references can't be rebound.  (And as we've just seen, an
"outside-element-variable" isn't really useful.)  The minimal syntax and
in-place semantics of this proposal have been chosen to work together.


IV. Acknowledgements

Thanks to David Vandevoorde and Jonathan Caves for providing implementation
experience.  Additionally, Vandevoorde suggested the support for attributes
that was added in this revision.  Thanks to Thiago Macieira, Marc Glisse,
and Richard Smith for their comments.  Thanks to Deskin Miller, Eric Albright,
Giovanni Dicanio, and Neil Coles for reviewing this proposal.


V. References

All of the Standardese citations in this proposal are to Working Paper N3936:
http://www.open-std.org/jtc1/sc22/wg21/prot/14882fdis/n3936.pdf

[1] N3853 "Range-Based For-Loops: The Next Generation" by Stephan T. Lavavej:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3853.htm

[2] "Why do range-for loops require a variable declaration?"
by Thiago Macieira:
https://groups.google.com/a/isocpp.org/d/msg/std-proposals/BgE7b7aqE08/4fJVPes-8fgJ

(end)