Document number: N1757/05-0017
Author: Daveed Vandevoorde, Edison Design Group
Date: 2005-01-14
Ever since the introduction of angle brackets, C++ programmers have been surprised by the fact that two consecutive right angle brackets must be separated by whitespace:

    #include <vector>
    typedef std::vector<std::vector<int> > Table;  // OK
    typedef std::vector<std::vector<bool>> Flags;  // Error

The problem is an immediate consequence of the “maximum munch” principle and the fact that >> is a valid token (right shift) in C++.
This is a minor but persistent, annoying, and somewhat embarrassing problem. If the cost is reasonable, it therefore seems worthwhile to eliminate the surprise.
The purpose of this document is to explain ways to allow >> to be treated as two closing angle brackets, as well as to discuss the resulting issues. A specific option is proposed along with wording that would implement the proposal in the current working paper.
The example above shows the most common context of double right angle brackets: Nested template-ids. However, the “new-style” cast syntax may also participate in such constructs. For example:

    static_cast<List<B>>(ld)

This situation currently occurs fairly rarely because the template-ids involved always represent class types, whereas these casts usually involve pointer, pointer-to-member, or reference types.
However, if template aliases make it into the language (and it seems likely they will), then template-ids will be able to represent nonclass types. It seems therefore desirable to address the issue for all constructs with right angle brackets, not just for templates.
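To make the point about nonclass template-ids concrete, here is a minimal sketch. It assumes a compiler that already implements both template aliases and the >> treatment proposed in this paper; the names Ptr, Base, Derived, and to_base are hypothetical and do not come from any proposal text.

```cpp
#include <type_traits>

struct Base {};
struct Derived : Base {};

// Hypothetical template alias: the template-id Ptr<T> names a nonclass
// (pointer) type, which is exactly the kind of type casts usually involve.
template<typename T> using Ptr = T*;

// The cast's type-id ends in ">>": the first > closes Ptr<...>,
// the second closes static_cast<...>.
inline Ptr<Base> to_base(Derived* d) {
  return static_cast<Ptr<Base>>(d);
}

static_assert(std::is_same<Ptr<Base>, Base*>::value,
              "Ptr<Base> is a pointer type, not a class type");
```

With such aliases available, a cast whose target type ends in a template-id becomes an ordinary occurrence rather than a rare one.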
It is also worth noting that the problem can occur with the >>= and >= tokens as well. For example:

    void func(List<B>= default_val1);
    void func(List<List<B>>= default_val2);

Both of these forms are currently ill-formed. It may be desirable to also address this issue, but this paper does not propose to do so.
Solving our problem amounts to decreeing that under some circumstances a >> token is treated as two right angle brackets instead of a right shift operator. As it turns out, there are several general approaches to defining those “circumstances.”
Approach 1. The first approach is the simplest: Decree that if a left angle bracket is active (i.e. not yet matched by a right angle bracket) the >> token is treated as two right angle brackets instead of a shift operator, except within parentheses or brackets that are themselves within the angle brackets. A slight variation on that theme (call it “Approach 1b”) is to require at least two left angle brackets to be active since otherwise the construct would be an error (because there would be an excess of right angle brackets).
This strategy is similar to the treatment of the > token: If a left angle bracket is active, the token is treated as a right angle bracket, except within parentheses. For example:
    A<(X>Y)> a;  // The first > token appears within parentheses and
                 // therefore is not a right angle bracket. The second one
                 // is a right angle bracket because a left angle bracket
                 // is active and no parentheses are more recently active.
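The rule of Approach 1 can be sketched as a pass over a token sequence. This is an illustration only, not how any particular compiler works: the function name split_right_shift is hypothetical, and the sketch makes the simplifying assumption that every < token opens an angle bracket, whereas a real parser can only decide that after name lookup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: rewrites ">>" into two ">" tokens per Approach 1.
// Simplifying assumption: every "<" is an opening angle bracket.
inline std::vector<std::string>
split_right_shift(const std::vector<std::string>& tokens) {
  std::vector<char> open;  // active delimiters, innermost last
  std::vector<std::string> out;
  for (const std::string& tok : tokens) {
    // An angle bracket is "active" only if it is the innermost delimiter,
    // i.e. no parenthesis or bracket was opened after it.
    const bool angle_active = !open.empty() && open.back() == '<';
    if (tok == "<" || tok == "(" || tok == "[") {
      open.push_back(tok[0]);
      out.push_back(tok);
    } else if (tok == ")" || tok == "]") {
      const char want = (tok == ")") ? '(' : '[';
      // Discard anything opened after the matching opener; an unmatched
      // "<" in there was really a less-than operator.
      while (!open.empty() && open.back() != want) open.pop_back();
      if (!open.empty()) open.pop_back();
      out.push_back(tok);
    } else if (tok == ">" && angle_active) {
      open.pop_back();  // closes the innermost "<"
      out.push_back(tok);
    } else if (tok == ">>" && angle_active) {
      open.pop_back();  // first ">" closes the innermost "<"
      // The second ">" closes an enclosing "<" if one is active.
      if (!open.empty() && open.back() == '<') open.pop_back();
      out.push_back(">");
      out.push_back(">");
    } else {
      out.push_back(tok);  // includes ">>" kept as a right shift
    }
  }
  return out;
}

// Usage: a nested template-id is split; a parenthesized shift is not.
inline void demo_split() {
  typedef std::vector<std::string> Toks;
  assert((split_right_shift(Toks{"A", "<", "B", "<", "int", ">>"}) ==
          Toks{"A", "<", "B", "<", "int", ">", ">"}));
  assert((split_right_shift(Toks{"A", "<", "(", "x", ">>", "1", ")", ">"}) ==
          Toks{"A", "<", "(", "x", ">>", "1", ")", ">"}));
}
```

Checking only the innermost delimiter is what makes parentheses and brackets protect a shift operator, mirroring the existing treatment of the > token.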
Unfortunately, some programs may be broken by this approach. Consider the following example:

    #include <iostream>
    template<int I> struct X {
      static int const c = 2;
    };
    template<> struct X<0> {
      typedef int c;
    };
    template<typename T> struct Y {
      static int const c = 3;
    };
    static int const c = 4;
    int main() {
      std::cout << (Y<X<1> >::c >::c>::c) << '\n';
      std::cout << (Y<X< 1>>::c >::c>::c) << '\n';
    }

This program is valid today; it produces the following output:

    0
    3

With the right angle bracket rule proposed above, the >> token in the second statement would change its meaning (from right shift to double right angle bracket) and the output would therefore become:

    0
    0
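A program that depends on the old meaning can be made independent of the rule by parenthesizing the shift. The sketch below reuses the templates from the example above, refers to the namespace-scope constant by the unqualified name c, and assumes static_assert is available purely to check the results; the names shifted_type and old_meaning are hypothetical.

```cpp
#include <type_traits>

template<int I> struct X { static int const c = 2; };
template<> struct X<0> { typedef int c; };
template<typename T> struct Y { static int const c = 3; };
static int const c = 4;

// With parentheses, >> remains a right shift under either rule:
// 1 >> 4 is 0, so the template-id names X<0>, whose member c is the type int.
typedef X<(1 >> c)>::c shifted_type;
static_assert(std::is_same<shifted_type, int>::value, "X<0>::c is int");

// The template-id from the second statement, rewritten so that it keeps
// its original meaning: Y<int>::c.
static int const old_meaning = Y<X<(1 >> c)>::c>::c;
static_assert(old_meaning == 3, "names Y<int>::c");
```

The parenthesized form is accepted both before and after the proposed change, so affected code can be fixed without conditional compilation.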
Approach 2. To avoid the backward incompatibility, an alternative solution is to modify the rule proposed above to only treat the >> token as two right angle brackets when parsing template type arguments or template template arguments, but not when parsing template nontype arguments. This approach would make A<B<int>> valid, but would leave C<D<12>> ill-formed.
Another way to view this alternative approach is that a template argument is always parsed as far as possible (which may include right shift operators). When an argument is parsed, the next token must be a comma, a > treated as a single closing angle bracket, or (with this proposal) a >> token treated as a double angle bracket.
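The distinction drawn by Approach 2 can be illustrated with the two template-ids mentioned above. In this sketch the definitions of A, B, C, and D and the typedef names are hypothetical, and a compiler that accepts >> after a type argument is assumed.

```cpp
#include <type_traits>

template<typename T> struct A {};
template<typename T> struct B {};
template<typename T> struct C {};
template<int N>      struct D {};

// The innermost argument is a type, so Approach 2 would accept ">>" here.
typedef A<B<int>> TypeNested;

// The innermost argument is a nontype expression. Approach 2 would keep
// parsing "12 >> ..." as a shift, so the whitespace has to stay.
typedef C<D<12> > NontypeNested;

static_assert(std::is_same<TypeNested, A<B<int> > >::value,
              "both spellings name the same type");
```

Under this approach the whitespace requirement would survive for exactly those template-ids whose last argument is a nontype argument.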
Approach 3. Finally, a third way to tackle the problem is to eliminate the right shift token altogether and to modify the grammar so that two consecutive > tokens are treated as a right shift operation in the appropriate circumstances. This would for example allow the following form:

    int i = 10000 > > x;

If limited to the right shift token, this approach introduces no known new ambiguities, but it does introduce at least one backward compatibility issue: The ## preprocessing token can no longer be applied to two > tokens. However, it would be surprising to eliminate the right shift token and not the left shift token. Eliminating the left shift token does introduce new parsing ambiguities (e.g., &X::operator< <Y>). The shift-assign operators (<<= and >>=) lead to similar considerations. It may also come as a surprise that shift operations are realized through a two-token construct, whereas other operations (e.g., prefix and postfix --, or &&) use a single two-character token.
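The ## incompatibility can be seen with a token-pasting macro. The following sketch is valid under the current rules, where >> is a single token; under Approach 3 the paste would no longer form a valid token. The names PASTE and pasted are hypothetical, and static_assert is assumed to be available only to check the value.

```cpp
// Hypothetical token-pasting macro.
#define PASTE(a, b) a##b

// Today the two > tokens are pasted into the single token >>,
// so the initializer is 10000 >> 3.
static int const pasted = 10000 PASTE(>, >) 3;

static_assert(pasted == 1250, "10000 >> 3 is 1250");
```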
Approach 1. As mentioned, the first proposal is analogous to the existing language rule for the > token. We therefore do not expect implementation difficulty for the approach.
Approach 2. The GNU and EDG C++ compilers currently implement the second proposed alternative for error recovery purposes. It would be trivial to promote the error recovery procedure to a correct parse procedure. (Other compilers appear to have a facility for the same purpose, but I do not know their exact strategy.)
Approach 3. I'm unaware of implementation experience with eliminating shift tokens and replacing them with grammar that allows two-token shift expressions.
I suggest we pursue “Approach 1” (which breaks some valid programs). Specifically, I propose that if even a single left angle bracket is active, a >> token not enclosed in parentheses is treated as two right angle brackets and not as a right shift operator. I do not recommend the variation described as “Approach 1b.”
My arguments for doing so are the following:
(While the approach of eliminating the shift tokens (approach 3) was presented for the sake of completeness, I find that it has enough small technical and aesthetic problems to make the other approaches far preferable.)
Insert after the last normative sentence of 14.2/3, but before the example:
Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. [ Note: The second > token produced by this replacement rule may terminate an enclosing template-id construct or it may be part of a different construct (e.g., a cast). --end note ]
Replace the example of 14.2/3 by the following:
[ Example:

    template<int i> class X { /* ... */ };
    X< 1>2 > x1;    // Syntax error.
    X<(1>2)> x2;    // Okay.
    template<class T> class Y { /* ... */ };
    Y<X<1>> x3;     // Okay, same as "Y<X<1> > x3;".
    Y<X<6>>1>> x4;  // Syntax error. Instead, write "Y<X<(6>>1)>> x4;".

--end example ]
Insert just before the first "Note:" of translation phase "7." in 2.1/1:
[ Note: The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (14.2 temp.names). --end note ]
Insert a new paragraph 5.2/2 that reads:
[ Note: The > token following the type-id in a dynamic_cast, static_cast, reinterpret_cast, or const_cast, may be the product of replacing a >> token by two consecutive > tokens (14.2 temp.names). --end note ]
Insert in 14/1 just after the grammar rules:
[ Note: The > token following the template-parameter-list of a template-declaration may be the product of replacing a >> token by two consecutive > tokens (14.2 temp.names). --end note ]
Append to 14.1/1 (following the grammar rules):
[ Note: The > token following the template-parameter-list of a type-parameter may be the product of replacing a >> token by two consecutive > tokens (14.2 temp.names). --end note ]
Reflector messages: c++std-ext-6767,6771,6773,6775,6779,6786,6788,6789,6792,6793,6794,6799,6801,6809.
Previous revisions: N1649/04-0089, N1699/0139.