This paper proposes a syntax extension to C++ that allows separators between the digits of large numeric literals, making them more readable.
This paper is largely based on N2281 = 07-0141 "Digit Separators" by Lawrence Crowl. The proposed wording changes have been updated for C++11 (more specifically, the latest working draft N3290).
This paper does not propose to add binary literals or hexadecimal floating-point literals; those are considered largely independent of this paper and thus can be addressed separately.
For most people, reading large numbers without additional (redundant) visual cues is hard. Examples:
Using a space character as the separator could split a literal into two or more preprocessing-tokens, which would substantially affect not only the lexing phase but also the parsing phase of C++. Therefore, this paper proposes the underscore variant.
Using underscores conflicts with user-defined literals. The current wording already provides appropriate disambiguation (see 2.14.8 lex.ext paragraph 1), but the example can be improved for the new situation. In effect, the suffix of a user-defined literal may not start with an underscore followed by a digit. Given that user-defined literals are already severely constrained (see 2.14.8 lex.ext and 17.6.4.3.5 userlit.suffix), this seems to be a mild inconvenience for the next revision of the standard.
The grammar production pp-number in 2.10 lex.ppnumber already permits underscores inside (via identifier-nondigit and nondigit). No changes are necessary.
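The claim that pp-number already admits underscores can be checked with a small recognizer. The following is only a sketch: the function names are hypothetical, and universal-character-names and implementation-defined identifier characters are deliberately left out, so only the ASCII part of identifier-nondigit is covered.

```cpp
#include <cstddef>
#include <string>

// digit: 0-9
bool is_digit(char c) { return c >= '0' && c <= '9'; }

// nondigit: ASCII letters and the underscore (simplifying assumption:
// universal-character-names are not handled here).
bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical helper: true if the whole string matches pp-number:
//   digit | . digit | pp-number digit | pp-number identifier-nondigit
//   | pp-number e sign | pp-number E sign | pp-number .
bool is_pp_number(const std::string& s) {
    std::size_t i = 0;
    if (i < s.size() && s[i] == '.') ++i;            // optional leading period
    if (i >= s.size() || !is_digit(s[i])) return false;
    ++i;
    while (i < s.size()) {
        char c = s[i];
        bool exponent_sign = (c == 'e' || c == 'E') && i + 1 < s.size() &&
                             (s[i + 1] == '+' || s[i + 1] == '-');
        if (exponent_sign) {
            i += 2;                                  // "e sign" or "E sign"
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            ++i;                                     // underscore is accepted here
        } else {
            return false;
        }
    }
    return true;
}
```

Under these assumptions, `is_pp_number("1_000_000")` holds, whereas `is_pp_number("1 000")` does not, because the space ends the token.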
Change in 2.14.2 lex.icon:

decimal-literal:
    nonzero-digit
    decimal-literal underscore(opt) digit

octal-literal:
    0
    octal-literal underscore(opt) octal-digit

hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal underscore(opt) hexadecimal-digit

underscore:
    _

Change in 2.14.2 lex.icon paragraph 1:
An integer literal is a sequence of digits that has no period or exponent part, with optional separating underscores that are ignored when determining its value. ... [ Example: the number twelve can be written 12, 1_2, 014, 01_4, or 0XC. -- end example ]
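To illustrate how the proposed productions and the "underscores are ignored" rule fit together, here is a minimal sketch of an evaluator for unsuffixed integer literals. It is not part of the proposed wording: `digit_value` and `parse_integer_literal` are hypothetical names, integer suffixes are not handled, and overflow is not checked.

```cpp
#include <cstddef>
#include <string>

// Value of c as a digit in the given base, or -1 if it is not such a digit.
int digit_value(char c, int base) {
    int v = -1;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return (v >= 0 && v < base) ? v : -1;
}

// Hypothetical helper: returns true and stores the value if text matches
// decimal-literal, octal-literal, or hexadecimal-literal as proposed.
bool parse_integer_literal(const std::string& text, unsigned long long& value) {
    int base = 10;
    std::size_t pos = 0;
    bool have_digit = false;
    if (text.size() >= 2 && text[0] == '0' && (text[1] == 'x' || text[1] == 'X')) {
        base = 16;          // "0x" must be followed directly by a digit
        pos = 2;
    } else if (!text.empty() && text[0] == '0') {
        base = 8;           // the leading 0 already forms an octal-literal
        pos = 1;
        have_digit = true;
    }
    unsigned long long result = 0;
    bool prev_underscore = false;
    for (; pos < text.size(); ++pos) {
        char c = text[pos];
        if (c == '_') {
            // A separator needs a literal before it and may not be doubled.
            if (!have_digit || prev_underscore) return false;
            prev_underscore = true;
            continue;       // ignored when determining the value
        }
        int d = digit_value(c, base);
        if (d < 0) return false;
        result = result * static_cast<unsigned>(base) + static_cast<unsigned>(d);
        have_digit = true;
        prev_underscore = false;
    }
    if (!have_digit || prev_underscore) return false;  // no trailing separator
    value = result;
    return true;
}
```

With this sketch, the spellings from the example (12, 1_2, 014, 01_4, and 0XC) all evaluate to twelve, while inputs such as 1__2, 1_, or 0x_C are rejected because the grammar allows at most one underscore and only between a literal and a following digit.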
Change in 2.14.4 lex.fcon:

digit-sequence:
    digit
    digit-sequence underscore(opt) digit

Change in 2.14.4 lex.fcon paragraph 1:
... The integer and fraction parts both consist of a sequence of decimal (base ten) digits, with optional separating underscores that are ignored when determining the value. ...
Change in 2.14.8 lex.ext paragraph 1:
If a token matches both user-defined-literal and another literal kind, it is treated as the latter. [ Example: 123_km is a user-defined-literal, but 123_456 and 12LL are integer-literals. -- end example ] ...
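The disambiguation rule can be sketched as a two-pass classifier: first try to read the token as an integer-literal, and only if that fails try user-defined-literal. This is an illustration under simplifying assumptions: only decimal literals are handled, and `LiteralKind`, `is_decimal_body`, `is_integer_suffix`, `is_identifier`, and `classify` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

enum class LiteralKind { Integer, UserDefined, Neither };

bool is_dec_digit(char c) { return c >= '0' && c <= '9'; }

// decimal-literal as proposed: a nonzero digit, then digits that may each
// be preceded by a single underscore.
bool is_decimal_body(const std::string& s) {
    if (s.empty() || s[0] < '1' || s[0] > '9') return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (s[i] == '_') {
            // An underscore must be followed by a digit.
            if (i + 1 == s.size() || !is_dec_digit(s[i + 1])) return false;
        } else if (!is_dec_digit(s[i])) {
            return false;
        }
    }
    return true;
}

// integer-suffix (or no suffix at all).
bool is_integer_suffix(const std::string& s) {
    static const char* const kSuffixes[] = {
        "",    "u",   "U",   "l",   "L",   "ll",  "LL",  "ul",
        "uL",  "Ul",  "UL",  "ull", "uLL", "Ull", "ULL", "lu",
        "lU",  "Lu",  "LU",  "llu", "llU", "LLu", "LLU"};
    for (const char* suffix : kSuffixes)
        if (s == suffix) return true;
    return false;
}

// ud-suffix is an identifier (ASCII only in this sketch).
bool is_identifier(const std::string& s) {
    if (s.empty() || is_dec_digit(s[0])) return false;
    for (char c : s) {
        bool ok = is_dec_digit(c) || c == '_' ||
                  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        if (!ok) return false;
    }
    return true;
}

LiteralKind classify(const std::string& token) {
    // Pass 1: an ordinary integer-literal wins whenever it matches.
    for (std::size_t split = token.size(); split >= 1; --split)
        if (is_decimal_body(token.substr(0, split)) &&
            is_integer_suffix(token.substr(split)))
            return LiteralKind::Integer;
    // Pass 2: otherwise, decimal-literal followed by a ud-suffix.
    for (std::size_t split = token.size(); split >= 1; --split)
        if (is_decimal_body(token.substr(0, split)) &&
            is_identifier(token.substr(split)))
            return LiteralKind::UserDefined;
    return LiteralKind::Neither;
}
```

In this sketch, `classify("123_456")` and `classify("12LL")` yield `LiteralKind::Integer` even though both tokens could also be read as a decimal-literal followed by an identifier, while `classify("123_km")` yields `LiteralKind::UserDefined`.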