Digit Separators

ISO/IEC JTC1 SC22 WG21 N3661 - 2013-04-19

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

Problem
Solution
Proposal
    2.14.2 Integer literals [lex.icon]
    2.14.4 Floating literals [lex.fcon]
    2.14.8 User-defined literals [lex.ext]
    C.new.new Clause 2: lexical conventions [diff.cpp11.lex]

Problem

Numeric literals of more than a few digits are hard to read. Consider the following tasks.

Solution

The problem has a long-standing solution in writing and typography: digit separators. In the English-speaking world, commas are usually used to separate digits.

We wish to introduce digit separators into C++. Much discussion of constraints and alternatives appears in N3499. We propose using an underscore (aka low line) as a digit separator and a double radix point (aka double dot) as a disambiguating suffix separator.

Proposal

2.14.2 Integer literals [lex.icon]

Edit the grammar as follows. Editor, note the change to the binary literal syntax as described in N3472.

integer-literal:
    decimal-literal integer-suffix(opt)
    octal-literal integer-suffix(opt)
    hexadecimal-literal integer-suffix(opt)
decimal-literal:
    nonzero-digit
    decimal-literal digit-separator(opt) digit
octal-literal:
    0
    octal-literal digit-separator(opt) octal-digit
hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal digit-separator(opt) hexadecimal-digit
binary-literal:
    0b binary-digit
    0B binary-digit
    binary-literal digit-separator(opt) binary-digit
nonzero-digit: one of
    1 2 3 4 5 6 7 8 9
octal-digit: one of
    0 1 2 3 4 5 6 7
hexadecimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
    a b c d e f
    A B C D E F
digit-separator:
    _

Edit paragraph 1 as follows.

An integer literal is a sequence of digits that has no period or exponent part, with optional digit separators. These separators are ignored when determining its value. .... [Example: The number twelve can be written 12, 014, or 0XC. The literals 1048576, 1_048_576, 0X100000, 0x10_0000, and 0_004_000_000 all have the same value. —end example]
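The grammar and example above can be checked with a small recognizer. The following is only a sketch, not part of the proposed wording: the names digit_value, parse_integer_literal, and check_integer_examples are hypothetical, integer suffixes and overflow are ignored, and C++17 is assumed for std::optional and std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Value of c as a digit in the given base, or -1 if it is not one.
int digit_value(char c, int base) {
    int v = -1;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return v < base ? v : -1;
}

// Hypothetical helper: value of a suffix-free integer literal written with
// the proposed '_' separators, or nullopt if the text does not match.
std::optional<unsigned long long> parse_integer_literal(std::string_view text) {
    int base = 10;
    std::size_t pos = 0;
    bool have_digit = false;  // at least one digit seen so far
    if (text.size() >= 2 && text[0] == '0') {
        char p = text[1];
        if (p == 'x' || p == 'X') { base = 16; pos = 2; }
        else if (p == 'b' || p == 'B') { base = 2; pos = 2; }
    }
    if (base == 10 && !text.empty() && text[0] == '0') {
        base = 8; pos = 1; have_digit = true;  // the leading 0 is a digit
    }
    unsigned long long value = 0;
    bool after_separator = false;  // previous character was '_'
    for (; pos < text.size(); ++pos) {
        char c = text[pos];
        if (c == '_') {
            // A separator must follow a digit and may not be doubled.
            if (!have_digit || after_separator) return std::nullopt;
            after_separator = true;
            continue;
        }
        int d = digit_value(c, base);
        if (d < 0) return std::nullopt;
        value = value * base + d;  // separators never contribute to the value
        have_digit = true;
        after_separator = false;
    }
    if (!have_digit || after_separator) return std::nullopt;
    return value;
}

// The five spellings from the example all denote 1048576.
void check_integer_examples() {
    for (std::string_view s : {"1048576", "1_048_576", "0X100000",
                               "0x10_0000", "0_004_000_000"}) {
        assert(parse_integer_literal(s) == 1048576ULL);
    }
}
```

A separator is accepted only between two digits, so spellings such as 1__0, 1_, and 0x_1 are rejected by this sketch, matching the recursive productions above.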

2.14.4 Floating literals [lex.fcon]

Edit the grammar as follows.

floating-literal:
    fractional-constant exponent-part(opt) floating-suffix(opt)
    digit-sequence exponent-part floating-suffix(opt)
fractional-constant:
    digit-sequence(opt) . digit-sequence
    digit-sequence .
exponent-part:
    e sign(opt) digit-sequence
    E sign(opt) digit-sequence
sign: one of
    + -
digit-sequence:
    digit
    digit-sequence digit-separator(opt) digit

Edit within paragraph 1 as follows.

.... The integer and fraction parts both consist of a sequence of decimal (base ten) digits, with optional digit separators. These separators are ignored when determining the value. [Example: The literals 1.602_176_565e-19 and 1.602176565e-19 have the same value. —end example] ....
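Because a digit-separator can only appear between two digits of a digit-sequence, the "ignored when determining the value" rule can be sketched as a strip-then-convert step. The helper names below (strip_digit_separators, parse_floating_literal, check_floating_example) are hypothetical, and std::stod is used only as a stand-in converter: it accepts more spellings than the floating-literal grammar, so this sketch validates separator placement only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <string_view>

// Removes each '_' that sits directly between two decimal digits.
// Returns nullopt if a '_' appears anywhere else.
std::optional<std::string> strip_digit_separators(std::string_view text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '_') { out.push_back(text[i]); continue; }
        bool digit_before =
            i > 0 && std::isdigit(static_cast<unsigned char>(text[i - 1]));
        bool digit_after =
            i + 1 < text.size() &&
            std::isdigit(static_cast<unsigned char>(text[i + 1]));
        if (!digit_before || !digit_after) return std::nullopt;
    }
    return out;
}

// Hypothetical helper: value of a suffix-free floating literal written
// with the proposed separators, or nullopt on failure.
std::optional<double> parse_floating_literal(std::string_view text) {
    std::optional<std::string> plain = strip_digit_separators(text);
    if (!plain) return std::nullopt;
    try {
        std::size_t used = 0;
        double value = std::stod(*plain, &used);
        if (used != plain->size()) return std::nullopt;  // trailing junk
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range
    }
}

// The two spellings from the example denote the same value.
void check_floating_example() {
    assert(parse_floating_literal("1.602_176_565e-19") ==
           parse_floating_literal("1.602176565e-19"));
}
```

Under this sketch, spellings such as 1._5, 1_.5, and 1e_5 are rejected, since the separator is not between two digits of the same digit-sequence.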

2.14.8 User-defined literals [lex.ext]

Edit the grammar as follows. Editor, note the change to the binary literal syntax as described in N3472.

user-defined-literal:
    user-defined-integer-literal
    user-defined-floating-literal
    user-defined-string-literal
    user-defined-character-literal
user-defined-integer-literal:
    decimal-literal separated-suffix
    octal-literal separated-suffix
    hexadecimal-literal separated-suffix
    binary-literal separated-suffix
user-defined-floating-literal:
    fractional-constant exponent-part(opt) separated-suffix
    digit-sequence exponent-part separated-suffix
user-defined-string-literal:
    string-literal ud-suffix
user-defined-character-literal:
    character-literal ud-suffix
separated-suffix:
    suffix-separator(opt) ud-suffix
suffix-separator:
    ..
ud-suffix:
    identifier

Edit paragraph 1 as follows.

If a token matches both user-defined-literal and another literal kind, it is treated as the latter. [Example: 123_km and 123.._km are user-defined-literals, but 123_456 and 12LL are integer-literals. —end example] The syntactic non-terminal preceding the ud-suffix or separated-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
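The longest-match rule can be illustrated for the integer case with a small splitter. This is a sketch under stated assumptions, not a lexer: SplitLiteral, is_digit_in_base, split_integer_udl, and check_split_examples are hypothetical names, floating literals and the built-in integer-suffix (as in 12LL) are not handled, and the suffix is not checked to be a valid identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Result of splitting a token into its literal part and its suffix.
struct SplitLiteral {
    std::string_view literal;  // longest prefix matching an integer literal
    std::string_view suffix;   // what follows, minus any ".." separator
};

bool is_digit_in_base(char c, int base) {
    int v = -1;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return v >= 0 && v < base;
}

// Hypothetical helper: applies the longest-match rule to an integer token.
SplitLiteral split_integer_udl(std::string_view text) {
    int base = 10;
    std::size_t end = 0;      // one past the last character of the literal
    bool have_digit = false;  // at least one digit consumed
    if (text.size() >= 2 && text[0] == '0' &&
        (text[1] == 'x' || text[1] == 'X')) {
        base = 16; end = 2;
    } else if (text.size() >= 2 && text[0] == '0' &&
               (text[1] == 'b' || text[1] == 'B')) {
        base = 2; end = 2;
    } else if (!text.empty() && text[0] == '0') {
        base = 8; end = 1; have_digit = true;
    }
    while (end < text.size()) {
        if (is_digit_in_base(text[end], base)) {
            ++end;
            have_digit = true;
        } else if (text[end] == '_' && have_digit && end + 1 < text.size() &&
                   is_digit_in_base(text[end + 1], base)) {
            end += 2;  // the separator and the digit after it join the literal
        } else {
            break;
        }
    }
    std::string_view rest = text.substr(end);
    if (rest.substr(0, 2) == "..") rest.remove_prefix(2);  // suffix-separator
    return {text.substr(0, end), rest};
}

// The tokens from the example above.
void check_split_examples() {
    assert(split_integer_udl("123_km").suffix == "_km");
    assert(split_integer_udl("123.._km").suffix == "_km");
    assert(split_integer_udl("123_456").suffix.empty());
}
```

An underscore is absorbed into the literal only when a digit of the current base follows it; otherwise it begins the suffix. That is why 123_km keeps its suffix while 123_456 has none.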

C.new.new Clause 2: lexical conventions [diff.cpp11.lex]

Add a new section as follows. Editor: please incorporate with N3652.

Add the new text block below.

2.14 [lex.literal]

Change: Digit separator support.

Rationale: Required for new features.

Effect on original feature: Valid C++ 2011 code may change meaning, and hence possibly fail to compile, in this International Standard. A user-defined literal suffix that begins with an underscore followed by a character that may be interpreted as a digit within the context of the enclosing literal may change meaning. For example, 10_10 changes from the integer 10 with a suffix of _10 to the integer 1010. The original meaning can be restored with 10.._10. The literal 0x1234_goo has suffix _goo, but the literal 0x1234_foo has suffix oo, because f is a hexadecimal digit and is absorbed into the literal. The literal 0x1234.._foo has suffix _foo.
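For comparison, the C++ 2011 meaning of these tokens can be observed with any conforming C++11 compiler. The sketch below shows the existing behavior that this change would alter, not the proposed behavior. The two literal operators are hypothetical and their bodies are arbitrary, chosen only so that the suffix is visible in the result.

```cpp
#include <cassert>

// Hypothetical suffixes mirroring the examples above.
constexpr unsigned long long operator""_10(unsigned long long v) {
    return v * 10;  // arbitrary: makes the use of suffix _10 observable
}
constexpr unsigned long long operator""_foo(unsigned long long v) {
    return v + 1;   // arbitrary: makes the use of suffix _foo observable
}

// C++ 2011: 10_10 is the integer 10 with suffix _10, so the operator
// receives 10. Under the proposed change it would be the integer 1010.
static_assert(10_10 == 100, "10 passed to operator\"\"_10");

// C++ 2011: 0x1234_foo is 0x1234 with suffix _foo. Under the proposed
// change the f would join the literal, leaving the suffix oo.
static_assert(0x1234_foo == 0x1235, "0x1234 passed to operator\"\"_foo");
```

Code of this shape is what the compatibility note is about: it is valid today, and under the proposed rules it would need the 10.._10 and 0x1234.._foo spellings to keep its meaning.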