P2713R0
Escaping improvements in std::format

Published Proposal,

Author:
Audience:
LEWG
Project:
ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++

1. Proposal

This paper provides wording to resolve national body comments [US38-098] and [FR005-134], following the direction voted on in SG16 (Unicode) and LEWG. The direction is summarized in [US38-098]:

The first poll confirms the intent that an escaped string be usable as a string literal (e.g., that it can be copied and pasted into a C++ program) such that, when evaluated as a string literal, the input used to produce the escaped string is reproduced. No changes are required to satisfy this poll.

The second poll clarifies that it is intended that the escaped string be readable by humans. The context for this poll was concern about producing visually ambiguous output. The SG16 conclusion is that escaped strings are not intended to produce visually unambiguous results; it is ok for the escaped string to contain unescaped characters that might be confused with other characters (e.g., characters considered "confusables" by Unicode). No changes are required to satisfy this poll.

The third poll clarifies that it is intended that separator and non-printable characters continue to be escaped. No changes are required to satisfy this poll.

The last poll indicates a change in direction relative to the current wording. SG16 desires that combining characters (those with the Unicode property Grapheme_Extend=Yes) shall be escaped if they are not preceded by a non-escaped lead character (or another combining character that is preceded by a lead character). Satisfying this poll will require normative changes to [format.string.escaped]p2.
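
The rule in the last poll can be illustrated with a small sketch. This is not the proposed wording or any library's implementation; it only models the per-code-point decision. The helpers is_grapheme_extend and is_escaped_for_other_reasons are hypothetical stand-ins introduced here: the first covers only the Combining Diacritical Marks block (U+0300..U+036F) instead of the full Grapheme_Extend property, and the second covers only C0 controls, DEL, backslash, and double quote instead of the full set of separator and non-printable characters.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the Unicode property Grapheme_Extend=Yes.
// Assumption: only U+0300..U+036F is covered; a real implementation
// would consult the complete Unicode character database.
constexpr bool is_grapheme_extend(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical stand-in for characters that are escaped regardless of context.
// Assumption: only C0 controls, DEL, backslash, and double quote are covered.
constexpr bool is_escaped_for_other_reasons(char32_t c) {
    return c < 0x20 || c == 0x7F || c == U'\\' || c == U'"';
}

// For each code point of s, report whether it would be escaped under the
// rule from the last poll: a combining code point is escaped when there is
// no preceding code point or the preceding code point was itself escaped.
std::vector<bool> escape_decisions(std::u32string_view s) {
    std::vector<bool> decisions;
    decisions.reserve(s.size());
    bool previous_escaped = true;  // start of string: no lead character yet
    for (char32_t c : s) {
        const bool escaped = is_escaped_for_other_reasons(c) ||
                             (is_grapheme_extend(c) && previous_escaped);
        decisions.push_back(escaped);
        previous_escaped = escaped;
    }
    return decisions;
}

// Checks the three combining-character inputs used in the wording example.
inline void check_examples() {
    // Lone combining mark: no lead character, so it is escaped.
    assert(escape_decisions(U"\u0301") == std::vector<bool>{true});
    // Backslash is escaped, so the following combining mark is escaped too.
    assert((escape_decisions(U"\\\u0301") == std::vector<bool>{true, true}));
    // 'e' is not escaped, so both combining marks that follow stay unescaped.
    assert((escape_decisions(U"e\u0301\u0323") ==
            std::vector<bool>{false, false, false}));
}
```

Tracking a single previous_escaped flag is enough because the rule only looks at the immediately preceding code point: an unescaped combining code point keeps the chain open for the next one, while an escaped one closes it.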

SG16 poll results for [US38-098]:

Poll 2.1: [US 38-098] SG16 agrees that the formatted code units in the escaped string are intended to be usable as a string literal that reproduces the input.
Attendees: 8
No objection to unanimous consent.

Poll 2.2: [US 38-098] SG16 agrees that the escaped string is intended to be readable for its textual content in any Unicode script.
Attendees: 8
No objection to unanimous consent.

Poll 2.3: [US 38-098] SG16 agrees that separators and non-printable characters ([format.string.escaped]p(2.2.1.2)) shall be escaped in the escaped string.
Attendees: 8
No objection to unanimous consent.

Poll 2.4: [US 38-098] SG16 agrees that combining code points shall not be escaped unless there is no leading code point or the previous character was escaped.
Attendees: 8
No objection to unanimous consent.

SG16 poll results for [FR005-134]:

Poll 1: [FR 005-134]: SG16 recommends accepting the comment in the direction presented in the first bullet of the proposed change and as recommended in the polls for US 38-098.
Attendees: 8
Unanimous consent

LEWG poll results:

POLL: We agree with the direction of the proposed SG16 recommendation for US 38-098 & FR005-134.

SF  F  N  A  SA
 9  7  0  0   0

Outcome: Unanimous consent

2. Wording

In [format.string.escaped]:

1 A character or string can be formatted as escaped to make it more suitable for debugging or for logging.

2 The escaped string E representation of a string S is constructed by encoding a sequence of characters as follows. The associated character encoding CE for charT (Table 13) is used to both interpret S and construct E.

...

[Example 1:

string s0 = format("[{}]", "h\tllo");               // s0 has value: [h    llo]
string s1 = format("[{:?}]", "h\tllo");             // s1 has value: ["h\tllo"]
string s3 = format("[{:?}, {:?}]", '\'', '"');      // s3 has value: ['\'', '"']

// The following examples assume use of the UTF-8 encoding
string s4 = format("[{:?}]", string("\0 \n \t \x02 \x1b", 9));
                                                    // s4 has value: ["\u{0} \n \t \u{2} \u{1b}"]
string s5 = format("[{:?}]", "\xc3\x28");           // invalid UTF-8, s5 has value: ["\x{c3}("]
string s6 = format("[{:?}]", "\u0301");             // s6 has value: ["\u{301}"]
string s7 = format("[{:?}]", "\\\u0301");           // s7 has value: ["\\\u{301}"]
string s8 = format("[{:?}]", "e\u0301\u0323");      // s8 has value: ["ẹ́"]

end example]

3. Acknowledgements

Thanks to Tom Honermann for nicely summarizing the resolution of the NB comments in [US38-098], which is quoted in this paper.

References

Informative References

[FR005-134]
FR 005-134 22.14.6.4 [format.string.escaped] Aggressive escaping. URL: https://github.com/cplusplus/nbballot/issues/408
[US38-098]
US 38-098 22.14.6.4p1 [format.string.escaped] Escaping for debugging and logging. URL: https://github.com/cplusplus/nbballot/issues/515