Printf corner cases in std::format
Doc. no.: | P1652R1 |
Date: | 2019-07-17 |
Audience: | LWG |
Reply-to: | Zhihao Yuan <zy@miator.net>, Victor Zverovich <victor.zverovich@gmail.com> |
Changes since R0
- rebase the wording on top of P0645R10
- replace “applying” with “applied”
- replace “the ‘0’” with “the ‘0’ character”
- add an example demonstrating that the ‘0’ character is ignored when used together with alignment
- replace “. In such a case” with “, in which case”
Introduction
Printf heavily influenced the design of the formatting behavior of std::format and Python str.format. However, over the course of development, the current specification of std::format [1] has missed a few beneficial behaviors of printf and Python while inheriting some unnecessary compromises from iostreams. This document shows these corner cases and proposes solutions for C++20.
Problem 1: ‘#o’ specification should not print 0 as “00”
variant | behavior |
printf | #o and #x print “0” for 0 |
Python | #o, #x, and #b print “0o0”, “0x0”, “0b0”, respectively, for 0 |
format | #o, #x, and #b print “00”, “0x0”, “0b0”, respectively, for 0 |
0odddd is not a pattern for octal literals in C++, so std::format replaces it with printf’s pattern 0dddd for #o. However, the # flag in printf is specified as follows [2]:
For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed).
The output here matches C++ syntax, where 0 is an octal literal. We propose to respecify std::format ‘#o’ to match printf output.
Before:
std::string s = std::format("{:#o}", 0); // s == "00"
After:
std::string s = std::format("{:#o}", 0); // s == "0"
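The printf behavior that the proposal aligns with can be observed directly. The following is a minimal sketch, not part of the proposed wording; the helper names printf_alt_octal and printf_alt_hex are hypothetical and only wrap std::snprintf.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats `value` with printf's "%#o" (alternate octal).
std::string printf_alt_octal(unsigned value) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "%#o", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: formats `value` with printf's "%#x" (alternate hex).
std::string printf_alt_hex(unsigned value) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "%#x", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

With these helpers, printf_alt_octal(0) yields "0" and printf_alt_octal(8) yields "010", while printf_alt_hex(0) yields "0" and printf_alt_hex(10) yields "0xa". The octal prefix is therefore only visible on nonzero values, which is the behavior proposed for ‘#o’.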
Problem 2: ‘c’ should be able to print 65 as “A” (ASCII)
variant | behavior |
printf | ‘c’ prints “A” for 65, ‘lc’ prints “A” for (wint_t) 65 |
Python | ‘c’ prints “A” for 65 |
format | throws an exception |
Not allowing ‘c’ to print integers creates a usability problem: users cannot print the return value of cin.get() (or of getc and fgetc) as a character. It is hostile to C++ learners if a cast is required to use stdio or iostreams with std::format for such a trivial task, when “{:c}” can be a way for them to express “show me a character here.”
We propose to let integer presentation types support a new flag, ‘c’, which prints the argument x as if by static_cast<charT>(x), where charT is the character type of the format string defined in P0645. If the argument is not in the range representable by charT, format_error is thrown.
Before:
int c = 'A';
std::string s = std::format("{:c}", c); // throws format_error
After:
int c = 'A';
std::string s = std::format("{:c}", c); // s == "A"
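The range check behind the proposed ‘c’ flag can be sketched as follows for charT = char. This is an illustration under stated assumptions, not the proposed implementation: format_as_char is a hypothetical helper, and std::out_of_range stands in for format_error.

```cpp
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the proposed 'c' rule for charT = char:
// the value must be representable by char, otherwise an error is raised.
// std::out_of_range is a stand-in for std::format_error in this sketch.
std::string format_as_char(int value) {
    if (value < std::numeric_limits<char>::min() ||
        value > std::numeric_limits<char>::max()) {
        throw std::out_of_range("value is not representable by char");
    }
    // Equivalent to copying static_cast<charT>(value) to the output.
    return std::string(1, static_cast<char>(value));
}
```

On an ASCII-based platform, format_as_char(65) yields "A", while format_as_char(1000) throws because 1000 is outside the range of char.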
Problem 3: “-000nan” is not a floating point value
What printf("%07f", -nan("")) prints was underspecified until C99 [2] and POSIX 2008 [3], where the effect of the ‘0’ flag is described as:
For d, i, o, u, x, X, a, A, e, E, f, F, g, and G conversions, leading zeros (following any indication of sign or base) are used to pad to the field width rather than performing space padding, except when converting an infinity or NaN. […]
The last clause was not present in C89, C90, or POSIX 2003. The output “-000nan” cannot be parsed correctly by iostreams or strtod. As of 2016, FreeBSD libc, glibc, and Microsoft UCRT all avoid it.
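Both claims above can be checked with a short sketch. The helper names parse_with_strtod and c99_zero_padded are hypothetical, and the second helper assumes a C library that follows the C99 rule quoted above. Infinity is used instead of NaN because whether a sign is printed for NaN varies between implementations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

struct ParseResult {
    bool is_nan;       // whether strtod produced a NaN
    std::string rest;  // characters strtod left unparsed
};

// Hypothetical helper: runs strtod and reports what it could not consume.
ParseResult parse_with_strtod(const std::string& text) {
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    return {std::isnan(value), std::string(end)};
}

// Hypothetical helper: formats `value` with printf's "%07.1f".
std::string c99_zero_padded(double value) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%07.1f", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

parse_with_strtod("-nan") consumes the whole input and reports a NaN, whereas parse_with_strtod("-000nan") stops after "-000", reports a zero rather than a NaN, and leaves "nan" unparsed. On a C99-conforming library, c99_zero_padded(-1.5) yields "-0001.5", while c99_zero_padded(INFINITY) is padded with spaces and yields "    inf".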
However, iostreams mandates this pathological output with the internal iomanip. This output also appears in Python and fmt, where the = alignment type is functionally equivalent to internal. Even worse, the dedicated ‘0’ std-format-spec is specified as “[…] equivalent to a fill character of ‘0’ with an alignment type of ‘=’”. So the output of the ‘0’ flag in Python and fmt is incompatible with the printf ‘0’ flag.
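The iostreams behavior can be reproduced with a few lines. This is a minimal sketch; stream_internal_zero_fill is a hypothetical helper, and infinity is used in place of NaN because the sign printed for NaN differs between implementations.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: streams `value` with std::internal, a fill character
// of '0', and a field width of 7.
std::string stream_internal_zero_fill(double value) {
    std::ostringstream os;
    os << std::internal << std::setfill('0') << std::setw(7) << value;
    return os.str();
}
```

For -1.5 this yields "-0001.5", which is the intended use. For -std::numeric_limits<double>::infinity() it yields "-000inf", because the padding is inserted after the sign regardless of whether digits follow.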
The observations are:
- The internal iomanip only affects numeric output and does it poorly;
- The ‘=’ alignment type inherited all issues from internal and is verbose to write and hard to interpret compared to ‘0’.
Therefore, we propose to remove the ‘=’ alignment type and respecify ‘0’ to match C99 printf’s output. Note that Rust std::fmt, a newer Python-like formatting facility, also removed the ‘=’ align spec [4].
Before:
double nan = std::numeric_limits<double>::quiet_NaN();
std::string s1 = std::format("{:0=6}", nan); // s1 == "000nan"
std::string s2 = std::format("{:06}", nan); // s2 == "000nan"
After:
double nan = std::numeric_limits<double>::quiet_NaN();
std::string s1 = std::format("{:0=6}", nan); // throws format_error
std::string s2 = std::format("{:06}", nan); // s2 == "   nan"
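The respecified ‘0’ rule can be sketched as a post-processing step on an already formatted number. This is an illustration only: apply_zero_flag and its parameters are hypothetical, and the sketch assumes the input carries at most one sign character followed by an optional "0x", "0X", "0b", or "0B" prefix.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper sketching the proposed '0' rule. `text` is an already
// formatted number such as "+120", "0xa", or "nan"; `is_finite` tells whether
// the formatted value was finite.
std::string apply_zero_flag(const std::string& text, std::size_t width,
                            bool is_finite) {
    if (text.size() >= width) return text;  // nothing to pad
    const std::size_t missing = width - text.size();
    if (!is_finite) {
        // Infinity and NaN are never zero-padded; fall back to the default
        // right alignment with spaces.
        return std::string(missing, ' ') + text;
    }
    std::size_t pos = 0;
    // Skip the sign, if any.
    if (pos < text.size() &&
        (text[pos] == '+' || text[pos] == '-' || text[pos] == ' ')) {
        ++pos;
    }
    // Skip a base prefix, if any.
    if (pos + 1 < text.size() && text[pos] == '0') {
        const char c = text[pos + 1];
        if (c == 'x' || c == 'X' || c == 'b' || c == 'B') pos += 2;
    }
    std::string out = text;
    out.insert(pos, missing, '0');  // zeros go after the sign and prefix
    return out;
}
```

apply_zero_flag("+120", 6, true) yields "+00120" and apply_zero_flag("0xa", 6, true) yields "0x000a", while apply_zero_flag("nan", 6, false) yields "   nan", matching the "After" column above.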
Problem 4: bool needs a type format specifier
variant | behavior |
printf | does not print bool as “true” or “false” |
iostreams | via boolalpha iomanip |
Python | no type format specifier for bool but empty format specification invokes str() [5] which returns “True” or “False” |
format | no type format specifier for bool but empty format specification gives “true” or “false” |
So std::format can print the textual form of bool only when no type format specifier is given, which distinguishes bool from all other fundamental types and string-like types. We consider the ‘s’ flag to be a “Do What I Mean” (DWIM) improvement to this caveat. Note that the fmt library supports printing bool via %s in printf-compatible syntax [6], but did not propose the syntax for standardization.
Before:
std::string s = std::format("{:s}", true); // throws format_error
After:
std::string s = std::format("{:s}", true); // s == "true"
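The intended dispatch for bool can be sketched as below. This is a partial illustration, not the proposed implementation: format_bool is a hypothetical helper, '\0' stands for "no type given", only the ‘s’ and ‘d’ types are covered, and std::invalid_argument is a stand-in error for everything else.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of dispatching a bool argument on its presentation
// type. '\0' means that no type was given in the format specification.
std::string format_bool(bool value, char type = '\0') {
    switch (type) {
    case '\0':  // no type: same as 's'
    case 's':
        return value ? "true" : "false";
    case 'd': {
        // Integer presentation: treat the bool as an unsigned char value.
        const unsigned char as_uchar = static_cast<unsigned char>(value);
        return std::to_string(static_cast<int>(as_uchar));
    }
    default:
        throw std::invalid_argument("type not covered by this sketch");
    }
}
```

format_bool(true) and format_bool(true, 's') both yield "true", while format_bool(true, 'd') yields "1".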
Problem 5: double does not roundtrip float
variant | roundtrip double in shortest decimal representation | float behavior |
printf | No | float is promoted to double |
iostreams | No | float is converted to double |
Python | Yes | does not support float32 |
format | Yes | float is converted to double |
Python prints shortest round-trip representations for floating point values by default; so does std::format, but not for float. Single-precision floating point values roundtrip in their realm and are already supported by std::to_chars. We should print a float as float rather than as a long string used for disambiguating the value in double's realm.
Before:
std::string s = std::format("{}", 3.31f); // s == "3.309999942779541"
After:
std::string s = std::format("{}", 3.31f); // s == "3.31"
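The difference comes from std::to_chars itself, as the following sketch shows. The helper name shortest is hypothetical, and the sketch assumes a standard library that implements the floating-point overloads of std::to_chars.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: returns the shortest round-trip representation that
// std::to_chars produces for `value` in its own type.
template <class Float>
std::string shortest(Float value) {
    char buf[64];
    const auto result = std::to_chars(buf, buf + sizeof buf, value);
    assert(result.ec == std::errc());
    return std::string(buf, result.ptr);
}
```

shortest(3.31f) yields "3.31", whereas shortest(static_cast<double>(3.31f)) yields "3.309999942779541", since more digits are needed to identify the same value among doubles.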
Wording
The wording is relative to P0645R10.
Modify 19.?.2 [format.string] as follows:
format-spec ::= std-format-spec | custom-format-spec
std-format-spec ::= [[fill] align] [sign] ['#'] ['0'] [width] ['.' precision] [type]
fill ::= <a character other than '{' or '}'>
align ::= '<' | '>' | '^'
sign ::= '+' | '-' | ' '
width ::= nonzero-digit [integer] | '{' arg-id '}'
precision ::= integer | '{' arg-id '}'
type ::= 'a' | 'A' | 'b' | 'B' | 'c' | 'd' | 'e' | 'E' | 'f' | 'F' |
'g' | 'G' | 'n' | 'o' | 'p' | 's' | 'x' | 'X'
[…]
The meaning of the various alignment options is as follows:
Option | Meaning |
'<' | Forces the field to be left-aligned within the available space. This is the default for non-arithmetic types, charT, and bool, unless an integer presentation type is specified. |
'>' | Forces the field to be right-aligned within the available space. This is the default for arithmetic types other than charT and bool or when an integer presentation type is specified. |
'^' | Forces the field to be centered within the available space by inserting N / 2 and N - N / 2 fill characters before and after the value respectively, where N is the total number of fill characters to insert. |
[Example:
char c = 120;
string s0 = format("{:6}", 42); // s0 == "    42"
string s1 = format("{:6}", 'x'); // s1 == "x     "
string s2 = format("{:*<6}", 'x'); // s2 == "x*****"
string s3 = format("{:*>6}", 'x'); // s3 == "*****x"
string s4 = format("{:*^6}", 'x'); // s4 == "**x***"
string s5 = format("{:6d}", c); // s5 == "   120"
string s6 = format("{:6}", true); // s6 == "true  "
–end example]
The '#' option causes the alternate form to be used for the conversion. This option is only valid for arithmetic types other than charT and bool or when an integer presentation type is specified. For integers, when binary or hexadecimal output is used, this option adds the respective prefix "0b" ("0B") or "0x" ("0X") to the output value. Whether the prefix is lower-case or upper-case is determined by the case of the type format specifier. The option prefixes the output value with "0" when octal output is used on nonzero integers. For floating-point numbers […]
width is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.
Preceding the width field by a zero ('0') character pads leading zeros (following any indication of sign or base) to the field width, except when applied to an infinity or NaN. This option is only valid for arithmetic types other than charT and bool or when an integer presentation type is specified. If the ‘0’ character and an align option both appear, the ‘0’ character is ignored. [Example:
char c = 120;
string s1 = format("{:+06d}", c); // s1 == "+00120"
string s2 = format("{:#06x}", 0xa); // s2 == "0x000a"
string s3 = format("{:<06}", -42); // s3 == "-42   " ('0' is ignored because of the '<' alignment)
–end example]
[…]
The available integer presentation types and their mapping to to_chars are:
Option | Meaning |
'b' | to_chars(first, last, value, 2); using the '#' option with this type adds the prefix "0b" to the output. |
'B' | The same as 'b', except that the '#' option adds the prefix "0B" to the output. |
'c' | Copies the character static_cast<charT>(value) to the output. Throws format_error if value is not in the range of representable values for charT. |
'd' | to_chars(first, last, value). |
[…] | […] |
none | The same as 'd' if the formatting argument type is not charT or bool. |
Integer presentation types can also be used with charT and bool values, in which case a value of type bool is treated as static_cast<unsigned char>(value). [Example: … –end example]
[Drafting note: A drive-by fix: to_chars has no overload for bool. –end note]
For lower-case presentation types, infinity and NaN are formatted as "inf" and "nan", respectively, with sign, if any. For upper-case presentation types, infinity and NaN are formatted as "INF" and "NAN", respectively, with sign, if any.
The available bool presentation types are:
Type | Meaning |
's' | Copies textual representation, either “true” or “false”, to the output. |
none | The same as ‘s’. |
The available pointer presentation types and their mapping to to_chars are:
[…]
Modify 19.?.4.1 [format.arg] as follows:
namespace std {
template<class Context>
class basic_format_arg {
public:
class handle;
using char_type = typename Context::char_type; // exposition only
variant<monostate, bool, char_type,
int, unsigned int, long long int, unsigned long long int,
float, double, long double,
const char_type*, basic_string_view<char_type>,
const void*, handle> value; // exposition only
basic_format_arg() noexcept;
[…]
explicit basic_format_arg(float n) noexcept;
explicit basic_format_arg(double n) noexcept;
explicit basic_format_arg(long double n) noexcept;
Effects: Initializes value with n.
References
[1] Text Formatting. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0645r10.html
[2] ISO/IEC 9899:TC3 Committee Draft. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
[3] dprintf, fprintf, printf, snprintf, sprintf - print formatted output, The Open Group Base Specifications Issue 7, 2018 edition. http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html
[4] Syntax, Module std::fmt. https://doc.rust-lang.org/std/fmt/#syntax
[5] Printing boolean values True/False with the format() method in Python. https://stackoverflow.com/questions/23655005/printing-boolean-values-true-false-with-the-format-method-in-python/23666923
[6] Formatting bool with ‘s’ type specifier should give textual output. https://github.com/fmtlib/fmt/issues/224