"Safety doesn’t happen by accident."
― unknown
1. Introduction
This paper proposes the following improvements to the C++20 formatting facility:
- Improving safety via compile-time format string checks
- Reducing binary code size of format_to
2. Revision history
Changes since R2:
- Added an entry to Annex C summarizing the differences from C++20.
- Italicized format-string, wformat-string, basic-format-string, str and deitalicized <Args...>.
- Replaced fmt with fmt.str in the Effects of format_to (wide overloads).
- Made the str member of basic-format-string private.
- Replaced "str is a format string for Args" with "there exist args of types Args such that str is a format string for args" in the Remarks clause of the basic-format-string constructor and removed the definition of a format string for argument types from [format.string.general].
- Replaced 14882:2017 with 14882:2020 in [diff.cpp20].
Changes since R1:
- Made the str member of basic-format-string exposition-only.
- Added missing consteval to the basic-format-string constructor.
- Added "A string s is a format string for argument types Args if there exist args of types Args such that s is a format string for args." to [format.string.general].
- Replaced "Initializes str" with "Direct-non-list-initializes str" in the Effects clause of the basic-format-string constructor.
- Replaced "is_convertible_v<const T&, basic_string_view<charT>>" with "const T& models convertible_to<basic_string_view<charT>>" in the Constraints clause of the basic-format-string constructor.
- Replaced "Mandates: s is a format string for Args." with "Remarks: A call to this function is not a core constant expression ([expr.const]) unless str is a format string for Args." in the specification of the basic-format-string constructor.
Changes since R0:
- Removed "Passing an argument fmt that is not a format string for parameter pack args is ill-formed with no diagnostic required." per LEWG feedback.
- Changed the wording to use C++20 facilities and an exposition-only basic-format-string type for guaranteed diagnostic and no reliance on compiler extensions per LEWG feedback.
- Added an implementation sketch to § 5 Compile-time checks.
- Clarified why code bloat cannot be addressed just as a quality of implementation issue in § 6 Binary size.
- Added an example illustrating one of the cases where code bloat occurs to § 6 Binary size.
3. LEWG polls (R1)
We prefer Option 2 (only string literals) over option 1 (all constexpr format str)
SF | F | N | A | SA |
---|---|---|---|---|
0 | 3 | 0 | 7 | 2 |
Stay with option 1
We want to adopt the binary size reduction presented in P2216r1 even if it is a breaking change against C++20.
SF | F | N | A | SA |
---|---|---|---|---|
4 | 8 | 0 | 0 | 0 |
Strong consensus for change.
We would prefer the binary size reduction change to be made as a DR against C++20
SF | F | N | A | SA |
---|---|---|---|---|
7 | 6 | 3 | 0 | 0 |
Strong consensus for DR
We would prefer the compile time checking change to be made as a DR against C++20
SF | F | N | A | SA |
---|---|---|---|---|
7 | 7 | 1 | 0 | 0 |
Strong consensus for DR
Pending a wording review from Tim Song we want the next revision of this paper to proceed to electronic balloting with priority B1 (focus).
SF | F | N | A | SA |
---|---|---|---|---|
9 | 5 | 0 | 0 | 0 |
Strong consensus, we want this paper to proceed
4. LEWG polls (R0)
We should promise more committee time to pursuing the compile time checking aspects of P2216R0, knowing that our time is scarce and this will leave less time for other work.
SF | F | N | A | SA |
---|---|---|---|---|
6 | 6 | 3 | 0 | 0 |
Consensus to pursue
We should promise more committee time to pursuing the code bloat aspects of P2216R0, knowing that our time is scarce and this will leave less time for other work.
SF | F | N | A | SA |
---|---|---|---|---|
3 | 8 | 6 | 0 | 0 |
Consensus to pursue
We are comfortable having std::format compile time check failures cause the program to be ill-formed, no diagnostic required (IFNDR).
SF | F | N | A | SA |
---|---|---|---|---|
0 | 1 | 2 | 4 | 8 |
LEWG is not comfortable with IFNDR
LEWG would prefer std::format compile time check failures to cause the program to be ill-formed (diagnostic required).
SF | F | N | A | SA |
---|---|---|---|---|
5 | 7 | 1 | 0 | 0 |
LEWG prefers ill-formed
We are comfortable having std::format compile time checks rely on compiler extensions to be implementable.
SF | F | N | A | SA |
---|---|---|---|---|
3 | 3 | 4 | 4 | 0 |
LEWG is somewhat uncomfortable with relying on compiler extensions for this facility
5. Compile-time checks
Consider the following example:
std::string s = std::format("{:d}", "I am not a number");
In C++20 ([N4861]) it throws format_error because d is not a valid format specifier for a null-terminated character string.
We propose making it ill-formed resulting in a compile-time rather than a runtime error. This will significantly improve safety of the formatting API and bring it on par with other languages such as D ([D-FORMAT]) and Rust ([RUST-FMT]).
This proposal has been successfully implemented in the open-source {fmt} library
([FMT]) using only C++20 facilities and tested on Clang 11 and GCC 10. It will
become the default in the next major release of the library. The implementation
is very simple and straightforward because format string parsing in C++20 has
been designed with such checks in mind ([P0645]) and is already constexpr.
There are two options:
- Option 1: Provide compile-time checks for all format strings known at compile time.
- Option 2: Limit checks to string literals only.
Here is a sketch of the implementation:
#ifdef OPTION_1 // exposition only

// Option 1:
template <class charT, class... Args>
struct basic_format_string {
  basic_string_view<charT> str;

  template <class T,
            enable_if_t<is_convertible_v<const T&, basic_string_view<charT>>, int> = 0>
  consteval basic_format_string(const T& s) : str(s) {
    // Report a compile-time error if s is not a format string for Args.
  }
};

#else

// Option 2:
template <class charT, class... Args>
struct basic_format_string {
  basic_string_view<charT> str;

  template <size_t N>
  consteval basic_format_string(const charT (&s)[N]) : str(s) {
    // Report a compile-time error if s is not a format string for Args.
  }

  template <class T,
            enable_if_t<is_convertible_v<const T&, basic_string_view<charT>>, int> = 0>
  basic_format_string(const T& s) : str(s) {}
};

#endif

// Same for Option 1 & Option 2:
template <class... Args>
using format_string = basic_format_string<char, type_identity_t<Args>...>;

template <class... Args>
string format(format_string<Args...> fmt, const Args&... args) {
  return vformat(fmt.str, make_format_args(args...));
}
Compiling our example produces the following diagnostic on Clang:
<source>:36:26: error: call to consteval function 'basic_format_string<char, char [18]>::basic_format_string<5>' is not a constant expression
  std::string s = format("{:d}", "I am not a number");
                         ^
/opt/compiler-explorer/libs/fmt/trunk/include/fmt/format.h:1422:13: note: non-constexpr function 'on_error' cannot be used in a constant expression
    handler.on_error("invalid type specifier");
            ^
...
Comparison of different options:
Code | C++20 | Option 1 | Option 2 |
---|---|---|---|
| OK | OK | OK |
| throws | ill-formed | ill-formed |
| OK | OK | OK |
| OK | ill-formed | ill-formed |
| throws | ill-formed | throws |
| OK | ill-formed | OK |
Option 1 is safer but has the same limitation as Rust’s format! of only accepting format strings known at compile time. However, it is still possible to pass a runtime string via vformat:
const char* fmt = "{:d}";
auto s = vformat(fmt, make_format_args(42));
Additionally we can provide a convenience wrapper for passing runtime strings:
const char* fmt = "{:d}";
auto s = format(runtime_format(fmt), 42);
Note that in the vast majority of cases format strings are literals.
For example, analyzing a sample of 100 printf
calls from [CODESEARCH] showed
that 98 of them are string literals and 2 are string literals wrapped in the
gettext macro:
printf(_("call to tc_aout_fix_to_chars \n"));
In this case translation and runtime format markers can be combined without any impact on usability.
We propose making basic-format-string exposition-only because it is an
implementation detail and in the future the same functionality can be
implemented using [P1221] (see e.g. https://godbolt.org/z/hcnxfY) or [P1045].
From the extensive usage experience in the {fmt} library ([FMT]) that provides
compile-time checks as an opt-in we’ve found that users expect errors in literal
format strings to be diagnosed at compile time by default. One of the reasons is
that such diagnostic is commonly done in printf, for example:
printf("%d", "I am not a number");
gives a warning both in GCC and clang:
warning: format specifies type 'int' but the argument has type 'const char *' [-Wformat]
so users expect the same or better level of diagnostics from a similar C++ facility.
6. Binary size
The vformat_to functions take format arguments parameterized on the output
iterator via the formatting context:
template <class Out, class charT>
using format_args_t = basic_format_args<basic_format_context<Out, charT>>;

template <class Out>
Out vformat_to(Out out, string_view fmt,
               format_args_t<type_identity_t<Out>, char> args);
Unfortunately it may result in significant code bloat because formatting code
will have to be instantiated for every iterator type used with format_to or
vformat_to, for example:
std::vector<char> v;
std::format_to(std::back_inserter(v), "{}", 42);
// Formatting functions are instantiated for
// std::back_insert_iterator<std::vector<char>>.

std::string s;
std::format_to(std::back_inserter(s), "{}", 42);
// Formatting functions are instantiated for
// std::back_insert_iterator<std::string>.
This happens even for argument types that are not formatted,
clearly violating the "you don’t pay for what you don’t use" principle. It is
also unnecessary because the iterator type can be erased via the internal buffer,
as is done in format and vformat, without affecting performance for the
common case of containers with contiguous storage. Therefore we propose using
format_args and wformat_args instead of format_args_t in these overloads:
template <class Out>
Out vformat_to(Out out, string_view fmt, format_args args);
formatter specializations will continue to support output iterators so this
only affects type-erased API and not the one with compiled format strings that
will be proposed separately. The latter will not be affected by the code bloat
issue because instantiations will be limited only to used argument types.
In addition to reducing the code bloat this will simplify the API.
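The following toy example sketches the idea of erasing the iterator type through an internal buffer. It is not the {fmt} or standard library implementation: toy_vformat_to_buffer and toy_vformat_to are hypothetical names, the arguments are restricted to int, and only "{}" fields are supported. The point is that the formatting logic lives in a non-template function that is compiled once, while the per-iterator template contains nothing but a copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Non-template core: independent of any output iterator type.
// Toy formatting: each "{}" is replaced by the next int argument.
inline void toy_vformat_to_buffer(std::string& buf, std::string_view fmt,
                                  const std::vector<int>& args) {
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}' &&
        next < args.size()) {
      buf += std::to_string(args[next++]);
      ++i;  // skip the closing '}'
    } else {
      buf += fmt[i];
    }
  }
}

// Thin wrapper: the only code instantiated per iterator type is the copy.
template <class Out>
Out toy_vformat_to(Out out, std::string_view fmt, const std::vector<int>& args) {
  std::string buf;
  toy_vformat_to_buffer(buf, fmt, args);
  return std::copy(buf.begin(), buf.end(), out);
}
```

A production implementation could additionally write directly into contiguous containers instead of copying, which is the case the text above refers to.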
The code bloat problem cannot be solved just as a quality of implementation
issue because the iterator type is observable through the formatter
API.
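To illustrate what "observable" means here, consider the following toy model. It does not use the standard formatter or basic_format_context; toy_context and toy_formatter are hypothetical stand-ins that only mirror the fact that a user-provided format function is a template on the context and can name the context's iterator type.

```cpp
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Toy stand-in for a formatting context: exposes the output iterator type.
template <class Out>
struct toy_context {
  using iterator = Out;
  Out out;
};

// Toy stand-in for a user-defined formatter specialization.
struct toy_formatter {
  template <class Context>
  typename Context::iterator format(char value, Context& ctx) const {
    // User code can branch on the concrete iterator type, so a library cannot
    // silently substitute a different iterator without changing behavior.
    if constexpr (std::is_same_v<typename Context::iterator,
                                 std::back_insert_iterator<std::string>>) {
      *ctx.out++ = 'S';  // marker: saw a std::string back-inserter
    } else {
      *ctx.out++ = 'O';  // marker: saw some other iterator
    }
    *ctx.out++ = value;
    return ctx.out;
  }
};
```

Since the same formatter produces different output depending on the iterator it is handed, erasing the iterator type changes observable behavior and therefore requires a specification change.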
This proposal has been successfully implemented in the {fmt} library ([FMT]).
7. Impact on existing code
Making invalid format strings ill-formed and modifying the problematic
vformat_to overloads are breaking changes, although at the time of writing none
of the standard libraries implements the C++20 formatting facility and therefore
there is no code using it.
8. Wording
All wording is relative to the C++ working draft [N4861].
Update the value of the feature-testing macro __cpp_lib_format to the date of
adoption in [version.syn].
Change in [format.syn]:
namespace std {
  // 20.20.3, error reporting
+ template<class charT, class... Args> struct basic-format-string { // exposition only
+ private:
+   basic_string_view<charT> str;                                   // exposition only
+ public:
+   template<class T> consteval basic-format-string(const T& s);
+ };
+
+ template<class... Args>
+   using format-string = basic-format-string<char, type_identity_t<Args>...>;     // exposition only
+ template<class... Args>
+   using wformat-string = basic-format-string<wchar_t, type_identity_t<Args>...>; // exposition only

  // 20.20.4, formatting functions
  template<class... Args>
-   string format(string_view fmt, const Args&... args);
+   string format(format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   wstring format(wstring_view fmt, const Args&... args);
+   wstring format(wformat-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   string format(const locale& loc, string_view fmt, const Args&... args);
+   string format(const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   wstring format(const locale& loc, wstring_view fmt, const Args&... args);
+   wstring format(const locale& loc, wformat-string<Args...> fmt, const Args&... args);
  ...
  template<class Out, class... Args>
-   Out format_to(Out out, string_view fmt, const Args&... args);
+   Out format_to(Out out, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   Out format_to(Out out, wstring_view fmt, const Args&... args);
+   Out format_to(Out out, wformat-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   Out format_to(Out out, const locale& loc, string_view fmt, const Args&... args);
+   Out format_to(Out out, const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   Out format_to(Out out, const locale& loc, wstring_view fmt, const Args&... args);
+   Out format_to(Out out, const locale& loc, wformat-string<Args...> fmt, const Args&... args);

  template<class Out>
-   Out vformat_to(Out out, string_view fmt, format_args_t<type_identity_t<Out>, char> args);
+   Out vformat_to(Out out, string_view fmt, format_args args);
  template<class Out>
-   Out vformat_to(Out out, wstring_view fmt, format_args_t<type_identity_t<Out>, wchar_t> args);
+   Out vformat_to(Out out, wstring_view fmt, wformat_args args);
  template<class Out>
-   Out vformat_to(Out out, const locale& loc, string_view fmt, format_args_t<type_identity_t<Out>, char> args);
+   Out vformat_to(Out out, const locale& loc, string_view fmt, format_args args);
  template<class Out>
-   Out vformat_to(Out out, const locale& loc, wstring_view fmt, format_args_t<type_identity_t<Out>, wchar_t> args);
+   Out vformat_to(Out out, const locale& loc, wstring_view fmt, wformat_args args);
  ...
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, string_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, wstring_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, wformat-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, string_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, wstring_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, wformat-string<Args...> fmt, const Args&... args);

  template<class... Args>
-   size_t formatted_size(string_view fmt, const Args&... args);
+   size_t formatted_size(format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(wstring_view fmt, const Args&... args);
+   size_t formatted_size(wformat-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(const locale& loc, string_view fmt, const Args&... args);
+   size_t formatted_size(const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(const locale& loc, wstring_view fmt, const Args&... args);
+   size_t formatted_size(const locale& loc, wformat-string<Args...> fmt, const Args&... args);
  ...
  // 20.20.6.3, class template basic_format_args
  ...
- template<class Out, class charT>
-   using format_args_t = basic_format_args<basic_format_context<Out, charT>>;
}
Change in [format.string.general]:
...
If all arg-ids in a format string are omitted (including those in the format-spec, as interpreted by the corresponding formatter specialization),
argument indices 0, 1, 2, ... will automatically be used in that order. If some arg-ids are omitted and some are present, the string is not a format string.
[Note: A format string cannot contain a mixture of automatic and manual
indexing. — end note] [Example:
string s0 = format("{} to {}", "a", "b");   // OK, automatic indexing
string s1 = format("{1} to {0}", "a", "b"); // OK, manual indexing
string s2 = format("{0} to {}", "a", "b");  // not a format string (mixing automatic and manual indexing),
-                                           // throws format_error
+                                           // ill-formed
string s3 = format("{} to {1}", "a", "b");  // not a format string (mixing automatic and manual indexing),
-                                           // throws format_error
+                                           // ill-formed
— end example]
Change in [format.err.report]:
... Failure to allocate storage is reported by throwing an exception as described in 16.5.5.13.
template<class charT, class... Args> struct basic-format-string { // exposition only
private:
  basic_string_view<charT> str;                                   // exposition only
public:
  template<class T> consteval basic-format-string(const T& s);
};

template<class T> consteval basic-format-string(const T& s);
Constraints: const T& models convertible_to<basic_string_view<charT>>.
Effects: Direct-non-list-initializes str with s.
Remarks: A call to this function is not a core constant expression ([expr.const]) unless there exist args of types Args such that str is a format string for args.
Change in [format.functions]:
  template<class... Args>
-   string format(string_view fmt, const Args&... args);
+   string format(format-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat(fmt.str, make_format_args(args...));
  template<class... Args>
-   wstring format(wstring_view fmt, const Args&... args);
+   wstring format(wformat-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat(fmt.str, make_wformat_args(args...));
  template<class... Args>
-   string format(const locale& loc, string_view fmt, const Args&... args);
+   string format(const locale& loc, format-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat(loc, fmt.str, make_format_args(args...));
  template<class... Args>
-   wstring format(const locale& loc, wstring_view fmt, const Args&... args);
+   wstring format(const locale& loc, wformat-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat(loc, fmt.str, make_wformat_args(args...));
...
  template<class Out, class... Args>
-   Out format_to(Out out, string_view fmt, const Args&... args);
+   Out format_to(Out out, format-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat_to(out, fmt.str, make_format_args(args...));
  template<class Out, class... Args>
-   Out format_to(Out out, wstring_view fmt, const Args&... args);
+   Out format_to(Out out, wformat-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
- using context = basic_format_context<Out, decltype(fmt)::value_type>;
- return vformat_to(out, fmt, make_format_args<context>(args...));
+ return vformat_to(out, fmt.str, make_wformat_args(args...));
  template<class Out, class... Args>
-   Out format_to(Out out, const locale& loc, string_view fmt, const Args&... args);
+   Out format_to(Out out, const locale& loc, format-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
return vformat_to(out, loc, fmt.str, make_format_args(args...));
  template<class Out, class... Args>
-   Out format_to(Out out, const locale& loc, wstring_view fmt, const Args&... args);
+   Out format_to(Out out, const locale& loc, wformat-string<Args...> fmt, const Args&... args);
Effects: Equivalent to:
- using context = basic_format_context<Out, decltype(fmt)::value_type>;
- return vformat_to(out, loc, fmt, make_format_args<context>(args...));
+ return vformat_to(out, loc, fmt.str, make_wformat_args(args...));
  template<class Out>
-   Out vformat_to(Out out, string_view fmt, format_args_t<type_identity_t<Out>, char> args);
+   Out vformat_to(Out out, string_view fmt, format_args args);
  template<class Out>
-   Out vformat_to(Out out, wstring_view fmt, format_args_t<type_identity_t<Out>, wchar_t> args);
+   Out vformat_to(Out out, wstring_view fmt, wformat_args args);
  template<class Out>
-   Out vformat_to(Out out, const locale& loc, string_view fmt, format_args_t<type_identity_t<Out>, char> args);
+   Out vformat_to(Out out, const locale& loc, string_view fmt, format_args args);
  template<class Out>
-   Out vformat_to(Out out, const locale& loc, wstring_view fmt, format_args_t<type_identity_t<Out>, wchar_t> args);
+   Out vformat_to(Out out, const locale& loc, wstring_view fmt, wformat_args args);
...
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, string_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, wstring_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, wformat-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, string_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class Out, class... Args>
-   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, wstring_view fmt, const Args&... args);
+   format_to_n_result<Out> format_to_n(Out out, iter_difference_t<Out> n, const locale& loc, wformat-string<Args...> fmt, const Args&... args);
Let
- charT be decltype(fmt)::value_type,
+ charT be decltype(fmt.str)::value_type,
...
  template<class... Args>
-   size_t formatted_size(string_view fmt, const Args&... args);
+   size_t formatted_size(format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(wstring_view fmt, const Args&... args);
+   size_t formatted_size(wformat-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(const locale& loc, string_view fmt, const Args&... args);
+   size_t formatted_size(const locale& loc, format-string<Args...> fmt, const Args&... args);
  template<class... Args>
-   size_t formatted_size(const locale& loc, wstring_view fmt, const Args&... args);
+   size_t formatted_size(const locale& loc, wformat-string<Args...> fmt, const Args&... args);
- Let charT be decltype(fmt)::value_type.
+ Let charT be decltype(fmt.str)::value_type.
...
Add to Annex C (informative) Compatibility [diff] the following new subclause:
C.? C++ and ISO C++ 2020 [diff.cpp20]
This subclause lists the differences between C++ and ISO C++ 2020 (ISO/IEC 14882:2020, Programming Languages — C++), by the chapters of this document.
C.?.1 Clause 20: general utilities library [diff.cpp20.utilities]
Affected subclauses: 20.20
Change: Signature changes: format, format_to, vformat_to, format_to_n, formatted_size. Removal of format_args_t.
Rationale: Improve safety via compile-time format string checks, avoid
unnecessary template instantiations.
Effect on original feature: Valid C++20 code that contained errors in format
strings or relied on previous format string signatures or format_args_t may
become ill-formed. For example:
auto s = std::format("{:d}", "I am not a number"); // ill-formed,
                                                   // previously threw format_error
9. Acknowledgements
Thanks to Hana Dusíková for demonstrating that the optimal formatting API can be implemented with P1221 and Tim Song for reviewing the wording.