Document #: | P2757R3 |
Date: | 2023-06-15 |
Project: | Programming Language C++ |
Audience: | LWG |
Reply-to: | Barry Revzin <barry.revzin@gmail.com> |
Since [P2757R2], wording changes and note about feature-test macro after LWG discussion.
Since [P2757R1], added feature-test macro.
Since [P2757R0], reverted basic_format_parse_context constructor and removed check_dynamic_spec_arithmetic - “arithmetic” types technically include bool and char per the language wording, but those are very unlikely to be actually desired in the context where you’re asking for something that could also be an int or a double. Can always be added back in some form if dynamic floating point argument use-cases surface.
std::format supports compile-time checking of format strings [P2216R3], which is a fantastic feature. A compile-time error is always better than a runtime error, and we can see that happen in a lot of cases:
expression | result |
---|---|
format("{:d}", "I am not a number") | compile error (invalid specifier for strings) |
format("{:7^*}", "hello") | compile error (should be *^7) |
format("{:>10}", "hello") | ok |
format("{0:>{1}}", "hello", 10) | ok |
format("{0:>{2}}", "hello", 10) | compile error (argument 2 is out of bounds) |
format("{:>{}}", "hello") | compile error (missing an argument for dynamic width) |
format("{:>{}}", "hello", "10") | runtime error |
Wait, why is the last one a runtime error instead of compile-time error?
formatter<T>::parse gets an instance of basic_format_parse_context, which looks like this (22.14.6.5 [format.parse.ctx]):
    namespace std {
      template<class charT>
      class basic_format_parse_context {
      public:
        using char_type = charT;
        using const_iterator = typename basic_string_view<charT>::const_iterator;
        using iterator = const_iterator;

      private:
        iterator begin_;                                    // exposition only
        iterator end_;                                      // exposition only
        enum indexing { unknown, manual, automatic };       // exposition only
        indexing indexing_;                                 // exposition only
        size_t next_arg_id_;                                // exposition only
        size_t num_args_;                                   // exposition only

      public:
        constexpr explicit basic_format_parse_context(basic_string_view<charT> fmt,
                                                      size_t num_args = 0) noexcept;
        basic_format_parse_context(const basic_format_parse_context&) = delete;
        basic_format_parse_context& operator=(const basic_format_parse_context&) = delete;

        constexpr const_iterator begin() const noexcept;
        constexpr const_iterator end() const noexcept;
        constexpr void advance_to(const_iterator it);

        constexpr size_t next_arg_id();
        constexpr void check_arg_id(size_t id);
      };
    }
The user-facing logic here is divided into two parts: access to the format string itself (begin(), end(), and advance_to(it)) and access to the argument indices (next_arg_id() and check_arg_id(id)).
Note that the parse context here doesn’t get access to the arguments themselves; it only knows how many arguments there are and, if doing automatic indexing, what the current argument index is. This portion of the API can be used to validate that dynamic arguments exist (ensuring that two of the rows above fail) and, for automatic indexing, to store the argument index for future access in formatter<T>::format.
The parse context doesn’t get access to the arguments largely for code size reasons, and also because now that parse() is invoked during constant evaluation time, it’s unlikely or simply impossible to provide the arguments at that time anyway.
But this API has the limitation that it cannot currently allow diagnosing that last line:
    format("{:>{}}", "hello", "10")
Here, the issue is that we have a dynamic width (the {} part), which refers to the next argument, which is "10". But for char const*, the width needs to be integral, which "10" is not. Now, we don’t need to know the value of the argument in order to reject this case - we only need to know the type. Which we definitely have. So maybe we can do better?
{fmt}
The {fmt} library actually does reject this example at compile time. It does so by constructing a different kind of parse context that is only used at compile time: the appropriately-named compile_parse_context. This is a basic_format_parse_context that additionally stores information about what types the arguments are, except type-erased to the set of types that can be stored in the variant in basic_format_context.
The relevant API of compile_parse_context looks like this (in {fmt}, basic_format_parse_context has a second template parameter that is the error handler. It’s not relevant for this example. The rest of the code is slightly altered for paper-ness):
    enum class type {
      none_type,
      // Integer types should go first,
      int_type,
      uint_type,
      long_long_type,
      ulong_long_type,
      int128_type,
      uint128_type,
      bool_type,
      char_type,
      last_integer_type = char_type,
      // followed by floating-point types.
      float_type,
      double_type,
      long_double_type,
      last_numeric_type = long_double_type,
      cstring_type,
      string_type,
      pointer_type,
      custom_type
    };

    constexpr auto is_integral_type(type t) -> bool {
      return t > type::none_type && t <= type::last_integer_type;
    }

    template <typename Char, typename ErrorHandler>
    class basic_format_parse_context : private ErrorHandler {
    public:
      // these are the same as in std
      constexpr auto next_arg_id() -> int;
      constexpr auto check_arg_id(int arg_id) -> void;

      // but this one is new
      constexpr auto check_dynamic_spec(int arg_id) -> void;
    };

    template <typename Char, typename ErrorHandler>
    class compile_parse_context : public basic_format_parse_context<Char, ErrorHandler> {
      std::span<type const> types_;

    public:
      constexpr auto arg_type(int id) const -> type { return types_[id]; }

      constexpr auto check_dynamic_spec(int arg_id) -> void {
        if (arg_id < types_.size() and not is_integral_type(types_[arg_id])) {
          // this ensures that the call is not a constant expression
          this->on_error("width/precision is not an integer");
        }
      }
    };

    template <typename Char, typename ErrorHandler>
    constexpr auto basic_format_parse_context<Char, ErrorHandler>::check_dynamic_spec(int arg_id) -> void {
      if consteval {
        using compile_context = compile_parse_context<Char, ErrorHandler>;
        static_cast<compile_context*>(this)->check_dynamic_spec(arg_id);
      }
    }
There are several important things to note here.
First, the implementation is the only one constructing the parse context, so it’s free to construct a compile_parse_context during constant evaluation so that this downcast is safe.
Second, the type check only happens during constant evaluation time. This is important. In typical uses, parse will be called twice: once during compile time (for initial type checking) and then once later during runtime. If we already did the type check during compile time, we don’t have to do it again during runtime. The conditional checking during if consteval is the right way to go.
Third, {fmt} uses an enum type that maps all user-defined types to custom_type. This is exposed to the user via check_dynamic_spec (which checks that the argument type is integral) and arg_type (which simply returns the enum). There is no user-provided code being run here - which is important because that lets us basically hide this check behind compile time and not have to worry about whether some arbitrary user-defined predicate is being run or not. It also means that users don’t have to worry about the potential overhead of these checks, since they can just choose to call check_dynamic_spec and know that this has no runtime overhead – rather than them having to write if consteval (and probably forget to).
Finally, because compile_parse_context inherits from basic_format_parse_context, implementations of formatter<T>::parse can still happily take a basic_format_parse_context<char>& and continue to work. It’s just that now, during compile time, the dynamic type of that context will be different. This means we can add this functionality without breaking user code or requiring the user to make any other changes.
Note that even here, compile_parse_context doesn’t have the actual format arguments - just their types.
basic_format_parse_context
Currently, we specify a constructor for basic_format_parse_context, though we don’t do the same for basic_format_context. Only the implementation should be constructing a basic_format_parse_context anyway - the constructor we do specify doesn’t let us propagate the state properly, and the thing isn’t copyable or movable. The constructor is a bit problematic in that its presence would seem to require specifying just how all this type information from the arguments is encoded.
However, actually using this constructor in a way that requires reading arguments is inherently problematic - as the user has no way of providing those arguments in the future. Using this constructor just to parse a format string is at least potentially usable:
The latter implementation is significantly more tedious, but only requires parsing the format string for the int once. This is something that somebody might actually write, so it needs to stay supported. But this is really only useful in the case where the “fake” parse context has no arguments - which is happily the case where we also don’t have to worry about how to propagate type information for those arguments, since there aren’t any.
In {fmt}, we have check_dynamic_spec(int). This is sufficient for all the standard types - for which a dynamic spec is integral, and that’s the only thing you’d want to check. But user-defined types might have arbitrary other dynamic parameters, which need not be integral themselves. So the user will need to specify what the allowed types are somehow - in a way that doesn’t require an arbitrary predicate (since we want to avoid the question of dealing with side effects).
There are basically two ways of doing this:
1. Specify an enum, similar to fmt::detail::type, and add a check function expressed in terms of that enum.
2. Instead of an enum, make this a function template parameterized on the allowed types (the implementation would then convert those types into the corresponding enum anyway).
In both cases, this function only has effects during constant evaluation time - and the only effect is to force a compile error. Either way, we can then also, for convenience, provide a few helpers for the common cases, such as check_dynamic_spec_integral(id) and check_dynamic_spec_string(id).
These both have clear use-cases: dynamic width or precision for the former, dynamic delimiter for the latter.
The enum approach requires specifying an enum. The template approach, if users make their formatter<T>::parse a function template (which is going to be the common case, especially since you can just write auto&), requires writing .template (which is… still shorter, but also awful):
This paper proposes the template approach.
Change 22.14.2.2 [format.string.std]/10:
10 If { arg-id_opt } is used in a width or precision option, the value of the corresponding formatting argument is used as the value of the option.
    - If the corresponding formatting argument is not of standard signed or unsigned integer type, or its value is negative, an exception of type format_error is thrown.
    + The option is valid only if the corresponding formatting argument is of standard signed or unsigned integer type. If its value is negative, an exception of type format_error is thrown.
Add to 22.14.6.5 [format.parse.ctx]:
      namespace std {
        template<class charT>
        class basic_format_parse_context {
        public:
          using char_type = charT;
          using const_iterator = typename basic_string_view<charT>::const_iterator;
          using iterator = const_iterator;

        private:
          iterator begin_;                                    // exposition only
          iterator end_;                                      // exposition only
          enum indexing { unknown, manual, automatic };       // exposition only
          indexing indexing_;                                 // exposition only
          size_t next_arg_id_;                                // exposition only
          size_t num_args_;                                   // exposition only

        public:
    -     constexpr explicit basic_format_parse_context(basic_string_view<charT> fmt,
    -                                                   size_t num_args = 0) noexcept;
    +     constexpr explicit basic_format_parse_context(basic_string_view<charT> fmt) noexcept;
          basic_format_parse_context(const basic_format_parse_context&) = delete;
          basic_format_parse_context& operator=(const basic_format_parse_context&) = delete;

          constexpr const_iterator begin() const noexcept;
          constexpr const_iterator end() const noexcept;
          constexpr void advance_to(const_iterator it);

          constexpr size_t next_arg_id();
          constexpr void check_arg_id(size_t id);

    +     template<class... Ts>
    +       constexpr void check_dynamic_spec(size_t id) noexcept;
    +     constexpr void check_dynamic_spec_integral(size_t id) noexcept;
    +     constexpr void check_dynamic_spec_string(size_t id) noexcept;
        };
      }
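Change the constructor, removing its num_args parameter:
    - constexpr explicit basic_format_parse_context(basic_string_view<charT> fmt,
    -                                               size_t num_args = 0) noexcept;
    + constexpr explicit basic_format_parse_context(basic_string_view<charT> fmt) noexcept;
    - 2 Effects: Initializes begin_ with fmt.begin(), end_ with fmt.end(), indexing_ with unknown, next_arg_id_ with 0, and num_args_ with num_args.
    + 2 Effects: Initializes begin_ with fmt.begin(), end_ with fmt.end(), indexing_ with unknown, next_arg_id_ with 0, and num_args_ with 0. [ Note 1: Any call to next_arg_id, check_arg_id, or check_dynamic_spec on an instance of basic_format_parse_context initialized using this constructor is not a core constant expression. — end note ]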
And then add at the bottom:
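    constexpr void check_arg_id(size_t id);
9 Effects: If indexing_ != automatic, equivalent to:
    if (indexing_ == unknown)
      indexing_ = manual;
10 Throws: format_error if indexing_ == automatic which indicates mixing of automatic and manual argument indexing.
11 Remarks: Call expressions where id >= num_args_ are not core constant expressions ([expr.const]).
    template<class... Ts>
      constexpr void check_dynamic_spec(size_t id) noexcept;
12 Mandates: The types in Ts... are unique. Each type in Ts... is one of bool, char_type, int, unsigned int, long long int, unsigned long long int, float, double, long double, const char_type*, basic_string_view<char_type>, or const void*.
13 Remarks: Call expressions where id >= num_args_ or the type of the corresponding format argument (after conversion to basic_format_arg<Context>) is not one of the types in Ts... are not core constant expressions ([expr.const]).
    constexpr void check_dynamic_spec_integral(size_t id) noexcept;
14 Effects: Equivalent to:
    check_dynamic_spec<int, unsigned int, long long int, unsigned long long int>(id);
    constexpr void check_dynamic_spec_string(size_t id) noexcept;
15 Effects: Equivalent to:
    check_dynamic_spec<const char_type*, basic_string_view<char_type>>(id);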
Bump __cpp_lib_format in 17.3.2 [version.syn]:
[ Editor's note: There are three formatting papers that are going to be moved at the same time: P2510 (Formatting Pointers), this one, and P2637 (Member Visit). They should end up with three consecutive values for __cpp_lib_format (not all the same one): 202304L, 202305L, and 202306L, respectively. ]
Thanks to Tim Song for discussing the issues and helping with the wording. Thanks to Victor Zverovich for having already solved the problem.
[P2216R3] Victor Zverovich. 2021-02-15. std::format improvements.
https://wg21.link/p2216r3
[P2757R0] Barry Revzin. 2023-01-08. Type checking format args.
https://wg21.link/p2757r0
[P2757R1] Barry Revzin. 2023-03-14. Type checking format args.
https://wg21.link/p2757r1
[P2757R2] Barry Revzin. 2023-05-16. Type checking format args.
https://wg21.link/p2757r2