Document #: | P2286R1 |
Date: | 2021-02-19 |
Project: | Programming Language C++ |
Audience: | LEWG |
Reply-to: | Barry Revzin <barry.revzin@gmail.com> |
[P2286R0] suggested making all the formatting implementation-defined. Several people reached out to me suggesting in no uncertain terms that this is unacceptable. This revision lays out options for such formatting.
[LWG3478] addresses the issue of what happens when you split a string and the last character in the string is the delimiter that you are splitting on. One of the things I wanted to look at while researching that issue is: what do other languages do here?
For most languages, this is a pretty easy proposition. Do the split, print the results. This is usually only a few lines of code.
Python:
outputs
Java (where the obvious thing prints something useless, but there’s a non-obvious thing that is useful):
    import java.util.Arrays;

    class Main {
        public static void main(String args[]) {
            System.out.println("xyx".split("x"));
            System.out.println(Arrays.toString("xyx".split("x")));
        }
    }
outputs
Rust (a couple options, including also another false friend):
    use itertools::Itertools;

    fn main() {
        println!("{:?}", "xyx".split('x'));
        println!("[{}]", "xyx".split('x').format(", "));
        println!("{:?}", "xyx".split('x').collect::<Vec<_>>());
    }
outputs
Kotlin:
outputs
Go:
outputs
JavaScript:
outputs
And so on and so forth. What we see across these languages is that printing the result of split is pretty easy. In most cases, whatever the print mechanism is just works and does something meaningful. In other cases, printing gave me something other than what I wanted, but there was some other easy, provided mechanism for getting it.
Now let’s consider C++.
    #include <iostream>
    #include <string>
    #include <ranges>
    #include <format>

    int main() {
        // need to predeclare this because we can't split an rvalue string
        std::string s = "xyx";
        auto parts = s | std::views::split('x');

        // nope
        std::cout << parts;

        // nope (assuming std::print from P2093)
        std::print("{}", parts);

        std::cout << "[";
        char const* delim = "";
        for (auto part : parts) {
            std::cout << delim;

            // still nope
            std::cout << part;

            // also nope
            std::print("{}", part);

            // this finally works
            std::ranges::copy(part, std::ostream_iterator<char>(std::cout));

            // as does this
            for (char c : part) {
                std::cout << c;
            }
            delim = ", ";
        }
        std::cout << "]\n";
    }
This took me more time to write than any of the solutions in any of the other languages. Including the Go solution, which contains 100% of all the lines of Go I’ve written in my life.
Printing is a fairly fundamental and universal mechanism to see what’s going on in your program. In the context of ranges, it’s probably the most useful way to see and understand what the various range adapters actually do. But none of these things provides an operator<< (for std::cout) or a formatter specialization (for format). And the further problem is that as a user, I can’t even do anything about this. I can’t just provide an operator<< in namespace std or a very broad specialization of formatter: none of these are program-defined types, so it’s just asking for clashes once you start dealing with bigger programs.
The only mechanisms I have at my disposal to print something like this are either ranges::copy into an output iterator (which is more differently bad), or fmt::format.
That’s right, there’s a fourth option for C++ that I haven’t shown yet, and that’s this:
    #include <ranges>
    #include <string>
    #include <fmt/ranges.h>

    int main() {
        std::string s = "xyx";
        auto parts = s | std::views::split('x');

        fmt::print("{}\n", parts);
        fmt::print("[{}]\n", fmt::join(parts, ","));
    }
outputting
And this is great! It’s a single, easy line of code to just print arbitrary ranges (including ranges of ranges).
And, if I want to do something more involved, there’s also fmt::join, which lets me specify both a format specifier and a delimiter. For instance:
    std::vector<uint8_t> mac = {0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff};
    fmt::print("{:02x}\n", fmt::join(mac, ":"));
outputs
fmt::format (and fmt::print) solves my problem completely. std::format does not, and it should.
The Ranges Plan for C++23 [P2214R0] listed the ability to format all views as one of its top priorities for C++23. Let’s go through the issues we need to address in order to get this functionality.
The standard library is the only library that can provide formatting support for standard library types and other broad classes of types like ranges. In addition to ranges (both the concrete containers like vector<T> and the range adaptors like views::split), there are several very commonly used types that are currently not printable.
The most common and important such types are pair and tuple (which ties back into Ranges even more closely once we adopt views::zip and views::enumerate). fmt currently supports printing such types as well:
outputs
Another common and important set of types are std::optional<T> and std::variant<Ts...>. fmt does not support printing any of the sum types. There is not an obvious representation for them in C++ as there might be in other languages (e.g. in Rust, an Option<i32> prints as either Some(42) or None, which is also the same syntax used to construct them).
However, the point here isn’t necessarily to produce the best possible representation (users who have very specific formatting needs will need to write custom code anyway), but rather to provide something useful. And it’d be useful to print these types as well. However, given that optional and variant are both less closely related to Ranges than pair and tuple and also have less obvious representation, they are less important.
There are several questions to ask about what the representation should be for printing.
Should vector<int>{1, 2, 3} be printed as {1, 2, 3} (as fmt currently does and as the type is constructed in C++) or [1, 2, 3] (as is typical representationally outside of C++)?
Should pair<int, int>{3, 4} be printed as {3, 4} (as the type is constructed) or as (3, 4) (as fmt currently does and is typical representationally outside of C++)?
Should char and string in the context of ranges and tuples be printed as quoted (as fmt currently does) or unquoted (as these types are typically formatted)?
What I’m proposing is the following:
print:
That is: ranges are surrounded with []s and delimited with ", ". Pairs and tuples are surrounded with ()s and delimited with ", ". Types like char, string, and string_view are printed quoted.
It is more important to me that ranges and tuples are visually distinct (in this case [] vs (), but the way that fmt currently does it as {} vs () is also okay) than it would be to quote the string-y types. My rank order of the possible options for the map m above is:
Ranges | Tuples | Quoted? | Formatted Result |
---|---|---|---|
[] | () | ✔️ | [("hello", 'h'), ("world", 'w')] |
{} | () | ✔️ | {("hello", 'h'), ("world", 'w')} |
[] | () | ❌ | [(hello, h), (world, w)] |
{} | () | ❌ | {(hello, h), (world, w)} |
{} | {} | ❌ | {{hello, h}, {world, w}} |
My preference for avoiding {} in the formatting is largely because it’s unlikely the results here can be copied and pasted directly into initialization anyway, so the priority is simply having visual distinction for the various cases.
With quoting, the question is how the library chooses whether something is string-like and thus needs to be quoted. Currently, fmtlib makes this determination by checking for the presence of certain operations (p->length(), p->find('a'), and p->data()). We could instead add a variable template called enable_formatting_as_string to allow for opting into this kind of formatting.
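A sketch of what such an opt-in could look like, in the style of the existing enable_view and enable_borrowed_range traits. Only the name enable_formatting_as_string comes from the text above; the set of default specializations and the my_string type are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical opt-in trait: false unless a type says otherwise.
template <class T>
inline constexpr bool enable_formatting_as_string = false;

// Assumed defaults for the obvious string-like types.
template <>
inline constexpr bool enable_formatting_as_string<char> = true;

template <class CharT, class Traits, class Alloc>
inline constexpr bool
    enable_formatting_as_string<std::basic_string<CharT, Traits, Alloc>> = true;

template <class CharT, class Traits>
inline constexpr bool
    enable_formatting_as_string<std::basic_string_view<CharT, Traits>> = true;

// A user-defined string type (hypothetical) opts in explicitly.
struct my_string {
    std::string data;
};

template <>
inline constexpr bool enable_formatting_as_string<my_string> = true;

static_assert(enable_formatting_as_string<std::string>);
static_assert(enable_formatting_as_string<std::string_view>);
// A vector<char> has the right element type but never opted in.
static_assert(!enable_formatting_as_string<std::vector<char>>);
```

Compared to detecting member functions, an explicit trait avoids quoting a type that merely happens to have length(), find() and data().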
There are three layers of potential functionality:
Top-level printing of ranges: this is fmt::print("{}", r);
A format-joiner which allows providing a format specifier for each element and a delimiter: this is fmt::print("{:02x}", fmt::join(r, ":")).
A more involved version of a format-joiner which takes a delimiter and a callback that gets invoked on each element. fmt does not provide such a mechanism, though the Rust itertools library does:
This paper suggests the first two and encourages research into the third.
The implementation experience in fmt is that directly formatting ranges does not support any format specifiers, but fmt::join supports providing a specifier per element as well as providing the delimiter and wrapping brackets.
We could add the same format specifier support for direct formatting of ranges as fmt::join supports, but it doesn’t seem especially worthwhile. If you don’t care about formatting, "{}" is all you need. If you do care about formatting, it’s likely that you care about more than just the formatting of each individual element: you probably care about other things too. At which point, you’d likely need to use fmt::join anyway.
That seems like the right mix of functionality to me.
const-iterable?
There are several C++20 range adapters which are not const-iterable. These include views::filter and views::drop_while. But std::format takes its arguments by Args const&..., so how could they be printable?
fmt handles this just fine, even though its formatter specialization takes its argument by const reference, by converting the range to a view first.
So we can do something like:
But this still doesn’t cover all the cases. If R is a type such that R const is a view but R is move-only, that won’t compile. We can work around this case. But if R is a type such that view<R> and not range<R const> and not copyable<R>, there’s really no way of getting this to work without changing the API.
Notably, the proposed std::generator<T> is one such type [P2168R0].
If we do want to support this case (and fmt does not), then we will need to change the API of format to take by forwarding reference instead of by const reference. That is, replace
with:
But it’s not that easy. While we would need std::generator<T> to bind to Args&& rather than Args const&, there are other cases which have the exact opposite problem: bit-fields need to bind to Args const& and cannot bind to Args&&.
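The bit-field half of this problem is easy to reproduce in isolation. In the sketch below, flags, by_const_ref and by_forwarding_ref are hypothetical names that stand in for a formatting API with each kind of parameter.

```cpp
#include <cassert>
#include <utility>

struct flags {
    unsigned ready : 1;
    unsigned count : 7;
};

// Const-reference parameter: binds to a temporary copied from the bit-field.
template <class T>
unsigned by_const_ref(T const& value) {
    return static_cast<unsigned>(value);
}

// Forwarding-reference parameter: for an lvalue argument, T&& collapses to a
// non-const lvalue reference, which is not allowed to bind to a bit-field.
template <class T>
unsigned by_forwarding_ref(T&& value) {
    return static_cast<unsigned>(std::forward<T>(value));
}

inline unsigned read_count(flags f) {
    // by_forwarding_ref(f.count);     // ill-formed: cannot bind a bit-field
    return by_const_ref(f.count);      // ok
}
```

A caller can still get a bit-field through a forwarding reference by making a prvalue first, for example with by_forwarding_ref(+f.count), but that pushes the burden onto every call site.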
So we can’t really change the API to support this use-case without breaking other use-cases. But there are workarounds for this case:
fmt::join, which does take by forwarding reference
ranges::ref_view can wrap an lvalue of such a view; the resulting view would be const-iterable.
For example:
    auto ints_coro(int n) -> std::generator<int> {
        for (int i = 0; i < n; ++i) {
            co_yield i;
        }
    }

    fmt::print("{}", ints_coro(10)); // error
    fmt::print("[{}]", fmt::join(ints_coro(10), ", ")); // ok

    auto v = ints_coro(10);
    fmt::print("{}", std::ranges::ref_view(v)); // ok, if tedious

    fmt::print("{}", ints_coro(10) | ranges::to<std::vector>); // ok, but expensive
    fmt::print("{}", views::iota(0, 10)); // ok, but harder to implement
You can see an example of formatting a move-only, non-const-iterable range on compiler explorer [fmt-non-const].
It’s quite important that a std::string whose value is "hello" gets printed as hello rather than something like [h, e, l, l, o].
This would basically fall out no matter how we approach implementing such a thing, so in and of itself it is not much of a concern. However, users who have custom containers, or who want to customize formatting of a standard container for their own types, need to make sure that they can provide a specialization which is more constrained than the standard library’s for ranges. To ensure that they can do that, I think we need to be clear about the specific constraint we use when we specify this.
format or std::cout?
Just format is sufficient.
vector<bool>?
Nobody expected this section.
The value_type of this range is bool, which is formattable. But the reference type of this range is vector<bool>::reference, which is not. In order to make the whole type formattable, we can either make vector<bool>::reference formattable (and thus, in general, a range is formattable if both its value and reference types are formattable) or allow formatting to fall back to constructing a value for each reference (and thus, in general, a range is formattable if at least its value is).
For most ranges, the value_type is remove_cvref_t<reference>, so there’s no distinction here between the two options. Even for zip there is not much distinction: it just wraps the same question in tuple, since for most ranges the types will be something like tuple<T, U> vs tuple<T&, U const&>.
vector<bool> is one of the very few ranges in which the two types are truly quite different. So it doesn’t offer much in the way of a good example here, since bool is cheaply constructible from vector<bool>::reference. Though it’s also very cheap to provide a formatter specialization for vector<bool>::reference. We might as well.
The standard library should add specializations of formatter for:
any range whose value_type and reference are formattable,
pair<T, U> if T and U are formattable,
tuple<Ts...> if all of Ts... are formattable,
vector<bool>::reference (which does as bool does).
Ranges should be formatted as [x, y, z] while tuples should be formatted as (a, b, c). Types that satisfy both (e.g. std::array) are treated as ranges. In the context of formatting ranges, types that are string-like (e.g. char, string, string_view) should be formatted as being quoted (with string-like being determined via a variable template trait).
Formatting ranges does not support any additional format specifiers.
The standard library should also add a utility std::format_join (or any other suitable name, knowing that std::views::join already exists), following in the footsteps of fmt::join, which allows the user to provide more customization in how ranges and tuples get formatted.
For types like std::generator<T> (which are move-only, non-const-iterable ranges), users will have to either use the std::format_join facility or use something like ranges::ref_view as shown earlier.
[fmt-non-const] Barry Revzin. 2021. Formatting a move-only non-const-iterable range.
https://godbolt.org/z/149YqW
[LWG3478] Barry Revzin. views::split drops trailing empty range.
https://wg21.link/lwg3478
[P2168R0] Corentin Jabot, Lewis Baker. 2020-05-16. generator: A Synchronous Coroutine Generator Compatible With Ranges.
https://wg21.link/p2168r0
[P2214R0] Barry Revzin, Conor Hoekstra, Tim Song. 2020-10-15. A Plan for C++23 Ranges.
https://wg21.link/p2214r0
[P2286R0] Barry Revzin. 2021-01-15. Formatting Ranges.
https://wg21.link/p2286r0