1. Introduction
Usage experience shows that the implicit conversions introduced by [P1391]
have undesirable side effects and break existing use cases for
std::string_view as a string-reference type ([N3921]). Moreover, [P1391] uses
contiguity as a proxy for detecting whether a type is string-like, which
appears to be conceptually wrong. This paper proposes removing the problematic
conversions.
2. Problems
[P1391] made std::string_view and, more generally, std::basic_string_view
implicitly constructible from any contiguous range of characters and, as it
turned out but was not mentioned in the paper, not just characters.
Like
this seemed like a good idea at the time even to the author
of the current paper who didn’t put too much thought into the implications or
whether the design is particularly sound.
Unfortunately, even brief exposure to the actual implementation and initial
usage experience revealed severe issues, some of which are summarized in this
paper.
2.1. Is vector<char> a string?
Consider the following simple example:
    #include <iostream>
    #include <string_view>
    #include <type_traits>

    // Generic overload: prints any range as a bracketed, comma-separated list.
    template <typename Container>
    auto print(const Container& c)
        -> std::enable_if_t<!std::is_convertible_v<Container, std::string_view>> {
      std::cout << '[';
      const char* sep = "";
      for (const auto& item : c) {
        std::cout << sep << item;
        sep = ", ";
      }
      std::cout << ']';
    }

    // String overload: prints the argument as a quoted string.
    void print(std::string_view s) { std::cout << '"' << s << '"'; }
The print function takes an argument and prints it either as a quoted string
or as a comma-separated list of values delimited by [].
What will print(std::vector{'a', 'b'}) output?
Thanks to the newly added implicit conversions, the answer depends on the C++
standard version. In C++17 and C++20 this prints [a, b] as expected, while in
the upcoming C++23 the output suddenly changes to "ab".
In C++17 and C++20 it was perfectly reasonable to assume that std::string_view
means a reference to something string-like such as std::string or a string
literal. Quoting [N3921], which introduced string_view:
"Google, LLVM, and Bloomberg have independently implemented a string-reference type to encapsulate this kind of argument. string_view is implicitly constructible from const char* and std::string."
The string nature of string_view is also indicated in its name, its API and
the fact that basic_string_view takes character traits as a template
parameter.
The new implicit conversions broke that assumption, effectively changing the
meaning of string_view to denote not a reference to a string-like type but a
reference to an arbitrary contiguous range.
Conceptually, the problem with these conversions is that they confuse representation with semantics, using contiguity as a proxy for being a string. Consider
    print(std::list{'a', 'b'});
    print(std::deque{'a', 'b'});
    print(std::vector{'a', 'b'});
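Output      | C++17  | C++20  | C++23
std::list   | [a, b] | [a, b] | [a, b]
std::deque  | [a, b] | [a, b] | [a, b]
std::vector | [a, b] | [a, b] | "ab"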
Why is std::vector different from other containers, and is it really
"string-like" as implicit convertibility to string_view suggests?
Of course, we could change the definition of print to work around the issue,
but this won't fix the underlying problem. We no longer have a string-reference
type, which invalidates the goal of [N3921]. Instead, we have a
contiguous-range-reference type with a misleading name and a string-like API.
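As an illustration of such a workaround, here is a minimal sketch that replaces the convertibility check with an explicit allow-list of string types. The names is_string_like_v, to_text and check_to_text are hypothetical and not part of any library, and the function returns a std::string instead of writing to std::cout only to make the result easy to check. It treats the symptom in one function; it does not restore string_view as a string-reference type.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical allow-list: only types we have decided are strings.
// Arrays of char decay to char* or const char* and are therefore included.
template <typename T>
inline constexpr bool is_string_like_v =
    std::is_same_v<std::decay_t<T>, std::string> ||
    std::is_same_v<std::decay_t<T>, std::string_view> ||
    std::is_same_v<std::decay_t<T>, const char*> ||
    std::is_same_v<std::decay_t<T>, char*>;

// Formats allow-listed strings in quotes and everything else as a list.
template <typename T>
std::string to_text(const T& value) {
  std::ostringstream out;
  if constexpr (is_string_like_v<T>) {
    out << '"' << std::string_view(value) << '"';
  } else {
    out << '[';
    const char* sep = "";
    for (const auto& item : value) {
      out << sep << item;
      sep = ", ";
    }
    out << ']';
  }
  return out.str();
}

// Intended results; they do not depend on the standard version because
// convertibility to std::string_view is never consulted.
inline void check_to_text() {
  assert(to_text(std::vector<char>{'a', 'b'}) == "[a, b]");
  assert(to_text(std::string("ab")) == "\"ab\"");
}
```

The cost of this approach is that every generic function has to carry its own notion of what a string is, which is the duplication a string-reference type was meant to avoid.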
The introduction of std::span in C++20 makes this design look even stranger
because std::span<const char> is a natural representation for the above type.
If this wasn’t bad enough, the same applies to non-character types as well, enabling such fun examples as:
    std::basic_string_view s = std::vector{42.0};
or, more practically, a generic version of the print example above where
a contiguous range of non-characters can be printed as a pseudo-string.
This is not a theoretical problem. There were at least two bug reports in {fmt}, an open-source formatting library ([FMT-BUG-2585], [FMT-BUG-2634]), of subtle breakages related to this change even though it requires opting into an experimental C++23 standard library implementation which is very uncommon.
To solve the problem in {fmt} we'll have to indefinitely continue using a
replacement for std::string_view together with a few workarounds, with no
chance of eventually converging on std::string_view as a string-reference
type. The same is likely true for some other text processing and serialization
use cases that need to distinguish between strings and containers.
2.2. Type unsafety
Another problem is that char has a double meaning and is used as a code unit
type or as a byte depending on the context. Additional semantic context may be
added by types built on top of char, some of which may now become
unexpectedly convertible to string_view. As pointed out by users,
and
are commonly used as byte buffers and implicit
conversions could introduce type safety problems.
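A codebase that wants to know which of its types a given standard library treats as implicitly convertible to string_view can probe this at compile time. The following is a minimal sketch under stated assumptions: converts_to_string_view_v and ByteBuffer are hypothetical names introduced for illustration, and std::vector<char> stands in for a typical byte buffer.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical helper: true if T converts implicitly to std::string_view.
template <typename T>
inline constexpr bool converts_to_string_view_v =
    std::is_convertible_v<T, std::string_view>;

// Hypothetical byte-buffer wrapper. It is not a range and has no conversion
// operator, so no range-based constructor can pick it up.
struct ByteBuffer {
  std::vector<char> bytes;
};

// Intended string types convert implicitly.
static_assert(converts_to_string_view_v<std::string>);
static_assert(converts_to_string_view_v<const char*>);

// The wrapper never does, regardless of the standard version.
static_assert(!converts_to_string_view_v<ByteBuffer>);

// The result for a bare std::vector<char> is deliberately not asserted here:
// it is false in C++17 and C++20 and true with the [P1391] conversions.
inline constexpr bool raw_buffer_converts =
    converts_to_string_view_v<std::vector<char>>;
```

Wrapping a buffer in a distinct type is only one possible mitigation, and it pushes the burden onto every author of a char-based type.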
2.3. Nonexisting practice
Maybe [P1391] standardizes existing practice? Let's look at the types that
inspired string_view:
- LLVM's StringRef
- Google's StringPiece
Neither of them provides a constructor from a contiguous range or even a vector. So this feature doesn't standardize existing practice but is completely novel, which explains why even brief exposure to the implementation revealed a number of issues.
3. Alternatives
The main part of the motivation of [P1391] is compelling:
"While P1206 gives a general motivation for range constructors, it's especially important for string_view because there exist in a lot of codebases string types that would benefit from being convertible to string_view. For example, llvm::StringRef, QByteArray, fbstring, boost::container::string ..."
However, the solution is overreaching and, as shown above, breaks the main use case for a string-like reference type by introducing semantically lossy implicit conversions.
Whether a type is string-like should generally be controlled by the class
author, not detected via some heuristic. We already have a mechanism for this
that is used in std::string, namely its conversion operator to string_view. If
it is insufficient, a proper solution would be to introduce another opt-in
mechanism such as a trait that specifies whether the type is string-like and is
eligible for conversion into a string_view. The latter is not proposed by the
current paper, which only tries to mitigate the damage done by [P1391] before
it is too late.
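To make the idea of an opt-in mechanism concrete, here is a minimal sketch of what such a trait could look like. It is not proposed wording: is_string_like, MyString, to_string_view and can_view are all hypothetical names, and a free function stands in for the conversion because user code cannot add constructors to string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical opt-in trait: false unless the class author specializes it.
template <typename T>
struct is_string_like : std::false_type {};

template <>
struct is_string_like<std::string> : std::true_type {};

// Hypothetical third-party string type.
class MyString {
 public:
  explicit MyString(std::string s) : data_(std::move(s)) {}
  const char* data() const { return data_.data(); }
  std::size_t size() const { return data_.size(); }

 private:
  std::string data_;
};

// The author of MyString declares it string-like.
template <>
struct is_string_like<MyString> : std::true_type {};

// Conversion that participates in overload resolution only for opted-in types.
template <typename T, std::enable_if_t<is_string_like<T>::value, int> = 0>
std::string_view to_string_view(const T& s) {
  return std::string_view(s.data(), s.size());
}

// Detects whether to_string_view accepts a given type.
template <typename T, typename = void>
struct can_view : std::false_type {};

template <typename T>
struct can_view<
    T, std::void_t<decltype(to_string_view(std::declval<const T&>()))>>
    : std::true_type {};

static_assert(can_view<MyString>::value);
// Contiguous storage alone is not enough: std::vector<char> has not opted in.
static_assert(!can_view<std::vector<char>>::value);
```

With this shape the decision sits with the author of each type, and a contiguous container of char stays a container unless someone explicitly says otherwise.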
4. Proposal
Remove wording introduced by [P1391] from the standard.
5. Acknowledgements
Thanks to Matthias Moulin and Barry Revzin for independently bringing up this issue.