1. Changelog
R5
- Changed the proposed wording to take into account the changes proposed by [LWG3950].
- Removed the Annex C entry at the request of LWG.

R4
- Incorporated feedback from the 2023-04-18 LEWG telecon.
- Reverted to a free function template implementation (Approach 2, as described below), following the poll(s) that no longer showed consensus for using hidden friends and thus introducing an asymmetry.
- Added the requested tests (involving std::filesystem::path and a test producing ambiguities) to the working prototype.
- Rebased the wording on top of the latest draft.

R3
- Readability improvements.

R2
- Changed the signatures of the proposed operators to take precisely string views (instead of anything convertible to string views).
- Made the proposed operators hidden friends, following a LEWG vote in a telecon.
- Added a discussion about the implications of such a change.
- Added an Annex C entry.

R1
- Minor readability improvements.

R0
- First submission.
2. Motivation and Scope
The Standard currently lacks support for concatenating strings and string views by means of operator+:

    std::string calculate(std::string_view prefix)
    {
        return prefix + get_string(); // ERROR
    }

This constitutes a major asymmetry when considering the rest of basic_string’s API related to string concatenation. In such APIs there is already support for the corresponding view classes.
In general, this gives the concatenation APIs between strings and string views poor usability:

    std::string str;
    std::string_view view;

    // Appending
    str + view;               // ERROR
    str + std::string(view);  // OK, but inefficient
    str + view.data();        // Compiles, but BUG!

    std::string copy = str;
    copy += view;             // OK, but tedious to write (requires explicit copy)
    copy.append(view);        // OK, ditto

    // Prepending
    view + str;               // ERROR

    std::string copy = str;
    copy.insert(0, view);     // OK, but tedious and inefficient
Similarly, the current situation is asymmetric when considering concatenation against raw pointers:
    std::string str;

    str + "hello";    // OK
    str + "hello"sv;  // ERROR

    "hello" + str;    // OK
    "hello"sv + str;  // ERROR

All of this is just bad ergonomics; the lack of operator+ is extremely surprising for end-users (cf. this StackOverflow question), and harms teachability and usability of string_view in lieu of raw pointers.
Now, as shown above, there are workarounds available either in terms of named functions (append, insert, ...) or explicit conversions. However, it is hard to steer users away from the convenience syntax (which is ultimately the point of using operator+ in the first place). The availability of the other overloads of operator+ opens the door to bad code; for instance, it risks neglecting the value of view classes:

    std::string prepend(std::string_view prefix)
    {
        return std::string(prefix) + get_string(); // inefficient
    }
And it may even open the door to (subtle) bugs:
    std::string result1 = str + view;               // ERROR. <Sigh>, ok, let me rewrite as...
    std::string result2 = str + std::string(view);  // OK, but this is inefficient. How about...
    std::string result3 = str + view.data();        // Compiles; but BUG!

The last line exhibits undefined behavior if view is not NUL terminated, and also behaves differently in case it has embedded NULs.
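The embedded-NUL difference can be observed without triggering undefined behavior, provided the viewed buffer is known to end with a NUL. The following is a minimal sketch under that assumption; `via_data`, `via_append` and `embedded` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Appends through the raw pointer: the data is treated as a C string, so
// copying stops at the first NUL. This is only defined behavior when a NUL
// is guaranteed to be reachable from v.data().
inline std::string via_data(const std::string& s, std::string_view v) {
    return s + v.data();
}

// Appends through the view itself: exactly v.size() characters are copied.
inline std::string via_append(const std::string& s, std::string_view v) {
    std::string r = s;
    r.append(v);
    return r;
}

// A 5-character view over a string literal. The literal's own terminator
// follows the viewed characters, which keeps via_data() well defined here.
inline constexpr std::string_view embedded{"ab\0cd", 5};
```

With these definitions, `via_data("x", embedded)` keeps only the characters before the embedded NUL, whereas `via_append("x", embedded)` keeps all five characters of the view.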
This paper proposes to fix these API flaws by adding suitable operator+ overloads between string and string view classes. The changes required for such operators are straightforward and should pose no burden on implementations.
2.1. Why are those overloads missing in the first place?
[N3685] ("string_view: a non-owning reference to a string, revision 4") offers the reason:

    "I also omitted operator+(basic_string, basic_string_view) because LLVM returns a lightweight object from this overload and only performs the concatenation lazily. If we define this overload, we’ll have a hard time introducing that lightweight concatenation later."
Subsequent revisions of the paper no longer have this paragraph.
There are a couple of considerations that we think are important here.
- string_view was approved for C++17 in Jacksonville (February 2016). At the time of this writing, such a "string builder" facility has not been proposed for standardization (as far as we know). Withholding a completely reasonable feature from users (concatenation via operator+) for so long, in the name of an as yet unseen future "major" feature, is a disservice to them.
- We strongly feel that overloading operator+ is completely outside of the design space for a string builder class. There is absolutely no reason why str + "hello"sv should use the builder, but str + "hello" should not -- not to mention cases like strA + strB + strC. One cannot however change the semantics of the existing operator+ overloads without breaking API/ABI compatibility. In Qt, [QStringBuilder] uses operator% by default; blindly replacing operator+ with operator% when concatenating strings comes with its own share of problems (not only is it API incompatible, but it also causes dangling references in a number of scenarios).
In short: we do not see any reason to further withhold the proposed additions.
3. Impact On The Standard
This proposal is a pure library extension.
This proposal does not depend on any other library extensions.
This proposal does not require any changes in the core language.
4. Design Decisions
4.1. Minimizing the number of allocations
The proposed wording builds on top / reuses of the existing one for
. In particular, no attempts have been made at e.g. minimizing
memory allocations (by allocating only one buffer of suitable size,
then concatenating in that buffer). Implementations already employ such
mechanisms internally, and we would expect them to do the same also for
the new overloads (for instance, see here for libstdc++ and here for libc++).
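As an illustration of the "one buffer of suitable size" strategy mentioned above, the following sketch reserves the final size before appending. It is not taken from libstdc++ or libc++; `concat_once` is a hypothetical helper name, and real implementations may use different internal mechanisms.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: builds the result in a buffer that is sized once,
// so that the two appends below do not need to grow it again.
inline std::string concat_once(std::string_view lhs, std::string_view rhs) {
    std::string result;
    result.reserve(lhs.size() + rhs.size());  // request the final size up front
    result.append(lhs);
    result.append(rhs);
    return result;
}
```

By contrast, the naive "copy the string, then append" formulation may allocate once for the copy and again when the append exceeds the copy's capacity.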
4.2. Should the proposed operators be hidden friends? Should they be function templates?
There are several ways to define the proposed overloads.
4.2.1. Approach 1: free non-friend function templates, taking exactly a string view
The signature would look like this:
    template <class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT>>
    class basic_string {
        // [...]
    };

    template <class charT, class traits, class Allocator>
    constexpr basic_string<charT, traits, Allocator>
        operator+(const basic_string<charT, traits, Allocator>& lhs,
                  basic_string_view<charT, traits> rhs);

    // Repeat for the other overloads with swapped arguments, rvalues, etc.

This approach closely follows the pre-existing overloads for operator+. In particular, here the newly added operators are not hidden friends (which may increase compilation times, give worse compile errors, etc.).
Still: just like hidden friends, it is not possible to use these operators with datatypes implicitly convertible to basic_string / basic_string_view specializations:

    class convertible_to_string
    {
    public:
        /* implicit */ operator std::string() const;
    };

    convertible_to_string cts;
    cts + "hello"s;   // ERROR (pre-existing)
    cts + "hello"sv;  // ERROR
The error stems from the fact that the existing (and the proposed) operator+ overloads are function templates, and implicit conversions are not possible given the signatures of these functions: all the parameter types of operator+ contain a template-parameter that needs to be deduced, in which case implicit conversions are not considered (this is [temp.arg.explicit/7]).
While the lack of support for types implicitly convertible to strings may be desirable (for symmetry), the lack of support for types implicitly convertible to string views is questionable. String view operations explicitly support objects of types convertible to them. For instance:

    std::string s;
    convertible_to_string cts;
    s == cts;    // ERROR

    std::string_view sv;
    convertible_to_string_view ctsv;
    sv == ctsv;  // OK; [string.view.comparison/1]

The above definition of the overloads would prevent objects of types convertible to string views from being appended or prepended to strings, again because the implicit conversion towards the string view type would be prevented. This would even be inconsistent with the existing member functions of strings:

    std::string s;
    convertible_to_string_view ctsv;
    s.append(ctsv);  // OK, [string.append/3]
    s + ctsv;        // ERROR, ???
Finally, overloads added as non-member/non-friend function templates are not viable when using something like std::reference_wrapper:

    std::reference_wrapper<std::string> rs(~~~);
    std::reference_wrapper<std::string_view> rsv(~~~);
    rs + rs;   // ERROR (pre-existing)
    rs + rsv;  // ERROR

This is because an argument of type e.g. std::reference_wrapper<std::string> (i.e. std::reference_wrapper<std::basic_string<char, std::char_traits<char>, std::allocator<char>>>) can never match against a parameter of type basic_string<charT, traits, Allocator>.
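The deduction failures described in this section can be checked at compile time with a named function template that has the same parameter shape as Approach 1. This is a minimal sketch rather than the proposed wording: `concat_exact`, `callable_exact` and `convertible_to_string_view` are hypothetical names introduced only for illustration, and a named function is used instead of operator+ so that the sketch does not interact with any overloads the Standard Library may provide.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an Approach 1 overload: every parameter type
// contains template parameters that take part in deduction.
template <class C, class T, class A>
std::basic_string<C, T, A> concat_exact(const std::basic_string<C, T, A>& lhs,
                                        std::basic_string_view<C, T> rhs) {
    std::basic_string<C, T, A> r = lhs;
    r.append(rhs);
    return r;
}

// Detection helper: true when concat_exact(L, R) is a well-formed call.
template <class L, class R, class = void>
struct callable_exact : std::false_type {};

template <class L, class R>
struct callable_exact<
    L, R,
    std::void_t<decltype(concat_exact(std::declval<L>(), std::declval<R>()))>>
    : std::true_type {};

struct convertible_to_string_view {
    /* implicit */ operator std::string_view() const { return "xyz"; }
};

// An exact string / string view pair deduces consistently.
static_assert(callable_exact<std::string, std::string_view>::value);
// Deduction fails for the second parameter: no implicit conversion is tried.
static_assert(!callable_exact<std::string, convertible_to_string_view>::value);
// Deduction fails for the first parameter: reference_wrapper is not a basic_string.
static_assert(!callable_exact<std::reference_wrapper<std::string>,
                              std::string_view>::value);
```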
4.2.2. Approach 2: free non-friend function templates, taking anything convertible to a string view
This is similar to Approach 1, except that the string view argument would also accept any type which is convertible to a string view. The precedent for this would be the existing functions for concatenating/inserting strings (e.g. append, insert, operator+=), all of which take a parameter of any type convertible to a string view; as well as the comparison operators for string views, where "[...] implementations shall provide sufficient additional overloads [...] so that an object t with an implicit conversion to S can be compared" ([string.view.comparison/1]).
Therefore, the proposed signatures would look like this:

    template <class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT>>
    class basic_string {
        // [...]
    };

    template <class charT, class traits, class Allocator>
    constexpr basic_string<charT, traits, Allocator>
        operator+(const basic_string<charT, traits, Allocator>& lhs,
                  type_identity_t<basic_string_view<charT, traits>> rhs);
    //            ^^^^^^^^^^^^^^^ make a non-deduced context

    // Repeat for the other overloads with swapped arguments, rvalues, etc.
Note: this may or may not be the actual proposed wording. We could instead handwave the actual overload set by using the "sufficient additional overloads" wording. An implementation could therefore choose to use another implementation strategy, such as SFINAE, constraints, and so on. (See also [LWG3950].)
Apart from allowing strings to be concatenated with objects of types convertible to string views, this approach still forbids the usage of types convertible to strings, as well as types such as std::reference_wrapper:

    std::string s;
    std::string_view sv;
    convertible_to_string cts;
    convertible_to_string_view ctsv;

    s + sv;    // OK
    s + ctsv;  // OK
    s + cts;   // ERROR
    cts + sv;  // ERROR
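The four outcomes listed above can be reproduced with a named function template that uses the same non-deduced-context technique. This is only a sketch: `concat_convertible`, `callable_convertible`, `identity_t` and the two `convertible_to_*` types are hypothetical names, and `identity_t` is a local equivalent of std::type_identity_t so that the example does not require C++20.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Local equivalent of std::type_identity_t (assumption: C++17 toolchain).
template <class T> struct identity { using type = T; };
template <class T> using identity_t = typename identity<T>::type;

// Hypothetical stand-in for an Approach 2 overload: C, T and A are deduced
// from lhs only, so rhs accepts anything implicitly convertible to the view.
template <class C, class T, class A>
std::basic_string<C, T, A> concat_convertible(
    const std::basic_string<C, T, A>& lhs,
    identity_t<std::basic_string_view<C, T>> rhs) {
    std::basic_string<C, T, A> r = lhs;
    r.append(rhs);
    return r;
}

// Detection helper: true when concat_convertible(L, R) is a well-formed call.
template <class L, class R, class = void>
struct callable_convertible : std::false_type {};

template <class L, class R>
struct callable_convertible<
    L, R,
    std::void_t<decltype(concat_convertible(std::declval<L>(),
                                            std::declval<R>()))>>
    : std::true_type {};

struct convertible_to_string {
    /* implicit */ operator std::string() const { return "str"; }
};
struct convertible_to_string_view {
    /* implicit */ operator std::string_view() const { return "xyz"; }
};

static_assert(callable_convertible<std::string, std::string_view>::value);            // s + sv
static_assert(callable_convertible<std::string, convertible_to_string_view>::value);  // s + ctsv
static_assert(!callable_convertible<std::string, convertible_to_string>::value);      // s + cts
static_assert(!callable_convertible<convertible_to_string, std::string_view>::value); // cts + sv
```

The third case fails because reaching a string view from `convertible_to_string` would need two user-defined conversions; the fourth fails because the string parameter still takes part in deduction.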
4.2.3. Approach 3: hidden friends, non-template functions
Basically, this would be an application of the Barton–Nackman idiom in combination with hidden friends ([hidden.friends]).
The proposed operators would look like this:
    template <class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT>>
    class basic_string {
        // [...]

        constexpr friend basic_string
            operator+(const basic_string& lhs,
                      basic_string_view<charT, traits>) { /* hidden friend */ }

        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };
In such an approach, one of the arguments must still be a string object, otherwise the overload is not even added to the overload set (hidden friend).
The other argument can be any object implicitly convertible to a string view. Since the overload is not a function template, implicit conversions here "kick in" and work as expected, without the need of adding additional overloads (or declaring the operators as function templates):
    std::string s;
    convertible_to_string_view ctsv;
    s + ctsv;  // OK

There is a perhaps surprising side effect, however: defining this overload set would also allow concatenation between a string and an object convertible to a string. For instance:

    std::string s;
    convertible_to_string cts;
    s == cts;  // ERROR
    s + cts;   // OK (!)

In the last line, the lhs of type std::string makes the various operator+ overloads visible to lookup. Then, the operator+(std::string_view, std::string&&) overload is selected, converting the lhs from std::string to std::string_view and the rhs from convertible_to_string to an rvalue std::string.
Finally, using types such as std::reference_wrapper would work transparently:

    std::reference_wrapper<std::string> rs(~~~);
    rs + "hello"sv;  // OK

In this example, ADL would add the hidden friend operators to the overload set (cf. [basic.lookup.argdep/3.2]), operators which again are non-template functions. Then, the operator+(const std::string&, std::string_view) overload is selected, since we can implicitly convert the first parameter from the argument of type std::reference_wrapper<std::string>.
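The behaviors described for Approach 3 can be tried out on a miniature string class. This is a sketch under stated assumptions: `toy::str` is a hypothetical class that only mimics the parts of basic_string relevant to overload resolution (an implicit conversion to a string view, plus two hidden friend operators), and `convertible_to_view` / `convertible_to_str` are illustrative types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

namespace toy {

// Hypothetical miniature string class (not basic_string itself).
template <class CharT>
class str {
    std::basic_string<CharT> data_;

public:
    explicit str(std::basic_string<CharT> s) : data_(std::move(s)) {}

    // Like basic_string, str converts implicitly to a string view.
    operator std::basic_string_view<CharT>() const noexcept { return data_; }

    const std::basic_string<CharT>& get() const noexcept { return data_; }

    // Hidden friends: non-template functions, found only through ADL.
    friend str operator+(const str& lhs, std::basic_string_view<CharT> rhs) {
        str r = lhs;
        r.data_.append(rhs);
        return r;
    }
    friend str operator+(std::basic_string_view<CharT> lhs, const str& rhs) {
        str r = rhs;
        r.data_.insert(0, lhs);
        return r;
    }
};

}  // namespace toy

struct convertible_to_view {
    /* implicit */ operator std::string_view() const { return "view"; }
};
struct convertible_to_str {
    /* implicit */ operator toy::str<char>() const {
        return toy::str<char>(std::string("str"));
    }
};
```

Given `toy::str<char> s(std::string("ab"))`, the expression `s + convertible_to_view{}` uses the first friend directly. The expression `s + convertible_to_str{}` also compiles, through the swapped overload: `s` is converted to a view and the right-hand side to a `toy::str`. A `std::reference_wrapper<toy::str<char>>` works as the left operand as well, because ADL also considers the template arguments of the wrapper.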
4.2.4. Approach 4: hidden friends, function templates, taking anything convertible to a string view
This approach is similar to Approach 2, but makes the proposed operators hidden friends.
The proposed operators would in principle look like this:

    template <class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT>>
    class basic_string {
        // [...]

        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
            operator+(const basic_string<C, T, A>& lhs,
                      basic_string_view<C, T>) { /* hidden friend */ }

        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
            operator+(const basic_string<C, T, A>& lhs,
                      type_identity_t<basic_string_view<C, T>>) { /* hidden friend */ }

        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };
In practice the above does not work, as it leads to redefinition errors if basic_string is instantiated with different template parameters (which is of course the case). Note that, in order for the operators to be hidden friends, their definition must be present in basic_string’s class body; multiple instantiations of basic_string would therefore redefine the same function template multiple times.
Hence, an actual implementation has to employ some tricks, such as isolating the operators in a non-template base class:
    template <class charT, class traits, class Allocator>
    class basic_string;

    class __basic_string_base // exposition-only
    {
        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
            operator+(const basic_string<C, T, A>& lhs,
                      basic_string_view<C, T>) { /* hidden friend */ }

        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
            operator+(const basic_string<C, T, A>& lhs,
                      type_identity_t<basic_string_view<C, T>>) { /* hidden friend */ }

        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };

    template <class charT, class traits = char_traits<charT>,
              class Allocator = allocator<charT>>
    class basic_string : __basic_string_base {
        // [...]
    };
This approach brings the same semantics as Approach 2, with the exception that the operators are not found through ordinary unqualified/qualified lookup (because they are hidden friends). It is still not possible to call these operators using an argument of a type convertible to string, nor to call them through std::reference_wrapper.
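The non-template base class trick can be sketched on a miniature class. The names `toy::str`, `toy::str_base`, `identity_t` and `convertible_to_view` are hypothetical and exist only for this illustration; `identity_t` is a local equivalent of std::type_identity_t, and only one of the overloads is shown.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Local equivalent of std::type_identity_t (assumption: C++17 toolchain).
template <class T> struct identity { using type = T; };
template <class T> using identity_t = typename identity<T>::type;

namespace toy {

template <class CharT> class str;

// Non-template base: the friend function template below is defined exactly
// once, no matter how many specializations of str are instantiated.
class str_base {
    template <class C>
    friend str<C> operator+(const str<C>& lhs,
                            identity_t<std::basic_string_view<C>> rhs) {
        str<C> r = lhs;
        r.append(rhs);  // uses only the public interface of str
        return r;
    }
};

// Hypothetical miniature string class deriving from the base, so that ADL
// on a str argument also looks into str_base and finds the hidden friend.
template <class CharT>
class str : str_base {
    std::basic_string<CharT> data_;

public:
    explicit str(std::basic_string<CharT> s) : data_(std::move(s)) {}
    void append(std::basic_string_view<CharT> v) { data_.append(v); }
    const std::basic_string<CharT>& get() const noexcept { return data_; }
};

}  // namespace toy

struct convertible_to_view {
    /* implicit */ operator std::string_view() const { return "view"; }
};
```

Here `C` is deduced from the `toy::str` argument alone, so the right operand may be a string view or any type implicitly convertible to one.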
4.2.5. Summary
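| Works between...? | Approach 1 | Approach 2 | Approach 3 | Approach 4 |
|---|---|---|---|---|
| basic_string and basic_string_view | ✔ | ✔ | ✔ | ✔ |
| basic_string and an object convertible to basic_string_view | ✘ | ✔ | ✔ | ✔ |
| basic_string and an object convertible to basic_string | ✘ | ✘ | ✔ | ✔ |
| Two objects convertible to basic_string | ✘ | ✘ | ✘ | ✘ |
| std::reference_wrapper<basic_string> and basic_string_view | ✘ | ✘ | ✔ | ✘ |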
4.2.6. Which strategy should be used?
The R1 revision of this paper implemented Approach 2, for symmetry with the pre-existing overloads of operator+ between strings.
During the 2022-08-16 LEWG telecon, a poll indicated weak consensus (2/5/4/2/0) for making the proposed operators hidden friends, even at the cost of making them inconsistent with the existing overloads.
R3 of this paper implemented approach 3, and elaborated on the consequences of the different approaches, including the (possibly unexpected) ability of concatenating objects of types convertible to strings. During the review of R3 in the LEWG telecon on 2023-04-18, when presented with this information, there was no longer consensus for the hidden friends approach.
Therefore, in R4 we are reverting to approach 2, again with the idea of keeping the overload set consistent with the pre-existing overloads.
4.3. Backwards compatibility and Annex C
Library Evolution has requested a note to be added to Annex C in case the proposed operators break backwards compatibility.
If users define an operator+ overload between classes from the Standard Library (in a namespace other than std), and the Standard Library then starts providing such an overload so that user code stops compiling (due to redefinitions, ambiguities, etc.), does this constitute a source-incompatible change?
[SD-8] is not particularly explicit on the subject of adding new overloads for operators, although it does state that:
Primarily, the standard reserves the right to:
[...]
Add new names to any entity within any reserved namespace, including but not limited to:
Functions (this includes new member functions and overloads to existing functions)
Operators are functions, but they’re also a particular class of them, as they are practically never called using an explicit function-call expression. Instead, any ordinary code relies on the special rules of overload resolution for operators ([over.match.oper]).
The question here is therefore whether the Standard Library is simply allowed to alter the overload set available to operators, when they are used on objects of datatypes defined in the library itself. It is easy to argue that, if both arguments to an operator overload are library datatypes, then the library reserves the right to add such an overload without worrying about any possible breakage. Implicit conversions and ADL, however, make the situation slightly more complex.
We can construct an example as follows. The proposed operator+ overloads require one of the arguments to be an object of a basic_string specialization (see the discussion above regarding [temp.arg.explicit/7]). Let’s therefore focus on the other argument’s type.
Suppose that a user declared an operator+ overload like this:

    struct user_datatype;
    R operator+(std::string, user_datatype);
Then this overload will always be preferred to the ones that we are proposing (when passing an argument of type user_datatype). This works even if the type is implicitly convertible to std::string_view and therefore overload resolution does not exclude the overloads of the present proposal:

    struct convertible_to_string_view
    {
        /* implicit */ operator std::string_view() const;
    };

    R operator+(std::string, convertible_to_string_view); // pre-existing

    convertible_to_string_view ctsv;
    "hello"s + ctsv; // still calls the user-defined operator+, as it's a better match
Let’s furthermore consider a type convertible both to a user-defined datatype and to std::string_view. This could be, for instance, a type convertible to a pre-C++17 custom string view class which has also been "modernized" by adding a conversion to std::string_view:

    struct my_string_view; // pre-C++17, legacy
    std::string operator+(std::string, my_string_view);

    struct char_buffer
    {
        /* implicit */ operator my_string_view() const;    // legacy
        /* implicit */ operator std::string_view() const;  // modern
    };

    char_buffer buf;
    std::string result = "hello"s + buf; // OK
Although it may seem that the call to operator+ would now be ambiguous between operator+(std::string, my_string_view) and operator+(std::string, std::string_view), it actually is not ambiguous and even calls the pre-existing operator+ taking a my_string_view. The reason for this is that although both overloads are viable, the one taking a my_string_view as defined above is not a function template, while the one taking a std::string_view is actually a function template specialization; the former overload ranks better ([over.match.best.general]/2.4).
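The tie-break between a non-template function and a function template specialization can be reproduced with named functions. This is a sketch, not the user code from the example above: `demo::join` stands in for operator+, `legacy_result` is a hypothetical return type used only to make the selected overload observable, and `identity_t` is a local equivalent of std::type_identity_t.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Local equivalent of std::type_identity_t (assumption: C++17 toolchain).
template <class T> struct identity { using type = T; };
template <class T> using identity_t = typename identity<T>::type;

namespace demo {

// Hypothetical stand-in for the proposed operator+: a function template.
template <class C, class T, class A>
std::basic_string<C, T, A> join(const std::basic_string<C, T, A>& lhs,
                                identity_t<std::basic_string_view<C, T>> rhs) {
    std::basic_string<C, T, A> r = lhs;
    r.append(rhs);
    return r;
}

// Legacy user code: a non-template overload taking a custom view type.
struct my_string_view { const char* ptr; };
struct legacy_result { std::string value; };

inline legacy_result join(std::string lhs, my_string_view rhs) {
    lhs += rhs.ptr;
    return {std::move(lhs)};
}

// Convertible to both view types, like char_buffer in the text.
struct char_buffer {
    /* implicit */ operator my_string_view() const { return {"legacy"}; }
    /* implicit */ operator std::string_view() const { return "modern"; }
};

}  // namespace demo

// Both overloads are viable with equally good conversions, so the
// non-template function is preferred over the template specialization.
static_assert(std::is_same_v<
    decltype(demo::join(std::declval<std::string>(),
                        std::declval<demo::char_buffer>())),
    demo::legacy_result>);
```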
What if the user-defined operator+ is itself a function template?
For instance:

    template <typename Char>
    struct basic_my_string_view;
    using my_string_view = basic_my_string_view<char>;

    template <typename Char>
    std::basic_string<Char> operator+(std::basic_string<Char>, basic_my_string_view<Char>);

    struct char_buffer
    {
        /* implicit */ operator my_string_view() const;
        /* implicit */ operator std::string_view() const;
    };

    char_buffer buf;
    std::string result = "hello"s + buf; // was: ERROR; with the proposed changes: OK
The above code does not compile without the changes introduced by this paper, again because implicit conversions are not considered due to the deducible Char template parameter. With the changes introduced by this paper, the code now compiles, and the proposed operator+ taking a std::string_view is called; the pre-existing overload is still not viable. In other words: in this specific scenario the impact is positive.
What if user code employed some technique to enable implicit conversions
with
, for instance like this:
    template <typename Char>
    struct basic_my_string_view;
    using my_string_view = basic_my_string_view<char>;

    template <typename Char>
    std::basic_string<Char> operator+(std::basic_string<Char>,
                                      std::type_identity_t<basic_my_string_view<Char>>);

    struct char_buffer
    {
        /* implicit */ operator my_string_view() const;
        /* implicit */ operator std::string_view() const;
    };

    char_buffer buf;
    std::string result = "hello"s + buf; // was: OK; with the proposed changes: AMBIGUOUS
In this last scenario the call to operator+ becomes ambiguous with the proposed changes.
[SD-8] does not seem to offer guidance here: is it OK for the Standard Library to break code that is "too generous" in its implicit conversions? In any case, we are going to stay on the safe side and document this possible breakage in Annex C.
(Note: this documentation has been removed in R5 at LWG’s request. It is still available in R4.)
5. Implementation experience
A working prototype of the changes proposed by this paper, done on top of GCC 13.1, is available in this GCC branch on GitHub. The entire libstdc++ testsuite passes with the changes applied. A smoke test is included.
Will Hawkins has very kindly contributed an implementation in libc++.
6. Technical Specifications
All the proposed changes are relative to [N4950].
6.1. Feature testing macro
In [version.syn], change the value of __cpp_lib_string_view from 201803L to YYYYMML:

    #define __cpp_lib_string_view YYYYMML // also in <string>, <string_view>

with the value specified as usual (year and month of adoption of the present proposal).
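User code could rely on the bumped macro to select between the proposed operator and a fallback. The following is a sketch only: `prepend` is a hypothetical function, and `PROPOSED_STRING_VIEW_VALUE` is a placeholder introduced here because the final value is left as YYYYMML above; it is deliberately set to a number that no current implementation reports, so the fallback branch is the one that gets compiled.

```cpp
#include <cassert>
#include <string>
#include <string_view>

#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>  // feature-test macros (C++20)
#  endif
#endif

// Placeholder (assumption): to be replaced by the value adopted for
// __cpp_lib_string_view once it is known.
#ifndef PROPOSED_STRING_VIEW_VALUE
#  define PROPOSED_STRING_VIEW_VALUE 999999L
#endif

inline std::string prepend(std::string_view prefix, const std::string& s) {
#if defined(__cpp_lib_string_view) && \
    __cpp_lib_string_view >= PROPOSED_STRING_VIEW_VALUE
    return prefix + s;  // the operator+ proposed by this paper
#else
    // Fallback for libraries without the proposed overloads.
    std::string r;
    r.reserve(prefix.size() + s.size());
    r.append(prefix);
    r.append(s);
    return r;
#endif
}
```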
6.2. Proposed wording
Modify [string.syn] as shown:
    namespace std {
      [...]

      template<class charT, class traits, class Allocator>
        constexpr basic_string<charT, traits, Allocator>
          operator+(basic_string<charT, traits, Allocator>&& lhs,
                    charT rhs);

      template<class charT, class traits, class Allocator>
        constexpr basic_string<charT, traits, Allocator>
          operator+(const basic_string<charT, traits, Allocator>& lhs,
                    type_identity_t<basic_string_view<charT, traits>> rhs);
      template<class charT, class traits, class Allocator>
        constexpr basic_string<charT, traits, Allocator>
          operator+(basic_string<charT, traits, Allocator>&& lhs,
                    type_identity_t<basic_string_view<charT, traits>> rhs);
      template<class charT, class traits, class Allocator>
        constexpr basic_string<charT, traits, Allocator>
          operator+(type_identity_t<basic_string_view<charT, traits>> lhs,
                    const basic_string<charT, traits, Allocator>& rhs);
      template<class charT, class traits, class Allocator>
        constexpr basic_string<charT, traits, Allocator>
          operator+(type_identity_t<basic_string_view<charT, traits>> lhs,
                    basic_string<charT, traits, Allocator>&& rhs);

      template<class charT, class traits, class Allocator>
        constexpr bool
          operator==(const basic_string<charT, traits, Allocator>& lhs,
                     const basic_string<charT, traits, Allocator>& rhs) noexcept;
Append the following content at the end of [string.op.plus]:
    template<class charT, class traits, class Allocator>
      constexpr basic_string<charT, traits, Allocator>
        operator+(const basic_string<charT, traits, Allocator>& lhs,
                  type_identity_t<basic_string_view<charT, traits>> rhs);

Effects: Equivalent to:

    basic_string<charT, traits, Allocator> r = lhs;
    r.append(rhs);
    return r;

    template<class charT, class traits, class Allocator>
      constexpr basic_string<charT, traits, Allocator>
        operator+(basic_string<charT, traits, Allocator>&& lhs,
                  type_identity_t<basic_string_view<charT, traits>> rhs);

Effects: Equivalent to:

    lhs.append(rhs);
    return std::move(lhs);

    template<class charT, class traits, class Allocator>
      constexpr basic_string<charT, traits, Allocator>
        operator+(type_identity_t<basic_string_view<charT, traits>> lhs,
                  const basic_string<charT, traits, Allocator>& rhs);

Effects: Equivalent to:

    basic_string<charT, traits, Allocator> r = rhs;
    r.insert(0, lhs);
    return r;

    template<class charT, class traits, class Allocator>
      constexpr basic_string<charT, traits, Allocator>
        operator+(type_identity_t<basic_string_view<charT, traits>> lhs,
                  basic_string<charT, traits, Allocator>&& rhs);

Effects: Equivalent to:

    rhs.insert(0, lhs);
    return std::move(rhs);
[Note: The usage of type_identity_t as parameter ensures that an object of type basic_string<charT, traits, Allocator> can always be concatenated to an object of a type T with an implicit conversion to basic_string_view<charT, traits>, and vice versa, as per [over.match.oper]. -- end note]
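The four "Effects: Equivalent to" clauses can be exercised today by transcribing them into named function templates. This is a sketch of the wording, not an implementation of the proposal: `sketch::concat` is a hypothetical name used in place of operator+ (so that it cannot clash with overloads a Standard Library may already ship), and `identity_t` is a local equivalent of std::type_identity_t.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Local equivalent of std::type_identity_t (assumption: C++17 toolchain).
template <class T> struct identity { using type = T; };
template <class T> using identity_t = typename identity<T>::type;

namespace sketch {

// string lvalue + view: copy, then append.
template <class C, class T, class A>
std::basic_string<C, T, A> concat(const std::basic_string<C, T, A>& lhs,
                                  identity_t<std::basic_string_view<C, T>> rhs) {
    std::basic_string<C, T, A> r = lhs;
    r.append(rhs);
    return r;
}

// string rvalue + view: append in place, then move out.
template <class C, class T, class A>
std::basic_string<C, T, A> concat(std::basic_string<C, T, A>&& lhs,
                                  identity_t<std::basic_string_view<C, T>> rhs) {
    lhs.append(rhs);
    return std::move(lhs);
}

// view + string lvalue: copy, then insert at the front.
template <class C, class T, class A>
std::basic_string<C, T, A> concat(identity_t<std::basic_string_view<C, T>> lhs,
                                  const std::basic_string<C, T, A>& rhs) {
    std::basic_string<C, T, A> r = rhs;
    r.insert(0, lhs);
    return r;
}

// view + string rvalue: insert in place, then move out.
template <class C, class T, class A>
std::basic_string<C, T, A> concat(identity_t<std::basic_string_view<C, T>> lhs,
                                  std::basic_string<C, T, A>&& rhs) {
    rhs.insert(0, lhs);
    return std::move(rhs);
}

}  // namespace sketch
```

In each overload the template parameters are deduced from the string argument only, which is what lets the other argument be a string view or anything implicitly convertible to one.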
7. Acknowledgements
Thanks to KDAB for supporting this work.
Thanks to Will Hawkins for the discussions and the prototype implementation in libc++.
All remaining errors are ours and ours only.