1. Changelog
- R3
  - Reading improvements.
- R2
  - Changed the signatures of the proposed operators to take precisely string views (instead of anything convertible to string views).
  - Made the proposed operators hidden friends, following a LEWG vote in a telecon.
  - Added a discussion about the implications of such a change.
  - Added an Annex C entry.
- R1
  - Minor reading improvements.
- R0
  - First submission.
2. Motivation and Scope
The Standard is currently lacking support for concatenating strings and string views by means of operator+:
    std::string calculate(std::string_view prefix)
    {
        return prefix + get_string(); // ERROR
    }
This constitutes a major asymmetry with the rest of basic_string's API related to string concatenation, which already supports the corresponding view classes.
In general, this makes concatenating strings and string views needlessly awkward:
    std::string str;
    std::string_view view;
    // Appending
    str + view;              // ERROR
    str + std::string(view); // OK, but inefficient
    str + view.data();       // Compiles, but BUG!
    std::string copy = str;
    copy += view;            // OK, but tedious to write (requires explicit copy)
    copy.append(view);       // OK, ditto
    // Prepending
    view + str;              // ERROR
    std::string copy = str;
    copy.insert(0, view);    // OK, but tedious and inefficient
Similarly, the current situation is asymmetric when considering concatenation against raw pointers:
    std::string str;
    str + "hello";   // OK
    str + "hello"sv; // ERROR
    "hello" + str;   // OK
    "hello"sv + str; // ERROR
All of this is just bad ergonomics; the lack of operator+ is extremely surprising for end-users (cf. this StackOverflow question), and harms the teachability and usability of string_view in lieu of raw pointers.
Now, as shown above, there are workarounds available, either in terms of named functions (append, insert, ...) or explicit conversions.
However, it is hard to steer users away from the convenience syntax (which is ultimately the point of using operator+ in the first place). The availability of the other overloads of operator+ opens the door to bad code; for instance, it risks neglecting the value of view classes:
    std::string prepend(std::string_view prefix)
    {
        return std::string(prefix) + get_string(); // inefficient
    }
And it may even open the door to (subtle) bugs:
    std::string result1 = str + view;              // ERROR. <Sigh>, ok, let me rewrite as...
    std::string result2 = str + std::string(view); // OK, but this is inefficient. How about...
    std::string result3 = str + view.data();       // Compiles; but BUG!
The last line exhibits undefined behavior if view is not NUL terminated, and produces a different (truncated) result if view has embedded NULs.
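The embedded-NUL difference can be reproduced with a small sketch. The function names below are hypothetical and exist only for illustration; the view is deliberately taken over a NUL-terminated literal so that the data() variant stays well-defined.

```cpp
#include <cassert>
#include <string>
#include <string_view>

using namespace std::literals;

// Explicit conversion: copies all view.size() characters, embedded NULs included.
std::string concat_via_string(const std::string& str, std::string_view view)
{
    return str + std::string(view);
}

// The tempting shortcut: operator+(const std::string&, const char*) stops at the
// first NUL. This is only well-defined when the characters behind view.data()
// are NUL terminated; a view into the middle of a buffer would not qualify.
std::string concat_via_data(const std::string& str, std::string_view view)
{
    return str + view.data();
}
```

With the view "ab\0cd"sv (five characters), the first function yields a six-character string, while the second silently yields "xab" when prepending "x".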
This paper proposes to fix these API flaws by adding suitable operator+ overloads between string and string view classes. The changes required for such operators are straightforward and should pose no burden on implementations.
2.1. Why are those overloads missing in the first place?
[N3685] ("string_view: a non-owning reference to a string, revision 4") offers the reason:
"I also omitted operator+(basic_string, basic_string_view) because LLVM returns a lightweight object from this overload and only performs the concatenation lazily. If we define this overload, we’ll have a hard time introducing that lightweight concatenation later."
Subsequent revisions of the paper no longer have this paragraph.
There are a couple of considerations that we think are important here.
- string_view was approved for C++17 in Jacksonville (February 2016). At the time of this writing, such a "string builder" facility has not been proposed for standardization (as far as we know). Withholding a completely reasonable feature from users (concatenation via operator+) for so long, in the name of a yet-unseen future "major" feature, is a disservice to them.
- We strongly feel that overloading operator+ is completely outside of the design space for a string builder class. There is absolutely no reason why str + "hello"sv should use the builder but str + "hello" should not, not to mention cases like strA + strB + strC. One cannot, however, change the semantics of the existing operator+ overloads without breaking API/ABI compatibility. In Qt, [QStringBuilder] uses operator% by default; blindly replacing operator+ with operator% when concatenating strings comes with its own share of problems (not only is it API-incompatible, it also causes dangling references in a number of scenarios).
In short: we do not see any reason to further withhold the proposed additions.
3. Impact On The Standard
This proposal is a pure library extension.
This proposal does not depend on any other library extensions.
This proposal does not require any changes in the core language.
4. Design Decisions
4.1. Minimizing the number of allocations
The proposed wording builds on and reuses the existing wording for the operator+ overloads. In particular, no attempt has been made to e.g. minimize memory allocations (by allocating only one buffer of suitable size, then concatenating in that buffer). Implementations already employ such mechanisms internally, and we would expect them to do the same for the new overloads (for instance, see here for libstdc++ and here for libc++).
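As an illustration of the "one buffer of suitable size" strategy mentioned above, here is a minimal sketch. The helper name concat_reserved is hypothetical; it is neither part of the proposed wording nor the code used by any particular implementation.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: concatenates two views with at most one allocation.
std::string concat_reserved(std::string_view lhs, std::string_view rhs)
{
    std::string result;
    result.reserve(lhs.size() + rhs.size()); // one buffer of suitable size
    result.append(lhs);                      // fits in the reserved capacity
    result.append(rhs);                      // ditto, no reallocation needed
    return result;
}
```

By contrast, the "Equivalent to" formulation (copy one operand, then append the other) may allocate once for the copy and again when the append outgrows the copy's capacity.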
4.2. Should the proposed operators be hidden friends? Should they be function templates?
There are several ways to define the proposed overloads.
4.2.1. Approach 1: free non-friend function templates, taking exactly a string view
The signature would look like this:
    template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
    class basic_string {
        // [...]
    };
    template <class charT, class traits, class Allocator>
    constexpr basic_string<charT, traits, Allocator>
    operator+(const basic_string<charT, traits, Allocator>& lhs,
              basic_string_view<charT, traits> rhs);
    // Repeat for the other overloads with swapped arguments, rvalues, etc.
This approach closely follows the pre-existing overloads for operator+. In particular, here the newly added operators are not hidden friends (which may increase compilation times, give worse compile errors, etc.).
Still: just like hidden friends, it is not possible to use these operators with datatypes implicitly convertible to basic_string / basic_string_view specializations:
    class convertible_to_string {
    public:
        /* implicit */ operator std::string() const;
    };
    convertible_to_string cts;
    cts + "hello"s;  // ERROR (pre-existing)
    cts + "hello"sv; // ERROR
The error stems from the fact that the existing (and the proposed) operator+ overloads are function templates, and implicit conversions are not possible given the signatures of these functions: all the parameter types of operator+ contain a template-parameter that needs to be deduced, in which case implicit conversions are not considered (this is [temp.arg.explicit/7]).
While the lack of support for types implicitly convertible to strings may be desirable (for symmetry), the lack of support for types implicitly convertible to string views is questionable. String view operations explicitly support objects of types convertible to them. For instance:
    std::string s;
    convertible_to_string cts;
    s == cts;   // ERROR
    std::string_view sv;
    convertible_to_string_view ctsv;
    sv == ctsv; // OK; [string.view.comparison/1]
The above definition of the overloads would prevent types convertible to string views from being appended/prepended to strings, again because the implicit conversion towards the string view type would be prevented. This would even be inconsistent with the existing member functions of strings:
    std::string s;
    convertible_to_string_view ctsv;
    s.append(ctsv); // OK, [string.append/3]
    s + ctsv;       // ERROR, ???
Finally, overloads added as non-member/non-friend function templates are not viable when using something like std::reference_wrapper:
    std::reference_wrapper<std::string> rs(~~~);
    std::reference_wrapper<std::string_view> rsv(~~~);
    rs + rs;  // ERROR (pre-existing)
    rs + rsv; // ERROR
This is because an argument of type e.g. std::reference_wrapper<std::string> (i.e. std::reference_wrapper<std::basic_string<char, std::char_traits<char>, std::allocator<char>>>) can never match against a parameter of type basic_string<charT, traits, Allocator>.
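The limitations of this approach can be reproduced without touching the Standard Library. The sketch below uses toy stand-ins: box, convertible_to_box and the can_add detection trait are all hypothetical names, with box playing the role of basic_string and its operator+ declared as a free function template.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Toy stand-in for basic_string (hypothetical).
template <class T>
struct box { T value; };

// Approach-1 style: a free function template; T must be deduced from BOTH arguments.
template <class T>
box<T> operator+(const box<T>& lhs, const box<T>& rhs)
{
    return box<T>{lhs.value + rhs.value};
}

struct convertible_to_box {
    operator box<int>() const { return box<int>{40}; }
};

// Detects whether `a + b` is well-formed for the given operand types.
template <class A, class B, class = void>
struct can_add : std::false_type {};
template <class A, class B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

static_assert(can_add<box<int>, box<int>>::value,
              "exact matches deduce T without trouble");
static_assert(!can_add<convertible_to_box, box<int>>::value,
              "deduction fails before the implicit conversion is ever considered");
static_assert(!can_add<std::reference_wrapper<box<int>>, box<int>>::value,
              "reference_wrapper<box<int>> does not match box<T> either");

// Workaround: convert explicitly, so that deduction sees a box<int>.
int add_after_explicit_conversion()
{
    return (static_cast<box<int>>(convertible_to_box{}) + box<int>{2}).value;
}
```

The explicit static_cast plays the same role as writing std::string(view) in the motivating examples: it works, but it is exactly the kind of ceremony the convenience syntax is meant to avoid.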
4.2.2. Approach 2: free non-friend function templates, taking anything convertible to a string view
This is similar to approach 1, except that the string view argument would also accept any type which is convertible to a string view. The precedent for this would be the existing functions for concatenating/inserting strings (e.g. append, insert, operator+=), all of which take a parameter of any type convertible to a string view; as well as the comparison operators for string views, where "[...] implementations shall provide sufficient additional overloads [...] so that an object t with an implicit conversion to S can be compared" ([string.view.comparison/1]).
Therefore, the proposed signatures would look like this:
    template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
    class basic_string {
        // [...]
    };
    template <class charT, class traits, class Allocator>
    constexpr basic_string<charT, traits, Allocator>
    operator+(const basic_string<charT, traits, Allocator>& lhs,
              basic_string_view<charT, traits> rhs);
    template <class charT, class traits, class Allocator>
    constexpr basic_string<charT, traits, Allocator>
    operator+(const basic_string<charT, traits, Allocator>& lhs,
              type_identity_t<basic_string_view<charT, traits>> rhs);
    // Repeat for the other overloads with swapped arguments, rvalues, etc.
Note: this would not be the actual proposed wording. We could (and, in fact, would) handwave the actual overload set by using the "sufficient additional overloads" wording. An implementation could therefore choose to use another implementation strategy, such as SFINAE, constraints, and so on.
Apart from allowing strings to be concatenated with objects of types convertible to string views, this approach still forbids the usage of types convertible to strings, as well as types such as std::reference_wrapper:
    std::string s;
    std::string_view sv;
    convertible_to_string cts;
    convertible_to_string_view ctsv;
    s + sv;   // OK
    s + ctsv; // OK
    s + cts;  // ERROR
    cts + sv; // ERROR
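The effect of wrapping a parameter in type_identity_t can be tried out on a toy type. In this sketch toy_string is a hypothetical stand-in for basic_string, and a local identity_t alias replaces std::type_identity_t so that the code also builds as C++17; only the type_identity-style overload is shown.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Local stand-in for C++20's std::type_identity_t.
template <class T> struct identity { using type = T; };
template <class T> using identity_t = typename identity<T>::type;

// Toy stand-in for basic_string (hypothetical; NOT the real class).
template <class CharT>
struct toy_string { std::basic_string<CharT> text; };

// CharT is deduced from lhs only; rhs is a non-deduced context, so the usual
// implicit conversions to basic_string_view<CharT> apply to it.
template <class CharT>
toy_string<CharT> operator+(const toy_string<CharT>& lhs,
                            identity_t<std::basic_string_view<CharT>> rhs)
{
    toy_string<CharT> result{lhs.text};
    result.text.append(rhs);
    return result;
}

struct convertible_to_string_view {
    operator std::string_view() const { return "world"; }
};

std::string greet()
{
    toy_string<char> s{"hello "};
    convertible_to_string_view ctsv;
    return (s + ctsv).text; // OK: ctsv converts to std::string_view
}
```

Because the first parameter still takes part in deduction, an object merely convertible to toy_string<char> would not be accepted on the left-hand side, which matches the ERROR lines in the listing above.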
4.2.3. Approach 3: hidden friends, non-template functions
Basically, this would be an application of the Barton–Nackman idiom in combination with hidden friends ([hidden.friends]).
The proposed operators would look like this:
    template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
    class basic_string {
        // [...]
        constexpr friend basic_string operator+(const basic_string& lhs,
                                                basic_string_view<charT, traits>) { /* hidden friend */ }
        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };
In such an approach, one of the arguments must still be a string object, otherwise the overload is not even added to the overload set (hidden friend).
The other argument can be any object implicitly convertible to a string view. Since the overload is not a function template, implicit conversions here "kick in" and work as expected, without the need of adding additional overloads (or declaring the operators as function templates):
    std::string s;
    convertible_to_string_view ctsv;
    s + ctsv; // OK
There is a perhaps surprising side-effect, however: defining this overload set would also allow concatenation between a string and an object convertible to a string. For instance:
    std::string s;
    convertible_to_string cts;
    s == cts; // ERROR
    s + cts;  // OK (!)
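In the last line, the lhs of type std::string makes the various operator+ overloads visible to lookup. Then, the operator+(std::string_view, std::string&&) overload is selected, converting the lhs from std::string to std::string_view and the rhs from convertible_to_string to an rvalue std::string.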
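This side-effect can be observed on a toy class that mimics the relevant parts of basic_string. Everything named toy_string or convertible_to_toy_string below is hypothetical, and only the two lvalue overloads are modelled, so the overload picked here takes a const reference rather than an rvalue reference.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Toy stand-in for basic_string (hypothetical; NOT the real class).
template <class CharT>
class toy_string {
public:
    explicit toy_string(std::basic_string_view<CharT> sv) : text_(sv) {}

    // Like basic_string, a toy_string converts implicitly to a view of itself.
    operator std::basic_string_view<CharT>() const noexcept { return text_; }

    // Approach 3: non-template hidden friends, visible to ADL only.
    friend toy_string operator+(const toy_string& lhs, std::basic_string_view<CharT> rhs)
    {
        toy_string result = lhs;
        result.text_.append(rhs);
        return result;
    }
    friend toy_string operator+(std::basic_string_view<CharT> lhs, const toy_string& rhs)
    {
        toy_string result = rhs;
        result.text_.insert(0, lhs);
        return result;
    }

private:
    std::basic_string<CharT> text_;
};

struct convertible_to_toy_string {
    operator toy_string<char>() const { return toy_string<char>("world"); }
};

std::string surprising_concatenation()
{
    toy_string<char> s("hello ");
    convertible_to_toy_string cts;
    // Picks operator+(view, const toy_string&): s converts to a view and cts
    // converts to a toy_string. The (toy_string, view) overload is not viable,
    // as cts would need two user-defined conversions to become a view.
    toy_string<char> result = s + cts;
    return std::string(std::string_view(result));
}
```

The operands keep their order in the result; what is surprising is only which overload does the work.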
Finally, using types such as std::reference_wrapper would work transparently:
    std::reference_wrapper<std::string> rs(~~~);
    rs + "hello"sv; // OK
In this example, ADL would add the hidden friend operators to the overload set (cf. [basic.lookup.argdep/3.2]), operators which again are non-template functions. Then, the operator+(const std::string&, std::string_view) overload is selected, since the argument of type std::reference_wrapper<std::string> converts implicitly to the type of the first parameter.
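The same lookup behaviour can be checked with a toy type, independently of whether the Standard Library at hand already ships the proposed operators. In this sketch toy_string is a hypothetical stand-in for basic_string.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Toy stand-in for basic_string (hypothetical; NOT the real class).
template <class CharT>
struct toy_string {
    std::basic_string<CharT> text;

    // Non-template hidden friend, as in approach 3.
    friend toy_string operator+(const toy_string& lhs, std::basic_string_view<CharT> rhs)
    {
        toy_string result = lhs;
        result.text.append(rhs);
        return result;
    }
};

std::string concat_through_reference_wrapper()
{
    using namespace std::literals;
    toy_string<char> s{"hello "};
    std::reference_wrapper<toy_string<char>> rs(s);
    // toy_string<char> is a template argument of rs's type, so ADL also looks
    // into it and finds the friend; rs then converts to toy_string<char>&.
    return (rs + "world"sv).text;
}
```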
4.2.4. Approach 4: hidden friends, function templates, taking anything convertible to a string view
This approach is similar to approach 2, but makes the proposed operators hidden friends.
The proposed operators would in principle look like this:
    template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
    class basic_string {
        // [...]
        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
        operator+(const basic_string<C, T, A>& lhs,
                  basic_string_view<C, T>) { /* hidden friend */ }
        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
        operator+(const basic_string<C, T, A>& lhs,
                  type_identity_t<basic_string_view<C, T>>) { /* hidden friend */ }
        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };
In practice the above does not work, as it leads to redefinition errors if basic_string is instantiated with different template parameters (which is of course the case). Note that, in order for the operators to be hidden friends, their definition must be present in basic_string's class body; multiple instantiations of basic_string would therefore redefine the same function template multiple times.
Hence, an actual implementation has to employ some tricks, such as isolating the operators in a non-template base class:
    template <class charT, class traits, class Allocator>
    class basic_string;
    class __basic_string_base // exposition-only
    {
        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
        operator+(const basic_string<C, T, A>& lhs,
                  basic_string_view<C, T>) { /* hidden friend */ }
        template <class C, class T, class A>
        constexpr friend basic_string<C, T, A>
        operator+(const basic_string<C, T, A>& lhs,
                  type_identity_t<basic_string_view<C, T>>) { /* hidden friend */ }
        // Repeat for the other overloads with swapped arguments, rvalues, etc.
    };
    template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
    class basic_string : __basic_string_base {
        // [...]
    };
This approach brings the same semantics as approach 2, with the exception that the operators are not found through ordinary unqualified/qualified lookup (because they are hidden friends).
It is still not possible to call these operators using an argument of a type convertible to string, nor to call them through std::reference_wrapper.
4.2.5. Summary
| Works between...? | Approach 1 | Approach 2 | Approach 3 | Approach 4 |
|---|---|---|---|---|
| Strings and string views | ✔ | ✔ | ✔ | ✔ |
| Strings and types convertible to string views | ✘ | ✔ | ✔ | ✔ |
| Strings and types convertible to strings | ✘ | ✘ | ✔ | ✘ |
| Types convertible to strings | ✘ | ✘ | ✘ | ✘ |
| std::reference_wrapper of strings and string views | ✘ | ✘ | ✔ | ✘ |
4.2.6. Which strategy should be used?
The R1 revision of this paper implemented approach 2, for symmetry with the pre-existing overloads of operator+ between strings.
During the 2022-08-16 LEWG telecon, a poll indicated weak consensus (2/5/4/2/0) for making the proposed operators hidden friends, even at the cost of making them inconsistent with the existing overloads.
This still leaves a choice between two approaches (non-template vs. template functions). Given that modern code should use the non-template approach, we are going to use approach 3.
4.3. Backwards compatibility and Annex C
Library Evolution has requested a note to be added to Annex C in case the proposed operators break backwards compatibility.
If users define an operator+ overload between classes from the Standard Library (in a namespace other than std), and then the Standard Library starts providing such an overload and user code stops compiling (due to redefinitions, ambiguities, etc.), does this constitute a source-incompatible change?
[SD-8] is not particularly explicit on the subject of adding new overloads for operators, although it does state that:
Primarily, the standard reserves the right to:
[...]
Add new names to any entity within any reserved namespace, including but not limited to:
Functions (this includes new member functions and overloads to existing functions)
Operators are functions, but they’re also a particular class of them, as they are practically never called using an explicit function-call expression. Instead, any ordinary code relies on the special rules of overload resolution for operators ([over.match.oper]).
The question here is therefore whether the Standard Library is simply allowed to alter the overload set available to operators, when they are used on objects of datatypes defined in the library itself. It is easy to argue that, if both arguments to an operator overload are library datatypes, then the library reserves the right to add such an overload without worrying about any possible breakage. Implicit conversions and ADL, however, make the situation slightly more complex.
We can construct an example as follows. The proposed operator+ overloads are all hidden friends of basic_string, therefore one of the arguments must be an object of a basic_string specialization. Let’s therefore focus on the other argument’s type.
Suppose that a user declared an operator+ overload like this:
    struct user_datatype;
    R operator+(std::string, user_datatype);
Then this overload will always be preferred to the ones that we are proposing (when passing an argument of type user_datatype). This works even if the type is implicitly convertible to std::string_view and therefore overload resolution does not exclude the overloads of the present proposal:
    struct convertible_to_string_view {
        /* implicit */ operator std::string_view() const;
    };
    R operator+(std::string, convertible_to_string_view);
    convertible_to_string_view ctsv;
    "hello"s + ctsv; // calls the user-defined operator+, as it’s a better match
However, let’s consider a further type convertible to both a user-defined datatype and std::string_view:
    struct user_datatype {};
    R operator+(std::string, user_datatype);
    struct multi_conv {
        /* implicit */ operator user_datatype() const;
        /* implicit */ operator std::string_view() const;
    };
    multi_conv mc;
    "hello"s + mc; // ERROR
With the present proposal, the last line stops compiling, because the newly added overloads make the call ambiguous.
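The ambiguity can be demonstrated without depending on which Standard Library version is installed, by modelling "before" and "after" with two toy types. In this sketch the namespaces lib_before and lib_after, their str types and the can_add trait are all hypothetical; user_datatype and multi_conv follow the snippet above.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Two toy "library" string types: without and with a hidden friend operator+.
namespace lib_before {
struct str { std::string text; };
}
namespace lib_after {
struct str {
    std::string text;
    friend str operator+(const str& lhs, std::string_view rhs)
    {
        return str{lhs.text + std::string(rhs)};
    }
};
}

struct user_datatype {};
struct multi_conv {
    operator user_datatype() const { return {}; }
    operator std::string_view() const { return "view"; }
};

// User-provided overloads, one per toy library type.
int operator+(lib_before::str, user_datatype) { return 1; }
int operator+(lib_after::str, user_datatype) { return 2; }

// Detects whether `a + b` is well-formed for the given operand types.
template <class A, class B, class = void>
struct can_add : std::false_type {};
template <class A, class B>
struct can_add<A, B, std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

static_assert(can_add<lib_before::str, multi_conv>::value,
              "only the user's overload is viable");
static_assert(!can_add<lib_after::str, multi_conv>::value,
              "user overload and hidden friend convert mc differently: ambiguous");
static_assert(can_add<lib_after::str, user_datatype>::value,
              "a plain user_datatype argument is unaffected");

int call_before()
{
    return lib_before::str{"hello"} + multi_conv{};
}
```

Both candidates need exactly one user-defined conversion for the multi_conv argument, through different conversion functions, so neither is a better match than the other.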
The code shown in this very last snippet is not completely unreasonable. For instance, using better names for the various types involved, a user might have something like:
    struct my_string_view; // pre-C++17, legacy
    std::string operator+(std::string, my_string_view);
    struct char_buffer {
        /* implicit */ operator my_string_view() const;   // legacy
        /* implicit */ operator std::string_view() const; // modern
    };
    char_buffer buf;
    std::string result = "hello"s + buf;
[SD-8] does not seem to offer guidance here: is it OK for the Standard Library to break code that is "too generous" in its implicit conversions? In any case, we are going to stay on the safe side, and document this possible breakage in Annex C.
5. Implementation experience
A working prototype of the changes proposed by this paper, done on top of GCC 12.1, is available in this GCC branch on GitHub. The entire libstdc++ testsuite passes with the changes applied. A smoke test is included.
Will Hawkins has very kindly contributed an implementation in libc++.
6. Technical Specifications
All the proposed changes are relative to [N4910].
6.1. Feature testing macro
In [version.syn], modify the value of __cpp_lib_string_view, replacing 201803L with YYYYMML:
    #define __cpp_lib_string_view YYYYMML // also in <string>, <string_view>
with the value specified as usual (year and month of adoption of the present proposal).
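User code could rely on the bumped macro to select the new operators while keeping a fallback for older libraries. A minimal sketch follows; the function name join and the macro ASSUMED_CONCAT_MACRO_VALUE are hypothetical, and the threshold used is an assumption of this sketch, since this paper leaves the actual value as YYYYMML.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// ASSUMPTION: placeholder threshold for this sketch only. Replace it with the
// value eventually assigned to __cpp_lib_string_view for this feature.
#define ASSUMED_CONCAT_MACRO_VALUE 202403L

std::string join(const std::string& lhs, std::string_view rhs)
{
#if defined(__cpp_lib_string_view) && __cpp_lib_string_view >= ASSUMED_CONCAT_MACRO_VALUE
    return lhs + rhs;          // proposed operator+ is available
#else
    std::string result = lhs;  // fallback for older libraries
    result.append(rhs);
    return result;
#endif
}
```

Both branches produce the same string, so callers of join are unaffected by which one gets compiled.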
6.2. Proposed wording
Modify [basic.string.general] as shown:
    namespace std {
      template <class charT, class traits = char_traits<charT>, class Allocator = allocator<charT>>
      class basic_string {
        [...]
        // [string.ops], string operations
        [...]
        constexpr bool contains(const charT* x) const;
        // [strings.op.plus.string_view], concatenation of strings and string views
        constexpr friend basic_string operator+(const basic_string& lhs, basic_string_view<charT, traits> rhs);
        constexpr friend basic_string operator+(basic_string&& lhs, basic_string_view<charT, traits> rhs);
        constexpr friend basic_string operator+(basic_string_view<charT, traits> lhs, const basic_string& rhs);
        constexpr friend basic_string operator+(basic_string_view<charT, traits> lhs, basic_string&& rhs);
      };
    }
Add a new subclause after [string.ops] with the following content:
? Concatenation of strings and string views [strings.op.plus.string_view]
    constexpr friend basic_string operator+(const basic_string& lhs, basic_string_view<charT, traits> rhs);
Effects: Equivalent to:
    basic_string r = lhs;
    r.append(rhs);
    return r;
Remarks: This function is to be found via argument-dependent lookup only.
    constexpr friend basic_string operator+(basic_string&& lhs, basic_string_view<charT, traits> rhs);
Effects: Equivalent to:
    lhs.append(rhs);
    return std::move(lhs);
Remarks: This function is to be found via argument-dependent lookup only.
    constexpr friend basic_string operator+(basic_string_view<charT, traits> lhs, const basic_string& rhs);
Effects: Equivalent to:
    basic_string r = rhs;
    r.insert(0, lhs);
    return r;
Remarks: This function is to be found via argument-dependent lookup only.
    constexpr friend basic_string operator+(basic_string_view<charT, traits> lhs, basic_string&& rhs);
Effects: Equivalent to:
    rhs.insert(0, lhs);
    return std::move(rhs);
Remarks: This function is to be found via argument-dependent lookup only.
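The four Effects clauses can be exercised today by transcribing them into named functions. The overload set below is called concat purely for illustration (a hypothetical name), and it uses std::string and std::string_view directly instead of the basic_string templates.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical named functions mirroring the four "Effects: Equivalent to" clauses.
std::string concat(const std::string& lhs, std::string_view rhs)
{
    std::string r = lhs; // copy, then append
    r.append(rhs);
    return r;
}
std::string concat(std::string&& lhs, std::string_view rhs)
{
    lhs.append(rhs);     // reuses the buffer of the expiring string
    return std::move(lhs);
}
std::string concat(std::string_view lhs, const std::string& rhs)
{
    std::string r = rhs; // copy, then prepend
    r.insert(0, lhs);
    return r;
}
std::string concat(std::string_view lhs, std::string&& rhs)
{
    rhs.insert(0, lhs);  // reuses the buffer of the expiring string
    return std::move(rhs);
}
```

The rvalue overloads are what allow a chain of concatenations to keep growing a single temporary instead of copying it at every step.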
In [diff], add a new subclause, tentatively named [diff.cpp26.strings].
Note to the editor: such a subclause should be under [diff.cpp26], which by the time this proposal is adopted, may or may not exist yet. The naming and contents of the parent [diff.cpp26] subclause should match the existing ones (e.g. [diff.cpp20]), of course adapted to C++26.
C.?.? [strings]: strings library [diff.cpp26.strings]
(1) Affected subclause: [string.classes]
Change: Additional overloads of operator+ between basic_string specializations and types convertible to basic_string_view specializations have been added.
Rationale: Make operator+ consistent with the existing overloads.
Effect on original feature: Valid C++23 code may fail to compile in this revision of C++. For instance:
    struct my_string_view;
    std::string operator+(std::string, my_string_view);
    struct char_buffer {
        operator my_string_view() const;
        operator std::string_view() const;
    };
    int main() {
        char_buffer buf;
        std::string result = std::string("hello") + buf; // ill-formed (ambiguous); previously well-formed
    }
7. Acknowledgements
Thanks to KDAB for supporting this work.
Thanks to Will Hawkins for the discussions and the prototype implementation in libc++.
All remaining errors are ours and ours only.