Document number: P2438R1
Date: 2021-11-30
Project: Programming Language C++, Library Evolution Working Group
Reply-to:
1. Before/After table
without this proposal | with this proposal
Value of | Value of
A temporary | A temporary
Value of | As
2. Revision history
2.1. Revision 1
- Corrected target audience (LEWG instead of LWG).
- Included implementation experience section.
- Discussion on change to existing const overload in light of P1787R6: Declarations and where to find them.
- Expanded section of substr with user-supplied allocator.
- Rebased wording on N4902.
3. Motivation
Since C++11 the C++ language supports move semantics. All classes where it made sense were updated with move constructors and move assignment operators. This made it possible to take advantage of rvalues and "steal" resources, thus avoiding, for example, unnecessary costly copies.
Some classes that came in later revisions of the language also take advantage of move semantics for member functions, like std::optional::value and std::optional::value_or.
In the case of std::string::substr(), it is possible to take advantage of move semantics too.
Consider the following two code snippets:
// example 1
benchmark = std::string(argv[i]).substr(12);
// example 2
name_ = obs.stringValue().substr(0,32);
In the first example, argv[i] is copied into a temporary string, then substr creates a new object.
In this case one could use string_view to avoid the unnecessary copy, but changing already working code has a cost too.
In the second example, if stringValue() returns a std::string by value, the user of that API cannot use a string_view to avoid an unnecessary copy, as in the first case.
If std::string had an overload for substr() &&, in both cases the standard library could avoid unnecessary work and, instead of copying the data, "steal" it.
It is true that adding a new overload increases the already extremely high number of member functions of std::string.
On the other hand, most users do not need to know of its existence to take advantage of the provided optimization.
Thus this paper does not extend the API surface: there are no names or behavior to be learned by users, and we just get an extension that follows established language convention.
For users aware of the overload, they can move a string in order to "steal" its storage in a natural way:
std::string foo = ...;
std::string bar = std::move(foo).substr(...);
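As a rough illustration of the intent, the effect of the proposed overload can be approximated today with a free function. This is only a sketch: the name substr_rvalue is hypothetical and not part of the proposal, and whether the original buffer is actually reused depends on the implementation (for example, short strings may live in the SSO buffer).

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of the proposal): approximates what
// std::move(s).substr(pos, n) could do by trimming s in place and then
// move-constructing the result, so no second buffer has to be allocated.
std::string substr_rvalue(std::string&& s, std::size_t pos = 0,
                          std::size_t n = std::string::npos) {
    if (pos > s.size())
        throw std::out_of_range("substr_rvalue: pos > size()");
    const std::size_t rlen = std::min(n, s.size() - pos);
    s.erase(0, pos);  // drop the prefix (shifts the remaining characters)
    s.resize(rlen);   // drop the suffix
    return std::move(s);
}
```

The in-place trim trades a possible allocation plus copy for a shift of the retained characters inside the existing buffer. After the call, the source string should be treated as being in a valid but unspecified state.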
3.1. Couldn’t a library vendor provide such an overload as QOI?
No, because it is a breaking change. For such a library, the following code would misbehave:
std::string foo = ...;
std::string bar = std::move(foo).substr(...);
[res.on.arguments] says that a programmer can’t expect an object referred to by an rvalue reference to remain untouched.
But there is currently no rvalue reference in substr().
This paper proposes to add it.
4. Design Decisions
This is purely a library extension.
Currently substr is defined as
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const;
This paper proposes to define the following overloads
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const &;
constexpr basic_string substr(size_type pos = 0, size_type n = npos) &&;
Other overloads (constexpr basic_string substr(size_type pos = 0, size_type n = npos) const &&; and constexpr basic_string substr(size_type pos = 0, size_type n = npos) &;) are not necessary, since const rvalues and non-const lvalues both bind to the const & overload.
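To illustrate how overload resolution treats such a pair of ref-qualified members, here is a minimal sketch using a toy type. The type toy_string and its member which are hypothetical and exist only for this illustration; they are not part of the proposal.

```cpp
#include <string>
#include <utility>

// Toy type (hypothetical, for illustration only) carrying the same pair of
// ref-qualified overloads that the paper proposes for substr.
struct toy_string {
    // Selected for lvalues (const or not) and for const rvalues.
    std::string which() const & { return "const&"; }
    // Selected only for non-const rvalues.
    std::string which() && { return "&&"; }
};
```

With these two overloads, a non-const lvalue and a const rvalue both select the const & version, while temporaries and std::move of a non-const object select the && version.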
Notice that the current proposal is a breaking change, as the following snippet of code might work differently if this paper gets accepted:
std::string foo = ...;
std::string bar = std::move(foo).substr(...);
Up to and including C++20, foo does not change its value; after this paper, foo would be left in a "valid but unspecified state".
While a breaking change is generally bad:
- I do not think there exists code like std::move(foo).substr(…) in the wild.
- Even if such code exists, the intention of the author was very probably to tell the compiler that they are no longer interested in the value of foo, as is normally the case when using std::move on a variable. In other words, with this proposal the user is getting what they asked for.
The standard library provides two ways of creating a "substring" instance: either by calling the substr method or via the constructor that accepts (str, pos, len). We see both of them as different spellings of the same functionality, and believe their behavior should remain consistent. Thus we propose to add rvalue overloads of the constructors:
constexpr basic_string( basic_string&& other, size_type pos, const Allocator& alloc = Allocator() );
constexpr basic_string( basic_string&& other, size_type pos, size_type count, const Allocator& alloc = Allocator() );
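The equivalence of the two existing spellings for lvalue sources can be checked with a small sketch. The helper name same_substring is hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: both spellings produce the same substring today.
// The proposal keeps them consistent by adding rvalue overloads to both.
inline bool same_substring(const std::string& s, std::size_t pos, std::size_t n) {
    return s.substr(pos, n) == std::string(s, pos, n);
}
```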
4.1. Note on the propagation of the allocator
basic_string is one of the allocator-aware containers, which means that any memory resource used by this class needs to be acquired from and released to the associated allocator instance.
This imposes some limitations on the behavior of the proposed overload.
For example in:
std::pmr::string s1 = ....;
std::pmr::string s2 = std::move(s1).substr();
For s2 to be able to steal memory from s1, we need to be sure that the allocators used by both objects are equal (s1.get_allocator() == s2.get_allocator()).
This is trivially achievable for allocators that are always equal (std::allocator_traits<A>::is_always_equal::value is true), including the most common case of the stateless std::allocator; an implementation can unconditionally steal any allocated memory in such a situation.
Moreover, the proposed overload can still provide some optimization in the case of stateful allocators, where s2.get_allocator() (which is required to be default constructed) happens to be equal to the allocator of the source s1.
In all remaining cases, the behavior of this overload should follow the existing const version, and as such it does not add any overhead.
This paper recommends that implementations avoid additional memory allocation when possible (note that if no allocation would be performed, there is nothing to avoid); however, it does not require it. This leaves implementations free to decide whether the optimization should be guarded by:
- a compile-time check of std::allocator_traits<A>::is_always_equal, or
- a runtime comparison of the allocator instances (at an additional comparison cost).
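The two guards can be combined in a single helper, sketched below under the assumption of a C++17 compiler. The function name may_steal_storage is hypothetical; an actual implementation is free to structure this check differently or to use only one of the two guards.

```cpp
#include <memory>
#include <memory_resource>

// Hypothetical helper: returns true when storage allocated through `source`
// may be adopted by an object that uses `target`.
template <class Alloc>
bool may_steal_storage([[maybe_unused]] const Alloc& source,
                       [[maybe_unused]] const Alloc& target) {
    if constexpr (std::allocator_traits<Alloc>::is_always_equal::value) {
        return true;              // compile-time guard, no runtime cost
    } else {
        return source == target;  // runtime comparison of the instances
    }
}
```

For std::allocator<char> the first branch is taken. For std::pmr::polymorphic_allocator<char>, which is not always equal, the result depends on whether both allocators refer to equal memory resources.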
4.2. Overload with user-supplied allocator
While writing the paper, we noticed that the specification of substr() requires the returned object to use a default-constructed allocator.
This means that invocation of this function is ill-formed for a basic_string instance with a non-default-constructible allocator. For example, for an invented memory_pool_allocator<char> that can only be constructed from a reference to the pool, the following is ill-formed:
memory_pool pool = ...;
using pool_string = std::basic_string<char, std::char_traits<char>, memory_pool_allocator<char>>;
pool_string s1(20, 'a', memory_pool_allocator<char>(pool));
auto s2 = s1.substr(2, 10);
This could be addressed by adding substr() overloads that accept the allocator to be used as a parameter:
constexpr basic_string substr(size_type pos, const Allocator& alloc) const;
constexpr basic_string substr(size_type pos, size_type n, const Allocator& alloc) const;
The desired effect may already be achieved via the "substring" constructor, which is also extended in this paper:
auto s2 = pool_string(s1, 2, 10, memory_pool_allocator<char>(pool));
While the authors agree that using substr may provide a more convenient interface, we believe that the introduction of allocator-accepting substr overloads should be handled in a separate paper.
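The constructor-based workaround above can be made concrete with a minimal sketch. The paper does not define memory_pool or memory_pool_allocator, so the definitions below are assumptions made only for illustration: the pool merely counts allocations, and the allocator forwards to operator new. The helper make_substring is hypothetical as well.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Assumed stand-in for the paper's memory_pool: it only counts allocations.
struct memory_pool {
    std::size_t allocations = 0;
};

// Assumed allocator without a default constructor: it can only be built
// from a reference to a pool (or converted from another instance).
template <class T>
struct memory_pool_allocator {
    using value_type = T;

    explicit memory_pool_allocator(memory_pool& p) noexcept : pool(&p) {}
    template <class U>
    memory_pool_allocator(const memory_pool_allocator<U>& other) noexcept
        : pool(other.pool) {}

    T* allocate(std::size_t n) {
        ++pool->allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }

    memory_pool* pool;
};

template <class T, class U>
bool operator==(const memory_pool_allocator<T>& a,
                const memory_pool_allocator<U>& b) noexcept {
    return a.pool == b.pool;
}
template <class T, class U>
bool operator!=(const memory_pool_allocator<T>& a,
                const memory_pool_allocator<U>& b) noexcept {
    return !(a == b);
}

using pool_string =
    std::basic_string<char, std::char_traits<char>, memory_pool_allocator<char>>;

// Hypothetical helper: uses the "substring" constructor with an explicit
// allocator, since s.substr(pos, n) would need a default-constructed one.
pool_string make_substring(const pool_string& s, std::size_t pos, std::size_t n) {
    return pool_string(s, pos, n, s.get_allocator());
}
```

Because the allocator is passed explicitly, the default argument Allocator() is never used, and the code stays well-formed even though the allocator has no default constructor.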
4.3. Are there any other functions of std::string that would benefit from a && overload?
The member functions append and operator+= take std::string as a const-ref parameter:
constexpr basic_string& operator+=( const basic_string& str );
constexpr basic_string& append(const basic_string& str);
constexpr basic_string& append(const basic_string& str, size_type pos, size_type n = npos);
But in this case, because of the interaction of two string instances, the benefits of stealing the resources of str are less clear.
Supposing both string instances use the same allocator, an implementation would have to compare the capacity of str and *this, and evaluate whether moving str.size() elements is less costly than copying them.
This would make the implementation of append less obvious, and the performance implications are difficult to predict.
For those reasons, the authors do not propose to add new overloads for append and operator+=.
The authors are not aware of other functions that could benefit from a && overload.
4.4. Modifying existing const overload
One of the effects of the P1787R6: Declarations and where to find them omnibus paper is the relaxation of the rules for overloading member functions based on cv and ref qualifiers. To the best of the authors' knowledge, the current wording allows the following declarations to coexist in the basic_string class:
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const;
constexpr basic_string substr(size_type pos = 0, size_type n = npos) &&;
However, this is not reflected in the current behavior of major compilers, so it is impossible to get implementation experience for such a change or to validate that overload resolution works as desired. As a consequence, we propose to change the existing overload:
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const&;
constexpr basic_string substr(size_type pos = 0, size_type n = npos) &&;
Note that a standard-library implementation that ships with a compiler supporting this relaxation of the overloading rules for member functions has the freedom to preserve const instead of const& per [namespace.std] p6, provided that the behavior of this overload is indeed the same.
In contrast, preserving the const overload would bake in any unintended (but unlikely) difference in behavior.
4.5. Concerns on ABI stability
Changing basic_string substr(std::size_t pos, std::size_t len) const; into basic_string substr(std::size_t pos, std::size_t len) const&; and basic_string substr(std::size_t pos, std::size_t len) &&; can affect the mangling of the name, thus causing an ABI break.
For a library it is possible to continue to define the old symbol, so that already existing code will continue to link and work without errors.
For example, it is possible to use asm to define the old mangled name as an alias for the new const& symbol.
This is not a novel technique, as it has been explained by the ARG (ABI Review Group), and similar breaks have already taken place for other papers, like P0408.
5. Implementation Experience
The changes proposed in the paper were implemented by the authors in libcxx and passed the tests in the test suite. The implementation of the rvalue constructor moves the buffer if:
- the selected substring is too long to use SSO, and
- the allocators are equal (checked at runtime).
This reflects the behavior of the rvalue constructor with allocator for this implementation.
The implementation experience does not cover the introduction of an additional alias or the preservation of the const overload, required to preserve ABI compatibility.
6. Technical Specifications
Suggested wording (against N4902):
Apply the following modifications to the definition of the basic_string class template in [basic.string.general] General.
constexpr basic_string(const basic_string& str, size_type pos, const Allocator& a = Allocator());
constexpr basic_string(const basic_string& str, size_type pos, size_type n, const Allocator& a = Allocator());
constexpr basic_string(basic_string&& str, size_type pos, const Allocator& alloc = Allocator());
constexpr basic_string(basic_string&& str, size_type pos, size_type n, const Allocator& alloc = Allocator());
and
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const &;
constexpr basic_string substr(size_type pos = 0, size_type n = npos) &&;
Replace the definition of the corresponding constructors in [string.cons] Constructors and assignment operators.
Wording note:
We no longer define these constructors in terms of being equivalent to the corresponding construction from basic_string_view, as that would prevent reuse of the memory, which we want to allow.
The use of "prior to this call" is not necessary for const&, but allows us to merge the wording.
constexpr basic_string(const basic_string& str, size_type pos, const Allocator& a = Allocator());
constexpr basic_string(const basic_string& str, size_type pos, size_type n, const Allocator& a = Allocator());
constexpr basic_string(basic_string&& str, size_type pos, const Allocator& alloc = Allocator());
constexpr basic_string(basic_string&& str, size_type pos, size_type n, const Allocator& alloc = Allocator());
Removed wording:
Effects: Let n be npos for the first overload. Equivalent to: basic_string(basic_string_view<charT, traits>(str).substr(pos, n), a).
Replacement wording:
Let:
- s be the value of str prior to this call,
- rlen be the smaller of n and s.size() - pos, for overloads that define parameter n, and s.size() - pos otherwise.
Effects: Constructs an object whose initial value is the range [s.data() + pos, s.data() + pos + rlen).
Throws: out_of_range if pos > s.size().
Remarks: After invocation of either the third or fourth overload, str is in a valid but unspecified state.
Recommended practice: For the third and fourth overloads, implementations should avoid unnecessary allocations if s.get_allocator() == a is true.
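As a worked illustration of the definitions above, the computation of rlen and the Throws condition can be sketched as a small helper. The name effective_length is hypothetical; it mirrors the wording but is not part of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the wording: computes rlen for a source of
// the given size, and throws out_of_range when pos > size.
inline std::size_t effective_length(std::size_t size, std::size_t pos,
                                    std::size_t n = std::string::npos) {
    if (pos > size)
        throw std::out_of_range("pos > size()");
    return std::min(n, size - pos);
}
```

For a source of size 11, pos = 6 and n = 100 give rlen = 5, and pos = 11 is still valid and gives an empty result; only pos = 12 or greater throws.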
Apply the following changes to [string.substr] basic_string::substr.
constexpr basic_string substr(size_type pos = 0, size_type n = npos) const &;
Removed wording:
Effects: Determines the effective length rlen of the string to copy as the smaller of n and size() - pos.
Returns: basic_string(data()+pos, rlen).
Throws: out_of_range if pos > size().
Replacement wording:
Effects: Equivalent to: return basic_string(*this, pos, n);
constexpr basic_string substr(size_type pos = 0, size_type n = npos) &&;
Effects: Equivalent to: return basic_string(std::move(*this), pos, n);
7. Acknowledgements
A big thank you to all those giving feedback for this paper.