string_ref
: a non-owning reference to a stringReferences to strings are very common in C++ programs, but often the callee doesn't care about the exact type of the object that owns the data. 3 things generally happen in this case:
char*
and
a length (or just char*
and assumes
0-termination)—and reduces the readability and safety of
calls and loses any helper functions the original type
provided.Google and LLVM have independently implemented a string-reference type to
encapsulate this kind of argument. string_ref
is implicitly
constructible from const char*
and std::string
. It
provides most of the const
member operations from
std::string
to ease conversion. This paper follows Chromium
in extending string_ref
to basic_string_ref<charT, traits>
(Chromium omits traits
). We provide typedefs to parallel the 4
basic_string
typedefs.
Operations on string_ref
apply to the characters in the
string, and not the pointers that refer to the characters. This introduces
the possibility that the underlying characters might change while a
string_ref
referring to them is in an associative container,
which would break the container, but we believe this risk is worthwhile
because it matches existing practice and matches user intentions more
often.
Both Google's and LLVM's string_ref
types extend
the interface from std::string
to provide some
helpful utility functions:
Versions of std::string
operations that take
string_ref
instead also give the standard a way to provide
in-place operations on non-null-terminated byte/character sequences:
Google's StringPiece
provides as_string
and
ToString
methods to convert to std::string
. LLVM's
StringRef
provides both a str()
explicit
conversion and an implicit operator std::string()
. Since this
paper builds on top of C++11, we provide an explicit
conversion operator.
Google's and LLVM's string_ref
types provide a subset of
std::string
's searching operations, but they do provide
pos
arguments to specify where to start the search. Because string_ref::substr
is
much cheaper than string::substr
, this paper removes the
pos
argument entirely.
None of the existing classes have constexpr
methods.
What do we call this class?
The interface of string_ref
is similar to, but not exactly
the same as the interface of std::string
. In general, we want
to minimize differences between std::string
and
string_ref
so that users can go back and forth between the two
often. This section justifies the differences whose utility we think
overcomes that general rule.
We should consider adding these methods to std::string
in
C++17, but we can't modify std::string
in a TS, so this paper
doesn't propose such changes.
remove_prefix()
and
remove_suffix()
make
it easy to parse strings using string_ref
. They could both
be implemented as non-member functions (e.g. str.remove_prefix(n)
=== str = str.substr(n)
), but it seems useful to
provide the simplest mutators as member functions. Note that other
traversal primitives need to be non-members so that they're
extensible, which may argue for pulling these out too.starts_with(haystack, needle)
vs
starts_with(needle, haystack)
, while
haystack.starts_with(needle)
is the only English reading of
the member version.copy
: std::string::copy
is copy
out, not in. It's not well named. Users can always use
std::copy
instead.pos
and n
parameters to methods have been
removed from string_ref
. std::string
needs
these parameters because std::string::substring
is an
expensive (copying and sometimes allocating) operation. However, these
are always integral parameters, so the compiler can't check that their
order is correct, and readers often have a hard time. Because string_ref::substr
is cheap,
we insist users call it instead of passing its arguments to other
functions.namespace std {
namespace beta {
template<typename charT, typename traits> class basic_string_ref;
typedef basic_string_ref<char> string_ref;
typedef basic_string_ref<char16_t> u16string_ref;
typedef basic_string_ref<char32_t> u32string_ref;
typedef basic_string_ref<wchar_t> wstring_ref;
// numeric conversions
int stoi(const string_ref& str, size_t* idx=0, int base=10);
long stol(const string_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const string_ref& str, size_t* idx=0, int base=10);
long long stoll(const string_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const string_ref& str, size_t* idx=0, int base=10);
float stof(const string_ref& str, size_t* idx=0);
double stod(const string_ref& str, size_t* idx=0);
long double stold(const string_ref& str, size_t* idx=0);
int stoi(const wstring_ref& str, size_t* idx=0, int base=10);
long stol(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const wstring_ref& str, size_t* idx=0, int base=10);
long long stoll(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const wstring_ref& str, size_t* idx=0, int base=10);
float stof(const wstring_ref& str, size_t* idx=0);
double stod(const wstring_ref& str, size_t* idx=0);
long double stold(const wstring_ref& str, size_t* idx=0);
}
}
A string-like object that refers to a const
sized piece of
memory owned by another object.
We provide implicit constructors so users can pass in a const
char*
or a std::string
wherever a
string_ref
is expected.
It is expected that user-defined string-like types will define an
implicit conversion to string_ref
(or another appropriate
instance of basic_string_ref
)
to interoperate with functions that need to read strings.
Unlike std::string
s and string literals, data()
may return a pointer to a buffer that is not null-terminated. Therefore it
is typically a mistake to pass data()
to a routine that takes just a const charT*
and expects a
null-terminated string.
namespace std {
namespace beta {
template<typename charT, typename traits>
class basic_string_ref {
public:
// types
typedef charT value_type;
typedef const charT* pointer;
typedef const charT& reference;
typedef const charT& const_reference;
typedef implementation_defined const_iterator;
typedef const_iterator iterator;
typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
typedef const_reverse_iterator reverse_iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
static constexpr size_type npos = size_type(-1);
// construct/copy
constexpr basic_string_ref();
constexpr basic_string_ref(const basic_string_ref&);
basic_string_ref& operator=(const basic_string_ref&);
basic_string_ref(const charT* str);
template<typename Allocator>
basic_string_ref(const basic_string<charT, traits, Allocator>& str);
constexpr basic_string_ref(const charT* str, size_type len);
basic_string_ref(initializer_list<charT> il);
template<typename Allocator>
explicit operator basic_string<charT, traits, Allocator>() const;
// iterators
constexpr const_iterator begin() const;
constexpr const_iterator end() const;
constexpr const_iterator cbegin() const;
constexpr const_iterator cend() const;
const_reverse_iterator rbegin() const;
const_reverse_iterator rend() const;
const_reverse_iterator crbegin() const;
const_reverse_iterator crend() const;
// capacity
constexpr size_type size() const;
constexpr size_type length() const;
constexpr size_type max_size() const;
constexpr bool empty() const;
// element access
constexpr const charT& operator[](size_type pos) const;
const charT& at(size_t pos) const;
constexpr const charT& front() const;
constexpr const charT& back() const;
constexpr const charT* data() const;
// modifiers
void clear();
void remove_prefix(size_type n);
void remove_suffix(size_type n);
// basic_string_ref string operations
constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
int compare(basic_string_ref x) const;
bool starts_with(basic_string_ref x) const;
bool ends_with(basic_string_ref x) const;
size_type find(basic_string_ref s) const;
size_type find(charT c) const;
size_type rfind(basic_string_ref s) const;
size_type rfind(charT c) const;
size_type find_first_of(basic_string_ref s) const;
size_type find_first_of(charT c) const;
size_type find_last_of(basic_string_ref s) const;
size_type find_last_of(charT c) const;
size_type find_first_not_of(basic_string_ref s) const;
size_type find_first_not_of(charT c) const;
size_type find_last_not_of(basic_string_ref s) const;
size_type find_last_not_of(charT c) const;
};
// Comparison operators
template<typename charT, typename traits>
bool operator==(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
template<typename charT, typename traits>
bool operator!=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
template<typename charT, typename traits>
bool operator<(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
template<typename charT, typename traits>
bool operator>(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
template<typename charT, typename traits>
bool operator<=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
template<typename charT, typename traits>
bool operator>=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
// Inserter
template<class charT, class traits>
basic_ostream<charT, traits>&
operator<<(basic_ostream<charT, traits>& os,
const basic_string_ref<charT,traits>& str);
// Hashing
template<> struct hash<string_ref>;
template<> struct hash<u16string_ref>;
template<> struct hash<u32string_ref>;
template<> struct hash<wstring_ref>;
}
}
constexpr basic_string_ref();
empty() == true
constexpr basic_string_ref(const basic_string_ref& other);
data() == other.data() &&
size() == other.size()
basic_string_ref& operator=(const basic_string_ref& other);
data() == other.data() &&
size() == other.size()
basic_string_ref(const charT* str);
data() == str
, and
size()
is 0
if str == nullptr
or strlen(str)
otherwise.template<typename Allocator>
basic_string_ref(const basic_string<charT, traits, Allocator>& str);
data() == str.data() &&
size() ==
str.size()
constexpr basic_string_ref(const charT* str, size_type len);
data() == str && size()
== len
basic_string_ref(initializer_list<charT> il);
data() == il.begin() &&
size() == il.size()
template<typename Allocator>
explicit operator basic_string<charT, traits, Allocator>() const;
basic_string<charT,
traits, Allocator>(data(), size())
.[Note: The operator is explicit
to avoid accidental
O(N) operations on type mismatches. — end note]
typedef implementation_defined const_iterator;
A random-access, contiguous iterator type.
For a string_ref str
, any operation that
invalidates a pointer in the range
[str.data()
,
str.data+str.size()
) invalidates
pointers and iterators returned from str
's
methods.
typedef const_iterator iterator;
Because string_ref
refers to a constant sequence,
iterator
and const_iterator
are the same
type.
constexpr const_iterator begin() const;
constexpr const_iterator cbegin() const;
constexpr const_iterator end() const;
constexpr const_iterator cend() const;
const_reverse_iterator rbegin() const;
const_reverse_iterator crbegin() const;
reverse_iterator(end())
.const_reverse_iterator rend() const;
const_reverse_iterator crend() const;
reverse_iterator(begin())
.constexpr size_type size() const;
constexpr size_type length() const;
std::distance(begin(), end())
, a count
of the number of char-like objects currently in the string.constexpr size_type max_size() const;
constexpr bool empty() const;
size() == 0
constexpr const charT& operator[](size_type pos) const;
pos < size()
*(begin() + pos)
constexpr const charT& at(size_type pos) const;
out_of_range
if pos >=
size()
.operator[](pos)
constexpr const charT& front() const;
!empty()
operator[](0)
constexpr const charT& back() const;
!empty()
operator[](size() - 1)
constexpr const charT* data() const;
p
such that
p + i == &operator[](i)
for each i
in
[0,size())
. The range [data()
,data() + size()
) must be
valid even for empty basic_string_ref
s..void clear();
*this = basic_string_ref()
void remove_prefix(size_type n);
*this = substr(n, npos);
void remove_suffix(size_type n);
*this = substr(0, size() - n)
Unlike std::string, string_ref provides no whole-string methods with position
or length
parameters. Instead, users should use the substr()
method to create the character sequence they're actually interested in, and use that.
constexpr basic_string_ref substr(size_type pos, size_type len=npos) const;
out_of_range
if pos > size()
.rlen
of the string to copy as the smaller of
len
and size() - pos
.basic_string_ref(data()+pos, rlen)
.int compare(basic_string_ref other) const;
rlen
of the strings to compare as the
smallest of size()
and other.size()
. The
function then compares the two strings by calling
traits::compare(data(), other.data(),
rlen)
.Condition | Return Value |
---|---|
size() < str.size() | < 0 |
size() == str.size() | 0 |
size() > str.size() | > 0 |
bool starts_with(basic_string_ref x) const;
true
if x
is a prefix of *this
.bool ends_with(basic_string_ref x) const;
true
if x
is a suffix of *this
.All member functions in this section that take a charT
argument c
return the result of a call to the
same-named member function with an argument of
basic_string_ref(&c, 1)
.
size_type find(basic_string_ref str) const;
size_type find(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos + str.size() <= size()
traits::eq(at(xpos+I), str.at(I))
for all elements I
of the string referenced by
str
.npos
.traits::eq()
.size_type rfind(basic_string_ref str) const;
size_type rfind(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos + str.size() <= size()
traits::eq(at(xpos+I), str.at(I))
for all elements I
of the string referenced by
str
.xpos
if the function
can determine such a value for
xpos
. Otherwise, returns
npos
.traits::eq()
.size_type find_first_of(basic_string_ref str) const;
size_type find_first_of(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos <= size()
traits::eq(at(xpos+I), str.at(I))
for some element I
of the string referenced by
str
.xpos
if the function
can determine such a value for
xpos
. Otherwise, returns
npos
.traits::eq()
.size_type find_last_of(basic_string_ref str) const;
size_type find_last_of(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos <= size()
traits::eq(at(xpos+I), str.at(I))
for some element I
of the string referenced by
str
.xpos
if the function
can determine such a value for
xpos
. Otherwise, returns
npos
.traits::eq()
.size_type find_first_not_of(basic_string_ref str) const;
size_type find_first_not_of(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos <= size()
traits::eq(at(xpos+I), str.at(I))
for no element I
of the string referenced by
str
.xpos
if the function
can determine such a value for
xpos
. Otherwise, returns
npos
.traits::eq()
.size_type find_last_not_of(basic_string_ref str) const;
size_type find_last_not_of(charT c) const;
xpos
, if possible, such that both of the
following conditions obtain:
xpos <= size()
traits::eq(at(xpos+I), str.at(I))
for no element I
of the string referenced by
str
.xpos
if the function
can determine such a value for
xpos
. Otherwise, returns
npos
.traits::eq()
.template<typename charT, typename traits>
bool operator==(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) == 0
template<typename charT, typename traits>
bool operator!=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) != 0
template<typename charT, typename traits>
bool operator<(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) < 0
template<typename charT, typename traits>
bool operator>(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) > 0
template<typename charT, typename traits>
bool operator<=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) <= 0
template<typename charT, typename traits>
bool operator>=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
x.compare(y) >= 0
template<class charT, class traits>
basic_ostream<charT, traits>&
operator<<(basic_ostream<charT, traits>& os,
const basic_string_ref<charT,traits>& str);
sentry
object, if
this object returns true
when converted to a value of
type bool
, determines padding as described in
22.4.2.2.2, then inserts the resulting sequence of characters
seq
as if by calling os.rdbuf()->sputn(seq,
n)
, where n
is the larger
of os.width()
and str.size()
; then calls
os.width(0)
.template<> struct hash<string_ref>;
template<> struct hash<u16string_ref>;
template<> struct hash<u32string_ref>;
template<> struct hash<wstring_ref>;
[Note: The functions below mirror C++11's
stox()
functions on strings. Because
string_ref
can cheaply change to refer to a smaller
substring, we should also add a more convenient interface that advances
the string_ref
as it parses an integer out, with the rough
signature stox_consume(string_ref&, int
base=10)
. However, failure is expected in parsing functions, so
it's not appropriate to raise an exception when no integer is available to
be parsed. Existing practice is to use a signature like bool
stox_consume(string_ref&, T& result, int
base=10)
, but this requires a temporary integral variable to store
the result in. If we get an optional<T>
type in a TS, a
better signature along the lines of optional<T>
stox_consume(string_ref&, int base=10)
becomes
available. To avoid precluding the optional
signature, I
haven't included any of the consume
variants in this
paper. — end note]
int stoi(const string_ref& str, size_t* idx=0, int base=10);
long stol(const string_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const string_ref& str, size_t* idx=0, int base=10);
long long stoll(const string_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const string_ref& str, size_t* idx=0, int base=10);
stox(string(str), idx,
base)
where x
is the type suffix of
the function called.float stof(const string_ref& str, size_t* idx=0);
double stod(const string_ref& str, size_t* idx=0);
long double stold(const string_ref& str, size_t* idx=0);
stox(string(str), idx)
where x
is the type suffix of the function
called.int stoi(const wstring_ref& str, size_t* idx=0, int base=10);
long stol(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const wstring_ref& str, size_t* idx=0, int base=10);
long long stoll(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const wstring_ref& str, size_t* idx=0, int base=10);
stox(wstring(str), idx,
base)
where x
is the type suffix of
the function called.float stof(const wstring_ref& str, size_t* idx=0);
double stod(const wstring_ref& str, size_t* idx=0);
long double stold(const wstring_ref& str, size_t* idx=0);
stox(wstring(str), idx)
where x
is the type suffix of the function
called.std::sub_match
([re.submatch]) should probably inherit from std::string_ref
, but the fact that it takes any bidirectional iterator makes that tricky.starts_with
and ends_with
methods to basic_string
.