References to arrays and strings are very common in C++ programs, but often the callee doesn't care about the exact type of the object that owns the data. 3 things generally happen in this case:
vector
, say—and insists that callers copy the data if it was originally in another format.Google and LLVM have independently implemented two types to encapsulate this kind of argument. array_ref<T>
is implicitly constructible from several standard array-like types and stores a contiguous range of addresses. It provides the subset of operations from std::array<>
and std::vector<>
that work on immutable sequences. Because array_ref
stores pointers rather than a reference to the original array, it also supports a few operations to modify the current view.
string_ref
is implicitly constructible from const char*
and std::string
. It provides most of the member operations from std::string
to ease conversion. This paper follows Chromium in extending string_ref
to basic_string_ref<charT, traits>
(Chromium omits traits
). We provide typedefs to parallel the 4 basic_string
typedefs.
Both Google's and LLVM's string_ref
types extend the interface from std::string
to provide some helpful utility functions:
Versions of std::string
operations that take string_ref
instead also give the standard a way to provide in-place operations on non-null-terminated byte/character sequences:
Note that some of the code samples in this paper don't use full C++11 conventions because they're extracted from code that actually compiles, and current compilers don't yet support the whole C++11 language.
Google's StringPiece
provides as_string
and ToString
methods to convert to std::string
. LLVM's StringRef
provides both a str()
explicit conversion and an implicit operator std::string()
, and similarly for ArrayRef
to std::vector
. Since this paper builds on top of C++11, we provide explicit
conversion operators in addition to two named conversions vec()
and str()
.
Google's and LLVM's string_ref
types provide a subset of std::string's
searching operations, but they do provide 'pos
' arguments to specify where to start the search. Because string_ref::substr
is much cheaper than string::substr
, this paper removes the pos
argument entirely.
None of the existing classes have constexpr
methods.
What do we call these?
std::string_ref:
An array_ref<T>
represents an immutable array of size()
elements of type T. The storage for the array is *not* owned by the array_ref
object, and clients must arrange for the backing store to remain live while the array_ref
object is in use.
Implicit conversion operations are provided from types with contiguous iterators like std::vector
, std::string
, std::array
, and primitive arrays. array_ref
objects are invalidated by any operation that invalidates their underlying pointers.
One common use for array_ref
is when passing arguments to a routine where you want to be able to accept a variety of array types. The usual approach here is to have the client explicitly pass in a pointer and a length, as in:
void MyOldRoutine(const int* elems, int N) {
for (int i = 0; i < N; i++) { assert(elems[i] == N - i); }
}
Unfortunately, this leads to ugly and error-prone code at the call site:
std::vector<int> my_vector;
for (size_t i = 5; i > 0; --i) { my_vector.push_back(i); }
MyOldRoutine(my_vector.data(), my_vector.size());
std::array<int, 4> my_std_array = {{4, 3, 2, 1}};
MyOldRoutine(my_std_array.data(), my_std_array.size());
int my_array[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
MyOldRoutine(my_array, 10);
int* dyn_array = new int[3];
for (size_t i = 0; i < 3; ++i) { dyn_array[i] = 3 - i; }
MyOldRoutine(dyn_array, 3);
Instead, you can use an array_ref
as the argument to the routine:
void MyNewRoutine(std::array_ref<int> a) {
for (size_t i = 0; i < a.size(); i++) { assert(a[i] == int(a.size() - i)); }
}
This makes the call sites cleaner, for the most part:
std::vector<int> my_vector;
for (int i = 5; i > 0; --i) { my_vector.push_back(i); }
MyNewRoutine(my_vector);
std::array<int, 4> my_std_array = {{4, 3, 2, 1}};
MyNewRoutine(my_std_array);
int my_array[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
MyNewRoutine(my_array);
int* dyn_array = new int[3];
for (size_t i = 0; i < 3; ++i) { dyn_array[i] = 3 - i; }
MyNewRoutine(std::array_ref<int>(dyn_array, 3));
namespace std {
template<typename T>
class array_ref {
// types
typedef T value_type;
typedef const T * pointer;
typedef const T & reference;
typedef const T & const_reference;
typedef implementation_defined const_iterator;
typedef const_iterator iterator;
typedef std::reverse_iterator< const_iterator > const_reverse_iterator;
typedef const_reverse_iterator reverse_iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
// construct/copy
constexpr array_ref();
constexpr array_ref(const array_ref &);
array_ref & operator=(const array_ref &);
constexpr array_ref(const T * array, size_t length);
array_ref(const vector< T > & v);
template<size_t N>
constexpr array_ref(const T(&a)[N]);
template<size_t N>
constexpr array_ref(const std::array< T, N > & a);
constexpr array_ref substr(size_type pos, size_type n=size_type(-1)) const;
// iterators
constexpr const_iterator begin() const;
constexpr const_iterator end() const;
constexpr const_iterator cbegin() const;
constexpr const_iterator cend() const;
const_reverse_iterator rbegin() const;
const_reverse_iterator rend() const;
const_reverse_iterator crbegin() const;
const_reverse_iterator crend() const;
// capacity
constexpr size_type size() const;
constexpr size_type max_size() const;
constexpr bool empty() const;
// element access
constexpr const T & operator[](size_t i) const;
constexpr const T & at(size_t i) const;
constexpr const T & front() const;
constexpr const T & back() const;
constexpr const T * data() const;
// Outgoing conversion operators
explicit operator vector< T >() const;
vector< T > vec() const;
template<typename traits, typename Allocator>
explicit operator basic_string< T, traits, Allocator >() const;
basic_string< T > str() const;
// mutators
void clear();
void remove_prefix(size_type n);
void remove_suffix(size_type n);
void pop_back();
void pop_front();
};
// deducing constructor wrappers
template<typename T>
constexpr array_ref< T > make_array_ref(const T * array, size_t length);
template<typename T, size_t N>
constexpr array_ref< T > make_array_ref(const T(&a)[N]);
template<typename T>
array_ref< T > make_array_ref(const vector< T > & v);
}
These functions do the same thing as the constructor with the same signature. They just allow users to avoid writing the iterator type.
typedef const T * pointer;
typedef implementation_defined const_iterator;
random-access, contiguous iterator type
typedef const_iterator iterator;
Because array_ref controls a constant sequence, iterator and const_iterator are the same type.
constexpr array_ref();
empty() == true
array_ref(const vector< T > & v);
template<size_t N>
constexpr array_ref(const std::array< T, N > & a);
constexpr array_ref substr(size_type pos, size_type n=size_type(-1)) const;
constexpr const T * data() const;
These functions provide explicit conversions to selected other contiguous sequence types using those types' iterator-range constructors. We provide both explicit conversion operators for use in variable initialization and short member functions for use in function calls.
The operators are explicit
to avoid accidental O(N) operations on type mismatches.
explicit operator vector< T >() const;
template<typename traits, typename Allocator>
explicit operator basic_string< T, traits, Allocator >() const;
void clear();
Resets *this to its default-constructed state.
void remove_prefix(size_type n);
Advances the start pointer of this array_ref past n
elements without moving the end pointer.
void remove_suffix(size_type n);
Moves the end pointer of this array_ref earlier by n
elements without moving the start pointer.
void pop_back();
remove_suffix(1)
void pop_front();
remove_prefix(1)
A string-like object that points to a sized piece of memory, similar to an array_ref<charT>
.
We provide implicit constructors so users can pass in a const char*
or a std::string
wherever a string_ref
is expected.
It is expected that user-defined string-like types will define an implicit conversion to string_ref
(or another appropriate instance of basic_string_ref
) to interoperate with functions that need to read strings.
Unlike std::strings
and string literals, data()
may return a pointer to a buffer that is not null-terminated. Therefore it is typically a mistake to pass data()
to a routine that takes just a const charT*
and expects a null-terminated string.
std::hash
specializations shall be provided for string_ref
, wstring_ref
, u16string_ref
, and u32string_ref
.
Methods on basic_string_ref<charT>
with the same name as methods on array_ref<charT>
are intended to have the same semantics unless the difference is noted specifically.
namespace std {
template<typename charT, typename traits>
class basic_string_ref {
// types
typedef charT value_type;
typedef const charT * pointer;
typedef const charT & reference;
typedef const charT & const_reference;
typedef implementation_defined const_iterator;
typedef const_iterator iterator;
typedef std::reverse_iterator< const_iterator > const_reverse_iterator;
typedef const_reverse_iterator reverse_iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
static constexpr size_type npos = size_type(-1);
// construct/copy
constexpr basic_string_ref();
constexpr basic_string_ref(const basic_string_ref &);
basic_string_ref & operator=(const basic_string_ref &);
basic_string_ref(const charT * str);
template<typename Allocator>
basic_string_ref(const basic_string< charT, traits, Allocator > & str);
constexpr basic_string_ref(const charT * str, size_type len);
// iterators
constexpr const_iterator begin() const;
constexpr const_iterator end() const;
constexpr const_iterator cbegin() const;
constexpr const_iterator cend() const;
const_reverse_iterator rbegin() const;
const_reverse_iterator rend() const;
const_reverse_iterator crbegin() const;
const_reverse_iterator crend() const;
// capacity
constexpr size_type size() const;
constexpr size_type max_size() const;
constexpr bool empty() const;
constexpr size_type length() const;
// element access
constexpr const charT & operator[](size_t i) const;
const charT & at(size_t i) const;
constexpr const charT & front() const;
constexpr const charT & back() const;
constexpr const charT * data() const;
// Outgoing conversion operators
constexpr operator array_ref< const charT >() const;
template<typename Allocator>
explicit operator basic_string< charT, traits, Allocator >() const;
basic_string< charT, traits > str() const;
// mutators
void clear();
void remove_prefix(size_type n);
void remove_suffix(size_type n);
void pop_back();
void pop_front();
// string operations with the same semantics as std::basic_string
int compare(basic_string_ref x) const;
constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
size_type copy(charT * buf) const;
size_type find(basic_string_ref s) const;
size_type find(charT c) const;
size_type rfind(basic_string_ref s) const;
size_type rfind(charT c) const;
size_type find_first_of(basic_string_ref s) const;
size_type find_first_of(charT c) const;
size_type find_first_not_of(basic_string_ref s) const;
size_type find_first_not_of(charT c) const;
size_type find_last_of(basic_string_ref s) const;
size_type find_last_of(charT c) const;
size_type find_last_not_of(basic_string_ref s) const;
size_type find_last_not_of(charT c) const;
// new string operations
bool starts_with(basic_string_ref x) const;
bool ends_with(basic_string_ref x) const;
};
// Common specializations:
typedef basic_string_ref< char > string_ref;
typedef basic_string_ref< char16_t > u16string_ref;
typedef basic_string_ref< char32_t > u32string_ref;
typedef basic_string_ref< wchar_t > wstring_ref;
// Comparison operators
template<typename charT, typename traits>
bool operator==(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
template<typename charT, typename traits>
bool operator!=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
template<typename charT, typename traits>
bool operator<(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
template<typename charT, typename traits>
bool operator>(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
template<typename charT, typename traits>
bool operator<=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
template<typename charT, typename traits>
bool operator>=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
// numeric conversions
int stoi(const string_ref & str, size_t * idx=0, int base=10);
long stol(const string_ref & str, size_t * idx=0, int base=10);
unsigned long stoul(const string_ref & str, size_t * idx=0, int base=10);
long long stoll(const string_ref & str, size_t * idx=0, int base=10);
unsigned long long stoull(const string_ref & str, size_t * idx=0, int base=10);
float stof(const string_ref & str, size_t * idx=0);
double stod(const string_ref & str, size_t * idx=0);
long double stold(const string_ref & str, size_t * idx=0);
int stoi(const wstring_ref & str, size_t * idx=0, int base=10);
long stol(const wstring_ref & str, size_t * idx=0, int base=10);
unsigned long stoul(const wstring_ref & str, size_t * idx=0, int base=10);
long long stoll(const wstring_ref & str, size_t * idx=0, int base=10);
unsigned long long stoull(const wstring_ref & str, size_t * idx=0, int base=10);
float stof(const wstring_ref & str, size_t * idx=0);
double stod(const wstring_ref & str, size_t * idx=0);
long double stold(const wstring_ref & str, size_t * idx=0);
}
template<typename charT, typename traits>
bool operator==(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
Note that equality can be implemented faster than comparison.
Each function stox
in this section behaves as if it calls
stox(str.str(), idx, base)
Ideally with less copying.typedef implementation_defined const_iterator;
random-access, contiguous iterator type
typedef const_iterator iterator;
Because string_ref
controls a constant sequence, iterator
and const_iterator
are the same type.
std::string's
substring constructor is omitted because string_ref(str).substr(pos, n)
is clearer
constexpr basic_string_ref();
empty() == true
basic_string_ref(const charT * str);
data() == str
, and size()
is 0
if str == nullptr
or strlen(str)
otherwise.
template<typename Allocator>
basic_string_ref(const basic_string< charT, traits, Allocator > & str);
data() == str.data() && size() == str.size()
constexpr basic_string_ref(const charT * str, size_type len);
data() == str && size() == len
constexpr size_type length() const;
Alias for size()
for string compatibility.
constexpr const charT * data() const;
A pointer such that [data()
,data() + size()
) is a valid range. For a non-empty basic_string_ref, data() == &front()
.
constexpr operator array_ref< const charT >() const;
An array_ref
for this basic_string_ref's
sequence of characters.
template<typename Allocator>
explicit operator basic_string< charT, traits, Allocator >() const;
A new string
containing the sequence referenced by *this
.
explicit
to avoid accidental O(N) operations on type mismatches.Unlike array_ref
, this operator preserves the traits
type.
basic_string< charT, traits > str() const;
basic_string<charT, traits>(*this);
array_ref
, this function preserves the traits
type. void clear();
Resets *this to its default-constructed state.
void remove_prefix(size_type n);
Advances the start pointer of this basic_string_ref past n
elements without moving the end pointer.
void remove_suffix(size_type n);
Moves the end pointer of this basic_string_ref earlier by n
elements without moving the start pointer.
void pop_back();
remove_suffix(1)
void pop_front();
remove_prefix(1)
Unlike std::string, string_ref provides no whole-string methods with position
or length
parameters. Instead, use the substr()
method to create the character sequence you're actually interested in, and use that.
constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
size_type copy(charT * buf) const;
Copy is more likely than the other methods to be used with an n
parameter since it can be the valid length of the character buffer. Still this->copy(buf, n)
can be replaced with this->substr(0, n).copy(buf)
, so I've removed the n
from here too.
bool starts_with(basic_string_ref x) const;
true
if x
is a prefix of *this
.
bool ends_with(basic_string_ref x) const;
true
if x
is a suffix of *this
.
Certain other pieces of the library should change to make life easier for array_ref<>
and string_ref
.
char_traits
methods could be constexpr
.constexpr
and non-constexpr
versions of a function. Specifically, sometimes the fastest-executing implementation isn't in a form that constexpr
accepts, but there's still a possible constexpr
implementation.It might make sense for std::array_ref<T>
to inherit from std::range<T*>
when/if that gets standardized. On the other hand:
range<T*>
will have an implicit conversion operator from anything with begin()
and end()
methods, so it'll already implicitly convert from array_ref<T>
. Will the inheritance still help with code like the following? template<typename T> void foo(std::range<T>)
range<T*>
may obviate the need for array_ref<T>
. One difficulty with using range<T*>
instead of array_ref<T>
is that range<Iterator>
will probably be implicitly constructible from anything with a begin/end, but non-template functions may want to take range<T*>
instead. One way to deal with this would be for range<T*>
to be specialized, adding constructors from arbitrary contiguous iterators, but that could introduce ambiguities with functions taking range<TplArg>
. std::sub_match
([re.submatch]) should probably inherit from std::string_ref
, but the fact that it takes any bidirectional iterator makes that tricky.
Consider replacing these conversion operators with new constructor overloads on the target types.
Consider adding starts_with
and ends_with
methods to basic_string
.
stox_consume(str, base)
as size_t idx;
auto result = stox(str.str(), &idx, base);
str.remove_prefix(idx);
return result;
array_ref
classes make the view const. It may be useful to extend that to allow modifications of the referenced array elements, and use array_ref<const T>
for immutable views. basic_string_ref::substr
for interface questions. We want something like this on array_ref
, but probably not with this name. string_ref
inherit from array_ref
? It seems like a straightforward extension, but there's a risk in that some of basic_string_ref<charT, traits>
's behavior depends on that traits
parameter, which isn't propagated to the array_ref<charT>
base class. If traits::eq
isn't simply operator==(charT)
, this would cause the behavior to depend on whether a basic_string_ref
has been upcasted to array_ref
. For the 4 specializations of char_traits<>
defined by the standard, this is guaranteed, but it's not necessarily true for user-defined types. substr()
takes a start and a length, not a start and an end, which prevents this from being a simple extension. Should we add slice()
and deprecate substr()
?