basic_regex
Document number: | N1499 = 03-0082 |
Date: | September 22, 2003 |
Project: | Programming Language C++ |
Reference: | ISO/IEC IS 14882:1998(E) |
Reply to: | Pete Becker |
Dinkumware, Ltd. | |
petebecker@acm.org |
basic_regex Should Not Keep a Copy of its Initializer
The basic_regex template has a member function str, which returns a string object
that holds the text used to initialize the basic_regex object. It also provides a
container-like interface to this text through the member functions begin and end,
which return const_iterator objects that allow inspection of the initializer text.
While it might occasionally be useful to look at the initializer string, we ought
to apply the rule that you don't pay for it if you don't use it. Just as fstream
objects don't carry around the file name that they were opened with, basic_regex
objects should not carry around their initializer text. If someone needs to keep
track of that text they can write a class that holds the text and the basic_regex
object.
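A minimal sketch of such a wrapper, assuming a basic_regex without str, begin, and end. It is written against std::regex from <regex> as a stand-in for the proposed template; the class name regex_with_text and its member names are hypothetical and not part of any proposal.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: pairs a compiled regular expression with the text
// used to build it, for the callers that actually need to see that text.
class regex_with_text {
public:
    explicit regex_with_text(const std::string& text)
        : text_(text), re_(text_) {}

    // Access to the initializer text, paid for only by users of this class.
    const std::string& str() const { return text_; }
    std::string::const_iterator begin() const { return text_.begin(); }
    std::string::const_iterator end() const { return text_.end(); }

    // Access to the compiled expression for matching and searching.
    const std::regex& expression() const { return re_; }

private:
    std::string text_;  // declared before re_, so it is initialized first
    std::regex re_;     // std::regex stands in for the proposed basic_regex
};
```

Code that never inspects the initializer keeps using the regular expression type directly and carries no copy of the text.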
Recommended changes: remove the member functions str, begin, and end.
basic_regex Should Not Have an Allocator
The basic_regex template takes an argument that defines a type for an allocator
object. The template also has several member typedefs and one member function to
provide information about the allocator type and the allocator object. This is
because a basic_regex object "is in effect both a container of characters, and a
container of states, as such an allocator parameter is appropriate." Calling it a
container doesn't make it one. The allocator in basic_regex is not very useful, and
it unduly complicates the implementation.
The cost of using an allocator is high. Every type that the basic_regex object uses
internally must have its own allocator type and its own allocator object. A
node-based implementation might have a dozen or more node types, requiring a dozen
or more allocator objects. Allocator objects can be created as local objects when
needed, which effectively precludes allocators with internal state; they can be
ordinary members of the basic_regex object, inflating its size; or they can be
implemented as a chain of base classes (to take advantage of the zero-size base
optimization), with a high cost in readability and maintainability. None of these
options is attractive.
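The size trade-off between the last two options can be sketched as follows. This is an illustration only: the node types and struct names are hypothetical, std::allocator stands in for a user-supplied allocator, and only two node types are shown where a real implementation might need a dozen.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical node types standing in for an implementation's internal nodes.
struct literal_node { char ch; };
struct repeat_node { std::size_t min_count; std::size_t max_count; };

// Allocators as ordinary members: even a stateless allocator occupies
// storage when it is a data member, so the object grows with each node type.
struct allocators_as_members {
    std::allocator<literal_node> literal_alloc;
    std::allocator<repeat_node> repeat_alloc;
    void* first_state;
};

// Allocators as base classes: an implementation may give empty bases no
// storage, at the price of a base list that grows with each node type.
struct allocators_as_bases
    : std::allocator<literal_node>,
      std::allocator<repeat_node> {
    void* first_state;
};
```

Where the empty base optimization applies, the second struct is typically no larger than its single pointer, while the first is always larger than one pointer; the exact sizes are implementation-specific.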
Further, it's not at all clear how a user can determine that a substitute allocator
is appropriate or what characteristics such an allocator should have. The STL
containers have clearly spelled out requirements for their memory usage; basic_regex
objects have no such requirements (nor should they). The implementor of the
basic_regex template knows best what its memory requirements are.
Recommended changes: remove the Allocator argument from basic_regex and remove the
members reference, const_reference, difference_type, size_type, allocator_type,
get_allocator, and max_size.
regex_traits Should Use Iterators, Not Strings
The member functions of the regex_traits template support customization and
internationalization for regular expressions. Of these, the member functions
transform, transform_primary, lookup_collatename, and lookup_classname take a
string as input.
This interface is inherently inefficient -- it requires creating a string object
from a sequence in order to pass that string to the function. Further, in the
case of transform, the function typically extracts iterators from the string
object. Passing the text as a pair of iterators avoids introducing unnecessary
string objects.
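The difference can be sketched with the standard collate facet, whose transform member already takes a pair of pointers. The two function names below are hypothetical, and the sketch is limited to char and pointer ranges for brevity.

```cpp
#include <locale>
#include <string>

// String-based interface: the caller must already hold, or build, a string.
std::string sort_key_from_string(const std::locale& loc, const std::string& text) {
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    // The facet only needs the two ends of the character sequence.
    return coll.transform(text.data(), text.data() + text.size());
}

// Range-based interface: part of a larger buffer can be passed directly.
std::string sort_key_from_range(const std::locale& loc,
                                const char* first, const char* last) {
    return std::use_facet<std::collate<char> >(loc).transform(first, last);
}
```

A caller that has located a collating element name inside a pattern buffer can pass the two pointers that delimit it to sort_key_from_range, whereas sort_key_from_string forces construction of a temporary std::string from the same two pointers.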
Recommended changes:
Change regex_traits::transform to
template <class InIt>
string_type transform(InIt first, InIt last) const;
and change the Effects clause to:
Effects: returns use_facet<collate<charT> >(getloc()).transform(first, last).
Change regex_traits::transform_primary to
template <class InIt>
string_type transform_primary(InIt first, InIt last) const;
and change the Effects clause to:
Effects: if typeid(use_facet<collate<charT> >) == typeid(collate_byname<charT>)
and the form of the sort key returned by
collate_byname<charT>::transform(first, last) is known and can be converted into a
primary sort key, then returns that key, otherwise returns an empty string.
Change regex_traits::lookup_collatename to
template <class InIt>
string_type lookup_collatename(InIt first, InIt last) const;
and change the Effects clause to:
Effects: returns the sequence of characters that represents the collation element
named by the characters in the half-open range [first, last) if that sequence names
a valid collation element under the imbued locale, otherwise returns an empty string.
Note that in addition to the iterator language, this change to the effects clause
removes the requirement that lookup_collatename recognize the names of characters
in the POSIX Portable Character Set. This requirement seems to be the result of a
misunderstanding of what constitutes a collation element.
Change regex_traits::lookup_classname to
template <class InIt>
char_class_type lookup_classname(InIt first, InIt last) const;
and change the Effects clause to:
Effects: returns an implementation-specific value that represents a character
classification named without regard to case by the characters in the half-open
range [first, last) if such a character classification exists, otherwise returns 0.
The implementation shall provide character classes with the following names:
"d", "w", "s", "alnum", "alpha", "blank", "cntrl", "digit", "graph", "lower",
"print", "punct", "space", "upper", and "xdigit".
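A minimal sketch of a lookup that satisfies this Effects clause, under stated assumptions: the bitmask values, the table, and the names names_match and lookup_classname_sketch are all hypothetical, since the representation of char_class_type is implementation-specific. The sketch handles narrow characters only and rescans the range for each table entry, so it needs a multi-pass iterator rather than a single-pass input iterator.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical bitmask representation of character classes.
typedef unsigned int char_class_type;
const char_class_type class_alpha      = 1u << 0;
const char_class_type class_digit      = 1u << 1;
const char_class_type class_alnum      = class_alpha | class_digit;
const char_class_type class_blank      = 1u << 2;
const char_class_type class_cntrl      = 1u << 3;
const char_class_type class_graph      = 1u << 4;
const char_class_type class_lower      = 1u << 5;
const char_class_type class_print      = 1u << 6;
const char_class_type class_punct      = 1u << 7;
const char_class_type class_space      = 1u << 8;
const char_class_type class_upper      = 1u << 9;
const char_class_type class_xdigit     = 1u << 10;
const char_class_type class_underscore = 1u << 11;

struct class_name_entry { const char* name; char_class_type mask; };

// The fifteen required names, stored in lower case.
const class_name_entry class_names[] = {
    {"d", class_digit}, {"w", class_alnum | class_underscore},
    {"s", class_space}, {"alnum", class_alnum}, {"alpha", class_alpha},
    {"blank", class_blank}, {"cntrl", class_cntrl}, {"digit", class_digit},
    {"graph", class_graph}, {"lower", class_lower}, {"print", class_print},
    {"punct", class_punct}, {"space", class_space}, {"upper", class_upper},
    {"xdigit", class_xdigit}
};

// Compares [first, last) with a lower-case name, ignoring case in the range.
template <class FwdIt>
bool names_match(FwdIt first, FwdIt last, const char* name) {
    for (; first != last && *name != '\0'; ++first, ++name) {
        if (std::tolower(static_cast<unsigned char>(*first)) !=
            static_cast<unsigned char>(*name))
            return false;
    }
    return first == last && *name == '\0';
}

// Returns the mask for the named class, or 0 if the name is not recognized.
template <class FwdIt>
char_class_type lookup_classname_sketch(FwdIt first, FwdIt last) {
    const std::size_t count = sizeof(class_names) / sizeof(class_names[0]);
    for (std::size_t i = 0; i != count; ++i) {
        if (names_match(first, last, class_names[i].name))
            return class_names[i].mask;
    }
    return 0;
}
```

Because the function takes a range, a parser that has found the name between "[:" and ":]" in a pattern can pass the two positions directly without building a string.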