Doc. no.: J16/03-0011= WG21/N1429.
Previous Doc. no.: J16/02-0044 = WG21/N1386
Date: 03 March 2003
Project: Programming Language C++
Subgroup: Library
Wiki: http://www.research.att.com/~ark/cgi-bin/lwg-wiki/wiki?RegularExpressions
Reply to: John Maddock <john_maddock@compuserve.com>
This proposal is based heavily on the Boost regular expression library, although GRETA is also discussed. The proposal represents widespread existing practice; the Boost regex library is in use on over twenty compiler/platform combinations, and is one of the most visited libraries on the Boost web site. The proposal is necessarily long and quite detailed and covers many difficult and contentious issues. The intent is to provide a concrete "straw man" proposal for discussion along with a clear discussion of the design issues involved.
Motivation Target Audience Impact on the standard Design Summary of key design decisions Summary of key limitations imposed by the design Regular expression syntax (and complexity) Regular expression algorithms Regular expression representation Regular expression traits classes Representing regular expression matches Algorithms Repeated regular expression searches Iterator requirements Proposed text Regular Expression Traits Header <regex> synopsis Supporting Types Template class basic_regex Template class sub_match Template class match_results Regular expression algorithms Regular expression Iterators
Regular expressions should be familiar to many programmers; a regular expression is a string used as a pattern to select specific strings from a set of character strings. Regular expressions are used by many familiar Unix utilities, including egrep, grep, lex, and sed[1], and are supported directly in many programming languages including awk[1], perl[2], C#[3], and ECMAScript[4] (also known as JavaScript), and indeed are often used ubiquitously wherever text processing occurs.
Until recently most regular expression libraries were implemented in C, no doubt helped by the fact that the POSIX standard provides a standardized regular expression API[1]. Well known C implementations include the GNU regex library[5], the RX library[6], Henry Spencer's regex library[7] (which forms part of the BSD distribution), and PCRE[8] (for perl like regular expressions). Most early C++ implementations were simple wrappers around the C API's[9][10], although these add a measure of safety (by using the Resource Acquisition is Initialization principle), as well an object oriented interface. However wrapper classes and C API's alike suffer from a key defect when interfacing with the C++ standard library: they are limited to searching narrow character strings held in a const char*.
Native C++ implementations can overcome the limitations suffered by C libraries, and there are two main candidates for standardization: the Boost regex library[11] (formerly regex++), and GRETA[12]. Both these libraries support narrow and wide character regular expressions, and iterator-based searches. GRETA implements the regular expression algorithms as member functions of the regular expression class, while the Boost library make these separate non-member functions. GRETA is based on perl regular expressions, where as the Boost library uses POSIX regular expression matching rules, albeit with some perl compatible extensions.
The are essentially three possible target audiences for a regular expression library:
End Users: regular expression may be used by the end user of a program - typically this would be in a text editor for search (and replace) operations. These users would require that the expression is localized to use their native character set, and never exhibit the kind of exponential search times that some regular expression engines can produce under worst case conditions. Typically this means that the regular expression recognized by the engine has to be restricted to some kind of safe subset of the syntax recognized by programming tools such as perl.
Programmers: this section includes all users of scripting languages like perl[2], awk[1], and ECMAScript[4]. In these cases the regular expressions are authored by the programmer, and these are converted to a state machine at run time. Typically these users require an expressive regular expression syntax, at the expense of worst case performance. Localization of the syntax is not normally required, although the state machine itself should usually honor the run time locale (for example when matching an alphanumeric character).
Programmers (code generators): this section of users include those using tools like lex[1], which convert one or more regular expressions into a C or C++ source file. Here the expressions are written by the programmer, and then converted to a state machine at compile time. There is no localization, either of the syntax or of the way in which the state machine behaves at runtime. The generated state machines are usually highly optimized; as a result the regular expressions may take some time to be analyzed by the generator program, and are typically quite restricted in the syntax they accept.
This proposal concentrates on the second group; that is to say a rich regular expression syntax, generally used by programmers, and interpreted at run time. However, thought has been given to the other possible user types, and the design by no means rule these out completely.
This is a pure library proposal, it does not add any new language features, or alter any existing standard library headers.
This proposal consists of a pair of classes, and a set of algorithms that operate on instances of those classes. Regular expressions are represented by instances of the basic_regex
class template, matches against a regular expression are represented by instances of the match_results
class template. The algorithms regex_match
and regex_search
take as input a regular expression and a character string, and fill in a match_results
structure if a match is found. For search and replace operations the algorithm regex_replace
performs formatted search and replace on a character string, producing a new string as output. Finally, for repeated find operations, there are two iterator types; one enumerates a sequence of full regular expression matches, the other a sequence of strings.
Regular expression syntax: ECMAScript (discussion).
Complexity: None specified (discussion).
Regular Expression Object: A template class modeled after basic_string
; the regular expression object does not have access to the iterator type that will be searched (discussion).
Traits class: Regular expression objects may be customized via a traits class template parameter; all localization of the library is handled via the traits class (discussion).
Character types: both narrow and wide character strings are supported.
Representation of regular expression matches: a container of sub-expression matches, what matched each marked sub-expression in the regular expression is reported back to the user (discussion).
Regular expression algorithms: iterator based non-member functions for searching for a match, and for finding an exact match (discussion).
Search and replace: ECMAScript-like search and replace via a non-member function to transform one string to another (discussion).
Repeated searches: an iterator type to enumerate all of the regular expression matches within another iterator sequence (discussion).
Splitting a string: an iterator type that enumerates one or more strings for each regular expression match within another iterator sequence (discussion).
Regular expression syntax: whatever syntax is chosen will severely limit the choice of state machine used along with any complexity guarantees given. The choice of ECMAScript regular expressions implies that some regular expressions can only be matched by a backtracking Non-deterministic Finite Automata (NFA), other algorithms may be used for some subsets of ECMAScript regular expressions (discussion).
Regular expression object type: the fact that the regular expression object is templated on the character type, and not on an iterator type, may make some implementation choices more difficult. In practice this choice seems to have no impact on performance (discussion).
Regular expression traits class: the traits class design will be a major constraining factor on the kinds of internal representations that a regular expression state machine may use. The design presented here is the simplest possible; implementations wishing to extend the regular expression library may need to extend the traits class design (discussion).
Support for wide characters: much regular expression theoretical work is based on small finite character sets, and can therefore not be applied to wide character strings, however wide character string support is considered essential. This constraint may be mitigated to some degree by providing a specialization for basic_regex<char>
and then overloading its associated non-member algorithms (discussion).
Sub-expression matching: User feedback suggests that support for matching marked sub-expressions is essential. However, this can have a negative impact on the kinds of algorithms that may be employed. This is mitigated to some degree in this proposal by providing overloaded forms of the key algorithms that do not report what matched, allowing alternative algorithms to be used in these cases for some expressions (discussion).
Iterator type: support for forward iterators would allow for multibyte character support via iterator adapters, however this would also preclude some perl-like features, and may have negative performance implications. This proposal requires support for bidirectional iterators, however this choice needs more investigation and may need to be changed at a later date (discussion).
The complexity of converting a character string to a state machine, and the complexity of searching for a match to that machine, are deeply dependent upon the syntax used to represent the regular expression. There are three ways in which the regular expression syntax can be specified:
The following brief summary indicates the main differences between the regular expression syntaxes that have been standardized. It is assumed that the reader is already familiar with one or more of these.
The basic POSIX regular expression syntax is most often encountered in Unix utilities such as grep or sed.
Only the characters .[\*^$
are special. Other features are accessed as escape sequences, for example grouping is accomplished with \(
and \)
, quantified repeats using \{
and \}
and alternation with \|
.
Bracket expressions (contained in []
), have access to the full range of POSIX features including named collating elements and equivalence classes, however escape sequences within bracket expressions are not valid (unlike in perl).
There are no + or ? quantifiers.
Back-references to previously marked sub-expressions (with \(
and \)
) are allowed. This last option means that pattern recognition is almost certainly an NP-Complete problem[13].
Expressions are matched using the "leftmost longest" rule.
The extended POSIX regular expression syntax is used by the Unix utilities egrep and awk[1].
The extended POSIX regular expression syntax is generally more natural in use than the basic syntax, the characters .[]{}+*?|\^$
are all special, escape sequences are used to make special characters ordinary (in other words there are no special escape sequences defined by the POSIX standard, however such sequences are allowed as compatible extensions).
Expressions are matched using the "leftmost longest" rule[1].
Back-references are not allowed in POSIX extended regular expressions, as a result a POSIX extended regular expression can be converted to a Deterministic Finite Automata (DFA) if required.
The following features are supported by both POSIX-extended and POSIX-basic regular expressions but not by ECMAScript:
ECMAScript[4] is based on the perl regular expression syntax[2], which in turn is loosely based on the POSIX extended syntax[1]. ECMAScript is also known as JavaScript or JScript.
Unlike the POSIX regular expression syntaxes, ECMAScript does not use the "leftmost longest" rule to determine the best possible match, instead the regular expression is treated as a directed graph, and a depth first search is performed, with the first match found being returned. The following examples illustrate the main differences between the two:
Expression |
Text |
POSIX leftmost longest match |
ECMAScript depth first search match |
|
|
|
|
|
|
$0 = " abc def xyz " |
$0 = " abc def xyz " |
|
|
|
|
These differences between ECMAScript matching rules, and POSIX matching rules, mean that these two regular expression syntaxes differ not only in the features offered, but also in the form that the state machine takes and/or the algorithms used to traverse the state machine.
The following features are supported by ECMAScript, but not by the POSIX-basic or POSIX-extended syntaxes:
The main advantage offered by ECMAScript is that the richer syntax makes it very much easier to author patterns that will match sections of mark-up languages. For example to match a "<font>…</font>" section using the POSIX syntax is much more complicated than it first seems, the obvious expression:
"<font[^>]*>.*</font>"
would match everything from the first occurrence of "<font" to the last occurrence of "</font>", including any intervening <font> or </font> tags. It is possible to do the right thing using something much more complex like:
"<font[^>]*>([^<]|<[^/]|</[^f]|</f[^o]|</fo[^n]|</fon[^t]|</font[^>])*</font>"
which is not especially memorable (nor even a general solution). Where as the ECMAScript equivalent:
"<font[^>]*>.*?</font>"
is very much easier. I've seen suggestions that a POSIX-like syntax, but using a "leftmost shortest" matching rule would be a better a way of handling mark-up languages[14], but while this is true, it still doesn't approach the flexibility of ECMAScript in this important area.
Perl 5 regular expressions offer the same facilities as ECMAScript, as well as several additional features such as independent sub-expressions, zero width look-behind assertions, and interpreted sub-expressions. The latter feature allows recursive regular expressions; something that can only be fully implemented when regular expressions are an intrinsic part of the language, and have access to the program's variables. Given that perl isn't standardized, the perl syntax probably isn't a good candidate for the C++ standard. It is also worth noting that the perl regular expression syntax will undergo a complete rewrite in perl 6[15].
This proposal recommends that support for ECMAScript syntax is required, while support for the POSIX syntax is optional. This choice is different to the one made by Boost, but is based on a very clear bias in user expectations (and feedback), in particular for features like "non-greedy repeats" which simply do not fit well into the POSIX "leftmost longest" model. This choice is also the same as that made by almost all of the newer programming languages: not only perl and JavaScript, but also Python, PHP, and Microsoft's .NET framework. The next major revision of the boost regex library will offer a choice of either POSIX or ECMAScript semantics
However it should be noted that ECMA script has no support for localization in regular expressions, the proposal therefore requires that the following features (taken from POSIX-extended regular expressions) be supported, and that these features are localizable via the supplied traits class:
The main state machine representations and algorithms used for implementing regular expressions are shown below, along with their complexities [16][17]:
Finite state machine / algorithm |
Size of machine for M character expression |
Pattern recognition complexity, for an N character sequence, and S states. |
Comments |
DFA |
O(2M) |
O(N) |
May have a high cost for the construction of the finite state machine, typically only used by code generators such as lex. |
Bit-parallel non-backtracking NFA |
O(M) for most expressions, but O(2M) for nested bounded-repeats. |
O(1+(S/B))N |
Only applicable to narrow character regular expressions and does not report what matched. If B (the number of bits in a machine word) is greater than S (a reasonably common experience), then performance is O(N). |
Non backtracking NFA |
O(M) for most expressions, but O(2M) for nested bounded-repeats. |
O(SN) |
Typically used for POSIX-extended regular expression engines, especially tools like egrep where the expression is to be authored by an "end user" rather than a programmer. |
Backtracking NFA |
O(M) |
O(2N) |
Typically used for ECMAScript regular expressions, and for POSIX-basic expressions that contain back-references. |
The main lesson from this table is that unless the standard defines the regular expression grammar used, the complexity of the regular expression algorithms can not be specified. One option that could be explored is to allow for the runtime monitoring of algorithm complexity. That is, the stated algorithm complexity is enforced not by using an algorithm that has the stated complexity, but by monitoring the number of states in the machine that are visited, and throwing an exception if the number visited exceeds some threshold. This option allows for the use of richer regular expression grammars, while at the same time preventing the state machine from executing eternally (this is currently in use with the Boost.regex library and appears to work well in practice).
Please also note that for carefully limited regular expression grammars, and for narrow character strings, there are many clever algorithms and/or heuristics available with much better performance (and in some cases worst case complexity as well) than the ones given here[18][19].
This proposal does not make any performance guarantees, given that this is essentially impossible for ECMAScript regular expressions.
In most programming languages, regular expressions are represented as a character strings. Ultimately though, a regular expression is a finite state machine, the character string is just is a user-friendly representation. This leads to the first design decision: regular expressions will be represented by a type that is constructible from a character string. This allows the data structure representing the finite state machine (DFA or NFA etc) to be completely opaque to the end user. This form also ensures that character strings are converted to a state machine only once (whereas if regular expression were represented by basic_string
then the transformation to a state machine would have to occur every time the expression was used). On the other hand, since character strings and regular expressions are so closely related, the regular expression type should "look like" a basic_string
, in particular the member functions should make the underlying character string representation easily accessible. There are some limits to this; other than basic_regex<>::assign
, member functions will be limited to those that do not modify the string. This limitation arises from the fact that one can not arbitrarily modify a regular expression and expect the result to still be a valid expression, nor can one make arbitrary changes to the underlying state machine without repeating any static analysis that may have been performed on the machine. One reviewer has asked for arbitrary modification to the machine to be allowed, in order to support emacs-style incremental searches. However, it is not clear what kind of impact this would have on the state machine representation, or even whether this is possible for regular expressions. As a result this is not part of this proposal, and experience with Boost so far is that not one user has ever asked for this feature.
The GRETA and Boost regular expression libraries use slightly different forms for the regular expression object. Boost uses:
template <class charT, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class reg_expression;
While GRETA uses:
template< typename ConstIterator, typename traits = perl_syntax<iterator_traits<ConstIterator>::value_type> > class basic_rpattern;
There are two main differences here: Boost has an Allocator template parameter, whereas GRETA does not; and GRETA is templated on the iterator type that will be searched rather than on the character type. In addition, GRETA uses member functions for regular expression search and match operations, where as Boost uses non-member functions. This proposal follows the Boost interface for the following reasons:
basic_string
. This is a deliberate design decision; the basic_string
interface is well known and understood, following the same interface design decisions (in particular using the same template arguments) makes the library both easier to learn, and easier to teach. I believe that it is hard to overstate the importance of this. boost::regex
or boost::wregex
(see discussion below), again in a manner completely analogous to the standard string classes. basic_regex<char>
if these techniques are to be used. This is harder (though not necessarily impossible) if the interface is iterator based. As with all design decisions there are some compromises involved. In particular, the GRETA implementation by default uses a recursive depth first search; each state in the machine is represented by a polymorphic object, so the iterator type needs to be known by the machine in order for iterators to be passed through virtual function calls. However there is a one to one transformation from a virtual function call, to a call through a function pointer looked up in a table indexed by the object's type. A refactored version of Boost regex uses this technique to implement the same algorithm that GRETA uses, the result has been found to be faster than either Boost or GRETA so far, and this implementation will be used in the next major revision of the boost.regex library [21]. In other words, although there are some implementation choices that do not fit easily into the Boost interface, there are no algorithms that can not fit into this interface, either in theory or practice.
There is one other restriction worth pointing out; the Boost interface does not easily allow an implementation to generate self-modifying machine code for the state machine. That would require the state machine to be fixed to a finite number of iterator types, and in practice there appear to be no regex implementations that make this design choice in any case. If this option were required, then full construction of the state machine would have to be delayed until it was required (and the iterator type was known to be one that is supported), the result could then be cached in the regular expression object (and indexed by a const typeinfo*
).
This proposal has made some small modifications to the Boost interface where experience has indicated that the existing version is imperfect, in particular the name of the regular expression object has been changed from reg_expression
to basic_regex
, to make the basic_string
analogy clearer. The actual class used to represent the regular expression is:
template <class charT, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class basic_regex;
In many cases there are multiple ways in which a character string can be converted to a state machine, the most important of these is to tell the state machine whether it should regard a character sequence as case sensitive or not, but other options are possible: for example hints on optimizations and/or the expression syntax to use (for example POSIX basic or extended, perl or JavaScript etc). So in addition to a string, the form of the state machine is determined by a series of flags (represented by a bitmask type), these flags should take on well defined default values which results in any given string producing the same behavior on all implementations.
In the Boost reference implementation these flags are defined as members of a non-template base class to basic_expression
, called regbase
. Originally this base class had several members in addition to these constants (in a design rather similar to std::ios_base
), but as time has gone on these members have either been removed or moved to other locations (either the traits class or the basic_regex
class), so that leaves a base class with no purpose other than to hold some symbolic constants. At the same time there are other symbolic constants used elsewhere by the regular expression library, which in the Boost implementation are defined at namespace scope. To tidy all this up, this proposal adds these to the nested namespace std::regex_constants
, whose only purpose is to hold the symbolic constants used by the regular expression library:
namespace std{ namespace regex_constants{ typedef bitmask_type syntax_type; // default syntax, the same on all implementations: static const syntax_type normal = 0; // case insensitivity: static const syntax_type icase = 1; // possibly other flags follow. // Other flag types follow... } }
One of the more controversial aspects of this proposal will be which flags to include here, as this depends heavily upon the regular expression syntax we standardize upon. As a preliminary choice, this proposal suggests that ECMAScript regular expression must be supported, and that POSIX style regular expression are an optional extra.
The proposal does not document any member functions for accessing the internal state machine representation. Obviously such member functions will exist, but their form will depend upon the implementation. In most cases these member functions will be declared private, and one or more non-member functions will be declared as friends.
When constructing a state machine, one must recognize that not all strings are valid regular expressions, in this proposal the regular expression object throws an exception derived from runtime_error
when the string is not a valid expression.
Finally as with basic_string
, most users will only be using one of two typedefs of basic_regex
:
typedef basic_regex<char> regex; typedef basic_regex<wchar_t> wregex;
The traits class used by class template basic_regex provides two related facilities: localization and customization. Unlike most traits classes used in the standard library, the one used by the class template basic_regex is instance based, allowing three main use cases:
The default traits class (regex_traits) is stateful, and uses std::locale
to provide localization facilities. Note however that stateful traits classes are not tied to std::locale
; user defined traits classes could use some other type for localization: for example on the Microsoft Windows platform an LCID
might provide a better locale type.
The default traits class is defined as follows:
template <class charT> struct regex_traits { public: typedef charT char_type; typedef std::size_t size_type; typedef std::basic_string<char_type> string_type; typedef std::locale locale_type; typedef bitmask_type char_class_type; typedef regex_traits_version_1_tag version_tag; struct sentry { sentry(regex_traits<charT>&); operator void*(); }; static size_type length(const char_type* p); regex_constants::syntax_type syntax_type(charT c) const; regex_constants::escape_syntax_type escape_syntax_type(charT c) const; charT translate(charT c, bool icase) const; string_type transform(const string_type& in) const; string_type transform_primary(const string_type& in) const; char_class_type lookup_classname(const string_type& name) const; string_type lookup_collatename(const string_type& name) const; bool is_class(charT c, char_class_type f) const; template <class InputIterator> int toi(InputIterator& first, InputIterator last, int radix) const; locale_type imbue(locale_type l); locale_type getloc()const; std::string error_string(regex_constants::error_type) const; };
Designing the correct traits class for basic_regex
is difficult, for example the current boost version has been through several design revisions already, and is still imperfect. The design presented here uses combinations of features from the boost regex design and from GRETA, it has also been simplified compared to either of these as far as is possible. However, it is a fact that vendors wishing to provide new and innovative features in their regular expression code will almost certainly need to extend or modify the traits class design presented here. In order to allow for this, the traits class contains a version tag: a type that signifies the interface version that the traits class conforms to. Any traits class that provides a superset of the interface specified in this document should specify a version tag that derives from std::regex_traits_version_1_tag
. Likewise, implementations of basic_regex
should accept any traits class whose version tag derives from regex_traits_version_1_tag
, and may of course accept traits classes with some other version tag as a compatible extension. A class type is used for this purpose to allow for a hierarchy of traits class versions:
// required by the standard: struct regex_traits_version_1_tag {}; // vendor specific traits version, compatible with regex_traits_version_1_tag: struct regex_traits_vendor1_tag : regex_traits_version_1_tag {}; // some future standard version: struct regex_traits_version_2_tag : regex_traits_version_1_tag {};
Traits classes used by basic_regex
, must be default constructible, they are also capable of being imbued with a locale. However the implementation of many traits class designs requires the initialization and or caching of a fair amount of locale-specific data - it makes sense to delay this initialization until the traits class is actually about to be used (and after any call to imbue) - to ensure that the user only pays for what they actually use:
std::regex e; int main(int argc, const char**argv) { if(argc > 2) { e.imbue(std::locale("en_GB"); e.assign(argv[2]); // do some work } else { // do something else, no regex traits class initialization performed } }
The traits class uses a sentry design, similar to that used by the iostreams library for this purpose:
template <class charT, class traits, class Allocator> class basic_regex { private: // for exposition only: traits traits_inst; Allocator alloc_inst; public: basic_regex(const Allocator& a = Allocator()) : traits_inst(), alloc_inst(a) {} basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal) { traits::sentry s(traits_inst); if(s){ // perform state machine construction here } } };
One of the key features of the traits class design is that it allows the expression syntax to be customized. There are limits to what can be accomplished with this: the traits class is not a parser and has no knowledge of the internal state machine structure (and nor should it). What it can do is replace those characters that have special meaning in the regex grammar with some other (possibly localized) equivalent, by use of the following member functions:
regex_constants::syntax_type syntax_type(charT c) const; regex_constants::escape_syntax_type escape_syntax_type(charT c) const; char_class_type lookup_classname(const string_type& name) const; string_type lookup_collatename(const string_type& name) const; template<class InputIterator> int toi(InputIterator& first, InputIterator last, int radix) const;
Member function syntax_type
, returns an enumerated value which represents the meaning of the next input character: the regular expression grammar is actually applied to the sequence of values obtained by calling traits.syntax_type
, and not the characters in the expression itself. Implementations may define values for regex_constants::syntax_type
that are not specified in this standard in order to support compatible extensions.
Likewise the member function escape_syntax_type
returns an enumerated value indicating the meaning of an escaped character, implementations may support escape sequences not required by the ECMA standard by defining additional values for regex_constants::escape_syntax_type
.
Together the functions syntax_type
and escape_syntax_type
allow the primary regular expression grammar to be customized by a custom traits class: for example a traits class could use Han-ideographs rather that Latin punctuation symbols in place of the usual ? * +
regular expression operators.
The member function lookup_classname
converts a string into a bitmap type, which can later be used to determine whether a character is a member of a particular character class. For example when the parser encounters a \d
escape sequence, syntax_type
will be passed '\'
and return syntax_escape
. escape_syntax_type
will then be passed 'd'
and return escape_type_class
. Finally the parser will pass "d"
to lookup_classname
to obtain a bitmask type that can later be passed to member function is_class
to determine whether a character is a digit or not. This process allows the traits class to support custom character class names, in addition to the standard forms:
\d \s \w [[:alnum:]] [[:alpha:]] [[:blank:]] [[:cntrl:]] [[:digit:]] [[:graph:]] [[:lower:]] [[:print:]] [[:punct:]] [[:space:]] [[:upper:]] [[:xdigit:]]
It is interesting to note that while the C localization API has the function wctype
to support locale specific character class names, there is no equivalent in std::locale
.
Names looked up by lookup_classname
are case insensitive, so for example passing the values "UPPER" "upper"
and "Upper"
would all have the same result. In the case of escape sequences, the value returned by escape_syntax_type
determines whether the character class is to be negated (for example in "\D"
) or not (as in "\d"
).
The member function lookup_collatename
converts a string into a single collating element, it is called when the parser encounters a [[.collating-element-name.]]
sequence. The default traits class supports all the names specified in the POSIX standard, for example if the parser encountered "[[.left-square-bracket.]]"
then " left-square-bracket "
would be passed to lookup_classname
, and "["
would be returned. Even though support for this functionality is required in POSIX regular expressions, note that neither the C, POSIX, nor C++ standards have any portable mechanism for converting locale-specific collating element names to collating elements. Consequently it is implementation defined whether the default traits class supports collating names other than those in the default "POSIX" locale. Finally, the collating element is returned as a string not as a character to allow for multi-character collating elements (for example ae
in English, or ll
in traditional Spanish).
Finally the toi
member function is used to convert numeric sequences to integers according to the traits classes locale: the default traits class uses std::num_get
for this purpose, user defined traits classes that don't use std::locale
need to make their own locale specific arrangements.
The following traits class member functions are used to customize how an expression is matched against an input sequence:
charT translate(charT c, bool icase) const; string_type transform(const string_type& in) const; string_type transform_primary(const string_type& in) const; bool is_class(charT c, char_class_type f) const;
Matching of individual characters can generally be accomplished in one of two ways: either the character to be matched is converted to the set of all characters that are equivalent to it, and the input character is then matched against that set, or the both the input character and the character in the expression are converted to a consistent form (for example lower case) and then directly compared. The regex traits class design takes the latter approach, on the grounds that it can be applied to large character sets, where as it is not in general possible to enumerate all the characters equivalent to some character c, when the character set is large. For example the character set may contain more than two cases (upper, lower and title case variants in Unicode for example), as well as other forms of canonical or compatibility equivalence. Thus two characters c1
and c2
are compared with regard to case using:
translate(c1, false) == translate(c2, false)
and caseless comparisons using:
translate(c1, true) == translate(c2, true)
Character ranges can be matched either in a locale sensitive manner or not: POSIX leaves it up to the implementation whether or not character ranges such as [a-b] are sensitive to the locale, while ECMAScript requires that such ranges are matched only according to character code points. This proposal makes locale sensitivity an option: when the syntax_option_type
flag collate
is set when constructing a regular expression, then character ranges are sensitive to the locale, and when it’s not set then they are matched only by comparing code points.
Matching locale sensitive character ranges is accomplished by converting all the input sequences to collating keys with the transform member function: in the default traits class this just calls std::collate<charT>::transform
. For example when the regex-option flag regex_constants::collate
is set then the sequence [a-b]
would match some character c1
if:
traits.transform("a") <= traits.transform(c1) <= traits.transform("b");
Matching equivalence classes is accomplished by comparing the sort keys returned from the transform_primary member function, for example the expression [[=a=]]
would match some character c1
if:
traits.transform_primary(c1) == traits.transform_primary("a")
Note that the transform
and transform_primary
member function accept a string rather than a character as input in order to cope with multi-character collating elements used in expressions such as [[.ae.]-d]
or [[=ll=]]
. Note also that there is no portable way to implement transform_primary
in terms of std::locale
, since even if the sort key format returned by std::collate_byname<>::transform
is known and can be converted into a primary sort key, the user can still install their own custom std::collate
implementation into the locale object used, and that can use any sort key format they see fit. The transform_primary
member function is therefore more of use to custom traits classes, and should throw an exception if it cannot be implemented for a particular locale. Unfortunately this significantly reduces the usefulness of POSIX style equivalence classes within regular expressions, but that cannot be fixed without modifying the std::collate
facet. Note that primary sort keys can not be obtained by converting to all lower case and then obtaining a regular sort key: primary keys take into account only the primary character shape, case, accentation and locale specific tailoring are not taken into account, so for example the characters "AÀÁÂÃÄÅaàáâãäå" should all produce the same primary sort key.
Finally the is_class
member function is used to determine whether some input character is a member of a particular character class, for example the expression [[:upper:]]
would match some character c1 if:
traits.is_class(c1, traits.lookup_classname("upper")) == true
The default traits class is implemented in terms of a call to std::ctype<charT>::is
.
Whenever the regular expression parser encounters a syntax error, a bad_expression
exception needs to be thrown. The traits class provides the member function error_string
to convert a numeric error condition into a suitably localized string for presentation to the end user, the default traits class returns only English language error strings, but user defined traits classes can easily customize this behavior.
The member functions:
locale_type imbue(locale_type l); locale_type getloc()const;
allow the traits class to have a per-instance locale when required, and to be imbued with that locale. If the traits class is stateless then these functions have no effect, and the member-type locale_type
is a dummy/placeholder type.
Many of the regular expression algorithms report what matched the expression. Unlike matches against a normal sequence of characters, regular expression matches can be of variable length, and can even match the null string. Furthermore, the end of sequence iterator can not be used to indicate that no match was found, as a null string at the end of a sequence can be a valid regular expression match.
Regular expressions also contain marked sub-expressions, each of which has its own "sub-match" in addition to what matched the whole expression. Further more, each of these sub-expression matches can be either matched or non-matched depending upon the context. For example, consider the expression:
(^[[:space:]]*)|([[:space:]]*$)
Here the expression contains two marked sub-expressions (which we'll call $1 and $2 using perl terminology), both the expression as a whole and the two marked sub-expressions can match variable length sequences, and either $1 or $2 can be matched, but never both at once. The expression can also match the null string at either the end or the start of the sequence being searched.
In order to represent this range of possibilities, each match found must be represented as a collection of sub-expression matches, with each sub-expression match represented by a pair of iterators. A Boolean flag is needed to indicate whether a sub-expression participated in the match or not; ideally one would use singular iterators for this, but not all iterators are required to have singular values (nor are is it guaranteed by the standard that one can compare two iterators if one of them is singular), so this option is prohibited.
This proposal uses:
template <class Iterator> struct sub_match : public std::pair<Iterator, Iterator> { bool matched; typename iterator_traits<Iterator>::distance_type length()const; operator basic_string<typename iterator_traits<Iterator>::value_type>()const; };
to represent matched sub-expressions. This is the simplest solution, however experience with Boost has shown that some users new to C++ get confused by the "pair of iterators" concept, and just assume that the first
member contains a null terminated string representing the match. An alternative and more object oriented approach would be to develop this into a fully fledged substring class, perhaps as part of a wider strings library (the author has some code for this, but it is not yet ready for primetime). However there are also dangers to this approach; a substring class would still be a pair of iterators internally, there is therefore a potential for the string to which the substring refers to be destroyed before the substring object, leaving invalidated iterators behind. Of course this is still a possible problem with the "pair of iterators" approach, however at least it is obvious that they are iterators and not something that looks like it may be a "real" string.
There are two options for representing a collection of sub_matches: either to use an existing container class, or provide a new container. This proposal takes the latter option - providing a new container called match_results -
for the following reasons:
match_results
type adds a number of member functions that are useful for this particular concept, these include prefix() and suffix() (for returning what perl calls $` and $'), position(n) and length(n) provide the position and length of sub-expression n, and str(n) returns a copy of sub-expression n as a string. match_results
type does not contain any member functions for modifying its contents - that is the sole responsibility of the regular expression algorithms - and not the end user. In fact the match_results
type models a "const sequence". match_results
internally; for some algorithms significant performance benefit may be obtained by carefully managing the memory used by the class. format
member function. This is actually a separate algorithm in the Boost regex library (called regex_format
), however there appears to be no benefit in making this a non-member function. Finally there are some typedefs of the match_results
type to simplify usage:
typedef match_results<const char*> cmatch; typedef match_results<const wchar_t*> wcmatch; typedef match_results<string::const_iterator> smatch; typedef match_results<wstring::const_iterator> wsmatch;
In this proposal regular expression algorithms are present as iterator-based non-member functions that operate upon instances of basic_regex
and match_results
. As previously discussed this maintains the independence of the regular expression type from the iterator type. Each algorithm is overloaded for three common cases: where the source character-container sequence is a pair of iterators, or a null terminated string, or an instance of basic_string
. In addition, the algorithms are split into two types: those that report what matched using an instance of the match_results
class, and those that simply indicate whether there was a match or not. Although the latter option may not seem that useful for those used to tools such as perl, in practice it does have important uses, and can be implemented using algorithms that are significantly more efficient than those that report what matched. Indeed for long searches and a limited subset of regular expressions there are algorithms that offer better than O(N) average case performance[22]. Typical uses here include search engines (grep etc), or data input validation (where you just want to compare the string against a regular expression based filter). Note however, that the worst case performance of the "non-reporting" algorithms is still likely to be the same as the "reporting" ones, it's the real world performance that can differ. For this reason the "non-reporting" algorithms represent an opportunity for the vendor to optimize the implementation, there is no enforceable requirement that they do so (indeed Boost does not currently optimize these at present). The regular expression algorithms all also take a series of flags - expressed as a combination of bitmasks - to indicate things like whether the string being searched represents the start/end of a line etc. These flags are all defined in the nested namespace std::regex_constants
, and are of type std::regex_constants::match_flag_type
.
The first two algorithms are regex_search
and regex_match
. The former will find an occurrence of a regular expression within a string, while the latter will determine whether a regular expression matches the string exactly and is mainly used for data input validation. Note that the effect of regex_match
can be obtained by using regex_search
with an expression that is suitably anchored at either end. However, from the programmers point of view, the advantage of using regex_match
is that it is obvious from reading the code that it requires an exact match against the expression, and not just an occurrence somewhere within the string.
The names of the algorithms in this proposal are based upon the names Boost uses and all have a "regex_" prefix, which may cause some controversy (as may the naming used in general in this proposal). This proposal does not use overly generic names like match/compare/search/find here, as the possibility for ambiguous overloads occurring should be obvious. An alternative is to use a generic name, but placed in a nested namespace (let's say std::regex
). This may be an appropriate option, but ambiguities can still occur either as a result of argument dependent lookup, or a using directive. Many people may see the latter as an evil, but experience suggests that statements such as:
using namespace whatever;
are remarkably common in practice. For this reason the "regex_" prefixed names that Boost regex uses are retained in this proposal. Here are the twelve forms for regex_match
and regex_search
:
template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits, class Allocator2> bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class traits, class Allocator2> bool regex_match(const charT* str, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class ST, class SA, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_search(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits, class Allocator2> bool regex_search(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_search(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class Allocator, class charT, class traits> bool regex_search(BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits> bool regex_search(const charT* str const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits> bool regex_search(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default);
The algorithms regex_match
and regex_search
both support a feature not commonly seen in regular expression libraries: a partial match. When the flag std::regex_constants::match_partial
is set in the flags passed to the algorithm, then a result of true
may be returned if one or more characters were matched, and the state machine then reached the end of the character sequence while there were still states to be matched. Partial matches answer the question: "could a match have been found if more input was provided?". They have proven to be extremely useful in two situations: for data input validation, where validation occurs a character at a time as the input is entered, and for long searches through character sequences that are too long to be loaded into memory and therefore have to be buffered (in this case the partial match allows the code to determine whether a match could occur at an overlap between two buffers)[23].
Many tools such as perl or sed offer algorithms for performing search and replace operations on a string. These take an input sequence, find all occurrences of a regular expression within it, and replace whatever matched with the result of merging a format string with what matched. The equivalent in the Boost library is the algorithm regex_merge
; there are two main forms, one sends the new text string to an output iterator, while the other returns a basic_string<>
object. In this proposal this algorithm has been renamed regex_replace
in order to make it a little more memorable:
template <class OutputIterator, class BidirectionalIterator, class traits, class Allocator, class charT> OutputIterator regex_replace(OutputIterator out, BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, unsigned int flags = match_default); template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, unsigned int flags = match_default);
It is common practice to want to find all of the (non-overlapping) occurrences of a regular expression within a character sequence. Boost uses the algorithm regex_grep
for this; this algorithm searches through the sequence, and for each match found calls a unary predicate with the results of the match. However there is a sense in which this is the wrong interface; basically it provides a "push" interface, where as the standard library normally uses "pull" interfaces in the form of iterators. Of course, one could just call regex_search
repeatedly, however these algorithms often have to do some housekeeping tasks (memory allocation is a typical example) each time they are called. By providing a separate interface for finding all the occurrences of an expression, the implementation can ensure that these housekeeping tasks occur only once, rather than once for each occurrence of the expression. This proposal does not provide the regex_grep
interface that Boost does, instead it provides a regex_iterator
, whose value_type is an instance of the match_results
class template.
template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_iterator;
Another common task is a perl-like split operation; to find all the non-overlapping occurrences of an expression within a character sequence, and for each occurrence spit out one or more new strings (usually one new string for each marked sub-expression). In Boost regex this is accomplished using the regex_split
algorithm, however like regex_grep
this algorithm has a "push" rather than a "pull" interface. In addition the semantics of the perl split function (on which the Boost algorithm is modeled) are rather strange; giving different behavior depending upon whether there are any marked sub-expressions or not. For these reasons, in this proposal an iterator-based interface (regex_token_iterator
) is provided that actually has a much more flexible interface than the Boost regex_split
function. The iterator can be constructed with an integer argument indicating the index of the sub-expression to enumerate for each match, in which case the iterator will enumerate one string for each match found. Alternatively the iterator can be constructed from an array literal, containing a list of the sub-expressions to iterate, in which case there will be several strings enumerated for each match found.
template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_token_iterator;
Typical usage would be for example:
extern std::string text; extern std::regex re; // // This iterator will enumerate one string for each occurrence of re in text, // each string will be equivalent to what perl calls $` (i.e. field splitting): std::regex_token_iterator<std::string::const_iterator> it1(text.begin(), text.end(), re, -1); // // This iterator will enumerate one string for each occurrence of re in text, // each string will be equivalent to what perl calls $& (the whole match): std::regex_token_iterator<std::string::const_iterator> it1(text.begin(), text.end(), re, 0); // // This iterator will enumerate three string's for each occurrence of re in text, // these will be what perl calls $1, $2 and $5 : std::regex_token_iterator<std::string::const_iterator> it1(text.begin(), text.end(), re, {1,2,5});
While all the algorithms discussed so far are iterator-based, no mention has been made so far of iterator requirements. The following options are available along with their pro's and con's:
InputIterator: this is discounted as an option, input iterators can not indicate what matched (as many of the algorithms must do), nor can they be used to match back-references.
ForwardIterator: this is a real option, forward iterators can be used to indicate what matched. In addition there is in practice a real world use case: multibyte character sequences can be converted to a single wide character "on the fly" using a forward iterator adapter. Note that some multi-byte sequences can not be represented by a bi-directional or random access iterator. There are downsides as well: requiring forward iterator support places a burden on the implementation, in particular matching boundary conditions (such as ^ or $) require that the implementation remembers the last character visited as well as the current position. In addition perl-style backtracking algorithms become much less efficient for single character repeats (a very common construct), since every backtrack-position has to be saved, rather than computed by decrementing an iterator. There are also some perl features (zero width look-behind assertions) that can not be matched using forward iterators, however these features are not required by the ECMAScript standard. It is unclear just how much of a burden forward iterator support would be in practice; some experimental evidence would be of help here.
BidirectionalIterator: This requirement is much easier to implement, and can additionally be used with some kinds of multibyte sequences (Unicode UTF8 encoding for example).
RandomAccessIterator: As with bidirectional iterators, this requirement is easy to implement, it also offers the potential for significant performance gains: for POSIX style matches, computing "leftmost longest" matches is much more efficient, while for perl style matches, skipping through greedy ".*" repeats are much more efficient, and for some expressions Boyer-Moore like search heuristics are possible.
Of the two template libraries discussed here, Boost requires bidirectional iterators, while GRETA doesn't document it's requirements (but appears to also require bidirectional iterators). In practice insisting on ForwardIterator support is probably excessive, while BidirectionIterators are relatively easy to support, in addition the kinds of performance gains that can be obtained from RandomAccessIterators can still be added using compile-time dispatch to different scanners depending upon the iterator type. This proposal currently documents that all iterators must be bidirectional, this can be tightened to a requirement to support forward iterators if practice shows that this is indeed possible (in the meantime implementations should be encouraged to support the widest range of iterator types possible).
ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects.
ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace.
IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 6.1, Portable Character Set.
IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions.
IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, awk.
IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep.
Collating element: A sequence of one or more characters within the current locale that collate as if they were a single character.
Finite state machine: An unspecified data structure that is used to represent a regular expression, and which permits efficient matches against the regular expression to be obtained.
Format specifier: A sequence of one or more characters that is to be replaced with some part of a regular expression match.
Matched: A sequence of zero or more characters shall be said to be matched by a regular expression when the characters in the sequence correspond to a sequence of characters defined by the pattern.
Partial match: A match that is obtained by matching one or more characters at the end of a character-container sequence, but only a prefix of the regular expression.
Primary equivalence class: A set of one or more characters which share the same primary sort key: that is the sort key weighting that depends only upon character shape, and not accentation, case, or locale specific tailorings.
Regular expression: A pattern that selects specific strings from a set of character strings.
Sub-expression: A subset of a regular expression that has been marked by parenthesis.
This clause describes components used by C++ programs to describe regular expressions, and perform search and replace operations on character strings using those expressions.
Clause |
Header |
<regex> |
|
<regex> |
|
<regex> |
|
<regex> |
|
<regex> |
|
<regex> |
|
<regex> |
|
<regex> |
This subclause defines requirements on classes representing regular expression traits, and defines a class template regex_traits<charT> that satisfies those requirements.
The class template basic_regex needs a set of related types and functions to complete the definition of it's semantics. These types and functions are provided as a set of member typedefs and functions in the template parameter `traits' used by the basic_regex template. This subclause defines the semantics guaranteed by these members.
To specialize the basic_regex template to generate a regular expression class to handle a particular character container type CharT, that and its related regular expression traits class Traits is passed as a pair of parameters to the basic_regex template as formal parameters charT and Traits.
In Table RE2 X denotes a traits class defining types and functions for the character container type charT; u is an object of type X; v is an object of type const X; p is a value of type const charT*; I1 and I2 are Input Iterators; c is a value of type const charT; s is an object of type X::string_type; cs is an object of type const X::string_type; b is a value of type bool; I is a value of type int; and loc is an object of type X::locale_type.
Expression |
Return type |
Assertion / Note |
X::char_type |
charT |
The character container type used in the implementation of class template |
X::size_type |
|
An unsigned integer type, capable of holding the length of a null-terminated string of charT's. |
X::string_type |
std::basic_string<charT> |
|
X::locale_type |
Implementation defined |
A copy constructible type that represents the locale used by the traits class. |
X::char_class_type |
Implementation defined |
A bitmask type representing a particular character classification. Multiple values of this type can be bitwise-or'ed together to obtain a new valid value. |
X::version_tag |
Either regex_traits_version_1_tag or a class that publicly inherits from regex_traits_version_1_tag. |
A type that signifies the interface to which this traits class conforms. |
X::sentry |
Implementation defined |
A type used to force initialization of an instance of this traits class. An instance of this type must be copy constructed from a traits class instance, and verified that it does not compare equal to a null pointer constant, before any member function of that traits class instance may be called other than |
X::sentry s(u); |
void |
An object of type |
X::sentry(u) |
Convertible to bool. |
Conversion of the |
X::length(p) |
X::size_type |
Yields the smallest |
v.syntax_type(c) |
regex_constants::syntax_type |
Returns a symbolic value of type |
v.escape_syntax_type(c) |
regex_constants::escape_syntax_type |
Returns a symbolic value of type |
v.translate(c, b) |
X::char_type |
Returns a character |
v.transform(cs) |
X::string_type |
Returns a sort key for string |
v.transform_primary(cs) |
X::string_type |
Returns a primary sort key for string |
v.lookup_classname(cs) |
X::char_class_type |
Converts the string |
v.lookup_collatename(cs) |
X::string_type |
Returns a string containing the collating element named by |
v.is_class(c, v.lookup_classname (cs)) |
bool |
Returns true if character |
v.toi(I1, I2, i) |
int |
Behaves as follows: if |
u.imbue(loc) |
X::locale_type |
Imbues |
v.getloc() |
X::locale_type |
Returns the current locale used by |
v.error_string(i) |
std::string |
Returns a human readable error string for the error condition |
The header <regex> defines the class template regex_traits which shall be capable of being specialized for character container types char and wchar_t, and which satisfies the requirements for a regular expression traits class. Class template regex_traits is described in section RE.3.3.
The header <regex> defines a basic regular expression class template and its traits that can handle all char-like (lib.strings) template arguments.
The header <regex> defines a container-like class template that holds the result of a regular expression match.
The header <regex> defines a series of algorithms that allow an iterator sequence to be operated upon by a regular expression.
The header <regex> defines two specific template classes, regex
and wregex
and their special traits.
The header <regex> also defines two iterator types for enumerating regular expression matches.
namespace std{ namespace regex_constants{ typedef bitmask_type syntax_option_type; typedef bitmask_type match_flag_type; typedef implementation defined syntax_type; typedef implementation defined escape_syntax_type; typedef implementation defined error_type; } // namespace regex_constants class bad_expression; template <class charT> struct regex_traits; template <class charT, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class basic_regex; template <class charT, class traits, class Allocator> bool operator == (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class traits, class Allocator> bool operator != (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class traits, class Allocator> bool operator < (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class traits, class Allocator> bool operator <= (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class traits, class Allocator> bool operator >= (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class traits, class Allocator> bool operator > (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs); template <class charT, class io_traits, class re_traits, class Allocator> basic_ostream<charT, io_traits>& operator << (basic_ostream<charT, io_traits>& os, const basic_regex<charT, re_traits, Allocator>& e); template <class charT, class traits, class Allocator> void swap(basic_regex<charT, traits, Allocator>& e1, basic_regex<charT, traits, Allocator>& e2); typedef basic_regex<char> regex; typedef basic_regex<wchar_t> wregex; template <class BidirectionalIterator> class sub_match; template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator == (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator != (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator < (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator > (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator, class traits, class Allocator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, const std::basic_string<iterator_traits<BidirectionalIterator> ::value_type, traits, Allocator>& rhs); template <class BidirectionalIterator> bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs); template <class BidirectionalIterator> bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs); template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs); template <class charT, class traits, class BidirectionalIterator> basic_ostream<charT, traits>& operator << (basic_ostream<charT, traits>& os, const sub_match<BidirectionalIterator>& m); template <class BidirectionalIterator, class Allocator = allocator< typename iterator_traits<BidirectionalIterator>::value_type > > class match_results; template <class BidirectionalIterator, class Allocator> bool operator == (const match_results<BidirectionalIterator, Allocator>& m1, const match_results<BidirectionalIterator, Allocator>& m2); template <class BidirectionalIterator, class Allocator> bool operator != (const match_results<BidirectionalIterator, Allocator>& m1, const match_results<BidirectionalIterator, Allocator>& m2); template <class charT, class traits, class BidirectionalIterator, class Allocator> basic_ostream<charT, traits>& operator << (basic_ostream<charT, traits>& os, const match_results<BidirectionalIterator, Allocator>& m); template <class BidirectionalIterator, class Allocator> void swap(match_results<BidirectionalIterator, Allocator>& m1, match_results<BidirectionalIterator, Allocator>& m2); typedef match_results<const char*> cmatch; typedef match_results<const wchar_t*> wcmatch; typedef match_results<string::const_iterator> smatch; typedef match_results<wstring::const_iterator> wsmatch; template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits, class Allocator2> bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class traits, class Allocator2> bool regex_match(const charT* str, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class ST, class SA, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_search(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits, class Allocator2> bool regex_search(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class BidirectionalIterator, class Allocator, class charT, class traits> bool regex_search(BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default); template <class charT, class Allocator, class traits> bool regex_search(const charT* str const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits> bool regex_search(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default); template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_search(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default); template <class OutputIterator, class BidirectionalIterator, class traits, class Allocator, class charT> OutputIterator regex_replace(OutputIterator out, BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default); template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default); // regular expression iterators: template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_iterator; template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_token_iterator; } // namespace std
The namespace std::regex_constants
acts as a repository for the symbolic constants used by the regular expression library.
The namespace std::regex_constants
defines five types: syntax_option_type, match_flag_type, syntax_type, escape_syntax_type and error_type
, along with a series of constants of these types.
namespace std{ namespace regex_constants{ typedef bitmask_type syntax_option_type; // these flags are required: static const syntax_option_type normal; static const syntax_option_type icase; static const syntax_option_type nosubs; static const syntax_option_type optimize; static const syntax_option_type collate; static const syntax_option_type ECMAScript = normal; static const syntax_option_type JavaScript = normal; static const syntax_option_type JScript = normal; // these flags are optional, if the functionality is supported // then the flags shall take these names. static const syntax_option_type basic; static const syntax_option_type extended; static const syntax_option_type awk; static const syntax_option_type grep; static const syntax_option_type egrep; static const syntax_option_type sed = basic; static const syntax_option_type perl; } // namespace regex_constants } // namespace std
The type syntax_option_type
is an implementation defined bitmask type (17.3.2.1.2). Setting its elements has the effects listed in table RE3, a valid value of type syntax_option_type
will always have exactly one of the elements normal, basic, extended, awk, grep, egrep, sed or perl
set:
syntax_option_type
effects.
Element |
Effect if set |
normal |
Specifies that the grammar recognized by the regular expression engine uses its normal semantics: that is the same as that given in the ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1). |
icase |
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case. |
nosubs |
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure. |
optimize |
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. |
collate |
Specifies that character ranges of the form "[a-b]" should be locale sensitive. |
ECMAScript |
The same as normal. |
JavaScript |
The same as normal. |
JScript |
The same as normal. |
basic |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1). |
extended |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX extended regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1). |
awk |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk (FWD.1). |
grep |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1). |
egrep |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep when given the -E option in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1). |
sed |
The same as basic. |
perl |
Specifies that the grammar recognized by the regular expression is an implementation defined extension of the normal syntax. |
namespace std{ namespace regex_constants{ typedef bitmask_type match_flag_type; static const match_flag_type match_default = 0; static const match_flag_type match_not_bol; static const match_flag_type match_not_eol; static const match_flag_type match_not_bow; static const match_flag_type match_not_eow; static const match_flag_type match_any; static const match_flag_type match_not_null; static const match_flag_type match_continuous; static const match_flag_type match_partial; static const match_flag_type match_prev_avail; static const match_flag_type format_default = 0; static const match_flag_type format_sed; static const match_flag_type format_perl; static const match_flag_type format_no_copy; static const match_flag_type format_first_only; } // namespace regex_constants } // namespace std
The type match_flag_type
is an implementation defined bitmask type (17.3.2.1.2). When matching a regular expression against a sequence of characters [first, last) then setting its elements has the effects listed in table RE4:
match_flag_type
effects when obtaining a match against a character container sequence [first,last).
Element |
Effect if set |
match_default |
Specifies that matching of regular expressions proceeds without any modification of the normal rules used in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1) |
match_not_bol |
Specifies that the expression "^" should not be matched against the sub-sequence [first,first). |
match_not_eol |
Specifies that the expression "$" should not be matched against the sub-sequence [last,last). |
match_not_bow |
Specifies that the expression "\b" should not be matched against the sub-sequence [first,first). |
match_not_eow |
Specifies that the expression "\b" should not be matched against the sub-sequence [last,last). |
match_any |
Specifies that if more than one match is possible then any match is an acceptable result. |
match_not_null |
Specifies that the expression can not be matched against an empty sequence. |
match_continuous |
Specifies that the expression must match a sub-sequence that begins at first. |
match_partial |
Specifies that if no match can be found, then it is acceptable to return a match [from, last) where from!=last, if there exists some sequence of characters [from,to) of which [from,last) is a prefix, and which would result in a full match. |
match_prev_avail |
Specifies that |
format_default |
Specifies that when a regular expression match is to be replaced by a new string, that the new string is constructed using the rules used by the ECMAScript replace function in ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace. (FWD.1). In addition during search and replace operations then all non-overlapping occurrences of the regular expression are located and replaced, and sections of the input that did not match the expression, are copied unchanged to the output string. |
format_sed |
Specifies that when a regular expression match is to be replaced by a new string, that the new string is constructed using the rules used by the Unix sed utility in IEEE Std 1003.1-2001, Portable Operating SystemInterface (POSIX ), Shells and Utilities.. |
format_perl |
Specifies that when a regular expression match is to be replaced by a new string, that the new string is constructed using an implementation defined superset of the rules used by the ECMAScript replace function in ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace (FWD.1). |
format_no_copy |
When specified during a search and replace operation, then sections of the character container sequence being searched that do match the regular expression, are not copied to the output string. |
format_first_only |
When specified during a search and replace operation, then only the first occurrence of the regular expression is replaced. |
namespace std{ namespace regex_constants{ typedef implementation defined syntax_type; static const syntax_type syntax_char; static const syntax_type syntax_open_mark; static const syntax_type syntax_close_mark; static const syntax_type syntax_dollar; static const syntax_type syntax_caret; static const syntax_type syntax_dot; static const syntax_type syntax_star; static const syntax_type syntax_plus; static const syntax_type syntax_question; static const syntax_type syntax_open_set; static const syntax_type syntax_close_set; static const syntax_type syntax_or; static const syntax_type syntax_escape; static const syntax_type syntax_dash; static const syntax_type syntax_open_brace; static const syntax_type syntax_close_brace; static const syntax_type syntax_digit; static const syntax_type syntax_comma; static const syntax_type syntax_equal; static const syntax_type syntax_colon; static const syntax_type syntax_not; } // namespace regex_constants } // namespace std
The type syntax_type
is an implementation defined enumeration type (17.3.2.1.2). Values of type syntax_type
represent how individual characters should be interpreted within a localized regular expression grammar, table RE5 shows which special characters defined in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects, are equivalent to which syntax_type
values in the C locale:
syntax_type
values in the C locale.
Value |
Equivalent character(s) in the syntax specified by ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects. |
syntax_char |
Any character not listed below. |
syntax_open_mark |
( |
syntax_close_mark |
) |
syntax_dollar |
$ |
syntax_caret |
^ |
syntax_dot |
. |
syntax_star |
* |
syntax_plus |
+ |
syntax_question |
? |
syntax_open_set |
[ |
syntax_close_set |
] |
syntax_or |
| |
syntax_escape |
\ |
syntax_dash |
- |
syntax_open_brace |
{ |
syntax_close_brace |
} |
syntax_digit |
0123456789 |
syntax_comma |
, |
syntax_equal |
= |
syntax_colon |
: |
syntax_not |
! |
namespace std{ namespace regex_constants{ typedef implementation defined escape_syntax_type; static const escape_syntax_type escape_type_word_assert; static const escape_syntax_type escape_type_not_word_assert; static const escape_syntax_type escape_type_control_f; static const escape_syntax_type escape_type_control_n; static const escape_syntax_type escape_type_control_r; static const escape_syntax_type escape_type_control_t; static const escape_syntax_type escape_type_ascii_control; static const escape_syntax_type escape_type_hex; static const escape_syntax_type escape_type_unicode; static const escape_syntax_type escape_type_identity; static const escape_syntax_type escape_type_backref; static const escape_syntax_type escape_type_decimal; static const escape_syntax_type escape_type_class; static const escape_syntax_type escape_type_not_class; } // namespace regex_constants } // namespace std
The type escape_syntax_type
is an implementation defined enumeration type (17.3.2.1.2). Values of type escape_syntax_type
represent how individual escaped characters should be interpreted within a localized regular expression grammar, table RE6 shows which special characters defined in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects, are equivalent to which escape_syntax_type
values in the C locale:
escape_syntax_type
values in the C locale.
Value |
Equivalent character(s) in syntax specified in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects |
escape_type_word_assert |
b |
escape_type_not_word_assert |
B |
escape_type_control_f |
f |
escape_type_control_n |
n |
escape_type_control_r |
r |
escape_type_control_t |
t |
escape_type_ascii_control |
c |
escape_type_hex |
x |
escape_type_unicode |
u |
escape_type_identity |
\ |^$*+?{}.()[] |
escape_type_backref |
123456789 |
escape_type_decimal |
0 |
escape_type_not_class |
Any upper case character not listed above. |
escape_type_class |
Any non-upper case character not listed above. |
namespace std{ namespace regex_constants{ typedef implementation defined error_type; static const error_type error_collate; static const error_type error_ctype; static const error_type error_escape; static const error_type error_subreg; static const error_type error_brack; static const error_type error_paren; static const error_type error_brace; static const error_type error_badbrace; static const error_type error_range; static const error_type error_space; static const error_type error_badrepeat; static const error_type error_complexity; static const error_type error_stack; } // namespace regex_constants } // namespace std
The type error_type
is an implementation defined enumeration type (17.3.2.1.2). Values of type error_type
represent the error conditions as described in table RE7:
values.
Value |
Error condition |
error_collate |
The expression contained an invalid collating element name. |
error_ctype |
The expression contained an invalid character class name. |
error_escape |
The expression contained an invalid escaped character, or a trailing escape. |
error_subreg |
The expression contained an invalid backreference. |
error_brack |
The expression contained mismatched [ and ]. |
error_paren |
The expression contained mismatched ( and ). |
error_brace |
The expression contained mismatched { and } |
error_badbrace |
The expression contained an invalid range in a {} expression. |
error_range |
The expression contained an invalid character range, for example [b-a]. |
error_space |
There was insufficient memory to convert the expression into a finite state machine. |
error_badrepeat |
One of *?+{ was not preceded by a valid regular expression. |
error_complexity |
The complexity of an attempted match against a regular expression exceeded a pre-set level. |
error_stack |
There was insufficient memory to determine whether the regular expression could match the specified character sequence. |
class bad_expression : public runtime_error { public: explicit bad_expression(const string& what_arg); };
The class bad_expression
defines the type of objects thrown as exceptions to report errors during the conversion from a string representing a regular expression to a finite state machine.
bad_expression(const string& what_arg);
Effects: Constructs an object of class bad_expression
.
Postcondition: strcmp(what(), what_arg.c_str()) == 0
.
namespace std{ template <class charT> struct regex_traits { public: typedef charT char_type; typedef std::size_t size_type; typedef std::basic_string<char_type> string_type; typedef std::locale locale_type; typedef bitmask_type char_class_type; struct sentry { sentry(regex_traits<charT>&); operator void*(); }; regex_traits(); static size_type length(const char_type* p); regex_constants::syntax_type syntax_type(charT c)const; regex_constants::escape_syntax_type escape_syntax_type(charT c) const; charT translate(charT c, bool icase) const; string_type transform(const string_type& in) const; string_type transform_primary(const string_type& in) const; char_class_type lookup_classname(const string_type& name) const; string_type lookup_collatename(const string_type& name) const; bool is_class(charT c, char_class_type f) const; template<class InputIterator> int toi(InputIterator& first, InputIterator last, int radix) const; locale_type imbue(locale_type l); locale_type getloc()const; std::string error_string(regex_constants::error_type) const; }; } // namespace std
The class template regex_traits is capable of being specialized for the types char
and wchar_t
and satisfies the requirements for a regular expression traits class (RE.1.1).
typedef bitmask_type char_class_type;
The type char_class_type
is used to represent a character classification and is capable of holding an implementation defined superset of the values held by ctype_base::mask (22.2.1).
struct sentry { sentry(regex_traits<charT>&); operator void*(); };
An object of type regex_traits<charT>::sentry shall be constructed from a regex_traits object, and tested to be not equal to null, before any of the member functions of that object other than length
, getloc
, and imbue
shall be called. Type sentry performs implementation defined initialization of the traits class object, and represents an opportunity for the traits class to cache data obtained from the locale object.
static size_type length(const char_type* p);
Effects: returns char_traits<charT>::length(p);
regex_constants::syntax_type syntax_type(charT c) const;
Effects: for a character c in the right hand column of table RE5, returns the corresponding regex_constants::syntax_type
value.
regex_constants::escape_syntax_type escape_syntax_type(charT c) const;
Effects: for a character c in the right hand column of table RE6, returns the corresponding regex_constants::escape_syntax_type
value.
charT translate(charT c, bool icase) const;
Effects: returns (icase ? use_facet<ctype<charT> >(getloc()).tolower(c) : c)
string_type transform(const string_type& in) const;
Effects: returns use_facet<collate<charT> >(getloc()).transform(in.begin(), in.end()).
string_type transform_primary(const string_type& in) const;
Effects: if typeid(use_facet<collate<charT> >) == typeid(collate_byname<charT>)
and the form of the sort key returned by collate_byname<charT>::transform
is known and can be converted into a primary sort key, then returns that key, otherwise returns a empty string.
char_class_type lookup_classname(const string_type& name) const;
Effects: returns an implementation defined value that represents the character classification name
. If name
is not recognized then returns a value that compares equal to 0. At least the names "d", "w", "s", "alnum", "alpha", "blank", "cntrl", "digit", "graph", "lower", "print", "punct", "space", "upper" and "xdigit", shall be recognized. The value returned shall be independent of the case of the characters in name
.
string_type lookup_collatename(const string_type& name) const;
Effects: returns the sequence of one or more characters that represents the collating element name
. Returns an empty string if name
is not recognized. At least the names specified in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 6.1, Portable Character Set shall be recognized.
bool is_class(charT c, char_class_type f) const;
Effects: determines if the character c is a member of the character classification represented by f.
Returns: converts f into a value m
of type ctype_base::mask
in an implementation defined manner, and returns true if use_facet<ctype<charT> >(getloc()).is(c, m)
is true. Otherwise returns true if f & lookup_classname("w") == lookup_classname("w")
and c == '_'
, otherwise returns false.
template <class InputIterator> int toi(InputIterator& first, InputIterator last, int radix) const;
Precondition: argument radix
shall take one of the values 8, 10 or 16.
Effects: constructs an object result
of type int
. If first == last
or if is_class(*first, lookup_classname("d")) == false
then sets result
equal to -1. Otherwise constructs a basic_istream<charT>
object which uses an implementation defined stream buffer type which represents the character sequence [first,last), and sets the format flags on that object as appropriate for argument radix
. Performs unsigned integer formatted input on the basic_istream<charT>
object setting the value result
to the value obtained, and updates first
to point to the first non-digit character in the sequence [first,last).
Returns: the value result
.
Postcondition: is_class(*first, lookup_classname("d")) == false
.
locale_type imbue(locale_type loc);
Imbues this with a copy of the locale loc. The effect of calling imbue is to invalidate all cached data held by this
, no member functions other than length
, getloc
, and imbue
may be called until an object of type sentry
has been copy-constructed from *this
.
Returns: if no locale has been previously imbued then a copy of the global locale in effect at the time of construction of this
, otherwise a copy of the last argument passed to imbue
.
Postcondition: getloc() == loc
.
locale_type getloc()const;
Effects: if no locale has been imbued then a copy of the global locale in effect at the time of construction of this
, otherwise a copy of the last argument passed to imbue
.
std::string error_string(regex_constants::error_type e) const;
Effects: returns a human readable error string for error condition e.
For a char-like type charT
, the template class basic_regex
describes objects that represent a regular expression constructed from a sequence of charT
's. In the rest of this clause, charT
denotes such a given char-like type. Storage for the regular expression is allocated and freed as necessary by the member functions of class basic_regex
.
The template class basic_regex
conforms to the requirements of a Sequence, as specified in (lib.sequence.reqmts), except that only operations defined for const
-qualified Sequences are supported.
Objects of type specialization of basic_regex are responsible for converting the sequence of charT
objects to an internal representation. It is not specified what form this representation takes, nor how it is accessed by algorithms that operate on regular expressions (implementations will typically declare some template functions as friends of basic_regex
to achieve this).
The regular expression grammar recognized by type specialization of basic_regex is described in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1), and is modified according to any syntax_option_type
flags specified when constructing the object or assigning a regular expression to it (RE.3.1.1). In addition to the features specified in ECMA-262, the following features shall also be recognized:
\d \D \w \W \s \S
are sensitive to the locale encapsulated by the traits class. [[:class-name:]]
are recognized, and are sensitive to the locale encapsulated by the traits class. The range of values for class-name is determined by the traits class, but at least the following names are recognized: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit
. [[.collating-name.]]
are recognized, the range of values for collating-name
is determined by the traits class. [[=collating-name=]]
are recognized, the range of values for collating-name
is determined by the traits class. [a-b]
are locale sensitive when the flag std::regex_constants::collate
is passed to the regular expression constructor.The grammar used is localized by interaction with the traits class template parameter as follows:
Objects of type specialization of basic_regex store within themselves a default-constructed instance of their traits
template parameter, henceforth referred to as traits_inst. This traits_inst object is used to support localization of the regular expression; no basic_regex object shall call any locale dependent C or C++ API, including the formatted string input functions, instead it shall call the appropriate traits member function to achieve the required effect.
The transformation from a sequence of characters to a finite state machine is accomplished by first by transforming the sequence of characters to the sequence of tokens obtained by calling traits_inst.syntax_type(c)
for each input character c. The regular expression grammar is then applied to the sequence of tokens in order to construct the finite state machine (this is to allow the regular expression syntax to be localized to a specific character set; for example to use Far Eastern ideographs, rather than Latin characters). Where traits_inst.syntax_type(c)
returns syntax_escape
, then the implementation shall call traits_inst.escape_syntax_type(c)
for the character following the escape, in order to determine the grammatical meaning of the escape sequence. When traits_inst.escape_syntax_type(c)
returns escape_type_class
or escape_type_not_class
, then the single character following the escape is converted to a string n
, and passed to traits_inst.lookup_classname(n)
, to determine the character classification to be matched against. If traits_inst.lookup_classname(n)
returns null, then the escape is treated as an identity escape.
Where the regular expression grammar requires the conversion of a sequence of characters to an integral value, this is accomplished by calling traits::toi
.
Where the regular expression grammar requires the conversion of a sequence of characters representing a named collating element to the character(s) it represents, this is accomplished by calling traits::
lookup_collatename.
Where the regular expression grammar requires the conversion of a sequence of characters representing a named character class to an internal representation, this is accomplished by calling traits::lookup_classname
. The results from subsequent calls to this function can be bitwise OR'ed together and subsequently passed to traist::is_class
.
The behavior of the internal finite state machine representation, when used to match a sequence of characters is as described in ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1). The behavior is modified according to any match_flag_type
flags specified (RE.3.1.2) when using the regular expression object in one of the regular expression algorithms (RE.7). The behavior is also localized by interaction with the traits class template parameter as follows:
During matching of a regular expression finite state machine against a sequence of characters, two characters c and d are compared using traits_inst.translate(c, getflags() & regex_constants::icase) == traits_inst.translate(d, getflags() & regex_constants::icase)
.
During matching of a regular expression finite state machine against a sequence of characters, comparison of a collating element range c1-c2
against a character c
is conducted as follows: if getflags() & regex_constants::collate
is true, then the character c
is matched if traits_inst.transform(string_type(1,c1)) <= traits_inst.transform(string_type(1,c)) && traits_inst.transform(string_type(1,c)) <= traits_inst.transform(string_type(1,c2))
, otherwise c
is matched if c1 <= c && c <= c2
.
During matching of a regular expression finite state machine against a sequence of characters, testing whether a collating element is a member of a primary equivalence class is conducted by first converting the collating element and the equivalence class to a sort keys using traits::transform_primary
, and then comparing the sort keys for equality.
During matching of a regular expression finite state machine against a sequence of characters, a character c
is a member of character class some_name
, if traits_inst.is_class(c, traits_inst.lookup_classname("some_name"))
.
The functions described in this clause can report errors by throwing exceptions of type bad_expression
.
namespace std{ template <class charT, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class basic_regex { public: // types: typedef charT value_type; typedef implementation defined const_iterator; typedef const_iterator iterator; typedef typename Allocator::reference reference; typedef typename Allocator::const_reference const_reference; typedef typename Allocator::difference_type difference_type; typedef typename Allocator::size_type size_type; typedef Allocator allocator_type; typedef regex_constants::syntax_option_type flag_type; typedef typename traits::locale_type locale_type; // constants: static const regex_constants::syntax_option_type normal = regex_constants::normal; static const regex_constants::syntax_option_type icase = regex_constants::icase; static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs; static const regex_constants::syntax_option_type optimize = regex_constants::optimize; static const regex_constants::syntax_option_type collate = regex_constants::collate; static const regex_constants::syntax_option_type ECMAScript = normal; static const regex_constants::syntax_option_type JavaScript = normal; static const regex_constants::syntax_option_type JScript = normal; // these flags are optional, if the functionality is supported // then the flags shall take these names. static const regex_constants::syntax_option_type basic = regex_constants::basic; static const regex_constants::syntax_option_type extended = regex_constants::extended; static const regex_constants::syntax_option_type awk = regex_constants::awk; static const regex_constants::syntax_option_type grep = regex_constants::grep; static const regex_constants::syntax_option_type egrep = regex_constants::egrep; static const regex_constants::syntax_option_type sed = basic = regex_constants::sed; static const regex_constants::syntax_option_type perl = regex_constants::perl; // construct/copy/destroy: explicit basic_regex(const Allocator& a = Allocator()); explicit basic_regex(const charT* p, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); basic_regex(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator()); basic_regex(const basic_regex&); template <class ST, class SA> explicit basic_regex(const basic_string<charT, ST, SA>& p, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); template <class InputIterator> basic_regex(InputIterator first, inputIterator last, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); ~basic_regex(); basic_regex& operator=(const basic_regex&); basic_regex& operator=(const charT* ptr); template <class ST, class SA> basic_regex& operator=(const basic_string<charT, ST, SA>& p); // iterators: const_iterator begin() const; const_iterator end() const; // capacity: size_type size() const; size_type max_size() const; bool empty() const; unsigned mark_count() const; // // modifiers: basic_regex& assign(const basic_regex& that); basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal); basic_regex& assign(const charT* first, const charT* last, flag_type f = regex_constants::normal); template <class string_traits, class A> basic_regex& assign(const basic_string<charT, string_traits, A>& s, flag_type f = regex_constants::normal); template <class InputIterator> basic_regex& assign(InputIterator first, InputIterator last, flag_type f = regex_constants::normal); // const operations: Allocator get_allocator() const; flag_type getflags() const; basic_string<charT> str() const; int compare(basic_regex&) const; // locale: locale_type imbue(locale_type loc); locale_type getloc() const; // swap void swap(basic_regex&) throw(); }; } // namespace std
static const regex_constants::syntax_option_type normal = regex_constants::normal; static const regex_constants::syntax_option_type icase = regex_constants::icase; static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs; static const regex_constants::syntax_option_type optimize = regex_constants::optimize; static const regex_constants::syntax_option_type collate = regex_constants::collate; static const regex_constants::syntax_option_type ECMAScript = normal; static const regex_constants::syntax_option_type JavaScript = normal; static const regex_constants::syntax_option_type JScript = normal; // these flags are optional, if the functionality is supported // then the flags shall take these names. static const regex_constants::syntax_option_type basic = regex_constants::basic; static const regex_constants::syntax_option_type extended = regex_constants::extended; static const regex_constants::syntax_option_type awk = regex_constants::awk; static const regex_constants::syntax_option_type grep = regex_constants::grep; static const regex_constants::syntax_option_type egrep = regex_constants::egrep; static const regex_constants::syntax_option_type sed = basic = regex_constants::sed; static const regex_constants::syntax_option_type perl = regex_constants::perl;
The static constant members are provided as synonyms for the constants declared in namespace std::regex_constants
; for each constant of type syntax_option_type
declared in namespace std::regex_constants
then a constant with the same name, type and value shall be declared within the scope of basic_regex
.
In all basic_regex
constructors, a copy of the Allocator
argument is used for any memory allocation performed by the constructor or member functions during the lifetime of the object.
basic_regex(const Allocator& a = Allocator());
Effects: Constructs an object of class basic_regex
. The postconditions of this function are indicated in Table RE8:
Element |
Value |
empty() |
true |
size() |
0 |
str() |
basic_string<charT>() |
basic_regex(const charT* p, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
Requires: p shall not be a null pointer.
Throws: bad_expression
if p is not a valid regular expression.
Effects: Constructs an object of class basic_regex
; the object's internal finite state machine is constructed from the regular expression contained in the null-terminated string p, and interpreted according to the flags specified in f. The postconditions of this function are indicated in Table RE9:
Element |
Value |
empty() |
false |
size() |
char_traits<charT>::length(p) |
str() |
basic_string<charT>(p) |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
Requires: p1 and p2 are not null pointers, p1 < p2
.
Throws: bad_expression
if [p1,p2) is not a valid regular expression.
Effects: Constructs an object of class basic_regex
; the object's internal finite state machine is constructed from the regular expression contained in the sequence of characters [p1,p2), and interpreted according the flags specified in f. The postconditions of this function are indicated in Table RE10:
Element |
Value |
empty() |
false |
size() |
std::distance(p1,p2) |
str() |
basic_string<charT>(p1,p2) |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
basic_regex(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator());
Requires: p shall not be a null pointer, len < max_size()
.
Throws: bad_expression
if p is not a valid regular expression.
Effects: Constructs an object of class basic_regex
; the object's internal finite state machine is constructed from the regular expression contained in the sequence of characters [p, p+len), and interpreted according the flags specified in f. The postconditions of this function are indicated in Table RE11:
Element |
Value |
empty() |
false |
size() |
len |
str() |
basic_string<charT>(p, len) |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
basic_regex(const basic_regex& e);
Effects: Constructs an object of class basic_regex
as a copy of the object e. The postconditions of this function are indicated in Table RE12:
Element |
Value |
empty() |
e.empty() |
size() |
e.size() |
str() |
e.str() |
getflags() |
e.getflags() |
mark_count() |
e.mark_count() |
template <class ST, class SA> basic_regex(const basic_string<charT, ST, SA>& s, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
Throws: bad_expression
if s is not a valid regular expression.
Effects: Constructs an object of class basic_regex
; the object's internal finite state machine is constructed from the regular expression contained in the string s, and interpreted according to the flags specified in f. The postconditions of this function are indicated in Table RE13:
Element |
Value |
empty() |
false |
size() |
s.size() |
str() |
s |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
template <class ForwardIterator> basic_regex(ForwardIterator first, ForwardIterator last, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
Throws: bad_expression
if the sequence [first, last) is not a valid regular expression.
Effects: Constructs an object of class basic_regex
; the object's internal finite state machine is constructed from the regular expression contained in the sequence of characters [first, last), and interpreted according to the flags specified in f. The postconditions of this function are indicated in Table RE14:
Element |
Value |
empty() |
false |
size() |
distance(first,last) |
str() |
basic_string<charT>(first,last) |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
basic_regex& operator=(const basic_regex& e);
Effects: Returns the result of assign(e.str(), e.getflags())
.
basic_regex& operator=(const charT* ptr);
Requires: p shall not be a null pointer.
Effects: Returns the result of assign(ptr)
.
template <class ST, class SA> basic_regex& operator=(const basic_string<charT, ST, SA>& p);
Effects: Returns the result of assign(p)
.
const_iterator begin() const;
Effects: Returns a starting iterator to a sequence of characters representing the regular expression.
const_iterator end() const;
Effects: Returns termination iterator to a sequence of characters representing the regular expression.
size_type size() const;
Effects: Returns the length of the sequence of characters representing the regular expression.
size_type max_size() const;
Effects: Returns the maximum length of the sequence of characters representing the regular expression.
bool empty() const;
Effects: Returns true if the object does not contain a valid regular expression, otherwise false.
unsigned mark_count() const;
Effects: Returns the number of marked sub-expressions within the regular expresion.
basic_regex& assign(const basic_regex& that);
Effects: Returns assign(that.str(), that.getflags())
.
basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal);
Effects: Returns assign(string_type(ptr), f)
.
basic_regex& assign(const charT* first, const charT* last, flag_type f = regex_constants::normal);
Effects: Returns assign(string_type(first, last), f)
.
template <class string_traits, class A> basic_regex& assign(const basic_string<charT, string_traits, A>& s, flag_type f = regex_constants::normal);
Throws: bad_expression
if s is not a valid regular expression.
Returns: *this
.
Effects: Assigns the regular expression contained in the string s, interpreted according the flags specified in f. The postconditions of this function are indicated in Table RE15:
Element |
Value |
empty() |
false |
size() |
s.size() |
str() |
s |
getflags() |
f |
mark_count() |
The number of marked sub-expressions within the expression. |
template <class InputIterator> basic_regex& assign(InputIterator first, InputIterator last, flag_type f = regex_constants::normal);
Requires: The type InputIterator corresponds to the Input Iterator requirements (24.1.1).
Effects: Returns assign(string_type(first, last), f)
.
Allocator get_allocator() const;
Effects: Returns a copy of the Allocator that was passed to the object's constructor.
flag_type getflags() const;
Effects: Returns a copy of the regular expression syntax flags that were passed to the object's constructor, or the last call to assign.
basic_string<charT> str() const;
Effects: Returns a copy of the character sequence passed to the object's constructor, or the last call to assign.
int compare(basic_regex& e)const;
Effects: If getflags() == e.getflags()
then returns str().compare(e.str())
, otherwise returns getflags() - e.getflags()
.
locale_type imbue(locale_type l);
Effects: Returns the result of traits_inst.imbue(l)
where traits_inst
is a (default initialized) instance of the template parameter traits
stored within the object. Calls to imbue invalidate any currently contained regular expression.
Postcondition: empty() == true
.
locale_type getloc() const;
Effects: Returns the result of traits_inst.getloc()
where traits_inst
is a (default initialized) instance of the template parameter traits
stored within the object.
void swap(basic_regex& e) throw();
Effects: Swaps the contents of the two regular expressions.
Postcondition: *this
contains the characters that were in e, e contains the regular expression that was in *this
.
Complexity: constant time.
template <class charT, class traits, class Allocator> bool operator == (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) == 0
.
template <class charT, class traits, class Allocator> bool operator != (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) != 0
.
template <class charT, class traits, class Allocator> bool operator < (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) < 0
.
template <class charT, class traits, class Allocator> bool operator <= (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) <= 0
.
template <class charT, class traits, class Allocator> bool operator >= (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) >= 0
.
template <class charT, class traits, class Allocator> bool operator > (const basic_regex<charT, traits, Allocator>& lhs, const basic_regex<charT, traits, Allocator>& rhs);
Effects: Returns lhs.compare(rhs) > 0
.
template <class charT, class io_traits, class re_traits, class Allocator> basic_ostream<charT, io_traits>& operator << (basic_ostream<charT, io_traits>& os const basic_regex<charT, re_traits, Allocator>& e);
Effects: Returns (os << e.str()).
template <class charT, class traits, class Allocator> void swap(basic_regex<charT, traits, Allocator>& lhs, basic_regex<charT, traits, Allocator>& rhs);
Effects: calls lhs.swap(rhs)
.
Template class sub_match denotes the sequence of characters matched by a particular marked sub-expression.
When the marked sub-expression denoted by an object of type sub_match<> participated in a regular expression match then member matched
evaluates to true, and members first
and second
denote the range of characters [first,second)
which formed that match. Otherwise matched
is false, and members first
and second
contained undefined values.
If an object of type sub_match<>
represents sub-expression 0 - that is to say the whole match - then member matched
is always true, unless a partial match was obtained as a result of the flag match_partial
being passed to a regular expression algorithm, in which case member matched
is false, and members first
and second
represent the character range that formed the partial match.
template <class BidirectionalIterator> class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator> { public: typedef typename iterator_traits<BidirectionalIterator>::value_type value_type; typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; typedef BidirectionalIterator iterator; bool matched; difference_type length()const; operator basic_string<value_type>()const; basic_string<value_type> str()const; int compare(const sub_match& s)const; int compare(const basic_string<value_type>& s)const; int compare(const value_type* s)const; };
Static difference_type length();
Effects: returns (matched ? 0 : distance(first, second))
.
operator basic_string<value_type>()const;
Effects: returns (matched ? basic_string<value_type>(first, second) : basic_string<value_type>()).
basic_string<value_type> str()const;
Effects: returns (matched ? basic_string<value_type>(first, second) : basic_string<value_type>())
.
int compare(const sub_match& s)const;
Effects: returns str().compare(s.str())
.
int compare(const basic_string<value_type>& s)const;
Effects: returns str().compare(s)
.
int compare(const value_type* s)const;
Effects: returns str().compare(s)
.
template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) == 0
.
template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) != 0
.
template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) < 0
.
template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) <= 0
.
template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) >= 0
.
template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs.compare(rhs) > 0
.
template <class BidirectionalIterator> bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs == rhs.str()
.
template <class BidirectionalIterator> bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs != rhs.str()
.
template <class BidirectionalIterator> bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs < rhs.str()
.
template <class BidirectionalIterator> bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs > rhs.str()
.
template <class BidirectionalIterator> bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs >= rhs.str()
.
template <class BidirectionalIterator> bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs <= rhs.str()
.
template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() == rhs
.
template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() != rhs
.
template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() < rhs
.
template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() > rhs
.
template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() >= rhs
.
template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
Effects: returns lhs.str() <= rhs
.
template <class BidirectionalIterator> bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs == rhs.str()
.
template <class BidirectionalIterator> bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs != rhs.str()
.
template <class BidirectionalIterator> bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs < rhs.str()
.
template <class BidirectionalIterator> bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs > rhs.str()
.
template <class BidirectionalIterator> bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs >= rhs.str()
.
template <class BidirectionalIterator> bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs, const sub_match<BidirectionalIterator>& rhs);
Effects: returns lhs <= rhs.str()
.
template <class BidirectionalIterator> bool operator == (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() == rhs
.
template <class BidirectionalIterator> bool operator != (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() != rhs
.
template <class BidirectionalIterator> bool operator < (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() < rhs
.
template <class BidirectionalIterator> bool operator > (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() > rhs
.
template <class BidirectionalIterator> bool operator >= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() >= rhs
.
template <class BidirectionalIterator> bool operator <= (const sub_match<BidirectionalIterator>& lhs, typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
Effects: returns lhs.str() <= rhs
.
template <class charT, class traits, class BidirectionalIterator> basic_ostream<charT, traits>& operator << (basic_ostream<charT, traits>& os const sub_match<BidirectionalIterator>& m);
Effects: returns (os << m.str())
.
Template class match_results denotes a collection of character sequences representing the result of a regular expression match. Storage for the collection is allocated and freed as necessary by the member functions of class match_results.
The template class match_results conforms to the requirements of a Sequence, as specified in (lib.sequence.reqmts), except that only operations defined for const-qualified Sequences are supported.
template <class BidirectionalIterator, class Allocator = allocator<sub_match<BidirectionalIterator> > class match_results { public: typedef sub_match<BidirectionalIterator> value_type; typedef const value_type& const_reference; typedef const_reference reference; typedef implementation defined const_iterator; typedef const_iterator iterator; typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; typedef typename Allocator::size_type size_type; typedef Allocator allocator_type; typedef typename iterator_traits<BidirectionalIterator>::value_type char_type; typedef basic_string<char_type> string_type; // construct/copy/destroy: explicit match_results(const Allocator& a = Allocator()); match_results(const match_results& m); match_results& operator=(const match_results& m); ~match_results(); // size: size_type size() const; size_type max_size() const; bool empty() const; // element access: difference_type length(int sub = 0) const; difference_type position(unsigned int sub = 0) const; string_type str(int sub = 0) const; const_reference operator[](int n) const; const_reference prefix() const; const_reference suffix() const; const_iterator begin() const; const_iterator end() const; // format: template <class OutputIterator> OutputIterator format(OutputIterator out, const string_type& fmt, match_flag_type flags = format_default) const; string_type format(const string_type& fmt, match_flag_type flags = format_default) const; allocator_type get_allocator() const; void swap(match_results& that); };
In all match_results
constructors, a copy of the Allocator argument is used for any memory allocation performed by the constructor or member functions during the lifetime of the object.
match_results(const Allocator& a = Allocator());
Effects: Constructs an object of class match_results. The postconditions of this function are indicated in Table RE16:
Element |
Value |
empty() |
true |
size() |
0 |
str() |
basic_string<charT>() |
match_results(const match_results& m);
Effects: Constructs an object of class match_results, as a copy of m.
match_results& operator=(const match_results& m);
Effects: Assigns m to *this. The postconditions of this function are indicated in Table RE17:
Element |
Value |
empty() |
m.empty(). |
size() |
m.size(). |
str(n) |
m.str(n) for all integers n < m.size(). |
prefix() |
m.prefix(). |
suffix() |
m.suffix(). |
(*this)[n] |
m[n] for all integers n < m.size(). |
length(n) |
m.length(n) for all integers n < m.size(). |
position(n) |
m.position(n) for all integers n < m.size(). |
size_type size()const;
Effects: Returns the number of sub_match elements stored in *this.
size_type max_size()const;
Effects: Returns the maximum number of sub_match elements that can be stored in *this.
bool empty()const;
Effects: Returns size() == 0
.
difference_type length(int sub = 0)const;
Effects: Returns (*this)[sub].length()
.
difference_type position(unsigned int sub = 0)const;
Effects: Returns std::distance(prefix().first, (*this)[sub].first).
string_type str(int sub = 0)const;
Effects: Returns string_type((*this)[sub]).
const_reference operator[](int n) const;
Effects: Returns a reference to the sub_match
object representing the character sequence that matched marked sub-expression n. If n == 0
then returns a reference to a sub_match
object representing the character sequence that matched the whole regular expression.
const_reference prefix()const;
Effects: Returns a reference to the sub_match
object representing the character sequence from the start of the string being matched/searched, to the start of the match found.
const_reference suffix()const;
Effects: Returns a reference to the sub_match
object representing the character sequence from the end of the match found to the end of the string being matched/searched.
const_iterator begin()const;
Effects: Returns a starting iterator that enumerates over all the marked sub-expression matches stored in *this.
const_iterator end()const;
Effects: Returns a terminating iterator that enumerates over all the marked sub-expression matches stored in *this.
template <class OutputIterator> OutputIterator format(OutputIterator out, const string_type& fmt, match_flag_type flags = format_default);
Requires: The type OutputIterator conforms to the Output Iterator requirements (24.1.2).
Effects: Copies the character sequence [fmt.begin(), fmt.end()) to OutputIterator out. For each format specifier or escape sequence in fmt, replace that sequence with either the character(s) it represents, or the sequence of characters within *this to which it refers. The bitmasks specified in flags determines what format specifiers or escape sequences are recognized, by default this is the format used by ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace.
Returns: out.
string_type format(const string_type& fmt, match_flag_type flags = format_default);
Effects: Returns a copy of the string fmt. For each format specifier or escape sequence in fmt, replace that sequence with either the character(s) it represents, or the sequence of characters within *this to which it refers. The bitmasks specified in flags determines what format specifiers or escape sequences are recognized, by default this is the format used by ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace.
allocator_type get_allocator()const;
Effects: Returns a copy of the Allocator that was passed to the object's constructor.
void swap(match_results& that);
Effects: Swaps the contents of the two sequences.
Postcondition: *this
contains the sequence of matched sub-expressions that were in that
, that
contains the sequence of matched sub-expressions that were in *this
.
Complexity: constant time.
template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Requires: Type BidirectionalIterator meets the requirements of a Bidirectional Iterator (24.1.4).
Effects: Determines whether there is an exact match between the regular expression e, and all of the character sequence [first, last), parameter flags is used to control how the expression is matched against the character sequence. Returns true if such a match exists, false otherwise.
Postconditions: If the function returns false, then the effect on parameter m is undefined, otherwise the effects on parameter m are given in table RE18:
Element |
Value |
m.size() |
e.mark_count() |
m.empty() |
false |
m.prefix().first |
first |
m.prefix().last |
first |
m.prefix().matched |
false |
m.suffix().first |
last |
m.suffix().last |
last |
m.suffix().matched |
false |
m[0].first |
first |
m[0].second |
last |
m[0].matched |
if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set). |
m[n].first |
For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. |
m[n].second |
For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. |
m[n].matched |
For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise. |
template <class BidirectionalIterator, class charT, class traits, class Allocator2> bool regex_match(BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Behaves "as if" by constructing an instance of match_results<
BidirectionalIterator> what
, and then returning the result of regex_match(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits, class Allocator2> bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_match(str, str + char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_match(s.begin(), s.end(), m, e, flags)
.
template <class charT, class traits, class Allocator2> bool regex_match(const charT* str, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_match(str, str + char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class charT, class traits, class Allocator2> bool regex_match(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_match(s.begin(), s.end(), e, flags)
.
template <class BidirectionalIterator, class Allocator, class charT, class traits, class Allocator2> bool regex_search(BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Requires: Type BidirectionalIterator meets the requirements of a Bidirectional Iterator (24.1.4).
Effects: Determines whether there is some sub-sequence within [first,last) that matches the regular expression e, parameter flags is used to control how the expression is matched against the character sequence. Returns true if such a sequence exists, false otherwise.
Postconditions: If the function returns false, then the effect on parameter m is undefined, otherwise the effects on parameter m are given in table RE19:
Element |
Value |
m.size() |
e.mark_count() |
m.empty() |
false |
m.prefix().first |
first |
m.prefix().last |
m[0].first |
m.prefix().matched |
m.prefix().first != m.prefix().second |
m.suffix().first |
m[0].second |
m.suffix().last |
last |
m.suffix().matched |
m.suffix().first != m.suffix().second |
m[0].first |
The start of the sequence of characters that matched the regular expression |
m[0].second |
The end of the sequence of characters that matched the regular expression |
m[0].matched |
if a full match was found, and false if it was a partial match (found as a result of the match_partial flag being set). |
m[n].first |
For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. |
m[n].second |
For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. |
m[n].matched |
For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise. |
template <class charT, class Allocator, class traits, class Allocator2> bool regex_search(const charT* str, match_results<const charT*, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_search(str, str + char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2> bool regex_search(const basic_string<charT, ST, SA>& s, match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, const reg_expression<charT, traits, Allocator2>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_search(s.begin(), s.end(), m, e, flags)
.
template <class iterator, class Allocator, class charT, class traits> bool regex_search(iterator first, iterator last, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default);
Effects: Behaves "as if" by constructing an instance of match_results<
BidirectionalIterator> what
, and then returning the result of regex_search(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits> bool regex_search(const charT* str const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_search(str, str + char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class Allocator, class charT, class traits> bool regex_search(const basic_string<charT, ST, SA>& s, const reg_expression<charT, traits, Allocator>& e, match_flag_type flags = match_default);
Effects: Returns the result of regex_search(s.begin(), s.end(), e, flags)
.
template <class OutputIterator, class BidirectionalIterator, class traits, class Allocator, class charT> OutputIterator regex_replace(OutputIterator out, BidirectionalIterator first, BidirectionalIterator last, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);
Effects: Finds all the non-overlapping matches m of type match_results<BidirectionalIterator>
that occur within the sequence [first, last). If no such matches are found and !(flags & format_no_copy)
then calls std::copy(first, last, out)
. Otherwise, for each match found, if !(flags & format_no_copy)
calls std::copy(m.prefix().first, m.prefix().last, out)
, and then calls m.format(out, fmt, flags)
. Finally if !(flags & format_no_copy)
calls std::copy(last_m.suffix().first, last_m,suffix().last, out)
where last_m
is a copy of the last match found. If flags & format_first_only
is non-zero then only the first match found is replaced.
Returns: out
.
template <class traits, class Allocator, class charT> basic_string<charT> regex_replace(const basic_string<charT>& s, const reg_expression<charT, traits, Allocator>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);
Effects: Constructs an object basic_string<charT> result
, calls regex_replace(back_inserter(result), s.begin(), s.end(), e, fmt, flags)
, and then returns result
.
The class template regex_iterator
is an iterator adapter; that is to say it represents a new view of an existing iterator sequence, by enumerating all the occurrences of a regular expression within that sequence. regex_iterator
finds (using regex_search
) successive regular expression matches within the sequence from which it was constructed. After it is constructed, and every time operator++ is used, the iterator finds and stores a value of match_results<BidirectionalIterator>
. If the end of the sequence is reached (regex_search
returns false), the iterator becomes equal to the end-of-sequence iterator value. The constructor with no arguments regex_iterator()
always constructs an end of sequence iterator object, which is the only legitimate iterator to be used for the end condition. The result of operator*
on an end of sequence is not defined. For any other iterator value a const match_results<BidirectionalIterator>&
is returned. The result of operator->
on an end of sequence is not defined. For any other iterator value a const match_results<BidirectionalIterator>*
is returned. It is impossible to store things into regex_iterator
's. Two end-of-sequence iterators are always equal. An end-of-sequence iterator is not equal to a non-end-of-sequence iterator. Two non-end-of-sequence iterators are equal when they are constructed from the same arguments.
namespace std{ template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_iterator { public: typedef basic_regex<charT, traits, Allocator> regex_type; typedef match_results<BidirectionalIterator> value_type; typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; typedef typename iterator_traits<BidirectionalIterator>::pointer pointer; typedef typename iterator_traits<BidirectionalIterator>::reference reference; typedef std::forward_iterator_tag iterator_category; regex_iterator(); regex_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, match_flag_type m = match_default); regex_iterator(const regex_iterator&); regex_iterator& operator=(const regex_iterator&); bool operator==(const regex_iterator&); bool operator!=(const regex_iterator&); const value_type& operator*(); const value_type* operator->(); regex_iterator& operator++(); regex_iterator operator++(int); private: match_results<BidirectionalIterator> what; // for exposition only BidirectionalIterator end; // for exposition only const regex_type* pre; // for exposition only match_flag_type flags; // for exposition only }; } // namespace std
regex_iterator();
Effects: default constructs the members what,
end
and flags
, and sets the member pre
equal to the null-pointer constant.
regex_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, match_flag_type m = match_default);
Effects: default constructs all data members, and then calls regex_search(a, b, what, re, m)
, if this returns false then sets *this
equal to the end of sequence iterator, otherwise sets end
equal to b
, pre
equal to &re
, and flags
equal to m
.
regex_iterator(const regex_iterator& that);
Effects: constructs all data members as a copy of that
.
Postconditions: *this == that
.
regex_iterator& operator=(const regex_iterator&);
Effects: sets all data members equal to those in that
.
Postconditions: *this == that
.
bool operator==(const regex_iterator& that);
Effects: if (pre == 0) && (that.pre == 0)
then returns true, otherwise returns the result of (pre == that.pre) && (end == that.end) && (flags == that.flags) && (what[0].first == that.what[0].first) && (what[0].second == that.what[0].second)
.
bool operator!=(const regex_iterator&);
Effects: returns !(*this == that)
.
const value_type& operator*();
Effects: returns what
.
const value_type* operator->();
Effects: returns &what
.
regex_iterator& operator++();
Effects: if what.prefix().first != what[0].second
and if the element match_prev_avail
is not set in flags
then sets it. Then calls regex_search(what[0].second, end, what, *pre, ((what[0].first == what[0].second) ? flags | match_not_null : flags))
, and if this returns false then sets *this
equal to the end of sequence iterator.
Returns: *this
.
regex_iterator operator++(int);
Effects: constructs a copy result
of *this
, then calls ++(*this)
.
Returns: result
.
The template class regex_token_iterator
is an iterator adapter; that is to say it represents a new view of an existing iterator sequence, by enumerating all the occurrences of a regular expression within that sequence, and presenting one or more new strings for each match found. Each position enumerated by the iterator is a string that represents what matched a particular sub-expression within the regular expression. When class regex_token_iterator
is used to enumerate a single sub-expression with index -1, then the iterator performs field splitting: that is to say it enumerates one string for each section of the character container sequence that does not match the regular expression specified.
After it is constructed, the iterator finds and stores a value of match_results<BidirectionalIterator> what
(by calling regex_search
) and sets the internal count N to zero. Every time operator++
is used the count N is incremented, if N exceeds or equals this->subs.size()
, then the iterator finds and stores the next value of match_results<BidirectionalIterator>
and sets count N to zero.
If the end of sequence is reached (regex_search
returns false), the iterator becomes equal to the end-of-sequence iterator value, unless the sub-expression being enumerated has index -1: In which case the iterator enumerates one last string that contains all the characters from the end of the last regular expression match to the end of the input sequence being enumerated, provided that this would not be an empty string.
The constructor with no arguments regex_iterator()
always constructs an end of sequence iterator object, which is the only legitimate iterator to be used for the end condition. The result of operator*
on an end of sequence is not defined. For any other iterator value a const basic_string<charT>&
is returned (obtained by calling this->what[subs[N]]
). The result of operator->
on an end of sequence is not defined. For any other iterator value a const basic_string<charT>*
is returned. It is impossible to store things into regex_iterator
's. Two end-of-sequence iterators are always equal. An end-of-sequence iterator is not equal to a non-end-of-sequence iterator. Two non-end-of-sequence iterators are equal when they are constructed from the same arguments.
namespace std{ template <class BidirectionalIterator, class charT = iterator_traits<BidirectionalIterator>::value_type, class traits = regex_traits<charT>, class Allocator = allocator<charT> > class regex_token_iterator { public: typedef basic_regex<charT, traits, Allocator> regex_type; typedef basic_string<charT> value_type; typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; typedef typename iterator_traits<BidirectionalIterator>::pointer pointer; typedef typename iterator_traits<BidirectionalIterator>::reference reference; typedef std::forward_iterator_tag iterator_category; regex_token_iterator(); regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, int submatch = 0, match_flag_type m = match_default); regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const std::vector<int>& submatches, match_flag_type m = match_default); template <std::size_t N> regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const int (&submatches)[N], match_flag_type m = match_default); regex_token_iterator(const regex_token_iterator&); regex_token_iterator& operator=(const regex_token_iterator&); bool operator==(const regex_token_iterator&); bool operator!=(const regex_token_iterator&); const value_type& operator*(); const value_type* operator->(); regex_token_iterator& operator++(); regex_token_iterator operator++(int); private: match_results<BidirectionalIterator> what; // for exposition only BidirectionalIterator end; // for exposition only const regex_type* pre; // for exposition only match_flag_type flags; // for exposition only basic_string<charT> result; // for exposition only std::size_t N; // for exposition only std::vector<int> subs; // for exposition only }; } // namespace std
regex_token_iterator();
Effects: constructs an end of sequence iterator, by default constructing all data members.
Postconditions: pre == 0
.
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, int submatch = 0, match_flag_type m = match_default);
Preconditions: !re.empty()
.
Effects: default constructs all data members and pushes the value submatch
onto subs
. Then if regex_search(a, b, what, re, m) == true
sets N
equal to zero, flags
equal to m
, pre
equal to &re
, end
equal to b
, and sets result
equal to ((submatch == -1) ? value_type(what.prefix().str()) : value_type(what[submatch].str()))
. Otherwise if the call to regex_search
returns false, then if (submatch == -1) && (a != b)
sets result
equal to value_type(a, b)
and N
equal to -1. Otherwise sets *this
equal to the end of sequence iterator.
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const std::vector<int>& submatches, match_flag_type m = match_default);
Preconditions: submatches.size() && !re.empty()
.
Effects: copy constructs member subs
from submatches
, and default constructs the other data members. Then if regex_search(a, b, what, re, m) == true
sets N
equal to zero, flags
equal to m
, pre
equal to &re
, end
equal to b
, and sets result
equal to ((submatch[N] == -1) ? value_type(what.prefix().str()) : value_type(what[submatch[N]].str()))
. Otherwise if the call to regex_search
returns false, then if (submatch[0] == -1) && (a != b)
sets result
equal to value_type(a, b)
and N
equal to -1. Otherwise sets *this
equal to the end of sequence iterator.
template <std::size_t N> regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, const int (&submatches)[R], match_flag_type m = match_default);
Preconditions: !re.empty()
.
Effects: copy constructs member subs
from the iterator sequence [submatches, submatches+R)
, and default constructs the other data members. Then if regex_search(a, b, what, re, m) == true
sets N
equal to zero, flags
equal to m
, pre
equal to &re
, end
equal to b
, and sets result
equal to ((subs[N] == -1) ? value_type(what.prefix().str()) : value_type(what[subs[N]].str()))
. Otherwise if the call to regex_search
returns false, then if (subs[0] == -1) && (a != b)
sets result
equal to value_type(a, b)
and N
equal to -1. Otherwise sets *this
equal to the end of sequence iterator.
regex_token_iterator(const regex_token_iterator& that);
Effects: constructs all data members as a copy of that
.
Postconditions: *this == that
.
regex_token_iterator& operator=(const regex_token_iterator& that);
Effects: sets all data members equal to those in that
.
Postconditions: *this == that
.
bool operator==(const regex_token_iterator&);
Effects: if (pre == 0) && (that.pre == 0)
then returns true, otherwise returns the result of (pre == that.pre) && (end == that.end) && (flags == that.flags) && (N == that.N) && (what[0].first == that.what[0].first) && (what[0].second == that.what[0].second)
.
bool operator!=(const regex_token_iterator&);
Effects: returns !(*this == that)
.
const value_type& operator*();
Effects: returns result
.
const value_type* operator->();
Effects: returns &result
.
regex_token_iterator& operator++();
Effects: if N == -1
then sets *this
equal to the end of sequence iterator. Otherwise if N+1 < subs.size()
, then increments N
and sets result equal to ((subs[N] == -1) ? value_type(what.prefix().str()) : value_type(what[subs[N]].str()))
.
Otherwise if what.prefix().first != what[0].second
and if the element match_prev_avail
is not set in flags
then sets it. Then if regex_search(what[0].second, end, what, *pre, ((what[0].first == what[0].second) ? flags | match_not_null : flags)) == true
sets N
equal to zero, and sets result
equal to ((subs[N] == -1) ? value_type(what.prefix().str()) : value_type(what[subs[N]].str()))
.
Otherwise if the call to regex_search
returns false, then let last_end
be the value of what[0].second
prior to the call to regex_search
. Then if last_end != end
and subs[0] == -1
sets N
equal to -1 and sets result
equal to value_type(last_end, end)
. Otherwise sets *this
equal to the end of sequence iterator.
Returns: *this
.
regex_token_iterator& operator++(int);
Effects: constructs a copy result
of *this
, then calls ++(*this)
.
Returns: result
.
I am particularly grateful to GRETA's author Eric Niebler for many long and detailed discussions, to Edward Diener for much helpful feedback over many years, to Bjorn Karlsson, Jeremy Siek, Richard Peters, and Beman Dawes for many helpful comments on a draft of this proposal, and of course to all the Boost-list members and regex users for their continued help with this project.