Doc. no. | N2321=07-0181 |
Obsoletes: | N2211=07-0071 |
Date: | 2007-06-22 |
Project: | Programming Language C++ |
Reply to: | Martin Sebor |
time_get facet for POSIX® compatibility, Revision 2
This is a minor revision of the proposal. It clarifies that the
permission granted to implementations in Revision 1 of the
document, to fail to parse input sequences using complex
conversion directives such as %c, %x, and %X, extends to the same
sequences even when they involve the optional modifiers E and O.
In addition, this revision adds a Compatibility paragraph.
It should be noted that cases where the function may not be able
to correctly parse even complex sequences should be quite rare,
especially on POSIX platforms, where the function nl_langinfo may
be used to retrieve the broken-down string consisting of a
sequence of simple conversion directives corresponding to each of
the complex ones. For example, in the C locale, the broken-down
string corresponding to the %c directive is "%a %b %e %T %Y". The
nl_langinfo function also makes it possible to retrieve the
alternative symbols used instead of ordinary digits in directives
involving the E and O modifiers.
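As an illustration of the lookup described above, the following is a minimal sketch that assumes a POSIX platform providing <langinfo.h>. The helper name broken_down_format is hypothetical and not part of this proposal. It maps each complex directive to the nl_langinfo item that holds its broken-down form.

```cpp
#include <cassert>
#include <langinfo.h>   // POSIX header, not part of ISO C or C++
#include <string>

// Hypothetical helper: returns the broken-down format string that the
// current C library locale associates with a complex directive
// ('c', 'x', or 'X').
std::string broken_down_format(char directive)
{
    assert(directive == 'c' || directive == 'x' || directive == 'X');

    const nl_item item = directive == 'c' ? D_T_FMT   // date and time
                       : directive == 'x' ? D_FMT     // date only
                       :                    T_FMT;    // time only

    return nl_langinfo(item);
}
```

In the C locale the string returned for 'c' should consist of simple directives only, equivalent to the "%a %b %e %T %Y" mentioned above, although the exact spelling may vary between systems. The ERA and ALT_DIGITS items can be queried the same way for the E and O modifiers.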
The time_get and time_put facets provide a low-level asymmetric
interface for the parsing and formatting of time values. The
interfaces are asymmetric because the time_put facet is capable
of producing a much larger set of sequences than the time_get
facet is capable of parsing. The time_put interface can also
readily expose useful implementation-defined extensions by
recognizing additional formatting specifiers and modifiers, while
the time_get interface provides no such flexibility. The behavior
of the time_put facet is specified in terms of the C standard
library function strftime, and the facet's interface allows
programs to take advantage of the rich set of 60 or so strftime
conversion specifiers (including their optional modifiers). In
contrast, the behavior of time_get is restricted to parsing a
limited set of time and date sequences produced by a handful of
formatting specifiers, namely the locale-independent and trivial
%T (which is the same as "%H:%M:%S", the 24 hour time
representation), the locale-specific and less trivial %x (the
locale's date representation), and to parsing simple weekday
names (%a and %A) and the names of calendar months (%b and %B).
Presumably, this restriction exists only because the C standard
library provides no function for parsing time sequences. Such a
function is, however, specified by the ISO/IEC 9945 standard
(also known as POSIX); see strptime.
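The asymmetry can be seen with the existing facet members alone. The sketch below uses the classic "C" locale, and its two helper names (format_time and parse_24h_time) are hypothetical. The first accepts any strftime-style pattern through time_put::put; the second can rely only on time_get::get_time, which handles just the locale's time representation.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats *t with an arbitrary strftime-style
// pattern through the pattern overload of time_put::put.
std::string format_time(const std::tm* t, const std::string& pattern)
{
    assert(t != nullptr);
    std::ostringstream out;
    out.imbue(std::locale::classic());
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(out.getloc());
    tp.put(std::ostreambuf_iterator<char>(out), out, ' ', t,
           pattern.data(), pattern.data() + pattern.size());
    return out.str();
}

// Hypothetical helper: parses the locale's time representation
// ("%H:%M:%S" in the "C" locale) with time_get::get_time. There is
// no way to pass a pattern to this member.
bool parse_24h_time(const std::string& text, std::tm* t)
{
    assert(t != nullptr);
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());
    std::ios_base::iostate err = std::ios_base::goodbit;
    tg.get_time(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(), in, err, t);
    return !(err & std::ios_base::failbit);
}
```

A program can ask format_time for a 12 hour representation with "%I:%M:%S %p", but none of the existing time_get members is specified to read that output back.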
Thus, C++ programs that need to process date and time sequences
produced by any of the other 56 or so formatting specifiers are
unable to do so by relying on the time_get facet's parsing
functionality, even though much of it often exists in
implementations that parse non-trivial date sequences but is not
exposed in the interface of the facet. For instance, even the
simple task of parsing a 12 hour time representation is beyond
the ability of the facet, as is the often needed ability to
recognize and interpret time zones.
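For comparison, POSIX strptime handles the 12 hour case directly. The sketch below assumes a POSIX platform on which <time.h> declares strptime. The helper name parse_12h_time is hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <time.h>   // declares strptime on POSIX systems

// Hypothetical helper: parses a 12 hour clock time such as
// "02:30:15 PM" and stores the result in *t.
bool parse_12h_time(const char* text, std::tm* t)
{
    assert(text != nullptr && t != nullptr);
    *t = std::tm();   // start from a zeroed object
    // strptime returns a null pointer when the input does not match.
    return strptime(text, "%I:%M:%S %p", t) != nullptr;
}
```

On success the tm_hour member holds the hour on the 24 hour clock, so an afternoon time is stored with an hour of 12 or more.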
This paper proposes to extend the time_get facet interface so as
to permit the parsing of most of the same set of date and time
sequences as produced by time_put, thus providing a subset of the
same functionality as POSIX strptime. Specifically, we propose to
add two get and one do_get member functions to class template
time_get to parallel those declared by time_put.
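A call through the proposed pattern-based get member might look like the sketch below. It assumes a library that implements the proposed members (a C++11 or later standard library provides time_get members of this shape). The helper name parse_time is hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses text according to a strptime-style
// pattern by calling the pattern-based time_get::get member.
bool parse_time(const std::string& text, const std::string& pattern,
                std::tm* t)
{
    assert(t != nullptr);
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    *t = std::tm();   // zero the object before parsing
    tg.get(std::istreambuf_iterator<char>(in),
           std::istreambuf_iterator<char>(), in, err, t,
           pattern.data(), pattern.data() + pattern.size());

    // eofbit alone only means the input was consumed to its end.
    return !(err & std::ios_base::failbit);
}
```

For example, parse_time("2007-06-22 14:30:15", "%Y-%m-%d %H:%M:%S", &t) is expected to fill in the date and time members of t, with tm_year counted from 1900 and tm_mon counted from 0.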
Add to the declaration of class time_get in
[lib.locale.time.get], immediately below the declaration of the
member function get_year, the following declarations:
iter_type
get (iter_type s, iter_type end,
     ios_base& f, ios_base::iostate& err,
     tm* t, char format,
     char modifier = 0) const;
iter_type
get (iter_type s, iter_type end,
     ios_base& f, ios_base::iostate& err,
     tm* t, const char_type* fmt,
     const char_type *end) const;
Add to the declaration of class time_get, immediately below the
declaration of the virtual member function do_get_year, the
following declaration:
virtual iter_type
do_get (iter_type s, iter_type end,
ios_base& f, ios_base::iostate& err,
tm* t, char format,
char modifier) const;
Add to the end of [lib.locale.time.get.members] the following text:
iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier = 0) const;
Returns: do_get(s, end, f, err, t, format, modifier)
iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, const char_type* fmt, const char_type *end) const;
Requires: [fmt, end) is a valid range.
Effects: The function starts by evaluating err = ios_base::goodbit. It then enters a loop, reading zero or more characters from s at each iteration. Unless otherwise specified below, the loop terminates when the first of the following conditions holds:
- The expression (fmt == end) evaluates to true.
- The expression (err == ios_base::goodbit) evaluates to false.
- The expression (s == end) evaluates to true, in which case the function evaluates err = ios_base::eofbit | ios_base::failbit.
- The next element of fmt is equal to '%', optionally followed by a modifier character, followed by a conversion specifier character, format, together forming a conversion specification valid for the ISO/IEC 9945 function strptime. If the number of elements in the range [fmt, end) is not sufficient to unambiguously determine whether the conversion specification is complete and valid, the function evaluates err = ios_base::failbit. Otherwise, the function evaluates s = do_get(s, end, f, err, t, format, modifier), where the value of modifier is '\0' when the optional modifier is absent from the conversion specification. If (err == ios_base::goodbit) holds after the evaluation of the expression, the function increments fmt to point just past the end of the conversion specification and continues looping.
- The expression isspace(*fmt, f.getloc()) evaluates to true, in which case the function first increments fmt until (fmt == end || !isspace(*fmt, f.getloc())) evaluates to true, then advances s until (s == end || !isspace(*s, f.getloc())) is true, and finally resumes looping.
- The next character read from s matches the element pointed to by fmt in a case-insensitive comparison, in which case the function evaluates ++fmt, ++s and continues looping. Otherwise, the function evaluates err = ios_base::failbit.
Note: The function uses the ctype<charT> facet installed in f's locale to determine valid whitespace characters. It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so.
Returns: s.
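The loop specified above can be modelled with a small stand-alone sketch. It is a simplified illustration rather than the proposed wording or any library's implementation: it works on plain char ranges, uses the <cctype> functions in place of the ctype facet, ignores the E and O modifiers, and delegates to a hypothetical parse_directive function that stands in for do_get and understands only %H, %M, and %S. The end of the format range is named fmtend here to keep it distinct from the end of the input.

```cpp
#include <cassert>
#include <cctype>
#include <ctime>
#include <ios>

// Hypothetical stand-in for do_get: handles only 'H', 'M' and 'S' by
// reading one or two decimal digits into the matching tm member.
const char* parse_directive(const char* s, const char* end,
                            std::ios_base::iostate& err,
                            std::tm* t, char format)
{
    int* member = format == 'H' ? &t->tm_hour
                : format == 'M' ? &t->tm_min
                : format == 'S' ? &t->tm_sec : nullptr;
    if (member == nullptr) {            // directive not modelled here
        err |= std::ios_base::failbit;
        return s;
    }
    int value = 0, digits = 0;
    for (; s != end && digits < 2 &&
           std::isdigit(static_cast<unsigned char>(*s)); ++s, ++digits)
        value = value * 10 + (*s - '0');
    if (digits == 0)
        err |= std::ios_base::failbit;  // no digits where some were due
    else
        *member = value;
    return s;
}

// Simplified model of the pattern-based get loop.
const char* model_get(const char* s, const char* end,
                      std::ios_base::iostate& err, std::tm* t,
                      const char* fmt, const char* fmtend)
{
    assert(t != nullptr);
    err = std::ios_base::goodbit;
    while (fmt != fmtend && err == std::ios_base::goodbit) {
        const unsigned char f = static_cast<unsigned char>(*fmt);
        if (s == end) {
            // Input exhausted while format elements remain.
            err = std::ios_base::eofbit | std::ios_base::failbit;
        } else if (f == '%') {
            if (fmtend - fmt < 2) {     // incomplete specification
                err = std::ios_base::failbit;
            } else {
                s = parse_directive(s, end, err, t, fmt[1]);
                if (err == std::ios_base::goodbit)
                    fmt += 2;           // step past the specification
            }
        } else if (std::isspace(f)) {
            // Whitespace in the format matches any run of whitespace,
            // including an empty one, in the input.
            while (fmt != fmtend &&
                   std::isspace(static_cast<unsigned char>(*fmt)))
                ++fmt;
            while (s != end &&
                   std::isspace(static_cast<unsigned char>(*s)))
                ++s;
        } else if (std::tolower(static_cast<unsigned char>(*s)) ==
                   std::tolower(f)) {
            ++fmt;                      // literal matched, ignoring case
            ++s;
        } else {
            err = std::ios_base::failbit;
        }
    }
    return s;
}
```

With the format "at %H:%M", this model accepts the input "AT  09:05": the literal characters match without regard to case, and the single space in the format absorbs both spaces in the input.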
Add the following paragraphs to the end of [lib.locale.time.get.virtuals]:
virtual iter_type do_get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier) const;
Requires: [s, end) is a valid range and t is dereferenceable.
Effects: The function starts by evaluating err = ios_base::goodbit. It then reads characters starting at s until it encounters an error, or until it has extracted and assigned those struct tm members, and any remaining format characters, corresponding to a conversion directive appropriate for the ISO/IEC 9945 function strptime, formed by concatenating '%', the modifier character, when non-NUL, and the format character. When the concatenation fails to yield a complete valid directive, the function leaves the object pointed to by t unchanged and evaluates err |= ios_base::failbit. When (s == end) evaluates to true after reading a character, the function evaluates err |= ios_base::eofbit.
For complex conversion directives such as %c, %x, or %X, or directives that involve the optional modifiers E or O, when the function is unable to unambiguously determine some or all struct tm members from the input sequence [s, end), it evaluates err |= ios_base::eofbit. In such cases the values of those struct tm members are unspecified and may be outside their valid range.
Note: It is unspecified whether multiple calls to do_get() with the address of the same struct tm object will update the current contents of the object or simply overwrite its members. Portable programs must zero out the object before invoking the function.
Returns: An iterator pointing immediately beyond the last character recognized as possibly part of a valid input sequence for the given format and modifier.
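The single-directive interface could be used as sketched below. The example assumes a library that implements the proposed members (a C++11 or later standard library provides a get member of this shape that forwards to do_get). The helper name parse_one is hypothetical. It zeroes the struct tm object before each call, as the Note above advises.

```cpp
#include <cassert>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses the input for a single conversion
// directive, formed from '%', the optional modifier, and format.
bool parse_one(const std::string& text, char format, std::tm* t,
               char modifier = 0)
{
    assert(t != nullptr);
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    *t = std::tm();   // zero the object first, for portability
    tg.get(std::istreambuf_iterator<char>(in),
           std::istreambuf_iterator<char>(), in, err, t,
           format, modifier);

    // Only failbit signals a parse error; eofbit may also be set when
    // the directive consumed the whole input.
    return !(err & std::ios_base::failbit);
}
```

Because the public get member forwards to the virtual do_get, a facet derived from time_get can override do_get to change how individual directives are parsed without touching callers such as this one.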
A reference implementation of this extension is available for review in the Open Source Apache C++ Standard Library. The same extension has been implemented in the Rogue Wave® C++ Standard Library and shipped since 2001. See this page for the latest documentation of the feature.
The proposed extensions are largely source compatible with the
existing interface of the time_get facet (there is a very small
chance that the introduction of a new base class member function
might affect the well-formedness or even the behavior of a
program that calls a function with the same name in a class
derived from the base).
Adding a new virtual member function is a binary incompatible change. During the discussion of this proposal at the Oxford meeting in April 2007 a number of attendees expressed concern about introducing such a change in a Technical Report (such as TR2) and felt that a change of this nature would be more appropriate for the upcoming revision of the C++ standard.