1. Introduction
Stream-based I/O has been part of C++ since C++98 and is designed to give a thorough abstraction of I/O. An important part of the abstraction is the position indicator, which is used to tell and seek the current position. Such a position is most commonly conveyed by `pos_type`, which is `fpos<mbstate_t>` by default; this is also the only instantiation of `fpos` used in the standard library. It is widely implemented so that it can be output by the stream `operator<<`; however, astonishingly, it cannot be output directly by `std::print` in C++23 due to the lack of a `formatter` specialization. Moreover, the state of the position is usually neglected, which makes output by `operator<<` not robust in some cases. This paper aims to add a `formatter` specialization for `fpos<mbstate_t>` to solve these problems.
2. Revision History
Changes since R0
These changes were mainly suggested by SG16, as discussed and polled in the regular meeting.
- Format the state as a Boolean instead of a recoverable descriptor.
- Fix several minor wording problems.
3. Motivation
The following Tony table illustrates the problem directly:
(Tony table: the Before and After code samples are missing from this copy.)
4. Design Decision
4.1. Core Problem
4.1.1. Not an integer, only convertible
C programmers, and those who have not taken a thorough look at the design of streams, are likely to regard the position as an integer directly, since `ftell` just returns one. Actually, in C++ it's designed as follows:

    template <typename CharT, typename CharTraits = char_traits<CharT>>
    class basic_iostream {
    public:
        using pos_type = typename CharTraits::pos_type;
        pos_type tellg();
        pos_type tellp();
    };

The most commonly used stream types are aliases such as:

    template <typename CharT, typename CharTraits = char_traits<CharT>>
    class basic_fstream : public basic_iostream<CharT, CharTraits> { ... };

    using fstream = basic_fstream<char>;
    using wfstream = basic_fstream<wchar_t>;

So the return type of `tellg` and `tellp` is usually determined by `char_traits<CharT>::pos_type`, where `CharT` is `char` or `wchar_t`. Both are defined as specializations of `fpos`, and the only instantiation used in the standard library is `fpos<mbstate_t>`.
`fpos` only supports limited integer operations, like subtracting another `fpos` to get a `streamoff`, or adding a `streamoff` offset to get a new `fpos`. In particular, `streamoff` is required to be an alias of a signed integer type, and `fpos` should be convertible to `streamoff` so that the expression `streamoff(p)` compiles (see [stream.types] and [fpos.operations] in the standard).
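The operations listed above can be checked with a small sketch. It assumes a mainstream standard library and uses `std::streampos` (the standard alias of `fpos<mbstate_t>`); the helper name `distance_between` is hypothetical and only serves the illustration.

```cpp
#include <cassert>
#include <ios>
#include <type_traits>

// p - q yields an offset (streamoff), not another position.
static_assert(std::is_same<decltype(std::streampos() - std::streampos()),
                           std::streamoff>::value, "");
// p + o yields a position again.
static_assert(std::is_same<decltype(std::streampos() + std::streamoff()),
                           std::streampos>::value, "");
// streamoff is a signed integer type, while fpos itself is not an integer.
static_assert(std::is_integral<std::streamoff>::value &&
              std::is_signed<std::streamoff>::value, "");
static_assert(!std::is_integral<std::streampos>::value, "");

// Hypothetical helper: the distance between two positions is an offset.
inline std::streamoff distance_between(const std::streampos& from,
                                       const std::streampos& to) {
    return to - from;
}
```

The explicit form `static_cast<std::streamoff>(p)` is the conversion the standard requires; everything beyond these operations is left to the implementation.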
The convertibility requirement could be implemented with an explicit conversion operator as follows:

    template <typename StateT>
    class fpos {
    public:
        explicit operator streamoff() const { ... }
    };

However, as explicit conversion operators have only been supported since C++11 while streams were introduced in C++98, the existing mainstream implementations (MS-STL, libstdc++, libc++) and many older implementations such as Apache stdcxx and STLport (stlport/stl/char_traits#92) all choose to make it not `explicit`, even though it could have been strengthened since C++11. This gives C++ programmers the illusion that "it's just an integer". Specifically, for output, `basic_ostream` has overloads like:

    basic_ostream& operator<<(int);
    basic_ostream& operator<<(long);
    basic_ostream& operator<<(long long);

This enables implicit conversion from `fpos` to `streamoff` to match one of the overloads and makes output successful. However, it fails to work when it comes to `std::format`, since templates do not consider implicit conversions in such cases. The `formatter` specializations for `int`, `long` and `long long` cannot be utilized by `fpos` without explicit conversion.
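The difference between overload resolution and template deduction can be sketched as follows. The example assumes an implementation whose `fpos` conversion operator is not `explicit` (as described above for the mainstream libraries); `via_stream` and `deduced_as_integer` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>
#include <type_traits>

// Non-template overloads: operator<< is found through the implicit
// fpos -> streamoff conversion (assumes a non-explicit conversion operator).
inline std::string via_stream(const std::streampos& pos) {
    std::ostringstream os;
    os << pos;
    return os.str();
}

// Template: T is deduced as the exact argument type, so no conversion to an
// integer is attempted. Selecting a formatter by type behaves analogously.
template <class T>
bool deduced_as_integer(const T&) {
    return std::is_integral<T>::value;
}
```

With these helpers, `via_stream(std::streampos(42))` yields the text `42`, while `deduced_as_integer(std::streampos(42))` is `false`: the template sees an `fpos`, not an integer.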
4.1.2. Not only the integer, though convertible
Another problem that's usually neglected is that `fpos` doesn't merely contain the position integer; it also holds a state, whose type is conveyed by the template parameter and is most typically `mbstate_t`. The state is used to determine the current state of character conversion, for example by `codecvt` in a locale. By default, an `fpos` assigned from a `streamoff` will have a value-initialized state, which means the initial state. However, the state is sometimes not "initial", so "output it as an integer -> input it to an integer -> assign it back to `fpos`" is both unsafe and incorrect.
For instance, though it rarely happens, if some class derived from a standard stream buffer allows partial conversion in an overridden virtual function (since it is only required to prepare space for at least one character), `tellg` or `tellp` of its stream may also report an `fpos` with a partial conversion state. In any case, reporting only the position integer is incomplete in some cases.
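The loss of state in the integer round trip can be sketched as below. The helper names `in_initial_state` and `round_trip_through_integer` are hypothetical. The sketch only demonstrates that the rebuilt position always carries the initial state; producing a genuinely non-initial `mbstate_t` depends on the locale and is not attempted here.

```cpp
#include <cassert>
#include <cwchar>
#include <ios>

// Reports whether the conversion state stored in the position is initial.
inline bool in_initial_state(const std::streampos& pos) {
    std::mbstate_t state = pos.state();  // state() returns a copy
    return std::mbsinit(&state) != 0;    // mbsinit returns int, not bool
}

// Mimics "output as an integer, read the integer back, rebuild the position".
inline std::streampos round_trip_through_integer(const std::streampos& pos) {
    const std::streamoff integer_only = static_cast<std::streamoff>(pos);
    return std::streampos(integer_only);  // the state is value-initialized here
}
```

Whatever state the original position held, the result of `round_trip_through_integer` is in the initial state, so any partial conversion state is silently dropped.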
4.2. Proposed Solution
To make it both safe and convenient, we propose to add a formatter specialization for `fpos<mbstate_t>`. Considering that almost all behaviors of `mbstate_t` are implementation-defined, it seems meaningless to regulate its format specifications. SG16 also suggested not providing an implementation-defined descriptor that would let a possible scanner ([P1729]) restore it. Thus, we just propose to additionally output whether it's in the initial state.
To be specific, the formatter specialization of `fpos<mbstate_t>` should behave as follows:
- When no specification is given (i.e. `{}` or `{:}`), `format` should produce `"(position, mbsinit(&state))"`, where the latter is either `true` or `false`;
- When some specifications are given (e.g. `{:d}`), only the position will be output in the way determined by the format specifications.
4.3. Possible Future Evolution
A related issue was considered but is not proposed in this paper: whether the existing behavior should change, for example by making the conversion operator from `fpos` to `streamoff` explicit or by overloading `operator<<` for `fpos`. Such a breaking change may or may not be welcome. If necessary, it can be addressed in future proposals.
5. Standard Wording
We propose to add wording in [fpos.operations]:
2. Stream operations that return a value of type traits::pos_type return P(O(-1)) as an invalid value to signal an error. If this value is used as an argument to any istream, ostream, or streambuf member that accepts a value of type traits::pos_type then the behavior of that function is undefined.
3. The formatter of fpos<mbstate_t> should behave as follows:

    namespace std {
      template<class charT>
      struct formatter<fpos<mbstate_t>, charT> {
      private:
        formatter<streamoff, charT> underlying_;  // exposition only
        bool need_state_ = false;                 // exposition only

      public:
        template<class ParseContext>
          constexpr typename ParseContext::iterator parse(ParseContext& ctx);

        template<class FormatContext>
          typename FormatContext::iterator format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;
      };
    }


    template<class ParseContext>
      constexpr typename ParseContext::iterator parse(ParseContext& ctx);

Effects: Sets need_state_ to true if format-specifier or format-spec ([format.string.general]) is not present; otherwise equivalent to underlying_.parse(ctx).
Returns: An iterator past the end of format-spec.

    template<class FormatContext>
      typename FormatContext::iterator format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;

Effects: Writes the following into ctx.out():
- If need_state_ is false, then as if by underlying_.format(static_cast<streamoff>(ref), ctx);
- Otherwise,
  - STATICALLY-WIDEN<charT>("("),
  - the result of formatting static_cast<streamoff>(ref) via underlying_,
  - STATICALLY-WIDEN<charT>(", "),
  - the result of formatting the bool value that denotes whether ref.state() is in the initial state, as determined by mbsinit ([c.mb.wcs]),
  - STATICALLY-WIDEN<charT>(")").

Returns: An iterator past the end of the output range.
6. Impact on Existing Code
This is a pure extension to the standard library, so there won't be severe conflicts with existing code. The only possible conflict is that some existing code has already added a `formatter` specialization for `fpos<mbstate_t>`, but it seems that no open-source code on GitHub does so.
7. Implementation
Since many other formatters already require widened characters, we just use `WIDEN` to represent the widening in the code below for simplicity. Inheritance may be used to implement it:

    template <typename CharT>
    struct formatter<fpos<mbstate_t>, CharT> : formatter<streamoff, CharT> {
    private:
        using Base = formatter<streamoff, CharT>;
        bool need_state_ = false;

    public:
        // Take the context by non-const reference so it can be forwarded to Base::parse.
        constexpr auto parse(auto& ctx) {
            auto it = ctx.begin();
            // "{}" or "{:}": no format-spec, so the state is printed as well.
            if (it == ctx.end() || *it == WIDEN('}')) {
                need_state_ = true;
                return it;
            }
            return Base::parse(ctx);
        }

        auto format(const fpos<mbstate_t>& value, auto& ctx) const {
            if (need_state_) {
                auto state = value.state();
                // mbsinit returns int; compare with 0 so that a bool is formatted.
                return format_to(ctx.out(), ctx.locale(), WIDEN("({}, {})"),
                                 static_cast<streamoff>(value), mbsinit(&state) != 0);
            }
            return Base::format(static_cast<streamoff>(value), ctx);
        }
    };

We notice that:
- What `.state()` returns is the value type, which may introduce an unnecessary copy in `auto state = value.state()` since the state may be stored directly in `fpos<mbstate_t>`. If the compiler is unable to optimize it out, unexposed members of `fpos` may be utilized (e.g. by declaring the formatter as friend) to write it like `mbsinit(&value.inner_state)`.
- MS-STL and libstdc++ can still utilize the implicit conversion, so the `static_cast` in `Base::format` can be omitted. libc++ checks whether `value` is an integer in the templated `Base::format` method, and thus implicit conversion cannot help.
8. Acknowledgement
Thanks to Victor Zverovich, the author of [fmt], and Arthur O’Dwyer for rounds of suggestions and discussions on generalization of the conversion, and for their encouragement to post this paper when the idea was first proposed. Thanks to Tom Honermann for advice on formatting the state type and assistance with D3374R0. Thanks to all members of SG16 for the discussion of P3374R0 in the regular meeting and on the mailing list. I’d also like to extend my gratitude to Peking University for giving me a colorful undergraduate life over the last four years.