1. Introduction
Stream-based I/O has been designed since C++98 to provide a thorough abstraction of I/O. An important part of that abstraction is the position indicator, which is used to tell and seek the current position. Such a position is most commonly conveyed by
, which is
by default and the only instantiation used in the standard library is
. It is widely implemented so that it can be output by stream
; however, it cannot be output directly by
in C++23, surprisingly, due to the lack of
. Moreover, the state of the position is usually neglected, so in some cases it is not robust to output it by
. This paper aims to add a specialization for
to solve these problems.
2. Motivation
The following Tony table illustrates the problem directly:
| Before | After |
|---|---|
|  |  |
3. Design Decision
3.1. Core Problem
3.1.1. Not an integer, only convertible
C programmers, and those who have not looked closely at the design of streams, are likely to regard the position as an integer, since
simply returns one. In C++, however, it is designed as follows:
    template <typename CharT, typename CharTraits = char_traits<CharT>>
    class basic_iostream {
    public:
        using pos_type = typename CharTraits::pos_type;
        pos_type tellg();
        pos_type tellp();
    };
The most commonly used types are aliases such as:
    template <typename CharT, typename CharTraits = char_traits<CharT>>
    class basic_fstream : public basic_iostream<CharT, CharTraits> { ... };
    using fstream = basic_fstream<char>;
    using wfstream = basic_fstream<wchar_t>;
So the return type of
is usually determined by
, where
is
or
. They’re defined as
, and the only used instantiation in the standard library is
.
supports only a limited set of integer-like operations, such as subtracting another
to get
, or adding a
offset to get a new
. Particularly,
is required to be an alias of a signed integer type, and
should be convertible to
so that the expression
compiles (see [stream.types] and [fpos.operations] in the standard).
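The limited arithmetic described above can be sketched as follows. This is a minimal illustration that relies only on operations the standard requires of stream positions; the helper names `consumed_offset` and `advanced` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Skips `count` characters of `text` and returns how far the get position
// moved, computed by subtracting two stream positions.
inline std::streamoff consumed_offset(const std::string& text, int count) {
    assert(count >= 0);                        // illustration-only precondition
    std::istringstream in(text);
    const std::streampos start = in.tellg();   // position before skipping
    in.ignore(count);                          // skip `count` characters
    const std::streampos now = in.tellg();     // position after skipping
    return now - start;                        // position - position -> offset
}

// Moves a position by a signed offset and converts the result explicitly
// to the underlying integer offset.
inline std::streamoff advanced(std::streampos pos, std::streamoff delta) {
    const std::streampos moved = pos + delta;  // position + offset -> position
    return static_cast<std::streamoff>(moved); // explicit conversion
}
```

For example, `consumed_offset("hello world", 5)` yields an offset of 5, and `advanced(std::streampos(10), -3)` yields 7. Note that the explicit `static_cast` works regardless of whether an implementation makes the conversion implicit.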
Although this requirement can be implemented as follows:
    template <typename StateT>
    class fpos {
    public:
        explicit operator streamoff() const { ... }
    };
The existing mainstream implementations, such as MS-STL, libstdc++, libc++, and many other implementations that support C++98, such as Apache stdcxx and STLport (stlport/stl/char_traits#92),
all choose to make it not
. This tacit convention gives C++ programmers the illusion that "it’s just an integer". Specifically, for output, the
has overloads such as:
    basic_ostream& operator<<(int);
    basic_ostream& operator<<(long);
    basic_ostream& operator<<(long long);
This enables implicit conversion from
to
to match one of the overloads, so the output succeeds. However, this fails when it comes to
, since template argument deduction does not consider implicit conversions in most cases. The
specialization for
,
and
cannot be utilized by
without explicit conversion.
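The difference between overloaded functions and templates can be sketched with plain streams and type traits. The `static_assert` on convertibility is an assumption that holds on the mainstream implementations named above. `via_stream` and `via_template` are hypothetical helpers, and `via_template` merely stands in for any facility that dispatches on the exact argument type.

```cpp
#include <ios>
#include <sstream>
#include <string>
#include <type_traits>

// A stream position is a class type, not an integer type ...
static_assert(!std::is_integral<std::streampos>::value,
              "std::streampos is not an integer type");
// ... although mainstream implementations make it implicitly convertible.
// (Assumption: true on MS-STL, libstdc++ and libc++.)
static_assert(std::is_convertible<std::streampos, std::streamoff>::value,
              "implicit conversion to std::streamoff is available");

// Non-template overloads, as in basic_ostream: the implicit conversion applies.
inline std::string via_stream(std::streampos pos) {
    std::ostringstream out;
    out << pos;  // picks an integer overload through the conversion operator
    return out.str();
}

// Hypothetical template that accepts integer types only. T is deduced as the
// exact argument type, so no implicit conversion takes place.
template <typename T>
std::string via_template(T value) {
    static_assert(std::is_integral<T>::value, "integer types only");
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Calling `via_template(pos)` with a `std::streampos` would be rejected at compile time, whereas `via_template(static_cast<std::streamoff>(pos))` compiles. This mirrors the explicit conversion the text says is needed today.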
3.1.2. Not only the integer, though convertible
Another problem that’s usually neglected is that
doesn’t merely contain the position integer; it also has a
as conveyed by the template parameter, most typically
. It’s used to determine the current state of character conversion, like for
in locale. By default,
assigned by a
will have a value-initialized state, which denotes the initial conversion state. However, the actual state is sometimes not "initial", which makes such a restoration unsafe and incorrect.
For instance, though this rarely happens, if some derived class of
allows partial conversion when overriding
(since it is only required to make space for at least one
),
of its stream may also report a
with a partial state. In such cases, reporting only the position integer is incomplete.
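The loss of state described above can be sketched with the standard `state()` accessor and `std::mbsinit`. The helper names `has_initial_state` and `through_integer` are hypothetical. The sketch does not construct a non-initial state, since doing so portably depends on the locale; it only shows that a position rebuilt from its integer offset always carries the initial state.

```cpp
#include <cwchar>
#include <ios>

// Returns true when the conversion state carried by `pos` is the initial one.
inline bool has_initial_state(const std::fpos<std::mbstate_t>& pos) {
    const std::mbstate_t state = pos.state();  // copy of the stored state
    return std::mbsinit(&state) != 0;          // non-zero means "initial"
}

// Round-trips a position through its integer offset. Whatever state the
// original carried, the rebuilt position has a value-initialized state.
inline std::fpos<std::mbstate_t> through_integer(
        const std::fpos<std::mbstate_t>& pos) {
    const std::streamoff offset = static_cast<std::streamoff>(pos);
    return std::fpos<std::mbstate_t>(offset);
}
```

If a stream ever reported a position whose state is not initial, `through_integer` would keep the offset but silently discard that state.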
3.2. Proposed Solution
To make output both safe and convenient, we propose adding a formatter specialization for
. Considering that almost all behaviors of
are implementation-defined, it seems pointless to specify its format specifications in detail. However, the state should be output in a form that fully conveys its information, so that if [P1729] is accepted, a corresponding scanner can be defined to perform a round trip.
Specifically, the formatter specialization of
should behave as follows:
- When no specification is given (like {} or {:}), format should produce "(position, mbstate descriptor)", where mbstate descriptor can be used to fully restore the state of fpos;
- When some specifications are given (e.g. {:d}), only the position will be output in the way determined by the format specifications.
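The two modes listed above can be sketched as follows. This is not the proposed formatter: it uses iostreams instead of the formatting library so that it builds before C++20, and the names `mbstate_descriptor` and `describe` are hypothetical. The descriptor shown here (a hexadecimal dump of the state's bytes) is an assumption made for illustration only; an actual implementation would choose its own encoding.

```cpp
#include <cstring>
#include <cwchar>
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical descriptor: the raw bytes of the state as hexadecimal digits.
// A real implementation would pick its own implementation-specific encoding.
inline std::string mbstate_descriptor(const std::mbstate_t& state) {
    unsigned char bytes[sizeof(std::mbstate_t)];
    std::memcpy(bytes, &state, sizeof bytes);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (unsigned char byte : bytes) {
        out << std::setw(2) << static_cast<unsigned>(byte);
    }
    return out.str();
}

// Mimics the two modes: with an empty specification the position and the
// state descriptor are written; otherwise only the position is written.
inline std::string describe(const std::fpos<std::mbstate_t>& pos,
                            bool empty_spec) {
    std::ostringstream out;
    const std::streamoff offset = static_cast<std::streamoff>(pos);
    if (empty_spec) {
        out << '(' << offset << ", " << mbstate_descriptor(pos.state()) << ')';
    } else {
        out << offset;
    }
    return out.str();
}
```

With `empty_spec` set to false, `describe(std::streampos(42), false)` yields the bare position, while the other mode wraps the position and the descriptor in parentheses. The exact descriptor text depends on the platform's `mbstate_t` layout.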
3.3. Possible Future Evolution
An important aspect to consider is whether we need to change the behavior of
, such as making the conversion operator to
explicit or overloading
for
. Such a breaking change may or may not be what many users expect.
It is also worth discussing whether we need to leave some way for the
specialization to check whether
is in its initial state by
to improve safety in some cases.
It could also be discussed whether the formatter should be generalized for
to support any
that is formattable, and whether
should support
itself instead of relying on
. It may also be worth discussing whether ambiguity is introduced when parsing
to restore the
if the descriptor is not constrained at all.
4. Standard Wording
We propose to add wording in [fpos.operations]:
2. Stream operations that return a value of type traits::pos_type return P(O(-1)) as an invalid value to signal an error. If this value is used as an argument to any istream, ostream, or streambuf member that accepts a value of type traits::pos_type then the behavior of that function is undefined.
3. Assuming that mbstate_t can be fully restored by a basic_string<charT> called mbstate descriptor, the formatter of fpos<mbstate_t> should behave as follows:
    namespace std {
      template<class charT>
      struct formatter<fpos<mbstate_t>, charT> {
      private:
        formatter<streamoff, charT> underlying_;  // exposition only
        bool need_state_ = false;                 // exposition only
      public:
        template<class ParseContext>
          constexpr typename ParseContext::iterator parse(ParseContext& ctx);
        template<class FormatContext>
          typename FormatContext::iterator format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;
      };
    }
    template<class ParseContext>
      constexpr typename ParseContext::iterator parse(ParseContext& ctx);
Effects: Sets need_state_ to true if format-specifier or format-spec ([format.string.general]) is not present; otherwise equivalent to underlying_.parse(ctx).
Returns: An iterator past the end of format-spec.
    template<class FormatContext>
      typename FormatContext::iterator format(const fpos<mbstate_t>& ref, FormatContext& ctx) const;
Effects: Writes the following into ctx.out():
- If need_state_ is false, then as if underlying_.format(static_cast<streamoff>(ref), ctx);
- Otherwise,
  - STATICALLY-WIDEN<charT>("("),
  - the result of writing static_cast<streamoff>(ref) via underlying_,
  - STATICALLY-WIDEN<charT>(", "),
  - the mbstate descriptor of ref.state(),
  - STATICALLY-WIDEN<charT>(")").
Returns: An iterator past the end of the output range.
5. Impact on Existing Code
This is a pure extension to the standard library, so it should not conflict seriously with existing code. The only possible conflict arises if existing code has already added a specialization for
, but it seems that no open-source code on GitHub does so.
6. Implementation
Since
is implementation-defined and the internal state is undocumented, we can only write pseudo-code that is close to a final implementation. The simplest implementation may use inheritance:
    template <typename CharT>
    struct formatter<fpos<mbstate_t>, CharT> : formatter<streamoff, CharT> {
    private:
        using Base = formatter<streamoff, CharT>;
        bool need_state_ = false;

    public:
        constexpr auto parse(auto& ctx) {
            auto it = ctx.begin();
            // An empty specification ("{}" or "{:}") requests the state as well.
            if (it == ctx.end() || *it == WIDEN('}')) {
                need_state_ = true;
                return it;
            }
            return Base::parse(ctx);
        }

        auto format(const fpos<mbstate_t>& value, auto& ctx) const {
            if (need_state_) {
                return format_to(ctx.out(), ctx.locale(), WIDEN("({}, {})"),
                                 static_cast<streamoff>(value),
                                 GET_MBSTATE_DESCRIPTOR(value.state()));
            }
            return Base::format(static_cast<streamoff>(value), ctx);
        }
    };
Note that MS-STL and libstdc++ can still use the implicit conversion, so the
in
can be omitted. libc++ checks whether
is an integer type in the template
method and thus implicit conversion cannot help.
7. Acknowledgement
Thanks to Victor Zverovich, the author of [fmt], and Arthur O’Dwyer for suggestions and discussions on generalization of the conversion and encouragement to post this paper. Thanks to Tom Honermann for advice on formatting the state type and assistance. I’d also like to extend my gratitude to Peking University for giving me a colorful undergraduate life in the last four years.