1. Introduction
This paper proposes making std::error_code formattable using the formatting facility introduced in C++20 (std::format) and fixes encoding issues in the underlying API ([LWG4156]).
2. Changes since R0
- Changed the title from "Formatting of std::error_code" to "Fix encoding issues and add a formatter for std::error_code" to reflect the fact that the paper also fixes [LWG4156].
- Specified that error_category::name() returns a string in the ordinary literal encoding, per SG16 feedback.
- Made transcoding in error_category::message() implementation-defined if the literal encoding is not UTF-8, per SG16 feedback and for consistency with other similar cases in the standard.
3. Motivation
std::error_code has a rudimentary operator<< inserter. For example:
std::error_code ec;
auto size = std::filesystem::file_size("nonexistent", ec);
std::cout << ec;
This works and prints generic:2.
However, the following code doesn’t compile:
std::print("{}\n", ec);
Unfortunately, the existing inserter has several issues, such as I/O manipulators applying only to the category name rather than the entire error code, resulting in confusing output:
std::cout << std::left << std::setw(12) << ec;
This prints:
generic     :2
Additionally, it doesn’t allow formatting the error message and introduces potential encoding issues, as the encoding of the category name is unspecified.
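The misalignment follows from how the inserter is specified: it writes the category name, a colon and the value as three separate insertions, and std::setw only affects the next one. The sketch below spells out that specified behavior in a helper with the hypothetical name insert_like_standard so it can be compared with the library's own inserter.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical helper reproducing the specified effect of
// operator<<(ostream&, const error_code&):
//   os << ec.category().name() << ':' << ec.value();
// The width set by std::setw is consumed (and reset to 0) by the first
// formatted insertion, so only the category name gets padded.
std::ostream& insert_like_standard(std::ostream& os, const std::error_code& ec) {
  return os << ec.category().name() << ':' << ec.value();
}
```

A formatter can avoid this by building the complete "category:value" string first and padding it once.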
4. Proposal
This paper proposes adding a formatter specialization for std::error_code to address the problems discussed in the previous section.
The default format will produce the same output as the operator<< inserter:
std::print("{}\n", ec);
Output:
generic:2
It will correctly handle width and alignment:
std::print("[{:>12}]\n", ec);
Output:
[   generic:2]
Additionally, it will allow formatting the error message:
std::print("{:s}\n", ec);
Output:
No such file or directory
(The actual message depends on the platform.)
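The main challenge lies in the standard’s lack of specification for the encodings of the strings returned by error_category::name() and error_category::message() (and hence error_code::message(), which returns category().message(value())) [syserr.errcat.virtuals]: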
virtual const char* name() const noexcept = 0;
Returns: A string naming the error category.
virtual string message(int ev) const = 0;
Returns: A string that describes the error condition denoted by ev.
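For reference, a user-defined category overrides both of these virtual functions. The minimal sketch below uses a hypothetical demo_category. Like the library implementations described next, it returns string literals.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical category used only for illustration.
class demo_category final : public std::error_category {
 public:
  // A string literal, so the name is in the ordinary literal encoding.
  const char* name() const noexcept override { return "demo"; }

  // Messages are also literals here; see the note below.
  std::string message(int ev) const override {
    switch (ev) {
      case 1:
        return "demo failure";
      default:
        return "unknown demo error";
    }
  }
};

// Categories compare equal only if they are the same object, so a single
// instance is handed out.
const std::error_category& demo_category_instance() {
  static demo_category instance;
  return instance;
}
```

Returning literals from message() is only consistent with an execution-character-set requirement if the ordinary literal encoding and the execution character set agree on the characters used. That is typically the case for basic characters like these.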
In practice, implementations typically define category names as string literals, meaning they are in the ordinary literal encoding.
However, there is significant divergence in message encodings. libc++ and libstdc++ use strerror for the generic category, whose result is in the encoding of the current C locale (which need not be the "C" locale). They disagree on the encoding for the system category on Windows, though: libstdc++ uses the Active Code Page (ACP), while libc++ again uses strerror and the C locale encoding. Microsoft STL uses a table of string literals in the ordinary literal encoding for the generic category and the ACP for the system category.
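The following table summarizes the differences (where an entry has two parts separated by a slash, they refer to the generic and system categories, respectively):
| | libstdc++ | libc++ | Microsoft STL |
| POSIX | strerror | strerror | N/A |
| Windows | strerror / ACP | strerror | ordinary literals / ACP |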
None of this can be used portably through the generic API, because the encodings can differ, and often do.
To address this, this paper proposes using the C locale encoding (the execution character set), which is already used in most cases and matches the underlying system APIs. Microsoft STL’s implementation has a number of bugs here ([MSSTL-3254], [MSSTL-4711]) and will likely need to change anyway. This also resolves [LWG4156].
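Under this approach, producing UTF-8 output (as the proposed wording requires when the ordinary literal encoding is UTF-8) means decoding the message from the C locale encoding. The following is a simplified sketch with the hypothetical helper names c_locale_to_utf8 and append_utf8. It assumes that wchar_t holds Unicode code points, which is true on glibc but not on Windows, where wchar_t is UTF-16. It also replaces each undecodable byte with U+FFFD, which is simpler than, and not identical to, the "maximal subparts" practice the wording cites.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <string_view>

// Appends code point cp to out as UTF-8. Surrogates and out-of-range
// values are replaced with U+FFFD REPLACEMENT CHARACTER.
void append_utf8(std::string& out, char32_t cp) {
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Transcodes text from the encoding of the current C locale to UTF-8.
// Assumption: wchar_t values are Unicode code points (glibc, not Windows).
std::string c_locale_to_utf8(std::string_view in) {
  std::string out;
  std::mbstate_t state{};
  const char* p = in.data();
  const char* const end = p + in.size();
  while (p != end) {
    wchar_t wc = 0;
    std::size_t n =
        std::mbrtowc(&wc, p, static_cast<std::size_t>(end - p), &state);
    if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2)) {
      // Invalid (-1) or truncated (-2) sequence: substitute one U+FFFD,
      // reset the conversion state and resume at the next byte.
      append_utf8(out, U'\uFFFD');
      state = std::mbstate_t{};
      ++p;
      continue;
    }
    append_utf8(out, static_cast<char32_t>(wc));
    p += (n == 0) ? 1 : n;  // n == 0: an embedded null (assumed one byte)
  }
  return out;
}
```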
An alternative approach would be to communicate the encoding from the error category. However, this introduces ABI challenges and complicates usage compared to adopting a single encoding.
5. Wording
Add to "Header <system_error> synopsis" [system.error.syn]:
// [system.error.fmt], formatter
template<class charT> struct formatter<error_code, charT>;
Add a new section "Formatting" [system.error.fmt] under "Class error_code" [syserr.errcode]:
template<class charT> struct formatter<error_code, charT> {
  constexpr typename basic_format_parse_context<charT>::iterator
    parse(basic_format_parse_context<charT>& ctx);

  template<class FormatContext>
    typename FormatContext::iterator
      format(const error_code& ec, FormatContext& ctx) const;
};
constexpr typename basic_format_parse_context<charT>::iterator
  parse(basic_format_parse_context<charT>& ctx);
Effects: Parses the format specifier as an error-code-format-spec and stores the parsed specifiers in *this.
error-code-format-spec:
    fill-and-align_opt width_opt s_opt
where the productions fill-and-align and width are described in [format.string].
Returns: An iterator past the end of the error-code-format-spec.
template<class FormatContext>
  typename FormatContext::iterator
    format(const error_code& ec, FormatContext& ctx) const;
Effects: If the s option is used, then:
- If the ordinary literal encoding is UTF-8, then let msg be ec.message() transcoded to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
- Otherwise, let msg be ec.message() transcoded to an implementation-defined encoding.
Otherwise, let msg be format("{}:{}", ec.category().name(), ec.value()).
Writes msg into ctx.out(), adjusted according to the error-code-format-spec.
Returns: An iterator past the end of the output range.
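The error-code-format-spec grammar is small enough to parse by hand. The following is an illustrative sketch rather than proposed wording. The names parse_error_code_spec and error_code_spec are hypothetical, and for brevity the fill is a single char (not any Unicode scalar value) and the width must be a literal integer rather than a nested replacement field.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result of parsing an error-code-format-spec.
struct error_code_spec {
  char fill = ' ';       // fill character, space by default
  char align = '\0';     // '<', '>', '^', or '\0' if absent
  int width = 0;         // 0 if absent
  bool message = false;  // true if the 's' option is present
};

constexpr bool is_align(char c) { return c == '<' || c == '>' || c == '^'; }

// Returns std::nullopt if s is not a valid spec under the simplifications
// above. No overflow check is done on the width in this sketch.
constexpr std::optional<error_code_spec> parse_error_code_spec(std::string_view s) {
  error_code_spec spec;
  std::size_t i = 0;
  // fill-and-align_opt: an optional fill character followed by an alignment.
  if (s.size() >= 2 && is_align(s[1])) {
    if (s[0] == '{' || s[0] == '}') return std::nullopt;  // invalid fill
    spec.fill = s[0];
    spec.align = s[1];
    i = 2;
  } else if (!s.empty() && is_align(s[0])) {
    spec.align = s[0];
    i = 1;
  }
  // width_opt: a positive decimal integer.
  if (i < s.size() && s[i] >= '1' && s[i] <= '9') {
    while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
      spec.width = spec.width * 10 + (s[i] - '0');
      ++i;
    }
  }
  // s_opt: format the message instead of "category:value".
  if (i < s.size() && s[i] == 's') {
    spec.message = true;
    ++i;
  }
  if (i != s.size()) return std::nullopt;  // anything else is an error
  return spec;
}
```

Note that "s<5" parses as fill 's', alignment '<' and width 5 without the s option, because a character followed by an alignment is always a fill.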
Modify [syserr.errcat.virtuals]:
virtual const char* name() const noexcept = 0;
Returns: A string in the ordinary literal encoding naming the error category.
...
virtual string message(int ev) const = 0;
Returns: A string of multibyte characters in the execution character set that describes the error condition denoted by ev.
6. Implementation
The proposed formatter for std::error_code has been implemented in the open-source {fmt} library ([FMT]).