1. Changelog
1.1. Changes since R0
- Add a study on possible impact
- Update the code example in Motivation
- Add note about char8_t in C
- Add note about [P0487R1]
2. Motivation
#include <cstdint>
#include <format>
#include <iostream>

int main() {
    // Prints:
    std::cout
        << static_cast<char>(48) << '\n'                          // 0
        << static_cast<signed char>(48) << '\n'                   // 0 (Proposing deprecation)
        << static_cast<unsigned char>(48) << '\n'                 // 0 (Proposing deprecation)
        << static_cast<std::int8_t>(48) << '\n'                   // 0 (Proposing deprecation)
        << static_cast<std::uint8_t>(48) << '\n'                  // 0 (Proposing deprecation)
        << static_cast<short>(48) << '\n'                         // 48
        << std::format("{}\n", static_cast<char>(48))             // 0
        << std::format("{}\n", static_cast<signed char>(48))      // 48
        << std::format("{}\n", static_cast<unsigned char>(48))    // 48
        << std::format("{}\n", static_cast<std::int8_t>(48))      // 48
        << std::format("{}\n", static_cast<std::uint8_t>(48))     // 48
        << std::format("{}\n", static_cast<short>(48));           // 48
}
There are overloads of operator<< for basic_ostream<char, traits> that take an unsigned char and a signed char.
In addition, there are overloads of operator>> for basic_istream<char, traits> that take an unsigned char& and a signed char&.
These overloads are specified to have behavior equivalent to that of
the overloads taking plain char: [istream.extractors], [ostream.inserters.character].
This is surprising. Per [basic.fundamental] p1 and p2:
There are five standard signed integer types: "signed char", "short int", "int", "long int", and "long long int"... There may also be implementation-defined extended signed integer types. The standard and extended signed integer types are collectively called signed integer types.
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", "unsigned long int", and "unsigned long long int"... Likewise, for each of the extended signed integer types, there exists a corresponding extended unsigned integer type. The standard and extended unsigned integer types are collectively called unsigned integer types.
Thus, signed char and unsigned char should be treated as integers, not as characters.
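The consequence for extraction can be sketched as follows. The function names are illustrative only and do not come from any library. Reading the text 65 into an unsigned char extracts the single character '6', whereas reading it into an int parses the number 65.

```cpp
#include <sstream>
#include <string>

// Extracts into unsigned char: reads exactly one character from the input.
unsigned char extract_as_uchar(const std::string& text) {
    std::istringstream in(text);
    unsigned char value = 0;
    in >> value;
    return value;
}

// Extracts into int: parses the digits as a decimal number.
int extract_as_int(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return value;
}
```

Calling extract_as_uchar("65") yields the character '6', while extract_as_int("65") yields 65.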
This is highlighted by the fact that int8_t and uint8_t are specified to be aliases of (un)signed integer types,
which in practice are going to be signed char and unsigned char.
Note: The Solaris implementation is different, and defines int8_t to be char by default.
This is not conformant.
signed char and unsigned char are not character types.
Per [basic.fundamental] p11, since [P2314R4]:
The types char, wchar_t, char8_t, char16_t, and char32_t are collectively called character types.
signed char and unsigned char are included in the set of ordinary character types and narrow character types ([basic.fundamental] p7),
but these definitions are used for specifying alignment, padding, and indeterminate values ([basic.indet]),
and are arguably not related to characters in the sense of pieces of text.
std::format has already taken a step in the right direction here,
by treating signed char and unsigned char as integers.
It’s specified to not give special treatment to these types,
but to use the standard definitions of (un)signed integer type
to determine whether a type is to be treated as an integer when formatting.
This paper proposes that these overloads in iostreams should be deprecated.
3. Impact
It’s difficult to find examples where this is the sought-after behavior, and would become deprecated with this change. These snippets aren’t easily greppable.
It’s easy to find counter-examples, however, where workarounds have to be employed to insert or extract signed chars or unsigned chars
as integers. Some of them can be found with isocpp.org codesearch, although false positives there are very prevalent.
/* ... */ << static_cast<int>(my_schar);
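A minimal sketch of this workaround, next to the behavior it avoids, is given below. Both helper names are hypothetical and exist only for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: writes a signed char as a number.
std::string as_integer_text(signed char value) {
    std::ostringstream os;
    os << static_cast<int>(value);  // selects the int overload of operator<<
    return os.str();
}

// For contrast: without the cast, the character overload is selected.
std::string as_character_text(signed char value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

For the value 65, the first helper produces the text 65 and the second produces A.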
These overloads have existed since C++98.
The signature of operator>> for character arrays was updated for C++20 in [P0487R1],
where these functions were changed to take charT(&)[N] instead of charT*, for safety reasons.
No other changes to these overloads have been made in standard C++.
// Changes in P0487, applied to C++20
// (each parameter changed from a pointer to a reference to an array of N elements)
template<class charT, class traits, size_t N>
  basic_istream<charT, traits>&
    operator>>(basic_istream<charT, traits>&, charT(&)[N]);         // was: charT*
template<class traits, size_t N>
  basic_istream<char, traits>&
    operator>>(basic_istream<char, traits>&, unsigned char(&)[N]);  // was: unsigned char*
template<class traits, size_t N>
  basic_istream<char, traits>&
    operator>>(basic_istream<char, traits>&, signed char(&)[N]);    // was: signed char*
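The bounded extraction into an array can be sketched as below. The function name is illustrative, and std::setw(4) is set explicitly as a precaution so that the sketch stays within the buffer under both the C++20 array overloads and the older pointer overloads.

```cpp
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>

// Reads one whitespace-delimited token into a 4-element buffer.
std::string read_token_into_small_buffer(std::istream& in) {
    char buffer[4] = {};
    // With a field width of 4, at most 3 characters are stored,
    // followed by a terminating null character.
    in >> std::setw(4) >> buffer;
    return std::string(buffer);
}
```

Given the input abcdefg, the function returns abc, leaves defg in the stream, and the field width is reset to zero afterwards.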
It should be noted that the C standard has defined char8_t to be an alias (typedef) of unsigned char.
In C++, char8_t is a distinct type with an underlying type of unsigned char.
3.1. Impact study
To gauge the potential impact of this deprecation, the author tried building open source C++ code bases, using a patched version of libc++. Below are the instances where the overloads proposed for deprecation were used in these builds.
For reference, the author built tensorflow-lite and Tenzir using a custom version of libc++ in which these overloads
were marked so that any use of them is diagnosed. These code bases number ~1½ MLoC in total, with a large number of dependencies,
are reasonably modern, and use iostreams.
3.1.1. Abseil
The Abseil logging library seems to treat signed char and unsigned char as character types.
This is likely because the syntax used by the library is very similar to that
used by iostreams:
signed char my_schar = 65;
LOG(ERROR) << my_schar;
// Will output:
// E0520 13:49:47.968463 123694 absl_log.cpp:8] A
// where the message itself is the 'A' here -----^
Internally in the library, this is achieved with this overload set:
// Abseil, version 20230802.1:
// absl/log/internal/check_op.cc
void MakeCheckOpValueString(std::ostream& os, const char v) {
  if (v >= 32 && v <= 128) {
    os << "'" << v << "'";
  } else {
    os << "char value " << int{v};
  }
}
void MakeCheckOpValueString(std::ostream& os, const signed char v) {
  if (v >= 32 && v <= 128) {
    os << "'" << v << "'";
  } else {
    os << "signed char value " << int{v};
  }
}
void MakeCheckOpValueString(std::ostream& os, const unsigned char v) {
  if (v >= 32 && v <= 128) {
    os << "'" << v << "'";
  } else {
    os << "unsigned char value " << int{v};
  }
}
where signed char and unsigned char are explicitly and intentionally
treated similarly to char, and are passed to an underlying std::ostream.
Notably, the values between 32 and 128 are really treated as character values,
as they are printed in single quotes, and are cast to integers otherwise.
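Code with this intent could keep its behavior under the proposed deprecation by casting to char explicitly. The sketch below is a hypothetical variant of the signed char overload, not Abseil's actual code; the function names and the upper bound of 126 (the last printable ASCII character) are assumptions made for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical variant of the signed char overload above; not Abseil's code.
void make_value_string(std::ostream& os, const signed char v) {
    if (v >= 32 && v <= 126) {
        // The cast states explicitly that character formatting is intended.
        os << "'" << static_cast<char>(v) << "'";
    } else {
        os << "signed char value " << int{v};
    }
}

// Convenience wrapper so the result can be inspected as a string.
std::string value_string(signed char v) {
    std::ostringstream os;
    make_value_string(os, v);
    return os.str();
}
```

With this variant, 65 is rendered as 'A' and 7 as "signed char value 7", without relying on the signed char overload of operator<<.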
3.1.2. FlatBuffers
In the implementation of flatc (the FlatBuffers schema compiler), there’s the following
function template:
// Flatbuffers, version 23.5.26:
// src/annotated_binary_text_gen.cpp
template <typename T>
std::string ToString(T value) {
  if (std::is_floating_point<T>::value) {
    std::stringstream ss;
    ss << value;
    return ss.str();
  } else {
    return std::to_string(value);
  }
}
where the proposed-for-deprecation overload of operator<< is instantiated
if T is signed char or unsigned char. The overloads are never actually called,
but because the above code is using if instead of if constexpr, the compiler
warns about usage anyway.
The current behavior when using signed char or unsigned char is to use std::to_string,
which formats the value as an integer, as the overload std::to_string(int) is picked
in overload resolution.
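A sketch of how such a function template could avoid instantiating the stream overload at all is shown below. It assumes C++17 for if constexpr; the name to_string_sketch is hypothetical and this is not FlatBuffers code.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical rewrite of the function template above (requires C++17).
template <typename T>
std::string to_string_sketch(T value) {
    if constexpr (std::is_floating_point_v<T>) {
        std::stringstream ss;
        ss << value;  // this branch is only instantiated for floating-point T
        return ss.str();
    } else {
        // Integer types, including signed char and unsigned char.
        return std::to_string(value);
    }
}
```

Since the discarded branch is not instantiated, no use of operator<< with signed char or unsigned char remains for the compiler to diagnose.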
3.1.3. simdjson
The following piece of code is present in the implementation of simdjson:
// simdjson, version 3.9.1:
// include/simdjson/dom/document-inl.h
inline bool document::dump_raw_tape(std::ostream& os) const noexcept {
  uint32_t string_length;
  size_t tape_idx = 0;
  uint64_t tape_val = tape[tape_idx];
  uint8_t type = uint8_t(tape_val >> 56);
  os << tape_idx << " : " << type;
  // ...
  os << tape_idx << " : " << type << "\t// pointing to " << // ...
  if (type == 'r') // ...
  switch (type) {
  case '"': // ...
  case 'l': // ...
  }
}
This member function is apparently intended to be used for debugging.
The tape referenced is a library-internal representation of a parsed JSON document.
Above, type has the type uint8_t, but is clearly treated as a character type.
Its value is compared to character literals, and thus, when written to a std::ostream,
it is intended to be formatted as a character. The proposed deprecation would break this.
3.1.4. yaml-cpp
In yaml-cpp, the following piece of code can be found,
where the operator<< overload is called with signed char and unsigned char:
// yaml-cpp, version 0.8.0:
// include/yaml-cpp/node/convert.h
// Used with T=signed char and T=unsigned char
template <typename T>
typename std::enable_if<!std::is_floating_point<T>::value, void>::type
inner_encode(const T& rhs, std::stringstream& stream) {
  stream << rhs;
}
This function template is instantiated and called when writing to an existing YAML document:
signed char my_schar = 65;
unsigned char my_uchar = 65;
auto node = YAML::Load("{schar: 0, uchar: 0}");
node["schar"] = my_schar;
node["uchar"] = my_uchar;
std::cout << node;
// Outputs: {schar: A, uchar: A}
It’s unclear whether treating these types as character types here is the desired behavior,
or simply an oversight caused by the usage of operator<<.
Elsewhere in the library, signed char is treated unambiguously as an integer,
whereas unsigned char is treated as a character:
signed char my_schar = 65;
unsigned char my_uchar = 65;
YAML::Emitter out;
out << YAML::BeginMap
    << YAML::Key << "schar" << YAML::Value << my_schar
    << YAML::Key << "uchar" << YAML::Value << my_uchar
    << YAML::EndMap;
std::cout << out.c_str();
// Outputs:
// schar: 65
// uchar: A
There are two long-standing issues against yaml-cpp to inquire about this inconsistency, without a resolution before the mailing deadline.
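If integer output were the intent, one possible shape for a generic encoder is sketched below. This is a hypothetical function, not yaml-cpp's code and not a proposed fix for it; it relies on unary plus to promote character-sized integer types to int before insertion.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical encoder for integer types only; not yaml-cpp's actual code.
// Note that plain char is also an integral type and would be written as a number.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
encode_as_number(const T& rhs) {
    std::stringstream stream;
    stream << +rhs;  // unary plus promotes char-sized types to int
    return stream.str();
}
```

With this sketch, both a signed char and an unsigned char holding 65 are encoded as the text 65, consistently with wider integer types.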
3.1.5. Conclusion
Only four instances of use were found during this study, which is not a lot.
Notably, only uses of operator<< taking a signed char or unsigned char were found.
No uses of the array version of operator<< or any of the operator>> overloads were identified.
In these four cases:
- One (probably) contained a bug, which could have been identified with the deprecation proposed here (§ 3.1.4 yaml-cpp)
- One was essentially a false positive, where the deprecated overloads were never called, only instantiated (§ 3.1.2 FlatBuffers)
- Two were cases where the deprecated behavior was the desired one (§ 3.1.1 Abseil, § 3.1.3 simdjson)
So, the use in the wild for these overloads seems to be quite limited.
In some cases, the current behavior is asked for, but it’s difficult to ascertain whether the developers writing that code
initially got tripped up by this behavior.
With this deprecation, at least a single possible bug was identified, and it’s possible even more could be found,
once developers are forced to check their usages as their compilers start warning them.
If anything, forcing users to cast explicitly to a character or integer type could be argued to improve readability
over relying on the current behavior with signed char and unsigned char.
4. Wording
This wording is relative to [N4971].
4.1. Modify [istream.general] p1
// ...
// [istream.extractors], character extraction templates
template<class charT, class traits>
  basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT&);
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char&);
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char&);
template<class charT, class traits, size_t N>
  basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT(&)[N]);
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char(&)[N]);
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char(&)[N]);
4.2. Modify [istream.extractors], around p7 to p12
template<class charT, class traits, size_t N>
  basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT(&)[N]);
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char(&)[N]);
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char(&)[N]);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
After a sentry object is constructed, operator>> extracts characters and stores them into s.
If width() is greater than zero, n is min(size_t(width()), N).
Otherwise n is N.
n is the maximum number of characters stored.
Characters are extracted and stored until any of the following occurs:
- n - 1 characters are stored;
- end of file occurs on the input sequence;
- letting ct be use_facet<ctype<charT>>(in.getloc()), ct.is(ct.space, c) is true.
operator>> then stores a null byte (charT()) in the next position, which may be the first position if no characters were extracted.
operator>> then calls width(0).
If the function extracted no characters, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
template<class charT, class traits>
  basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT&);
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char&);
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char&);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
A character is extracted from in, if one is available, and stored in c.
Otherwise, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
4.3. Modify [ostream.general] p1
// ...
// [ostream.inserters.character], character inserters
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, charT);
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, char);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, signed char);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, unsigned char);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, wchar_t) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char8_t) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char16_t) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char32_t) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char8_t) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char16_t) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char32_t) = delete;
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const charT*);
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const signed char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const unsigned char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const wchar_t*) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char8_t*) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char16_t*) = delete;
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char32_t*) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char8_t*) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char16_t*) = delete;
template<class traits>
  basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char32_t*) = delete;
// ...
4.4. Modify [ostream.inserters.character]
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, charT);
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, char);
// specialization
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char);
// signed and unsigned
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, signed char);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, unsigned char);
Effects: Behaves as a formatted output function of out.
Constructs a character sequence seq.
If c has type char and the character container type of the stream is not char,
then seq consists of out.widen(c); otherwise seq consists of c.
Determines padding for seq as described in [ostream.formatted.reqmts].
Inserts seq into out.
Calls os.width(0).
Returns: out.
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const charT*);
template<class charT, class traits>
  basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const signed char*);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const unsigned char*);
Preconditions: s is not a null pointer.
Effects: Behaves like a formatted inserter (as described in [ostream.formatted.reqmts]) of out.
Creates a character sequence seq of n characters starting at s, each widened using out.widen() ([basic.ios.members]),
where n is the number that would be computed as if by:
- traits::length(s) for the overload where the first argument is of type basic_ostream<charT, traits>& and the second is of type const charT*, and also for the overload where the first argument is of type basic_ostream<char, traits>& and the second is of type const char*,
- char_traits<char>::length(s) for the overload where the first argument is of type basic_ostream<charT, traits>& and the second is of type const char*,
- traits::length(reinterpret_cast<const char*>(s)) for the other two overloads.
Determines padding for seq as described in [ostream.formatted.reqmts].
Inserts seq into out.
Calls width(0).
Returns: out.
4.5. Add a new subclause in Annex D after [depr.atomics]
Deprecated signed char and unsigned char extraction [depr.istream.extractors]
The following function overloads are declared in addition to those specified in [istream.extractors]:
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char& c);
template<class traits>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char& c);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
A character is extracted from in, if one is available, and stored in c.
Otherwise, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char (&s)[N]);
template<class traits, size_t N>
  basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char (&s)[N]);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
After a sentry object is constructed, operator>> extracts characters and stores them into s.
If width() is greater than zero, n is min(size_t(width()), N).
Otherwise n is N.
n is the maximum number of characters stored.
Characters are extracted and stored until any of the following occurs:
- n - 1 characters are stored;
- end of file occurs on the input sequence;
- letting ct be use_facet<ctype<charT>>(in.getloc()), ct.is(ct.space, c) is true.
operator>> then stores a null byte (charT()) in the next position, which may be the first position if no characters were extracted.
operator>> then calls width(0).
If the function extracted no characters, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
4.6. Add a new subclause in Annex D after the above ([depr.istream.extractors])
Deprecated signed char and unsigned char insertion [depr.ostream.inserters]
The following function overloads are declared in addition to those specified in [ostream.inserters]:
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, signed char c);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, unsigned char c);
Effects: Equivalent to:
.
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const signed char* s);
template<class traits>
  basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const unsigned char* s);
Effects: Equivalent to:
.