Doc: N1985==06-055
Date: 2006-04-06
Author: Jack W. Reeves
jack.reeves@bleading-edge.com
Request the Standard Provide Explicit Specialization of char_traits For All Built-in Character Types
Changes to Section 21.1 [lib.char.traits]
The Standard does not require explicit specializations of std::char_traits<> for any type other than ‘char’ and ‘wchar_t’. This can lead to some unexpected and undesirable behavior.
Consider the following:
std::basic_string<unsigned char> buffer;
This is shorthand for
std::basic_string<unsigned char,
                  std::char_traits<unsigned char>,
                  std::allocator<unsigned char> > buffer;
This yields undefined behavior since the standard only provides a declaration for std::char_traits<> and does not require an explicit specialization of std::char_traits<unsigned char>.
Naturally, undefined behavior can mean anything, but several possible results are common.
Since the presence of a correct specialization of std::char_traits<unsigned char> is currently an implementation extension, any client attempting to use std::basic_string<unsigned char> in a portable manner has to provide their own character traits class. Furthermore, they cannot provide this as an explicit specialization of std::char_traits<>. The Standard explicitly states that explicit specialization of templates defined in namespace std for anything other than user-defined types yields undefined behavior. As a practical matter, such an explicit specialization might work on platforms (a) and (b) above, but would clearly clash with the implementation-defined extension on platform (c). Furthermore, such an approach causes problems if multiple libraries try it, even on platforms (a) and (b).
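The portable workaround described above can be sketched as follows. This is only an illustration, not wording from this paper: the names uchar_traits and ustring are hypothetical, and the member definitions (built on the <cstring> byte functions) are one plausible implementation of the character traits requirements.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdio>    // EOF
#include <cstring>   // std::memcmp, std::memchr, std::memmove, std::memcpy, std::memset
#include <cwchar>    // std::mbstate_t
#include <ios>       // std::streamoff, std::streampos
#include <string>

// Hypothetical user-provided traits class. It is deliberately NOT a
// specialization of std::char_traits, so it stays within what the Standard allows.
struct uchar_traits {
    typedef unsigned char  char_type;
    typedef int            int_type;
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& c1, const char_type& c2) { c1 = c2; }
    static bool eq(const char_type& c1, const char_type& c2) { return c1 == c2; }
    static bool lt(const char_type& c1, const char_type& c2) { return c1 < c2; }

    // memcmp compares bytes as unsigned char, which matches lt() above.
    static int compare(const char_type* s1, const char_type* s2, std::size_t n) {
        if (n == 0) return 0;
        return std::memcmp(s1, s2, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& a) {
        if (n == 0) return 0;
        return static_cast<const char_type*>(std::memchr(s, a, n));
    }
    // move() must tolerate overlapping ranges; copy() need not.
    static char_type* move(char_type* s1, const char_type* s2, std::size_t n) {
        if (n != 0) std::memmove(s1, s2, n);
        return s1;
    }
    static char_type* copy(char_type* s1, const char_type* s2, std::size_t n) {
        if (n != 0) std::memcpy(s1, s2, n);
        return s1;
    }
    static char_type* assign(char_type* s, std::size_t n, char_type a) {
        if (n != 0) std::memset(s, a, n);
        return s;
    }
    static int_type not_eof(const int_type& c) { return c == eof() ? 0 : c; }
    static char_type to_char_type(const int_type& c) { return static_cast<char_type>(c); }
    static int_type to_int_type(const char_type& c) { return c; }
    static bool eq_int_type(const int_type& c1, const int_type& c2) { return c1 == c2; }
    static int_type eof() { return EOF; }
};

// Hypothetical convenience typedef for a byte string.
typedef std::basic_string<unsigned char, uchar_traits> ustring;

// Usage sketch: a byte buffer with an embedded zero keeps its full length.
inline std::size_t demo_buffer_size() {
    const unsigned char raw[] = { 0x01, 0xFF, 0x00, 0x7F };
    ustring buffer(raw, sizeof raw);
    assert(buffer[1] == 0xFF);
    return buffer.size();
}
```

The cost of this approach is that ustring is a different type from std::basic_string<unsigned char>, so two libraries that each define their own traits class cannot exchange strings without conversion.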
Most users want the declaration of a std::basic_string<unsigned char> to work. The typical user simply expects it to, and the sophisticated user who understands why it doesn’t will still be annoyed with the alternative. After all, the Standard uses std::basic_string<char> so why shouldn’t it work the same for unsigned char?
A typical reaction to the fact that std::basic_string<unsigned char> will not work portably is to suggest std::vector<unsigned char>. While this will ordinarily solve the immediate problem, using vector instead of basic_string for built-in types carries potential performance costs that make it undesirable for reusable library code. vector is required to work with any type. As a result, internal copying of vector’s storage must involve loops that invoke copy constructors and destructors. While these operations are no-ops for built-in types, the loops still exist, and whether they are optimized away becomes a “quality of implementation” issue. This is clearly a valid concern, or the Standard would not require every valid character traits class to provide the copy(), move(), and assign() operations.
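For readers less familiar with these bulk operations, the following minimal sketch exercises copy(), move(), and the three-argument assign() on the existing std::char_traits<char> specialization. The function name prefix_with_dashes is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns "--" followed by the input, built with the
// bulk traits operations instead of element-by-element loops.
inline std::string prefix_with_dashes(const std::string& in) {
    typedef std::char_traits<char> traits;
    std::string out(in.size() + 2, '\0');

    // copy(): source and destination do not overlap.
    traits::copy(&out[0], in.data(), in.size());

    // move(): the ranges overlap, so copy() would not be allowed here.
    traits::move(&out[0] + 2, &out[0], in.size());

    // assign(): fill the first two characters with '-'.
    traits::assign(&out[0], 2, '-');
    return out;
}
```

Because each call hands the library a whole range at once, an implementation is free to map it to memcpy, memmove, or memset for built-in character types.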
Suggested resolution: the Standard should require that explicit specializations of std::char_traits<> be provided for all built-in character types.
Proposed resolution:
Change 21.1/1 to read:
This subclause defines requirements on classes representing character traits, and declares a class template char_traits<charT>, along with four specializations, char_traits<char>, char_traits<wchar_t>, char_traits<unsigned char>, and char_traits<signed char>, that satisfy those requirements.
Change 21.1/4 to read:
This subclause specifies a struct template, char_traits<charT>, and four explicit specializations of it, char_traits<char>, char_traits<wchar_t>, char_traits<unsigned char>, and char_traits<signed char>, all of which appear in the header <string> and satisfy the requirements below.
Change 21.1.3 to read:
namespace std {
  template <> struct char_traits<char>;
  template <> struct char_traits<wchar_t>;
  template <> struct char_traits<unsigned char>;
  template <> struct char_traits<signed char>;
}
Change 21.1.3/1 to read:
The header <string> declares four structs that are specializations of the template struct char_traits.
Add 21.1.3.3 struct char_traits<unsigned char>
namespace std {
  template <>
  struct char_traits<unsigned char> {
    typedef unsigned char char_type;
    typedef int           int_type;
    typedef streamoff     off_type;
    typedef streampos     pos_type;
    typedef mbstate_t     state_type;

    static void assign(char_type& c1, const char_type& c2);
    static bool eq(const char_type& c1, const char_type& c2);
    static bool lt(const char_type& c1, const char_type& c2);

    static int compare(const char_type* s1, const char_type* s2, size_t n);
    static size_t length(const char_type* s);
    static const char_type* find(const char_type* s, size_t n,
                                 const char_type& a);
    static char_type* move(char_type* s1, const char_type* s2, size_t n);
    static char_type* copy(char_type* s1, const char_type* s2, size_t n);
    static char_type* assign(char_type* s, size_t n, char_type a);

    static int_type not_eof(const int_type& c);
    static char_type to_char_type(const int_type& c);
    static int_type to_int_type(const char_type& c);
    static bool eq_int_type(const int_type& c1, const int_type& c2);
    static int_type eof();
  };
}
The header <string> (21.2) declares a specialization of the template struct char_traits for unsigned char. It is for narrow-oriented iostream classes.
The defined types for int_type, pos_type, off_type, and state_type are int, streampos, streamoff, and mbstate_t respectively.
The type streampos is an implementation-defined type that satisfies the requirements for POS_T in 21.1.2.
The type streamoff is an implementation-defined type that satisfies the requirements for OFF_T in 21.1.2.
The type mbstate_t is defined in <cwchar> and can represent any of the conversion states possible to occur in an implementation-defined set of supported multibyte character encoding rules.
The two-argument member assign is defined identically to the built-in operator =. The two-argument members eq and lt are defined identically to the built-in operators == and < for type unsigned char.
The member eof() returns EOF.
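One consequence of pairing int_type = int with eof() returning EOF can be checked mechanically. The wording above does not spell out to_int_type, so the sketch below assumes it is the plain integral conversion, and it assumes int is wider than unsigned char (true on common platforms). The free functions are hypothetical stand-ins for the proposed members.

```cpp
#include <cassert>
#include <climits>  // UCHAR_MAX
#include <cstdio>   // EOF

// Hypothetical stand-ins for the proposed static members.
inline int uchar_to_int_type(unsigned char c) { return c; }
inline unsigned char uchar_to_char_type(int i) { return static_cast<unsigned char>(i); }
inline int uchar_eof() { return EOF; }

// Returns true if every unsigned char value maps to an int that is distinct
// from eof() and converts back to the original character.
inline bool all_uchar_values_round_trip() {
    for (int v = 0; v <= UCHAR_MAX; ++v) {
        const unsigned char c = static_cast<unsigned char>(v);
        const int i = uchar_to_int_type(c);
        if (i == uchar_eof()) return false;            // collides with EOF
        if (uchar_to_char_type(i) != c) return false;  // lost information
    }
    return true;
}
```

Since EOF is a negative int and every unsigned char value converts to a non-negative int under the stated assumption, no character value can be mistaken for end-of-file.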
Add 21.1.3.4 struct char_traits<signed char>
namespace std {
  template <>
  struct char_traits<signed char> {
    typedef signed char char_type;
    typedef int         int_type;
    typedef streamoff   off_type;
    typedef streampos   pos_type;
    typedef mbstate_t   state_type;

    static void assign(char_type& c1, const char_type& c2);
    static bool eq(const char_type& c1, const char_type& c2);
    static bool lt(const char_type& c1, const char_type& c2);

    static int compare(const char_type* s1, const char_type* s2, size_t n);
    static size_t length(const char_type* s);
    static const char_type* find(const char_type* s, size_t n,
                                 const char_type& a);
    static char_type* move(char_type* s1, const char_type* s2, size_t n);
    static char_type* copy(char_type* s1, const char_type* s2, size_t n);
    static char_type* assign(char_type* s, size_t n, char_type a);

    static int_type not_eof(const int_type& c);
    static char_type to_char_type(const int_type& c);
    static int_type to_int_type(const char_type& c);
    static bool eq_int_type(const int_type& c1, const int_type& c2);
    static int_type eof();
  };
}
The header <string> (21.2) declares a specialization of the template struct char_traits for signed char. It is for narrow-oriented iostream classes.
The defined types for int_type, pos_type, off_type, and state_type are int, streampos, streamoff, and mbstate_t respectively.
The type streampos is an implementation-defined type that satisfies the requirements for POS_T in 21.1.2.
The type streamoff is an implementation-defined type that satisfies the requirements for OFF_T in 21.1.2.
The type mbstate_t is defined in <cwchar> and can represent any of the conversion states possible to occur in an implementation-defined set of supported multibyte character encoding rules.
The two-argument member assign is defined identically to the built-in operator =. The two-argument members eq and lt are defined identically to the built-in operators == and < for type signed char.
The member eof() returns EOF.