P.J. Plauger Dinkumware, Ltd. pjp@dinkumware.com 2006-04-15
Dinkumware has been marketing several character code conversion aids for a number of years, as part of a package of supplemental features we call CoreX. They have now been merged into our latest comprehensive product, the Dinkum Compleat Library. Based on the success of that package, we now feel confident in proposing two template classes from it for inclusion in a future standard C++ library.
We submitted these template classes earlier, first as N1683 and then as N1957, and met with several criticisms. Here are our responses to the most explicit remarks about wstring_convert:
The Elem template parameter is redundant, since the Codecvt parameter specifies an element type. We've found it convenient to keep these types separate, so that (for example) you can use the large existing corpus of char codecvt facets with signed/unsigned char conversions as well.
[...] state_type (see below). The others have their uses in user code, and generally parallel the ones in library classes.
[...] Codecvt object at construction time. While this is only occasionally necessary, it's a useful functional enhancement that we've now incorporated in our implementation.
The descriptions that follow are taken primarily from our documentation for our latest library.
wstring_convert
Template class wstring_convert performs conversions between a wide string and a byte string. It lets you specify a code conversion facet (like template class codecvt) to perform the conversions, without affecting any streams or locales. Say, for example, you have a code conversion facet called codecvt_utf8 that you want to use to output to cout a UTF-8 multibyte sequence corresponding to a wide string, but you don't want to alter the locale for cout. You can write something like:
wstring_convert<codecvt_utf8<wchar_t> > myconv;
std::string mbstring = myconv.to_bytes(L"Hello\n");
std::cout << mbstring;
Note that the Standard C++ library currently uses code conversion facets only within template class basic_filebuf, for converting from multibyte sequences when reading from a file and for converting to multibyte sequences when writing to a file. Something like template class wstring_convert is needed to perform similar conversions between string objects, without involving file I/O.
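As an illustration of a string-to-string conversion with no file I/O, here is a minimal sketch. It assumes the proposed classes behave like std::wstring_convert (in <locale>) and std::codecvt_utf8 (in <codecvt>) as they later shipped in C++11; both are deprecated since C++17. The helper name to_utf8 is hypothetical.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: convert a wide string to a UTF-8 byte string
// without imbuing any stream or touching the global locale.
std::string to_utf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    return conv.to_bytes(wide);
}
```

The converter is a local object, so each call starts from the initial conversion state.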
namespace std {
template<class Codecvt, class Elem = wchar_t>
    class wstring_convert {
public:
    typedef std::basic_string<char> byte_string;
    typedef std::basic_string<Elem> wide_string;
    typedef typename Codecvt::state_type state_type;
    typedef typename wide_string::traits_type::int_type int_type;
    wstring_convert(Codecvt *pcvt = new Codecvt);
    wstring_convert(Codecvt *pcvt, state_type state);
    wstring_convert(const byte_string& byte_err,
        const wide_string& wide_err = wide_string());
    wide_string from_bytes(char byte);
    wide_string from_bytes(const char *ptr);
    wide_string from_bytes(const byte_string& str);
    wide_string from_bytes(const char *first, const char *last);
    byte_string to_bytes(Elem wchar);
    byte_string to_bytes(const Elem *wptr);
    byte_string to_bytes(const wide_string& wstr);
    byte_string to_bytes(const Elem *first, const Elem *last);
    size_t converted() const;
    state_type state() const;
    // exposition only
private:
    byte_string byte_err_string;
    wide_string wide_err_string;
    Codecvt *cvtptr;
    state_type cvtstate;
    size_t cvtcount;
    };
} // namespace std
The template class describes an object that controls conversions between wide string objects of class std::basic_string<Elem> and byte string objects of class std::basic_string<char> (also known as std::string). The template class defines the types wide_string and byte_string as synonyms for these two types. Conversion between a sequence of Elem values (stored in a wide_string object) and multibyte sequences (stored in a byte_string object) is performed by an object of class Codecvt, which meets the requirements of the standard code-conversion facet std::codecvt<Elem, char, std::mbstate_t>.
An object of this template class stores:
byte_err_string -- a byte string to display on errors
wide_err_string -- a wide string to display on errors
cvtptr -- a pointer to the allocated conversion object (which is freed when the wstring_convert object is destroyed)
cvtstate -- a conversion state object
cvtcount -- a conversion count
wstring_convert::byte_string
typedef std::basic_string<char> byte_string;
The type is a synonym for std::basic_string<char>.
wstring_convert::converted
size_t converted() const;
The member function returns cvtcount.
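A small sketch of querying the count after a conversion. It assumes the behavior of std::wstring_convert as later standardized in C++11, where the count refers to input elements consumed by the most recent conversion. The helper name bytes_consumed is hypothetical.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <cstddef>
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: run one conversion and report the stored count.
std::size_t bytes_consumed(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    conv.from_bytes(bytes);    // the conversion updates the stored count
    return conv.converted();   // read the count back
}
```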
wstring_convert::from_bytes
wide_string from_bytes(char byte);
wide_string from_bytes(const char *ptr);
wide_string from_bytes(const byte_string& str);
wide_string from_bytes(const char *first, const char *last);
The first member function converts the single-element sequence byte to a wide string.
The second member function converts the nul-terminated sequence beginning at ptr to a wide string.
The third member function converts the sequence stored in str to a wide string.
The fourth member function converts the sequence defined by the range [first, last) to a wide string.
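The four overloads can be exercised side by side. This is a sketch only, assuming the C++11 forms std::wstring_convert and std::codecvt_utf8; the function name from_bytes_four_ways is hypothetical, and the input is the UTF-8 encoding of "caf" followed by U+00E9.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>
#include <vector>

// Hypothetical demo: call each from_bytes overload once.
std::vector<std::wstring> from_bytes_four_ways()
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    const char text[] = "caf\xC3\xA9";   // 5 bytes of UTF-8 plus the terminating nul
    const std::string str(text);

    std::vector<std::wstring> out;
    out.push_back(conv.from_bytes('A'));             // single byte
    out.push_back(conv.from_bytes(text));            // nul-terminated sequence
    out.push_back(conv.from_bytes(str));             // byte_string
    out.push_back(conv.from_bytes(text, text + 3));  // range [first, last): "caf"
    return out;
}
```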
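In all cases:
If the cvtstate object was not constructed with an explicit value, it is set to its default value (the initial conversion state) before the conversion begins. Otherwise it is left unchanged.
The number of input elements successfully converted is stored in cvtcount.
If no conversion error occurs, the member function returns the converted wide string. Otherwise, if the object was constructed with a wide error string, the member function returns that string. Otherwise, the member function throws an object of class std::range_error.
wstring_convert::int_type
typedef typename wide_string::traits_type::int_type int_type;
The type is a synonym for wide_string::traits_type::int_type.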
wstring_convert::state
state_type state() const;
The member function returns cvtstate.
wstring_convert::state_type
typedef typename Codecvt::state_type state_type;
The type is a synonym for Codecvt::state_type.
wstring_convert::to_bytes
byte_string to_bytes(Elem wchar);
byte_string to_bytes(const Elem *wptr);
byte_string to_bytes(const wide_string& wstr);
byte_string to_bytes(const Elem *first, const Elem *last);
The first member function converts the single-element sequence wchar to a byte string.
The second member function converts the nul-terminated sequence beginning at wptr to a byte string.
The third member function converts the sequence stored in wstr to a byte string.
The fourth member function converts the sequence defined by the range [first, last) to a byte string.
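A matching sketch for the to_bytes overloads, again assuming the C++11 forms std::wstring_convert and std::codecvt_utf8. The function name to_bytes_four_ways is hypothetical; the wide input is the digit 5 followed by the euro sign (U+20AC), which UTF-8 encodes as three bytes.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>
#include <vector>

// Hypothetical demo: call each to_bytes overload once.
std::vector<std::string> to_bytes_four_ways()
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    const wchar_t text[] = L"5\u20AC";   // '5' followed by U+20AC
    const std::wstring wstr(text);

    std::vector<std::string> out;
    out.push_back(conv.to_bytes(L'\u20AC'));        // single wide element
    out.push_back(conv.to_bytes(text));             // nul-terminated sequence
    out.push_back(conv.to_bytes(wstr));             // wide_string
    out.push_back(conv.to_bytes(text, text + 1));   // range [first, last): just L"5"
    return out;
}
```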
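In all cases:
If the cvtstate object was not constructed with an explicit value, it is set to its default value (the initial conversion state) before the conversion begins. Otherwise it is left unchanged.
The number of input elements successfully converted is stored in cvtcount.
If no conversion error occurs, the member function returns the converted byte string. Otherwise, if the object was constructed with a byte error string, the member function returns that string. Otherwise, the member function throws an object of class std::range_error.
wstring_convert::wide_string
typedef std::basic_string<Elem> wide_string;
The type is a synonym for std::basic_string<Elem>.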
wstring_convert::wstring_convert
wstring_convert(Codecvt *pcvt = new Codecvt);
wstring_convert(Codecvt *pcvt, state_type state);
wstring_convert(const byte_string& byte_err,
    const wide_string& wide_err = wide_string());
The first constructor stores pcvt in cvtptr and default values in cvtstate, byte_err_string, and wide_err_string.
The second constructor stores pcvt in cvtptr, state in cvtstate, and default values in byte_err_string and wide_err_string; moreover the stored state is retained between calls to from_bytes and to_bytes.
The third constructor stores new Codecvt in cvtptr, state_type() in cvtstate, byte_err in byte_err_string, and wide_err in wide_err_string.
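The practical difference between the first and third constructors shows up on bad input. The sketch below assumes the C++11 forms std::wstring_convert and std::codecvt_utf8, and uses the byte 0xFF, which never appears in valid UTF-8. The helper names and the placeholder strings are hypothetical.

```cpp
#include <codecvt>    // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>     // std::wstring_convert
#include <stdexcept>  // std::range_error
#include <string>

// Third constructor: error strings are supplied, so a failed conversion
// returns the wide error string instead of throwing.
std::wstring decode_or_placeholder(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv("?", L"<bad input>");
    return conv.from_bytes(bytes);
}

// First constructor: no error strings, so a failed conversion throws.
bool decode_throws(const std::string& bytes)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;
    try {
        conv.from_bytes(bytes);
        return false;
    } catch (const std::range_error&) {
        return true;
    }
}
```

Supplying error strings suits code paths that prefer a visible placeholder over exception handling.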
wbuffer_convert
Template class wbuffer_convert looks like a wide stream buffer, but performs all its I/O through an underlying byte stream buffer that you specify when you construct it. Like template class wstring_convert, it lets you specify a code conversion facet to perform the conversions, without affecting any streams or locales.
The previous example can also be written as:
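A minimal sketch of such a rewrite, assuming wbuffer_convert behaves like std::wbuffer_convert as later shipped in C++11 (in <locale>) together with std::codecvt_utf8 (in <codecvt>). The helper name write_utf8 is hypothetical, and a std::ostringstream stands in for cout so that the resulting bytes can be inspected.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>   // std::wbuffer_convert
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write wide text through a converting buffer and
// return the UTF-8 bytes that reached the underlying byte stream.
std::string write_utf8(const std::wstring& text)
{
    std::ostringstream bytes;   // stands in for cout
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > buf(bytes.rdbuf());
    std::wostream wide_out(&buf);
    wide_out << text;
    wide_out.flush();           // push converted bytes to the byte buffer
    return bytes.str();
}
```

With cout as the target, the constructor argument would be std::cout.rdbuf() instead.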
namespace std {
template<class Codecvt, class Elem = wchar_t,
    class Tr = std::char_traits<Elem> >
    class wbuffer_convert : public std::basic_streambuf<Elem, Tr> {
public:
    typedef typename Codecvt::state_type state_type;
    wbuffer_convert(std::streambuf *bytebuf = 0,
        Codecvt *pcvt = new Codecvt,
        state_type state = state_type());
    std::streambuf *rdbuf() const;
    std::streambuf *rdbuf(std::streambuf *bytebuf);
    state_type state() const;
    // exposition only
private:
    std::streambuf *bufptr;
    Codecvt *cvtptr;
    state_type cvtstate;
    };
} // namespace std
The template class describes a stream buffer that controls the transmission of elements of type Elem, whose character traits are described by the class Tr, to and from a byte stream buffer of type std::streambuf. Conversion between a sequence of Elem values and multibyte sequences is performed by an object of class Codecvt, which meets the requirements of the standard code-conversion facet std::codecvt<Elem, char, std::mbstate_t>.
An object of this template class stores:
bufptr -- a pointer to its underlying byte stream buffer
cvtptr -- a pointer to the allocated conversion object (which is freed when the wbuffer_convert object is destroyed)
cvtstate -- a conversion state object
wbuffer_convert::state
state_type state() const;
The member function returns cvtstate.
wbuffer_convert::rdbuf
std::streambuf *rdbuf() const;
std::streambuf *rdbuf(std::streambuf *bytebuf);
The first member function returns bufptr.
The second member function stores bytebuf in bufptr.
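One use of the second rdbuf overload is to redirect an existing wide stream to a different byte target. The following is a sketch under the assumption that std::wbuffer_convert from C++11 matches the class described here; the function name write_to_two_targets is hypothetical. The stream is flushed before the switch so that no converted output is pending.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <locale>   // std::wbuffer_convert
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical demo: write to one byte buffer, then swap in another.
std::pair<std::string, std::string> write_to_two_targets()
{
    std::ostringstream first, second;
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > buf(first.rdbuf());
    std::wostream out(&buf);

    out << L"one";
    out.flush();                  // make sure "one" has reached first

    buf.rdbuf(second.rdbuf());    // later output goes to second
    out << L"two";
    out.flush();

    return std::make_pair(first.str(), second.str());
}
```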
wbuffer_convert::wbuffer_convert
wbuffer_convert(std::streambuf *bytebuf = 0, Codecvt *pcvt = new Codecvt, state_type state = state_type());
The constructor constructs a stream buffer object, initializes bufptr to bytebuf, initializes cvtptr to pcvt, and initializes cvtstate to state.
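The bytebuf argument can equally be a buffer that is read from. This sketch assumes the C++11 forms std::wbuffer_convert and std::codecvt_utf8; the helper name read_utf8_line is hypothetical, and a std::istringstream supplies the bytes.

```cpp
#include <codecvt>  // std::codecvt_utf8 (assumed C++11 form, deprecated since C++17)
#include <istream>
#include <locale>   // std::wbuffer_convert
#include <sstream>
#include <string>

// Hypothetical helper: read one line of UTF-8 bytes as a wide string.
std::wstring read_utf8_line(const std::string& bytes)
{
    std::istringstream source(bytes);
    std::wbuffer_convert<std::codecvt_utf8<wchar_t> > buf(source.rdbuf());
    std::wistream wide_in(&buf);

    std::wstring line;
    std::getline(wide_in, line);  // conversion happens as the buffer refills
    return line;
}
```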
wbuffer_convert::state_type
typedef typename Codecvt::state_type state_type;
The type is a synonym for Codecvt::state_type.
Copyright © 2002-2006 by Dinkumware, Ltd. All rights reserved.