Doc. no. WG21/N1973=06-0043
Date: 2006-04-10
Project: Programming Language C++
Reply to: Kevlin Henney <kevlin@curbralan.com>
Beman Dawes <bdawes@acm.org>
Introduction
Motivation and Scope
Impact on the Standard
Important Design Decisions
Proposed Text for TR2
Synopsis
Function template lexical_cast
Class bad_lexical_cast
This paper proposes the addition of a lexical conversion library component to the C++ Standard Library Technical Report 2. The proposal is based on the Boost Conversion Library's lexical_cast (www.boost.org/libs/conversion/lexical_cast.htm).
The lexical_cast function template offers a convenient and consistent form for supporting common conversions to and from arbitrary types when they are represented as text. The Boost version of lexical_cast is very widely used. It would be a pure addition to the C++ standard.
Boost lexical_cast is particularly popular with end users. Five of the six in-house users listed on the "Who's Using Boost" page name lexical_cast as one of the Boost libraries they use.
For a good discussion of the options and issues involved in string-based formatting, including a comparison of stringstream, lexical_cast, and others, see Herb Sutter's article, The String Formatters of Manor Farm.
Also see Björn Karlsson, Beyond the C++ Standard Library, 73-77, Addison Wesley, ISBN 0-321-13354-4, www.awprofessional.com/title/0321133544.
Why is this important?
Sometimes a value must be converted to a literal text form, such as an int represented as a string, or vice-versa, when a string is interpreted as an int. Such examples are common when converting between data types internal to a program and representations external to a program, such as windows and configuration files.
The standard C and C++ libraries offer a number of facilities for performing such conversions. However, they vary in ease of use, extensibility, and safety.
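For instance, there are a number of limitations with the family of standard C functions typified by atoi:
- Conversion is supported in one direction only: from text to an internal data type. Converting the other way using the C library requires either the inconvenience and compromised safety of the sprintf function, or the loss of portability associated with non-standard functions such as itoa.
- The range of types supported is only a subset of the built-in numeric types, namely int, long, and double.
- The range of types cannot be extended in a uniform manner, for instance to support conversion from a string representation to complex or rational.
The standard C functions typified by strtol have the same basic limitations, but offer finer control over the conversion process. However, for the common case such control is often either not required or not used. The scanf family of functions offers even greater control, but also lacks safety and ease of use.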
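
The atoi and strtol limitations described above can be seen in a few lines. This is only a minimal sketch: the helper names atoi_is_ambiguous and parse_long are hypothetical and exist purely for illustration.

```cpp
#include <cerrno>
#include <cstdlib>

// atoi cannot distinguish a valid "0" from unparseable text: both yield 0.
inline bool atoi_is_ambiguous() {
    return std::atoi("0") == 0 && std::atoi("abc") == 0;
}

// Hypothetical helper: strtol can report errors, but only if the caller
// checks the end pointer and errno by hand.
inline bool parse_long(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;  // nothing parsed, trailing characters, or out of range
    }
    out = value;
    return true;
}
```

The finer control of strtol is real, but every call site has to repeat the same three checks to use it safely.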
The standard C++ library offers stringstream for the kind of in-core formatting being discussed. It offers a great deal of control over the formatting and conversion of I/O to and from arbitrary types through text. However, for simple conversions direct use of stringstream can be either clumsy (with the introduction of extra local variables and the loss of infix-expression convenience) or obscure (where stringstream objects are created as temporary objects in an expression). Facets provide a comprehensive concept and facility for controlling textual representation, but their perceived complexity and high entry level require an extreme degree of involvement for simple conversions, and exclude all but a few programmers.
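
The two styles of direct stringstream use mentioned above might look as follows. This is an illustrative sketch only; the function names to_int_named and to_text_temporary are hypothetical.

```cpp
#include <sstream>
#include <string>

// Clumsy: a named stream is correct, but it needs two extra local
// variables and cannot be written inline inside a larger expression.
inline int to_int_named(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;
    return value;
}

// Obscure: a temporary stream keeps everything in one expression, at the
// cost of a cast back to the concrete stream type to reach str().
inline std::string to_text_temporary(int value) {
    return static_cast<const std::ostringstream&>(
               std::ostringstream() << value).str();
}
```

Neither function checks for conversion failure, which is a further burden on the caller when streams are used directly.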
The lexical_cast function template offers a convenient and consistent form for supporting common conversions to and from arbitrary types when they are represented as text. The simplification it offers is in expression-level convenience for such conversions. For more involved conversions, such as where precision or formatting need tighter control than is offered by the default behavior of lexical_cast, the conventional stringstream approach is recommended. Where the conversions are numeric to numeric, other approaches may offer more reasonable behavior than lexical_cast.
What kinds of problems does it address, and what kinds of programmers is it intended to support?
The library addresses everyday needs, for both application programs and libraries. It is useful across many application domains. It is useful to all levels of programmers, from rank beginners to seasoned experts.
Is it based on existing practice? Is there a reference implementation?
Yes, very much so. It has been a mainstay of Boost for many years.
What does it depend on, and what depends on it?
It depends on some standard library components. No other proposals depend on it.
Is it a pure extension, or does it require changes to standard components?
It is a pure extension.
Can it be implemented using today's compilers, or does it require language features that will only be available as part of C++0x?
It can be (and has been) implemented with current compilers, as well as with many older ones.
Why is the << plus >> analogy broken for the std::string special case?
The default asymmetric behavior of I/O for strings is often a cause for surprise amongst novices and, when wrapped inside lexical_cast, experts as well. Converting from a string and back again is expected to be an identity operation, which is what is now supported. This expectation is important, and the response is to make the behavior consistent with the intent of the conversion rather than its underlying implementation.
Over time, lexical_cast has become more symmetric with respect to its conversions. There is also a little bit of handling to ensure that numeric types do not lose precision. Again, the I/O stream defaults are not what many people would expect. And then there is special support for wchar_t<->char conversions, because again I/O streams don't quite do the right thing. We are not in a position to change I/O streams at this late stage, but something like lexical_cast is not required to repeat those little surprises.
Before these changes, Boost regularly received complaints and bug reports about lexical_cast behavior. Once the changes were made, complaints and bug reports stopped.
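
The stream defaults referred to above are easy to reproduce. The following sketch uses plain standard streams; the function names are hypothetical, and the precision result assumes the usual IEEE double, for which digits10 is 15.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Writes a string to a stream and reads it back with operator>>.
// Extraction stops at the first whitespace, so the round trip is lossy.
inline std::string round_trip_string(const std::string& text) {
    std::stringstream stream;
    stream << text;
    std::string result;
    stream >> result;
    return result;
}

// Formats a double with the stream's default precision of 6 significant digits.
inline std::string default_format(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Formats a double with a precision chosen from std::numeric_limits.
inline std::string matched_format(double value) {
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::digits10 + 1);
    out << value;
    return out.str();
}
```

With these definitions, round_trip_string("Hello, World") yields "Hello,", default_format(3.141592653589793) yields "3.14159", and matched_format(3.141592653589793) yields "3.141592653589793".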
I don't like the name. Why don't you change it?
Suggestions are always welcome. However, until something better comes along, the proposal authors don't believe that there is sufficient reason to change from lexical_cast, which is very well established, used in books and other teaching material, and does not seem to cause confusion among real users.
Since either the source or target are usually strings, why not provide separate to_string(x) and string_to<t>(x) functions?
The source or target isn't always a string. Furthermore, the from/to idea cannot be expressed in a simple and consistent form. The illusion is that they are easier than lexical_cast because of the name. This is theory. The practice is that the two forms, although similarly and symmetrically named, are not at all similar in use: one requires explicit provision of a template parameter and the other not. This is a simple usability pitfall that is guaranteed to catch experienced and inexperienced users alike -- the only difference being that the experienced user will know what to do with the error message.
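
The difference in use between the two similarly named forms can be sketched as follows. The namespace naming_sketch and both function bodies are hypothetical illustrations of the suggestion in the question, not part of the proposal, and error handling is omitted.

```cpp
#include <sstream>
#include <string>

namespace naming_sketch {  // hypothetical namespace, for illustration only

// The source type is deduced from the argument.
template <typename Source>
std::string to_string(const Source& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// The target type cannot be deduced and must be spelled out by the caller.
template <typename Target>
Target string_to(const std::string& text) {
    std::istringstream in(text);
    Target value = Target();
    in >> value;
    return value;
}

}  // namespace naming_sketch
```

A call such as naming_sketch::to_string(42) compiles as written, whereas naming_sketch::string_to("42") is rejected because Target cannot be deduced; the caller has to write naming_sketch::string_to<int>("42").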
- Earlier versions of lexical_cast used the default stream precision for reading and writing floating-point numbers. For numerics that have a corresponding specialization of std::numeric_limits, recent Boost versions and the proposal choose a precision to match.
- Earlier versions of lexical_cast did not support conversion to or from any wide-character-based types. Recent Boost versions and the proposal support conversions from wchar_t, wchar_t *, and std::wstring and to wchar_t and std::wstring.
- Earlier versions of lexical_cast assumed that the conventional stream extractor operators were sufficient for reading values. However, string I/O is asymmetric, with the result that spaces play the role of I/O separators rather than string content. Recent Boost versions and the proposal fix this error for std::string and, likewise, std::wstring: lexical_cast<std::string>("Hello, World") succeeds instead of failing with a bad_lexical_cast exception.
- Earlier versions of lexical_cast allowed unsafe and meaningless conversions to pointers. Recent Boost versions and the proposal throw bad_lexical_cast for conversions to pointers: lexical_cast<char *>("Goodbye, World") throws an exception instead of causing undefined behavior.
Text in gray is commentary and not part of the proposed text.
Choice of a new or existing header is deferred pending the outcome of other conversion-related proposals.
namespace std {
namespace tr2 {
  class bad_lexical_cast;
  template<typename Target, typename Source>
    Target lexical_cast(const Source& arg);
}
}
Function template lexical_cast
The lexical_cast function template supplies common conversions to and from arbitrary types represented as text, providing expression-level convenience for such conversions.
The requirements on the argument and result types are:
- Source is OutputStreamable, meaning that an operator<< is defined that takes a std::ostream or std::wostream object on the left hand side and an instance of the argument type on the right.
- Target is InputStreamable, meaning that an operator>> is defined that takes a std::istream or std::wistream object on the left hand side and an instance of the result type on the right.
- Source and Target are CopyConstructible [20.1.3].
- Target is DefaultConstructible, meaning that it is possible to default-initialize an object of that type [8.5, 20.1.4].
lexical_cast behavior is specified in terms of operator<< and operator>> on a std::basic_stringstream object. Implementations are not required to actually use a std::basic_stringstream object to achieve the required behavior. Implementations are permitted to provide specializations of the lexical_cast template.
[Note: Implementations may use this "as if" leeway to achieve efficiency. -- end note.]
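
As an illustration of the requirements above, a user-defined type might satisfy them as follows. The type Celsius and the function through_text are hypothetical; only the narrow-character stream operators are shown.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical user-defined type: DefaultConstructible and CopyConstructible.
struct Celsius {
    double degrees;
    Celsius() : degrees(0.0) {}
    explicit Celsius(double d) : degrees(d) {}
};

// OutputStreamable: operator<< with a std::ostream on the left hand side.
inline std::ostream& operator<<(std::ostream& out, const Celsius& c) {
    return out << c.degrees;
}

// InputStreamable: operator>> with a std::istream on the left hand side.
inline std::istream& operator>>(std::istream& in, Celsius& c) {
    return in >> c.degrees;
}

// Round trip through a stringstream, the mechanism in terms of which
// lexical_cast behavior is specified.
inline Celsius through_text(const Celsius& source) {
    std::stringstream stream;
    stream << source;
    Celsius target;
    stream >> target;
    return target;
}
```

A type equipped this way could serve as either Source or Target without any further adaptation.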
template<typename Target, typename Source> Target lexical_cast(const Source& arg);
Effects:
- Inserts arg into an empty std::basic_stringstream object via operator<<.
- Extracts the result, of type Target, from the std::basic_stringstream object via operator>>.
Throws: bad_lexical_cast if:
- Target is a pointer type.
- fail() for the std::basic_stringstream object is true after either operator<< or operator>> is applied.
- get() for the std::basic_stringstream object is not std::char_traits<char_type>::eof() after both operator<< and operator>> are applied.
Returns: The result as created by the effects.
Remarks: If Target is either std::string or std::wstring, stream extraction takes the whole content of the string, including spaces, rather than relying on the default operator>> behavior.
The character type of the underlying stream is assumed to be char unless either the Source or the Target requires wide-character streaming, in which case the underlying stream uses wchar_t. Source types that require wide-character streaming are wchar_t, wchar_t *, and std::wstring. Target types that require wide-character streaming are wchar_t and std::wstring.
If std::numeric_limits<Target>::is_specialized, the underlying stream precision is set according to std::numeric_limits<Target>::digits10 + 1, otherwise if std::numeric_limits<Source>::is_specialized, the underlying stream precision is set according to std::numeric_limits<Source>::digits10 + 1.
[Note: Where a higher degree of control is required over conversions, std::stringstream and std::wstringstream offer a more appropriate path. Where non-stream-based conversions are required, lexical_cast is the wrong tool for the job and is not special-cased for such scenarios. -- end note.]
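
The stream-based core of the specification above (insert, extract, then check fail() and get()) can be read as the following narrow-character sketch. It is not the Boost implementation or a conforming one: the namespace sketch is hypothetical, noexcept stands in for throw(), and the pointer rule and the Remarks (whole-string targets, wide characters, precision) are deliberately left out.

```cpp
#include <sstream>
#include <string>
#include <typeinfo>

namespace sketch {  // hypothetical namespace; not std::tr2

class bad_lexical_cast : public std::bad_cast {
public:
    const char* what() const noexcept override {
        return "bad lexical cast";
    }
};

template <typename Target, typename Source>
Target lexical_cast(const Source& arg) {
    std::stringstream stream;
    Target result = Target();
    // Insert the argument, then extract the result; fail() after either
    // operation means the conversion did not succeed.
    if ((stream << arg).fail() || (stream >> result).fail()) {
        throw bad_lexical_cast();
    }
    // Any character left in the stream means the text was not fully consumed.
    if (stream.get() != std::stringstream::traits_type::eof()) {
        throw bad_lexical_cast();
    }
    return result;
}

}  // namespace sketch
```

With this sketch, sketch::lexical_cast<int>("42") returns 42, while sketch::lexical_cast<int>("42abc") throws because characters remain after extraction. Since the Remarks are not implemented, converting "Hello, World" to std::string would also throw here, which is the earlier behavior that the proposal corrects.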
Class bad_lexical_cast
namespace std {
namespace tr2 {
  class bad_lexical_cast : public std::bad_cast {
  public:
    bad_lexical_cast () throw ();
    bad_lexical_cast ( const bad_lexical_cast &) throw ();
    bad_lexical_cast & operator =( const bad_lexical_cast &) throw ();
    virtual const char * what () const throw ();
  };
}
}
The virtual destructor is not shown, following the practice of 18.5.2 Class bad_cast [lib.bad.cast].
The class bad_lexical_cast defines the type of objects thrown as exceptions by the implementation to report runtime lexical_cast failure.
bad_lexical_cast () throw ();
Effects: Constructs an object of class bad_lexical_cast.
Remarks: The result of calling what() on the newly constructed object is implementation-defined.
bad_lexical_cast ( const bad_lexical_cast &) throw ();
bad_lexical_cast & operator =( const bad_lexical_cast &) throw ();
Effects: Copies an object of class bad_lexical_cast.
virtual const char * what () const throw ();
Returns: An implementation-defined NTBS.
Remarks: The message may be a null-terminated multibyte string (17.3.2.1.3.2), suitable for conversion and display as a wstring (21.2, 22.2.1.4).
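
Because bad_lexical_cast derives publicly from std::bad_cast, existing handlers for std::bad_cast also catch conversion failures. The following sketch uses a stand-in class in a hypothetical namespace sketch; noexcept replaces the throw() of the proposed text, which later language revisions deprecated, and the message text is an arbitrary choice since the proposal leaves it implementation-defined.

```cpp
#include <string>
#include <typeinfo>

namespace sketch {  // hypothetical namespace; not std::tr2

// Stand-in for the proposed class.
class bad_lexical_cast : public std::bad_cast {
public:
    bad_lexical_cast() noexcept {}
    const char* what() const noexcept override {
        return "bad lexical cast";  // actual text is implementation-defined
    }
};

// Throws the stand-in exception and catches it through its base class.
inline std::string describe_failure() {
    std::string message;
    try {
        throw bad_lexical_cast();
    } catch (const std::bad_cast& e) {
        message = e.what();
    }
    return message;
}

}  // namespace sketch
```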
© Copyright Kevlin Henney 2000-2005
© Copyright Beman Dawes 2006
Last revised: 2006-04-10