Document number: | N3654 |
Date: | 2013-04-19 |
Project: | Programming Language C++ |
Reply-to: | Beman Dawes <bdawes at acm dot org> |
Character strings enclosed in quotation marks are an element of numerous common data formats (e.g. XML, CSV), yet C++ standard library stream I/O offers no direct support for them. Furthermore, standard library stream I/O has a problem with embedded spaces in strings that can trip the unwary. The proposed solution provides direct support for quoted strings, avoids the embedded-space problem, and is more efficient than likely user-provided solutions.
The proposal is suitable for either C++1y or a standard library Technical Specification (TS). It is a pure addition that will break no existing standard-conforming user code. It is based on a Boost component that has been shipping for several years. The declarations for the proposed functions can go in a new header or an existing header.
The proposed wording below assumes the target is C++1y and places the function declarations in <iomanip>.
C++ standard library stream I/O for strings that contain embedded spaces can produce unexpected results. For example,
    std::stringstream ss;
    std::string original = "foolish me";
    std::string round_trip;

    ss << original;
    ss >> round_trip;

    std::cout << original;   // outputs: foolish me
    std::cout << round_trip; // outputs: foolish

    assert(original == round_trip); // assert will fire
The proposed quoted stream I/O manipulator places delimiters, defaulted to double-quote ("), around strings on output, and strips off the delimiters on input. This ensures strings with embedded white space round-trip as desired. For example,
    std::stringstream ss;
    std::string original = "foolish me";
    std::string round_trip;

    ss << quoted(original);
    ss >> quoted(round_trip);

    std::cout << original;   // outputs: foolish me
    std::cout << round_trip; // outputs: foolish me

    assert(original == round_trip); // assert will not fire
If the string contains the delimiter character, on output that character will be preceded by an escape character, which defaults to backslash (\), as will the escape character itself:
std::cout << quoted("She said \"Hi!\""); // outputs: "She said \"Hi!\""
N3654 - Revision 2 (post-Bristol mailing)
- const basic_string& and const char* quoted functions in synopsis per Bristol LWG.
- Changed charT delim='"' to charT delim=charT('"') per Bristol LWG.
- Made the same change for escape. This wasn't asked for, but it seems inconsistent to change delim without also changing escape.
- Removed unneeded template parameters traits and Allocator from the const char* signature per Bristol LWG.
- Removed const from extractor signature per Bristol LWG.
- In two Returns:, clarify as indicated below that the stream's char_type and the charT template parameter must be the same type, as requested by the Bristol LWG.
- No change has been made in response to the Bristol LWG query "should the operator== actually be the eq from some character traits?" It seems harmless and very clear to leave the operator== spec wording unchanged. But the LWG has the most expertise to answer the question; I'll defer to them if they want a change.
- Remove unnecessary std:: in several places. (Daniel Krügler)
- Add "member type" in two places for improved clarity. (Daniel Krügler)
- Revert a clarity "fix" and add comment in two places indicating the awkward wording comes from the current working paper. (Daniel Krügler)
- Fix the spec for missing case of insertion via the T13 overload. (Nice catch from Alisdair Meredith)
- Fix !in and in >> s typos. (Jonathan Wakely)
N3570 - Revision 1 (pre-Bristol mailing)
N3431 Initial paper (pre-Portland mailing)
Gray shaded italic text is commentary, and not to be added to the working paper.
Change 27.7.1 Overview [iostream.format.overview], "Header <iomanip> synopsis" as indicated:
    namespace std {
      // types T1, T2, ... are unspecified implementation types
      T1 resetiosflags(ios_base::fmtflags mask);
      T2 setiosflags (ios_base::fmtflags mask);
      T3 setbase(int base);
      template<charT> T4 setfill(charT c);
      T5 setprecision(int n);
      T6 setw(int n);
      template <class moneyT> T7 get_money(moneyT& mon, bool intl = false);
      template <class moneyT> T8 put_money(const moneyT& mon, bool intl = false);
      template <class charT> T9 get_time(struct tm* tmb, const charT* fmt);
      template <class charT> T10 put_time(const struct tm* tmb, const charT* fmt);

      template <class charT>
        T11 quoted(const charT* s, charT delim=charT('"'), charT escape=charT('\\'));

      template <class charT, class traits, class Allocator>
        T12 quoted(const basic_string<charT, traits, Allocator>& s,
                   charT delim=charT('"'), charT escape=charT('\\'));

      template <class charT, class traits, class Allocator>
        T13 quoted(basic_string<charT, traits, Allocator>& s,
                   charT delim=charT('"'), charT escape=charT('\\'));
    }
After 27.7.5 Extended manipulators [ext.manip], add a new sub-section:
27.7.6 Quoted manipulators [quoted.manip]
[Note: Quoted manipulators provide string insertion and extraction of quoted strings (for example, XML and CSV formats). Quoted manipulators are useful in ensuring that the content of a string with embedded spaces remains unchanged if inserted and then extracted via stream I/O. --end note]
    template <class charT>
      unspecified quoted(const charT* s, charT delim=charT('"'), charT escape=charT('\\'));
    template <class charT, class traits, class Allocator>
      unspecified quoted(const basic_string<charT, traits, Allocator>& s,
                         charT delim=charT('"'), charT escape=charT('\\'));

Returns: An object of unspecified type such that if out is an instance of basic_ostream with member type char_type the same as charT, s is an instance of a type convertible to basic_string<charT, traits, Allocator>, or const char*, respectively, and delim and escape are instances of charT, then the expression out << quoted(s, delim, escape) behaves as if it inserts the following characters into out using character inserter function templates ([ostream.inserters.character]), which may throw ios_base::failure ([ios::failure]):
- delim.
- Each character in s. If the character to be output is equal to escape or delim, as determined by operator==, first output escape.
- delim.

The expression out << quoted(s, delim, escape) shall have type basic_ostream<charT, traits>& and value out.

This wording is the form for such statements in the current working paper [std.manip].

    template <class charT, class traits, class Allocator>
      unspecified quoted(basic_string<charT, traits, Allocator>& s,
                         charT delim=charT('"'), charT escape=charT('\\'));

Returns: An object of unspecified type such that:
- If in is an instance of basic_istream with member type char_type the same as charT, s is an instance of basic_string<charT, traits, Allocator>, and delim and escape are instances of type charT, then the expression in >> quoted(s, delim, escape) behaves as if it extracts the following characters from in using basic_istream::operator>> ([istream::extractors]) which may throw ios_base::failure ([ios::failure]):
  - If the first character extracted is equal to delim, as determined by operator==, then:
    - Turn off the skipws flag.
    - s.clear()
    - Until an unescaped delim character is reached or !in, extract characters from in and append them to s, except that if an escape is reached, ignore it and append the next character to s.
    - Discard the final delim character.
    - Restore the skipws flag to its original value.
  - Otherwise, in >> s.
- If out is an instance of basic_ostream with member type char_type the same as charT, then the expression out << quoted(s, delim, escape) behaves as specified for the const basic_string<charT, traits, Allocator>& overload of the quoted function.

The expression in >> quoted(s, delim, escape) shall have type basic_istream<charT, traits>& and value in. The expression out << quoted(s, delim, escape) shall have type basic_ostream<charT, traits>& and value out.

This wording is the form for such statements in the current working paper [std.manip].
The quoted() stream manipulators emerged from discussions on the
Boost developers mailing list. Participants included Beman Dawes, Rob Stewart,
Alexander Lamaison, Eric Niebler, Vicente Botet, Andrey Semashev, Phil Richards,
and Rob Murray. Eric Niebler's suggestions provided the basis for the name and
form of the templates. Thanks to the LWG in Portland and Bristol for additional
improvements.