Doc. no.: N3716
Date: 2013-08-18
Project: Programming Language C++, Library Working Group
Reply-to: Zhihao Yuan <lichray at gmail dot com>

A printf-like Interface for the Streams Library (revision 1)

Changes since N3506

Overview

cout << putf("hello, %s\n", "world");

Printf defines the most widely used syntax for formatting text output. It exists in C, Perl, Python and even Java™, and is available in libraries from Qt to Boost.Format[1], but not in the C++ standard library. This proposal defines such an interface for the C++ I/O streams library, based on the printf function defined by C[2], while taking the streams library's error handling policy and type safety into account.

Impact on the Standard

The proposed new header <ioformat> makes no changes to the existing interface of the streams library, other than an operator<<(basic_ostream) overload to print the unspecified return value of a new std::putf function. However, the proposed formatting features are not parallel to those provided by the existing streams library. In short, the I/O manipulators can be fully replaced by the member functions of ios_base, while std::putf cannot.

The additional formatting features supported by std::putf are:

Design Decisions

The idea is to define a portable and readable syntax that enables the extensible formatting of the streams library, while allowing an implementation to perform any formatting without extra buffering compared to the << operator.

Syntax

The syntax of printf in C is preserved as much as possible. Such a syntax is:

For example, both of the following

cout << format("The answer:%5d\n") % 42;  // boost.format
cout << putf("The answer:%5d\n", 42);     // std::experimental::putf

print

The answer:   42

The width 5 can be parameterized:

cout << putf("The answer:%*d\n", 5, 42);  // same effect

This mechanism is supported by both C and POSIX, but not Boost.Format.

POSIX[4] style positional arguments are added because they are necessary for i18n.

So the example above can be rewritten as:

cout << putf("The answer:%2$*1$d\n", 5, 42);  // same effect
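
Since std::putf is only proposed here, the equivalence of the three spellings can be checked against the C library instead. The following is a minimal sketch that assumes a POSIX-conforming snprintf (the "%2$*1$d" form is a POSIX extension, not ISO C); the helper name answer_line is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats 42 right-aligned in a field of width 5.
// which == 0: literal width; 1: '*' width; otherwise: POSIX positional width.
std::string answer_line(int which)
{
    char buf[64];
    switch (which) {
    case 0:
        std::snprintf(buf, sizeof buf, "The answer:%5d\n", 42);
        break;
    case 1:
        std::snprintf(buf, sizeof buf, "The answer:%*d\n", 5, 42);
        break;
    default:
        // Argument 1 supplies the width, argument 2 the value (POSIX only).
        std::snprintf(buf, sizeof buf, "The answer:%2$*1$d\n", 5, 42);
        break;
    }
    return buf;
}
```

On such a platform all three calls are expected to produce the same line, "The answer:   42".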

The %n specification is dropped because of the security problem (and its weird semantics); no known printf fork (in Java™, Python, Boost.Format, etc.) supports it.

However, Boost.Format’s position-only specification %N% is supported. So instead of writing

cout << putf("%2$s: %1$s\n", 42, "The answer");

, a user can just write

cout << putf("%2%: %1%\n", 42, "The answer");

Such a syntax has been widely used in industry, and does not break the printf-compatibility.

Satisfying the error handling policy and the type safety requirements of the C++ streams library has the highest priority. As a consequence, the length modifiers (hh, h, l, ll, j, z, t, L) are not needed. The proposed solution is to ignore them, as Boost.Format and Python[3] do; the only difference is that this proposal ignores all of the length modifiers defined by the C standard, not just a subset.

Extensibility

A subset of the printf format specification can be translated into a combination of the formatting properties (flags(), width(), precision() and fill()) of an output stream. To balance standard compliance and extensibility, this proposal divides the arguments to be printed into:

If an argument is internally formattable by a format specification, then C’s formatting is fully supported. For example, the following

cout << putf("The answer:% -.4d\n", 42);  // empty sign, left alignment, 4 minimal digits

has the same printing result as

printf("The answer:% -.4d\n", 42);

which gives

The answer: 0042

, while Boost.Format gives

The answer: 42

because it lacks support for integer precision.

But if an argument is potentially formattable by a specification, the following

cout << putf("The answer:% -.4f\n", 42);  // expects a floating point

has the same printing result as

cout << "The answer:" << left << setprecision(4) << 42 << "\n";

which gives

The answer:42

since there is no “empty sign” support in the streams library.

A detailed description is available in Formatting.
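
The two behaviors contrasted above can be reproduced with facilities that already exist. This is only an illustration of the expected output, not of how std::putf would be implemented; the function names c_style and stream_style are hypothetical.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// What C's printf produces for "% -.4d": a space in place of the sign,
// left alignment, and at least 4 digits.
std::string c_style()
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "The answer:% -.4d\n", 42);
    return buf;
}

// The stream-based fallback: only 'left' and the precision carry over,
// and neither changes how a plain int without a field width is printed.
std::string stream_style()
{
    std::ostringstream os;
    os << "The answer:" << std::left << std::setprecision(4) << 42 << "\n";
    return os.str();
}
```

Here c_style() yields "The answer: 0042" and stream_style() yields "The answer:42", matching the two outputs shown above.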

Technical Specifications

The description below is based on POSIX[4].

std::putf takes a format string, followed by zero or more arguments. A format string is composed of zero or more directives: ordinary characters, which are copied unchanged to the output stream, and format specifications, each of which expects zero or more arguments.

An empty format specification %% matches no argument; a '%' character is printed without formatting.

A numbered format specification introduced by "%n$" matches the nth argument in the argument list, where n is a decimal integer.

A number-only format specification, "%n%", behaves the same as a numbered format specification "%n$s", where s is a type hint described below.

An unnumbered format specification introduced by '%' matches the first unmatched argument in the argument list.

Matching an out-of-range argument in a format string results in an error described in Error handling, while unmatched arguments are ignored. An argument can be matched multiple times by a format string that uses numbered format specifications.

The character sequence "%n$" or the '%' character, introducing a format specification, is followed by these fields, in sequence:

A field width, or precision, or both, may be indicated by a numbered parameterized length ( "*n$" ), which is allowed within a numbered format specification, or an unnumbered parameterized length ( '*' ), which is allowed within an unnumbered format specification. In such cases an argument of type streamsize supplies the field width or precision. A numbered parameterized length matches the nth argument in the argument list, where n is a decimal integer. The unnumbered parameterized lengths, in their order of appearance, match the unmatched arguments in the argument list, before the format specification they belong to. A negative field width is taken as a '-' flag followed by a positive field width. A negative precision is taken as if the precision were omitted.
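
The rules for negative parameterized lengths are inherited from C, so they can be observed with snprintf. A minimal sketch follows; the helper names are hypothetical, and the lengths are passed as int as C requires, whereas the proposal uses streamsize.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats value with a parameterized field width.
std::string with_width(int width, int value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "[%*d]", width, value);
    return buf;
}

// Hypothetical helper: formats value with a parameterized precision.
std::string with_precision(int precision, double value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "[%.*f]", precision, value);
    return buf;
}
```

with_width(-5, 42) gives "[42   ]", the same as "%-5d" would, and with_precision(-1, 3.14159) gives "[3.141590]", the same as "%f" with the precision omitted.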

A format string can contain either numbered format specifications or unnumbered format specifications, but not both. Mixing numbered and unnumbered specifications or parameterized lengths results in an error described in Error handling. The empty format specification %% can be mixed with any specifications.

Header <ioformat>

namespace std {
namespace experimental {

  // types _Ts1_ and _Ts2_ are sets of implementation types which are distinguishable for different T...

  template <typename CharT, typename... T>
  _Ts1_ putf(CharT const *fmt, T const&... t);

  template <typename CharT, typename Traits, typename Allocator, typename... T>
  _Ts2_ putf(basic_string<CharT, Traits, Allocator> const& fmt, T const&... t);

  template <typename CharT, typename Traits, typename... T>
  auto operator<<(basic_ostream<CharT, Traits>& os, _Ts1_or_Ts2_ bundle)
      -> decltype(os);

}}

The output functions of the return values of std::putf perform formatted output, but behave like the unformatted output functions. Specifically, flags(), width(), precision() and fill() of the output stream are preserved when the flow of control leaves these functions, but may be changed during the execution. Changing the return values of these members before the execution has no effect on the output, except:

Error handling

An output function of a return value of std::putf may encounter the following kinds of errors found in the return value:

The output function sets ios_base::failbit on the output stream when one of these errors is encountered, and may then return. The well-matched format specifications, as well as the ordinary characters, if any, before the format specification that fails must be formatted and written to the output stream before the function returns.
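
The policy can be illustrated with a deliberately reduced stand-in for the output function. The sketch below is not the proposed interface: mini_putf is a hypothetical name, it understands only "%%" and the position-only form "%N%", and it takes arguments that have already been converted to strings.

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified output function used only to show the policy:
// everything before the failing specification is written, then failbit is set.
std::ostream& mini_putf(std::ostream& os, std::string const& fmt,
                        std::vector<std::string> const& args)
{
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%') {                      // ordinary character
            os.put(fmt[i++]);
            continue;
        }
        ++i;                                      // skip the introducing '%'
        if (i < fmt.size() && fmt[i] == '%') {    // "%%" prints one '%'
            os.put('%');
            ++i;
            continue;
        }
        std::size_t n = 0;                        // decimal argument number
        std::size_t digits = 0;
        while (i < fmt.size() && fmt[i] >= '0' && fmt[i] <= '9') {
            n = n * 10 + static_cast<std::size_t>(fmt[i] - '0');
            ++i;
            ++digits;
        }
        bool const closed = i < fmt.size() && fmt[i] == '%';
        if (digits == 0 || !closed || n == 0 || n > args.size()) {
            os.setstate(std::ios_base::failbit);  // malformed or out of range
            return os;                            // earlier output is kept
        }
        ++i;                                      // skip the closing '%'
        os << args[n - 1];
    }
    return os;
}
```

With two arguments, the format string "a=%1% b=%3% c=%2%" writes "a=x b=" and then stops with failbit set, because the third argument does not exist.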

Formatting

For a given format description, the matched argument is potentially formattable if:

If not, regarding the output stream type basic_ostream<CharT, Traits>, the matched argument is internally formattable if:

[Note: An internally formattable argument has an operator<< overload, member or non-member, in the <ostream> header, and can be printed by printf without a type-unsafe conversion. This note also applies to Traits::int_type, considering its underlying type. –end note]

Otherwise, the argument is potentially formattable.

If an internally formattable argument is an unsigned integer and the type hint is d or i, the argument is printed as if it were formatted by snprintf or a wide-character equivalent, which conceptually uses os.fill() as the default padding character, given the same flags, field-width, and precision, if any, followed by a fitting length modifier, if needed, and a type hint of u. Otherwise, the argument is printed as if it were formatted by snprintf or a wide-character equivalent, which conceptually uses os.fill() as the default padding character, given the same flags, field-width, and precision, if any, followed by a fitting length modifier, if needed, and the same type hint. [Note: u, o, x, X convert a signed argument to unsigned, while d and i do not convert an unsigned argument to signed. –end note]
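
The substitution of u for d on unsigned arguments, together with the "fitting length modifier", can be sketched as follows. The helper format_decimal is hypothetical; it widens every integer to (unsigned) long long so that a single length modifier, ll, fits all cases.

```cpp
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical helper: formats an integer for a 'd' type hint, switching
// to 'u' when the argument type is unsigned so that no value is reinterpreted.
template <typename Int>
std::string format_decimal(Int value)
{
    static_assert(std::is_integral<Int>::value, "integer types only");
    char buf[32];
    if (std::is_unsigned<Int>::value)
        std::snprintf(buf, sizeof buf, "%llu",
                      static_cast<unsigned long long>(value));
    else
        std::snprintf(buf, sizeof buf, "%lld",
                      static_cast<long long>(value));
    return buf;
}
```

For example, format_decimal(4294967295u) gives "4294967295" rather than "-1", which is what passing the same bits to "%d" would typically print in C.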

If the argument is potentially formattable, width() and precision() of the output stream are defaulted to 0 and -1, respectively. The flags() member is defaulted to os.flags() & ios_base::unitbuf, and the fill() member is defaulted to the saved fill character of the output stream before entering the current output function.

For a given format description, if the argument is potentially formattable, for the following flag characters, the argument is formatted as if the corresponding actions are taken on the output stream:

Under the same preconditions, the field-width field, if any, sets the width() member of the output stream; and the precision field, if any, sets the precision() member of the output stream. [Note: The cases of a negative field-width or precision are described in Technical Specifications. –end note]

Under the same preconditions, the type hint characters and their effects on the output stream are:

And then, the potentially formattable argument, namely t, is printed by calling os << t.
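
Put together, the steps for a potentially formattable argument might look like the sketch below. All names (format_guard, point, print_potentially_formattable) are hypothetical, and only the '-' flag is modeled, mapped to ios_base::left as in the Extensibility example; the other flag characters and the type hints are left out.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical RAII guard: restores the formatting state on scope exit,
// including when os << t throws.
class format_guard {
public:
    explicit format_guard(std::ostream& os)
        : os_(os), flags_(os.flags()), width_(os.width()),
          precision_(os.precision()), fill_(os.fill()) {}
    ~format_guard()
    {
        os_.flags(flags_);
        os_.width(width_);
        os_.precision(precision_);
        os_.fill(fill_);
    }
    format_guard(format_guard const&) = delete;
    format_guard& operator=(format_guard const&) = delete;
private:
    std::ostream& os_;
    std::ios_base::fmtflags flags_;
    std::streamsize width_, precision_;
    char fill_;
};

// Hypothetical user-defined type that only the streams library can print.
struct point { int x, y; };

std::ostream& operator<<(std::ostream& os, point const& p)
{
    std::ostringstream tmp;   // build the text first so that width()
    tmp << '(' << p.x << ',' << p.y << ')';   // pads the value as a whole
    return os << tmp.str();
}

// Sketch: print one potentially formattable argument t.
template <typename T>
void print_potentially_formattable(std::ostream& os, T const& t,
                                   bool left_flag, std::streamsize width,
                                   std::streamsize precision)
{
    format_guard guard(os);                       // fill() is kept as saved
    os.flags(os.flags() & std::ios_base::unitbuf);
    os.width(0);                                  // default width
    os.precision(-1);                             // default precision
    if (left_flag)                                // assumed mapping for '-'
        os.setf(std::ios_base::left, std::ios_base::adjustfield);
    if (width > 0) os.width(width);
    if (precision >= 0) os.precision(precision);
    os << t;                                      // the user's own operator<<
}
```

Printing point{1, 2} this way with the '-' flag, a width of 8, and a stream fill character of '.' produces "(1,2)...", and the stream's own flags, such as a previously set hex, are unchanged afterwards.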

Wording

This is an initial report; wording can be prepared after further discussion.

Sample Implementation

A sample implementation is available at https://github.com/lichray/formatxx/tree/proposal

One known defect in this implementation is that the %a and %A format specifications ignore the precision when printing a floating point argument.

Performance notes

The additional runtime costs compared with the streams library are caused by parsing the format string and creating the formatting guards (to restore the flags, precision, etc., after formatting each specification, exception-safely). In addition, to access a positional argument numbered N, N - 1 empty recursions are required to locate the correct template instantiation.
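
The recursion cost can be seen in a small variadic template. This is a sketch of the general technique only, not the code of the sample implementation; nth_to_string is a hypothetical name, and arguments are numbered from 1 as in the format specifications.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Base case: the argument list is exhausted before argument n was reached.
inline std::string nth_to_string(std::size_t)
{
    throw std::out_of_range("argument number out of range");
}

// Recursive case: each step drops one argument, so reaching argument N
// passes through N - 1 instantiations that do nothing but forward.
template <typename Head, typename... Tail>
std::string nth_to_string(std::size_t n, Head const& head, Tail const&... tail)
{
    if (n == 1) {
        std::ostringstream os;
        os << head;            // the instantiation that knows the real type
        return os.str();
    }
    return nth_to_string(n - 1, tail...);
}
```

A call such as nth_to_string(3, 42, "x", 3.5) goes through two forwarding steps before the instantiation whose Head is double prints "3.5".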

In the sample implementation, some extra copying is involved to emulate printf’s formatting features using streams. However, the internally formattable arguments are internally supported by the streams library, so a standard library implementation must be able to avoid these costs. For example, to print a string with precision, the sample implementation has to copy the string, while libstdc++ already has an internal interface __ostream_insert() which takes a size parameter. These costs are not shown by the benchmark below, and Boost.Format actually does the same thing.

Here is a benchmark using Boost.Format’s test code, release mode:

Non-positional arguments/normal:

printf time         :0.367188
ostream time        :0.59375,  = 1.61702 * printf 
format time         :2.125,  = 5.78723 * printf ,  = 3.57895 * nullStream 
std::putf time      :0.90625,  = 2.46809 * printf ,  = 1.52632 * nullStream 

Positional arguments/normal:

printf time         :0.414062
ostream time        :0.59375,  = 1.43396 * printf 
format time         :2.11719,  = 5.11321 * printf ,  = 3.56579 * nullStream 
std::putf time      :1.00781,  = 2.43396 * printf ,  = 1.69737 * nullStream 

Environment:

FreeBSD 8.3-STABLE amd64
g++ 4.8.0 20121209
Boost 1.48.0

Explanations:

The two test cases take the same number of arguments, and have the same formatting results. The streams library has no such “positional arguments”, so I reordered the arguments by hand.

“normal” means the locale is turned on. However, I did not see a stable difference between normal and no_locale.

The format object of Boost can be reused, which brings a performance increase of around 17%. Such a “feature” is not applicable to printf or std::putf, so I did not include it.

Future Issues

  1. By overloading std::printf with a basic_ostream as the first argument, we can get an ADL-capable interface which is similar to std::getline:

    printf(std::wcout, L"%d\n", 42);
    I suggest adding this interface in addition to std::putf: one for the printf transition, one for the boost::format transition.

  2. Do we need vprintf-like interfaces, for example, ones that take tuples of arguments? If so, should they use a special flag or just more function names? For reference, check std::experimental::vputf in the sample implementation.

  3. Is a scanf equivalent, e.g., std::getf, worth adding?

Acknowledgments

Andrew Sandoval, who gave me some suggestions on standard-compliance and error handling.

Herb Sutter, who encouraged me to prepare the proposal, suggested that I add the positional arguments, and provided many suggestions and corrections on the proposal.

Many people on the “std-proposals” mailing list: Jeffrey Yasskin, who “forced” me to add the positional arguments; Martin Desharnais, who gave me the link about how to implement one; and many others.

Roger Orr, who gave me the idea of the std::getline-like interface.

References

[1] The Boost Format library. http://www.boost.org/doc/libs/1_52_0/libs/format/doc/format.html

[2] The fprintf function. ISO/IEC 9899:2011. 7.21.6.1.

[3] String Formatting Operations. The Python Standard Library. 5.6.2. http://docs.python.org/2/library/stdtypes.html#string-formatting

[4] dprintf, fprintf, printf, snprintf, sprintf - print formatted output. IEEE Std 1003.1-2008. http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html