[SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

Marshall Clow mclow.lists at gmail.com
Wed Sep 11 21:32:47 CEST 2019


On Sat, Sep 7, 2019 at 5:13 PM Tom Honermann via Lib <lib at lists.isocpp.org>
wrote:

> [format.string.std]p7 <http://eel.is/c++draft/format#string.std-7> states:
>
> The *positive-integer* in *width* is a decimal integer defining the
> minimum field width.  If *width* is not specified, there is no minimum
> field width, and the field width is determined based on the content of the
> field.
>
> Is field width measured in code units, code points, or something else?
>
> Consider the following example assuming a UTF-8 locale:
>
> std::format("{}", "\xC3\x81");     // U+00C1        { LATIN CAPITAL LETTER A WITH ACUTE }
> std::format("{}", "\x41\xCC\x81"); // U+0041 U+0301 { LATIN CAPITAL LETTER A } { COMBINING ACUTE ACCENT }
>
> In both cases, the arguments encode the same user-perceived character
> (Á).  The first uses two UTF-8 code units to encode a single code point
> that represents a single glyph using a composed Unicode normalization
> form.  The second uses three code units to encode two code points that
> represent the same glyph using a decomposed Unicode normalization form.
>
> How is the field width determined?  If measured in code units, the first
> has a width of 2 and the second of 3.  If measured in code points, the
> first has a width of 1 and the second of 2.  If measured in grapheme
> clusters, both have a width of 1.  Is the determination locale dependent?
>
>
(Coming late to the party)
Let's ask a different question.

          std::string s = "/* some content */";
          std::ostringstream oss;
          oss << std::setw(22) << s;
          std::string result1 = oss.str();
          std::string result2 = std::format("{:22}", s);

What can we say about the contents of "result1" and "result2"?
Are they the same? Does it matter what the contents of `s` are?
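
For the iostreams half of that comparison, here is a minimal sketch of what can be measured today. The helper names (code_units, code_points, pad_with_setw) are made up for illustration, and code_points assumes its input is well-formed UTF-8. The sketch deliberately does not call std::format, since how it counts width is exactly the open question.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: number of UTF-8 code units, i.e. the byte length.
std::size_t code_units(const std::string& s) { return s.size(); }

// Hypothetical helper: number of code points, assuming well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: pad via std::setw. For std::string, operator<<
// compares the stream width against s.size(), so it counts code units.
std::string pad_with_setw(const std::string& s, int width) {
    std::ostringstream oss;
    oss << std::setw(width) << s;  // right-aligned by default, fill is ' '
    return oss.str();
}
```

With Tom's two strings, pad_with_setw("\xC3\x81", 4) adds two spaces and pad_with_setw("\x41\xCC\x81", 4) adds one, even though both display as a single Á. So if std::format were to count anything other than code units, result1 and result2 would differ whenever `s` contains non-ASCII content.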

-- Marshall