<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 9/11/19 3:32 PM, Marshall Clow
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMBqOshvViFspJFiSviBZZFNoc434KLBSRJ0mougOBLU0jhb1w@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">On Sat, Sep 7, 2019 at 5:13 PM Tom Honermann via
Lib &lt;<a href="mailto:lib@lists.isocpp.org"
moz-do-not-send="true">lib@lists.isocpp.org</a>&gt; wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p><a href="http://eel.is/c++draft/format#string.std-7"
target="_blank" moz-do-not-send="true">[format.string.std]p7</a>
states:</p>
<p> </p>
<blockquote type="cite">
<p>The <i>positive-integer</i> in <i>width</i> is a
decimal integer defining the minimum field width. If
<i>width</i> is not specified, there is no minimum
field width, and the field width is determined based
on the content of the field.</p>
</blockquote>
<p>Is field width measured in code units, code points, or
something else?</p>
<p>Consider the following example assuming a UTF-8 locale:<br>
</p>
<p><tt>std::format("{}", "\xC3\x81"); // U+00C1</tt><tt>
{ </tt><tt>LATIN CAPITAL LETTER A WITH ACUTE }</tt><br>
<tt>std::format("{}", "\x41\xCC\x81"); // U+0041 U+0301
{ </tt><tt>LATIN CAPITAL LETTER A } { </tt><tt>COMBINING
ACUTE ACCENT }<br>
</tt></p>
<p>In both cases, the arguments encode the same
user-perceived character (Á). The first uses two UTF-8
code units to encode a single code point that represents
a single glyph using a composed Unicode normalization
form. The second uses three code units to encode two
code points that represent the same glyph using a
decomposed Unicode normalization form.</p>
<p>How is the field width determined? If measured in code
units, the first has a width of 2 and the second of 3.
If measured in code points, the first has a width of 1
and the second of 2. If measured in grapheme clusters,
both have a width of 1. Is the determination locale
dependent?</p>
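<p>The first two counts above can be checked mechanically. The
following is a minimal sketch, not a proposal for how
<tt>std::format</tt> should measure width; the helper names
<tt>count_code_units</tt> and <tt>count_code_points</tt> are
hypothetical, and the input is assumed to be well-formed UTF-8.
Counting grapheme clusters needs the Unicode segmentation rules
and is not attempted here.</p>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of UTF-8 code units (bytes) in s.
std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: number of code points in s.
// Assumes s is well-formed UTF-8, so every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

<p>For <tt>"\xC3\x81"</tt> these helpers report 2 code units and 1
code point; for <tt>"\x41\xCC\x81"</tt> they report 3 code units
and 2 code points.</p>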
</div>
<br>
</blockquote>
<div><br>
</div>
<div>(Coming late to the party)</div>
<div>Let's ask a different question.</div>
<div><br>
</div>
<div> std::string s = "/* some content */";<br>
std::ostringstream oss;<br>
oss &lt;&lt; std::setw(22) &lt;&lt; s;<br>
std::string result1 = oss.str();<br>
std::string result2 = std::format("{:22}", s);<br>
</div>
<div><br>
</div>
<div>What can we say about the contents of "result1" and
"result2"?</div>
<div>Are they the same? Does it matter what the contents of
`s` are?</div>
</div>
</div>
</blockquote>
<p>Excellent questions.<br>
</p>
<p>I really want them to be the same (at least by default;
additional opt-in support for locale/encoding sensitive alignment
strikes me as potentially reasonable assuming identification of
compelling use cases).<br>
I don't think the contents of `s` should matter (without
additional opt-in).<br>
</p>
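<p>For reference, here is a small sketch of what the iostreams side
of that comparison does today. The helper name
<tt>pad_with_setw</tt> is hypothetical. The stream inserter for
<tt>std::string</tt> pads based on <tt>s.size()</tt>, that is, on
<tt>char</tt> code units, regardless of what those code units
encode.</p>

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: pad s to `width` using the stream inserter.
// Padding is computed from s.size(), so the contents of s are not
// interpreted; fill characters go on the left by default.
std::string pad_with_setw(const std::string& s, int width) {
    std::ostringstream oss;
    oss << std::setw(width) << s;
    return oss.str();
}
```

<p>With a width of 5, <tt>"\xC3\x81"</tt> gets three fill
characters and <tt>"\x41\xCC\x81"</tt> gets two, so both results
are 5 code units long even though they would not line up on a
UTF-8 terminal.</p>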
<p>Tom.<br>
</p>
<blockquote type="cite"
cite="mid:CAMBqOshvViFspJFiSviBZZFNoc434KLBSRJ0mougOBLU0jhb1w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>-- Marshall</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>