<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 9 Sep 2019 at 21:29, Tom Honermann <<a href="mailto:tom@honermann.net">tom@honermann.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_948892757684974694moz-cite-prefix">On 9/9/19 3:26 AM, Corentin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<div dir="auto">On Mon, Sep 9, 2019, 4:34 AM Tom Honermann <<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>> wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>My preferred direction for exploration is a future
extension that enables opt-in to field widths that are
encoding dependent (and therefore locale dependent for
char and wchar_t). For example (using 'L' appended to
the width; 'L' doesn't conflict with the existing type
options):<br>
</p>
<p><tt>std::format("{:3L}", "\xC3\x81"); // produces
"\xC3\x81\x20\x20"; 3 EGCs.</tt></p>
</div>
</blockquote>
<div>std::format("{:3L}", "ch"); What does that produce?</div>
</div>
</div>
</div>
</blockquote>
"ch " (one trailing space). The implied constraint with respect to
literals is that they must be compatible with whatever the
locale-dependent encoding is. If your question was intended to ask whether
transliteration should occur here or whether "ch" might be presented
with a ligature, well, that is yet another dimension of why field
widths don't really work for aligning text (in general, they work
just fine for characters for which one code unit == one code point
== one glyph that can be presented in a monospace font).<br></div></blockquote><div><br></div><div>See <a href="https://en.wikipedia.org/wiki/Slovak_orthography">https://en.wikipedia.org/wiki/Slovak_orthography</a></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div dir="auto">
<div class="gmail_quote">
<div>Locale specifiers should only affect region specific
rules, not whether something is interpreted as bytes or
not <br>
</div>
</div>
</div>
</div>
</blockquote>
Ideally I agree, but that isn't the reality we are faced with.<br></div></blockquote><div><br></div><div>I feel like we are completely talking past each other, and I am sorry I am not making my point clear.<br></div><div>Yes, the encoding is currently derived from the locale; no, it does not have to be.</div><div><br></div><div>It is possible to answer the question "what is the encoding of the current process?" without pulling in the &lt;locale&gt; header.</div><div>Pulling in the &lt;locale&gt; header does NOT give you that information.</div><div>And yes, on some systems (Linux), the encoding is attached to the idea of a locale.</div><div><br></div><div>It is important to separate the two when dealing with Unicode.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div dir="auto">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p><tt> </tt></p>
<p>But again, I'm far from convinced that this is
actually useful since EGCs don't suffice to ensure an
aligned result anyway as nicely described in Henri's
post (<a href="https://hsivonen.fi/string-length" rel="noreferrer" target="_blank">https://hsivonen.fi/string-length</a>).</p>
</div>
</blockquote>
<div>Agreed, but I think you know that code units are the
least useful option in this case, and I am concerned about
choosing a bad option to make a fix easy.</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
<p>I didn't propose code units in order to make an easy fix. The
intent was to choose the best option given the trade-offs
involved. Since none of code units, code points, scalar values,
or EGCs would result in reliable alignment and most uses of such
alignment (e.g., via printf) are used in situations where
characters outside the basic source character set are unlikely to
appear [citation needed], I felt that avoiding the locale
dependency was the more important goal.</p></div></blockquote><div>I think the user's intent is more important. I don't want an emoji to be counted as 17 width units, to quote Henri's post.</div><div>EGCs are the least bad approximation.<br></div><div><br></div><div>But stating that the char overload deals in bytes and the upcoming char8_t one deals in text would be okay, I think. Maybe, even if surprising.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><p>
</p>
<p>Tom.<br>
</p>
</div>
</blockquote></div></div>
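<div><br></div><div>To make the measures discussed in this thread concrete, here is a minimal sketch that compares code units with code points for the "\xC3\x81" example and for the 17-byte emoji from Henri's post. The helpers count_code_points, pad_right and check_examples are hypothetical names used only for illustration; they are not part of std::format. The sketch assumes well-formed UTF-8 input, and it counts code points only as a stand-in for EGCs, since real grapheme segmentation needs Unicode data tables.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;  // lead byte or ASCII byte
    }
    return n;
}

// Hypothetical helper: left-align `s` in a field of `width` units, where
// `measured` is the width of `s` under whichever measure the caller chose.
std::string pad_right(std::string_view s, std::size_t measured, std::size_t width) {
    std::string out(s);
    if (measured < width) out.append(width - measured, ' ');
    return out;
}

// Usage sketch: the measures disagree as soon as the text leaves ASCII.
inline void check_examples() {
    // U+00C1 (A with acute) encoded as UTF-8: 2 code units, 1 code point.
    const std::string_view a_acute = "\xC3\x81";
    assert(a_acute.size() == 2);
    assert(count_code_points(a_acute) == 1);

    // Width 3 measured in code units yields one padding space ...
    assert(pad_right(a_acute, a_acute.size(), 3) == "\xC3\x81\x20");
    // ... while width 3 measured in code points yields two.
    assert(pad_right(a_acute, count_code_points(a_acute), 3) == "\xC3\x81\x20\x20");

    // U+1F926 U+1F3FC U+200D U+2642 U+FE0F: the facepalm emoji sequence.
    // 17 UTF-8 code units, 5 code points, but a single EGC on screen.
    const std::string_view facepalm =
        "\xF0\x9F\xA4\xA6\xF0\x9F\x8F\xBC\xE2\x80\x8D\xE2\x99\x82\xEF\xB8\x8F";
    assert(facepalm.size() == 17);
    assert(count_code_points(facepalm) == 5);
}
```

<div>Neither count gives the emoji a width of 1, which is why only EGC segmentation (and, beyond that, knowledge of the font) comes close to what the user sees.</div>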