<div dir="ltr"><div dir="ltr">On Sun, Sep 8, 2019 at 3:00 PM Tom Honermann via Lib <<a href="mailto:lib@lists.isocpp.org">lib@lists.isocpp.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><br><div dir="ltr">On Sep 8, 2019, at 2:46 PM, Corentin via Lib <<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>> wrote:<br><br></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 19:30, Tom Honermann <<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_3952312726224711374gmail-m_4045717672081106664moz-cite-prefix">On 9/8/19 12:40 PM, Corentin wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 18:12,
Tom Honermann <<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044moz-cite-prefix">On
9/8/19 6:00 AM, Corentin via Lib wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019
at 11:17, Corentin <<a href="mailto:corentin.jabot@gmail.com" target="_blank">corentin.jabot@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, 8
Sep 2019 at 09:52, Billy O'Neal (VC LIBS)
<<a href="mailto:bion@microsoft.com" target="_blank">bion@microsoft.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
<p class="MsoNormal">> I agree that
EGCS is the best option. That doesn't
drag locale</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Because we don’t
get to assume that we’re talking about
Unicode at all, it absolutely drags in
locale.</p>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Sorry, I should have been more specific.</div>
<div>There is a non-tailored Unicode EGCS
boundary algorithm (though it can be tailored).</div>
<div>I didn't mean to imply that text
manipulation can be done without knowing its
encoding, and I never use "locale" to mean
encoding. </div>
<div><br>
</div>
<div>EGCS are only defined for text whose
character repertoire is Unicode; other
encodings deal with codepoints.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>To be clear, whether the EGC algorithm is
required to be tailored matters because
tailoring, for all intents and purposes,
requires</div>
<div>ICU or something else with CLDR, which restricts the
platforms on which this can be implemented.<br>
</div>
</div>
</div>
</blockquote>
<p>Tailoring is not relevant to this discussion.</p>
</div>
</blockquote>
<div>It is; see <a href="https://unicode.org/reports/tr29/" target="_blank">https://unicode.org/reports/tr29/</a>: "ch"
is 2 EGCS in most locales, but in Slovak it's 1. I don't make
the rules :D</div>
</div>
</div>
</blockquote>
It isn't relevant in determining how we resolve this issue. If the
resolution is that field widths are measured in EGCs, then we've
already decided that the width is locale dependent and tailoring
becomes an implementation detail.<br></div></blockquote><div><br></div><div>No, format was decided to be locale-independent (for good reason), and applying locale-specific behavior implicitly would go against that.</div><div>I'm arguing for encoding-specific behavior.</div></div></div></div></blockquote><div><br></div>You seem to be missing the point that, for char and wchar_t, the encoding can’t be known (in general) without consulting the locale. Again, LANG=C vs LANG=C.UTF-8. <div><br></div><div>Tom. </div></div></blockquote><div><br></div><div>Tom, you seem to be missing the point that std::format does no such consultation! It is locale-agnostic. It is assumed to be char-based, not Windows-1252, not UTF-8, not even ASCII.</div><div><br></div><div>This means that defining width as a count of CUs is the de facto status quo. I'm suggesting that later on, we pull a fast one and specify that we meant that it should have been UTF-8-based instead of char-based. This may mean that we need to add a char8_t overload, or it may be palatable to just change the current interface's contract. I assume the former will be necessary, since people tend to hate silent contract changes (with good reason).<br></div><div><br></div><div>So, if we do nothing, we get what you want. If we *specify* that CUs are the width, we color the future debate about the Unicode-aware version in a Unicode-unfriendly direction.</div><div><br></div><div>Zach</div><div><br></div></div></div>
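<div>P.S. To make the CU-versus-Unicode distinction concrete, here is a rough sketch of two ways one could count width for a char string. Both function names are made up for illustration and are not part of std::format or any proposal. The second function simply assumes its input is well-formed UTF-8 and counts code points, which is still not the same thing as counting EGCS.</div>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: width if every char (code unit) counts as one column.
// This is the locale-agnostic, encoding-agnostic reading.
constexpr std::size_t width_in_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: width if the bytes are ASSUMED to be well-formed UTF-8
// and every code point counts as one column. No validation is done.
constexpr std::size_t width_in_code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        // UTF-8 continuation bytes look like 10xxxxxx; skip them so that
        // only the lead byte of each code point is counted.
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

<div>For "\xC3\xA9" (U+00E9 in UTF-8) the first function gives 2 and the second gives 1. For "e\xCC\x81" (e followed by U+0301) they give 3 and 2, while an EGCS count would give 1, and that last number is the one that needs the TR29 boundary rules.</div>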