<div dir="ltr"><div dir="ltr">On Sun, Sep 8, 2019 at 3:00 PM Tom Honermann via Lib &lt;<a href="mailto:lib@lists.isocpp.org">lib@lists.isocpp.org</a>&gt; wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><br><div dir="ltr">On Sep 8, 2019, at 2:46 PM, Corentin via Lib &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt; wrote:<br><br></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 19:30, Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <div class="gmail-m_3952312726224711374gmail-m_4045717672081106664moz-cite-prefix">On 9/8/19 12:40 PM, Corentin wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 18:12,
            Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>&gt; wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044moz-cite-prefix">On
                9/8/19 6:00 AM, Corentin via Lib wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019
                      at 11:17, Corentin &lt;<a href="mailto:corentin.jabot@gmail.com" target="_blank">corentin.jabot@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div dir="ltr"><br>
                        </div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Sun, 8
                            Sep 2019 at 09:52, Billy O&#39;Neal (VC LIBS)
                            &lt;<a href="mailto:bion@microsoft.com" target="_blank">bion@microsoft.com</a>&gt;
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div>
                              <div class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                                <p class="MsoNormal">&gt; I agree that
                                  EGCS is the best option. That doesn&#39;t
                                  drag locale</p>
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">Because we don’t
                                  get to assume that we’re talking about
                                  Unicode at all, it absolutely drags in
                                  locale.</p>
                              </div>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Sorry, I should have been more specific.</div>
                          <div>There is a non-tailored Unicode EGCS
                            boundary algorithm (though it can be tailored).</div>
                          <div>I didn&#39;t mean to imply that text
                            manipulation can be done without knowing its
                            encoding, and I never use &quot;locale&quot; to mean
                            encoding. </div>
                          <div><br>
                          </div>
                          <div>EGCS are only defined for text whose
                            character repertoire is Unicode; other
                            encodings deal with code points.</div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div><br>
                    </div>
                    <div>To be clear, the difference between requiring a
                      tailored EGC algorithm and not requiring one is
                      that tailoring, for all intents and purposes,
                      requires</div>
                    <div>ICU or something else with CLDR data, which restricts the
                      platforms on which this can be implemented.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>Tailoring is not relevant to this discussion.</p>
            </div>
          </blockquote>
          <div>It is; see <a href="https://unicode.org/reports/tr29/" target="_blank">https://unicode.org/reports/tr29/</a>. &quot;ch&quot;
            is 2 EGCS in most locales, but in Slovak it&#39;s 1. I don&#39;t make
            the rules :D</div>
        </div>
      </div>
    </blockquote>
    It isn&#39;t relevant in determining how we resolve this issue.  If the
    resolution is that field widths are measured in EGCs, then we&#39;ve
    already decided that the width is locale dependent and tailoring
    becomes an implementation detail.<br></div></blockquote><div><br></div><div>No, format decided to be locale-independent (for good reason) and applying locale-specific behavior implicitly would be against that.</div><div>I&#39;m arguing for encoding-specific behavior.</div></div></div></div></blockquote><div><br></div>You seem to be missing the point that, for char and wchar_t, the encoding can’t be known (in general) without consulting the locale. Again, LANG=C vs LANG=C.UTF-8. <div><br></div><div>Tom. </div></div></blockquote><div><br></div><div>Tom, you seem to be missing the point that std::format does no such consultation!  It is locale-agnostic.  It is assumed to be char-based, not Windows 1252, not UTF-8, not even ASCII.</div><div><br></div><div>This means that the definition of width as being a CU is the de facto status quo.  I&#39;m suggesting that later on, we pull a fast one and specify that we meant that it should have been UTF-8-based instead of char-based.  This may mean that we need to add a char8_t overload, or it may be palatable to just change the current interface&#39;s contract.  I assume the former will be necessary, since people tend to hate silent contract changes (with good reason).<br></div><div><br></div><div>So, if we do nothing, we get what you want.  If we *specify* that CUs are the width, we color the future debate about the Unicode-aware version in a Unicode-unfriendly direction.</div><div><br></div><div>Zach</div><div><br></div></div></div>
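<div>For illustration only, here is a minimal sketch of how a code-unit width and a code-point width diverge for the same char-based string. It assumes the bytes are well-formed UTF-8; the helper names count_code_units and count_code_points are hypothetical and are not part of std::format or any proposal. It does not attempt EGC segmentation, which needs the UAX #29 rules.</div>

```cpp
// Hypothetical helpers, not part of std::format or any paper.
// Both take a NUL-terminated string and assume well-formed UTF-8.

// Width measured in code units: every char counts as one.
constexpr unsigned long count_code_units(const char* s) {
    unsigned long n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Width measured in code points: skip UTF-8 continuation bytes,
// which always have the bit pattern 10xxxxxx.
constexpr unsigned long count_code_points(const char* s) {
    unsigned long n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

<div>With the string "h\xC3\xA9llo" (the UTF-8 bytes for "héllo"), count_code_units returns 6 and count_code_points returns 5, so a field padded to a fixed width comes out one column short under the code-unit definition.</div>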