<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 9/8/19 7:05 PM, Zach Laine wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CALOpkJBA7d+htTNdWZqRpHy0MVDKA4WVt4jYuDdUsfcFvziHSQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">On Sun, Sep 8, 2019 at 3:00 PM Tom Honermann via
          Lib &lt;<a href="mailto:lib@lists.isocpp.org"
            moz-do-not-send="true">lib@lists.isocpp.org</a>&gt; wrote:<br>
        </div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="auto"><br>
              <div dir="ltr">On Sep 8, 2019, at 2:46 PM, Corentin via
                Lib &lt;<a href="mailto:lib@lists.isocpp.org"
                  target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;
                wrote:<br>
                <br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr"><br>
                    </div>
                    <br>
                    <div class="gmail_quote">
                      <div dir="ltr" class="gmail_attr">On Sun, 8 Sep
                        2019 at 19:30, Tom Honermann &lt;<a
                          href="mailto:tom@honermann.net"
                          target="_blank" moz-do-not-send="true">tom@honermann.net</a>&gt;
                        wrote:<br>
                      </div>
                      <blockquote class="gmail_quote" style="margin:0px
                        0px 0px 0.8ex;border-left:1px solid
                        rgb(204,204,204);padding-left:1ex">
                        <div bgcolor="#FFFFFF">
                          <div
class="gmail-m_3952312726224711374gmail-m_4045717672081106664moz-cite-prefix">On
                            9/8/19 12:40 PM, Corentin wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <div dir="ltr">
                              <div dir="ltr"><br>
                              </div>
                              <br>
                              <div class="gmail_quote">
                                <div dir="ltr" class="gmail_attr">On
                                  Sun, 8 Sep 2019 at 18:12, Tom
                                  Honermann &lt;<a
                                    href="mailto:tom@honermann.net"
                                    target="_blank"
                                    moz-do-not-send="true">tom@honermann.net</a>&gt;
                                  wrote:<br>
                                </div>
                                <blockquote class="gmail_quote"
                                  style="margin:0px 0px 0px
                                  0.8ex;border-left:1px solid
                                  rgb(204,204,204);padding-left:1ex">
                                  <div bgcolor="#FFFFFF">
                                    <div
class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044moz-cite-prefix">On
                                      9/8/19 6:00 AM, Corentin via Lib
                                      wrote:<br>
                                    </div>
                                    <blockquote type="cite">
                                      <div dir="ltr">
                                        <div dir="ltr"><br>
                                        </div>
                                        <br>
                                        <div class="gmail_quote">
                                          <div dir="ltr"
                                            class="gmail_attr">On Sun, 8
                                            Sep 2019 at 11:17, Corentin
                                            &lt;<a
                                              href="mailto:corentin.jabot@gmail.com"
                                              target="_blank"
                                              moz-do-not-send="true">corentin.jabot@gmail.com</a>&gt;
                                            wrote:<br>
                                          </div>
                                          <blockquote
                                            class="gmail_quote"
                                            style="margin:0px 0px 0px
                                            0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
                                            <div dir="ltr">
                                              <div dir="ltr"><br>
                                              </div>
                                              <br>
                                              <div class="gmail_quote">
                                                <div dir="ltr"
                                                  class="gmail_attr">On
                                                  Sun, 8 Sep 2019 at
                                                  09:52, Billy O'Neal
                                                  (VC LIBS) &lt;<a
                                                    href="mailto:bion@microsoft.com"
                                                    target="_blank"
                                                    moz-do-not-send="true">bion@microsoft.com</a>&gt;
                                                  wrote:<br>
                                                </div>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0px 0px
                                                  0px
                                                  0.8ex;border-left:1px
                                                  solid
                                                  rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <div
class="gmail-m_3952312726224711374gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                                                      <p
                                                        class="MsoNormal">&gt;
                                                        I agree that
                                                        EGCS is the best
                                                        option. That
                                                        doesn't drag
                                                        locale</p>
                                                      <p
                                                        class="MsoNormal"> </p>
                                                      <p
                                                        class="MsoNormal">Because
                                                        we don’t get to
                                                        assume that
                                                        we’re talking
                                                        about Unicode at
                                                        all, it
                                                        absolutely drags
                                                        in locale.</p>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                                <div><br>
                                                </div>
                                                <div>Sorry, I should
                                                  have been more
                                                  specific.</div>
                                                <div>There is a
                                                  non-tailored Unicode
                                                  EGCS boundary
                                                  algorithm (but it can
                                                  be tailored)</div>
                                                <div>I didn't mean to
                                                  imply that text
                                                  manipulation can be
                                                  done without knowing
                                                  its encoding, and I never
                                                  use "locale" to mean
                                                  encoding. </div>
                                                <div><br>
                                                </div>
                                                <div>EGCS are only
                                                  defined for text whose
                                                  character repertoire
                                                  is Unicode; other
                                                  encodings deal with
                                                  codepoints.</div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <div><br>
                                          </div>
                                          <div><br>
                                          </div>
                                          <div>To be clear, the
                                            difference between requiring
                                            the EGC algorithm to be
                                            tailored or not is that
                                            tailoring, for
                                            all intents and purposes,
                                            requires</div>
                                          <div>ICU or something
                                            with CLDR, which restricts
                                            the platforms on which this
                                            can be implemented. <br>
                                          </div>
                                        </div>
                                      </div>
                                    </blockquote>
                                    <p>Tailoring is not relevant to this
                                      discussion.</p>
                                  </div>
                                </blockquote>
                                <div>It is - see <a
                                    href="https://unicode.org/reports/tr29/"
                                    target="_blank"
                                    moz-do-not-send="true">https://unicode.org/reports/tr29/</a> "ch"
                                  is 2 EGCS in most locales but in
                                  Slovak it's 1. I don't make the rules
                                  :D</div>
                              </div>
                            </div>
                          </blockquote>
                          It isn't relevant in determining how we
                          resolve this issue.  If the resolution is that
                          field widths are measured in EGCs, then we've
                          already decided that the width is locale
                          dependent and tailoring becomes an
                          implementation detail.<br>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                      <div>No, format decided to be locale-independent
                        (for good reason) and applying locale specific
                        behavior implicitly would be against that.</div>
                      <div>I'm arguing for encoding-specific behavior.</div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <div><br>
              </div>
              You seem to be missing the point that, for char and
              wchar_t, the encoding can’t be known (in general) without
              consulting the locale. Again, LANG=C vs LANG=C.UTF-8. 
              <div><br>
              </div>
              <div>Tom. </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Tom, you seem to be missing the point that std::format
            does no such consultation!  It is locale-agnostic.  It is
            assumed to be char-based, not Windows 1252, not UTF-8, not
            even ASCII.</div>
        </div>
      </div>
    </blockquote>
    That is exactly my point!  And why my proposed resolution was to
    specify width in terms of code units.<br>
    <blockquote type="cite"
cite="mid:CALOpkJBA7d+htTNdWZqRpHy0MVDKA4WVt4jYuDdUsfcFvziHSQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>This means that measuring width in CUs is
            the de facto status quo.  I'm suggesting that later on, we
            pull a fast one and specify that we meant that it should
            have been UTF-8-based instead of char-based.  This may mean
            that we need to add a char8_t overload, or it may be
            palatable to just change the current interface's contract. 
            I assume the former will be necessary, since people tend to
            hate silent contract changes (with good reason).<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>Victor's fmtlib implementation already effectively does what you
      suggest.  See
<a class="moz-txt-link-freetext" href="https://github.com/fmtlib/fmt/commit/38325248e5310ddbea41390974e496e8495f7324">https://github.com/fmtlib/fmt/commit/38325248e5310ddbea41390974e496e8495f7324</a>.</p>
    <p>I think this isn't a good state to be in though.  If the current
      locale has a UTF-8 encoding, I would be disappointed if the
      following two calls produced different string contents:</p>
    <p><tt>std::format(  "{:3}",   "\xC3\x81"); // U+00C1</tt><tt> { </tt><tt>LATIN
        CAPITAL LETTER A WITH ACUTE }<br>
      </tt><tt>std::format(u8"{:3}", u8"\xC3\x81"); // U+00C1</tt><tt> {
      </tt><tt>LATIN CAPITAL LETTER A WITH ACUTE }</tt></p>
    <p>U+00C1 is encoded as two UTF-8 code units but is a single EGC.
      If the width is measured in code units for the char-based overload
      and in EGCs for the char8_t-based one, then the first will produce
      "\xC3\x81\x20" (one inserted space) and the second
      "\xC3\x81\x20\x20" (two inserted spaces).  I think users would
      find that surprising.<br>
    </p>
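    <p>To make the arithmetic concrete, here is a minimal sketch in
      plain C++.  It is not std::format and not fmtlib's
      implementation; the helper names <tt>count_code_units</tt>,
      <tt>count_code_points</tt>, and <tt>pad_to</tt> are made up for
      illustration.  Counting code points is used as a stand-in for EGC
      segmentation, which is only equivalent for simple cases such as a
      single precomposed U+00C1, and the input is assumed to be
      well-formed UTF-8.</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Width measured in char code units: simply the number of bytes.
std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Width measured in code points, ASSUMING well-formed UTF-8:
// count every byte that is not a continuation byte (10xxxxxx).
// This is only an approximation of EGC counting.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Left-align s in a field of `width` units, given how many units
// s was measured to occupy; pad on the right with spaces.
std::string pad_to(std::string_view s, std::size_t width, std::size_t measured) {
    std::string out(s);
    if (measured < width) {
        out.append(width - measured, ' ');
    }
    return out;
}
```

    <p>With these helpers, <tt>pad_to(s, 3, count_code_units(s))</tt>
      and <tt>pad_to(s, 3, count_code_points(s))</tt> for
      <tt>s = "\xC3\x81"</tt> give the one-space and two-space results
      respectively.</p>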
    <blockquote type="cite"
cite="mid:CALOpkJBA7d+htTNdWZqRpHy0MVDKA4WVt4jYuDdUsfcFvziHSQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>So, if we do nothing, we get what you want.  If we
            *specify* that CUs are the width, we color the future debate
            about the Unicode-aware version in a Unicode-unfriendly
            direction.</div>
        </div>
      </div>
    </blockquote>
    <p>If we do nothing, we are in the situation where different
      implementors may do different things.</p>
    <p>My preferred direction for exploration is a future extension that
      enables opting in to field widths that are encoding-dependent (and
      therefore locale-dependent for char and wchar_t).  For example
      (using 'L' appended to the width; 'L' doesn't conflict with the
      existing type options):<br>
    </p>
    <p><tt>std::format("{:3L}", "\xC3\x81"); // produces
        "\xC3\x81\x20\x20"; 3 EGCs.<br>
      </tt></p>
    <p>But again, I'm far from convinced that this is actually useful,
      since EGCs don't suffice to ensure an aligned result anyway, as
      nicely described in Henri's post (<a
        href="https://hsivonen.fi/string-length">https://hsivonen.fi/string-length</a>).</p>
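    <p>As a rough illustration of why an EGC count need not match the
      displayed width, the sketch below estimates terminal columns for
      well-formed UTF-8.  It is deliberately simplistic and rests on
      assumptions of mine: the function name <tt>approx_columns</tt> is
      hypothetical, and only the CJK Unified Ideographs block
      (U+4E00..U+9FFF) is treated as two columns wide.  A real
      implementation would need the full East Asian Width data and
      would also have to handle combining marks and emoji sequences.</p>

```cpp
#include <cstddef>
#include <string_view>

// Estimate display columns for s, ASSUMING well-formed UTF-8.
// Simplification: only U+4E00..U+9FFF is counted as two columns;
// every other code point is counted as one column.
std::size_t approx_columns(std::string_view s) {
    std::size_t cols = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte (valid input assumed).
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte: 7, 5, 4, or 3 bits.
        char32_t cp = len == 1 ? static_cast<char32_t>(lead)
                               : static_cast<char32_t>(lead & (0xFF >> (len + 1)));
        // Append 6 payload bits from each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        cols += (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;
        i += len;
    }
    return cols;
}
```

    <p>For instance, "\xE6\x97\xA5" (U+65E5) is three code units and
      one EGC, yet this estimate gives it two columns, so padding by
      either code units or EGCs would misalign it next to ASCII
      text.</p>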
    <p>Tom.<br>
    </p>
    <blockquote type="cite"
cite="mid:CALOpkJBA7d+htTNdWZqRpHy0MVDKA4WVt4jYuDdUsfcFvziHSQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>Zach</div>
          <div><br>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>