<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 9/8/19 6:00 AM, Corentin via Lib
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CA+Om+SgTG-_viY5Me+2n8J96ybVJKAULyPnF1m-EqonGziu47Q@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 11:17,
            Corentin &lt;<a href="mailto:corentin.jabot@gmail.com"
              target="_blank" moz-do-not-send="true">corentin.jabot@gmail.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div dir="ltr"><br>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at
                  09:52, Billy O'Neal (VC LIBS) &lt;<a
                    href="mailto:bion@microsoft.com" target="_blank"
                    moz-do-not-send="true">bion@microsoft.com</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div>
                    <div
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                      <p class="MsoNormal">&gt; I agree that EGCs are the
                        best option. That doesn't drag in locale</p>
                      <p class="MsoNormal"> </p>
                      <p class="MsoNormal">Because we don’t get to
                        assume that we’re talking about Unicode at all,
                        it absolutely drags in locale.</p>
                    </div>
                  </div>
                </blockquote>
                <div><br>
                </div>
                <div>Sorry, I should have been more specific.</div>
                <div>There is a default, untailored Unicode EGC boundary
                  algorithm (though it can be tailored).</div>
                <div>I didn't mean to imply that text can be manipulated
                  without knowing its encoding, and I never use
                  "locale" to mean encoding. </div>
                <div><br>
                </div>
                <div>EGCs are only defined for text whose character
                  repertoire is Unicode; other encodings deal with
                  code points.</div>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div><br>
          </div>
          <div>To be clear, whether or not the EGC algorithm is
            required to be tailored matters because tailoring, for all
            intents and purposes, requires</div>
          <div>ICU or something else with CLDR data, which restricts the
            platforms on which this can be implemented. <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>Tailoring is not relevant to this discussion.</p>
    <p>The locale dependency stems from the encoding itself being
      dependent on locale.  Again, consider <tt>LANG=C</tt> vs.
      <tt>LANG=C.UTF-8</tt>.  If the specified behavior is encoding
      dependent (as it would have to be for field width to be a count of
      code points, scalar values, or EGCs), then it is also locale
      dependent (for <tt>char</tt> and <tt>wchar_t</tt>).  Thus there is
      a trade-off:</p>
    <ol>
      <li>Either the behavior is locale dependent, in which case field
        widths could be specified to count code points, scalar values,
        or EGCs when the locale selects a Unicode encoding (and
        something else for non-Unicode encodings), or</li>
      <li>the behavior is not locale dependent, in which case field
        widths can only be specified in terms of code units.<br>
      </li>
    </ol>
    <p>Recall that, unless there is a call to <tt>std::setlocale</tt>,
      all C and C++ processes start with the locale set to <tt>"C"</tt>.<br>
    </p>
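    <p>The locale dependency can be sketched with the C multibyte
      conversion functions. The hypothetical
      <tt>count_locale_characters</tt> below is an illustration, not
      proposed wording: it counts characters using whatever multibyte
      encoding the current <tt>LC_CTYPE</tt> locale selects, via
      <tt>std::mbrtowc</tt>. Assuming the platform provides a
      <tt>C.UTF-8</tt> locale, <tt>"\xC3\x81"</tt> counts as one
      character there, whereas in the default <tt>"C"</tt> locale on
      typical implementations it counts as two, the same as its code
      unit count.</p>

```cpp
#include <clocale>   // std::setlocale, LC_ALL: callers select the locale
#include <cstddef>
#include <cwchar>    // std::mbrtowc, std::mbstate_t
#include <string_view>

// Hypothetical helper: counts characters in s as decoded by the multibyte
// encoding of the current C locale (its LC_CTYPE category). A byte that the
// locale cannot decode, or an incomplete trailing sequence, is counted as
// one character.
std::size_t count_locale_characters(std::string_view s) {
    std::mbstate_t state{};
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len =
            std::mbrtowc(nullptr, s.data() + i, s.size() - i, &state);
        if (len == static_cast<std::size_t>(-1) ||
            len == static_cast<std::size_t>(-2)) {
            // Invalid or incomplete for this encoding: consume one byte
            // and reset the conversion state.
            state = std::mbstate_t{};
            len = 1;
        } else if (len == 0) {
            // An embedded null character (one byte in the encodings
            // considered here).
            len = 1;
        }
        i += len;
        ++count;
    }
    return count;
}
```

    <p>A program only observes the UTF-8 result after a call such as
      <tt>std::setlocale(LC_ALL, "C.UTF-8")</tt>, or
      <tt>std::setlocale(LC_ALL, "")</tt> with a UTF-8 <tt>LANG</tt> in
      the environment.</p>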
    <p>Tom.<br>
    </p>
    <blockquote type="cite"
cite="mid:CA+Om+SgTG-_viY5Me+2n8J96ybVJKAULyPnF1m-EqonGziu47Q@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_quote">
                <div><br>
                </div>
                <div><br>
                </div>
                <div> </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div>
                    <div
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                      <p class="MsoNormal"> </p>
                      <p class="MsoNormal">Billy3</p>
                      <p class="MsoNormal"> </p>
                    </div>
                    <hr style="display:inline-block;width:98%">
                    <div
id="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396divRplyFwdMsg"
                      dir="ltr"><font style="font-size:11pt"
                        color="#000000" face="Calibri, sans-serif"><b>From:</b>
                        Lib &lt;<a
                          href="mailto:lib-bounces@lists.isocpp.org"
                          target="_blank" moz-do-not-send="true">lib-bounces@lists.isocpp.org</a>&gt;
                        on behalf of Corentin via Lib &lt;<a
                          href="mailto:lib@lists.isocpp.org"
                          target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;<br>
                        <b>Sent:</b> Saturday, September 7, 2019
                        11:08:25 PM<br>
                        <b>To:</b> Library Working Group &lt;<a
                          href="mailto:lib@lists.isocpp.org"
                          target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;<br>
                        <b>Cc:</b> Corentin &lt;<a
                          href="mailto:corentin.jabot@gmail.com"
                          target="_blank" moz-do-not-send="true">corentin.jabot@gmail.com</a>&gt;;
                        Victor Zverovich &lt;<a
                          href="mailto:victor.zverovich@gmail.com"
                          target="_blank" moz-do-not-send="true">victor.zverovich@gmail.com</a>&gt;;
                        Tom Honermann &lt;<a
                          href="mailto:tom@honermann.net"
                          target="_blank" moz-do-not-send="true">tom@honermann.net</a>&gt;;
                        <a href="mailto:unicode@isocpp.open-std.org"
                          target="_blank" moz-do-not-send="true">unicode@isocpp.open-std.org</a>
                        &lt;<a href="mailto:unicode@open-std.org"
                          target="_blank" moz-do-not-send="true">unicode@open-std.org</a>&gt;<br>
                        <b>Subject:</b> Re: [isocpp-lib] New issue: Are
                        std::format field widths code units, code
                        points, or something else?</font>
                      <div> </div>
                    </div>
                    <div>
                      <div dir="auto">
                        <div><br>
                          <br>
                          <div class="gmail_quote">
                            <div dir="ltr" class="gmail_attr">On Sun,
                              Sep 8, 2019, 5:30 AM Tom Honermann via Lib
                              &lt;<a href="mailto:lib@lists.isocpp.org"
                                target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;
                              wrote:<br>
                            </div>
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <div bgcolor="#FFFFFF">
                                <div
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334moz-cite-prefix">On
                                  9/7/19 10:44 PM, Victor Zverovich
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>&gt; <span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Is
                                        field width measured in code
                                        units, code points, or something
                                        else?</span></div>
                                    <div><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><br>
                                      </span></div>
                                    <div><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"></span>I
                                      think the main consideration here
                                      is that width should be
                                      locale-independent by default for
                                      consistency with the rest of
                                      std::format's design.</div>
                                  </div>
                                </blockquote>
                                I agree with that goal, but...<br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>If we can say that width is
                                      measured in grapheme clusters or
                                      code points based on the execution
                                      encoding (or whatever the
                                      standardese term is) without
                                      querying the locale, then I
                                      suggest doing so.</div>
                                  </div>
                                </blockquote>
                                I don't know how to do that.  As I noted
                                in my response to Zach, if code units
                                aren't used, then the behavior should
                                differ for LANG=C vs. LANG=C.UTF-8.<br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div>I have a slight preference for
                                      grapheme clusters since those
                                      correspond to user-perceived
                                      characters, but I only have
                                      implementation experience with
                                      code points (this is what both the
                                      fmt library and Python do).<br>
                                    </div>
                                  </div>
                                </blockquote>
                                <p>I would definitely vote for EGCs over
                                  code points.  I think code points are
                                  probably the worst of the options
                                  since they make the results dependent
                                  on Unicode normalization form.<br>
                                </p>
                              </div>
                            </blockquote>
                          </div>
                        </div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">I disagree. Code units are the
                          worst option. For me, anything involving code
                          units is a big red flag. I agree that EGCs are
                          the best option. That doesn't drag in locale,
                          though it might be a bit involved for
                          implementers in the C++20 time frame. </div>
                        <div dir="auto">I don't think specifying EGCs for
                          Unicode text and code points otherwise would
                          be too difficult. Implementation might be a
                          bit challenging on some platforms in the C++20
                          time frame, but they could fall back to code
                          points in the meantime. Not perfect, but I
                          think we need a good long-term solution rather
                          than a bad short-term one.</div>
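                        <p>Counting EGCs gives the same width for both
                          spellings of Á from the original example. The
                          following is a deliberately simplified sketch,
                          not the UAX #29 algorithm: the hypothetical
                          <tt>approx_grapheme_count</tt> assumes
                          well-formed UTF-8 and treats only code points in
                          the Combining Diacritical Marks block (U+0300 to
                          U+036F) as extending the preceding character. A
                          conforming implementation would instead follow
                          UAX #29 using the Unicode Grapheme_Cluster_Break
                          property data.</p>

```cpp
#include <cstddef>
#include <string_view>

// Decodes the UTF-8 sequence that starts at s[i], stores its code point in
// cp, and returns the number of code units consumed. Assumes well-formed
// UTF-8; a truncated final sequence just consumes the remaining bytes.
constexpr std::size_t decode_utf8(std::string_view s, std::size_t i,
                                  char32_t& cp) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (len > s.size() - i) {
        len = s.size() - i;
    }
    cp = lead < 0x80 ? lead : lead < 0xE0 ? (lead & 0x1F)
       : lead < 0xF0 ? (lead & 0x0F) : (lead & 0x07);
    for (std::size_t k = 1; k < len; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }
    return len;
}

// Crude stand-in for the Grapheme_Extend property: only the Combining
// Diacritical Marks block (U+0300..U+036F) is recognised.
constexpr bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: approximates the number of extended grapheme clusters
// by attaching each combining mark to the preceding code point. This is NOT
// UAX #29, which also handles Hangul syllables, emoji ZWJ sequences,
// regional-indicator pairs, and much more.
constexpr std::size_t approx_grapheme_count(std::string_view s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp = 0;
        i += decode_utf8(s, i, cp);
        if (count == 0 || !is_combining_mark(cp)) {
            ++count;  // cp starts a new cluster
        }
    }
    return count;
}

// Both encodings of U+00C1 from the original example count as one.
static_assert(approx_grapheme_count("\xC3\x81") == 1);
static_assert(approx_grapheme_count("\x41\xCC\x81") == 1);
```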
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <div bgcolor="#FFFFFF">
                                <p>Tom.<br>
                                </p>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><br>
                                        </span></span></div>
                                    <div><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Cheers,</span></span></div>
                                    <div><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span
class="m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Victor</span></span></div>
                                  </div>
                                  <br>
                                  <div class="gmail_quote">
                                    <div dir="ltr" class="gmail_attr">On
                                      Sat, Sep 7, 2019 at 5:13 PM Tom
                                      Honermann via Lib &lt;<a
                                        href="mailto:lib@lists.isocpp.org"
                                        rel="noreferrer" target="_blank"
                                        moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;
                                      wrote:<br>
                                    </div>
                                    <blockquote class="gmail_quote"
                                      style="margin:0px 0px 0px
                                      0.8ex;border-left:1px solid
                                      rgb(204,204,204);padding-left:1ex">
                                      <div bgcolor="#FFFFFF">
                                        <p><a
href="http://eel.is/c++draft/format#string.std-7"
                                            rel="noreferrer"
                                            target="_blank"
                                            moz-do-not-send="true">[format.string.std]p7</a>
                                          states:</p>
                                        <blockquote type="cite">
                                          <p>The <i>positive-integer</i>
                                            in <i>width</i> is a
                                            decimal integer defining the
                                            minimum field width.  If
                                            <i>width</i> is not
                                            specified, there is no
                                            minimum field width, and the
                                            field width is determined
                                            based on the content of the
                                            field.</p>
                                        </blockquote>
                                        <p>Is field width measured in
                                          code units, code points, or
                                          something else?</p>
                                        <p>Consider the following
                                          example assuming a UTF-8
                                          locale:<br>
                                        </p>
                                        <p><tt>std::format("{}",
                                            "\xC3\x81");     // U+00C1</tt><tt>       
                                            { </tt><tt>LATIN CAPITAL
                                            LETTER A WITH ACUTE }</tt><br>
                                          <tt>std::format("{}",
                                            "\x41\xCC\x81"); // U+0041
                                            U+0301 { </tt><tt>LATIN
                                            CAPITAL LETTER A } {
                                          </tt><tt>COMBINING ACUTE
                                            ACCENT }<br>
                                          </tt></p>
                                        <p>In both cases, the arguments
                                          encode the same user-perceived
                                          character (Á).  The first uses
                                          two UTF-8 code units to encode
                                          a single code point that
                                          represents a single glyph
                                          using a composed Unicode
                                          normalization form.  The
                                          second uses three code units
                                          to encode two code points that
                                          represent the same glyph using
                                          a decomposed Unicode
                                          normalization form.</p>
                                        <p>How is the field width
                                          determined?  If measured in
                                          code units, the first has a
                                          width of 2 and the second of
                                          3.  If measured in code
                                          points, the first has a width
                                          of 1 and the second of 2.  If
                                          measured in grapheme clusters,
                                          both have a width of 1.  Is
                                          the determination locale
                                          dependent?</p>
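                                        <p>The code unit and code point
                                          counts above can be checked
                                          mechanically. A minimal sketch,
                                          assuming well-formed UTF-8 input;
                                          <tt>count_code_units</tt> and
                                          <tt>count_code_points</tt> are
                                          hypothetical helpers for
                                          illustration, not part of
                                          <tt>std::format</tt>:</p>

```cpp
#include <cstddef>
#include <string_view>

// Field width measured in UTF-8 code units: one per char (byte).
constexpr std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Field width measured in code points, assuming well-formed UTF-8:
// every byte except a continuation byte (bit pattern 10xxxxxx) starts
// a new code point.
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// The two encodings of U+00C1 from the example above.
static_assert(count_code_units("\xC3\x81") == 2);      // composed (NFC)
static_assert(count_code_units("\x41\xCC\x81") == 3);  // decomposed (NFD)
static_assert(count_code_points("\xC3\x81") == 1);
static_assert(count_code_points("\x41\xCC\x81") == 2);
```

                                        <p>Grapheme cluster counting
                                          cannot be reduced to a per-byte
                                          test like this; it needs the
                                          Unicode text segmentation rules
                                          (UAX #29).</p>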
                                        <p><b>Proposed resolution:</b></p>
                                        <p>Field widths are measured in
                                          code units and are not locale
                                          dependent. Modify <a
href="http://eel.is/c++draft/format#string.std-7"
                                            rel="noreferrer"
                                            target="_blank"
                                            moz-do-not-send="true">
                                            [format.string.std]p7</a> as
                                          follows:</p>
                                        <blockquote type="cite">
                                          <p>The <i>positive-integer</i>
                                            in <i>width</i> is a
                                            decimal integer defining the
                                            minimum field width.  If
                                            <i>width</i> is not
                                            specified, there is no
                                            minimum field width, and the
                                            field width is determined
                                            based on the content of the
                                            field. 
                                            <b><font color="#33cc00">Field
                                                width is measured in
                                                code units.  Each byte
                                                of a multibyte character
                                                contributes to the field
                                                width.</font></b><br>
                                          </p>
                                        </blockquote>
                                        <p>(<i>code unit</i> is not
                                          formally defined in the
                                          standard.  Most uses occur in
                                          UTF-8 and UTF-16 specific
                                          contexts, but
                                          <a
href="http://eel.is/c++draft/lex.ext#5"
                                            rel="noreferrer"
                                            target="_blank"
                                            moz-do-not-send="true">
                                            [lex.ext]p5</a> uses it in
                                          an encoding-agnostic context.)<br>
                                        </p>
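                                        <p>Under the proposed wording, a
                                          specification such as
                                          <tt>"{:*&lt;3}"</tt> would pad the
                                          composed form of Á with one fill
                                          character but leave the
                                          decomposed form unpadded, even
                                          though both display as a single
                                          character. The hypothetical
                                          <tt>pad_code_units</tt> below is
                                          not part of <tt>std::format</tt>;
                                          it models only that rule, for
                                          left alignment:</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper modelling the proposed rule for a left-aligned
// field: the width is compared against the number of code units in s
// (bytes, for char), and any shortfall is filled with `fill`.
std::string pad_code_units(std::string_view s, std::size_t width,
                           char fill = ' ') {
    std::string out(s);
    if (out.size() < width) {
        out.append(width - out.size(), fill);
    }
    return out;
}
```

                                        <p>For example,
                                          <tt>pad_code_units("\xC3\x81", 3, '*')</tt>
                                          returns <tt>"\xC3\x81*"</tt>, while
                                          <tt>pad_code_units("\x41\xCC\x81", 3, '*')</tt>
                                          returns its input unchanged.</p>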
                                        <p>Tom.<br>
                                        </p>
                                      </div>
_______________________________________________<br>
                                      Lib mailing list<br>
                                      <a
                                        href="mailto:Lib@lists.isocpp.org"
                                        rel="noreferrer" target="_blank"
                                        moz-do-not-send="true">Lib@lists.isocpp.org</a><br>
                                      Subscription: <a
href="https://lists.isocpp.org/mailman/listinfo.cgi/lib"
                                        rel="noreferrer noreferrer"
                                        target="_blank"
                                        moz-do-not-send="true">
https://lists.isocpp.org/mailman/listinfo.cgi/lib</a><br>
                                      Link to this post: <a
href="http://lists.isocpp.org/lib/2019/09/13440.php"
                                        rel="noreferrer noreferrer"
                                        target="_blank"
                                        moz-do-not-send="true">
http://lists.isocpp.org/lib/2019/09/13440.php</a><br>
                                    </blockquote>
                                  </div>
                                </blockquote>
                                <p><br>
                                </p>
                              </div>
_______________________________________________<br>
                              Lib mailing list<br>
                              <a href="mailto:Lib@lists.isocpp.org"
                                rel="noreferrer" target="_blank"
                                moz-do-not-send="true">Lib@lists.isocpp.org</a><br>
                              Subscription: <a
href="https://lists.isocpp.org/mailman/listinfo.cgi/lib"
                                rel="noreferrer noreferrer"
                                target="_blank" moz-do-not-send="true">
https://lists.isocpp.org/mailman/listinfo.cgi/lib</a><br>
                              Link to this post: <a
href="http://lists.isocpp.org/lib/2019/09/13446.php"
                                rel="noreferrer noreferrer"
                                target="_blank" moz-do-not-send="true">
http://lists.isocpp.org/lib/2019/09/13446.php</a><br>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
Lib mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Lib@lists.isocpp.org">Lib@lists.isocpp.org</a>
Subscription: <a class="moz-txt-link-freetext" href="https://lists.isocpp.org/mailman/listinfo.cgi/lib">https://lists.isocpp.org/mailman/listinfo.cgi/lib</a>
Link to this post: <a class="moz-txt-link-freetext" href="http://lists.isocpp.org/lib/2019/09/13453.php">http://lists.isocpp.org/lib/2019/09/13453.php</a>
</pre>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>