<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 9 Sep 2019 at 01:25, Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><br><div dir="ltr">On Sep 8, 2019, at 3:31 PM, Tony V E via Lib &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt; wrote:<br><br></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div>Do we have / could we have / should we have</div><div>a clear long term
 (20 years)

 direction for text in C++?<br></div></div></div></blockquote><div><br></div>I would like that very much, but we don’t control the ecosystem, and will have to, to some degree, roll with where the community takes us. </div></blockquote><div><br></div><div>The community is waiting for us to catch up, and I do believe we have some control.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div><br></div><div>i.e. the long-term direction is Unicode.</div><div>and/or specifically the long-term direction is UTF-8.</div></div></div></blockquote><div><br></div>I think we do have widespread agreement on that, though UTF-16 is likely to remain strongly relevant in some niches. </div><div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div>We expect everyone to use char8_t then?  Or we expect char to become UTF-8 someday?</div></div></div></blockquote><div><br></div><div>I think it is very unlikely that there will be a mass migration to char8_t. My expectation is that it will be used for the internal encoding within some percentage of new projects and components. </div><div><br></div><div>With regard to char, I expect it to remain the type used for text that may or may not be UTF-8.</div><div><br></div><div>I think Microsoft will eventually provide (non-experimental) means to use UTF-8 with Win32 and that this will likely come in three forms: </div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><div><br></div><div>1) support for UTF-8 as the system wide Active Code Page (ACP). This is already available as an experimental option. 
</div></div></div></blockquote><div><br></div><div>They di</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><div><br></div><div>2) support for executables to opt in to a per-process override of the system wide ACP. In this mode, stdio would presumably traffic in the system wide ACP and require transcoding (I don’t think implicit transcoding is realistic). This is already available as an experimental option. </div></div></div></blockquote><div><br></div><div><br></div><div>They do</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><div><br></div><div>3) support for a subset of Win32 interfaces that take char8_t.  E.g., U8 variants of some existing A/W interfaces. </div></div></div></blockquote><div><br></div><div>That seems unlikely?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><div><br></div><div>z/OS is a bit more interesting. Though EBCDIC based, ASCII interfaces <span style="background-color:rgba(255,255,255,0)">that implicitly transcode to EBCDIC </span>are available for a subset of C interfaces. As far as I am aware, there are no plans to extend this support to include UTF-8. </div></div></div></blockquote><div><br></div><div>Their interest in text is limited; it is clearly a small minority here.</div><div>I think there is a difference between not breaking their use cases and designing for that platform specifically. 
</div><div>Whatever we do, they will be fine.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div>What do we want the long term future to look like?</div></div></div></blockquote><div><br></div><div>🎵You can’t always get what you want 🎶</div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div>deprecate std::string?<br></div></div></div></blockquote><div><br></div>Probably not. </div></div></blockquote><div><br></div><div>We should supersede it and let the chips fall where they may.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div><br></div><div>And then a list of short term stop-gap measures, like &quot;we know we can&#39;t do X yet, so we do Y for now&quot;.</div><div>Like we use char, but plan on switching to char8_t.</div><div>Or QoI escape hatches.  etc.</div></div></div></blockquote><div><br></div><div>I think we need to plan to support use of both char and char8_t for UTF-8 text for the foreseeable future. </div><div><br></div><div>Tom. 
</div><br><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 8, 2019 at 2:46 PM Corentin via Lib &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 19:30, Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    <div class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664moz-cite-prefix">On 9/8/19 12:40 PM, Corentin wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019 at 18:12,
            Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>&gt; wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044moz-cite-prefix">On
                9/8/19 6:00 AM, Corentin via Lib wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Sun, 8 Sep 2019
                      at 11:17, Corentin &lt;<a href="mailto:corentin.jabot@gmail.com" target="_blank">corentin.jabot@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div dir="ltr"><br>
                        </div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Sun, 8
                            Sep 2019 at 09:52, Billy O&#39;Neal (VC LIBS)
                            &lt;<a href="mailto:bion@microsoft.com" target="_blank">bion@microsoft.com</a>&gt;
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div>
                              <div class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                                <p class="MsoNormal">&gt; I agree that
                                  EGCS is the best option. That doesn&#39;t
                                  drag locale</p>
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">Because we don’t
                                  get to assume that we’re talking about
                                  Unicode at all, it absolutely drags in
                                  locale.</p>
                              </div>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Sorry, I should have been more specific.</div>
                          <div>There is a non-tailored Unicode EGCS
                            boundary algorithm (though it can be tailored).</div>
                          <div>I didn&#39;t mean to imply that text
                            manipulation can be done without knowing its
                            encoding, and I never use &quot;locale&quot; to mean
                            encoding. </div>
                          <div><br>
                          </div>
                          <div>EGCS are only defined for text whose
                            character repertoire is Unicode; other
                            encodings deal with code points.</div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div><br>
                    </div>
                    <div>To be clear, whether the EGC
                      algorithm is required to be tailored matters
                      because tailoring, for all intents and purposes,
                      requires</div>
                    <div>ICU or something else with CLDR data, which restricts the
                      platforms on which this can be implemented. <br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>Tailoring is not relevant to this discussion.</p>
            </div>
          </blockquote>
          <div>It is; see <a href="https://unicode.org/reports/tr29/" target="_blank">https://unicode.org/reports/tr29/</a>: &quot;ch&quot;
            is 2 EGCS in most locales, but in Slovak it&#39;s 1. I don&#39;t make
            the rules :D</div>
        </div>
      </div>
    </blockquote>
    It isn&#39;t relevant in determining how we resolve this issue.  If the
    resolution is that field widths are measured in EGCs, then we&#39;ve
    already decided that the width is locale dependent and tailoring
    becomes an implementation detail.<br></div></blockquote><div><br></div><div>No, std::format was designed to be locale-independent (for good reason), and implicitly applying locale-specific behavior would go against that.</div><div>I&#39;m arguing for encoding-specific behavior.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p>The locale dependency stems from the encoding itself
                being dependent on locale.  Again, LANG=C vs
                LANG=C.UTF-8.  If the specified behavior is encoding
                dependent (as it would have to be for field width to be
                a count of any of code points, scalar values, or EGCs),
                then it is also locale dependent (for char and
                wchar_t).  Thus there is a trade off:</p>
              <ol>
                <li>Either the behavior is locale dependent in which
                  case, field widths could be specified such that they
                  count code points, scalar values, or EGCs when the
                  locale selects a Unicode encoding (and something else
                  for non-Unicode encodings), or</li>
                <li>The behavior is not locale dependent in which case,
                  field widths can only be specified in terms of code
                  units.<br>
                </li>
              </ol>
            </div>
          </blockquote>
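          <div>To make the trade-off above concrete, here is a minimal C++17 sketch (not anything proposed for the standard) that measures the same string both ways. The helper names <tt>count_code_units</tt> and <tt>count_code_points</tt> are hypothetical, and the code point count assumes the input is already well-formed UTF-8; it does no validation and does not attempt EGC segmentation.</div>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: the number of UTF-8 code units is just the byte count.
constexpr std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: counts code points, ASSUMING s is well-formed UTF-8.
// Every byte that is not a continuation byte (bit pattern 10xxxxxx) starts
// a new code point, so counting those bytes counts the code points.
constexpr std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

          <div>With these helpers, <tt>&quot;\xC3\x81&quot;</tt> is 2 code units and 1 code point, while <tt>&quot;\x41\xCC\x81&quot;</tt> is 3 code units and 2 code points, even though both display as the same user-perceived character. Only an EGC count would give 1 for both, and that needs the Unicode segmentation data.</div>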
          <div><br>
          </div>
          <div>Agreed, but let me rephrase:</div>
          <div><br>
          </div>
          <div>Either a string is text, and therefore we need to know
            its encoding, or it is a sequence of bytes (in the case of
            char).</div>
          <div>I have an opinion about what we are dealing with in this
            context :D</div>
        </div>
      </div>
    </blockquote>
    <p>So your preference is for trade off #1 above and the cost is that
      <tt>std::format</tt> is no longer locale insensitive even in the
      cases where a <tt>std::locale</tt> argument is not provided.</p></div></blockquote><div>It would be _encoding_ sensitive.</div><div>It would not change, for example, the decimal separator.</div><div><br></div><div>When Unicode is involved, and even when it is not, I think it is important not to conflate locale and encoding, even if C kinda amalgamates the two and derives one from the other.</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
    <p>Since I don&#39;t think field width works for alignment, even if EGCs
      are used (see Henri&#39;s post - <a class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664moz-txt-link-freetext" href="https://hsivonen.fi/string-length" target="_blank">https://hsivonen.fi/string-length</a>), I
      prefer trade off #2.<br>
    </p>
    <p>Tom.<br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p>Recall that, unless there is a call to <tt>std::setlocale</tt>,
                all C and C++ processes start with the locale set to <tt>&quot;C&quot;</tt></p>
            </div>
          </blockquote>
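          <div>The startup behavior described above can be checked with a small sketch. The function name <tt>global_locale_is_c</tt> is hypothetical; <tt>std::setlocale</tt> with a null pointer only queries the current global C locale and does not change it.</div>

```cpp
#include <clocale>
#include <cstring>

// Hypothetical helper: reports whether the global C locale is still "C",
// which is the state every program starts in before any std::setlocale call.
inline bool global_locale_is_c() {
    // Passing nullptr queries the current locale name without modifying it.
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name != nullptr && std::strcmp(name, "C") == 0;
}
```

          <div>A program that never calls <tt>std::setlocale</tt> should see this return true regardless of what LANG is set to in the environment, which is why anything derived from the locale starts out as if LANG=C.</div>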
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <p> </p>
              <p>Tom.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div><br>
                    </div>
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          <div> </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            <div>
                              <div class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396WordSection1">
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">Billy3</p>
                                <p class="MsoNormal"> </p>
                              </div>
                              <hr style="display:inline-block;width:98%">
                              <div id="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri,
                                  sans-serif" color="#000000"><b>From:</b> Lib &lt;<a href="mailto:lib-bounces@lists.isocpp.org" target="_blank">lib-bounces@lists.isocpp.org</a>&gt;
                                  on behalf of Corentin via Lib &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt;<br>
                                  <b>Sent:</b> Saturday, September 7,
                                  2019 11:08:25 PM<br>
                                  <b>To:</b> Library Working Group &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt;<br>
                                  <b>Cc:</b> Corentin &lt;<a href="mailto:corentin.jabot@gmail.com" target="_blank">corentin.jabot@gmail.com</a>&gt;;
                                  Victor Zverovich &lt;<a href="mailto:victor.zverovich@gmail.com" target="_blank">victor.zverovich@gmail.com</a>&gt;;
                                  Tom Honermann &lt;<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>&gt;;
                                  <a href="mailto:unicode@isocpp.open-std.org" target="_blank">unicode@isocpp.open-std.org</a>
                                  &lt;<a href="mailto:unicode@open-std.org" target="_blank">unicode@open-std.org</a>&gt;<br>
                                  <b>Subject:</b> Re: [isocpp-lib] New
                                  issue: Are std::format field widths
                                  code units, code points, or something
                                  else?</font>
                                <div> </div>
                              </div>
                              <div>
                                <div dir="auto">
                                  <div><br>
                                    <br>
                                    <div class="gmail_quote">
                                      <div dir="ltr" class="gmail_attr">On
                                        Sun, Sep 8, 2019, 5:30 AM Tom
                                        Honermann via Lib &lt;<a href="mailto:lib@lists.isocpp.org" target="_blank">lib@lists.isocpp.org</a>&gt;
                                        wrote:<br>
                                      </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div bgcolor="#FFFFFF">
                                          <div class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334moz-cite-prefix">On
                                            9/7/19 10:44 PM, Victor
                                            Zverovich wrote:<br>
                                          </div>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>&gt; <span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Is
                                                  field width measured
                                                  in code units, code
                                                  points, or something
                                                  else?</span></div>
                                              <div><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><br>
                                                </span></div>
                                              <div><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"></span>I
                                                think the main
                                                consideration here is
                                                that width should be
                                                locale-independent by
                                                default for consistency
                                                with the rest of
                                                std::format&#39;s design.</div>
                                            </div>
                                          </blockquote>
                                          I agree with that goal, but...<br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>If we can say that
                                                width is measured in
                                                grapheme clusters or
                                                code points based on the
                                                execution encoding (or
                                                whatever the standardese
                                                term) without querying
                                                the locale then I
                                                suggest doing so.</div>
                                            </div>
                                          </blockquote>
                                          I don&#39;t know how to do that. 
                                          From my response to Zach, if
                                          code units aren&#39;t used, then
                                          behavior should be different
                                          for LANG=C vs LANG=C.UTF-8.<br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>I have slight
                                                preference for grapheme
                                                clusters since those
                                                correspond to
                                                user-perceived
                                                characters, but only
                                                have implementation
                                                experience with code
                                                points (this is what
                                                both the fmt library and
                                                Python do).<br>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>I would definitely vote for
                                            EGCs over code points.  I
                                            think code points are
                                            probably the worst of the
                                            options since it makes the
                                            results dependent on Unicode
                                            normalization form.<br>
                                          </p>
                                        </div>
                                      </blockquote>
                                    </div>
                                  </div>
                                  <div dir="auto"><br>
                                  </div>
                                  <div dir="auto">I disagree. Code units
                                    are the worst option. For me, anything
                                    involving code units is a big red
                                    flag. I agree that EGCS is the best
                                    option. That doesn&#39;t drag in locale, though it
                                    might be a bit involved for
                                    implementers in C++20. </div>
                                  <div dir="auto">I don&#39;t think specifying
                                    EGCS for Unicode text and code points
                                    otherwise would be too difficult.
                                    Implementation might be a bit
                                    challenging on some platforms in the
                                    C++20 time frame, but they could
                                    fall back to code points in the
                                    meantime. Not perfect, but I think we
                                    need a good long-term solution
                                    rather than a bad short-term one.</div>
                                  <div dir="auto"><br>
                                  </div>
                                  <div dir="auto">
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div bgcolor="#FFFFFF">
                                          <p>Tom.<br>
                                          </p>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><br>
                                                  </span></span></div>
                                              <div><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Cheers,</span></span></div>
                                              <div><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im"><span class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044m_-5900481427510438976gmail-m_-7176513910300778324gmail-m_-1423556694114109396m_-5342112777345943334gmail-m_-1131282094399464115m_5127634081229612262gmail-im">Victor</span></span></div>
                                            </div>
                                            <br>
                                            <div class="gmail_quote">
                                              <div dir="ltr" class="gmail_attr">On
                                                Sat, Sep 7, 2019 at 5:13
                                                PM Tom Honermann via Lib
                                                &lt;<a href="mailto:lib@lists.isocpp.org" rel="noreferrer" target="_blank">lib@lists.isocpp.org</a>&gt;
                                                wrote:<br>
                                              </div>
                                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                <div bgcolor="#FFFFFF">
                                                  <p><a href="http://eel.is/c++draft/format#string.std-7" rel="noreferrer" target="_blank">[format.string.std]p7</a>
                                                    states:</p>
                                                  <blockquote type="cite">
                                                    <p>The <i>positive-integer</i>
                                                      in <i>width</i>
                                                      is a decimal
                                                      integer defining
                                                      the minimum field
                                                      width.  If <i>width</i>
                                                      is not specified,
                                                      there is no
                                                      minimum field
                                                      width, and the
                                                      field width is
                                                      determined based
                                                      on the content of
                                                      the field.</p>
                                                  </blockquote>
                                                  <p>Is field width
                                                    measured in code
                                                    units, code points,
                                                    or something else?</p>
                                                  <p>Consider the
                                                    following example
                                                    assuming a UTF-8
                                                    locale:<br>
                                                  </p>
                                                  <p><tt>std::format(&quot;{}&quot;,
                                                      &quot;\xC3\x81&quot;);    
                                                      // U+00C1</tt><tt>       
                                                      { </tt><tt>LATIN
                                                      CAPITAL LETTER A
                                                      WITH ACUTE }</tt><br>
                                                    <tt>std::format(&quot;{}&quot;,
                                                      &quot;\x41\xCC\x81&quot;);
                                                      // U+0041 U+0301 {
                                                    </tt><tt>LATIN
                                                      CAPITAL LETTER A }
                                                      { </tt><tt>COMBINING
                                                      ACUTE ACCENT }<br>
                                                    </tt></p>
                                                  <p>In both cases, the
                                                    arguments encode the
                                                    same user-perceived
                                                    character (Á).  The
                                                    first uses two UTF-8
                                                    code units to encode
                                                    a single code point,
                                                    the precomposed (NFC)
                                                    form.  The second uses
                                                    three code units to
                                                    encode two code
                                                    points, a base letter
                                                    followed by a combining
                                                    mark, which is the
                                                    decomposed (NFD) form
                                                    and renders as the
                                                    same glyph.</p>
                                                  <p>How is the field
                                                    width determined? 
                                                    If measured in code
                                                    units, the first has
                                                    a width of 2 and the
                                                    second of 3.  If
                                                    measured in code
                                                    points, the first
                                                    has a width of 1 and
                                                    the second of 2.  If
                                                    measured in grapheme
                                                    clusters, both have
                                                    a width of 1.  Is
                                                    the determination
                                                    locale dependent?</p>
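                                                  <p>The three candidate measures can be compared with a
                                                    minimal sketch. The function names below are
                                                    hypothetical and are not part of <tt>std::format</tt>.
                                                    The input is assumed to be well-formed UTF-8, and the
                                                    grapheme count is only an approximation that treats
                                                    code points in U+0300..U+036F (Combining Diacritical
                                                    Marks) as extending the previous cluster; real
                                                    segmentation follows the rules in UAX #29.</p>

```cpp
#include <cstddef>
#include <string_view>

// Width in code units: one per UTF-8 byte.
constexpr std::size_t code_units(std::string_view s) { return s.size(); }

// Width in code points: count every byte that is not a UTF-8
// continuation byte (10xxxxxx). Assumes well-formed input.
constexpr std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}

// Approximate width in grapheme clusters (an assumption for illustration):
// decode each code point and do not count combining marks in
// U+0300..U+036F, which attach to the preceding base character.
constexpr std::size_t approx_graphemes(std::string_view s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte (well-formed input assumed).
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte: 0x1F, 0x0F, 0x07 for lengths 2, 3, 4.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        const bool combining = cp >= 0x0300 && cp <= 0x036F;
        if (!combining || n == 0) ++n;
        i += len;
    }
    return n;
}

// U+00C1 (precomposed)
static_assert(code_units("\xC3\x81") == 2);
static_assert(code_points("\xC3\x81") == 1);
static_assert(approx_graphemes("\xC3\x81") == 1);

// U+0041 U+0301 (decomposed)
static_assert(code_units("\x41\xCC\x81") == 3);
static_assert(code_points("\x41\xCC\x81") == 2);
static_assert(approx_graphemes("\x41\xCC\x81") == 1);
```

                                                  <p>Only the code unit count is independent of any
                                                    Unicode data; the other two need the encoding to be
                                                    known, and the grapheme count additionally needs
                                                    character properties.</p>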
                                                  <p><b>Proposed
                                                      resolution:</b></p>
                                                  <p>Field widths are
                                                    measured in code
                                                    units and are not
                                                    locale dependent.
                                                    Modify <a href="http://eel.is/c++draft/format#string.std-7" rel="noreferrer" target="_blank">[format.string.std]p7</a> as follows:</p>
                                                  <blockquote type="cite">
                                                    <p>The <i>positive-integer</i>
                                                      in <i>width</i>
                                                      is a decimal
                                                      integer defining
                                                      the minimum field
                                                      width.  If <i>width</i>
                                                      is not specified,
                                                      there is no
                                                      minimum field
                                                      width, and the
                                                      field width is
                                                      determined based
                                                      on the content of
                                                      the field.  <b><font color="#33cc00">Field width is measured in code units.  Each byte of a
                                                          multibyte
                                                          character
                                                          contributes to
                                                          the field
                                                          width.</font></b><br>
                                                    </p>
                                                  </blockquote>
                                                  <p>(<i>code unit</i>
                                                    is not formally
                                                    defined in the
                                                    standard.  Most uses
                                                    occur in UTF-8- and
                                                    UTF-16-specific
                                                    contexts, but <a href="http://eel.is/c++draft/lex.ext#5" rel="noreferrer" target="_blank">[lex.ext]p5</a>
                                                    uses it in an
                                                    encoding-agnostic
                                                    context.)<br>
                                                  </p>
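                                                  <p>Under the proposed wording, the amount of fill follows
                                                    directly from the byte length of the field. The
                                                    following is a sketch of that rule only, using
                                                    hypothetical helpers (<tt>fill_count</tt>,
                                                    <tt>pad_left_aligned</tt>) rather than any actual
                                                    <tt>std::format</tt> implementation, and assuming
                                                    left alignment with a space as the fill character.</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: number of fill characters needed to reach
// `min_width` when field width is measured in code units (bytes).
constexpr std::size_t fill_count(std::string_view field, std::size_t min_width) {
    return field.size() < min_width ? min_width - field.size() : 0;
}

// Hypothetical helper: left-align `field` and pad on the right with spaces.
std::string pad_left_aligned(std::string_view field, std::size_t min_width) {
    std::string out(field);
    out.append(fill_count(field, min_width), ' ');
    return out;
}

// With a minimum width of 4, the two encodings of the same
// user-perceived character receive different amounts of fill.
static_assert(fill_count("\xC3\x81", 4) == 2);     // U+00C1: 2 code units
static_assert(fill_count("\x41\xCC\x81", 4) == 1); // U+0041 U+0301: 3 code units
static_assert(fill_count("abcdef", 4) == 0);       // already wider than the minimum
```

                                                  <p>The rule is simple and locale independent, but the two
                                                    spellings of Á would not line up in a column when
                                                    displayed, since each occupies one character cell yet
                                                    receives a different number of spaces.</p>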
                                                  <p>Tom.<br>
                                                  </p>
                                                </div>
_______________________________________________<br>
                                                Lib mailing list<br>
                                                <a href="mailto:Lib@lists.isocpp.org" rel="noreferrer" target="_blank">Lib@lists.isocpp.org</a><br>
                                                Subscription: <a href="https://lists.isocpp.org/mailman/listinfo.cgi/lib" rel="noreferrer" target="_blank">https://lists.isocpp.org/mailman/listinfo.cgi/lib</a><br>
                                                Link to this post: <a href="http://lists.isocpp.org/lib/2019/09/13440.php" rel="noreferrer" target="_blank">http://lists.isocpp.org/lib/2019/09/13440.php</a><br>
                                              </blockquote>
                                            </div>
                                          </blockquote>
                                          <p><br>
                                          </p>
                                        </div>
                                        Link to this post: <a href="http://lists.isocpp.org/lib/2019/09/13446.php" rel="noreferrer" target="_blank">http://lists.isocpp.org/lib/2019/09/13446.php</a><br>
                                      </blockquote>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </div>
                <br>
                <fieldset class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044mimeAttachmentHeader"></fieldset>
                <pre class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044moz-quote-pre">
Link to this post: <a class="gmail-m_7996595089554956195gmail-m_-8006968449179590632gmail-m_4045717672081106664gmail-m_1796657059973223044moz-txt-link-freetext" href="http://lists.isocpp.org/lib/2019/09/13453.php" target="_blank">http://lists.isocpp.org/lib/2019/09/13453.php</a>
</pre>
              </blockquote>
              <p><br>
              </p>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </div>

</blockquote></div></div>
Link to this post: <a href="http://lists.isocpp.org/lib/2019/09/13458.php" rel="noreferrer" target="_blank">http://lists.isocpp.org/lib/2019/09/13458.php</a><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail-m_7996595089554956195gmail_signature"><div dir="ltr"><div>Be seeing you,<br></div>Tony<br></div></div>
</div></blockquote><blockquote type="cite"><div dir="ltr"><span>Link to this post: <a href="http://lists.isocpp.org/lib/2019/09/13459.php" target="_blank">http://lists.isocpp.org/lib/2019/09/13459.php</a></span><br></div></blockquote></div></div></blockquote></div></div>