<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 9/9/19 10:31 AM, Tony V E wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Mon, Sep 9, 2019 at 3:31
            AM Corentin &lt;<a href="mailto:corentin.jabot@gmail.com"
              moz-do-not-send="true">corentin.jabot@gmail.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div dir="ltr"><br>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, 9 Sep 2019 at
                  01:25, Tom Honermann &lt;<a
                    href="mailto:tom@honermann.net" target="_blank"
                    moz-do-not-send="true">tom@honermann.net</a>&gt;
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="auto"><br>
                    <div dir="ltr">On Sep 8, 2019, at 3:31 PM, Tony V E
                      via Lib &lt;<a href="mailto:lib@lists.isocpp.org"
                        target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;
                      wrote:<br>
                      <br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div dir="ltr">
                          <div>Do we have / could we have / should we
                            have</div>
                          <div>a clear long term (20 years) direction
                            for text in C++?<br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    I would like that very much, but we don’t control
                    the ecosystem, and will have to, to some degree,
                    roll with where the community takes us. </div>
                </blockquote>
                <div><br>
                </div>
                <div>The community is waiting for us to catch up, and I
                  do believe we have some control.</div>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>yep, every other language just decided for the community.</div>
        </div>
      </div>
    </blockquote>
    <p>That is not correct.  Counterexamples include C, Fortran, and
      COBOL.  In general, I think languages that decided for the
      community had a few advantages that we do not:</p>
    <ol>
      <li>Less history and legacy code to support.<br>
      </li>
      <li>Fewer implementations.</li>
      <li>Designed with more abstractions (e.g., VM languages) that
        enabled sandboxing the language environment (with associated
        performance costs).<br>
      </li>
      <li>Designed after Unicode was standardized.<br>
      </li>
    </ol>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>As C++, we have to allow the user to do _anything_, but
            they already can.  And they will still be able to.</div>
        </div>
      </div>
    </blockquote>
    Indeed, but as a standards committee, one of our responsibilities is
    to produce a specification that reflects existing practice.  We can
    (and should) lead, but we need to remain focused on supporting
    existing code as well.  I worry about repeating the Python 2-&gt;3
    experience if we aren't careful.<br>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div class="gmail_quote">
                <div> </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="auto">
                    <div><br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div dir="ltr">
                            <div><br>
                            </div>
                            <div>i.e., the long-term direction is Unicode.</div>
                            <div>and/or, more specifically, the long-term
                              direction is UTF-8.</div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                      I think we do have widespread agreement on that,
                      though UTF-16 is likely to remain strongly
                      relevant in some niches. </div>
                    <div><br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div dir="ltr">
                            <div>We expect everyone to use char8_t
                              then?  Or we expect char to become utf8
                              someday?</div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                      <div>I think it is very unlikely that there will
                        be a mass migration to char8_t. My expectation
                        is that it will be used for the internal
                        encoding within some percentage of new projects
                        and components. </div>
                      <div><br>
                      </div>
                      <div>With regard to char, I expect it to remain
                        the type used for text that may or may not be
                        UTF-8.</div>
                      <div><br>
                      </div>
                      <div>I think Microsoft will eventually provide
                        (non-experimental) means to use UTF-8 with Win32
                        and that this will likely come in three forms:</div>
                    </div>
                  </div>
                </blockquote>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="auto">
                    <div>
                      <div><br>
                      </div>
                      <div>1) support for UTF-8 as the system wide
                        Active Code Page (ACP). This is already
                        available as an experimental option. </div>
                    </div>
                  </div>
                </blockquote>
                <div><br>
                </div>
                <div>They di</div>
                <div> </div>
                <blockquote class="gmail_quote" style="margin:0px 0px
                  0px 0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div dir="auto">
                    <div>
                      <div><br>
                      </div>
                      <div>2) support for executables to opt-in to a
                        per-process override of the system wide ACP. In
                        this mode, stdio would presumably traffic in the
                        system wide ACP and require transcoding (I don’t
                        think implicit transcoding is realistic). This
                        is already available as an experimental option. </div>
                    </div>
                  </div>
                </blockquote>
                <div><br>
                </div>
                <div><br>
                </div>
                <div>They do</div>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>How does "override system wide ACP" and "stdio traffic in
            system wide ACP" fit together?  Either my process thinks the
            world is on the UTF8 ACP, or it doesn't.  I would expect
            transcoding or whatever else is required.  I would expect
            fopen to work, etc.<br>
          </div>
        </div>
      </div>
    </blockquote>
    Basically, the option (a declaration in a manifest file) causes the
    Win32 "ANSI" APIs to work in UTF-8 mode for that process only. 
    Other processes on the system that don't opt in run with whatever
    the system ACP is, so any information exchanged between them will
    require transcoding.  I would expect implicit transcoding for
    command-line options and environment variables (those are already
    implicitly transcoded from their wide variants), but stdio is
    unaffected.  So piped data between processes that each adhere to
    (their perception of) the ACP would require intervention.  But stdio
    can be binary anyway, and executables written in some other
    languages expect UTF-8 regardless, so I don't think this is a
    significant issue.<br>
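    <p>As a rough illustration of the intervention needed for piped data, here is a minimal C++ sketch of the receiving side, assuming (hypothetically) that the sending process's ACP is ISO-8859-1. Latin-1 is chosen only because each byte maps directly to the code point of the same value; the real Windows-1252 code page differs in the range 0x80 to 0x9F, and production code on Windows would more likely call MultiByteToWideChar. The function name latin1_to_utf8 is illustrative.</p>

```cpp
#include <string>
#include <string_view>

// Sketch: transcode bytes from a process whose ACP is ASSUMED to be
// ISO-8859-1 (Latin-1) into UTF-8 for a process running in UTF-8 mode.
// Latin-1 bytes map 1:1 to code points U+0000..U+00FF.
// NOTE: Windows-1252 differs from Latin-1 in 0x80..0x9F; a real converter
// would need a table for those bytes.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (char c : in) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                  // ASCII: unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // lead byte 110xxxxx
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation 10xxxxxx
        }
    }
    return out;
}
```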
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>If that works, I believe almost every Windows developer
            will turn this on, and char will be UTF-8 (as it is on Linux,
            IIUC).</div>
          <div>Most code will "just work".</div>
        </div>
      </div>
    </blockquote>
    <p>Quite possibly.</p>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>In 10 years, it will be the assumption.</div>
        </div>
      </div>
    </blockquote>
    Representatives at Microsoft have so far stated that their testing
    of the UTF-8 ACP option revealed that it breaks too many widely
    deployed applications for them to make it a default at this point. 
    And their strong commitment to backward compatibility may invite a
    longer migration period.<br>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>I think we should steer in the direction that char becomes
            UTF-8.</div>
        </div>
      </div>
    </blockquote>
    I agree, and that is what is already happening.<br>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>In the short term we could say char is whatever the
            system is in, but we encourage UTF8.  Or something like
            that.  Maybe the standard "assumes" UTF8, but
            implementations are allowed to vary.  Whatever "assumes"
            means for a given API.</div>
        </div>
      </div>
    </blockquote>
    I think that is the status quo.  We could add a non-normative note
    encouraging UTF-8, but I think any greenfield project is highly
    unlikely to pick anything else.<br>
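    <p>Under the status quo, the encoding of ordinary string literals is implementation-defined and usually selectable with a compiler option (e.g., GCC's -fexec-charset or MSVC's /utf-8). A minimal C++ sketch of how a project that assumes UTF-8 could detect this at compile time follows, assuming the compiler accepts the universal character name \u00E9 in an ordinary literal. The function name is hypothetical, and it checks only the literal encoding, not the runtime locale.</p>

```cpp
// Returns true if ordinary string literals are encoded as UTF-8 in this
// compiler configuration. U+00E9 (e with acute) is 0xC3 0xA9 in UTF-8,
// but a single byte (0xE9) in legacy code pages such as Windows-1252.
constexpr bool literal_encoding_is_utf8() {
    constexpr char probe[] = "\u00E9";
    return sizeof(probe) == 3  // two code units plus the terminating '\0'
        && static_cast<unsigned char>(probe[0]) == 0xC3
        && static_cast<unsigned char>(probe[1]) == 0xA9;
}

// A project that requires UTF-8 could then enforce it, e.g.:
// static_assert(literal_encoding_is_utf8(), "use a UTF-8 execution character set");
```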
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div>We could define things like fmt to behave as "if the system
            is UTF-8, then behaviour is X, otherwise YMMV (i.e.,
            implementation-defined)".</div>
        </div>
      </div>
    </blockquote>
    <p>We could.  But that makes the behavior locale-dependent because,
      on most platforms, that is the reality.<br>
    </p>
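    <p>To make the locale dependence concrete, here is a minimal C++ sketch of a hypothetical width rule in the spirit of the suggestion above: count code points if the text is UTF-8, otherwise count bytes. These functions are illustrative and do not describe fmt or std::format. The UTF-8 detection is a heuristic that inspects the LC_CTYPE locale name, and code points only approximate display width (combining marks and wide characters are ignored).</p>

```cpp
#include <cctype>
#include <clocale>
#include <cstddef>
#include <string>
#include <string_view>

// Heuristic (assumption): guess whether the current LC_CTYPE locale uses
// UTF-8 by inspecting its name, e.g. "en_US.UTF-8" or "C.utf8".
// Locale names are platform-specific, so this is only an approximation.
bool ctype_locale_looks_utf8() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);  // query only
    if (name == nullptr) return false;
    std::string n(name);
    for (char& c : n) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return n.find("utf-8") != std::string::npos || n.find("utf8") != std::string::npos;
}

// Count code points in UTF-8 text by skipping continuation bytes (10xxxxxx).
std::size_t utf8_code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical rule: "if UTF-8, count code points; otherwise count bytes".
std::size_t text_width(std::string_view s, bool utf8) {
    return utf8 ? utf8_code_points(s) : s.size();
}

// Locale-dependent variant: the same input can yield different widths
// depending on the environment the program runs in.
std::size_t text_width_for_current_locale(std::string_view s) {
    return text_width(s, ctype_locale_looks_utf8());
}
```

    <p>At program startup the C and C++ locale is "C", so text_width_for_current_locale counts bytes until the program calls setlocale(LC_ALL, ""), after which the result depends on the user's environment.</p>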
    <p>Tom.<br>
    </p>
    <blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote"><br clear="all">
        </div>
        <br>
        -- <br>
        <div dir="ltr" class="gmail_signature">
          <div dir="ltr">
            <div>Be seeing you,<br>
            </div>
            Tony<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>