<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 8/13/19 8:35 AM, Corentin Jabot
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <div dir="ltr">Chiming in with my favorite solution:<br>
        </div>
        <div dir="ltr">
          <ul>
            <li>Forbid lossy source -&gt; presumed execution encoding
              conversion (already ill-formed in GCC but not MSVC)</li>
          </ul>
        </div>
      </div>
    </blockquote>
    I think this may be reasonable.<br>
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <ul>
            <li>Forbid u8/u16/u32 literals in non-Unicode-encoded files</li>
          </ul>
        </div>
      </div>
    </blockquote>
    I don't understand this at all.  u8/u16/u32 specify the encoding to
    be used at run-time.  The source file encoding isn't relevant at all
    (as Steve noted, source file characters are converted to internal
    encoding).<br>
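<p>A minimal compile-time illustration of this point, assuming a C++17 or later compiler. The <code>code_units</code> helper is hypothetical and exists only for this example. The characters are spelled as universal-character-names so that the file's own encoding cannot influence the result; the literal prefix alone determines the encoding form of the stored code units.</p>

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 and U+1F600 are written as universal-character-names so that
// this file's own encoding plays no part in the result.
constexpr std::size_t utf8_units  = code_units(u8"\u00E9");  // UTF-8:  C3 A9
constexpr std::size_t utf16_units = code_units(u"\u00E9");   // UTF-16: 00E9
constexpr std::size_t utf32_units = code_units(U"\u00E9");   // UTF-32: 000000E9

static_assert(utf8_units == 2 && utf16_units == 1 && utf32_units == 1,
              "the prefix alone determines the encoding form");
static_assert(static_cast<unsigned char>(u8"\u00E9"[0]) == 0xC3 &&
              static_cast<unsigned char>(u8"\u00E9"[1]) == 0xA9,
              "u8 literals hold UTF-8 code units");
static_assert(code_units(u"\U0001F600") == 2 && code_units(U"\U0001F600") == 1,
              "outside the BMP, UTF-16 needs a surrogate pair");
```

<p>None of these assertions depends on the source file encoding, which is why restricting u8/u16/u32 literals by source encoding would not buy anything.</p>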
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <ul>
            <li>Expose the "presumed execution encoding" (= "narrow/wide
              character literal encoding") as a consteval function
              returning the name as specified by IANA <a
href="https://www.iana.org/assignments/character-sets/character-sets.txt"
                moz-do-not-send="true">https://www.iana.org/assignments/character-sets/character-sets.txt</a></li>
          </ul>
        </div>
      </div>
    </blockquote>
    This may be useful, but needs more justification (preferably in the
    form of a paper).<br>
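<p>For discussion, a rough sketch of what such a function might look like if approximated in today's C++. Everything here is an assumption: <code>presumed_execution_encoding_name</code> is a hypothetical name, not a standard facility, and it uses <code>constexpr</code> rather than the proposed <code>consteval</code> only so that it also compiles as C++17. It probes how the compiler encoded U+00E9 in an ordinary literal, and it can only recognize UTF-8: a single byte 0xE9 is what ISO-8859-1, windows-1252, and other code pages all produce, so a probe cannot tell them apart. If U+00E9 is not representable at all, the probe itself may be diagnosed or lossily converted, depending on the compiler.</p>

```cpp
#include <string_view>

// Hypothetical sketch: approximates the proposed facility by inspecting
// how the compiler encoded a probe literal. Returns an IANA charset name
// when it can be determined, otherwise an empty string_view.
constexpr std::string_view presumed_execution_encoding_name() {
    // U+00E9 as a universal-character-name, so the source encoding is irrelevant.
    constexpr char probe[] = "\u00E9";
    if (sizeof(probe) == 3 &&
        static_cast<unsigned char>(probe[0]) == 0xC3 &&
        static_cast<unsigned char>(probe[1]) == 0xA9) {
        return "UTF-8";
    }
    // Single-byte encodings cannot be distinguished by this probe.
    return {};
}

constexpr std::string_view literal_encoding = presumed_execution_encoding_name();
```

<p>With GCC's or Clang's default settings this should yield "UTF-8"; the inability to name anything else is exactly the gap a compiler-provided function would fill.</p>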
    <br>
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div>I would expect changing the encoding of char would break
            everything... I'd leave char and wchar_t mostly alone and
            start clean on char8_t.</div>
        </div>
      </div>
    </blockquote>
    I agree, but I don't think that will be sufficient.  Not all
    projects are going to adopt char8_t.  A substantial portion,
    especially on Linux/UNIX systems, will choose to continue using
    UTF-8 with char.  I think we're going to have to provide Unicode
    support for char and char8_t (and char16_t, and perhaps char32_t).<br>
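<p>One way to picture Unicode support that serves both camps is a single algorithm templated on the code unit type. The sketch below is hypothetical (<code>count_utf8_code_points</code> is not a proposed name) and assumes well-formed UTF-8 input; it performs no validation. It counts code points by counting bytes that are not continuation bytes, and accepts UTF-8 stored in either <code>char</code> or, from C++20, <code>char8_t</code>.</p>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in well-formed UTF-8 stored in
// char or char8_t. Input is not validated.
template <typename CharT>
constexpr std::size_t count_utf8_code_points(std::basic_string_view<CharT> text) {
    std::size_t count = 0;
    for (CharT unit : text) {
        // Each code point has exactly one byte that is not a
        // continuation byte (continuation bytes look like 10xxxxxx).
        if ((static_cast<unsigned char>(unit) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// "café" as UTF-8 in plain char, as is common practice on POSIX systems.
static_assert(count_utf8_code_points(std::string_view("caf\xC3\xA9")) == 4, "");

#if defined(__cpp_char8_t)
// The same text as char8_t (C++20 and later).
static_assert(count_utf8_code_points(std::u8string_view(u8"caf\u00E9")) == 4, "");
#endif
```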
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div>Anyhow, I agree with Tom that the names are not
            indicative.</div>
          <div>How about: "narrow/wide character literal encoding" ?</div>
        </div>
      </div>
    </blockquote>
    <p>"Execution encoding" has a long history in both WG14 and WG21
      (though not in POSIX, I think), and that makes me reluctant to
      challenge it.  In the Slack discussion, I think Steve Downey hit
      on the right approach: provide a formal definition of it.  I
      think we *might* succeed in using "execution encoding" for both
      the compile-time and run-time encodings by extending the term
      with specific qualifiers, e.g., "presumed execution encoding"
      and "run-time/system/native execution encoding".<br>
    </p>
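<p>To make the two qualified terms concrete, a small sketch, assuming a POSIX system (<code>nl_langinfo</code> and <code>CODESET</code> come from POSIX <code>&lt;langinfo.h&gt;</code>, not ISO C++). The names <code>literal_bytes_for_e_acute</code> and <code>runtime_codeset</code> are hypothetical. The first value is fixed when the program is translated (the presumed execution encoding); the second depends on the environment the program runs in (the run-time execution encoding), and the two need not agree.</p>

```cpp
#include <clocale>
#include <cstddef>
#include <string>
#include <langinfo.h>  // POSIX, not ISO C++

// Presumed execution encoding: decided at translation time. The number of
// bytes used for U+00E9 in an ordinary literal cannot change at run time.
constexpr std::size_t literal_bytes_for_e_acute = sizeof("\u00E9") - 1;

// Run-time execution encoding: taken from the user's environment
// (LANG, LC_ALL, ...), so it may differ from one run to the next.
std::string runtime_codeset() {
    std::setlocale(LC_ALL, "");   // adopt the environment's locale
    return nl_langinfo(CODESET);  // e.g. "UTF-8" under a UTF-8 locale
}
```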
    <p>Tom.<br>
    </p>
    <blockquote type="cite"
cite="mid:CA+Om+SjNjeZiSFM8wwuDWFxcdWuKsHUP1x=kRfvmAWPem8sY9g@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Tue, 13 Aug 2019 at
            10:39, Niall Douglas &lt;<a
              href="mailto:s_sourceforge@nedprod.com"
              moz-do-not-send="true">s_sourceforge@nedprod.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">Before progressing with a
            solution, can I ask the question:<br>
            <br>
            Is it politically feasible for C++23 and C2x to require<br>
            implementations to default to interpreting source files as
            either (i) 7-bit<br>
            ASCII or (ii) UTF-8? To be specific, char literals would
            thus be<br>
            either 7-bit ASCII or UTF-8.<br>
            <br>
            (The reason for 7-bit ASCII is that it is a perfect
            subset of UTF-8,<br>
            and that C very much wants the language to remain
            implementable in<br>
            a small code base, i.e. without UTF-8 support. Note the
            qualifier<br>
            "default" as well.)<br>
            <br>
            An answer to the above would determine how best to solve
            your issue, Tom,<br>
            I think. As much as we all expect IBM et al. to veto such a
            proposal, one<br>
            never gets anywhere without asking first.<br>
            <br>
            Niall<br>
            <br>
            On 13/08/2019 03:25, Tom Honermann wrote:<br>
            &gt; I agree with this (mostly), but would prefer not to
            discuss further in<br>
            &gt; this thread.  The only reason I included the filesystem
            references is<br>
            &gt; because the wording there uses "native" for an encoding
            that is related<br>
            &gt; to (though distinct from) the encodings referenced in the
            codecvt and ctype<br>
            &gt; wording, where "native" is also used.  This suggests
            that "native"<br>
            &gt; serves (or should serve) a role in naming these
            run-time encodings, or<br>
            &gt; is a source of conflation (or both).<br>
            &gt; <br>
            &gt; Tom.<br>
            &gt; <br>
            &gt; On 8/12/19 5:08 PM, Niall Douglas wrote:<br>
            &gt;&gt;&gt;   1. [fs.path.type.cvt]p1 &lt;<a
              href="http://eel.is/c++draft/fs.path.type.cvt#1"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://eel.is/c++draft/fs.path.type.cvt#1</a>&gt;:<br>
            &gt;&gt;&gt;      (though the definition provided here
            appears to be specific to path<br>
            &gt;&gt;&gt;      names).<br>
            &gt;&gt;&gt;      "The /native encoding/ of an ordinary
            character string is the<br>
            &gt;&gt;&gt;      operating system dependent current
            encoding for path names.  The<br>
            &gt;&gt;&gt;      /native encoding/ for wide character
            strings is the<br>
            &gt;&gt;&gt;      implementation-defined execution
            wide-character set encoding."<br>
            &gt;&gt; We discussed the problems with the choice of
            normative wording in<br>
            &gt;&gt; <a
              href="http://eel.is/c++draft/fs.class.path#fs.path.cvt"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://eel.is/c++draft/fs.class.path#fs.path.cvt</a>,
            if you remember,<br>
            &gt;&gt; during SG16's discussion of filesystem::path_view.<br>
            &gt;&gt;<br>
            &gt;&gt; The problem is that filesystem paths can have a different
            encoding and<br>
            &gt;&gt; interpretation per path component, i.e. for a path<br>
            &gt;&gt;<br>
            &gt;&gt; /A/B/C/D<br>
            &gt;&gt;<br>
            &gt;&gt; ... A, B, C and D may each have their own
            individual encoding and<br>
            &gt;&gt; interpretation depending on the mount points and
            filesystems configured<br>
            &gt;&gt; on the current system. This is not what is
            suggested by the current<br>
            &gt;&gt; normative wording, which appears to assume that some
            mapping exists<br>
            &gt;&gt; between C++ paths and OS kernel paths.<br>
            &gt;&gt;<br>
            &gt;&gt; There *is* a mapping, but it is 100% C++-side. The
            OS kernel generally<br>
            &gt;&gt; consumes arrays of bytes.<br>
            &gt;&gt;<br>
            &gt;&gt; A more correct normative wording would more clearly
            separate these two<br>
            &gt;&gt; kinds of path representation. OS kernel paths are
            arrays of `byte`, but<br>
            &gt;&gt; with certain implementation-defined byte sequences
            not permitted. C++<br>
            &gt;&gt; paths can be in char, wchar_t, char8_t, char16_t,
            char32_t etc, and<br>
            &gt;&gt; there are well defined conversions between those
            C++ paths and the array<br>
            &gt;&gt; of bytes supplied to the OS kernel. The standard
            can say nothing useful<br>
            &gt;&gt; about how the OS kernel may interpret the byte
            array C++ supplies to it.<br>
            &gt;&gt;<br>
            &gt;&gt; If path_view starts the standards track, I'll need
            to propose a document<br>
            &gt;&gt; fixing up <a
              href="http://eel.is/c++draft/fs.class.path#fs.path.cvt"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://eel.is/c++draft/fs.class.path#fs.path.cvt</a>
            in any case.<br>
            &gt;&gt; But to come back to your original question, I think
            that you ought to<br>
            &gt;&gt; split off filesystem paths from everything else,
            consider them separate,<br>
            &gt;&gt; and then I think you'll find it much easier to make
            the non-path<br>
            &gt;&gt; normative wording more consistent.<br>
            &gt;&gt;<br>
            &gt;&gt; Niall<br>
            &gt;&gt; _______________________________________________<br>
            &gt;&gt; SG16 Unicode mailing list<br>
            &gt;&gt; <a href="mailto:Unicode@isocpp.open-std.org"
              target="_blank" moz-do-not-send="true">Unicode@isocpp.open-std.org</a><br>
            &gt;&gt; <a
              href="http://www.open-std.org/mailman/listinfo/unicode"
              rel="noreferrer" target="_blank" moz-do-not-send="true">http://www.open-std.org/mailman/listinfo/unicode</a><br>
            &gt; <br>
            &gt; <br>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>