<div dir="ltr">Initial thoughts:<br>I believe the wording in filesystem is a red herring. It's there to deal with the fact that actual file systems, even on a single OS, have different notions of how paths are encoded. It's more related to a cooked vs uncooked distinction. I certainly don't think that wording was intended to apply outside the filesystem components of the library. <br><br>I also believe that "execution character set" is used in opposition to "source character set", and that it is applied to the translation of string literals because that's where the distinction comes up. On the other hand, this may be pre-locale wording that has survived, at least partly because no one wants to touch locale. </div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Aug 12, 2019 at 12:09 PM Tom Honermann via Core &lt;<a href="mailto:core@lists.isocpp.org">core@lists.isocpp.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>I (and SG16 in general) have been using the terms "execution
character set" and "execution encoding" to refer to both the
encoding known at compile-time that is used to encode character
and string literals and the locale dependent encoding specified by
the LC_CTYPE locale category that is used at run-time by the
character classification and conversion functions. When necessary
to avoid confusion, I've been referring to the former as the
"presumed execution encoding" and the latter as simply the
"run-time execution encoding".<br>
</p>
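<p>A minimal sketch of the two notions, for illustration only. The helper names <tt>literal_byte_count</tt> and <tt>decoded_length</tt> are hypothetical and not taken from either standard. The first depends only on the encoding chosen at translation time; the second depends on whatever <tt>LC_CTYPE</tt> is in effect when it is called.</p>

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: the bytes of an ordinary string literal are fixed
// during translation (the "presumed execution encoding"). A later call to
// setlocale cannot change them.
inline std::size_t literal_byte_count() {
    static const char text[] = "abc";
    return sizeof(text) - 1;  // exclude the terminating null
}

// Hypothetical helper: the number of wide characters a byte string decodes
// to is decided at run-time by the current LC_CTYPE category (the
// "run-time execution encoding"). Returns static_cast<std::size_t>(-1) if
// the bytes are not valid in that encoding.
inline std::size_t decoded_length(const char* bytes) {
    std::mbstate_t state{};
    const char* src = bytes;
    // With a null destination, mbsrtowcs only counts the wide characters.
    return std::mbsrtowcs(nullptr, &src, 0, &state);
}
```

<p>For pure ASCII input both helpers agree in the "C" locale. They can diverge once the literal contains non-ASCII characters and the run-time locale uses a different encoding than the one the compiler assumed.</p>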
<p>A <a href="https://www.reddit.com/r/cpp/comments/bfyp6x/overview_of_stdfilesystem_my_talk/" target="_blank">discussion</a>
[1] with user 'alfps' on an r/cpp Reddit thread alerted me to the
possibility that I/we have been using this term incorrectly. I
spent some time looking at both the C and C++ standards and there
does appear to be evidence that "execution character set"
(encoding) refers solely to the encoding known at compile-time
that is used to encode literals. But there doesn't seem to be a
clear term defined for the locale dependent run-time encoding that
governs the behavior of the character classification and
conversion functions. There is some evidence for this encoding
being referred to using the term "native".</p>
<p>From the C++ standard:<br>
</p>
<ol>
<li><a href="http://eel.is/c++draft/fs.path.type.cvt#1" target="_blank">[fs.path.type.cvt]p1</a>:
(though the definition provided here appears to be specific to
path names).<br>
"The <font color="#ff6600"><i>native encoding</i></font> of an
ordinary character string is the operating system dependent
current encoding for path names. The <font color="#ff6600"><i>native
encoding</i></font> for wide character strings is the
implementation-defined execution wide-character set encoding."</li>
<li><a href="http://eel.is/c++draft/fs.path.type.cvt#2.1" target="_blank">[fs.path.type.cvt]p2.1</a>:
(This paragraph, the next one, and p8 (not listed here)
constitute the only uses of "native (ordinary|wide) encoding" in
the C++ standard).<br>
"<tt>char</tt>: The encoding is the <font color="#ff6600">native
ordinary encoding</font>. ..."</li>
<li><a href="http://eel.is/c++draft/fs.path.type.cvt#2.2" target="_blank">[fs.path.type.cvt]p2.2</a>:<br>
"<tt>wchar_t</tt>: The encoding is the <font color="#ff6600">native
wide encoding</font>. ..."</li>
<li><a href="http://eel.is/c++draft/locale.codecvt#3" target="_blank">[locale.codecvt]p3</a>:<br>
"The specializations required in Table 101 ([locale.category])
convert the implementation-defined <font color="#ff6600">native
character set</font>. ... <tt>codecvt&lt;wchar_t, char,
mbstate_t&gt;</tt> converts between the <font color="#ff6600">native character sets</font> for ordinary and
wide characters. ..."</li>
<li><a href="http://eel.is/c++draft/category.ctype#locale.ctype-2" target="_blank">[locale.ctype]p2</a>:<br>
"The specializations required in Table 101 ([locale.category]),
namely <tt>ctype&lt;char&gt;</tt> and <tt>ctype&lt;wchar_t&gt;</tt>,
implement character classing appropriate to the implementation's
<font color="#ff6600">native character set</font>."<br>
</li>
</ol>
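<p>As a rough illustration of the two locale facets quoted above, the sketch below looks them up in a given <tt>std::locale</tt>. The function names <tt>is_alpha_in</tt> and <tt>widen_with_codecvt</tt> are hypothetical; the facet types and member functions are the standard ones. Error handling is deliberately simplistic: on a conversion error only the characters converted so far are returned.</p>

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: classify a char using the ctype<char> facet of `loc`.
inline bool is_alpha_in(char c, const std::locale& loc) {
    return std::use_facet<std::ctype<char>>(loc).is(std::ctype_base::alpha, c);
}

// Hypothetical helper: convert an ordinary string to a wide string using the
// codecvt<wchar_t, char, mbstate_t> facet of `loc`.
inline std::wstring widen_with_codecvt(const std::string& in,
                                       const std::locale& loc) {
    using facet_t = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_t& cvt = std::use_facet<facet_t>(loc);

    std::mbstate_t state{};
    // One wide character per input byte is always enough room.
    std::wstring out(in.size(), L'\0');
    const char* from_next = in.data();
    wchar_t* to_next = &out[0];

    cvt.in(state, in.data(), in.data() + in.size(), from_next,
           &out[0], &out[0] + out.size(), to_next);

    // Keep only what was actually converted (also covers error/partial).
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

<p>Passing <tt>std::locale::classic()</tt> exercises the "C" locale; passing <tt>std::locale("")</tt> would pick up the user's environment, which is where the "native" wording becomes relevant.</p>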
<p>As far as I can tell, none of the highlighted terms above appear
in the C17 standard, but "native environment" appears in related
wording:<br>
</p>
<ul>
<li>7.11.1.1p3 "The setlocale function":<br>
"A value of "C" for locale specifies the minimal environment for
C translation; a value of "" for locale specifies the
locale-specific <font color="#ff6600">native environment</font>.
Other implementation-defined strings may be passed as the second
argument to setlocale."</li>
</ul>
<p>C17 suggests that "extended character set" may also be the right
term:<br>
</p>
<ul>
<li>7.22p3 "General utilities &lt;stdlib.h&gt;":<br>
"... that is the maximum number of bytes in a multibyte
character for the <font color="#ff6600">extended character set</font>
specified by the current locale (category <tt>LC_CTYPE</tt>),
which is never greater than MB_LEN_MAX."</li>
</ul>
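<p>The quantity described in 7.22p3 is <tt>MB_CUR_MAX</tt>. A small sketch, assuming only the standard <tt>&lt;cstdlib&gt;</tt> and <tt>&lt;climits&gt;</tt> macros, shows that it is a run-time value tied to the current <tt>LC_CTYPE</tt>, bounded by the compile-time constant <tt>MB_LEN_MAX</tt>. The function names are hypothetical.</p>

```cpp
#include <climits>
#include <clocale>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: MB_CUR_MAX expands to a run-time expression whose
// value follows the current LC_CTYPE category.
inline std::size_t current_max_bytes() {
    return static_cast<std::size_t>(MB_CUR_MAX);
}

// Hypothetical helper: check the bound stated in C17 7.22p3 for the
// locale that is currently selected.
inline bool max_bytes_within_limit() {
    return current_max_bytes() <= static_cast<std::size_t>(MB_LEN_MAX);
}
```

<p>Calling <tt>std::setlocale(LC_CTYPE, "")</tt> before <tt>current_max_bytes()</tt> may yield a larger value than in the "C" locale, for example when the environment selects a UTF-8 locale.</p>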
<p>However, the C++ standard states (non-normatively) that the
"extended character set" extends the basic source character set
and (normatively) that it applies to both the source and execution
character sets:<br>
</p>
<ul>
<li><a href="http://eel.is/c++draft/intro.defs#defns.multibyte" target="_blank">[defns.multibyte]</a>:<br>
"[ Note: The <font color="#ff6600">extended character set</font>
is a superset of the basic character set ([lex.charset]). — end
note ]"</li>
<li><a href="http://eel.is/c++draft/lex.phases#1.1" target="_blank">[lex.phases]p1</a>:<br>
"... An implementation may use any internal encoding, so long as
an actual <font color="#ff6600">extended character</font>
encountered in the source file, and the same <font color="#ff6600">extended character</font> expressed in the
source file as a universal-character-name (e.g., using the <tt>\uXXXX</tt>
notation), are handled equivalently except where this
replacement is reverted ([lex.pptoken]) in a raw string
literal."</li>
<li><a href="http://eel.is/c++draft/basic.fundamental#8" target="_blank">[basic.fundamental]p8</a>:<br>
"... The values of type <tt>wchar_t</tt> can represent
distinct codes for all members of the largest <font color="#ff6600">extended character set</font> specified among
the supported locales ([locale])."<br>
</li>
</ul>
<p>So, what term should we be using here? Perhaps a core issue
should be opened for this? A brief search didn't reveal an
existing one.</p>
<p>(note: you may need to click "continue this thread" when reading
the Reddit thread to see all relevant comments).<br>
</p>
<p>Tom.</p>
<p>[1]:
<a class="gmail-m_-5355867925327608624moz-txt-link-freetext" href="https://www.reddit.com/r/cpp/comments/bfyp6x/overview_of_stdfilesystem_my_talk/" target="_blank">https://www.reddit.com/r/cpp/comments/bfyp6x/overview_of_stdfilesystem_my_talk/</a><br>
</p>
</div>
Link to this post: <a href="http://lists.isocpp.org/core/2019/08/7026.php" rel="noreferrer" target="_blank">http://lists.isocpp.org/core/2019/08/7026.php</a><br>
</blockquote></div>