<div dir="auto">That's what the standard now refers to as the internal encoding.<div dir="auto"><br></div><div dir="auto"><a href="http://eel.is/c++draft/lex#phases-1.1.sentence-4" rel="noreferrer noreferrer" target="_blank">http://eel.is/c++draft/lex#phases-1.1.sentence-4</a><br></div><div dir="auto"><br></div><div dir="auto"><div dir="auto">An implementation may use any internal
encoding, so long as an actual extended character encountered in the
source file, and the same extended character expressed in the source
file as a <i><a href="http://eel.is/c++draft/lex#nt:universal-character-name" target="_blank" rel="noreferrer">universal-character-name</a></i> (e.g., using the \
uXXXX notation), are handled equivalently
except where this replacement is reverted (<a href="http://eel.is/c++draft/lex#pptoken" target="_blank" rel="noreferrer">[lex.pptoken]</a>) in a raw string literal<a href="http://eel.is/c++draft/lex#phases-1.1.sentence-4" target="_blank" rel="noreferrer">.</a></div></div><div dir="auto"><br></div><div dir="auto">Now, this doesn't quite require that the internal encoding be Unicode. If I'm reading it correctly, it could even be lossy. However, given the other requirements around u literals, that's somewhat unlikely. It might be worth exploring making it an explicit requirement that the internal encoding be some unspecified Unicode transformation format, so that even UTF-EBCDIC would be acceptable.</div><div dir="auto"><br></div><div dir="auto">All of this language in the standard seems to have been drafted between 1994 and 1998, and doesn't correspond well to current nomenclature around character encodings. It also dates from a time when it wasn't clear that programs would routinely have to deal with multiple encodings over their lifetime, and that one of the most common would be a multibyte encoding.</div><div dir="auto"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Aug 15, 2019, 07:55 Lyberta <<a href="mailto:lyberta@lyberta.net" rel="noreferrer noreferrer" target="_blank">lyberta@lyberta.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">There is so much discussion and misunderstanding about C++ charsets in<br>
the adjacent thread and on the Internet. Maybe we can simplify this a bit.<br>
<br>
I propose we add an "Intermediate Character Set" and define it as an<br>
implementation-defined Unicode encoding form.<br>
<br>
Then we add rules like these:<br>
<br>
When compiling a TU, text in the source charset gets converted to the<br>
intermediate charset before the preprocessor runs. This eliminates any<br>
ambiguity about string literals and comments.<br>
<br>
Pretty much all text operations during compilation work in terms of the<br>
intermediate charset.<br>
<br>
As the last step before writing an object file, text data gets converted<br>
to the various "execution" encodings.<br>
<br>
This would allow us to write standardese in the framework of Unicode<br>
while still allowing exotic charsets as input and output.<br>
<br>
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" rel="noreferrer noreferrer noreferrer" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div>