[SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)
Steve Downey
sdowney at gmail.com
Thu Aug 15 03:06:58 CEST 2019
On Wed, Aug 14, 2019 at 8:54 PM Ed Catmur via Liaison <
liaison at lists.isocpp.org> wrote:
>
> Note that the compiler already necessarily knows the source file encoding
> and the execution encoding, to be able to perform the various
> [lex.phases].
> Would it be enough or at least help to expose those, or at least the
> latter?
>

The compiler makes assumptions about the source file encoding and the
execution encoding. From the standard's perspective, the execution
encoding depends on locale in some unspecified way. That is, the values
of the characters in the "execution character set" depend on locale.
"Execution encoding" isn't actually a term in the standard, although it
is implied.
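A minimal sketch of how a program can at least probe what the compiler
assumed for ordinary string literals. This is an illustration only, not a
facility the standard provides: the names is_utf8_e_acute,
ordinary_literal_is_utf8, and check_helper are hypothetical. It looks only
at how one literal was encoded at translation time, and says nothing about
the source file encoding or the run-time locale. If the assumed encoding
cannot represent U+00E9, the result is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>

// True if the n bytes at s are exactly the UTF-8 encoding of U+00E9
// (LATIN SMALL LETTER E WITH ACUTE), i.e. 0xC3 0xA9.
bool is_utf8_e_acute(const char* s, std::size_t n) {
    return n == 2
        && static_cast<unsigned char>(s[0]) == 0xC3
        && static_cast<unsigned char>(s[1]) == 0xA9;
}

// Hypothetical probe: the compiler encodes "\u00e9" using whatever
// encoding it assumes for ordinary literals. With a Latin-1 assumption
// the literal would be the single byte 0xE9 instead of two bytes.
bool ordinary_literal_is_utf8() {
    static const char probe[] = "\u00e9";
    return is_utf8_e_acute(probe, sizeof(probe) - 1);
}

// Sanity check of the helper on explicit bytes, independent of any
// compiler encoding setting.
void check_helper() {
    assert(is_utf8_e_acute("\xC3\xA9", 2));   // UTF-8 bytes for U+00E9
    assert(!is_utf8_e_acute("\xE9", 1));      // Latin-1 byte for U+00E9
}
```

The universal-character-name keeps the source file itself pure ASCII, so
the probe does not depend on how the compiler decoded the file.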
If the compiler assumes a single-byte encoding like Latin-1, it can't tell
that the intended encoding is UTF-8. This happens all the time, and
sometimes it actually appears to work, when the bytes of the string
literals are eventually interpreted as UTF-8 instead of Latin-1. Other
times, mojibake happens.
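To make the mojibake case concrete, here is a small sketch of what happens
when bytes that were meant as UTF-8 get treated as Latin-1 and are then
transcoded to UTF-8. The function names latin1_to_utf8 and check_mojibake
are hypothetical and exist only for this illustration; no particular
compiler is claimed to perform exactly this conversion.

```cpp
#include <cassert>
#include <string>

// Transcode to UTF-8 while treating every input byte as one Latin-1
// code point (U+0000..U+00FF). Bytes below 0x80 are copied unchanged;
// the rest become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

void check_mojibake() {
    // "é" written as UTF-8 is the two bytes 0xC3 0xA9.
    const std::string intended = "\xC3\xA9";

    // Case 1: the bytes pass through untouched. A UTF-8 consumer still
    // sees "é", so the wrong Latin-1 assumption goes unnoticed.
    const std::string passthrough = intended;
    assert(passthrough == "\xC3\xA9");

    // Case 2: something transcodes "from Latin-1". Each byte becomes its
    // own character: U+00C3 and U+00A9, which display as "Ã©".
    assert(latin1_to_utf8(intended) == "\xC3\x83\xC2\xA9");
}
```

The first case is the one that "appears to work": nothing ever converts
the bytes, so the incorrect assumption is never exercised.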