[SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)
Corentin
corentin.jabot at gmail.com
Wed Aug 14 19:55:41 CEST 2019
On Wed, Aug 14, 2019, 7:36 PM Niall Douglas <s_sourceforge at nedprod.com> wrote:
> > The present implementation-defined interpretation of the byte sequence in
> > source files allows a default of "UTF-8 in strings, comments can use
> > arbitrary bytes" (which thus allows existing source files in a range of
> > ASCII-compatible 8-bit character sets if the non-ASCII characters only
> > appear in comments, without needing to tell the compiler which character
> > set is being used). That approach (which is what GCC does by default)
> > seems more friendly to users with existing source files using various
> > character sets in comments than strictly requiring everything to be UTF-8
> > (even in comments) unless the compiler is explicitly told otherwise.
>
> I would find that choice unhelpful for tooling that processes C++
> source code, e.g. Python, which insists that text you feed it is either
> correctly encoded or not text at all. And that's not unreasonable: either
> text is encoded correctly, or it is not.
>
> What do you think of my "all 7-bit clean ASCII" proposal, with #pragma
> encoding (if supported by your C compiler) as the way to opt out?
>
That seems like a step backwards. It's basically what people have had to do
for the past 40 years.
As always, the issue here is that we lack data about what people actually
put in their strings :(
>
> Niall
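To make the two positions concrete, here is a minimal sketch, assuming the helper names `is_strict_utf8` and `is_7bit_clean` (both hypothetical, not part of any proposal or compiler). It shows why a file that follows the quoted "UTF-8 in strings, arbitrary bytes in comments" default can still be rejected by a strict decoder such as Python's, and what the "all 7-bit clean ASCII" rule would check. It does not model GCC's actual behavior; it only inspects raw bytes.

```python
# Hypothetical helpers for illustration only; not part of any proposal or compiler.


def is_strict_utf8(data: bytes) -> bool:
    """True if the whole file decodes as UTF-8, comments included."""
    try:
        data.decode("utf-8")  # "strict" is the default error handler
    except UnicodeDecodeError:
        return False
    return True


def is_7bit_clean(data: bytes) -> bool:
    """True if every byte is in 0x00..0x7F (the "all 7-bit clean ASCII" rule)."""
    return data.isascii()  # bytes.isascii() needs Python 3.7+


# A source file whose only non-ASCII byte is a Latin-1 'é' (0xE9) in a comment.
latin1_comment = b'// caf\xe9\nconst char* s = "hello";\n'
# A source file with UTF-8 text inside a string literal.
utf8_string = 'const char* s = "café";\n'.encode("utf-8")

if __name__ == "__main__":
    print(is_strict_utf8(latin1_comment), is_7bit_clean(latin1_comment))  # False False
    print(is_strict_utf8(utf8_string), is_7bit_clean(utf8_string))        # True False
```

The first file is the case the quoted default is meant to tolerate (Latin-1 bytes confined to a comment), yet a strict UTF-8 decoder rejects it. The second file is valid UTF-8 but fails a 7-bit clean check. That gap is the trade-off being argued about in this thread.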