[SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
Tom Honermann
tom at honermann.net
Wed Aug 14 14:31:57 CEST 2019
On 8/14/19 2:49 AM, Corentin Jabot via Core wrote:
>
>
> On Wed, Aug 14, 2019, 4:46 AM Tony V E <tvaneerd at gmail.com> wrote:
>
>
>
> On Tue, Aug 13, 2019 at 8:57 AM Corentin Jabot
> <corentinjabot at gmail.com> wrote:
>
>
>
> On Tue, 13 Aug 2019 at 14:52, Ville Voutilainen
> <ville.voutilainen at gmail.com> wrote:
>
> On Tue, 13 Aug 2019 at 15:35, Corentin Jabot via Core
> <core at lists.isocpp.org> wrote:
> >
> >
> > Chiming in with my favorite solution:
> > Forbid u8/u16/u32 literals in non-Unicode encoded files
>
> But presumably not the ones that look like u8"\U1234" ?
>
>
> Yes, there is no reason to disallow that, as it can't be
> misinterpreted by either the compiler or people (and disallowing
> it would needlessly break quite a lot of code).
>
>
> I find your lack of faith in people's ability to misinterpret
> something disturbing.
> :-)
>
>
> 😁 (Challenging your mail client)
>
>
> \Uxxxx is unambiguous.
>
> u8"é" is ambiguous. Both people and the compiler may interpret that in
> a variety of ways. Notably, if I have UTF-8 in that file, which I wrote
> on Linux, but then the MSVC compiler thinks it's Windows-1252...
> Mojibake.
There is no ambiguity there, just bog-standard mojibake due to incorrect
source file encoding assumptions. "é" has exactly the same set of
"problems" as L"é", u8"é", u"é", and U"é".
>
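To make the mojibake concrete, here is a minimal sketch of what happens when the UTF-8 bytes for "é" are decoded as Windows-1252 and then written back out as UTF-8. The helper names encode_utf8 and misread_as_cp1252 are hypothetical, made up for illustration. The sketch assumes only bytes where Windows-1252 agrees with ISO-8859-1 (0x00-0x7F and 0xA0-0xFF), which covers both bytes of "é".

```cpp
#include <string>

// Hypothetical helper: encode a code point below U+0800 as UTF-8.
std::string encode_utf8(unsigned cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: treat each input byte as one Windows-1252 character
// and re-encode it as UTF-8. Assumption: bytes 0x80-0x9F are not handled
// correctly here, because Windows-1252 differs from ISO-8859-1 in that range.
std::string misread_as_cp1252(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        out += encode_utf8(b);
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of "é" (0xC3 0xA9) yields the four bytes 0xC3 0x83 0xC2 0xA9, which display as "Ã©". The same thing happens to the bytes inside any literal prefix, which is why the prefix is not what causes the problem.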
>
> People also seem to be confused
>
> https://stackoverflow.com/questions/23471935/how-are-u8-literals-supposed-to-work
Yes, that is a typical example of someone learning that source file
encoding and execution encoding can be independently controlled. Note
that the example even illustrates the individual being confused about
handling of u8 literals and *then* becoming confused about handling of
ordinary literals after learning about gcc's -finput-charset option (but
apparently having not yet learned about gcc's -fexec-charset option).
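
A small sketch of the execution encoding half of this, assuming gcc. The constant names are hypothetical. The character is written as a universal-character-name so that the source file encoding (and -finput-charset) plays no part; only the execution encoding affects the ordinary literal.

```cpp
#include <cstddef>

// Code units in each literal, excluding the terminating null.
// The ordinary literal is encoded in the execution character set, so its
// length depends on -fexec-charset (assumption: 2 with gcc's default of
// UTF-8, 1 when built with -fexec-charset=ISO-8859-1).
constexpr std::size_t ordinary_len = sizeof("\u00E9") - 1;

// The u8 literal is always UTF-8, regardless of -fexec-charset.
constexpr std::size_t utf8_len = sizeof(u8"\u00E9") - 1;
```

Building the same file twice, once with the defaults and once with -fexec-charset=ISO-8859-1, should change ordinary_len but leave utf8_len at 2.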
Tom.
>
>
> --
> Be seeing you,
> Tony
>
>
> _______________________________________________
> Core mailing list
> Core at lists.isocpp.org
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2019/08/7049.php