[SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
Corentin Jabot
corentinjabot at gmail.com
Wed Aug 14 14:43:00 CEST 2019
On Wed, Aug 14, 2019, 2:31 PM Tom Honermann <tom at honermann.net> wrote:
> On 8/14/19 2:49 AM, Corentin Jabot via Core wrote:
>
>
>
> On Wed, Aug 14, 2019, 4:46 AM Tony V E <tvaneerd at gmail.com> wrote:
>
>>
>>
>> On Tue, Aug 13, 2019 at 8:57 AM Corentin Jabot <corentinjabot at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, 13 Aug 2019 at 14:52, Ville Voutilainen <
>>> ville.voutilainen at gmail.com> wrote:
>>>
>>>> On Tue, 13 Aug 2019 at 15:35, Corentin Jabot via Core
>>>> <core at lists.isocpp.org> wrote:
>>>> >
>>>> >
>>>> > Chiming in with my favorite solution:
>>>> > Forbid u8/u16/u32 literals in non-Unicode encoded files
>>>>
>>>> But presumably not the ones that look like u8"\U1234" ?
>>>>
>>>
>>> Yes, there is no reason to disallow that, as it can't be misinterpreted
>>> by either the compiler or people (and quite a lot of code would needlessly
>>> break).
>>>
>>>
>> I find your lack of faith in people's ability to misinterpret something
>> disturbing.
>> :-)
>>
>
> 😁 (Challenging your mail client)
>
>
> \Uxxxx is unambiguous.
>
> u8"é" is ambiguous. Both people and the compiler may interpret that in a
> variety of ways. Notably, if I have UTF-8 in that file, which I wrote on
> Linux, but then the MSVC compiler thinks it's Windows-1252...
> Mojibake.
>
> There is no ambiguity there, just bog standard mojibake due to incorrect
> source file encoding assumptions. "é" has exactly the same set of
> "problems" as L"é", u8"é", u"é", and U"é".
>
Yes. People make assumptions, compilers make assumptions, and voilà:
mojibake. The issue is assuming that all parties involved share the same
intent and assumptions. Preventing wrong assumptions reduces the amount of
mojibake.
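
To make the failure mode concrete, here is a rough sketch of what happens to
u8"é" when the file is saved as UTF-8 but decoded as Windows-1252. The helper
names (encode_utf8, misread_as_cp1252) are made up for illustration, and the
decoding step only covers ASCII and bytes 0xA0-0xFF, where Windows-1252
coincides with Latin-1. A real compiler's behavior may differ in details.

```cpp
#include <string>

// Encode one code point as UTF-8.
// Assumption: cp <= U+FFFF and is not a surrogate (enough for this sketch).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical model of a compiler that decodes each source byte as
// Windows-1252 and then encodes the resulting characters as UTF-8 for a
// u8 literal. Only valid for ASCII and bytes 0xA0-0xFF, where the
// Windows-1252 code point equals the byte value.
std::string misread_as_cp1252(const std::string& source_bytes) {
    std::string out;
    for (unsigned char b : source_bytes) {
        out += encode_utf8(b);
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of é (0xC3 0xA9) gives back four bytes,
0xC3 0x83 0xC2 0xA9, which display as "Ã©". Writing the literal as
u8"\u00E9" instead always produces 0xC3 0xA9, whatever encoding the compiler
assumes for the file.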
>
>
> People also seem to be confused
>
>
> https://stackoverflow.com/questions/23471935/how-are-u8-literals-supposed-to-work
>
> Yes, that is a typical example of someone learning that source file
> encoding and execution encoding can be independently controlled. Note that
> the example even illustrates the individual being confused about handling
> of u8 literals and *then* becoming confused about handling of ordinary
> literals after learning about gcc's -finput-charset option (but
> apparently having not yet learned about gcc's -fexec-charset option).
>
Yes. I would make the bold claim (I don't have data) that most people are
confused about strings, even more so in the context of C++. The current
model makes it difficult to do the right thing and easy to create mojibake.
> Tom.
>
>
>
>> --
>> Be seeing you,
>> Tony
>>
>
> _______________________________________________
> Core mailing list
> Core at lists.isocpp.org
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2019/08/7049.php
>
>
>