[SG16-Unicode] String views with strong code unit types
Steve Downey
sdowney at gmail.com
Tue Jun 4 13:11:59 CEST 2019
That literals aren't required to be well-formed is just one instance of
the larger problem: char8_t data may have come from anywhere and can't be
assumed to be well-formed. Real-world text is frequently broken.
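To make that concrete, here is a minimal sketch of the check a consumer ends up needing before trusting 8-bit code units. The name is_well_formed_utf8 and the three sample variables are hypothetical, made up for illustration; nothing like them is in the standard library or proposed in this thread. The byte ranges follow the well-formed UTF-8 table in the Unicode Standard. The template parameter only exists so that the same code accepts u8 literals whether they are char (before C++20) or char8_t.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not a standard library function: returns true only if
// every code unit sequence in `text` falls in the well-formed UTF-8 ranges.
template <class CharT>
bool is_well_formed_utf8(std::basic_string_view<CharT> text) {
    static_assert(sizeof(CharT) == 1, "expects 8-bit code units");
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        if (lead <= 0x7F) {  // ASCII, single code unit
            ++i;
            continue;
        }
        std::size_t trailing = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd unit
        if (lead >= 0xC2 && lead <= 0xDF) {
            trailing = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trailing = 2;
            if (lead == 0xE0) lo = 0xA0;  // rejects overlong 3-unit forms
            if (lead == 0xED) hi = 0x9F;  // rejects encoded surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trailing = 3;
            if (lead == 0xF0) lo = 0x90;  // rejects overlong 4-unit forms
            if (lead == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {
            return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }
        if (n - i <= trailing) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char c = static_cast<unsigned char>(text[i + k]);
            if (c < lo || c > hi) return false;
            lo = 0x80;  // only the 2nd unit has a restricted range
            hi = 0xBF;
        }
        i += trailing + 1;
    }
    return true;
}

using namespace std::literals;

// Sample literals (illustrative only). The compiler accepts all three.
inline const auto well_formed_sample = u8"caf\xC3\xA9"sv;   // U+00E9 as C3 A9
inline const auto overlong_sample    = u8"a\xC0\x80"sv;     // overlong NUL
inline const auto surrogate_sample   = u8"\xED\xA0\x80"sv;  // U+D800, WTF-8 style
```

Only the first sample passes the check. The other two compile just as happily, which is the point: the type says "UTF-8 code units", not "well-formed UTF-8".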
On Tue, Jun 4, 2019, 06:27 JeanHeyd Meneide <phdofthehouse at gmail.com> wrote:
> On Tue, Jun 4, 2019 at 5:39 AM Lyberta <lyberta at lyberta.net> wrote:
>
>> We can always modify the standard so that we get strong types via
>> compiler magic. I was thinking:
>>
>> utf8'a' -> std::unicode::utf8_code_unit
>> utf16'a' -> std::unicode::utf16_code_unit
>> utf32'a' -> std::unicode::utf32_code_unit
>> utf8"a" -> std::unicode::utf8_code_unit_sequence_view
>> utf16"a" -> std::unicode::utf16_code_unit_sequence_view
>> utf32"a" -> std::unicode::utf32_code_unit_sequence_view
>>
>> Well, that's future. I want something I can use now.
>>
>> Also, does the standard require well formed sequences in literals?
>>
>
> No, we lobbied specifically so that you can insert "ill-formed" sequences
> (e.g., not perfectly well-formed Unicode scalar values) into string
> literals. This is to enable people who need literals that are not
> exactly conformant for various reasons (testing, creating
> WTF-8/CESU-8/etc. literals, and more).
>
> Granted, the only way you can do this is by writing `\x` escapes
> explicitly in the string literal, which is a very strong signal that someone
> is doing something non-standard. That doesn't mean you can't assume
> char8_t, char16_t, and char32_t literals are well-formed: if someone's
> shoving in raw code unit values with backslash-x syntax, you have to assume
> they are a Very Smart Person Who Knows What They Are Getting Themselves Into.