[SG16-Unicode] Strong code unit types

Tom Honermann tom at honermann.net
Wed Dec 5 07:40:27 CET 2018


On 12/4/18 11:17 PM, Lyberta wrote:
> This is something that hit me recently. Why are we using fundamental
> types for code units? CppCon 2018 is full of people saying that we
> should migrate to strong types, that std::size_t should have been a
> struct, etc.
The primary reason for using fundamental types for code units is that 
those are the types used for character and string literals.
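As a minimal illustration (C++11 and later), the element types of `u` and `U` literals are the fundamental types `char16_t` and `char32_t`. Since C++20, `u8` literals likewise use `char8_t`. A strong wrapper, such as the hypothetical `utf16_code_unit` below (not an existing or proposed API), therefore cannot be the type of a literal, and every literal has to be wrapped explicitly.

```cpp
#include <type_traits>

// Character and string literals have fundamental code unit types.
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u'' is char16_t");
static_assert(std::is_same<decltype(U'x'), char32_t>::value, "U'' is char32_t");
// decltype of a string literal is an lvalue reference to a const array.
static_assert(std::is_same<decltype(u"hi"), const char16_t (&)[3]>::value,
              "u\"\" is an array of char16_t");

// Hypothetical strong type: no literal has this type, so each use of a
// literal needs an explicit wrap.
struct utf16_code_unit {
    char16_t value;
};

constexpr utf16_code_unit first_unit{u'x'};
```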
>
> I propose we add strong types for code units:
>
> * utf8_code_unit
> * utf16_code_unit
> * utf32_code_unit
>
> These will hold char8_t, char16_t and char32_t inside of them respectively but will not
> allow invalid values such as bytes above 244 (0xF4) for UTF-8, or surrogates and
> values above 0x10FFFF for UTF-32, etc.
> This will guarantee that all code units are valid and will allow us to
> write much faster code because we will never need to check for invalid
> values.

The downside of such validating types is the overhead of validating every value when it is constructed.
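A minimal sketch of where that overhead lands, assuming the `utf8_code_unit` proposed in the quoted message rejects invalid bytes in its constructor. The class layout, the throwing constructor, and the `wrap_utf8` helper are hypothetical. Storage is `std::uint8_t` so the sketch compiles before C++20. A C++20 version would hold a `char8_t` instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical validating strong type modeled on the proposed utf8_code_unit.
class utf8_code_unit {
public:
    // The check runs on every construction. Bytes 0xC0, 0xC1 and 0xF5..0xFF
    // never appear in well-formed UTF-8.
    explicit utf8_code_unit(std::uint8_t v) : value_(v) {
        if (v == 0xC0 || v == 0xC1 || v >= 0xF5) {
            throw std::invalid_argument("byte is never valid in UTF-8");
        }
    }
    std::uint8_t value() const noexcept { return value_; }

private:
    std::uint8_t value_;
};

// Hypothetical helper: wrapping a raw buffer validates each byte once,
// an O(n) pass paid up front even if the caller only reads a prefix.
std::vector<utf8_code_unit> wrap_utf8(const std::uint8_t* data, std::size_t n) {
    std::vector<utf8_code_unit> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back(data[i]);
    }
    return out;
}
```

Individually valid code units do not make a valid sequence. A lone continuation byte such as 0x80 passes this check, so a decoder would still need to validate sequences.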

I am in favor of introducing strong types for code points.
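A minimal sketch of what a code point strong type might look like. The name `code_point`, the throwing constructor, and `is_scalar()` are assumptions made for illustration, not a proposed interface. Surrogates (U+D800 to U+DFFF) are code points, so this sketch accepts them and only rejects values above U+10FFFF. `is_scalar()` reports whether the value is also a Unicode scalar value, which is what UTF-8, UTF-16 and UTF-32 can encode.

```cpp
#include <stdexcept>

// Hypothetical strong type for a Unicode code point: any value in
// [0, 0x10FFFF], including surrogate code points.
class code_point {
public:
    explicit code_point(char32_t c) : value_(c) {
        if (c > 0x10FFFF) {
            throw std::invalid_argument("code point beyond U+10FFFF");
        }
    }
    char32_t value() const noexcept { return value_; }

    // True unless the value is a surrogate code point (0xD800..0xDFFF).
    bool is_scalar() const noexcept {
        return value_ < 0xD800 || value_ > 0xDFFF;
    }

private:
    char32_t value_;
};
```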

Tom.
