[SG16-Unicode] code_unit_sequence and code_point_sequence

Tom Honermann tom at honermann.net
Wed Jun 20 03:52:05 CEST 2018


On 06/19/2018 04:19 PM, Lyberta wrote:
> keld at keldix.com:
>> Is your code point advisory the same as codepoints in 10646/Unicode, also
>> called characters in 10646?
> Yes. A code point is unsigned 32 bit integer with the values in the
> range of 0-10FFFF. Modern C and C++ have type char32_t which is most
> suitable for holding code points.
>
>> And why not just treat these as 32-bit wchar-t?
>> I believe this is what we do in C.
> Because wide execution character set is implementation defined. So far
> nobody has expressed opinion of changing that and Windows violates the
> standard by having 16 bit wchar_t.

Technically, Windows doesn't violate the standard by having a 16-bit
wchar_t.  It violates the standard by using a wide execution character
set that defines code points that do not fit in its (16-bit) wchar_t
type.  We have an issue (https://github.com/sg16-unicode/sg16/issues/9)
to track modifying the standard to enable Microsoft's implementation to
be conforming.
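
To make the "does not fit" point concrete, here is a minimal sketch.
The names kMaxCodePoint, wchar_fits_all_code_points, and encode_utf16
are hypothetical and chosen only for illustration; they are not
proposed interfaces.  The sketch checks whether a single wchar_t can
hold every code point, and shows how a code point above U+FFFF has to
be split across two 16-bit code units when UTF-16 is the encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Largest Unicode code point, U+10FFFF.
constexpr char32_t kMaxCodePoint = 0x10FFFF;

// True if a single wchar_t can represent every code point.
// Expected to be false where wchar_t is 16 bits wide.
constexpr bool wchar_fits_all_code_points =
    static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max())
        >= kMaxCodePoint;

// Encode one code point as UTF-16.
// Returns the number of code units written to out (1 or 2).
inline std::size_t encode_utf16(char32_t cp, char16_t (&out)[2]) {
    assert(cp <= kMaxCodePoint);         // must be a valid code point
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogates are not encodable
    if (cp < 0x10000) {
        // Fits in one 16-bit code unit.
        out[0] = static_cast<char16_t>(cp);
        return 1;
    }
    // Needs a surrogate pair: 10 high bits, then 10 low bits.
    const char32_t v = cp - 0x10000;
    out[0] = static_cast<char16_t>(0xD800 + (v >> 10));
    out[1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    return 2;
}
```

Any code point for which encode_utf16 returns 2 is one that a single
16-bit wchar_t cannot represent, which is the case described above.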

Tom.

