[SG16-Unicode] code_unit_sequence and code_point_sequence
Sergey Zubkov
cubbi at cubbi.com
Tue Jun 19 22:34:30 CEST 2018
> if you truly want to work with text, you usually need to work on the
> layer above code points - grapheme clusters.
What makes interactive selection (which uses grapheme clusters) more
"true" than rendering or collation (to give two examples of work with
text that use other kinds of code point sequences)?
Perhaps we could use a list of use cases.
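
To make the layers in this thread concrete, here is a minimal C++ sketch
that counts UTF-8 code units and code points for the same text. The
function names (is_code_point, count_code_points, utf32_bytes) are
hypothetical and not part of any proposal, and the counting assumes
well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Range check only: true for values in 0..0x10FFFF (surrogates included).
constexpr bool is_code_point(char32_t c) { return c <= 0x10FFFF; }

// Counts code points in a UTF-8 string by counting every byte that is
// not a continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is well-formed UTF-8; no validation is done.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Storage needed if the same text were held as one char32_t per code point.
std::size_t utf32_bytes(const std::string& utf8) {
    return count_code_points(utf8) * sizeof(char32_t);
}
```

For "e\xCC\x81" (U+0065 followed by combining U+0301), size() reports 3
code units and count_code_points reports 2 code points, while a user
would select it as a single grapheme cluster. Finding that boundary
needs the Unicode segmentation rules and is outside this sketch.
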
On Tue, Jun 19, 2018, 4:19 PM Lyberta <lyberta at lyberta.net> wrote:
> keld at keldix.com:
> > Is your code point advisory the same as codepoints in 10646/Unicode, also
> > called characters in 10646?
>
> Yes. A code point is an unsigned 32-bit integer with values in the
> range 0-0x10FFFF. Modern C and C++ have the type char32_t, which is the
> most suitable for holding code points.
>
> > And why not just treat these as 32-bit wchar_t?
> > I believe this is what we do in C.
>
> Because the wide execution character set is implementation-defined. So
> far nobody has expressed an interest in changing that, and Windows
> violates the standard by having a 16-bit wchar_t.
>
> > Then you can have functions converting to and from wchar_t.
>
> Yes, except if you convert text to UTF-32 before processing it, you will
> waste memory, and a lot of interfaces still expect char*. More
> importantly, if you truly want to work with text, you usually need to
> work on the layer above code points - grapheme clusters.