[SG16-Unicode] Fundamental Unicode types

Lyberta lyberta at lyberta.net
Thu Mar 28 10:32:00 CET 2019


> charX_t are required to be code units in C++20.

But they are not required to hold valid values, hence they are still
dangerous.
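
As a minimal illustration of that point, the sketch below shows a char32_t
holding values that are not Unicode scalar values. The helper name
is_scalar_value is hypothetical and not taken from any proposal; the same
concern applies to char8_t and char16_t.

```cpp
// Hypothetical helper: true if `c` is a Unicode scalar value, i.e. it lies
// in [0, 0x10FFFF] and is not a surrogate (0xD800..0xDFFF).
constexpr bool is_scalar_value(char32_t c) noexcept
{
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// The type system accepts all three values; only the first is valid.
constexpr char32_t ok        = U'\u00E9';
constexpr char32_t surrogate = static_cast<char32_t>(0xD800);
constexpr char32_t too_large = static_cast<char32_t>(0x110000);

static_assert(is_scalar_value(ok), "U+00E9 is a scalar value");
static_assert(!is_scalar_value(surrogate), "surrogates are not scalar values");
static_assert(!is_scalar_value(too_large), "values above U+10FFFF are invalid");
```

Nothing in the character types themselves rejects the last two values, so
any validity check has to live in a wrapper type or in the decoding step.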

> 
> We also really do not want to have a code unit API, because you cannot do
> anything useful with it.
> Especially iterating over code units or querying the properties of code
> units is something that is probably never useful (and has a propensity to
> be misused).

This is a transition mechanism from std::basic_string and many other
legacy string classes. Besides, all those functions are required when
implementing encoding forms, so I decided to expose them. I don't think
they are harmful.
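
To make the "required when implementing encoding forms" point concrete,
here is a rough sketch of a UTF-8 decoder built on two code unit queries.
All names (is_continuation, sequence_length, decode_one) are hypothetical
and only illustrate the kind of function meant; they are not the API of
the proposal.

```cpp
#include <cstddef>

// Hypothetical code unit query: is `u` a UTF-8 continuation unit (10xxxxxx)?
constexpr bool is_continuation(unsigned char u) noexcept
{
    return (u & 0xC0) == 0x80;
}

// Hypothetical code unit query: length of the sequence that lead unit `u`
// introduces, or 0 if `u` cannot start a well-formed sequence.
constexpr std::size_t sequence_length(unsigned char u) noexcept
{
    return u < 0x80                   ? 1
         : (u >= 0xC2 && u <= 0xDF)   ? 2   // C0 and C1 would be overlong
         : (u >= 0xE0 && u <= 0xEF)   ? 3
         : (u >= 0xF0 && u <= 0xF4)   ? 4
         : 0;
}

// Decodes one scalar value from the first `n` code units at `s`.
// Returns false on ill-formed or truncated input.
inline bool decode_one(const unsigned char* s, std::size_t n,
                       char32_t& out, std::size_t& used) noexcept
{
    if (n == 0) return false;
    const std::size_t len = sequence_length(s[0]);
    if (len == 0 || len > n) return false;
    if (len == 1) { out = s[0]; used = 1; return true; }

    // Payload bits of the lead unit: 5, 4 or 3 bits for lengths 2, 3, 4.
    char32_t cp = s[0] & (0xFF >> (len + 1));
    for (std::size_t i = 1; i < len; ++i) {
        if (!is_continuation(s[i])) return false;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    // Reject overlong forms, surrogates and values above U+10FFFF.
    const char32_t lowest = len == 2 ? 0x80 : len == 3 ? 0x800 : 0x10000;
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

    out = cp;
    used = len;
    return true;
}
```

The decoder cannot be written without the two queries, which is why they
end up being part of the implementation whether or not they are exposed.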

> 
> Scalar value and grapheme views are indeed useful, IMO. Text is useful, but
> it's basically something that can spawn a scalar or grapheme view with some
> storage, high level invariants and state.

The current consensus is that calling std::begin on std::text returns a
grapheme cluster iterator. I'd personally use .to_graphemes() for that,
so for now I plan to implement it as a distinct type. I haven't yet
implemented the grapheme cluster level, so I don't have insights yet.
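
The difference between an implicit begin() and an explicitly named view
can be sketched as follows. The class name text and the member
to_scalar_values() are assumptions made for this example only, and the
storage is UTF-32 purely to keep it short. The grapheme view is left out
because it needs the segmentation rules of UAX #29.

```cpp
#include <string>
#include <utility>

// Hypothetical text type with no begin()/end(): the caller has to name
// the level at which it wants to iterate.
class text
{
public:
    explicit text(std::u32string scalars) : scalars_(std::move(scalars)) {}

    // Explicit scalar value view.
    const std::u32string& to_scalar_values() const noexcept
    {
        return scalars_;
    }

    // A to_graphemes() view would be declared here in the same style.

private:
    std::u32string scalars_;
};
```

For text(U"e\u0301") the scalar value view has two elements, while a
grapheme view would yield a single cluster, so which level a bare begin()
picks is not an obvious choice.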

> Lastly, I am very concerned about a design that would throw by default,
> especially something like domain_error. It basically means I wouldn't use
> any standard Unicode facilities, and neither would people in a lot of
> industries (games, embedded, etc.).

I do plan to rebase my proposal on top of P0709. That way those people
won't have any excuse any longer :P
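
The P0709 mechanism is not available in shipping compilers, so the sketch
below only contrasts the two styles being debated, using C++17's
std::optional as a stand-in for the non-throwing path. The class
scalar_value and its try_make function are hypothetical names for this
example, not part of the proposal.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical validated wrapper around a Unicode scalar value.
class scalar_value
{
public:
    // Throwing style: invalid input raises std::domain_error.
    explicit scalar_value(char32_t c) : value_(c)
    {
        if (!in_range(c))
            throw std::domain_error("not a Unicode scalar value");
    }

    // Non-throwing style: invalid input yields an empty optional.
    static std::optional<scalar_value> try_make(char32_t c) noexcept
    {
        if (!in_range(c)) return std::nullopt;
        return scalar_value(c, unchecked{});
    }

    char32_t value() const noexcept { return value_; }

private:
    struct unchecked {};  // tag selecting the constructor that skips the check

    scalar_value(char32_t c, unchecked) noexcept : value_(c) {}

    static bool in_range(char32_t c) noexcept
    {
        return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
    }

    char32_t value_;
};
```

Code built without exceptions can still use try_make, while the throwing
constructor stays available for everyone else.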
