[SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

Lyberta lyberta at lyberta.net
Mon Sep 9 19:42:00 CEST 2019


Tony V E:
> Do we have / could we have / should we have
> a clear long term (20 years) direction for text in C++?
> 
> ie the long term direction is unicode.
> and/or specifically the long term direction is UTF8.
> We expect everyone to use char8_t then?  Or we expect char to become utf8
> someday?
> What do we want the long term future to look like?
> deprecate std::string?

Here's how I imagine perfect text APIs in C++:

Remove char and wchar_t, along with every library API that depends on
them. That includes iostreams, std::char_traits, std::basic_string,
std::regex, std::format, the entire C library for text, etc.

Replace char8_t with std::unicode::utf8_code_unit.
Replace char16_t with std::unicode::utf16_code_unit.
Replace char32_t with std::unicode::utf32_code_unit.

Add new literals:
utf8'a' -> std::unicode::utf8_code_unit.
utf16'a' -> std::unicode::utf16_code_unit.
utf32'a' -> std::unicode::utf32_code_unit.

utf8"foo" -> const std::unicode::utf8_code_unit[3].
utf16"foo" -> const std::unicode::utf16_code_unit[3].
utf32"foo" -> const std::unicode::utf32_code_unit[3].

Note the lack of NUL-termination.
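
To make the literal idea concrete, here is a rough sketch of how it could be approximated in today's C++. Everything in the sketch namespace is a hypothetical stand-in, not an existing or proposed API: utf8_code_unit imitates the distinct code unit type, and the utf8() helper imitates the utf8"foo" literal by dropping the trailing NUL. It assumes the ordinary literal encoding is ASCII-compatible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for std::unicode::utf8_code_unit: a distinct type
// that does not convert implicitly to or from char or integers.
enum class utf8_code_unit : std::uint8_t {};

// Hypothetical stand-in for the utf8"..." literal. Copies a string literal
// into an array that omits the trailing NUL, so "foo" yields 3 code units.
// Assumption: the ordinary literal encoding is ASCII-compatible.
template <std::size_t N>
std::array<utf8_code_unit, N - 1> utf8(const char (&lit)[N]) {
    std::array<utf8_code_unit, N - 1> out{};
    for (std::size_t i = 0; i + 1 < N; ++i) {
        out[i] = static_cast<utf8_code_unit>(static_cast<unsigned char>(lit[i]));
    }
    return out;
}

// Usage: the array holds exactly the three code units of "foo".
inline void literal_example() {
    const auto foo = utf8("foo");
    assert(foo.size() == 3);
}

}  // namespace sketch
```

Since the size is part of the array type, nothing needs to scan for a terminator, which is the point of leaving the NUL out.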

Add Unicode libraries:
Storage:
std::unicode::code_unit_sequence
std::unicode::scalar_value_sequence
std::unicode::grapheme_cluster_sequence
std::unicode::text

Non-owning view versions of each of those.
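
As an illustration of how the first two storage levels differ, here is a minimal sketch that decodes a sequence of UTF-8 code units into a sequence of scalar values. The type aliases and the decode() function are hypothetical names chosen for this example, and the error policy (one U+FFFD per rejected byte) is a simplification I picked, not something any proposal specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-ins for the storage types listed above.
using code_unit_sequence = std::vector<std::uint8_t>;  // raw UTF-8 code units
using scalar_value_sequence = std::vector<char32_t>;   // decoded scalar values

// Decodes UTF-8 code units into scalar values. Each byte that cannot start
// a well-formed sequence is replaced by U+FFFD (a simplified policy).
inline scalar_value_sequence decode(const code_unit_sequence& units) {
    scalar_value_sequence out;
    std::size_t i = 0;
    while (i < units.size()) {
        const std::uint8_t lead = units[i];
        std::size_t len = 0;  // length of the encoded sequence, 0 if invalid
        char32_t value = 0;
        if (lead < 0x80)                { len = 1; value = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; value = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; value = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; value = lead & 0x07; }

        bool ok = len != 0 && i + len <= units.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const std::uint8_t cont = units[i + k];
            ok = (cont & 0xC0) == 0x80;  // continuation bytes are 10xxxxxx
            value = (value << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        static const char32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (ok && (value < min_for_len[len] || value > 0x10FFFF ||
                   (value >= 0xD800 && value <= 0xDFFF))) {
            ok = false;
        }
        if (ok) { out.push_back(value); i += len; }
        else    { out.push_back(0xFFFD); i += 1; }
    }
    return out;
}

// Usage: "a" followed by the euro sign is 4 code units but 2 scalar values.
inline void decode_example() {
    const code_unit_sequence units = {0x61, 0xE2, 0x82, 0xAC};
    assert(decode(units).size() == 2);
}

}  // namespace sketch
```

The same text has a different length at each level, which is why a field width has to say which level it counts. Grapheme clusters would add another layer on top and need the Unicode segmentation data, so they are left out here.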

New Unicode streams, new Unicode regex, new Unicode formatting, etc.
