[SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

Billy O'Neal (VC LIBS) bion at microsoft.com
Sun Sep 8 09:48:38 CEST 2019


> Grapheme breaking is simple, and requires no locale info.

The encoding that goes with char* is part of the locale. Where the breaks go in a Shift-JIS stream is probably different from where they go in a UTF-8 stream or a Latin-1 stream.
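
A minimal sketch of the point about encodings, in C++: the same bytes split into a different number of characters depending on which encoding is assumed. The function names are hypothetical, the Shift-JIS lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the commonly documented ones, and no validation of trail bytes is attempted. This counts encoded characters only, not grapheme clusters.

```cpp
#include <cstddef>
#include <string_view>

// Latin-1: every byte is exactly one character.
std::size_t count_chars_latin1(std::string_view bytes) {
    return bytes.size();
}

// Shift-JIS (simplified): a byte in 0x81-0x9F or 0xE0-0xFC is treated as the
// lead byte of a two-byte character; every other byte is a one-byte character.
// Trail bytes are not validated (assumption: the input is well-formed).
std::size_t count_chars_shift_jis(std::string_view bytes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < bytes.size(); ++n) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        bool lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
        // Consume two bytes for a lead byte that has a following byte.
        i += (lead && i + 1 < bytes.size()) ? 2 : 1;
    }
    return n;
}
```

For example, the two bytes 0x83 0x5C form a single character when read as Shift-JIS, but two separate characters when read as Latin-1 (the second being a backslash), so any boundary-finding code has to know the encoding first.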

Billy3

________________________________
From: Lib <lib-bounces at lists.isocpp.org> on behalf of Zach Laine via Lib <lib at lists.isocpp.org>
Sent: Saturday, September 7, 2019 6:11:47 PM
To: Library Working Group <lib at lists.isocpp.org>
Cc: Zach Laine <whatwasthataddress at gmail.com>; Tony V E <tvaneerd at gmail.com>; Tom Honermann <tom at honermann.net>; unicode at isocpp.open-std.org <unicode at open-std.org>
Subject: Re: [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

On Sat, Sep 7, 2019 at 7:31 PM Tom Honermann via Lib <lib at lists.isocpp.org> wrote:
> On 9/7/19 8:27 PM, Tony V E wrote:
>> I think we would want it to be measured in glyphs.
> I agree that would be ideal, but...

Stop right there.  If that's ideal, let's do that.  Or at least, let's leave room for it to be done at some point.  Specifying CUs now prevents the ideal from ever being realized.
>> Are you suggesting code points because glyphs are too hard?
> I don't know how to achieve that.  Field width doesn't really work for alignment unless one assumes a monospace font.  We could measure in terms of extended grapheme clusters, but EGC width has changed over time (e.g., family emoji).  That makes alignment dependent on both display properties and Unicode version.  And, of course, this would drag in locale dependence as well.
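
To make the units under discussion concrete, here is a small C++ sketch that counts the same UTF-8 text in code units and in code points. The function names are hypothetical, and the code assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string_view>

// Number of UTF-8 code units, which is simply the number of bytes.
std::size_t count_code_units(std::string_view s) {
    return s.size();
}

// Number of code points, assuming s is well-formed UTF-8: every byte that is
// not a continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t count_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "e" followed by U+0301 (combining acute accent), encoded as the bytes 65 CC 81, this gives 3 code units and 2 code points for what a reader sees as one character. A man-woman-girl family emoji (U+1F468 U+200D U+1F469 U+200D U+1F467) is 18 code units and 5 code points.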

If you just count N=EGCs, you get the "right" answer.  If your terminal shows more or fewer than N characters, get a new terminal.  What I mean by this is that there should be no consideration of fonts.

As for the need for a locale, I don't get that.  Grapheme breaking is simple, and requires no locale info.  Do you mean Unicode data?  Picking a version and sticking with it should be sufficient.  No system that I know of has multiple Unicode versions to pick from programmatically.
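
A rough C++ sketch of counting clusters from code point data alone, with no locale lookup. This is NOT the real UAX #29 algorithm: as a loudly simplified assumption it treats only U+0300-U+036F, U+FE00-U+FE0F and U+200D (ZWJ) as attaching to the previous cluster, and glues any code point that follows a ZWJ. All names are hypothetical, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Decode one code point from well-formed UTF-8 at s[i] and advance i.
// No validation is performed (assumption: the input is well-formed).
char32_t decode_one(std::string_view s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) {
        return b0;
    }
    int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1;
    char32_t cp = b0 & (0x3F >> extra);  // keep the payload bits of the lead byte
    while (extra-- > 0 && i < s.size()) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Tiny stand-in for the Unicode property tables a real implementation needs.
bool is_extender(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // combining diacritical marks
        || (cp >= 0xFE00 && cp <= 0xFE0F)   // variation selectors
        || cp == 0x200D;                    // zero width joiner
}

// Approximate cluster count: a code point starts a new cluster unless it is
// an extender or directly follows a ZWJ.
std::size_t count_clusters_approx(std::string_view s) {
    std::size_t n = 0;
    bool prev_was_zwj = false;
    for (std::size_t i = 0; i < s.size();) {
        char32_t cp = decode_one(s, i);
        if (n == 0 || !(is_extender(cp) || prev_was_zwj)) {
            ++n;
        }
        prev_was_zwj = (cp == 0x200D);
    }
    return n;
}
```

With these simplified rules, "e" plus U+0301 and the ZWJ-joined family emoji each count as one cluster. The only external input is the property table, which is where the dependence on a Unicode version comes from.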

Zach
