[SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

Tom Honermann tom at honermann.net
Sun Sep 8 18:29:12 CEST 2019


On 9/7/19 11:25 PM, Tom Honermann via Lib wrote:
> On 9/7/19 9:11 PM, Zach Laine wrote:
>> On Sat, Sep 7, 2019 at 7:31 PM Tom Honermann via Lib 
>> <lib at lists.isocpp.org> wrote:
>>
>>     On 9/7/19 8:27 PM, Tony V E wrote:
>>>     I think we would want it to be measured in glyphs.
>>     I agree that would be ideal, but...
>>
>>
>> Stop right there.  If that's ideal, let's do that.  Or at least, 
>> let's leave room for it to be done at some point.  Specifying CUs now 
>> prevents the ideal from ever being realized.
> There are other options.  For example, a future extension could allow 
> specifying what units are to be used for field width.
>>
>>>     Are you suggesting code points because glyphs are too hard?
>>     I don't know how to achieve that.  Field width doesn't really
>>     work for alignment unless one assumes a monospace font.  We could
>>     measure in terms of extended grapheme clusters, but EGC width
>>     has changed over time (e.g., family emoji).  That makes alignment
>>     dependent on both display properties and Unicode version.  And,
>>     of course, this would drag in locale dependence as well.
>>
>>
>> If you just count N=EGCs, you get the "right" answer. If your
>> terminal shows more or less than N characters, get a new terminal.
>> What I mean by this is that there should be no consideration of fonts.
> I see field width as either indicating storage (number of code units) 
> or alignment.  The number of user perceived characters is not useful 
> for aligning text unless a monospace font is assumed. Therefore, 
> storage seems like the more useful measurement.  This also aligns with 
> format_to_n and formatted_size which, unless I'm mistaken, work in 
> code units.  (It would be nice to clarify the wording for these as 
> well; what is meant by "number of characters in the character 
> representation"?)

Henri Sivonen just today posted a fantastic analysis of the various ways 
in which we think about the length/width of a string.  Particularly 
relevant to this discussion is the "Display Space" section, but I 
encourage everyone to read the entire article.  It's fascinating!
- https://hsivonen.fi/string-length
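
To make the candidate units concrete, here is a minimal sketch that measures
the same text in code units and in code points.  It is only an illustration,
not proposed wording and not a description of how any std::format
implementation computes width.  The names code_units and code_points are
hypothetical, and the input is assumed to be well-formed UTF-8.  Counting
extended grapheme clusters is omitted because it requires the Unicode
segmentation property tables.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: number of UTF-8 code units, i.e. the byte length.
constexpr std::size_t code_units(std::string_view s) {
    return s.size();
}

// Hypothetical helper: number of code points, assuming well-formed UTF-8.
// Every code point has exactly one byte that is not a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx).
constexpr std::size_t code_points(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// U+0065 U+0301: 'e' followed by a combining acute accent.
// One user-perceived character, two code points, three code units.
constexpr std::string_view e_acute = "\x65\xCC\x81";
static_assert(code_units(e_acute) == 3);
static_assert(code_points(e_acute) == 2);

// U+1F468 ZWJ U+1F469 ZWJ U+1F467: a family emoji sequence.
// Five code points, eighteen code units.
constexpr std::string_view family =
    "\xF0\x9F\x91\xA8\xE2\x80\x8D\xF0\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA7";
static_assert(code_units(family) == 18);
static_assert(code_points(family) == 5);
```

Neither count says how many columns a terminal would use for these strings,
which is the "Display Space" problem the article discusses.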

Tom.
