[SG16-Unicode] [isocpp-lib] New issue: Are std::format field widths code units, code points, or something else?

Tom Honermann tom at honermann.net
Fri Sep 13 18:08:53 CEST 2019


On 9/13/19 10:35 AM, Victor Zverovich wrote:
> I'll report back my findings in a paper. It may not be solvable 
> perfectly but I think we can come up with a good practical 
> approximation that addresses the main use case and I'm fine with not 
> addressing esoteric ones. People somehow manage to write CLIs that do 
> this and work with fancy emojis and Asian scripts even in C =).

Please make sure to address some of the funnier characters in the
paper.  Here are a few examples, but I'm sure there are many more.

  * U+200B { ZERO WIDTH SPACE }
  * U+2063 { INVISIBLE SEPARATOR }
  * U+2064 { INVISIBLE PLUS }
  * Half-width and full-width characters
  * Family emoji (several emoji joined by U+200D { ZERO WIDTH JOINER })

I tried an experiment a little while back.  I thought it would be fun to 
take Eric Niebler's range-v3 calendar example 
(https://github.com/ericniebler/range-v3/blob/master/example/calendar.cpp) 
and modify it to generate emoji for some holidays.  I didn't actually go 
so far as to modify his code, but rather just did a simple hack to test 
output to a terminal.

$ cat cal.cpp
#include <clocale>
#include <iostream>
int main() {
   std::setlocale(LC_ALL, "");
   // UTF-8 emoji substituted for holiday dates:
   //   \xF0\x9F\x8E\x83  U+1F383 JACK-O-LANTERN  (October 31)
   //   \xF0\x9F\xA6\x83  U+1F983 TURKEY          (November 26)
   //   \xF0\x9F\x8E\x84  U+1F384 CHRISTMAS TREE  (December 25)
   std::cout <<
     "        October              November              December\n"
     "              1  2  3   1  2  3  4  5  6  7         1  2  3  4  5\n"
     "  4  5  6  7  8  9 10   8  9 10 11 12 13 14   6  7  8  9 10 11 12\n"
     " 11 12 13 14 15 16 17  15 16 17 18 19 20 21  13 14 15 16 17 18 19\n"
     " 18 19 20 21 22 23 24  22 23 24 25  \xF0\x9F\xA6\x83 27 28"
     "  20 21 22 23 24  \xF0\x9F\x8E\x84 26\n"
     " 25 26 27 28 29 30  \xF0\x9F\x8E\x83 29 30                 27 28"
     " 29 30 31\n";
}

Here is what Konsole on Ubuntu 18.04 displays for me today (screenshot attached):

I find it interesting that the misalignment is not consistent, even when
no installed font provides the emoji.

I wasn't able to get font fallback working in the time I allotted to
this.  The only way I could get emoji to appear was to install the
"fonts-noto-color-emoji" package and then change Konsole's font to
select it.  This is a proportional font, so of course everything looks
ridiculous.

Tom.

>
> - Victor
>
> On Fri, Sep 13, 2019 at 6:57 AM Niall Douglas
> <s_sourceforge at nedprod.com> wrote:
>
>     On 13/09/2019 14:36, Victor Zverovich wrote:
>     >> Instead of inventing something in the abstract, a good next step
>     >> would be to figure out how (in UTF-8 mode) Apple Terminal, Gnome
>     >> Terminal, Konsole, and the new Windows Terminal determine how many
>     >> terminal display columns a string takes. (I'm not volunteering.)
>     >
>     > I'm volunteering to do this since improving handling of width is
>     > already on my TODO list for the fmt library.
>
>     I'll be interested in what you come up with on this, as I don't think
>     this is solvable.
>
>     For example, imagine formatting into a file, and then that file is
>     rendered onto a console.
>
>     Another example: imagine formatting into a clipboard, which on Windows
>     and POSIX might involve three or four renditions into differing
>     formats and encodings. Then the consumer of the clipboard chooses an
>     unknown one of those renditions, and reinterprets it in some unknown
>     way when pasting it into some document.
>
>     Personally speaking, I think the best course is to declare
>     code-point-based or byte-based formatting widths, and draw a line
>     under it.
>
>     Niall
>     _______________________________________________
>     SG16 Unicode mailing list
>     Unicode at isocpp.open-std.org <mailto:Unicode at isocpp.open-std.org>
>     http://www.open-std.org/mailman/listinfo/unicode
>


-------------- next part --------------
Two PNG screenshots were attached (scrubbed from the archive):
http://www.open-std.org/pipermail/unicode/attachments/20190913/80b41d76/attachment-0002.png (11225 bytes)
http://www.open-std.org/pipermail/unicode/attachments/20190913/80b41d76/attachment-0003.png (11839 bytes)

