<div dir="ltr"><div dir="ltr">On Tue, Sep 10, 2019 at 10:36 AM Niall Douglas <<a href="mailto:s_sourceforge@nedprod.com">s_sourceforge@nedprod.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
> Perhaps it would be helpful to enumerate what we expect to be portable<br>
> uses of field widths. My personal take is that they are useful to<br>
> specify widths for fields where the content is restricted to members of<br>
> the basic source character set where we already have a guarantee that<br>
> each character can be represented with one code unit. <br>
<br>
Most programmers would use field widths for padding items so they appear<br>
in a grid. They would expect that 𐐗 padded to eight characters yields<br>
seven spaces and 𐐗, not four spaces and 𐐗 (because 𐐗 consumes four<br>
bytes of UTF-8).<br>
<br>
That said, as we have no idea how unicode would get rendered (0, 1, or 4<br>
characters for 𐐗 being the most likely), I cannot improve on your<br>
proposal. The situation sucks, quite frankly.<br></blockquote><div><br></div><div> One of the benefits of using code units for char and wchar_t here is that, even if it's visually wrong, it's <i>dependably</i> wrong. I can pass char-based UTF-8 and know exactly how to mitigate the problem if I care, and on all platforms I will have exactly the same problem, regardless of whether the program is deployed on a Turkish, German, or Japanese machine. This, combined with the ability to not do anything with std::locale for char and wchar_t, is extremely valuable (if frustrating for those who care).<br><br></div><div> char and wchar_t are portability dead ends; let's leave them to the mess that they are and focus on having a really good story for char8_t, char16_t, and char32_t.<br><br></div><div>Sincerely,<br></div><div>JeanHeyd<br></div></div></div>