[SG16-Unicode] [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code Points" blog post

Tom Honermann tom at honermann.net
Tue Nov 12 21:53:31 CET 2019


On 11/12/19 6:11 PM, Billy O'Neal (VC LIBS) via Lib-Ext wrote:
>
> It came up in the context of that width thing in format and I was 
> asking if I had permission to make wider-than-2 characters format 
> properly, and the forwarded text doesn’t seem to allow that (which is 
> OK, I just wanted to understand at the time); I was thinking of U+FDFD 
> (﷽).
>
Can you elaborate?  My understanding of the forwarded wording is that 
the assumed encoding for the input text is implementation defined 
(though not locale sensitive) and that implementors are encouraged to 
use the Unicode code point ranges indicated in the wording, but are not 
required to (that is my interpretation of the use of the word "should" 
in the proposed wording).

It does look like the provided code point ranges don't handle U+FDFD 
correctly.

I don't know how much confidence should be placed on the listed code 
point ranges.  But I think it is important that we consider them 
amenable to change.  I suspect that U+FDFD is not the last code point 
we'll find that is not correctly handled.

Tom.

> Billy3
>
> *From: *Corentin <mailto:corentin.jabot at gmail.com>
> *Sent: *Tuesday, November 12, 2019 8:42 AM
> *To: *C++ Library Evolution Working Group 
> <mailto:lib-ext at lists.isocpp.org>
> *Cc: *lib at lists.isocpp.org <mailto:lib at lists.isocpp.org>; Billy O'Neal 
> (VC LIBS) <mailto:bion at microsoft.com>; SG16 <mailto:unicode at open-std.org>
> *Subject: *Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to 
> Code Points" blog post
>
> On Tue, 12 Nov 2019 at 16:58, Billy O'Neal (VC LIBS) via Lib-Ext 
> <lib-ext at lists.isocpp.org <mailto:lib-ext at lists.isocpp.org>> wrote:
>
>     During review of some Unicode stuff in LWG, we had a mini
>     discussion for some folks about grapheme clusters, and I mentioned
>     that everyone who touches this stuff might understand the
>     complexities better if they read this:
>
>     https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
>
> +1
>
> FYI, SG16 is aware of that blog post, and I think there is pretty 
> strong agreement with it.
>
> Codepoints have some use (notably, the Unicode Character Database is 
> really the Unicode Codepoint Database, and most Unicode algorithms 
> work on codepoints), but any kind of user-facing UX should deal with 
> EGCs (extended grapheme clusters).
>
> Applications do not always do so, for a variety of reasons. Notably, 
> Twitter's character count deals in codepoints, web browsers' search 
> functions use codepoints so as to ignore diacritics, and comparisons 
> can be done on (normalized) codepoint sequences.
>
> There is also not always a one-to-one mapping between what people 
> understand as a "character", grapheme clusters, and glyphs.
>
>     Billy3
>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext at lists.isocpp.org <mailto:Lib-Ext at lists.isocpp.org>
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext 
> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13606.php 
>
>
> _______________________________________________
> Lib-Ext mailing list
> Lib-Ext at lists.isocpp.org
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
> Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13609.php

