[SG16-Unicode] [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code Points" blog post
Billy O'Neal (VC LIBS)
bion at microsoft.com
Tue Nov 12 19:11:41 CET 2019
It came up in the context of that width thing in format and I was asking if I had permission to make wider-than-2 characters format properly, and the forwarded text doesn’t seem to allow that (which is OK, I just wanted to understand at the time); I was thinking of U+FDFD (﷽).
Billy3
From: Corentin <corentin.jabot at gmail.com>
Sent: Tuesday, November 12, 2019 8:42 AM
To: C++ Library Evolution Working Group <lib-ext at lists.isocpp.org>
Cc: lib at lists.isocpp.org; Billy O'Neal (VC LIBS) <bion at microsoft.com>; SG16 <unicode at open-std.org>
Subject: Re: [isocpp-lib-ext] The "Let's Stop Ascribing Meaning to Code Points" blog post
On Tue, 12 Nov 2019 at 16:58, Billy O'Neal (VC LIBS) via Lib-Ext <lib-ext at lists.isocpp.org> wrote:
During review of some Unicode stuff in LWG, a few of us had a mini discussion about grapheme clusters, and I mentioned that everyone who touches this stuff might understand the complexities better if they read this:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/
+1
FYI, SG16 is aware of that blog post, and I think there is pretty strong agreement with it.
Codepoints have some use (notably, the Unicode Character Database is really the Unicode Codepoint Database, and most Unicode algorithms work on codepoints), but any kind of user-facing UX should deal with EGCs (extended grapheme clusters).
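To make the distinction concrete, here is a minimal sketch in Python (chosen only because its standard library is enough for the illustration). The helper names are hypothetical. It counts code units and codepoints for text that a reader would perceive as a single character; it does not do grapheme cluster segmentation, which needs the rules from UAX #29 and is not in Python's standard library.

```python
# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one "é".
DECOMPOSED_E_ACUTE = "e\u0301"
# The precomposed form of the same character is the single codepoint U+00E9.
PRECOMPOSED_E_ACUTE = "\u00e9"
# U+FDFD is a single codepoint that renders as a very wide ligature.
BISMILLAH = "\ufdfd"


def code_point_count(text: str) -> int:
    """Python's str is a sequence of codepoints, so len() counts them."""
    return len(text)


def utf8_code_unit_count(text: str) -> int:
    """Number of 8-bit code units (bytes) in the UTF-8 encoding."""
    return len(text.encode("utf-8"))


def utf16_code_unit_count(text: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding (no BOM)."""
    return len(text.encode("utf-16-le")) // 2
```

Both spellings of "é" are one user-perceived character, yet the decomposed form is 2 codepoints (3 UTF-8 code units) and the precomposed form is 1 codepoint (2 UTF-8 code units). U+FDFD is 1 codepoint and 3 UTF-8 code units, and none of those counts says anything about how wide it is on screen.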
That is not always what applications choose to do, for a variety of reasons. Notably, Twitter's character count deals in codepoints, web browsers' search functions use codepoints so as to ignore diacritics, and comparisons can be done on (normalized) codepoint sequences.
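As a rough illustration of the last two uses, the sketch below compares normalized codepoint sequences and strips combining marks for a diacritic-insensitive match. It uses Python's standard `unicodedata` module; the function names are hypothetical, and this is an assumption about how such a feature could work, not a description of what any browser actually does.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings as NFC-normalized codepoint sequences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop codepoints with a nonzero combining class.

    This is an approximation: it handles common Latin accents, but not
    every mark in every script has a nonzero canonical combining class.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def contains_ignoring_diacritics(haystack: str, needle: str) -> bool:
    """Substring search on codepoints after stripping combining marks."""
    return strip_diacritics(needle) in strip_diacritics(haystack)
```

With these helpers, `"e\u0301"` and `"\u00e9"` differ as raw codepoint sequences but compare equal after normalization, and a search for `"cafe"` matches `"un café"`. Both operations work entirely on codepoints and never need grapheme cluster boundaries.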
There is also not always a 1-1 mapping between what people understand as a "character", grapheme clusters, and glyphs.
Billy3
_______________________________________________
Lib-Ext mailing list
Lib-Ext at lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13606.php