[SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

JeanHeyd Meneide phdofthehouse at gmail.com
Sun Apr 28 23:25:28 CEST 2019


On Sun, Apr 28, 2019 at 4:01 PM <keld at keldix.com> wrote:

>   I believe there are a number of encodings in East Asia that will still
> be developed for, for quite some time.
>
> Major languages, toolkits, and operating systems are still character set
> independent.
> Some people believe that Unicode has not won, and some people are not
> happy with the Unicode Consortium. Why abandon a model that still
> delivers for all?
>
> keld
>

I think there's really only one thing that needs to be fixed, and that's
the POSIX and C locales. Right now, they are required to use a single-byte
encoding of 256 characters (Chapter 6, Section 2, first sentence:
http://pubs.opengroup.org/onlinepubs/9699919799/).

This restriction is what keeps a large set of encodings deployed around
the world, Unicode included, from behaving properly as a default. I am
currently spending time contacting people on the C Standards Committee and
trying to find the POSIX individuals responsible for overseeing this
standard. Requiring the locale to be a single-byte encoding is not
"character set independent": it means that only a small fraction of
encodings (ASCII, or similar) can possibly be the default C or POSIX
locale. Unicode (specifically, UTF-8) happens to work in C and C++ only
because many implementations, by default, simply pass
char/wchar_t/char16_t/char32_t through their interfaces without touching
the data. The moment anyone uses facets or locales in any meaningful
manner, much of it falls over.
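
To make the "pass through untouched" point concrete, here is a minimal
sketch. The helper name upper_bytewise is hypothetical (it is not from any
standard or library), and the example assumes the input bytes are UTF-8.
Under the "C" locale, std::toupper only maps the basic Latin letters, so
the two bytes that encode "é" go through unchanged: the text survives, but
nothing locale-aware ever understood it as a single character.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper for illustration: uppercase a string one byte at a
// time using the C library's toupper under the "C" locale.
std::string upper_bytewise(std::string s) {
    // Select the "C" locale explicitly so the behavior does not depend on
    // whatever locale the surrounding program may have set.
    std::setlocale(LC_ALL, "C");
    for (char& c : s) {
        // Cast to unsigned char first: passing a negative char value to
        // toupper is undefined behavior.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

Calling upper_bytewise("caf\xC3\xA9") (the UTF-8 bytes for "café") yields
"CAF" followed by the same two bytes 0xC3 0xA9: the ASCII letters are
uppercased and the multibyte character is left as opaque bytes rather than
becoming "É".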

POSIX/C need to acknowledge that multibyte encodings are reasonable
defaults (not just recommended extensions, but plausible defaults). Until
then, no: the C standard does not deliver for all and actively harms the
development and growth of international text processing on large and small
hardware systems.