<div dir="ltr">Is there an open-access version of ISO 14652? It sounds like an extension of chapters 6 and 7 of the POSIX spec. The ISO description mentions APIs that will be developed. Also, it looks like the spec is currently withdrawn; is there a replacement? <a href="https://www.iso.org/standard/37069.html">https://www.iso.org/standard/37069.html</a></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 29, 2019 at 2:11 PM <<a href="mailto:keld@keldix.com">keld@keldix.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, Apr 29, 2019 at 02:00:23PM -0400, Steve Downey wrote:<br>
> Not "some". I want the entire set of Unicode functionality as a first class<br>
> citizen, although some of them have higher priority than others.<br>
<br>
<br>
I am willing to consider support for all Unicode functionality.<br>
Much, however, is not defined by Unicode, and not in a C/C++ style.<br>
I think some of Unicode is not well designed, like ucs16.<br>
(The stuff they copied from me, however, is OK :-)<br>
<br>
<br>
<br>
<br>
> Keld, what capabilities not provided by Unicode algorithms and databases<br>
> are you concerned about not being supported? I've been doing text<br>
> processing a lot, and working with code points or scalar values has made my<br>
> life easier, with fewer complaints from my customers. Well, except for a few<br>
> who don't like CDATA in XML dumps, but were somehow OK with utterly broken<br>
> XML.<br>
<br>
I did a number of designs that they have not yet copied, e.g. in 14652, and POSIX has some<br>
non-Unicode stuff on the way.<br>
<br>
keld <br>
<br>
> On Mon, Apr 29, 2019 at 1:48 PM Steve Downey <<a href="mailto:sdowney@gmail.com" target="_blank">sdowney@gmail.com</a>> wrote:<br>
> <br>
> > The "POSIX" locale, which is a superset of the capabilities of the "C" locale<br>
> > but otherwise by definition equivalent to it, is the one you get if you do<br>
> > not make a setlocale() call.<br>
> > So, not _a_ POSIX locale, but _the_ POSIX locale.<br>
> ><br>
> > On Mon, Apr 29, 2019 at 1:37 PM <<a href="mailto:keld@keldix.com" target="_blank">keld@keldix.com</a>> wrote:<br>
> ><br>
> >> On Sun, Apr 28, 2019 at 05:25:28PM -0400, JeanHeyd Meneide wrote:<br>
> >> > On Sun, Apr 28, 2019 at 4:01 PM <<a href="mailto:keld@keldix.com" target="_blank">keld@keldix.com</a>> wrote:<br>
> >> ><br>
> >> > > I believe there are a number of encodings in East Asia that will still be<br>
> >> > > developed for, for quite some time.<br>
> >> > ><br>
> >> > > Major languages, toolkits, and operating systems are still character-set<br>
> >> > > independent.<br>
> >> > > Some people believe that Unicode has not won, and some people are not<br>
> >> > > happy with the Unicode Consortium. Why abandon a model that still delivers<br>
> >> > > for all?<br>
> >> > ><br>
> >> > > keld<br>
> >> > ><br>
> >> ><br>
> >> > I think there's really only one thing that needs to be fixed, and that's<br>
> >> > the POSIX and C locales. Right now, they force, by requirement, a 256-value<br>
> >> > single-byte encoding (Chapter 6, Section 2, first sentence:<br>
> >> > <a href="http://pubs.opengroup.org/onlinepubs/9699919799/" rel="noreferrer" target="_blank">http://pubs.opengroup.org/onlinepubs/9699919799/</a>).<br>
> >><br>
> >> The POSIX standard has had provisions for ISO 10646 since 1991, and most POSIX<br>
> >> implementations today support ISO 10646 and ISO 14651, with a lot of collation and<br>
> >> character attribute support, long before Unicode made something up.<br>
> >><br>
> >> ><br>
> >> > This restriction is what has been utterly and absolutely destroying the<br>
> >> > ability to behave properly with a large set of encodings deployed around<br>
> >> > the world, including Unicode, as a default. I am actually spending time and<br>
> >> > cycles now contacting people on the C Standards Committee and reaching out<br>
> >> > to people to find the POSIX individuals responsible for overseeing this<br>
> >> > standard: that the locale is a single-byte encoding is not "character set<br>
> >> > independent": it means that only a small fraction (ASCII, or similar) can<br>
> >> > possibly be the default C or POSIX locale. That Unicode (specifically,<br>
> >> > UTF-8) happens to work in C and C++ is because the defaults for many of the<br>
> >> > implementations simply pass char/wchar_t/char16_t/char32_t through their<br>
> >> > interfaces and do not touch it. But the moment anyone uses facets or<br>
> >> > locales in any meaningful manner, much of it falls over.<br>
> >><br>
> >> This is not true; quite the contrary.<br>
> >> Yes, POSIX has a standard POSIX locale which is 7/8-bit and portable,<br>
> >> but 10646 has been supported in POSIX since 1991, and work is underway<br>
> >> on a POSIX 10646 locale.<br>
> >> ISO 14652 has a candidate for that, which is also the base for many glibc<br>
> >> national locales.<br>
> >><br>
> >><br>
> >> ><br>
> >> > POSIX/C need to acknowledge that multibyte encodings are reasonable<br>
> >> > defaults (not just recommended extensions, but plausible defaults). Until<br>
> >> > then, no: the C standard does not deliver for all and actively harms the<br>
> >> > development and growth of international text processing on large and small<br>
> >> > hardware systems.<br>
> >><br>
> >> I think you are not up to date. How can Linux and OS X and other POSIX<br>
> >> OSes deliver fully internationalized systems with support for more languages than<br>
> >> Microsoft Windows?<br>
> >> Linux supports more than 100 languages, and mostly in UTF-8.<br>
> >><br>
> >> keld<br>
> >> _______________________________________________<br>
> >> SG16 Unicode mailing list<br>
> >> <a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
> >> <a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
> >><br>
> ><br>
</blockquote></div>