<div dir="ltr">
<div dir="ltr"><div dir="ltr"><div dir="ltr">
<div dir="ltr"><div dir="ltr">
<div class="gmail_attr">I'm sure many people agree that UTF16 was a mistake. I'm not sure how many people agree that it deserves deprecation, or removal.<br></div><div dir="ltr" class="gmail_attr"><br>On Fri, Apr 12, 2019 at 4:46 PM Steve Downey <<a href="mailto:sdowney@gmail.com" target="_blank">sdowney@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I'm not placing ECS and WECS? </div></blockquote>
</div><div dir="ltr"><br></div><div>Presumably, Execution Character Set and Wide Execution Character Set.<br></div><div dir="ltr"><br></div><div dir="ltr">On Fri, Apr 12, 2019 at 3:00 PM Lyberta <<a href="mailto:lyberta@lyberta.net" target="_blank">lyberta@lyberta.net</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Previously it was suggested to focus on Unicode so I no longer propose<br>
std::text namespace but I think we should put Unicode into std::unicode.<br></blockquote><div><br></div><div>I ultimately don't have a horse in the race: I'll stick the code wherever the final bikeshed is built.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Is there a proposal for those?<br></blockquote><div><br></div><div>I am
working on a proposal; I believe someone else might be working on a
proposal for it as well. There is also an in-progress implementation.<br>WIP Proposal: <a href="https://thephd.github.io/vendor/future_cxx/papers/d1629.html">https://thephd.github.io/vendor/future_cxx/papers/d1629.html</a><br>WIP
Implementation (will be moved to separate repository in a few months):
<a href="https://github.com/ThePhD/phd/tree/master/include/phd/text" target="_blank">https://github.com/ThePhD/phd/tree/master/include/phd/text</a><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Maybe a bit offtopic but I don't think std::narrow_execution and<br>
std::wide_execution are good names. I think appending _character_set<br>
would make them less ambiguous.<br></blockquote><div><br></div><div>I
took a vote on narrow/wide vs. narrow_execution/wide_execution, but not
narrow_character_set/wide_character_set or narrow_execution_character_set/wide_execution_character_set. I'm down for making these names
as ugly and unpalatable and unspell-able as possible, because nobody
should be using them ever without compelling reason (e.g., interop with old code).<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> Regarding earlier points on what the standard does provide: the standard<br>
> needs to provide encodings for all the encoding types that are (currently)<br>
> pushed out by the standard, and nothing more. This includes: std::utf8,<br>
> std::utf16, std::utf32, std::wide_execution, and std::narrow_execution.<br>
<br>
I agree but I want to stress that this would be a good idea to provide<br>
only minimal support for ECS and WECS (i.e. transcoding only) and just<br>
let users migrate to Unicode.<br>
<br></blockquote><div><br></div><div>I agree. The entire Unicode library
will only work with unicode_code_point/scalar_value (char32_t or a
strong typedef, whatever people decide). However, to compensate
for the fact that the text stored in many places will not be
directly usable by this library, we need robust transcoding (encode/decode)
support. By default, encodings will:<br><br>1. pipe things from code_unit_t ->
unicode_code_point_t;<br>2. (do all your work here);<br>3. then, pipe things from
unicode_code_point_t
-> code_unit_t<br><br>If
you specify the inner bit to not be Unicode, the library should (and
will) loudly and noisily fail you for not providing Unicode it can use.
But maybe someone just wants ebcdic -> wide_ebcdic with some strange
non-Unicode intermediary encoding. That's fine too; it just won't work
with all of the Standard because it is Sufficiently Weird. Your
encode/decode will work, as will your transcoding within that boundary, but
transcoding outside of it will not, unless there is some way to go from what you have
to Unicode.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> The standard should not vend any other encodings...<br>
<br>
Again, it's been suggested to provide full-fledged API to Unicode only.<br></blockquote><div><br></div><div>I
agree; the core of the library will be built on Unicode and Unicode
Algorithms that work on Unicode Code Points/Unicode Scalar Values. However, there are far too
many text encodings in the wild serving up production data --
including obscene amounts of Financial and Government data -- that is <i><b>not</b></i>
in a Unicode Format of any kind. Telling these industries that they
will not be a part of the new world does not sound like a useful business
proposition; therefore, they will pay the cost of (lazy or eager)
transcoding as described above, and then use the Unicode Algorithms once
they transcode. (They can then optionally translate back down to
whatever they want; e.g., when they're sending it out of their program.)<br><br></div><div>Note that only the people who do not
keep Unicode around will need to pay the cost of transcoding. If your
data is already Unicode-friendly, then the standard and the interfaces
we provide will support you fully. This means that any hard-coded
algorithms that are not templated on encoding / decoding must take a
range of Unicode code points to work on (or straight up take char8_t,
char16_t, and char32_t, all of which are assumed by compile-time
conventions to be valid Unicode).<br></div><div> </div><div>ECS and WECS must be transcoded. (Or cast/handled in some similar manner.)<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div></div>
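<div><br></div><div>To make the three-step pipeline from earlier in this message concrete (code unit to code point, do the work, code point back to code unit), here is a minimal sketch. It assumes ISO-8859-1 as a stand-in for a legacy non-Unicode encoding and UTF-8 as the target. Every name in it (decode_latin1, encode_utf8, transcode_latin1_to_utf8) is hypothetical and chosen only for illustration; none of it is taken from the WIP proposal or the implementation linked above.<br></div>
<pre>
```cpp
// Illustrative only: all function names here are hypothetical, not proposal API.

// Step 1: decode one ISO-8859-1 code unit into a Unicode code point.
// Latin-1 byte values 0x00-0xFF map directly onto U+0000-U+00FF.
char32_t decode_latin1(unsigned char unit) {
    return (char32_t)unit;
}

// Step 3: encode one code point as UTF-8 into `out` (room for 4 units needed).
// Returns the number of code units written, or 0 if `cp` is not a
// Unicode scalar value (out of range, or a surrogate).
int encode_utf8(char32_t cp, unsigned char* out) {
    if (cp > 0x10FFFF) return 0;                 // beyond the Unicode range
    if (cp >= 0xD800 && 0xDFFF >= cp) return 0;  // surrogate code point
    if (0x7F >= cp) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (0x7FF >= cp) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (0xFFFF >= cp) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

// Steps 1-3 together: Latin-1 code units in, UTF-8 code units out.
// `out` must have room for 2 * count units, since every Latin-1
// code point needs at most two UTF-8 code units.
// Returns the number of UTF-8 code units written.
int transcode_latin1_to_utf8(const unsigned char* in, int count,
                             unsigned char* out) {
    int written = 0;
    for (int i = 0; i != count; ++i) {
        char32_t cp = decode_latin1(in[i]);        // code unit -> code point
        // Step 2: any Unicode algorithm would operate on `cp` here.
        written += encode_utf8(cp, out + written); // code point -> code unit
    }
    return written;
}
```
</pre>
<div>Given the two Latin-1 bytes 0x41 0xE9, transcode_latin1_to_utf8 writes the three UTF-8 bytes 0x41 0xC3 0xA9. The point of the sketch is only that the intermediate value is always a Unicode code point, which is what lets the Unicode algorithms sit in the middle regardless of the encodings at either end.<br></div>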
</div></div></div>
</div>