<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 28 Mar 2019 at 08:49 Lyberta <<a href="mailto:lyberta@lyberta.net">lyberta@lyberta.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Corentin:<br>
> As requested by Tom, please find attached D1628R0, which will be discussed<br>
> during today's meeting ❕<br>
> <br>
> Feedback welcome :)<br>
<br>
Do we really want std::uni? std::unicode seems much better.<br></blockquote><div><br></div><div>Yes. The longer the namespace, the more likely people are to write "using namespace std::unicode;",</div><div>which defeats the purpose; we have a bad precedent with std::filesystem. </div><div>"uni" is sweet and short; I guess something like "uncd" would work too. </div><div>It's not as much about the name as it is about the number of letters.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Unicode always uses the term "code point", not "codepoint":<br>
<a href="https://www.unicode.org/glossary/#code_point" rel="noreferrer" target="_blank">https://www.unicode.org/glossary/#code_point</a><br>
<br>
So the name should be std::uni[code]::code_point.</blockquote><div><br></div><div>This is bike-shedding. While that might be true, is there any gain in information?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
In my experience, I never need the code point because surrogates are not<br>
allowed in valid UTF. I only ever need unicode scalar values:<br>
<a href="https://www.unicode.org/glossary/#unicode_scalar_value" rel="noreferrer" target="_blank">https://www.unicode.org/glossary/#unicode_scalar_value</a></blockquote><div><br></div><div><br></div><div>This API (and TR44) is defined in terms of code points;</div><div>it is actually well behaved for all integers from 0 to 0xFFFFFFFF.</div><div><br></div><div><br></div><div>The whole reason I am using that codepoint type (which is more of a __codepoint_hack type) here is to delete</div><div>its use with char and wchar_t, which is nonsensical.</div><div>In other words, a code point type is not part of this proposal.</div><div><br></div><div>The feedback I got is to just not care, use uint32_t instead, and let people</div><div>shoot themselves in the foot.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
Hence I think using code point interfaces should be discouraged.<br>
<br>
I think constructing code points or scalar values from char8_t or<br>
char16_t makes no sense. They are at different levels.<br>
<br>
I'm writing a competing proposal where I want to propose<br>
std::unicode_code_point and std::unicode_scalar_value that have explicit<br>
constructors from char32_t and explicit member function .value() to get<br>
char32_t back. I think this is the only way forward. char8_t, char16_t<br>
and char32_t are dumb types that have horrible names, we should only<br>
use them as a transition mechanism.<br></blockquote><div><br></div><div>In my experience, you will find that it is a very difficult and verbose API to use,</div><div>especially that explicit value method.</div><div>I do think char32_t is fine, as it was always supposed to be a code point (or even a code unit that also happens to be a code point; it's really the most basic building block), which it is.</div><div>I do not think scalar values are that important, as it is difficult to form something that is not a scalar value once we have the right "unicode sandwich" model,</div><div>where any encoding or input that may produce non-scalar-value code points has to be decoded at the I/O boundary.</div><div>Then your scalar value just becomes a contract that you can sprinkle everywhere.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I'm gonna try to finish the early draft of my proposal and after release<br>
of GCC 9 I'm gonna port my entire code base to its design so I will have<br>
usage experience with it.<br></blockquote><div><br></div><div>Great ! </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div></div>
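<div dir="ltr">For readers following the thread, the "__codepoint_hack" idea mentioned above (a parameter type whose only job is to reject char and wchar_t arguments) could look roughly like the sketch below. This is an illustration only, not the wording of D1628R0: the names sketch::codepoint_hack and sketch::is_ascii_digit are hypothetical, and the property function is a trivial stand-in for a real Unicode property query.</div>

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for the "__codepoint_hack" parameter type.
// It converts implicitly from char32_t and refuses char and wchar_t.
struct codepoint_hack {
    constexpr codepoint_hack(char32_t c) noexcept : value(c) {}
    codepoint_hack(char) = delete;     // a char is a code unit in some narrow encoding
    codepoint_hack(wchar_t) = delete;  // likewise for the wide encoding
    char32_t value;
};

// Hypothetical property query, used only to show the call sites.
constexpr bool is_ascii_digit(codepoint_hack cp) noexcept {
    return cp.value >= U'0' && cp.value <= U'9';
}

}  // namespace sketch

// A char32_t argument is accepted through the implicit constructor.
static_assert(sketch::is_ascii_digit(U'7'), "char32_t is accepted");

// char and wchar_t select the deleted constructors, so they are rejected.
static_assert(!std::is_constructible<sketch::codepoint_hack, char>::value,
              "char is rejected");
static_assert(!std::is_constructible<sketch::codepoint_hack, wchar_t>::value,
              "wchar_t is rejected");
```

<div dir="ltr">With a plain uint32_t parameter instead, a call such as is_ascii_digit('7') would compile silently through an integral conversion, which is the "shoot themselves in the foot" case discussed in the reply.</div>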