<br><br><div class="gmail_quote"><div dir="ltr">On Thu, Mar 28, 2019, 9:33 AM Lyberta <<a href="mailto:lyberta@lyberta.net">lyberta@lyberta.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> Yes - The longer the namespace, the more likely people are to write "using<br>
> namespace std::unicode;"<br>
> which defeats the purpose - we have bad precedent with std::filesystem.<br>
> Uni is sweet and short, I guess something like uncd would work too,<br>
> it's not as much about the name as it is about the number of letters<br>
<br>
Uni is too ambiguous, uncd is better but very ugly. I have no problem<br>
with std::filesystem.<br>
<br>
> <br>
>><br>
>> Unicode always uses the term "code point", not "codepoint":<br>
>> <a href="https://www.unicode.org/glossary/#code_point" rel="noreferrer" target="_blank">https://www.unicode.org/glossary/#code_point</a><br>
>><br>
>> So the name should be std::uni[code]::code_point.<br>
> <br>
> <br>
> That's bike-shedding, and while it might be true, is there any gain in<br>
> information?<br>
<br>
"Codepoint" feels very wrong, almost as wrong as strlen and the rest of<br>
the C library.<br>
<br>
>> In my experience, I never need the code point because surrogates are not<br>
>> allowed in valid UTF. I only ever need unicode scalar values:<br>
>> <a href="https://www.unicode.org/glossary/#unicode_scalar_value" rel="noreferrer" target="_blank">https://www.unicode.org/glossary/#unicode_scalar_value</a><br>
> <br>
> <br>
> <br>
> This API (and TR44) is defined in terms of code points;<br>
> it's actually well behaved for all integers from 0 to 0xFFFFFFFF<br>
<br>
I guess, but do we really want our users to shove random integers in it?<br></blockquote></div><div><br></div><div>Yes. I really want a wide contract there.</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> The whole reason I am using that codepoint type (which is more a<br>
> __codepoint_hack type) here is to delete<br>
> use with char and wchar_t, which is nonsensical.<br>
> In other words, a code point type is not part of this proposal.<br>
<br>
That's why my design intended those functions to be member functions of<br>
a code point (or scalar value) type. Since the constructor is explicit, you<br>
can't shove char or wchar_t in there<br></blockquote></div><div><br></div><div>That gives the impression these types may have state or caching, which they really shouldn't have. But otherwise yes, if your objects have a wide contract all the way through - which they won't - having these methods in a type is possible. I don't think we gain in usability though, especially since it makes these queries harder to use in ranges.</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> <br>
> The feedback I got is to just not care and just use uint32_t instead and<br>
> let people<br>
> shoot themselves in the foot.<br>
<br>
What about systems where CHAR_BIT != 8, 16 or 32? std::uint32_t is<br>
optional, do we want Unicode on such systems? I'm myself on the edge<br>
between char32_t and std::uint_least32_t.<br></blockquote></div><div><br></div><div>Good point</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>> I'm writing a competing proposal where I want to propose<br>
>> std::unicode_code_point and std::unicode_scalar_value that have explicit<br>
>> constructors from char32_t and explicit member function .value() to get<br>
>> char32_t back. I think this is the only way forward. char8_t, char16_t<br>
>> and char32_t are dumb types that have horrible names, we should only<br>
>> use them as a transition mechanism.<br>
>><br>
> <br>
> In my experience, you will find that it is a very difficult and verbose api<br>
> to use,<br>
> especially that explicit value method.<br>
> I do think char32_t is fine, as it was always supposed to be a code-point<br>
> (or even a code unit which also happens to be a codepoint; it's really the<br>
> most basic building block), which it is.<br>
> I do not think scalar values are that important, as it is difficult to form<br>
> something that is not a scalar value as soon as we have the right<br>
> "unicode sandwich" model,<br>
> where encodings or inputs that may produce non-scalar-value code points have<br>
> to be decoded at the I/O boundary;<br>
> then your scalar value just becomes a contract that you can sprinkle<br>
> everywhere.<br>
<br>
Yes, contract or invariant means strong type, not dumb char32_t<br></blockquote></div><div><br></div><div>TR 44 is purposefully dumb by design too. </div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div>
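<div><br></div><div>To make the trade-off discussed above concrete, here is a minimal sketch of the "__codepoint_hack" idea: a parameter-only wrapper that accepts char32_t, deletes the char and wchar_t overloads, and leaves the query itself with a wide contract. All names here (sketch, codepoint_hack, is_scalar_value) are hypothetical and not taken from the proposal, and the query is only a stand-in for a real property lookup.</div>

```cpp
// Hypothetical names throughout; a sketch, not the proposal's actual API.
namespace sketch {

// Parameter-only wrapper: converts implicitly from char32_t,
// while the deleted overloads reject char and wchar_t at compile time.
struct codepoint_hack {
    char32_t value;
    constexpr codepoint_hack(char32_t c) noexcept : value(c) {}
    codepoint_hack(char) = delete;
    codepoint_hack(wchar_t) = delete;
};

// Wide contract: every 32-bit input has a defined result.
// Values above 0x10FFFF and surrogates (0xD800..0xDFFF) yield false.
constexpr bool is_scalar_value(codepoint_hack cp) noexcept {
    return cp.value <= 0x10FFFF
        && !(cp.value >= 0xD800 && cp.value <= 0xDFFF);
}

} // namespace sketch
```

<div>With this shape, sketch::is_scalar_value('a') and sketch::is_scalar_value(L'a') fail to compile because they select a deleted constructor, while any char32_t value, including 0xFFFFFFFF, gives a defined answer without the caller having to go through an explicit constructor or a .value() call.</div>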