<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 10/16/2018 05:58 PM, Markus Scherer
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Tue, Oct 9, 2018 at 8:57 PM Tom Honermann
<<a href="mailto:tom@honermann.net"
moz-do-not-send="true">tom@honermann.net</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_3185301157471047293moz-cite-prefix">The C
standard defines a (very) few functions in terms of the
C <tt>char16_t</tt> typedef (<tt>mbrtoc16</tt>, <tt>c16rtomb</tt>).
Within C++, those functions are exposed in the <tt>std</tt>
namespace as though they were declared with the C++
builtin <tt>char16_t</tt> type. Has there been much
consideration for similarly exposing ICU's C APIs to C++
consumers?</div>
</div>
</blockquote>
<div><br>
</div>
<div>C++ code calls ICU C APIs all the time.</div>
</div>
</div>
</blockquote>
<br>
Of course, sorry, I wasn't very clear with that question. Let me
try again. I was responding to this quote:<br>
<br>
> Unfortunately, if UChar is configured != char16_t, you need
casts or cast helpers for using C APIs from C++ code.<br>
<br>
The question is, effectively, whether consideration has been given
to providing cast helpers in a manner similar to how standard C++
provides access to standard C functions; e.g., by exposing cast
helpers in a C++ namespace. More concretely, whether something like
the following has been considered:<br>
<blockquote><tt>U_STABLE UChar * U_EXPORT2</tt><tt><br>
</tt><tt>u_strchr(const UChar *s, UChar c);</tt><tt><br>
</tt><tt><br>
</tt><tt>#if defined(__cplusplus)</tt><tt><br>
</tt><tt>namespace icu {</tt><tt><br>
</tt><tt> char16_t * U_EXPORT2</tt><tt><br>
</tt><tt> u_strchr(const char16_t *s, char16_t c);</tt><tt><br>
</tt><tt>}</tt><tt><br>
</tt><tt>#endif /* __cplusplus */<br>
</tt></blockquote>
Note that, on at least some platforms, there are techniques that avoid
having to actually write a definition for the namespace-scoped
signature when the functions have compatible calling conventions.<br>
<br>
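For concreteness, here is a minimal sketch of the portable fallback, where the
namespace-scoped function is an inline forwarding wrapper. It assumes <tt>UChar</tt>
is configured as <tt>uint16_t</tt>; the namespace <tt>icu_sketch</tt>, the stand-in
body of <tt>u_strchr</tt>, and the <tt>example</tt> function are made up for
illustration and are not part of ICU.<br>
<br>
```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: UChar is configured as uint16_t, not char16_t.
typedef std::uint16_t UChar;

// Stand-in for the C API so that the example is self-contained;
// the real u_strchr is provided by the ICU library.
extern "C" UChar *u_strchr(const UChar *s, UChar c) {
    for (;; ++s) {
        if (*s == c) return const_cast<UChar *>(s);
        if (*s == 0) return nullptr;
    }
}

namespace icu_sketch {  // hypothetical namespace, not part of ICU
    // Same name, char16_t signature; the casts are written once, here.
    // Note: reading char16_t storage through a uint16_t pointer is formally
    // an aliasing violation, so a production version would need whatever
    // barrier the platform requires.
    inline char16_t *u_strchr(const char16_t *s, char16_t c) {
        return reinterpret_cast<char16_t *>(
            ::u_strchr(reinterpret_cast<const UChar *>(s),
                       static_cast<UChar>(c)));
    }
}

// Call site: u"literals" can be passed without any casts.
inline bool example() {
    const char16_t *text = u"Goofy";
    const char16_t *hit = icu_sketch::u_strchr(text, u'f');
    return hit == text + 3;  // 'f' is the fourth code unit
}
```
<br>
The point is only that the casts live in one place instead of at every call
site; whether the wrapper is written out like this or aliased to the C symbol
is a platform detail.<br>
<br>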
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>People use C APIs because they can be binary stable, and
they want to be able to link with multiple versions of the
ICU DLL.</div>
</div>
</div>
</blockquote>
<br>
Indeed.<br>
<br>
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>People who call C++ APIs either tightly control DLL
versions or link everything statically.</div>
</div>
</div>
</blockquote>
<br>
Despite not wanting to...<br>
<br>
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>It would be really nice if it was feasible to provide
stable C++ API from a shared library.</div>
</div>
</div>
</blockquote>
<br>
but having to because of this :)<br>
<br>
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_3185301157471047293moz-cite-prefix">(This
technique is not without complexities. For example,
attempting to take the address of an overloaded function
without a cast may be ambiguous. I'm just curious how
much this or similar techniques were explored and what
the conclusions were)<br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>Not sure what the question is.</div>
<div>There is of course no overloading on C APIs.</div>
</div>
</div>
</blockquote>
<br>
Hopefully I've clarified this above.<br>
<br>
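As for the address-of complexity mentioned in that parenthetical, a small
sketch may help. It again assumes <tt>UChar</tt> is <tt>uint16_t</tt>; the
namespace <tt>icu_sketch</tt> and the stand-in body of <tt>u_strlen</tt> are
hypothetical, and the using-directive is there only to make both names visible
at once.<br>
<br>
```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: UChar is configured as uint16_t.
typedef std::uint16_t UChar;

// Stand-in for the C API; the real u_strlen is provided by ICU.
inline std::int32_t u_strlen(const UChar *s) {
    std::int32_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

namespace icu_sketch {  // hypothetical namespace, not part of ICU
    inline std::int32_t u_strlen(const char16_t *s) {
        return ::u_strlen(reinterpret_cast<const UChar *>(s));
    }
}

// Once both declarations are visible, the unqualified name denotes an
// overload set rather than a single function.
using namespace icu_sketch;

// auto fn = &u_strlen;  // ill-formed: nothing selects an overload

// A target type, or an explicit cast, selects one of them.
std::int32_t (*const fn16)(const char16_t *) = &u_strlen;
const auto fnU = static_cast<std::int32_t (*)(const UChar *)>(&u_strlen);

inline bool example() {
    const UChar raw[] = { 0x47, 0x6F, 0 };  // "Go" as uint16_t code units
    return fn16(u"Goofy") == 5 && fnU(raw) == 2;
}
```
<br>
Code that previously wrote <tt>&amp;u_strlen</tt> with no target type would stop
compiling in this situation, which is the kind of breakage I had in mind.<br>
<br>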
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>If u"literals" had just been uint16_t* without
a new type, then we could have used string
literals without changing API and breaking call
sites, on most platforms anyway. And if
uint16_t==wchar_t on Windows, then that would have
been fine, too.<br>
</div>
</div>
</div>
</blockquote>
<br>
How would that have been fine on Windows? The reinterpret
casts would still have been required.<br>
</div>
</blockquote>
<div><br>
</div>
<div>Why? If the two types had been typedefs of each other,
there would need not be any casts.</div>
</div>
</div>
</blockquote>
<br>
I overlooked your mention of <tt>uint16_t==wchar_t</tt>. However,
<tt>uint16_t</tt> was added in C99 and I suspect it would have
already been too late to define it as <tt>wchar_t</tt> when <tt>u"literals"</tt>
were adopted. Additionally, that would have resulted in the same
problems that we now face with <tt>int8_t</tt> commonly being
defined in terms of a character type.<br>
<br>
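To illustrate the <tt>int8_t</tt> problem, here is a minimal sketch. It assumes
an implementation where <tt>int8_t</tt> is a typedef for <tt>signed char</tt>,
which is common but not guaranteed; <tt>stream_out</tt> is a hypothetical helper
standing in for any generic formatting or serialization code.<br>
<br>
```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical generic helper: format any value via operator<<.
template <typename T>
std::string stream_out(T value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

inline bool example() {
    // int16_t is effectively never a character type, so this is a number.
    bool ok = stream_out(std::int16_t{65}) == "65";

    // Where int8_t is signed char, the same call selects the character
    // overload of operator<< and yields "A" rather than "65".
    if (std::is_same<std::int8_t, signed char>::value) {
        ok = ok && stream_out(std::int8_t{65}) == "A";
    }
    return ok;
}
```
<br>
Had <tt>uint16_t</tt> been defined as <tt>wchar_t</tt>, arrays of 16-bit numbers
would have been subject to the same kind of surprise.<br>
<br>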
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">Lyberta provided one
example, but there are others. For example, serialization
and logging libraries. Consider a modern JSON library; it
is convenient to be able to write code like the following
that just works.<br>
<br>
<tt><tt>json_object player;</tt></tt><br>
<tt><tt><tt>uint16_t scores[] = { 16, 27, 13 };<br>
</tt>player["id"] = 42;<br>
</tt>player["name"] = std::u16string("Skipper McGoof");<br>
player["nickname"] = u"Goofy"; // stores a string<br>
player["scores"] = scores; // stores an array of
numbers.<br>
</tt><br>
Note that the above works because <tt>uint16_t</tt> is
effectively never defined in terms of a character type.</div>
</blockquote>
<div><br>
</div>
<div>Sure, but that feels like cherry-picking: You introduce
one new type for one specific kind of thing (a pointer to
certain units holding a string), but every other data that's
a vector of essentially the same base units is still not
distinguishable -- you wouldn't be able to distinguish
scores from coordinates from other lists of numbers etc.</div>
</div>
</div>
</blockquote>
<br>
That is a fair criticism. The trend is to improve the ability to
distinguish such unit kinds. We see this in the C++20 std::chrono
library and other libraries like
<a class="moz-txt-link-freetext" href="https://github.com/nholthaus/units">https://github.com/nholthaus/units</a>. C++11 user-defined literals
(despite some usability issues) are intended to help in this
respect. Where we have core language features (e.g., string
literals), I think it is reasonable to be able to differentiate them
without having to further decorate them.<br>
<br>
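As a rough sketch of that trend, the following uses distinct types and C++11
user-defined literals to tell scores from coordinates, alongside the unit
types that <tt>std::chrono</tt> already provides. The types <tt>Score</tt> and
<tt>Coordinate</tt>, the literal suffixes, and the functions <tt>describe</tt>
and <tt>to_millis</tt> are all made up for illustration.<br>
<br>
```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical strong types wrapping the same underlying representation.
struct Score      { int value; };
struct Coordinate { int value; };

// C++11 user-defined literals that produce the distinct types.
constexpr Score operator""_pts(unsigned long long v) {
    return Score{static_cast<int>(v)};
}
constexpr Coordinate operator""_px(unsigned long long v) {
    return Coordinate{static_cast<int>(v)};
}

// Overloads can now tell the kinds of number apart.
inline std::string describe(Score)      { return "score"; }
inline std::string describe(Coordinate) { return "coordinate"; }
inline std::string describe(int)        { return "plain number"; }

// std::chrono encodes the unit in the type; seconds convert to
// milliseconds implicitly because the conversion is lossless.
inline long long to_millis(std::chrono::milliseconds ms) {
    return ms.count();
}

inline bool example() {
    return describe(16_pts) == "score"
        && describe(27_px) == "coordinate"
        && describe(13) == "plain number"
        && to_millis(std::chrono::seconds(2)) == 2000;
}
```
<br>
The decoration (<tt>_pts</tt>, <tt>_px</tt>) is what I would rather not need
for string literals, since the core language can already give them their own
type.<br>
<br>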
Tom.<br>
<br>
<blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">Having different types
for character data makes the above possible without having
to hard-code for specific string types. In the concepts
enabled world that we are moving into, this enables us to
write concepts like the following that can then be used to
constrain functions intended to work only on string-like
types.<br>
</div>
</blockquote>
<div><br>
</div>
<div>I take your word for it. I know nothing about "concepts".</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>In ICU, when I get to actual UTF-8 processing,
I tend to either cast each byte to uint8_t or cast
the whole pointer to uint8_t* and call an internal
worker function.</div>
<div>Somewhat ironically, the fastest way to test
for a UTF-8 trail byte is via the opposite cast,
testing if (int8_t)b<-0x40.</div>
</div>
</div>
</blockquote>
<br>
Assuming a 2s complement representation, which we're
nearly set to be able to assume in C++20 (<a
class="m_3185301157471047293moz-txt-link-freetext"
href="http://wg21.link/p0907" target="_blank"
moz-do-not-send="true">http://wg21.link/p0907</a>)!<br>
</div>
</blockquote>
<div><br>
</div>
<div>Well, this is nice! Especially</div>
</div>
<blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div class="gmail_quote">
<div><em
style="color:rgb(0,0,0);font-family:sans-serif;font-size:medium">Change</em><span
style="color:rgb(0,0,0);font-family:sans-serif;font-size:medium"> Right-shift
is an arithmetic right shift which performs
sign-extension.</span></div>
</div>
</blockquote>
<div class="gmail_quote">
<div>which should get static-analysis tools off our backs.</div>
<div><br>
</div>
<div>Only because those have complained about code where we
use arithmetic right shifts did I have to make a macro that
does the normal (signed>>num_bits) on normal
compilers, and a manual sign extension when compiling for
static analysis...</div>
<div>I don't think it's been an issue on any real compiler.
All machines that anyone ever ported ICU to seem to use
two's-complement integers of 8/16/32/... bits.</div>
<div><br>
</div>
<div>markus</div>
</div>
</div>
</blockquote>
</body>
</html>