<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 10/16/2018 05:58 PM, Markus Scherer
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">On Tue, Oct 9, 2018 at 8:57 PM Tom Honermann
            &lt;<a href="mailto:tom@honermann.net"
              moz-do-not-send="true">tom@honermann.net</a>&gt; wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div class="m_3185301157471047293moz-cite-prefix">The C
                standard defines a (very) few functions in terms of the
                C <tt>char16_t</tt> typedef (<tt>mbrtoc16</tt>, <tt>c16rtomb</tt>). 
                Within C++, those functions are exposed in the <tt>std</tt>
                namespace as though they were declared with the C++
                builtin <tt>char16_t</tt> type.  Has there been much
                consideration for similarly exposing ICU's C APIs to C++
                consumers?</div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>C++ code calls ICU C APIs all the time.</div>
        </div>
      </div>
    </blockquote>
    <br>
    Of course, sorry, I wasn't very clear with that question.  Let me
    try again.  I was responding to this quote:<br>
    <br>
    &gt; Unfortunately, if UChar is configured != char16_t, you need
    casts or cast helpers for using C APIs from C++ code.<br>
    <br>
    The question is, effectively, whether consideration has been given
    to providing cast helpers in a manner similar to how standard C++
    provides access to standard C functions; e.g., by exposing cast
    helpers in a C++ namespace.  More concretely, whether something like
    the following has been considered:<br>
    <blockquote><tt>U_STABLE UChar * U_EXPORT2</tt><tt><br>
      </tt><tt>u_strchr(const UChar *s, UChar c);</tt><tt><br>
      </tt><tt><br>
      </tt><tt>#if defined(__cplusplus)</tt><tt><br>
      </tt><tt>namespace icu {</tt><tt><br>
      </tt><tt>  char16_t * U_EXPORT2</tt><tt><br>
      </tt><tt>  u_strchr(const char16_t *s, char16_t c);</tt><tt><br>
      </tt><tt>}</tt><tt><br>
      </tt><tt>#endif /* __cplusplus */<br>
      </tt></blockquote>
    Note that on at least some platforms there are techniques that avoid
    having to write a separate definition for the namespace-scoped
    signature when the functions have compatible calling conventions.<br>
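    <br>
    A minimal sketch of such a namespace-scoped cast helper follows.  All
    names here (<tt>DemoUChar</tt>, <tt>demo_strchr</tt>, namespace <tt>demo</tt>)
    are hypothetical stand-ins rather than ICU declarations.  The
    <tt>reinterpret_cast</tt> assumes that <tt>char16_t</tt> and <tt>uint16_t</tt>
    share size and representation and that the implementation tolerates the
    resulting aliasing, which a real helper would have to address:<br>
    <pre>
#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Hypothetical stand-in for a C API whose code unit type is not char16_t.
typedef std::uint16_t DemoUChar;

extern "C" const DemoUChar *demo_strchr(const DemoUChar *s, DemoUChar c) {
    for (; *s != 0; ++s) {
        if (*s == c) {
            return s;
        }
    }
    return nullptr;
}

namespace demo {
// Same operation, exposed with a char16_t signature.
// Assumption: char16_t and DemoUChar have identical size and representation.
inline const char16_t *demo_strchr(const char16_t *s, char16_t c) {
    static_assert(sizeof(char16_t) == sizeof(DemoUChar),
                  "code unit types must have the same size");
    return reinterpret_cast&lt;const char16_t *&gt;(
        ::demo_strchr(reinterpret_cast&lt;const DemoUChar *&gt;(s),
                      static_cast&lt;DemoUChar&gt;(c)));
}
}  // namespace demo

// Usage: no cast is needed at the call site.
inline void cast_helper_usage() {
    const char16_t *text = u"hello";
    assert(demo::demo_strchr(text, u'l') == text + 2);
    assert(demo::demo_strchr(text, u'z') == nullptr);
}
</pre>
    The casts live in one place, so callers that work with <tt>char16_t</tt>
    data only ever see the <tt>char16_t</tt> signature.<br>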
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div>People use C APIs because they can be binary stable, and
            they want to be able to link with multiple versions of the
            ICU DLL.</div>
        </div>
      </div>
    </blockquote>
    <br>
    Indeed.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>People who call C++ APIs either tightly control DLL
            versions or link everything statically.</div>
        </div>
      </div>
    </blockquote>
    <br>
    Despite not wanting to...<br>
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>It would be really nice if it was feasible to provide
            stable C++ API from a shared library.</div>
        </div>
      </div>
    </blockquote>
    <br>
    but having to because of this :)<br>
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div class="m_3185301157471047293moz-cite-prefix">(This
                technique is not without complexities.  For example,
                attempting to take the address of an overloaded function
                without a cast may be ambiguous.  I'm just curious how
                much this or similar techniques were explored and what
                the conclusions were)<br>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Not sure what the question is.</div>
          <div>There is of course no overloading on C APIs.</div>
        </div>
      </div>
    </blockquote>
    <br>
    Hopefully I've clarified this above.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div>If u"literals" had just been uint16_t* without
                      a new type, then we could have used string
                      literals without changing API and breaking call
                      sites, on most platforms anyway. And if
                      uint16_t==wchar_t on Windows, then that would have
                      been fine, too.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br>
              How would that have been fine on Windows?  The reinterpret
              casts would still have been required.<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Why? If the two types had been typedefs of each other,
            there would not need to be any casts.</div>
        </div>
      </div>
    </blockquote>
    <br>
    I overlooked your mention of <tt>uint16_t==wchar_t</tt>.  However,
    <tt>uint16_t</tt> was added in C99 and I suspect it would have
    already been too late to define it as <tt>wchar_t</tt> when <tt>u"literals"</tt>
    were adopted.  Additionally, that would have resulted in the same
    problems that we now face with <tt>int8_t</tt> commonly being
    defined in terms of a character type.<br>
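    <br>
    To illustrate the <tt>int8_t</tt> problem, here is a small sketch.  The
    helper names <tt>to_text</tt> and <tt>to_number_text</tt> are
    hypothetical, and the surprising result assumes an implementation where
    <tt>int8_t</tt> is a typedef for <tt>signed char</tt> (common, but not
    guaranteed):<br>
    <pre>
#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;type_traits&gt;

// Hypothetical helper: returns the text that operator&lt;&lt; produces for a value.
template &lt;typename T&gt;
std::string to_text(T value) {
    std::ostringstream out;
    out &lt;&lt; value;
    return out.str();
}

// Hypothetical workaround: promote to int (or wider) first so that narrow
// integer types are always formatted as numbers.
template &lt;typename T&gt;
std::string to_number_text(T value) {
    static_assert(std::is_integral&lt;T&gt;::value, "integer types only");
    std::ostringstream out;
    out &lt;&lt; +value;  // unary plus applies integral promotion
    return out.str();
}

inline void formatting_usage() {
    // uint16_t is not a character type, so it is formatted as a number.
    assert(to_text(std::uint16_t{65}) == "65");
    // Where int8_t is signed char, to_text(std::int8_t{65}) yields the
    // character with code 65 instead; the promoted form avoids that.
    assert(to_number_text(std::int8_t{65}) == "65");
}
</pre>
    Had <tt>uint16_t</tt> been defined as <tt>wchar_t</tt>, generic code
    would face the same kind of ambiguity for 16-bit values.<br>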
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">Lyberta provided one
              example, but there are others.  For example, serialization
              and logging libraries.  Consider a modern JSON library; it
              is convenient to be able to write code like the following
              that just works.<br>
              <br>
              <tt>json_object player;<br>
                uint16_t scores[] = { 16, 27, 13 };<br>
                player["id"] = 42;<br>
                player["name"] = std::u16string(u"Skipper McGoof");<br>
                player["nickname"] = u"Goofy"; // stores a string<br>
                player["scores"] = scores;     // stores an array of
                numbers<br>
              </tt><br>
              Note that the above works because <tt>uint16_t</tt> is
              effectively never defined in terms of a character type.</div>
          </blockquote>
          <div><br>
          </div>
          <div>Sure, but that feels like cherry-picking: You introduce
            one new type for one specific kind of thing (a pointer to
            certain units holding a string), but every other data that's
            a vector of essentially the same base units is still not
            distinguishable -- you wouldn't be able to distinguish
            scores from coordinates from other lists of numbers etc.</div>
        </div>
      </div>
    </blockquote>
    <br>
    That is a fair criticism.  The trend is to improve the ability to
    distinguish such unit kinds.  We see this in the C++20 std::chrono
    library and other libraries like
    <a class="moz-txt-link-freetext" href="https://github.com/nholthaus/units">https://github.com/nholthaus/units</a>.  C++11 user-defined literals
    (despite some usability issues) are intended to help in this
    respect.  Where we have core language features (e.g., string
    literals), I think it is reasonable to be able to differentiate them
    without having to further decorate them.<br>
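    <br>
    As a rough sketch of what that differentiation buys, the overloads
    below tell a UTF-16 string literal apart from an array of 16-bit
    numbers purely by type.  The names <tt>demo_value</tt> and
    <tt>make_value</tt> are hypothetical and are not taken from any real
    JSON library:<br>
    <pre>
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical tagged value holding either text or a list of numbers.
struct demo_value {
    bool is_string;
    std::u16string text;
    std::vector&lt;std::uint16_t&gt; numbers;
};

// Selected for u"..." literals, whose element type is char16_t.
inline demo_value make_value(const char16_t *s) {
    return demo_value{true, std::u16string(s), {}};
}

// Selected for arrays of uint16_t, which is assumed here to be a
// plain integer type distinct from char16_t.
template &lt;std::size_t N&gt;
demo_value make_value(const std::uint16_t (&amp;values)[N]) {
    return demo_value{false, std::u16string(),
                      std::vector&lt;std::uint16_t&gt;(values, values + N)};
}

inline void overload_usage() {
    std::uint16_t scores[] = {16, 27, 13};
    assert(make_value(u"Goofy").is_string);         // stored as a string
    assert(!make_value(scores).is_string);          // stored as numbers
    assert(make_value(scores).numbers.size() == 3);
}
</pre>
    No decoration is needed at the call site; the literal's own type
    selects the string overload.<br>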
    <br>
    Tom.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAN49p6raVXu8roWpKLzBRFfZqwWGNwyP+LyYP5EauhkfqTf8Pw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">Having different types
              for character data makes the above possible without having
              to hard-code for specific string types.  In the concepts
              enabled world that we are moving into, this enables us to
              write concepts like the following that can then be used to
              constrain functions intended to work only on string-like
              types.<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>I take your word for it. I know nothing about "concepts".</div>
          <div><br>
          </div>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div>In ICU, when I get to actual UTF-8 processing,
                      I tend to either cast each byte to uint8_t or cast
                      the whole pointer to uint8_t* and call an internal
                      worker function.</div>
                    <div>Somewhat ironically, the fastest way to test
                      for a UTF-8 trail byte is via the opposite cast,
                      testing if (int8_t)b&lt;-0x40.</div>
                  </div>
                </div>
              </blockquote>
              <br>
              Assuming a two's complement representation, which we're
              nearly set to be able to assume in C++20 (<a
                class="m_3185301157471047293moz-txt-link-freetext"
                href="http://wg21.link/p0907" target="_blank"
                moz-do-not-send="true">http://wg21.link/p0907</a>)!<br>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>Well, this is nice! Especially</div>
        </div>
        <blockquote style="margin:0 0 0 40px;border:none;padding:0px">
          <div class="gmail_quote">
            <div><em
                style="color:rgb(0,0,0);font-family:sans-serif;font-size:medium">Change</em><span
style="color:rgb(0,0,0);font-family:sans-serif;font-size:medium"> Right-shift
                is an arithmetic right shift which performs
                sign-extension.</span></div>
          </div>
        </blockquote>
        <div class="gmail_quote">
          <div>which should get static-analysis tools off our backs.</div>
          <div><br>
          </div>
          <div>Only because those have complained about code where we
            use arithmetic right shifts did I have to make a macro that
            does the normal (signed&gt;&gt;num_bits) on normal
            compilers, and a manual sign extension when compiling for
            static analysis...</div>
          <div>I don't think it's been an issue on any real compiler.
            All machines that anyone ever ported ICU to seem to use
            two's-complement integers of 8/16/32/... bits.</div>
          <div><br>
          </div>
          <div>markus</div>
        </div>
      </div>
    </blockquote>
  </body>
</html>