<div dir="ltr"><div><br></div><div><br></div><div>If we think that in 5, 10, or 15 years the world (i.e., the platforms we care about) will finally realize UTF-8 is the right answer, maybe we should support just that and leave enough room to make other encodings possible but not required.<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Jan 9, 2019 at 11:21 PM Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_-3809803011841335108moz-cite-prefix">Thank you, Howard! A few inline
comments below...<br>
</div>
<div class="gmail-m_-3809803011841335108moz-cite-prefix"><br>
</div>
<div class="gmail-m_-3809803011841335108moz-cite-prefix">On 1/9/19 2:34 PM, Howard Hinnant
wrote:<br>
</div>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">Below is the Directions Group’s response to P1238R0 (<a class="gmail-m_-3809803011841335108moz-txt-link-freetext" href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1238r0.html" target="_blank">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1238r0.html</a>).
We welcome this effort. For now, we prefer to comment on general principles and direction only, rather than on technical details:</pre>
</blockquote>
<p>Understood. Do you have any feedback regarding the constraints
listed in the paper? Any desire to challenge one or more of them?</p>
<p>The constraint I'd most like feedback on is 1.1 (The ordinary and
wide execution encodings are implementation defined). If
Microsoft were to support use of UTF-8 as the execution encoding
(something they are taking steps towards), it may be conceivable
that we could standardize the execution encoding as UTF-8 and have
that actually reflect existing practice (implementations would
presumably continue to offer support for legacy encodings as an
extension). However, this would leave some platforms behind; z/OS
being the primary example. z/OS continues to maintain a
significant presence in the industry (as I understand it, good
numbers are hard to find), but IBM has not been keeping up with
C++ standards. Some guidance regarding how to think about
platforms that are not keeping up with the standard would be
appreciated.<br>
</p>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">• The list of authors is suitably long. The task of formally bringing Unicode into C++ requires a breadth of experience. Someone must look out for the interests of the various platforms (Linux, Windows, embedded, HPC, etc.) and the various groups of developers (OS, foundation library, end-user, etc.). We recommend trying to keep constant contact with people with current practical experience in all of those fields. Also, Bob Steagall has done some work in this area based on his CppCon talk; and IBM directions should be obtained from IBM representatives in the committee. Could you recruit them for this?</pre>
</blockquote>
I have reached out to such representatives. Hubert follows along
and chimes in from time-to-time. Bob Steagall has joined us at
times and he and I recently coordinated scheduling to better enable
him to join our meetings. I'd love to have more representation from
platform vendors and probably do need to spend some more time
recruiting again.<br>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">• §3. Direction: We feel that the scope and end goal of this work are not crystal clear: what is the goal of this SG? What are its deliverables?
        • Is it trying to unify the many wide character sets into ISO C++ or trying to add more of the varying wide character sets into ISO C++, or even something else?
        • Or maybe your goal is to give feedback on, and make small tweaks to, all these different wide character standards and see how they can best fit in ISO C++?
        • Maybe §3 could be clarified?</pre>
</blockquote>
<p>We're still working to define our deliverables. As noted in the
paper, our short-term focus has been on small features for C++20.
Now that we're wrapping C++20 up, we'll need to get more focused
on the big picture for C++23 and beyond. While in San Diego we
identified a set of priorities for further work. Those can be
found near the bottom of the page at
<a class="gmail-m_-3809803011841335108moz-txt-link-freetext" href="http://wiki.edg.com/bin/view/Wg21sandiego2018/P1238R0" target="_blank">http://wiki.edg.com/bin/view/Wg21sandiego2018/P1238R0</a> (higher
numbers are higher priority). This list probably won't make much
sense in isolation though; we'll need to incorporate it into
future direction papers.</p>
<p>I'm not quite sure what is meant by "wide character sets" in the
questions above. Perhaps this is referring to legacy encodings
like Shift-JIS? Adding additional support for specific legacy
encodings is not something that we plan to work on. However, to
the extent that we interoperate with the implementation defined
execution encoding (which could be Shift-JIS or any other legacy
encoding), then the interfaces we design may have to accommodate
such encodings to some extent. Our focus is primarily providing
feature support for Unicode encodings, the Unicode character set,
and Unicode algorithms.<br>
</p>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">• §2. Guidelines: Beware of adjectives: “Avoid excessive inventiveness” and “avoid gratuitous departure from C”. These are good and necessary guidelines, but those adjectives can be awfully slippery. In particular, there is a danger of lowering the level of interfaces to the C level, causing verbosity and creating error- and security-hazards. Note: for about a decade, checking function arguments in C++ was widely condemned as a gratuitous incompatibility with C.</pre>
</blockquote>
Agreed, thanks for pointing this out.<br>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">• §4.1. We like the idea of std::text and std::text_view with more suitable interfaces than the (bloated) std::string one. We wonder how encodings will be presented to/in the type system.</pre>
</blockquote>
How encodings are presented in the type system is TBD. P0244
(text_view) proposes encoding classes that also serve as encoding
tags. However, we also have proponents for std::text and
std::text_view being UTF-8 only. More work is needed to drive
consensus here.<br>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">• §4.1. Without saying how, we suggest that std::text and std::text_view should be usable as ranges (as in the Ranges TS, now moved to the WP). Wherever possible, technical issues should be resolved in favor of simple, elegant, and correct use, rather than consistency with older STL rules crafted for fixed-sized elements.</pre>
</blockquote>
Strongly agreed.<br>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">We hope you find these brief comments constructive.</pre>
</blockquote>
<p>Yes, thank you!</p>
<p>Tom.<br>
</p>
<blockquote type="cite">
<pre class="gmail-m_-3809803011841335108moz-quote-pre">DG
</pre>
<br>
</blockquote>
<p><br>
</p>
</div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Be seeing you,<br></div>Tony<br></div></div>