<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 3/8/19 10:31 AM, Mathias Stearn
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAH4rMhgz1YCoqDm8-RuxrtYn1-x_AmJRqz=d0M8p9a=z_ygr7g@mail.gmail.com">
<div dir="auto">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Mar 7, 2019 at
7:19 PM Tom Honermann &lt;<a
href="mailto:tom@honermann.net" target="_blank"
rel="noreferrer" moz-do-not-send="true">tom@honermann.net</a>&gt;
wrote:</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>I think the committee currently has a UTF-8 bias
that doesn't necessarily reflect the global C++
community. We don't have much representation from
Japan or China where, as I understand it, Shift-JIS
and GB18030 still have significant usage. We also
have few, if any, z/OS users in the committee
outside of IBM representatives.</p>
</div>
</blockquote>
<div>Not to be dismissive, but z/OS developers are a tiny
subset of C++ developers.</div>
</div>
</div>
</div>
</div>
</blockquote>
This is true, but they also serve an important market and already
face challenges from working in a niche space. If we can
reasonably make things easier for them, I think we should.<br>
<blockquote type="cite"
cite="mid:CAH4rMhgz1YCoqDm8-RuxrtYn1-x_AmJRqz=d0M8p9a=z_ygr7g@mail.gmail.com">
<div dir="auto">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div>Even when targeting z series hardware (as we have for a
few years now), there is the option of using Linux, which
seems to be a fully supported platform.</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>Linux on z is great, but not helpful for those that have actual
z/OS requirements.</p>
<blockquote type="cite"
cite="mid:CAH4rMhgz1YCoqDm8-RuxrtYn1-x_AmJRqz=d0M8p9a=z_ygr7g@mail.gmail.com">
<div dir="auto">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div>If supporting z/OS makes the experience worse or more
complicated for other users, then I think the best
option for the broader ecosystem is to leave it out of
scope for the TR. That platform can offer an equivalent
mechanism that better fits its eccentricities. I want to
point out that EBCDIC seems to be the only remaining
encoding that isn't an ASCII superset (Shift-JIS
replaces two ASCII characters, but they don't matter
for our purposes), so to support it we would be taking
on substantial additional complexity that is only needed
for that one niche platform.<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
I don't think anything we've discussed so far adds substantial
complexity. In fact, what we've discussed is
also relevant to ASCII platforms.<br>
<blockquote type="cite"
cite="mid:CAH4rMhgz1YCoqDm8-RuxrtYn1-x_AmJRqz=d0M8p9a=z_ygr7g@mail.gmail.com">
<div dir="auto">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>UTF-8 dominates the web; no one questions that.
But within the C++ ecosystem, I don't think UTF-8
dominates to a similar degree, at least not outside
of the US and Europe. I wish I had data to back
that up.</p>
</div>
</blockquote>
<div>From <a href="http://www.tomazos.com/actcd16.pdf"
target="_blank" rel="noreferrer"
moz-do-not-send="true">http://www.tomazos.com/actcd16.pdf</a>:
"We executed standard C++ translation phase 1 through 3
on the source files assuming a UTF8 encoding. We found
that 99.0% of the source files tokenized successfully.
Of the remaining 1.0% the majority of the errors were
decoding problems (most likely from ISO8859 /
Latin1 encoding)"</div>
<div><br>
</div>
<div>This was a scan of all C and C++ packages in Ubuntu.
While that obviously only represents the open source,
Unix-targeting subset of the C++ community, this seems
to imply that for that sub-community UTF-8 (and the
ASCII subset) dominates the source content. On top of
that, I would expect file names to have even fewer
non-ASCII characters than file contents, since it is
common to limit non-ASCII characters to comments and
strings.<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>For that subset, I agree; those results match my expectations.
It is worth noting, though, that the survey doesn't answer the
question of what might break if characters outside the ASCII range
were introduced into that 99% of source files, since those files
aren't necessarily consumed as UTF-8.<br>
</p>
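<p>To illustrate that last point, here is a minimal sketch in Python
(not the survey's method; the helper names <code>decodes_as</code> and
<code>classify</code> and the sample bytes are hypothetical). It
shows that an ASCII-only file passes a UTF-8 decoding check while
decoding to the same text under Latin-1 and Shift-JIS, so passing the
check says little about how the file is actually consumed.</p>

```python
def decodes_as(data: bytes, encoding: str):
    """Return True if data decodes without error under the given codec."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def classify(data: bytes):
    """Bucket a file's bytes as a UTF-8-assuming scan would see them.

    Hypothetical helper for illustration only.
    """
    if data.isascii():
        # Valid UTF-8, but equally valid in most ASCII-superset encodings.
        return "ascii-only"
    if decodes_as(data, "utf-8"):
        return "utf-8"
    return "not-utf-8"


# Sample inputs (assumptions, not taken from the survey).
ascii_source = b"int main() { return 0; }\n"
latin1_source = b"// caf\xe9\n"                    # e-acute as one Latin-1 byte
utf8_source = "// caf\u00e9\n".encode("utf-8")     # e-acute as two UTF-8 bytes

for name, data in [("ascii", ascii_source),
                   ("latin1", latin1_source),
                   ("utf8", utf8_source)]:
    print(name, classify(data))

# The ASCII-only sample decodes to identical text under all three codecs.
decoded = {enc: ascii_source.decode(enc)
           for enc in ("utf-8", "latin-1", "shift_jis")}
print(len(set(decoded.values())) == 1)
```

<p>Only the "utf-8" bucket is evidence of a file that actually depends
on UTF-8; the "ascii-only" bucket is compatible with any encoding that
agrees with ASCII on those bytes.</p>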
<p>Tom.<br>
</p>
</body>
</html>