<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 9/9/19 10:31 AM, Tony V E wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 9, 2019 at 3:31
AM Corentin &lt;<a href="mailto:corentin.jabot@gmail.com"
moz-do-not-send="true">corentin.jabot@gmail.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 9 Sep 2019 at
01:25, Tom Honermann &lt;<a
href="mailto:tom@honermann.net" target="_blank"
moz-do-not-send="true">tom@honermann.net</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="auto"><br>
<div dir="ltr">On Sep 8, 2019, at 3:31 PM, Tony V E
via Lib &lt;<a href="mailto:lib@lists.isocpp.org"
target="_blank" moz-do-not-send="true">lib@lists.isocpp.org</a>&gt;
wrote:<br>
<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>Do we have / could we have / should we
have</div>
<div>a clear long term (20 years) direction
for text in C++?<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
I would like that very much, but we don’t control
the ecosystem, and will have to, to some degree,
roll with where the community takes us. </div>
</blockquote>
<div><br>
</div>
<div>The community is waiting for us to catch up, and I
do believe we have some control.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>yep, every other language just decided for the community.</div>
</div>
</div>
</blockquote>
<p>That is not correct. Counterexamples include C, Fortran, and COBOL. In
general, I think languages that decided for the community had a
few advantages that we do not:</p>
<ol>
<li>Less history and legacy code to support.<br>
</li>
<li>Fewer implementations.</li>
<li>Designed with more abstractions (e.g., VM languages) that
enabled sandboxing the language environment (with associated
performance costs).<br>
</li>
<li>Designed after Unicode was standardized.<br>
</li>
</ol>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>As C++, we have to allow the user to do _anything_, but
they already can. And they will still be able to.</div>
</div>
</div>
</blockquote>
Indeed, but as a standards body, one of our responsibilities is to produce
a specification that reflects existing practice. We can (and
should) lead, but need to remain focused on support for existing
code as well. I worry about repeating the Python 2-to-3 experience
if we aren't careful.<br>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="auto">
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div><br>
</div>
<div>i.e., the long-term direction is Unicode.</div>
<div>and/or specifically the long-term
direction is UTF-8.</div>
</div>
</div>
</blockquote>
<div><br>
</div>
I think we do have widespread agreement on that,
though UTF-16 is likely to remain strongly
relevant in some niches. </div>
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>We expect everyone to use char8_t
then? Or we expect char to become utf8
someday?</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I think it is very unlikely that there will
be a mass migration to char8_t. My expectation
is that it will be used for the internal
encoding within some percentage of new projects
and components. </div>
<div><br>
</div>
<div>With regard to char, I expect it to remain
the type used for text that may or may not be
UTF-8.</div>
<div><br>
</div>
<div>I think Microsoft will eventually provide
(non-experimental) means to use UTF-8 with Win32
and that this will likely come in three forms </div>
</div>
</div>
</blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="auto">
<div>
<div><br>
</div>
<div>1) support for UTF-8 as the system wide
Active Code Page (ACP). This is already
available as an experimental option. </div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>They di</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="auto">
<div>
<div><br>
</div>
<div>2) support for executables to opt-in to a
per-process override of the system wide ACP. In
this mode, stdio would presumably traffic in the
system wide ACP and require transcoding (I don’t
think implicit transcoding is realistic). This
is already available as an experimental option. </div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<div>They do</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>How do "override system wide ACP" and "stdio traffic in
system wide ACP" fit together? Either my process thinks the
world is on the UTF8 ACP, or it doesn't. I would expect
transcoding or whatever else is required. I would expect
fopen to work, etc.<br>
</div>
</div>
</div>
</blockquote>
Basically, the option (a declaration in a manifest file) causes the
Win32 "ANSI" APIs to work in UTF-8 mode for that process only.
Other processes on the system that don't opt in to the option run
with whatever the system ACP is. So, any information exchanged
between them will require transcoding. I would expect implicit
transcoding for command line options and environment variables
(those are already implicitly transcoded from their wide variants),
but stdio is unaffected. So, piped data between processes that both
adhere to (their perception of) the ACP would require intervention.
But stdio can be binary anyway. And executables written in some
other languages expect UTF-8 regardless, so I don't think this is a
significant issue.<br>
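<p>To make the pipe case concrete, here is a minimal sketch, written in
Python only for brevity. It assumes cp1252 as a stand-in for a legacy
system ACP, and the function names are hypothetical; none of this is a
Win32 or C++ API. It shows that bytes written by a process using the
legacy code page are not valid UTF-8 until something transcodes them
explicitly.</p>
<pre>
```python
# Assumption for illustration: cp1252 stands in for a legacy system ACP.
LEGACY_ACP = "cp1252"


def producer_output(text: str) -> bytes:
    """Bytes that a process running under the legacy ACP would write to a pipe."""
    return text.encode(LEGACY_ACP)


def transcode_to_utf8(data: bytes, source_encoding: str = LEGACY_ACP) -> bytes:
    """Explicitly convert piped bytes from the producer's encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def consumer_reads_utf8(data: bytes) -> str:
    """A UTF-8 process interprets its input as UTF-8, with no implicit conversion."""
    return data.decode("utf-8")


if __name__ == "__main__":
    piped = producer_output("caf\u00e9")
    try:
        # Reading the raw bytes as UTF-8 fails for non-ASCII text.
        consumer_reads_utf8(piped)
    except UnicodeDecodeError:
        print("raw bytes are not valid UTF-8; transcoding is required")
    # After the explicit transcoding step the consumer reads the text intact.
    print(consumer_reads_utf8(transcode_to_utf8(piped)))
```
</pre>
<p>Pure ASCII data is byte-identical in both encodings, so only
non-ASCII text needs the explicit step.</p>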
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>If that works, I believe almost every Windows developer
will turn this on, and char will be utf8 (as it is on linux,
IIUC).</div>
<div>Most code will "just work".</div>
</div>
</div>
</blockquote>
<p>Quite possibly.</p>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>In 10 years, it will be the assumption.</div>
</div>
</div>
</blockquote>
Representatives at Microsoft have so far stated that their testing
of the UTF-8 ACP option revealed that it breaks too many widely
deployed applications for them to make it a default at this point.
And their strong commitment to backward compatibility may invite a
longer migration period.<br>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>I think we should steer in the direction that char becomes
UTF8.</div>
</div>
</div>
</blockquote>
I agree, and that is what is already happening.<br>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>In the short term we could say char is whatever the
system is in, but we encourage UTF8. Or something like
that. Maybe the standard "assumes" UTF8, but
implementations are allowed to vary. Whatever "assumes"
means for a given API.</div>
</div>
</div>
</blockquote>
I think that is the status quo. We could add a non-normative note
encouraging UTF-8, but I think it is highly unlikely that any
greenfield project would pick anything else.<br>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>We could define things like fmt to be "if the system is
UTF8, then behaviour is X, otherwise YMMV (ie implementation
defined)".</div>
</div>
</div>
</blockquote>
<p>We could. But that makes the behavior locale-dependent because,
on most platforms, the locale is what determines whether the system
is UTF-8.<br>
</p>
<p>Tom.<br>
</p>
<blockquote type="cite"
cite="mid:CAOHCbiuG3+6nF3q8oxoWM-AqDEH2p80_DjoToz9Xk07+VW-_1A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote"><br clear="all">
</div>
<br>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>Be seeing you,<br>
</div>
Tony<br>
</div>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>