<div dir="ltr"><div>First of all:</div><div><br></div><div>It seems Option 2b is a superset of Option 2a. It is just more work for everyone, with no work saved: Windows tools still need to support single-byte (narrow) filenames, but can also use dual-byte (wide) ones.</div><div>Are we encouraging Windows tools to use *only* dual-byte filenames and not support single-byte ones (i.e. not have full support)? What's the benefit of 2b?</div><div>Can we narrow our choices by agreeing that 2b isn't worthwhile?<br></div><div><br></div><div>Now, overall, if I understand the discussion correctly:</div><div><br></div><div><div>- If you encode the raw bytes (narrow or wide), you should also record the encoding (e.g. "EBCDIC").</div><div>This implies that every tool needs to support (and translate) every encoding, or that we accept non-interoperable, platform-specific tools.</div><div>Also, is the set of encodings finite, or can I add the "TONY" encoding?<br></div><div><br></div><div>- Even if you encode the raw bytes, there may still be cases that aren't covered, and we might need to fall back to UTF-8. It sounds like *no* answer is guaranteed to work.</div><div><br></div><div>So let's go with UTF-8, and tell tools not to produce files that can't be found via UTF-8. How many of the tools we currently use already have that limitation?</div><div><br></div><div>Lastly:</div><div><br></div><div>Since C++ is a "systems" language, I think there may be value in APIs that expose the full range of filenames the OS can handle. But that's a separate discussion.</div><div>The filenames used for tool interchange don't need to support everything; they only need to support what is actually used.</div><div><br></div><div>Are there systems where filenames *that developers actually use* can't be found via UTF-8?<br></div><div><br></div><div>P.S. 
I think that if we say UTF-8, all the tools will fall into line and warn users if they ever encounter a filename that can't be found via UTF-8.</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 6, 2019 at 1:23 PM Thiago Macieira <<a href="mailto:thiago@macieira.org">thiago@macieira.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Friday, 6 September 2019 06:38:45 PDT Brad King wrote:<br>
> - UTF-8. This is allowed *only if a lossless round trip* is possible<br>
> between the filesystem's native binary sequence and UTF-8. E.g. on<br>
> Windows we should not have to require the full general format to represent<br>
> a simple path like "a.cxx" just because the filesystem APIs use wide chars.<br>
<br>
Hello Brad<br>
<br>
The problem is that the filesystem's native binary sequence is unspecified and <br>
can fail to match between programs running at the same time as well as <br>
different invocations of the same program. So your requirement that it be <br>
lossless is insufficient to ensure reproducibility.<br>
<br>
So I repeat what I said to Niall: choose one only. If you allow the Unicode <br>
text to be authoritative under any scenario, that means you're allowing <br>
failures to occur. In that case, I recommend choosing Option 1 and using <br>
*only* Unicode text and "damn the torpedoes".<br>
<br>
-- <br>
Thiago Macieira - thiago (AT) <a href="http://macieira.info" rel="noreferrer" target="_blank">macieira.info</a> - thiago (AT) <a href="http://kde.org" rel="noreferrer" target="_blank">kde.org</a><br>
Software Architect - Intel System Software Products<br>
<br>
<br>
<br>
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Be seeing you,<br></div>Tony<br></div></div>