<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 6, 2019 at 3:52 PM Thiago Macieira <<a href="mailto:thiago@macieira.org">thiago@macieira.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Friday, 6 September 2019 10:49:56 PDT Tony V E wrote:<br>
> First of all<br>
> <br>
> It seems Option 2b is a superset of Option 2a, and is just more work for<br>
> everyone, with no work saved. ie Windows still needs to support<br>
> single-bytes, but can also use dual-bytes.<br>
> Are we encouraging Windows tools to *only* use dual-bytes and not support<br>
> single-bytes (ie not have full support)? What's the benefit of 2b?<br>
> Can we narrow our choices by agreeing 2b isn't worthwhile?<br>
<br>
Indeed, it's a superset that spreads the pain by making everyone have to <br>
implement conversions, for the benefit of the case where a _WIN32 tool <br>
produces a file that is read by another _WIN32 tool: then it can do pass-<br>
through.<br>
<br>
> Now, overall, if I understand the discussion correctly:<br>
> <br>
> - if you encode the raw bytes (narrow or wide), you should add the encoding<br>
> as well (ie "EBCDIC", etc).<br>
> This implies every tool needs to support (and translate) every encoding, or<br>
> accept that we will have non-interoperable tools, platform specific tools.<br>
> Also, is the set of encodings finite, or can I add the "TONY" encoding?<br>
<br>
There's no need to indicate which encoding was used because the Option 2 variants <br>
encode the raw bytes that are used with the filesystem API. The data is an <br>
opaque bag of bits.<br></blockquote><div><br></div><div>But that only holds if you use those bits with the same API and encoding that they came from (when you don't know the encoding).</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
If you want to *display* that to the user, then converting to text is <br>
necessary. But all the tools that display file names have such functionality, <br>
since they already deal with file names obtained from the FS API.<br>
<br>
> - if you encode the raw bytes, there might still be cases not covered,<br>
> might need to fall back to UTF8. It sounds like *no* answer will be<br>
> guaranteed to work.<br>
<br>
Which case could there be that the raw bytes fail but UTF-8 supports? I would <br>
think it's the other way around.<br></blockquote><div><br></div><div>The case where the encoding changed, or where the raw bytes are being used with the wrong FS API.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
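</blockquote><div>As a rough illustration of that failure mode, here is a small Python sketch (not any real tool's code). The file name and the helper names are made up, and using <code>surrogateescape</code> to stand in for the "opaque bytes" option is an assumption, not something either option specifies.</div>
<pre>
```python
# Hypothetical file name stored as raw bytes: "café.txt" in Latin-1, not UTF-8.
raw_name = b"caf\xe9.txt"


def as_text_strict(name):
    """'File names are text' view: decode as UTF-8, fail on anything else."""
    return name.decode("utf-8")


def as_opaque(name):
    """'Opaque bytes' view: keep every byte, even ones that are not valid UTF-8."""
    return name.decode("utf-8", errors="surrogateescape")


# Treating the bytes as UTF-8 text fails outright for this name.
try:
    as_text_strict(raw_name)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Carrying the bytes opaquely round-trips exactly, as long as they are handed
# back to the same kind of API they came from.
round_tripped = as_opaque(raw_name).encode("utf-8", errors="surrogateescape")

# If a consumer assumes a different encoding and re-encodes, the bytes change
# and no longer name the same file.
misread = raw_name.decode("latin-1").encode("utf-8")

print(strict_ok, round_tripped == raw_name, misread == raw_name)
```
</pre>
<div>The opaque form only survives while nobody reinterprets it. Once a consumer picks an encoding and re-encodes, the bytes differ from what the filesystem API originally returned.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">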
<br>
> Are there systems where filenames *that developers use* can't be found via<br>
> UTF8?<br>
<br>
The problem is what happens when the locale isn't UTF-8, which is common <br>
enough when LC_ALL=C is set in the environment.<br>
<br></blockquote><div><br></div><div>And how common is that (besides you :-)?</div><div><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
But I repeat what I said: I am fine with Option 1 ("file names are text"), <br>
knowing that there are failure modes. This has been the case for Qt for two <br>
decades. We call those "filesystem corruption" and tell our users to go fix <br>
with a system tool.<br>
<br>
-- <br>
Thiago Macieira - thiago (AT) <a href="http://macieira.info" rel="noreferrer" target="_blank">macieira.info</a> - thiago (AT) <a href="http://kde.org" rel="noreferrer" target="_blank">kde.org</a><br>
Software Architect - Intel System Software Products<br>
<br>
<br>
<br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div>Be seeing you,<br></div>Tony<br></div></div></div>