[SG16-Unicode] P1689: Encoding of filenames for interchange

Thiago Macieira thiago at macieira.org
Fri Sep 6 19:23:40 CEST 2019


On Friday, 6 September 2019 06:38:45 PDT Brad King wrote:
> - UTF-8.  This is allowed *only if a lossless round trip* is possible
>   between the filesystem's native binary sequence and UTF-8.  E.g. on
>   Windows we should not have to require the full general format to
>   represent a simple path like "a.cxx" just because the filesystem APIs
>   use wide chars.

Hello Brad

The problem is that the filesystem's native binary sequence is unspecified and 
can fail to match between programs running at the same time as well as 
different invocations of the same program. So your requirement that it be 
lossless is insufficient to ensure reproducibility.
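
A minimal sketch of that failure mode, under assumptions that are purely
illustrative: the file name, the two encodings (Latin-1 for the producer,
UTF-8 for the consumer) and the helper names to_interchange and to_native
are hypothetical and do not come from the proposal. The producer's own
round trip is lossless, yet a consumer running with a different narrow
encoding reconstructs different bytes from the same Unicode text.

```python
def to_interchange(native: bytes, producer_encoding: str) -> str:
    """Decode native file name bytes to Unicode text, as a producer might."""
    return native.decode(producer_encoding)


def to_native(text: str, consumer_encoding: str) -> bytes:
    """Encode Unicode text back to native bytes, as a consumer might."""
    return text.encode(consumer_encoding)


# Hypothetical on-disk name: "é.cxx" stored as Latin-1 bytes.
on_disk = b"\xe9.cxx"

# The producer assumes Latin-1, so its round trip is lossless.
text = to_interchange(on_disk, "latin-1")
assert to_native(text, "latin-1") == on_disk

# A consumer that assumes UTF-8 produces different bytes from the same text,
# so it would look for a file that does not exist under that name.
consumer_bytes = to_native(text, "utf-8")
print(consumer_bytes == on_disk)
```

The check passes in the producer and the lookup can still fail in the
consumer, which is why losslessness alone may not be enough.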

So I repeat what I said to Niall: choose one only. If you allow the Unicode 
text to be authoritative under any scenario, that means you're allowing 
failures to occur. In that case, I recommend choosing Option 1 and using 
*only* Unicode text and "damn the torpedoes".

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products




