[SG16-Unicode] P1689: Encoding of filenames for interchange

Tom Honermann tom at honermann.net
Thu Sep 5 17:34:33 CEST 2019


On 9/5/19 8:57 AM, Niall Douglas wrote:
> On 05/09/2019 12:11, Lyberta wrote:
>> Do we really expect C++20 build systems to run on filesystem paths not
>> representable as UTF-8? Users who do that really shoot themselves in the
>> foot.
> As Tom likes to say, EBCDIC.

Lyberta stated "representable as UTF-8" and as far as I know, all EBCDIC 
code pages support an isomorphic translation to Unicode.

That being said, there is no guarantee that a filename is valid EBCDIC.  
If EBCDIC code pages exist that don't assign characters for all code 
point values (I don't know if any such code pages exist; I would guess 
not), then use of such code points in a filename would not specify a 
character.
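
One way to probe that question for a particular code page is to try all
256 byte values and see which ones fail to decode or fail to round-trip.
The sketch below does that with Python's bundled codecs. The helper name
unassigned_bytes is made up for illustration, and cp037 and cp500 are
just the two EBCDIC code pages chosen for the check. The result says
nothing about EBCDIC code pages that Python does not ship.

```python
def unassigned_bytes(codec_name):
    """Return the byte values that the codec cannot decode, or that do
    not encode back to the same byte (hypothetical helper)."""
    bad = []
    for value in range(256):
        raw = bytes([value])
        try:
            text = raw.decode(codec_name)
            round_trip = text.encode(codec_name)
        except (UnicodeDecodeError, UnicodeEncodeError):
            # No character is assigned to this byte value.
            bad.append(value)
            continue
        if round_trip != raw:
            # The mapping exists but is not one-to-one.
            bad.append(value)
    return bad


# Two EBCDIC code pages available in the Python standard library.
for name in ("cp037", "cp500"):
    print(name, unassigned_bytes(name))
```

An empty list for a code page means every byte value maps to a distinct
character in that codec, so any byte sequence is decodable there.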

The more complicated scenario is one where filenames have no associated 
encoding and must be interpreted according to the current locale, and the 
locale specifies a non-trivial encoding (like UTF-8 or Shift-JIS) in which 
decoding errors are possible.  In that case, the only way to transmit 
filenames accurately is a binary encoding that preserves the original 
bytes.
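
As a rough illustration of this point, the sketch below takes a filename
whose bytes are not valid UTF-8, shows that decoding it fails, and then
carries it through an ASCII-only byte-preserving form. Percent-encoding
is used here only as one possible binary encoding; it is an assumption
for the example, not something P1689 specifies. The function names and
the sample filename are likewise made up.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_for_interchange(raw_name: bytes) -> str:
    """Turn raw filename bytes into an ASCII-only string (hypothetical
    helper; percent-encoding is just one byte-preserving choice)."""
    return quote_from_bytes(raw_name, safe="")


def decode_from_interchange(wire: str) -> bytes:
    """Recover the exact original filename bytes."""
    return unquote_to_bytes(wire)


# A made-up temporary file name containing bytes that are invalid UTF-8.
raw = b"tmp_\xff\xfe.o"

try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # Interpreting the name as text fails, so a text-only format
    # cannot represent it faithfully.
    decodable = False

wire = encode_for_interchange(raw)
print(decodable, wire, decode_from_interchange(wire) == raw)
```

The receiver never has to know which encoding, if any, the name was
written in; it gets back the same bytes the sender saw on disk.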

> P1689 ought to be a taker and support filenames which are invalid UTF.
> I'm particularly thinking of temporary file names which build systems
> often deal with, as some temporary file name generators make invalid
> UTF. And that's totally legal on the filesystem.

+1

Tom.


