[SG16-Unicode] P1689: Encoding of filenames for interchange

Tom Honermann tom at honermann.net
Thu Sep 5 17:27:33 CEST 2019


On 9/5/19 6:51 AM, Niall Douglas wrote:
> Firstly, NUL is a valid filesystem path codepoint on some platforms, and
> I'd like to get the standard fixed on that incorrectness in the near
> future. I think that we can reasonably declare the native path separator
> codepoint the only invalid filesystem path codepoint, as otherwise
> filesystem::path doesn't work.
Please don't try to fix this.  I don't believe there is any use case
for supporting NUL characters within a path component.  C, C++, POSIX,
and Win32 APIs have *never* supported this, and existing interfaces
cannot be updated to accommodate embedded NUL characters.  Supporting
this would effectively break all existing code that deals with file
names, with no motivating use case.
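
To illustrate why existing interfaces cannot carry an embedded NUL, here
is a minimal sketch.  The helper names and the file name are made up for
illustration.  Any interface that takes a NUL-terminated char pointer
only sees the part of the name before the first NUL:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length of the name as seen by any interface that takes a NUL-terminated
// char pointer (fopen, open, CreateFileA, ...): counting stops at the
// first NUL.
std::size_t length_seen_by_c_api(const std::string& name) {
    return std::strlen(name.c_str());
}

// Hypothetical 7-character file name with an embedded NUL after "foo".
std::string make_name_with_embedded_nul() {
    return std::string("foo\0bar", 7);
}
```

The std::string itself holds all 7 characters, but a callee receiving
c_str() has no way to learn that anything follows "foo".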
>
> Secondly, as I've often told you Thiago, the native Windows filesystem
> API is also byte based. struct UNICODE takes a *byte* length, not a
> wchar_t length. I'll agree that the Win32 path translation layer
> complicates that, but underneath it's all byte based, and I would like
> to hope that whatever modern i/o proposal WG21 chooses will expose
> reality on Windows.

I strongly disagree with the viewpoint that NTFS is a byte-based
filesystem.  The fact that part of a filesystem-neutral interface is
weirdly designed (perhaps because it supports different filesystems,
some of which might actually be byte-based) does not mean that NTFS
doesn't store 16-bit code units.
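
As a rough sketch of the distinction, assuming the struct under
discussion is the Windows UNICODE_STRING (whose length fields count
bytes while its buffer holds 16-bit code units), the type and helpers
below are hypothetical stand-ins rather than the real Windows
declarations:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in shaped like UNICODE_STRING: lengths are counted
// in bytes, but the buffer points at 16-bit code units.
struct NtStyleString {
    std::uint16_t length_in_bytes;
    std::uint16_t max_length_in_bytes;
    const char16_t* buffer;  // not owned; must outlive this struct
};

// Wraps an existing UTF-16 string. Assumes the byte length fits in
// 16 bits; longer input would be truncated by the cast.
NtStyleString make_nt_style_string(const std::u16string& s) {
    const auto bytes =
        static_cast<std::uint16_t>(s.size() * sizeof(char16_t));
    return NtStyleString{bytes, bytes, s.data()};
}

// The byte length is simply twice the number of 16-bit code units.
std::size_t code_unit_count(const NtStyleString& s) {
    return s.length_in_bytes / sizeof(char16_t);
}
```

For an 8 code unit name such as u"caf\u00e9.txt" the length field reads
16, yet the data is still a sequence of 16-bit code units.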

Tom.

