[SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++
Thiago Macieira
thiago at macieira.org
Tue Apr 30 17:17:20 CEST 2019
On Tuesday, 30 April 2019 07:49:50 PDT Tom Honermann wrote:
> Can you elaborate on this? What do you mean by the "kernel assuming
> your userspace is UTF-8"? Do you mean that the filesystem driver will
> attempt, by default, to present file names composed of 16-bit code units
> transcoded to UTF-8? Given that file names do not have an
> explicit encoding, this seems reasonable to me and even necessary to
> avoid name conflicts from otherwise lossy transcoding operations.
That's exactly what I meant. Both VFAT and NTFS store filenames in UTF-16, so
the kernel must translate between UTF-16 and some 8-bit encoding chosen at
mount time so those names can be presented to userspace. In fact, the driver
has no choice: the *kernel* VFS layer itself requires 8-bit filenames anyway.
This means filenames on VFAT and NTFS *do* have an encoding. You cannot use
arbitrary binary file names, since those wouldn't convert to UTF-16 and
couldn't be saved. Quite frankly, you shouldn't choose any iocharset=
other than UTF-8, since the disk may already hold file names that wouldn't
convert to that charset and so couldn't be represented.
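Both failure modes can be illustrated by modelling the mount-time translation with Python codecs. This is a hypothetical sketch: `to_disk_name`, `from_disk_name` and `representable` are illustrative names, not kernel functions, and the model covers only the encoding step, not the driver's actual code path.

```python
# Hypothetical model of a VFAT/NTFS driver's mount-time name translation.
# Only the charset conversion is modelled; this is not kernel code.

def to_disk_name(name: bytes, iocharset: str) -> bytes:
    """Userspace 8-bit name -> UTF-16LE name as stored on disk.

    Raises UnicodeDecodeError if the bytes are not valid in iocharset,
    i.e. a file with that name could not be created.
    """
    return name.decode(iocharset).encode("utf-16-le")


def from_disk_name(stored: bytes, iocharset: str) -> bytes:
    """UTF-16LE on-disk name -> 8-bit name presented to userspace.

    Raises UnicodeEncodeError if a character has no representation in
    iocharset, i.e. an existing file could not be listed under its name.
    """
    return stored.decode("utf-16-le").encode(iocharset)


def representable(fn, *args) -> bool:
    """True if the conversion succeeds, False if it raises a UnicodeError."""
    try:
        fn(*args)
        return True
    except UnicodeError:
        return False


if __name__ == "__main__":
    # Arbitrary binary: byte 0xFF never appears in well-formed UTF-8.
    print(representable(to_disk_name, b"report\xff.txt", "utf-8"))  # False
    # A name created elsewhere and already stored as UTF-16 on disk.
    stored = "日本語.txt".encode("utf-16-le")
    print(representable(from_disk_name, stored, "iso8859-1"))  # False
    print(representable(from_disk_name, stored, "utf-8"))  # True
```

With a UTF-8 iocharset, every name on disk maps back out to userspace; with an 8-bit charset such as ISO-8859-1, names containing characters beyond U+00FF have no representation.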
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel System Software Products