[SG16-Unicode] P1689: Encoding of filenames for interchange

Lyberta lyberta at lyberta.net
Sat Sep 7 17:26:00 CEST 2019


Thiago Macieira:
> On Friday, 6 September 2019 19:17:00 PDT Lyberta wrote:
>> I think if the machine-readable output depends on locale, the author of
>> the program seriously messed up.
> 
> Oh, I agree with you. The problem is that the standard C library (as extended 
> by POSIX) does not provide the API to make that happen *and* support 
> internationalisation. And that's assuming the tool even has a "machine 
> readable" format in the first place. In the Unix tradition, you just scrape 
> the output of tools.

Then don't use the standard C library. On POSIX, use open(), read() and
write() directly, put your own Unicode layer on top, and read/write UTF-8
JSON if you want to output anything machine-readable.
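To make that concrete, here is a minimal sketch of locale-independent
output: a JSON string escaper plus a loop around POSIX write(). The names
json_quote and write_all are made up for illustration, and the escaper
assumes its input is already valid UTF-8 (it does not validate it).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <string_view>
#include <unistd.h>   // POSIX write(), ssize_t

// Quote a byte string as a JSON string literal.
// Assumption: the input is already valid UTF-8, so non-ASCII bytes are
// passed through unchanged. No locale is consulted anywhere.
std::string json_quote(std::string_view utf8)
{
    static const char hex[] = "0123456789abcdef";
    std::string out = "\"";
    for (unsigned char c : utf8) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters must be \u-escaped in JSON.
                out += "\\u00";
                out += hex[c >> 4];
                out += hex[c & 0xF];
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}

// Write the whole buffer with write(), retrying on short writes and EINTR.
// Returns false on any other error.
bool write_all(int fd, std::string_view data)
{
    assert(fd >= 0);
    while (!data.empty()) {
        const ssize_t n = ::write(fd, data.data(), data.size());
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data.remove_prefix(static_cast<std::size_t>(n));
    }
    return true;
}
```

A tool would then emit something like
write_all(STDOUT_FILENO, "{\"file\":" + json_quote(name) + "}\n") and never
touch stdio or setlocale() for its machine-readable output.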

There is no such thing as plain text, and the Unix philosophy is dead.

I have a C++ proposal for binary IO/serialization here:

https://github.com/Lyberta/cpp-io

It has already been reviewed by Niall twice, and hopefully by C++23 we'll
have sane binary IO in the standard. I don't have plans to fix C at this
point, though, because it doesn't have an analog of std::byte yet.

> But the input is not Unicode, it's file paths. On Unix, it is possible to pass 
> binary input in the command-line. With some effort, you can even pass NULs to 
> specially crafted receiver applications. The std::filesystem API appears to 
> have a way to retrieve the native raw format, which some application may need.

Yeah, it's all because C decided to use char for both bytes and
characters and doomed us all to ~50 years of pain. We need to decide if
main() should get bytes or characters and fix it.
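Until that is decided, a program that receives char data from argv or
from std::filesystem::path::native() has to check for itself whether the
bytes are text. A minimal sketch of such a check, following the
well-formed byte ranges from the Unicode Standard (the function name is
hypothetical):

```cpp
#include <cstddef>
#include <string_view>

// Returns true if `bytes` is well-formed UTF-8: no overlong forms, no
// surrogates, nothing above U+10FFFF, no truncated sequences.
bool is_well_formed_utf8(std::string_view bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }   // ASCII

        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;       // reject surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;       // reject > U+10FFFF
        } else {
            return false;                    // stray continuation, C0/C1, F5..FF
        }

        if (n - i < len) return false;       // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

If the check fails, the argument is just bytes (still a perfectly good
file name on Unix) and must be carried around as such, not as text.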
