[SG16-Unicode] P1689: Encoding of filenames for interchange
Lyberta
lyberta at lyberta.net
Fri Sep 6 17:44:00 CEST 2019
Niall Douglas:
> (I might add that I don't think WTF is valid in RFC conforming JSON.
> Strings are in UTF, or they are not JSON strings and need to be byte
> arrays. The only RFC compliant way of storing potentially invalid UTF
> strings is as a byte array, to my best knowledge).
I said an array of numbers. The numbers can be 8-bit for ASCII and UTF-8,
16-bit for UTF-16, WTF-16 and others, and 32-bit for UTF-32.
{
"encoding":"WTF-16",
"units":[ 84, 101, 115, 116 ]
}
This is how you would encode the path "Test" on NTFS, for example.
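As a rough sketch of how such a record could be produced, assuming Python and a hypothetical helper name path_to_units (neither is part of the proposal): the path is split into 16-bit code units, and lone surrogates are let through because WTF-16 permits them.

```python
import json
import struct


def path_to_units(path: str) -> dict:
    """Hypothetical helper: describe a path as 16-bit code units plus metadata."""
    # "surrogatepass" keeps lone surrogates instead of raising,
    # which is the difference between WTF-16 and strict UTF-16.
    raw = path.encode("utf-16-le", errors="surrogatepass")
    # Reinterpret the little-endian bytes as unsigned 16-bit integers.
    units = list(struct.unpack(f"<{len(raw) // 2}H", raw))
    return {"encoding": "WTF-16", "units": units}


print(json.dumps(path_to_units("Test")))
```

The resulting JSON contains only ASCII and integers, so the document stays valid JSON even when the path itself is not valid Unicode.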
If such JSON were opened on z/OS with EBCDIC, then the WTF-16 would
be conceptually converted to this:
{
"encoding":"EBCDIC",
"units":[ 227, 133, 162, 163 ]
}
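A minimal sketch of that conversion, assuming Python and its cp037 codec as a stand-in for "EBCDIC" (the actual code page on a given z/OS system may differ, and the function name is hypothetical):

```python
import struct


def wtf16_units_to_ebcdic(record: dict, codec: str = "cp037") -> dict:
    """Hypothetical helper: re-express WTF-16 code units as EBCDIC bytes."""
    if record["encoding"] != "WTF-16":
        raise ValueError("this sketch only handles WTF-16 input")
    units = record["units"]
    # Pack the 16-bit units back into little-endian bytes.
    raw = struct.pack(f"<{len(units)}H", *units)
    # Strict decode: a lone surrogate has no EBCDIC equivalent, so it raises.
    text = raw.decode("utf-16-le")
    # Raises UnicodeEncodeError if a character is missing from the code page.
    return {"encoding": "EBCDIC", "units": list(text.encode(codec))}


converted = wtf16_units_to_ebcdic(
    {"encoding": "WTF-16", "units": [84, 101, 115, 116]}
)
print(converted["units"])
```

Paths that cannot be represented in the target encoding fail loudly here; what a real consumer should do in that case is left open.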
> That's called URL.
>
> These two files in my filesystem:
>
> $ ls -1ib /tmp/*.c
> 5303210 /tmp/\351.c
> 5303209 /tmp/é.c
>
> Are uniquely identified by these normalised IRIs:
> file:///tmp/%E9.c
> file:///tmp/é.c
>
> According to RFC 3987, é is the same as %C3%A9.
RFC 3987 uses UTF-8 for the numeric (percent-encoded) values. That means
it is only as useful as UTF-8. My proposal supports EBCDIC and other
non-ASCII-compatible encodings by transferring numbers + metadata.
That said, I think using UTF-8-only JSON strings for paths should cover
99% of cases, and since the format in question is versioned, we can
always add non-UTF-8 paths later.