<div dir="ltr"><div dir="ltr"><div>(Forking thread)</div><div>was: Dependency information for module-aware build tools</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 7, 2019 at 12:15 AM Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<p>I find myself thinking (as I so often do these days much to the
surprise of my past self), how does EBCDIC and z/OS fit in here?
If we stick to JSON and require the dependency file to be UTF-8
encoded, would all file names in these files be raw8 encoded and
effectively unreadable (by humans) on z/OS? Perhaps we could
allow more flexibility, but doing so necessarily invites locales
into the discussion (for those that are unaware, EBCDIC has code
pages too). For example, we could require that the selected
locale match between the producers and consumers of the file (UB
if they don't) and permit use of the string representation by
transcoding from the locale interpreted physical file name to
UTF-8, but only if reverse-transcoding produces the same physical
file name, otherwise the appropriate raw format must be used.</p></div></blockquote><div><br></div><div>I thought one of the reasons we are going the TR route rather than TS or IS is to allow recommending 99% solutions that provide the best experience for the vast majority of users while not necessarily being applicable to everyone. Platforms and codebases where the TR recommendations don't make sense are free to alter them for their platform, or just come up with completely different solutions to the problem. To me, this also implies that we are allowed to say that this TR doesn't support files with invalid Unicode names, however that is best expressed on your platform. On Windows, that means that the path must be well-formed UTF-16 (no unpaired surrogates), not just a sequence of UCS-2 code units. On UTF-8-native platforms that have "bag-o-bytes" file names, it means that we don't support files with invalid UTF-8 in their names. On non-Unicode platforms, that means either transcoding to/from UTF-8 on the way in and out of the JSON format, or coming up with a different format, accepting that it will be specific to your platform.</div><div><br></div><div>I also thought one of our goals was to describe a subset of what is technically supported by the IS, such that if you stay within these bounds, you will have the least trouble on a majority of platforms. 
This means that we may want to recommend restrictions on file names beyond just "well-formed Unicode", such as:</div><div>* Don't have files that differ only by case (broken on case-insensitive filesystems)</div><div>* Don't have files that differ only by normalization form (broken on at least macOS)</div>* Stick to a small set of characters as word separators (maybe any of " .-_", definitely not ':')</div><div class="gmail_quote">* Avoid "poisoned" pathnames like PRN and CON (reserved device names on Windows)<br><div><br></div><div>And perhaps we should also make recommendations that are likely to increase sanity, such as:<br></div><div>* Don't use characters that are squashed by the NFKC/NFKD transformation (e.g. the Angstrom sign, U+212B)</div><div>* Don't have control characters in file names</div><div>* Don't mix scripts within a single path component or module identifier<br></div><div>* Don't start source file names with a dot</div><div>* Use one of the "blessed" file extensions for your source code (we can have a big tent of blessed extensions, but naming a C++ source file haha.py is just dumb)</div><div><br></div><div>To be clear, I'm not suggesting we go as far as the "pitchfork" proposal in dictating a project layout. It's more like discouraging obviously bad things that would get you yelled at in code review in basically all non-troll projects.</div></div></div></div>
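<div><br></div><div>To make a few of the bullets above concrete, here is a rough Python sketch of what a lint for them might look like. Nothing in it comes from the TR draft: the function names <code>component_problems</code> and <code>collisions</code>, the <code>RESERVED</code> set (an illustrative subset, not the full Windows list), and the exact folding rule are all my own assumptions, and real filesystems apply their own case and normalization rules.</div>

```python
import unicodedata
from collections import defaultdict

# Illustrative subset of device names that Windows reserves; not exhaustive.
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)}


def component_problems(name):
    """Return a list of complaints about one path component (a str)."""
    problems = []
    try:
        # Lone surrogates (for example from os.fsdecode on undecodable
        # bytes) cannot be encoded as strict UTF-8.
        name.encode("utf-8")
    except UnicodeEncodeError:
        problems.append("not well-formed Unicode")
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("control character")
    # Per-character check, so a decomposed "e" + combining accent is not
    # flagged, but singletons such as U+212B and ligatures are.
    if any(unicodedata.normalize("NFKC", ch) != ch for ch in name):
        problems.append("character altered by NFKC")
    if name.startswith("."):
        problems.append("leading dot")
    if ":" in name:
        problems.append("colon")
    # Windows treats "PRN.cpp" like "PRN", so compare the part before the
    # first dot, ignoring case.
    if name.split(".")[0].upper() in RESERVED:
        problems.append("reserved device name")
    return problems


def collisions(names):
    """Group names that differ only by case or normalization form."""
    groups = defaultdict(list)
    for name in names:
        # Canonical caseless key: NFD, case fold, NFD again.
        folded = unicodedata.normalize("NFD", name).casefold()
        groups[unicodedata.normalize("NFD", folded)].append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    for candidate in ["main.cpp", "prn.cpp", ".hidden.cpp", "\u212bngstrom.h"]:
        print(repr(candidate), component_problems(candidate))
    print(collisions(["Foo.cpp", "foo.cpp", "caf\u00e9.h", "cafe\u0301.h"]))
```

<div>This deliberately skips the mixed-script and "blessed" extension bullets: Python's standard library has no script property data, and the extension list is something we would have to agree on first.</div>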