[Tooling] [isocpp-modules] Filename requirements for the SG15 TR

Tom Honermann tom at honermann.net
Fri Mar 8 06:19:06 CET 2019


On 3/7/19 11:17 AM, Mathias Stearn wrote:
> (Forking thread)
> was: Dependency information for module-aware build tools
>
> On Thu, Mar 7, 2019 at 12:15 AM Tom Honermann <tom at honermann.net 
> <mailto:tom at honermann.net>> wrote:
>
>     I find myself thinking (as I so often do these days, much to the
>     surprise of my past self): how do EBCDIC and z/OS fit in here? 
>     If we stick to JSON and require the dependency file to be UTF-8
>     encoded, would all file names in these files be raw8 encoded and
>     effectively unreadable (by humans) on z/OS?  Perhaps we could
>     allow more flexibility, but doing so necessarily invites locales
>     into the discussion (for those that are unaware, EBCDIC has code
>     pages too). For example, we could require that the selected locale
>     match between the producers and consumers of the file (UB if they
>     don't) and permit use of the string representation by transcoding
>     from the locale interpreted physical file name to UTF-8, but only
>     if reverse-transcoding produces the same physical file name,
>     otherwise the appropriate raw format must be used.
>
>
> I thought one of the reasons we are going the TR route rather than TS 
> or IS is to allow recommending 99% solutions that provide the best 
> experience for the vast majority of users while not necessarily being 
> applicable to everyone. Platforms and codebases where the TR 
> recommendations don't make sense are free to alter them for their 
> platform, or just come up with completely different solutions to the 
> problem. To me, this also implies that we are allowed to say that this 
> TR doesn't support files with invalid Unicode names, however that is 
> best expressed on your platform. On Windows, that means that the path 
> must meet the requirements of UTF-16, not just UCS-2. On UTF-8-native 
> platforms that have "bag-o-bytes" file names, it means that we don't 
> support files with invalid UTF-8 in their names. On non-Unicode 
> platforms, that means either transcoding to/from UTF-8 on the way in 
> and out of the JSON format, or coming up with a different format, 
> accepting that it will be specific to your platform.

I think platform-specific differences are acceptable, but we should 
strive for general solutions.

I think the committee currently has a UTF-8 bias that doesn't 
necessarily reflect the global C++ community.  We don't have much 
representation from Japan or China where, as I understand it, Shift-JIS 
and GB18030 still have significant usage.  We also have few, if any, 
z/OS users in the committee outside of IBM representatives.  UTF-8 
dominates the web; no one questions that. But within the C++ ecosystem, 
I don't think UTF-8 dominates to a similar degree, at least not outside 
of the US and Europe.  I wish I had data to back that up.
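
The round-trip rule proposed earlier in the thread can be sketched as follows: interpret the physical file name in the producer's encoding, and use the string form only if transcoding it back yields the same bytes. Otherwise, fall back to a raw form. This assumes the encoding is known by a Python codec name (for example "cp037" for an EBCDIC code page, "shift_jis", or "gb18030") and that producer and consumer agree on it. The `{"string": ...}` / `{"raw8": ...}` entry shapes and the function names are hypothetical illustrations, not a format the TR defines.

```python
def encode_file_name(physical, encoding):
    """Return a JSON-ready entry for a file name given as raw bytes.

    `encoding` is the producer's encoding (a Python codec name). The
    string form is used only when decoding and re-encoding reproduces
    exactly the original bytes.
    """
    try:
        text = physical.decode(encoding)              # locale-interpreted name
        round_trips = text.encode(encoding) == physical
    except (UnicodeDecodeError, UnicodeEncodeError):
        round_trips = False
    if not round_trips:
        return {"raw8": list(physical)}                # keep the exact bytes
    return {"string": text}                            # stored as Unicode text in JSON


def decode_file_name(entry, encoding):
    """Recover the physical file name bytes, using the same encoding."""
    if "string" in entry:
        return entry["string"].encode(encoding)
    return bytes(entry["raw8"])
```

For example, `"hello.cpp".encode("cp037")` comes back as `{"string": "hello.cpp"}`, while the truncated Shift-JIS lead byte `b"\x81"` cannot be decoded and is kept as `{"raw8": [129]}`.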

>
> I also thought one of our goals was to describe a subset of what is 
> technically supported by the IS such that, if you stay within these 
> bounds, you will have the least trouble on a majority of platforms. 
> This means that we may want to recommend restrictions on file names 
> beyond just "well-formed Unicode", such as:
> * Don't have files that differ only by case (broken on 
> case-insensitive filesystems)
> * Don't have files that differ only by normalization form (broken on 
> at least OSX)
> * Stick to a small set of characters as word separators (maybe any of 
> " .-_", definitely not ':')
> * Avoid "poisoned" pathnames like PRN and CON
I think these are good guidelines and agree with recommending them.
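
The portability guidelines above could be checked mechanically over the names in one directory. The sketch below is a hypothetical lint, not part of any proposal. `str.casefold()` only approximates real file-system case folding. The separator set is the tentative " .-_" from the list. The reserved-name set covers the Windows device names, which stay reserved when followed by an extension (so `con.h` is also a problem).

```python
import unicodedata
from collections import defaultdict

# Tentative separator set from the guidelines; ':' is deliberately excluded.
ALLOWED_SEPARATORS = set(" .-_")
# Windows reserved device names.
RESERVED = ({"CON", "PRN", "AUX", "NUL"}
            | {f"COM{i}" for i in range(1, 10)}
            | {f"LPT{i}" for i in range(1, 10)})


def portability_problems(names):
    """Return (kind, ...) tuples describing problems among `names`."""
    problems = []

    # Group names that collide when compared case-insensitively, or when
    # compared after normalization (as on macOS file systems).
    by_case = defaultdict(set)
    by_nfc = defaultdict(set)
    for name in names:
        by_case[name.casefold()].add(name)
        by_nfc[unicodedata.normalize("NFC", name)].add(name)
    for kind, groups in (("case", by_case), ("normalization", by_nfc)):
        for group in groups.values():
            if len(group) > 1:
                problems.append((kind, tuple(sorted(group))))

    for name in names:
        # Anything that is not a letter, mark, number or allowed separator.
        bad = sorted({c for c in name
                      if c not in ALLOWED_SEPARATORS
                      and unicodedata.category(c)[0] not in "LMN"})
        if bad:
            problems.append(("separator", name, "".join(bad)))
        # "CON", "con.h" and "NUL.tar.gz" all hit a reserved device name.
        if name.split(".")[0].upper() in RESERVED:
            problems.append(("reserved", name))
    return problems
```

Letters, combining marks and digits pass the separator check, so a decomposed name such as `cafe\u0301.cpp` is reported only as a normalization collision with its precomposed twin, not as a separator problem.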
>
> And perhaps we should also make recommendations that are likely to 
> increase sanity, such as:
> * Don't use characters that are squashed by the NFKC/NFKD 
> transformation (e.g. the Angstrom sign, U+212B)
> * Don't have control characters in file names
> * Don't mix scripts within a single path component or module identifier
> * Don't start source file names with a dot
> * Use one of the "blessed" file extensions for your source code (we 
> can have a big tent of blessed extensions, but naming a C++ source 
> file haha.py is just dumb)
Also good guidelines in my opinion.
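
Most of the "sanity" recommendations above also reduce to simple per-name checks. The sketch below is hypothetical. The extension set is only an illustration of a "big tent", since the thread does not list the blessed extensions. The mixed-script rule is left out because Python's standard library does not expose the Unicode Script property.

```python
import os
import unicodedata

# Hypothetical "big tent" of blessed C++ extensions; not defined by the thread.
BLESSED_EXTENSIONS = {".cpp", ".cc", ".cxx", ".h", ".hpp", ".hh", ".hxx",
                      ".ixx", ".cppm", ".mpp"}


def sanity_problems(name):
    """Return (kind, ...) tuples for one file name component."""
    problems = []
    # Characters that NFKC maps to something else, e.g. U+212B ANGSTROM SIGN
    # becomes U+00C5, and the ligature U+FB01 becomes "fi".
    squashed = [c for c in name if unicodedata.normalize("NFKC", c) != c]
    if squashed:
        problems.append(("nfkc", "".join(squashed)))
    # Control characters (general category Cc), such as tab or newline.
    if any(unicodedata.category(c) == "Cc" for c in name):
        problems.append(("control",))
    if name.startswith("."):
        problems.append(("leading-dot",))
    ext = os.path.splitext(name)[1]
    if ext not in BLESSED_EXTENSIONS:
        problems.append(("extension", ext))
    return problems
```

With this sketch, `widget.cpp` yields no problems, while `haha.py` is flagged with `("extension", ".py")`.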
>
> To be clear, I'm not suggesting we go as far as the "pitchfork" 
> proposal in dictating a project layout. More like discouraging 
> obviously bad things that would get you yelled at in code review in 
> basically all non-troll projects.

+1.

Tom.
