<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 8 Mar 2019 at 06:19 Tom Honermann <<a href="mailto:tom@honermann.net">tom@honermann.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<div class="m_-3774127142734537496moz-cite-prefix">On 3/7/19 11:17 AM, Mathias Stearn
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>(Forking thread)</div>
<div>was: Dependency information for module-aware build tools</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Mar 7, 2019 at
12:15 AM Tom Honermann <<a href="mailto:tom@honermann.net" target="_blank">tom@honermann.net</a>>
wrote:</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>I find myself thinking (as I so often do these days
much to the surprise of my past self), how does EBCDIC
and z/OS fit in here? If we stick to JSON and require
the dependency file to be UTF-8 encoded, would all
file names in these files be raw8 encoded and
effectively unreadable (by humans) on z/OS? Perhaps
we could allow more flexibility, but doing so
necessarily invites locales into the discussion (for
those that are unaware, EBCDIC has code pages too).
For example, we could require that the selected locale
match between the producers and consumers of the file
(UB if they don't) and permit the string representation,
obtained by transcoding the locale-interpreted physical
file name to UTF-8, only if reverse-transcoding
reproduces the same physical file name; otherwise the
appropriate raw format must be used.</p>
</div>
</blockquote>
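<div>A minimal sketch of the round-trip rule described above, assuming the producer knows the encoding the platform uses for file names. The function name and the convention of returning None to mean "fall back to the raw format" are hypothetical, not part of any proposal:</div>
<pre>
```python
from typing import Optional


def utf8_if_round_trips(raw_name: bytes, encoding: str) -> Optional[str]:
    """Return the text form of raw_name, or None if the raw form is needed.

    `encoding` stands in for the locale's file name encoding (an assumption
    of this sketch), for example "cp037" for one EBCDIC code page.
    """
    try:
        text = raw_name.decode(encoding)       # locale bytes to text
        if text.encode(encoding) != raw_name:  # reverse transcoding must match
            return None
        text.encode("utf-8")                   # text must be storable as UTF-8
    except UnicodeError:
        return None
    return text
```
</pre>
<div>A consumer would apply the inverse step (encode the string with the same encoding) and can rely on getting the original bytes back, which is what makes the locale agreement between producer and consumer matter.</div>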
<div><br>
</div>
<div>I thought one of the reasons we are going the TR route
rather than TS or IS is to allow recommending 99%
solutions that provide the best experience for the vast
majority of users while not necessarily being applicable
to everyone. Platforms and codebases where the TR
recommendations don't make sense are free to alter them
for their platform, or just come up with completely
different solutions to the problem. To me, this also
implies that we are allowed to say that this TR doesn't
support files with invalid Unicode names, however that is
best expressed on your platform. On Windows, that means
that the path must meet the requirements of UTF-16 (no
unpaired surrogates), not just UCS-2. On UTF-8-native platforms that have
"bag-o-bytes" file names, it means that we don't support
files with invalid UTF-8 in their names. On non-Unicode
platforms, that means either transcoding to/from UTF-8 on
the way in and out of the JSON format, or coming up with a
different format, accepting that it will be specific to
your platform.</div>
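<div>To illustrate what "valid" could mean on the two kinds of Unicode platforms mentioned above, here is a rough sketch. Both helper names are hypothetical, and a real tool would more likely use the platform's own APIs:</div>
<pre>
```python
from typing import Iterable


def is_valid_utf8(raw_name: bytes) -> bool:
    """True if a "bag-o-bytes" file name is well-formed UTF-8."""
    try:
        raw_name.decode("utf-8")  # strict decoding rejects malformed input
        return True
    except UnicodeDecodeError:
        return False


def is_well_formed_utf16(code_units: Iterable[int]) -> bool:
    """True if 16-bit code units form valid UTF-16 (no unpaired surrogates)."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in code_units)
    try:
        raw.decode("utf-16-le")   # strict decoding rejects lone surrogates
        return True
    except UnicodeDecodeError:
        return False
```
</pre>
<div>The second check is the difference between UTF-16 and plain UCS-2: a sequence of 16-bit units can contain half of a surrogate pair and still be storable as a file name on Windows.</div>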
</div>
</div>
</div>
</blockquote>
</div><div bgcolor="#FFFFFF" text="#000000"><p>I think platform specific differences are acceptable, but we
should strive for general solutions.</p>
<p>I think the committee currently has a UTF-8 bias that doesn't
necessarily reflect the global C++ community. We don't have much
representation from Japan or China where, as I understand it,
Shift-JIS and GB18030 still see significant use. We also have
few, if any, z/OS users on the committee outside of IBM
representatives. UTF-8 dominates the web; no one questions that.
But within the C++ ecosystem, I don't think UTF-8 dominates to a
similar degree, at least not outside of the US and Europe. I wish
I had data to back that up.<br>
</p></div><div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>I also thought one of our goals was to describe a
subset of what is technically supported by the IS such that,
if you stay within its bounds, you will have the least
trouble on a majority of platforms. This means that we
may want to recommend restrictions on file names beyond
just "well-formed Unicode", such as:</div>
<div>* Don't have files that differ only by case (broken on
case-insensitive filesystems)</div>
<div>* Don't have files that differ only by normalization
form (broken on at least macOS)</div>
* Stick to a small set of characters as word separators
(maybe any of " .-_", definitely not ':')</div>
<div class="gmail_quote">* Avoid "poisoned" pathnames like PRN
and CON<br>
</div>
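<div>The portability rules listed above could be checked mechanically. The sketch below is only an approximation: real file systems apply their own case-folding and normalization rules, the reserved-name set is the classic DOS device list, and all function names are made up for illustration:</div>
<pre>
```python
import unicodedata
from collections import defaultdict

# Assumption: the classic DOS device names that Windows treats specially.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)
ALLOWED_SEPARATORS = set(" .-_")


def collision_key(name):
    """Approximate key: names with equal keys may clash on some file system."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded)


def find_collisions(names):
    """Group names that differ only by case or by normalization form."""
    groups = defaultdict(list)
    for name in names:
        groups[collision_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


def is_reserved_on_windows(name):
    """True for names like PRN or con.cpp (the extension does not help)."""
    stem = name.split(".", 1)[0].rstrip(" ")
    return stem.upper() in RESERVED


def has_unusual_separator(name):
    """True if a non-alphanumeric character outside " .-_" appears."""
    return any(
        not ch.isalnum() and ch not in ALLOWED_SEPARATORS for ch in name
    )
```
</pre>
<div>A build tool or CI lint could run find_collisions over each directory listing and the other two checks over each path component.</div>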
</div>
</div>
</blockquote></div><div bgcolor="#FFFFFF" text="#000000">
I think these are good guidelines and agree with recommending them.</div><div bgcolor="#FFFFFF" text="#000000"><br>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>And perhaps we should also make recommendations that
are likely to increase sanity, such as:<br>
</div>
<div>* Don't use characters that are squashed by the
NFKC/NFKD transformation (e.g. the Angstrom sign, U+212B)</div>
<div>* Don't have control characters in file names</div>
<div>* Don't mix scripts within a single path component or
module identifier<br>
</div>
<div>* Don't start source file names with a dot</div>
<div>* Use one of the "blessed" file extensions for your
source code (we can have a big tent of blessed extensions,
but naming a C++ source file haha.py is just dumb)</div>
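<div>Most of the "sanity" recommendations above lend themselves to a simple lint. This is a sketch under assumptions: the extension set is purely illustrative (no list has been agreed on), the function name is hypothetical, and the mixed-script rule is left out because the Python standard library does not expose the Unicode Script property:</div>
<pre>
```python
import os.path
import unicodedata

# Assumption: an illustrative, not authoritative, set of "blessed" extensions.
BLESSED_EXTENSIONS = {
    ".cpp", ".cc", ".cxx", ".hpp", ".h", ".hxx", ".ixx", ".cppm",
}


def lint_source_name(name):
    """Return a list of issues found in a single source file name."""
    issues = []
    # Flags compatibility characters; also flags names that are not in NFC.
    if unicodedata.normalize("NFKC", name) != name:
        issues.append("changed by NFKC")
    # General category Cc covers the C0 and C1 control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        issues.append("control character")
    if name.startswith("."):
        issues.append("leading dot")
    extension = os.path.splitext(name)[1].lower()
    if extension not in BLESSED_EXTENSIONS:
        issues.append("unexpected extension")
    return issues
```
</pre>
<div>Note that the Angstrom sign (U+212B) is caught by the first check because normalization replaces it with U+00C5.</div>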
</div>
</div>
</div>
</blockquote></div><div bgcolor="#FFFFFF" text="#000000">
Also good guidelines in my opinion.</div><div bgcolor="#FFFFFF" text="#000000"><br></div></blockquote><div><br></div><div>I think we could even recommend plain ASCII (or something that can be mapped to it) in file names, or even a subset of ASCII.</div><div>IMO it doesn't take anything away from users, avoids a lot of issues, and is portable, so the code can actually be shared across platforms.</div><div><br></div><div>But there could be several levels of recommendation:</div><div><br></div><div>DON'T have control characters in file names</div><div>PREFER only using characters in the ASCII character set</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>To be clear, I'm not suggesting we go as far as the
"pitchfork" proposal in dictating a project layout. More
like discouraging obviously bad things that would get you
yelled at in code review in basically all non-troll
projects.</div>
</div>
</div>
</div>
</blockquote>
</div><div bgcolor="#FFFFFF" text="#000000"><p>+1.<br>
</p>
<p>Tom.<br>
</p>
</div>
_______________________________________________<br>
Modules mailing list<br>
Link to this post: <a href="http://lists.isocpp.org/modules/2019/03/0216.php" rel="noreferrer" target="_blank">http://lists.isocpp.org/modules/2019/03/0216.php</a><br>
</blockquote></div></div>