<div dir="ltr">Personally, I'm all for fixing that, though probably with some restrictions:<div><br></div><div>1/ Let's not change the character set accepted in include directives. Mapping non-ASCII characters to filenames is a portability nightmare.</div><div>2/ I am not comfortable with allowing that in module names (in part for the same reasons), but I don't think it should be restricted at the language level either.</div><div>3/ TR31 is a very good start but probably too lenient about mixed-script identifiers (see for example <a href="http://perl11.org/blog/unicode-identifiers.html">http://perl11.org/blog/unicode-identifiers.html</a>). However, we should defer to the Unicode TR as much as possible rather than pretend we have a better understanding of the issue than they do.<br></div><div><br></div><div>Overall, it's something I wish were implemented, but I think people should not use it for anything beyond novelty.</div><div><br></div><div>Implementations would have to respect TR31, which implies using ICU until we actually ship Unicode support.<br></div><div><br></div><div>I'm afraid such changes will make it harder for developers to work on systems that do not support Unicode, which is a good reason to mandate it, especially given it would have no bearing on the platforms the code can run on :)</div><div><br></div><div>As for module names, I don't think we get to choose.</div><div><br></div><div>There is some file with a given name, which is a bag of bytes and which we don't get to rename.</div><div>There is some module with some name, drawn from the basic character set (or, with your proposal, a Unicode identifier), which we don't get to rename either.</div><div>The two must match _somehow_.</div><div><br></div><div>So either we limit the identifiers to ASCII, or we "enforce" (by way of the TR) that filenames must be valid UTF-8-encoded Unicode matching the module name.</div><div>Unfortunately, not all file systems will support that.</div><div>The limitation has more to do with file systems than with C++, and we have virtually no control besides restricting the set of file systems that are able to store C++. I am all for that, but I don't think people will go for it.</div><div><br></div><div>Alternatively, we don't try to impose any restrictions, and people will ultimately figure out what does and doesn't work, or tools will set their own restrictions. That doesn't help the ecosystem at all, but it's basically what we have always done.</div></div><br>
<div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 10 May 2019 at 18:43, JF Bastien &lt;<a href="mailto:cxx@jfbastien.com">cxx@jfbastien.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><font face="arial, helvetica, sans-serif">Hi C++ <span style="color:rgb(0,0,0);font-size:medium">પกٱƈѻɗﻉ</span> <span style="color:rgb(0,0,0);font-size:medium">ḟäṅṡ 👋</span>!</font><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">The current list of valid identifier characters is pretty silly (see [<b><a href="http://lex.name" target="_blank">lex.name</a></b>] 5.10 Identifiers or <a href="https://en.cppreference.com/w/cpp/language/identifiers" target="_blank">cppreference summary</a>). 
It allows characters such as zero-width joiner and zero-width space among a few silly things (see <a href="https://godbolt.org/z/sBJk1k" target="_blank">how bad this can get</a>, h/t Richard Kogelnig</font><span style="font-family:arial,helvetica,sans-serif">).</span></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">I asked where it came from, and IIUC John looked at Unicode and cobbled the list of valid ranges manually. That ain't great.</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">Is this group interested in fixing things?</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">There's already an existing standard for this, maybe it's a thing we can adopt as-is or use as a starting point:</font></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><a href="https://unicode.org/reports/tr31/" target="_blank"><font face="arial, helvetica, sans-serif">https://unicode.org/reports/tr31/</font></a></div></blockquote><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">Further, the tooling group was just talking about module names. I think we should allow any valid identifier name as module name, and look at how this could map to file names for a tooling TR's purpose.</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">Thanks,</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><span style="color:rgb(0,0,0);font-size:medium"><font face="arial, helvetica, sans-serif">JF</font></span><br></div></div></div>
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div>