<div dir="auto">I'm bringing a late paper to Belfast that will propose adopting UAX31 in its simplest form. Identifiers would be XID_START or _, followed by any number of XID_CONTINUE characters. Portable source would be required to be NFC. Using unassigned code points would be ill-formed. <div dir="auto">That would mean no control characters embedded in identifiers, and also no emoji. That's in addition to a paper proposing that the wording around character sets and encodings be modernized. </div><div dir="auto"><br></div><div dir="auto">There are some implications for reflection, too, as we will have to translate from the internal representation to something portable without losing fidelity, since narrow string literals may not support the full range. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 28, 2019, 16:37 JF Bastien <<a href="mailto:cxx@jfbastien.com" target="_blank" rel="noreferrer">cxx@jfbastien.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 28, 2019 at 1:26 PM Mathias Stearn <<a href="mailto:redbeard0531%2Bisocpp@gmail.com" rel="noreferrer noreferrer" target="_blank">redbeard0531+isocpp@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Oct 28, 2019 at 12:58 PM Richard Smith <<a href="mailto:richardsmith@google.com" rel="noreferrer noreferrer" target="_blank">richardsmith@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, Oct 28, 2019 at 9:39 AM Mathias Stearn via Core <<a 
href="mailto:core@lists.isocpp.org" rel="noreferrer noreferrer" target="_blank">core@lists.isocpp.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr">Is it just uppercase letters in the basic source character set, or anything considered an uppercase letter in the universal character set after phase 1 transcoding and universal-character-name resolution? Or is there some other definition of uppercase?</div></div></div></blockquote><div><br></div><div>My interpretation:</div><div><br></div><div>* We don't resolve universal-character-names; rather, we *form* them. (Eg, int façade; is converted into int fa\u00e7ade;) So for example _Ç becomes _\u00c7, which doesn't start with an underscore followed by an uppercase letter (it's an underscore followed by a slash).<br></div></div></div></blockquote><div><br></div><div>I considered that but it felt like an overly legalistic reading at the time. It also seems to be counter to <a href="http://eel.is/c++draft/lex.name#1" rel="noreferrer noreferrer" target="_blank">http://eel.is/c++draft/lex.name#1</a>. On the other hand, that first sentence "An identifier is an arbitrarily long sequence of letters and digits." is clearly incorrect because many of the allowed code points (including all emoji) are neither letters nor digits.</div><div><br></div><div>It also seems vaguely counter to my reading of the "spirit" of <a href="http://eel.is/c++draft/lex.phases#1.1.sentence-4" rel="noreferrer noreferrer" target="_blank">http://eel.is/c++draft/lex.phases#1.1.sentence-4</a>, but I have no idea what the normative impact of that sentence is. 
(I hope compilers internal encoding choices are not observable...)</div><div><br></div><div>I guess [lex] needs some cleanup in general.</div></div></div></blockquote><div><br></div><div>Details like these are why we really should address <a href="https://github.com/sg16-unicode/sg16/issues/48" rel="noreferrer noreferrer" target="_blank">https://github.com/sg16-unicode/sg16/issues/48</a></div><div>instead of doing point solutions for every single issue.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><div>* Unicode (to which we have a normative reference) defines uppercase, and we follow that, but we happen to only ever apply it to the basic source character set because of the above rewriting.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr"><div><div>I have a slight preference for restricting to just A-Z so that it doesn't require humans or tools to consult the unicode data tables to decide if an identifier is safe to use.</div></div></div></div></div></blockquote><div><br></div><div>Regardless of how we express the rule, I agree with this direction.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr"><div><div>Proposed resolution:</div><div><br></div><div>Replace [lex.names]/3.2 with:</div><div><br></div><div>Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase <del>letter</del><ins><i>nondigit</i></ins> is reserved 
to the implementation for any use.</div></div></div></div></div></blockquote><div><br></div><div>... and I think this is a fine wording improvement, whether or not we think it's formally necessary.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div dir="ltr"><div><div>Alternatively we could either create a new grammar production for uppercase <i>nondigit</i>s, or just say something like "one of the universal characters in the range 0041-005A (A-Z)"</div><div><br></div><div><br></div></div></div>
</div></div>
_______________________________________________<br>
Core mailing list<br>
<a href="mailto:Core@lists.isocpp.org" rel="noreferrer noreferrer" target="_blank">Core@lists.isocpp.org</a><br>
Subscription: <a href="https://lists.isocpp.org/mailman/listinfo.cgi/core" rel="noreferrer noreferrer noreferrer" target="_blank">https://lists.isocpp.org/mailman/listinfo.cgi/core</a><br>
Link to this post: <a href="http://lists.isocpp.org/core/2019/10/7541.php" rel="noreferrer noreferrer noreferrer" target="_blank">http://lists.isocpp.org/core/2019/10/7541.php</a><br>
</blockquote></div></div>
</blockquote></div></div>
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" rel="noreferrer noreferrer" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer noreferrer noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote></div></div>
</blockquote></div>
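<div dir="auto"><br></div><div dir="auto">For anyone who wants to see the quoted proposed resolution as code, here is a minimal sketch of the reservation rule restricted to A-Z. The helper names is_basic_uppercase and is_reserved_identifier are hypothetical, not from any paper or implementation, and the sketch assumes an ASCII-compatible encoding in which A-Z are contiguous. It works on the spelling after universal-character-names have been formed, so _\u00c7 is an underscore followed by a backslash and is not reserved under this reading.</div>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true only for the basic Latin letters A-Z
// (U+0041..U+005A). Assumes an ASCII-compatible encoding where the
// letters A-Z are contiguous.
constexpr bool is_basic_uppercase(char c) {
    return c >= 'A' && c <= 'Z';
}

// Hypothetical helper sketching the rule from the quoted proposed
// resolution: an identifier is reserved if it contains a double
// underscore, or begins with an underscore followed by one of A-Z.
constexpr bool is_reserved_identifier(std::string_view id) {
    if (id.size() >= 2 && id[0] == '_' && is_basic_uppercase(id[1])) {
        return true;
    }
    for (std::size_t i = 0; i + 1 < id.size(); ++i) {
        if (id[i] == '_' && id[i + 1] == '_') {
            return true;
        }
    }
    return false;
}

// Compile-time checks of the cases discussed in the thread.
static_assert(is_reserved_identifier("_Abc"));      // underscore + A-Z
static_assert(is_reserved_identifier("a__b"));      // double underscore
static_assert(!is_reserved_identifier("_abc"));     // lowercase after _
static_assert(!is_reserved_identifier("_\\u00c7")); // _ followed by a backslash
```

<div dir="auto">The point of restricting to A-Z is visible here: the check needs no Unicode data tables, only a range comparison on the basic source character set.</div>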