<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 07/06/2018 05:37 PM, Hubert Tong
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACvkUqawz8er5ToDEYn57+T8RxKSeJgkPfL7be3w1jA__iTtXw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Fri, Jul 6, 2018 at 5:31 PM, Tom
Honermann <span dir="ltr">&lt;<a
href="mailto:tom@honermann.net" target="_blank"
moz-do-not-send="true">tom@honermann.net</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">On 07/06/2018 05:16 PM, Hubert Tong wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
I am wondering if accepting U+(4-6 hex digits) in
\N{...} as Perl does can be considered.<br>
</blockquote>
<br>
</span>
It certainly can be, but what is the motivation given that
we already have \u and \U? Why is supporting both \u1234
and \N{U+1234} helpful?<span class="HOEnZb"><font
color="#888888"><br>
</font></span></blockquote>
Do stylistic choices count? I happen to like naming Unicode
characters as U+NNNN.<br>
</div>
</div>
</div>
</blockquote>
<br>
Certainly! Getting everyone to agree on a stylistic choice is
always fun though ;)<br>
<br>
<blockquote type="cite"
cite="mid:CACvkUqawz8er5ToDEYn57+T8RxKSeJgkPfL7be3w1jA__iTtXw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote"><br>
There is also a possible semantic difference to explore
between \u/\U and \N{U+...}:<br>
</div>
<div class="gmail_quote">The \N form should certainly require
that a character is assigned in Unicode; however, I think
assigning a more "raw" meaning to \u/\U could make sense.</div>
</div>
</div>
</blockquote>
<br>
I think you might be on to something here. Martinho was recently
lamenting the following wording from [lex.ccon]p9
(<a class="moz-txt-link-freetext" href="http://eel.is/c++draft/lex.ccon#9">http://eel.is/c++draft/lex.ccon#9</a>):<br>
<br>
&gt; A <i>universal-character-name</i> is translated to the
encoding, in the appropriate
execution character set, of the character named. <b>If there is no
such
encoding, the <i>universal-character-name</i> is
translated to an
implementation-defined
encoding.</b> ...<br>
<br>
Specifically, he observed that translation to some
implementation-defined representation (presumably some replacement
character) is actively harmful. Making such mappings ill-formed would
catch problems that can, and should, be diagnosed at compile time. We
could, of course, consider a change to the wording above, but that
would affect backward compatibility. Your suggestion of
different semantics could allow us to retain the current
implementation-defined behavior for \u1234, but make \N{U+1234}
ill-formed if the target encoding can't represent U+1234. Good
justification for your stylistic preference? ;)<br>
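<br>
To make the distinction concrete, here is a rough sketch that emulates the two behaviors with ordinary constexpr functions. It is only an illustration: \N{U+1234} is not existing syntax, the names encode_lenient and encode_strict are hypothetical, and Latin-1 is assumed as the target encoding purely to keep the representability check trivial.<br>
<br>

```cpp
// Hypothetical sketch; none of these names come from the standard.
using byte = unsigned char;

// Assumption: the execution character set is ISO-8859-1 (Latin-1),
// so a code point is representable exactly when it is at most U+00FF.
constexpr bool representable_in_latin1(char32_t cp) { return cp <= 0xFF; }

// Mimics the current \u1234 rule: an unrepresentable code point is
// silently mapped to a substitute ('?' here, standing in for whatever
// an implementation might choose).
constexpr byte encode_lenient(char32_t cp) {
    return representable_in_latin1(cp) ? byte(cp) : byte('?');
}

// Mimics the suggested \N{U+1234} rule: reaching the throw during
// constant evaluation makes the expression non-constant, so the
// compiler rejects it.
constexpr byte encode_strict(char32_t cp) {
    return representable_in_latin1(cp)
        ? byte(cp)
        : throw "code point not representable in Latin-1";
}

static_assert(encode_lenient(0x1234) == '?', "silently substituted");
static_assert(encode_strict(0xE9) == 0xE9, "representable, accepted");

// Uncommenting the next line should fail to compile, which is the
// diagnostic the strict form is meant to provide:
// constexpr byte bad = encode_strict(0x1234);
```

<br>
The lenient form hides the problem until someone notices a stray '?' in the output, whereas the strict form surfaces it when the code is built.<br>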
<br>
Tom.<br>
</body>
</html>