[SG16-Unicode] Feedback on P1097R1: U+NNNNNN syntax

Tom Honermann tom at honermann.net
Sat Jul 7 02:33:54 CEST 2018


On 07/06/2018 05:37 PM, Hubert Tong wrote:
> On Fri, Jul 6, 2018 at 5:31 PM, Tom Honermann <tom at honermann.net 
> <mailto:tom at honermann.net>> wrote:
>
>     On 07/06/2018 05:16 PM, Hubert Tong wrote:
>
>         I am wondering if accepting U+(4-6 hex digits) in \N{...} as
>         Perl does can be considered.
>
>
>     It certainly can be, but what is the motivation given that we
>     already have \u and \U?  Why is supporting both \u1234 and
>     \N{U+1234} helpful?
>
> Do stylistic choices count? I happen to like naming Unicode characters 
> as U+NNNN.

Certainly!  Getting everyone to agree on a stylistic choice is always 
fun though ;)

>
> There is also a possible semantic difference to explore between \u/\U 
> and \N{U+...}:
> The \N form should certainly require that a character is assigned in 
> Unicode; however, I think assigning a more "raw" meaning to \u/\U 
> could make sense.

I think you might be on to something here.  Martinho was recently 
lamenting the following wording from [lex.ccon]p9 
(http://eel.is/c++draft/lex.ccon#9):

 > A /universal-character-name/ is translated to the encoding, in the
appropriate execution character set, of the character named. *If there
is no such encoding, the /universal-character-name/ is translated to
an implementation-defined encoding.* ...

Specifically, he observed that translation to some
implementation-defined representation (presumably some replacement
character) is actively harmful.  Making such mappings ill-formed would catch problems 
that can, and should, be diagnosed at compile-time.  We could, of 
course, consider a change to the wording above, but that would have 
backward compatibility impact.  Your suggestion of different semantics 
could allow us to retain the current implementation-defined behavior for 
\u1234, but make \N{U+1234} ill-formed if the target encoding can't 
represent U+1234.  Good justification for your stylistic preference? ;)
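
To make the two behaviors concrete, here is a rough sketch in today's C++. It assumes, purely for illustration, an ISO-8859-1 execution character set and a '?' replacement character; the names representable_in_latin1, lenient_latin1 and strict_latin1 are hypothetical and are not proposed wording or any compiler's actual behavior. The lenient function stands in for the current implementation-defined handling of \u1234, and the strict template stands in for what \N{U+1234} could do.

```cpp
// Assumption: the execution character set is ISO-8859-1, so only
// code points U+0000..U+00FF have an encoding.
constexpr bool representable_in_latin1(char32_t cp) {
    return cp <= 0xFF;
}

// Stand-in for the current \u/\U behavior: an unrepresentable character
// is silently mapped to an implementation-defined encoding ('?' here).
constexpr unsigned char lenient_latin1(char32_t cp) {
    return representable_in_latin1(cp) ? static_cast<unsigned char>(cp)
                                       : static_cast<unsigned char>('?');
}

// Stand-in for the suggested \N{U+...} behavior: an unrepresentable
// character makes the program ill-formed, so it is caught at compile time.
template <char32_t CP>
constexpr unsigned char strict_latin1() {
    static_assert(representable_in_latin1(CP),
                  "code point has no encoding in the execution character set");
    return static_cast<unsigned char>(CP);
}
```

With this sketch, lenient_latin1(U'\u1234') quietly yields '?', while strict_latin1<U'\u1234'>() fails to compile; strict_latin1<U'\u00E9'>() is accepted because U+00E9 is representable in the assumed encoding.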

Tom.