[SG16-Unicode] NL 029 : Disallow zero-width and control characters
Corentin
corentin.jabot at gmail.com
Fri Oct 25 08:58:37 CEST 2019
On Fri, Oct 25, 2019, 02:18 Zach Laine <whatwasthataddress at gmail.com> wrote:
> Is this a real problem that is biting people right now? Are people using
> these characters in identifiers and causing great upheaval? This seems of
> the lowest possible priority to me, and not at all C++20-related.
>
Completely agree with both of you.
I would be deeply unsatisfied with a solution that would:
* Not follow TR31 recommendations
* Not address the fact that you can only have Unicode identifiers if the
compiler knows that your file id
>
> Zach
>
> On Thu, Oct 24, 2019 at 5:25 PM Steve Downey <sdowney at gmail.com> wrote:
>
>> SG16 has an NB comment to deal with! Tom has already scheduled it for
>> Belfast. It's basically that the list of allowed code points has some
>> interesting control characters like zero-width joiners and RTL modifiers.
>>
>> https://github.com/cplusplus/nbballot/issues/28
>>
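To make the kind of code point in question concrete, here is a minimal
sketch that flags a handful of well-known invisible or directional format
characters. The function names are hypothetical and the list is only
illustrative; it is not the list from the NB comment, nor a complete
inventory of such characters.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true for a few well-known invisible or
// directional format characters (illustrative, not exhaustive).
constexpr bool is_invisible_format_control(char32_t c) {
    return (c >= 0x200B && c <= 0x200F)   // ZWSP, ZWNJ, ZWJ, LRM, RLM
        || (c >= 0x202A && c <= 0x202E)   // bidi embeddings and overrides
        || c == 0x2060                    // WORD JOINER
        || (c >= 0x2066 && c <= 0x2069)   // bidi isolates
        || c == 0xFEFF;                   // ZERO WIDTH NO-BREAK SPACE
}

// Index of the first flagged code point in id, or npos if there is none.
constexpr std::size_t find_invisible(std::u32string_view id) {
    for (std::size_t i = 0; i < id.size(); ++i)
        if (is_invisible_format_control(id[i])) return i;
    return std::u32string_view::npos;
}
```

A blanket rejection along these lines is the simple reading of the comment.
It would also reject identifiers that legitimately need a joiner, which is
the complication raised further down.
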
>> There's also an issue that JF raised earlier:
>> https://github.com/sg16-unicode/sg16/issues/48
>> Improve support for Unicode characters in identifiers
>>
>> Relevant Unicode standard:
>> https://unicode.org/reports/tr31/ UNICODE IDENTIFIER AND PATTERN SYNTAX
>>
>> Which is complicated because it allows things like identifiers written in
>> Farsi, which requires ZWJ for disambiguation, and suggests regexes to detect
>> particular allowed identifiers. It's fairly dense, and I haven't digested
>> it yet, but it looks like there might be allowed ways to exclude those
>> characters.
>>
>> Plus tailoring would be needed because C++ disallows some characters such
>> as '$' which might otherwise be allowed. This is also discussed in TR31.
>>
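As a rough illustration of the TR31 shape, the default identifier syntax
with no medial characters reduces to <Start> <Continue>*. The sketch below
assumes ASCII-only stand-ins for the XID_Start and XID_Continue properties
(a real implementation would consult the Unicode Character Database) and a
hypothetical profile that adds '_' as a start character while leaving '$'
out. All names are made up for the example.

```cpp
#include <cstddef>
#include <string_view>

// ASCII-only stand-ins for XID_Start / XID_Continue (assumption: real
// code would look these properties up in the Unicode Character Database).
constexpr bool xid_start_ascii(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}
constexpr bool xid_continue_ascii(char32_t c) {
    return xid_start_ascii(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

// Hypothetical tailoring: '_' may also start an identifier; '$' is
// allowed in neither position.
constexpr bool profile_start(char32_t c) {
    return xid_start_ascii(c) || c == U'_';
}
constexpr bool profile_continue(char32_t c) {
    return xid_continue_ascii(c);
}

// <Identifier> := <Start> <Continue>*
constexpr bool is_identifier(std::u32string_view s) {
    if (s.empty() || !profile_start(s.front())) return false;
    for (std::size_t i = 1; i < s.size(); ++i)
        if (!profile_continue(s[i])) return false;
    return true;
}
```

With this shape, tailoring is confined to the two profile functions. The
context-dependent treatment of joiners that TR31 describes is not modeled
here.
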
>>
>> My feeling on the comment is that it's not a new issue for C++20, so it's
>> not clear that it has to be fixed for C++20. I believe it should be fixed,
>> but it ought to be fixed in a principled manner, and that likely means
>> TR31.
>>
>> We would also have to discuss whether emoji are allowed in identifiers. TR31
>> does not strictly disallow them. The TonyTable shall be interesting.
>>
>>
>>
>> _______________________________________________
>> SG16 Unicode mailing list
>> Unicode at isocpp.open-std.org
>> http://www.open-std.org/mailman/listinfo/unicode
>>