[SG16-Unicode] P1208R3 / source_location

Axel Naumann Axel.Naumann at cern.ch
Wed Feb 20 01:34:26 CET 2019


Hi,

I believe this awfulness reflects reality.

Use ASCII printable characters and all will be fine? :)

Axel.

On 19.02.19 14:31, Robert Douglas wrote:
> So filename and functionname would necessarily have different
> encodings? Does that not seem awful?
>
> On Tue, Feb 19, 2019, 6:25 PM Axel Naumann <Axel.Naumann at cern.ch
> <mailto:Axel.Naumann at cern.ch>> wrote:
>
>     Thanks everyone, this is what I'll take to Core.
>     Axel.
>
>     On 19.02.19 13:58, Corentin wrote:
>>     After talking with Tom, I'd like to modify function_name to be a
>>     NTMBS as it is something we can actually guarantee and I don't
>>     think __func__ should constrain the design of source location. It
>>     would be consistent with the TS and satisfy the NB comment (whose
>>     resolution was adopted in that direction this morning).
>>
>>     Tom convinced me that filename cannot and should not be a NTMBS
>>
>>
>>     On Tue, 19 Feb 2019 at 13:22 Robert Douglas <rwdougla at gmail.com
>>     <mailto:rwdougla at gmail.com>> wrote:
>>
>>         Agree.
>>
>>         On Tue, Feb 19, 2019 at 5:17 PM Tom Honermann
>>         <tom at honermann.net <mailto:tom at honermann.net>> wrote:
>>
>>             On 2/18/19 1:17 PM, Robert Douglas wrote:
>>>             Historical footnote, these are intended to be as drop-in
>>>             as possible for existing facilities. __FILE__ is a
>>>             "character string literal," which gets its null
>>>             termination in phase 7. Since we are accessing these at
>>>             run-time, we should thus expect these to be NTBS.
>>>             Changes to this expectation would be a deviation from
>>>             these being a drop-in replacement for __FILE__ and
>>>             __func__. Note that [dcl.fct.def.general] p8 defines
>>>             __func__ as an implementation-defined string, as if by
>>>             static const char __func__[] = "function-name ";
>>>             which also implies an NTBS. This is the reasoning for
>>>             NTBS. Doing otherwise would make this feature deviate
>>>             from __FILE__ and __func__, which it is designed to
>>>             replace.
>>
>>             Agreed.  Certainly guaranteeing that these have a null
>>             terminator is required given that file_name() returns
>>             const char*.  I don't agree with associating these with
>>             NTMBSs though since multi-byte has encoding implications.
>>
>>             Tom.
>>
>>>
>>>
>>>             On Mon, Feb 18, 2019 at 11:20 AM Corentin
>>>             <corentin.jabot at gmail.com
>>>             <mailto:corentin.jabot at gmail.com>> wrote:
>>>
>>>                 Quick reply: display only, with no expectation that
>>>                 the file can be opened, or exists, or is a file at
>>>                 all. It's purely informative. But there is an
>>>                 expectation that it can be displayed, the main use
>>>                 case being logging. Otherwise I agree with you.
>>>
>>>                 On Mon, Feb 18, 2019, 7:16 AM Tom Honermann
>>>                 <tom at honermann.net <mailto:tom at honermann.net>> wrote:
>>>
>>>
>>>                     On Feb 18, 2019, at 10:04 AM, Corentin
>>>                     <corentin.jabot at gmail.com
>>>                     <mailto:corentin.jabot at gmail.com>> wrote:
>>>
>>>>
>>>>                     Very good points. 
>>>>                     Wouldn't it be sufficient to specify that the
>>>>                     strings are NTMBS encoded using the execution
>>>>                     character set?
>>>>                     source_location currently avoids making any
>>>>                     assumption about how these strings are formed,
>>>>                     including that they are derived from a source file.
>>>>                     So since the value is implementation-defined,
>>>>                     so should be the way it's constructed. 
>>>>                     However, it is reasonable to assume that these
>>>>                     things are valid text and therefore have a
>>>>                     known encoding.
>>>>
>>>>                     Adding Tom, because this is borderline SG16
>>>>                     territory. 
>>>
>>>                     This isn’t borderline as we have (recently)
>>>                     requested review of anything involving file names. 
>>>
>>>>
>>>>
>>>>                     @Tom: Do you want to see source_location this
>>>>                     week knowing that I'd hope it would get through
>>>>                     LWG before the end of the week?
>>>>                     Or do you think having function_name / filename
>>>>                     as multi-byte strings encoded using the
>>>>                     execution character set is reasonable?
>>>>                     The alternatives I see are:
>>>>
>>>>                       * Leave it unspecified
>>>>                       * Force a specific character set... which the
>>>>                         world is not ready for
>>>>
>>>                     I think there is a higher level question to
>>>                     answer. Are the provided file names display
>>>                     only, or should one expect to be able to open
>>>                     the file using the provided name?
>>>
>>>                     If they are display only, then we can specify an
>>>                     encoding for them similarly to what is done for
>>>                     member functions of std::filesystem::path. In
>>>                     this case, we must explicitly acknowledge that
>>>                     the names do not roundtrip through the
>>>                     filesystem (though typically will in practice).
>>>                     Note that, on Windows, file names cannot be
>>>                     represented accurately using char based strings,
>>>                     so unless we want to add wchar_t support, these
>>>                     names will be technically display only. 
>>>
>>>                     If they are potentially not display only, then
>>>                     we can’t associate an encoding and the names are
>>>                     bags-of-bytes. This is a limitation of POSIX.
>>>                     But then we need wchar_t support for Windows. 
>>>
>>>                     In San Diego, the guidance we gave for the
>>>                     stacktrace proposal is that file names are
>>>                     implementation-defined bags-of-bytes. If we
>>>                     advised otherwise for source location, we would
>>>                     be giving inconsistent guidance. 
>>>
>>>                     I think we should discuss this in SG16 this
>>>                     week. Not necessarily to propose changes for the
>>>                     proposal, but to solidify our collective
>>>                     thinking around file names. 
>>>
>>>                     Tom. 
>>>>
>>>>                     Thanks, 
>>>>                     Corentin
>>>>
>>>>
>>>>
>>>>                     On Mon, 18 Feb 2019 at 03:56 Axel Naumann
>>>>                     <Axel.Naumann at cern.ch
>>>>                     <mailto:Axel.Naumann at cern.ch>> wrote:
>>>>
>>>>                         Hi Robert,
>>>>
>>>>                         Regarding your P1208R3:
>>>>
>>>>                         Nit: it's titled "D1208R3", it doesn't
>>>>                         mention email addresses.
>>>>
>>>>                         Not-so-nit: a NB comment on the reflection
>>>>                         TS asks to not use NTBS but
>>>>                         NTMBS and "Where NTBS is mentioned in the
>>>>                         document under ballot, the
>>>>                         encoding used for the string’s value is
>>>>                         unspecified." Jens agrees that
>>>>                         the proposed solution should be applied:
>>>>                         "Specify that the strings are
>>>>                         first formed using the basic source
>>>>                         character set (with
>>>>                         universal-character-names as necessary)
>>>>                         then mapped in the manner
>>>>                         applied to string literals with no encoding
>>>>                         prefix in phases 5 and 6 of
>>>>                         translation."
>>>>
>>>>                         I would very much hope that both changes
>>>>                         are also applied to P1208R3. I
>>>>                         call this out explicitly in our recommended
>>>>                         NB comment response paper.
>>>>
>>>>                         Cheers, Axel.
>>>>
>>
>
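
For readers following the NTBS argument quoted above, here is a minimal
sketch, assuming a hosted C++11-or-later compiler, of what "__FILE__ is a
string literal" and "__func__ is as if a static const char array" mean in
practice. The helper names are hypothetical and are not part of P1208. The
text of __func__ is implementation-defined, so only its null termination is
relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// __FILE__ expands to a character string literal, so sizeof counts the
// terminating NUL that is appended in translation phase 7.
// Hypothetical helper name, for illustration only.
bool file_literal_is_null_terminated() {
    return sizeof(__FILE__) == std::strlen(__FILE__) + 1;
}

// __func__ behaves as if declared
//     static const char __func__[] = "function-name ";
// so it is an array of const char and can be handed to strlen like any NTBS.
// Its contents are implementation-defined.
std::size_t length_of_own_name() {
    const char* name = __func__;  // array-to-pointer decay
    return std::strlen(name);
}

// Neither check says anything about the encoding of the bytes, only that a
// terminator is present.
void check_null_termination() {
    assert(file_literal_is_null_terminated());
    (void)length_of_own_name();  // must terminate; the value is not specified
}
```

This only shows the "null-terminated" half of the discussion. Whether the
bytes are in any particular multi-byte encoding is exactly what the thread
leaves open for file names.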

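The "display only" and "bags-of-bytes" positions quoted above, together with
the suggestion to stick to printable ASCII, can be combined in a logging
helper. This is a rough sketch under stated assumptions, not anything the
proposal specifies: escape_for_display and format_location are hypothetical
names, and the \xNN escaping scheme is an arbitrary choice made for this
illustration. The output is meant for display only and is not meant to be
parsed back into a file name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of P1208): renders a null-terminated byte
// string for display without assuming any encoding. Printable ASCII bytes
// (0x20..0x7E) pass through; every other byte becomes a \xNN escape.
std::string escape_for_display(const char* s) {
    assert(s != nullptr);
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (; *s != '\0'; ++s) {
        const unsigned char c = static_cast<unsigned char>(*s);
        if (c >= 0x20 && c <= 0x7E) {
            out += static_cast<char>(c);
        } else {
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical formatting function. The arguments are plain const char*, so
// they could come from __FILE__ / __func__ or from the file_name() and
// function_name() accessors discussed in this thread.
std::string format_location(const char* file, unsigned line,
                            const char* function) {
    return escape_for_display(file) + ":" + std::to_string(line) + " (" +
           escape_for_display(function) + ")";
}
```

A call such as format_location(__FILE__, __LINE__, __func__) works with
existing facilities. Treating the input as opaque bytes avoids taking a
position on whether the strings are NTBS or NTMBS.
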