[SG16-Unicode] P1208R3 / source_location

Tom Honermann tom at honermann.net
Wed Feb 20 00:17:07 CET 2019


On 2/18/19 1:17 PM, Robert Douglas wrote:
> Historical footnote: these are intended to be as close to drop-in as
> possible for existing facilities. __FILE__ is a "character string
> literal," which gets its null terminator in phase 7. Since we are
> accessing these at run time, we should therefore expect these to be
> NTBSs. Changing this expectation would stop them from being drop-in
> replacements for __FILE__ and __func__. Note that
> [dcl.fct.def.general] p8 specifies that __func__ behaves as if
> declared static const char __func__[] = "function-name"; (the string
> itself is implementation-defined), which also implies an NTBS. This is
> the reasoning for NTBS. Doing otherwise would make this feature deviate
> from __FILE__ and __func__, which it is designed to replace.

Agreed. Guaranteeing that these have a null terminator is certainly
required, given that file_name() returns const char*. I don't agree with
associating these with NTMBSs, though, since "multibyte" has encoding
implications.
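A minimal sketch of the NTBS property discussed above, using only the pre-existing __FILE__ and __func__ facilities (std::source_location itself is C++20 and is left out so the sketch compiles under older standards). The helper names holds_single_ntbs, current_function_name, and check_ntbs_properties are hypothetical. Because __FILE__ expands to a narrow string literal, sizeof(__FILE__) counts the terminating null, and __func__ behaves as a static const char array, so both can be handed to any API expecting a const char* NTBS. The checks say nothing about the encoding of the bytes, which is the distinction between NTBS and NTMBS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the char array holds exactly one NTBS, i.e.
// its final element is '\0' and no earlier element is.
template <std::size_t N>
bool holds_single_ntbs(const char (&s)[N]) {
    for (std::size_t i = 0; i + 1 < N; ++i) {
        if (s[i] == '\0') return false;  // embedded terminator
    }
    return s[N - 1] == '\0';
}

// __func__ behaves "as if" declared static const char __func__[] = "...";
// the returned pointer refers to a null-terminated array with static storage.
inline const char* current_function_name() {
    return __func__;
}

// Checks the properties relied on when these values are exposed as const char*.
inline void check_ntbs_properties() {
    // __FILE__ is a narrow string literal: sizeof includes the terminator.
    assert(std::strlen(__FILE__) + 1 == sizeof(__FILE__));
    assert(holds_single_ntbs(__FILE__));
    // The same holds for __func__ inside this function.
    assert(std::strlen(__func__) + 1 == sizeof(__func__));
    assert(holds_single_ntbs(__func__));
}
```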

Tom.

>
>
> On Mon, Feb 18, 2019 at 11:20 AM Corentin <corentin.jabot at gmail.com>
> wrote:
>
>     Quick reply: display only. There is no expectation that the file can
>     be opened, exists, or is even a file; it's purely informative. There
>     is, however, an expectation that it can be displayed, the main use
>     case being logging. Otherwise, I agree with you.
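A minimal sketch of the display-only, logging-oriented use described above, written against the pre-C++20 facilities so it compiles under older standards. The call_site struct, the CURRENT_CALL_SITE macro, and format_log_line are hypothetical names. With std::source_location, the usual idiom is a function parameter defaulted to std::source_location::current(), which removes the need for the macro.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record of a call site, mirroring the accessors that
// source_location exposes (file_name, line, function_name).
struct call_site {
    const char* file_name;      // NTBS; treated as opaque bytes for display
    unsigned line;
    const char* function_name;  // NTBS; implementation-defined content
};

// Pre-C++20 capture at the point of expansion; must be used inside a function
// because __func__ is only defined in function bodies.
#define CURRENT_CALL_SITE() \
    (call_site{__FILE__, static_cast<unsigned>(__LINE__), __func__})

// Formats "file:line: function: message" purely for display. Nothing here
// opens the file or requires that it exist.
inline std::string format_log_line(const call_site& where,
                                   const std::string& message) {
    assert(where.file_name != nullptr && where.function_name != nullptr);
    std::ostringstream out;
    out << where.file_name << ':' << where.line << ": "
        << where.function_name << ": " << message;
    return out.str();
}
```

Because the strings are only written to a stream, the sketch assumes nothing about their encoding beyond null termination.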
>
>     On Mon, Feb 18, 2019, 7:16 AM Tom Honermann <tom at honermann.net>
>     wrote:
>
>
>         On Feb 18, 2019, at 10:04 AM, Corentin
>         <corentin.jabot at gmail.com> wrote:
>
>>
>>         Very good points.
>>         Wouldn't it be sufficient to specify that the strings are
>>         NTMBS encoded using the execution character set?
>>         source_location currently avoids making any assumption about
>>         how these strings are formed, including that they are derived
>>         from a source file.
>>         Since the value is implementation-defined, the way it is
>>         constructed should be as well.
>>         However, it is reasonable to assume that these things are
>>         valid text and therefore have a known encoding.
>>
>>         Adding Tom, because this is borderline SG16 territory.
>
>         This isn’t borderline, as we have (recently) requested review
>         of anything involving file names.
>
>>
>>
>>         @Tom: Do you want to see source_location this week, knowing
>>         that I hope it will get through LWG before the end of the
>>         week?
>>         Or do you think having function_name / file_name as
>>         multibyte strings encoded using the execution character set
>>         is reasonable?
>>         The alternatives I see are:
>>
>>           * Leave it unspecified
>>           * Force a specific character set... which the world is not
>>             ready for
>>
>         I think there is a higher level question to answer. Are the
>         provided file names display only, or should one expect to be
>         able to open the file using the provided name?
>
>         If they are display only, then we can specify an encoding for
>         them similarly to what is done for member functions of
>         std::filesystem::path. In this case, we must explicitly
>         acknowledge that the names do not roundtrip through the
>         filesystem (though typically will in practice). Note that, on
>         Windows, file names cannot be represented accurately using
>         char-based strings, so unless we want to add wchar_t support,
>         these names will be technically display only.
>
>         If they are not necessarily display only, then we can’t
>         associate an encoding, and the names are bags-of-bytes. This is
>         a limitation of POSIX. But then we need wchar_t support for
>         Windows.
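If the names are bags-of-bytes with no associated encoding, a display-only consumer such as a logger still has to render them somehow. One possible approach, an assumption of this sketch rather than anything proposed in the thread, is to pass printable ASCII through unchanged and escape every other byte (and the backslash itself) as \xHH, so the output is unambiguous and no byte is dropped. escape_for_display is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders an NTBS of unknown encoding for display.
// Printable ASCII bytes other than '\\' pass through; every other byte
// becomes a \xHH escape, so the original bytes can always be recovered.
inline std::string escape_for_display(const char* bytes) {
    assert(bytes != nullptr);
    std::string out;
    for (const char* p = bytes; *p != '\0'; ++p) {
        const unsigned char b = static_cast<unsigned char>(*p);
        if (b >= 0x20 && b < 0x7F && b != '\\') {
            out += static_cast<char>(b);
        } else {
            char buf[5];  // "\xHH" plus terminator
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(b));
            out += buf;
        }
    }
    return out;
}
```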
>
>         In San Diego, the guidance we gave for the stacktrace proposal
>         is that file names are implementation-defined bags-of-bytes.
>         If we advised otherwise for source location, we would be
>         giving inconsistent guidance.
>
>         I think we should discuss this in SG16 this week. Not
>         necessarily to propose changes for the proposal, but to
>         solidify our collective thinking around file names.
>
>         Tom.
>>
>>         Thanks,
>>         Corentin
>>
>>
>>
>>         On Mon, 18 Feb 2019 at 03:56 Axel Naumann
>>         <Axel.Naumann at cern.ch> wrote:
>>
>>             Hi Robert,
>>
>>             Regarding your P1208R3:
>>
>>             Nit: it's titled "D1208R3" and doesn't mention email
>>             addresses.
>>
>>             Not-so-nit: an NB comment on the Reflection TS asks not to
>>             use NTBS but
>>             NTMBS and "Where NTBS is mentioned in the document under
>>             ballot, the
>>             encoding used for the string’s value is unspecified."
>>             Jens agrees that
>>             the proposed solution should be applied: "Specify that
>>             the strings are
>>             first formed using the basic source character set (with
>>             universal-character-names as necessary) then mapped in
>>             the manner
>>             applied to string literals with no encoding prefix in
>>             phases 5 and 6 of
>>             translation."
>>
>>             I would very much hope that both changes are also applied
>>             to P1208R3. I
>>             call this out explicitly in our recommended NB comment
>>             response paper.
>>
>>             Cheers, Axel.
>>


