[SG16-Unicode] P1208R3 / source_location

Axel Naumann Axel.Naumann at cern.ch
Mon Feb 18 18:51:49 CET 2019


Hi Tom,

> In San Diego, the guidance we gave for the stacktrace proposal is that
> file names are implementation-defined bags-of-bytes.

How does that compare to

>> "Specify that the strings are
>>     first formed using the basic source character set (with
>>     universal-character-names as necessary) then mapped in the manner
>>     applied to string literals with no encoding prefix in phases 5
>>     and 6 of translation."

Or in other words, what does the wording look like for SG16's guidance?

Axel.

On 18.02.19 07:16, Tom Honermann wrote:
> 
> On Feb 18, 2019, at 10:04 AM, Corentin <corentin.jabot at gmail.com> wrote:
> 
>>
>> Very good points. 
>> Wouldn't it be sufficient to specify that the strings are NTMBS
>> encoded using the execution character set?
>> source_location currently avoids making any assumption about how these
>> strings are formed, including that they are derived from a source file.
>> Since the value is implementation-defined, so should be the way
>> it's constructed. 
>> However, it is reasonable to assume that these things are valid text
>> and therefore have a known encoding.
>>
>> Adding Tom, because this is borderline SG16 territory. 
> 
> This isn’t borderline, as we have (recently) requested review of anything
> involving file names. 
> 
>>
>>
>> @Tom: Do you want to see source_location this week, knowing that I'd
>> hope it gets through LWG before the end of the week?
>> Or do you think having function_name / file_name as multibyte strings
>> encoded using the execution character set is reasonable?
>> The alternatives I see are:
>>
>>   * Leave it unspecified
>>   * Force a specific character set... which the world is not ready for
>
> I think there is a higher-level question to answer: are the provided
> file names display only, or should one expect to be able to open the
> file using the provided name?
> 
> If they are display only, then we can specify an encoding for them
> similarly to what is done for member functions of std::filesystem::path.
> In this case, we must explicitly acknowledge that the names do not
> roundtrip through the filesystem (though typically will in practice).
> Note that, on Windows, file names cannot be represented accurately using
> char-based strings, so unless we want to add wchar_t support, these
> names will technically be display only. 
> 
> If they may be used for more than display, then we can’t associate an
> encoding and the names are bags-of-bytes, since POSIX file names are
> arbitrary byte sequences. But then we need wchar_t support for Windows. 
> 
> In San Diego, the guidance we gave for the stacktrace proposal is that
> file names are implementation-defined bags-of-bytes. If we advised
> otherwise for source location, we would be giving inconsistent guidance. 
> 
> I think we should discuss this in SG16 this week. Not necessarily to
> propose changes for the proposal, but to solidify our collective
> thinking around file names. 
> 
> Tom. 
>>
>> Thanks, 
>> Corentin
>>
>>
>>
>> On Mon, 18 Feb 2019 at 03:56 Axel Naumann <Axel.Naumann at cern.ch> wrote:
>>
>>     Hi Robert,
>>
>>     Regarding your P1208R3:
>>
>>     Nit: it's titled "D1208R3", and it doesn't mention email addresses.
>>
>>     Not-so-nit: an NB comment on the reflection TS asks to use NTMBS
>>     rather than NTBS, noting: "Where NTBS is mentioned in the document
>>     under ballot, the encoding used for the string’s value is
>>     unspecified." Jens agrees that the proposed solution should be
>>     applied: "Specify that the strings are first formed using the basic
>>     source character set (with universal-character-names as necessary)
>>     then mapped in the manner applied to string literals with no
>>     encoding prefix in phases 5 and 6 of translation."
>>
>>     I would very much hope that both changes are also applied to
>>     P1208R3. I call this out explicitly in our recommended NB comment
>>     response paper.
>>
>>     Cheers, Axel.
>>

