[SG16-Unicode] [isocpp-lib] [isocpp-lib-ext] [time.duration.io] : Is stream insertion behavior locale dependent when Period::type is micro?

Tom Honermann tom at honermann.net
Thu Nov 7 12:37:38 CET 2019


On 11/7/19 11:23 AM, Billy O'Neal (VC LIBS) wrote:
>
> > The library doesn't need to assume.  An example implementation 
> (ignoring support for non-char types) could be: […]
>
> That does not do the correct thing because the locale on the target is 
> often not the locale when compiling. At compile time we usually 
> consider our ‘execution character set’ to be the ASCII subset for 
> maximum resistance to changes in locale at runtime, but the compiler 
> will generally pass through more strict settings if the user has set them.
>
This is exactly why the original wording I proposed stated that the 
result is unspecified if the run-time locale encoding is not compatible 
with the encoding used for the execution character set.
>
> > I think the Windows 10 comment is only relevant with respect to the 
> run-time locale and choice of encoding for the console/terminal.  
> Execution character set is independent of both of those.
>
> It is dependent on both of those in that the choice of execution 
> character set is constrained by the environment in which the program 
> will run.
>
Indeed.  But if a programmer compiles their code with 
/execution-charset:utf-8, it seems a clear indication that they intend 
to constrain the environment in which the program is run to one that 
supports UTF-8 (e.g., Windows 10, with UTF-8 ACP, and the new Windows 
Terminal).  I recognize that such a deployment target is an uncommon 
reality today, but that is a direction to be encouraged.
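
A minimal sketch of that compile-time choice follows, assuming only that 
the narrow literal "\u00B5" is encoded in the execution character set. 
The names literal_encoding_looks_like_utf8 and format_micros are 
hypothetical and exist only for illustration. This is not how any 
particular standard library implements the inserter, and it ignores 
non-char stream types.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical helper: true when the narrow literal for U+00B5 is the
// two-byte UTF-8 sequence C2 B5 (plus the terminating NUL). This inspects
// only the compile-time literal encoding, never the run-time locale.
constexpr bool literal_encoding_looks_like_utf8() {
    return sizeof("\u00B5") == 3 &&
           static_cast<unsigned char>("\u00B5"[0]) == 0xC2u &&
           static_cast<unsigned char>("\u00B5"[1]) == 0xB5u;
}

// Hypothetical helper: formats a microsecond duration with a suffix that is
// fixed when the program is compiled. What a terminal or codecvt facet does
// with the resulting bytes is outside its control.
// Note: on a compiler whose execution character set cannot represent
// U+00B5, the literal "\u00B5s" itself may be diagnosed or substituted.
template <class Rep>
std::string format_micros(std::chrono::duration<Rep, std::micro> d) {
    std::ostringstream os;
    os << d.count()
       << (literal_encoding_looks_like_utf8() ? "\u00B5s" : "us");
    return os.str();
}
```

Calling format_micros(std::chrono::microseconds(42)) would yield the 
digits "42" followed by either the UTF-8 bytes for "µs" or the ASCII 
fallback "us", depending solely on how the translation unit was compiled.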

Tom.

> Billy3
>
> ------------------------------------------------------------------------
> *From:* Tom Honermann <tom at honermann.net>
> *Sent:* Wednesday, November 6, 2019 10:58:17 PM
> *To:* Billy O'Neal (VC LIBS) <bion at microsoft.com>; 
> lib at lists.isocpp.org <lib at lists.isocpp.org>; Corentin 
> <corentin.jabot at gmail.com>
> *Cc:* C++ Library Evolution Working Group <lib-ext at lists.isocpp.org>; 
> unicode at isocpp.open-std.org <unicode at open-std.org>
> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] 
> [time.duration.io] : Is stream insertion behavior locale dependent 
> when Period::type is micro?
> On 11/6/19 10:20 PM, Billy O'Neal (VC LIBS) wrote:
>>
>> > That isn't what it (is intended to) say, nor how I read it.
>>
>> Then remove the qualifications about terminals or codecvt facets and 
>> talk only about the execution character set, and things are OK. (As 
>> Corentin’s PR does)
>>
>> > The intent of the wording was to allow Microsoft to use "µs" when 
>> the compiler is invoked with /execution-charset:utf-8 and to use "us" 
>> otherwise.
>>
>> Given that UTF-8 support is still a rarely used user opt-in at this 
>> time only available on recent versions of Windows 10, it isn’t an 
>> assumption the library is going to be able to make soon (i.e. the 
>> next decade)
>>
> The library doesn't need to assume.  An example implementation 
> (ignoring support for non-char types) could be:
>
> template<class traits, class Rep, class Period>
> void print_fancy_suffix(basic_ostream<char, traits>& os,
>                         const duration<Rep, Period>& d)
> {
>   static const char micro_sign[] = "\u00B5s";
>   if (as_unsigned(micro_sign[0]) == 0xC2u &&
>       as_unsigned(micro_sign[1]) == 0xB5u)
>   {
>     // execution character set smells like UTF-8.
>     os << d.count() << micro_sign;
>   } else {
>     // execution character set smells like bad.
>     os << d.count() << "us";
>   }
> }
>
> There are, of course, better ways to do this if the compiler has the 
> ability to inform the library what the execution character set really 
> is (e.g., a predefined macro).
>
> I'm not arguing for any particular choice on Microsoft's part.
>
> I think the Windows 10 comment is only relevant with respect to the 
> run-time locale and choice of encoding for the console/terminal.  
> Execution character set is independent of both of those.
>
> Tom.
>
>> Billy3
>>
>> ------------------------------------------------------------------------
>> *From:* Tom Honermann <tom at honermann.net>
>> *Sent:* Wednesday, November 6, 2019 5:38:34 PM
>> *To:* Billy O'Neal (VC LIBS) <bion at microsoft.com>; 
>> lib at lists.isocpp.org <lib at lists.isocpp.org>; Corentin 
>> <corentin.jabot at gmail.com>
>> *Cc:* C++ Library Evolution Working Group <lib-ext at lists.isocpp.org>; 
>> unicode at isocpp.open-std.org <unicode at open-std.org>
>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] 
>> [time.duration.io] : Is stream insertion behavior locale dependent 
>> when Period::type is micro?
>> On 11/6/19 5:30 PM, Billy O'Neal (VC LIBS) wrote:
>>>
>>> Corentin’s PR says “if char (the execution encoding) can always 
>>> represent µ for your implementation, use that. Otherwise use u.” 
>>> Which means that on my implementation, where char can’t always 
>>> represent such a thing because that is locale dependent, we will 
>>> statically use u (and µ for wchar_t); but an implementation that 
>>> assumes char is UTF-8 could use µ.
>>>
>>> The LWG issue’s PR says “if the stream can detect that it is 
>>> targeting a console or codecvt facet that doesn’t support µ, an 
>>> implementation may use u; otherwise it uses µ”. But streams have no 
>>> means of doing that detection. (And the answer can even change if 
>>> someone changes the streambuf.)
>>>
>> That isn't what it (is intended to) say, nor how I read it.  It 
>> states that the suffix is determined by the execution character set 
>> (the character set used for string literals and known at compile 
>> time); that is in the first sentence.  The second sentence 
>> acknowledges that if the native character set (the run-time locale 
>> dependent character set) lacks representation for the character, then 
>> all bets are off with regard to how the character is actually 
>> displayed (or converted by a codecvt facet).
>>
>> The intent of the wording was to allow Microsoft to use "µs" when the 
>> compiler is invoked with /execution-charset:utf-8 and to use "us" 
>> otherwise.
>>
>> Tom.
>>
>>> Billy3
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Tom Honermann <tom at honermann.net>
>>> *Sent:* Wednesday, November 6, 2019 5:14:18 PM
>>> *To:* Billy O'Neal (VC LIBS) <bion at microsoft.com>; 
>>> lib at lists.isocpp.org <lib at lists.isocpp.org>; Corentin 
>>> <corentin.jabot at gmail.com>
>>> *Cc:* C++ Library Evolution Working Group <lib-ext at lists.isocpp.org>; 
>>> unicode at isocpp.open-std.org <unicode at open-std.org>
>>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] 
>>> [time.duration.io] : Is stream insertion behavior locale dependent 
>>> when Period::type is micro?
>>> On 11/6/19 4:30 PM, Billy O'Neal (VC LIBS) wrote:
>>>>
>>>> > Please read the wording again. Note that it says that, if those 
>>>> conditions are true, then the result is unspecified.
>>>>
>>>> If “the wording” means the P/R of 
>>>> https://cplusplus.github.io/LWG/issue3314, 
>>>> the wording there implies that we must make some effort to 
>>>> determine that the condition is true, which in practice we cannot 
>>>> do because the interface between streams and streambufs is public.
>>>>
>>> Yes, that is the wording I meant.  The intent is to ensure the 
>>> implementation does *not* have to put forth such effort.  I don't 
>>> understand where such an implication is coming from, but that 
>>> wording has confused at least three experienced wordsmiths, so I 
>>> acknowledge there is an issue, but I don't understand what it is.
>>>
>>> I think it is important to say something here. Otherwise, one could 
>>> claim that the terminal failing to display "μs" because it is 
>>> configured for an incompatible encoding is non-conforming.  Well, to 
>>> the extent that the standard addresses such devices.
>>>
>>> Tom.
>>>
>>>> Corentin’s P/R below seems to not have this concern.
>>>>
>>>> Billy3
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Lib <lib-bounces at lists.isocpp.org> on behalf of Tom 
>>>> Honermann via Lib <lib at lists.isocpp.org>
>>>> *Sent:* Wednesday, November 6, 2019 1:12:48 PM
>>>> *To:* Corentin <corentin.jabot at gmail.com>
>>>> *Cc:* Tom Honermann <tom at honermann.net>; 
>>>> C++ Library Evolution Working Group <lib-ext at lists.isocpp.org>; 
>>>> Library Working Group <lib at lists.isocpp.org>; 
>>>> unicode at isocpp.open-std.org <unicode at open-std.org>
>>>> *Subject:* Re: [isocpp-lib] [SG16-Unicode] [isocpp-lib-ext] 
>>>> [time.duration.io] : Is stream insertion behavior locale dependent 
>>>> when Period::type is micro?
>>>> The intent of the wording is to say that implementors do *not* need 
>>>> to be aware of terminals or codecvt facets. Without this, the 
>>>> wording could be read that implementations must implement magic to 
>>>> make the character display correctly.
>>>>
>>>> Please read the wording again. Note that it says that, if those 
>>>> conditions are true, then the result is unspecified.
>>>>
>>>> Tom.
>>>>
>>>> On Nov 6, 2019, at 12:07 PM, Corentin <corentin.jabot at gmail.com> 
>>>> wrote:
>>>>
>>>>> Then I would just say associated execution encoding with charT
>>>>>
>>>>> Extremely uncomfortable with involving stream, console or anything 
>>>>> else not known at compile time
>>>>>
>>>>> On Wed, 6 Nov 2019 at 04:51, Tom Honermann <tom at honermann.net> 
>>>>> wrote:
>>>>>
>>>>>     On 11/6/19 8:30 AM, Howard Hinnant wrote:
>>>>>>     You can comment on the LWG issue (if you want) by emailing said comment to lwgchair at gmail.com, specifying which issue you wish to comment on and supplying the comment.
>>>>>>
>>>>>>     Howard
>>>>>>
>>>>>>     On Nov 5, 2019, at 10:32 PM, Corentin via Lib-Ext <lib-ext at lists.isocpp.org> wrote:
>>>>>>>     Not sure how to do that procedurally, but here is some alternative wording.
>>>>>>>     The "runtime" locale-tied encoding is *assumed to be* a superset of the execution encoding, to the extent that the standard doesn't distinguish between the two.
>>>>>>>
>>>>>>>
>>>>>>>     If Period::type is micro, but the <ins>abstract</ins> character <ins>µ , which has the universal character name </ins> U+00B5 cannot be represented in the <ins>execution</ins> encoding <del>used for</del><ins> associated with the character type </ins> charT, the unit suffix "us" is used instead of "µs".
>>>>>
>>>>>     Howard and I discussed the wording I proposed today and we're
>>>>>     now on the same page with regard to the intent.
>>>>>
>>>>>     With regard to Corentin's suggested wording above, "abstract
>>>>>     character" and "execution encoding" are not current terms in
>>>>>     the standard (well, the former is inherited from our reference
>>>>>     to the Unicode standard but is otherwise unused at present).
>>>>>     P1859R0
>>>>>     <http://wg21.link/p1859r0>
>>>>>     does intend to standardize new terminology, but we don't yet
>>>>>     have consensus for what the new terms should be named.  I
>>>>>     think we should avoid using candidate names until we have such
>>>>>     consensus.
>>>>>
>>>>>     Tom.
>>>>>
>>>>>>>>     On Mon, 4 Nov 2019 at 15:42, Tom Honermann via Lib-Ext <lib-ext at lists.isocpp.org> wrote:
>>>>>>>>     A new LWG issue was filed for this question today:
>>>>>>>>     - https://cplusplus.github.io/LWG/issue3314
>>>>>>>>
>>>>>>>>     This issue concerns the ostream inserters added for std::chrono::duration in C++20 and what the intended behavior is for a duration when period::type is micro.
>>>>>>>>
>>>>>>>>     [time.duration.io]p4 states:
>>>>>>>>
>>>>>>>>
>>>>>>>>>     If Period::type is micro, but the character U+00B5 cannot be represented in the encoding used for charT, the unit suffix "us" is used instead of "μs".
>>>>>>>>>
>>>>>>>>     The question is with regard to which one of the encodings used for charT is referred to here; the compile-time execution character set or the run-time locale dependent native character set?
>>>>>>>>
>>>>>>>>     The proposed resolution specifies that the compile-time execution character set is the intended one.  My expectation is that this aligns with existing implementations, but I haven't checked.
>>>>>>>>
>>>>>>>>     Tom.
>>>>>>>>
>>>>>>>     _______________________________________________
>>>>>>>     Lib-Ext mailing list
>>>>>>>     Lib-Ext at lists.isocpp.org
>>>>>>>     Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>>>     Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13309.php
>>>>>>>     _______________________________________________
>>>>>>>     Lib-Ext mailing list
>>>>>>>     Lib-Ext at lists.isocpp.org
>>>>>>>     Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/lib-ext
>>>>>>>     Link to this post: http://lists.isocpp.org/lib-ext/2019/11/13325.php
>>>>>>
>>>>>>     _______________________________________________
>>>>>>     SG16 Unicode mailing list
>>>>>>     Unicode at isocpp.open-std.org
>>>>>>     http://www.open-std.org/mailman/listinfo/unicode
>>>>>
>>>>>
>>>
>>
>
