[SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

Tom Honermann tom at honermann.net
Thu Aug 15 14:34:30 CEST 2019


On 8/15/19 7:12 AM, Steve Downey wrote:
> Execution encoding is a term we use in conversation; it's not actually
> a term in the standard. The standard speaks of execution character
> sets, the values of which are determined by locale. Which locale is 
> not specified.

Indeed.  I just can't bring myself to use "character set" when the 
context calls for "encoding".  This is something else I'd like to clean 
up in the standard.
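
Steve's point that the values are determined by locale can be made concrete with the C library functions the subject line asks about. std::mbrtowc decodes bytes and std::iswalpha classifies the decoded character, and both follow the LC_CTYPE category of whatever locale std::setlocale last selected. The following is a minimal sketch, assuming a hosted C++ implementation. count_alpha_in_current_locale and use_environment_ctype are hypothetical helpers for illustration, not anything from this thread or the standard.

```cpp
#include <clocale>
#include <cstring>
#include <cwchar>
#include <cwctype>

// Hypothetical helper (illustration only): decodes a NUL-terminated narrow
// string using the encoding of the *current* C locale (its LC_CTYPE
// category) and counts how many decoded characters std::iswalpha reports
// as alphabetic. Both steps depend on a locale chosen at run time.
// Returns -1 if the bytes are not a valid sequence in that encoding.
long count_alpha_in_current_locale(const char* s) {
    std::mbstate_t state{};                  // initial shift state
    std::size_t remaining = std::strlen(s);  // bytes left, excluding the NUL
    long alpha = 0;
    while (remaining > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, s, remaining, &state);
        if (n == static_cast<std::size_t>(-1) ||   // invalid sequence
            n == static_cast<std::size_t>(-2)) {   // incomplete sequence
            return -1;
        }
        if (std::iswalpha(static_cast<std::wint_t>(wc))) {
            ++alpha;
        }
        s += n;          // advance past the bytes just decoded
        remaining -= n;
    }
    return alpha;
}

// Hypothetical helper: selects the LC_CTYPE locale named by the environment
// (for example LANG or LC_ALL on POSIX systems). Which locale that is, and so
// which encoding the function above applies, is implementation-defined; a
// program runs in the "C" locale until it calls std::setlocale.
const char* use_environment_ctype() {
    return std::setlocale(LC_CTYPE, "");
}
```

Calling use_environment_ctype() first can change the result for non-ASCII input, because the same bytes are then decoded with whatever encoding the environment's locale names.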

Tom.

>
> On Wed, Aug 14, 2019, 23:21 Tom Honermann via Core
> <core at lists.isocpp.org> wrote:
>
>     On 8/14/19 10:57 AM, Peter Dimov wrote:
>     > Tom Honermann wrote:
>     >> On 8/14/19 3:54 AM, Peter Dimov wrote:
>     >>> Tom Honermann wrote:
>     >>>
>     >>>>   I think we *might* be successful in using "execution encoding" to
>     >>>> apply to both the compile-time and run-time encodings by extending the
>     >>>> term with specific qualifiers; e.g., "presumed execution encoding" and
>     >>>> "run-time/system/native execution encoding".
>     >>> This would be implying that there's a single "execution" or "native"
>     >>> encoding, whereas there are many.
>     >>>
>     >>> - encoding used for character literals
>     >> I made the "presumed execution encoding" distinction specifically for this
>     >> case.
>     > Right, and I am saying that calling all the encodings "<adjective> execution
>     > encoding" implies that they are, if not the same, then somehow related, and
>     > they aren't.
>     Ok, that is a fair critique.
>     >
>     > I would call the encoding used for narrow character literals "narrow literal
>     > encoding" and the encoding used for wide character literals "wide literal
>     > encoding". This is what they are.
>
>     I feel some reluctance to change a term that has been around for so
>     long, and this strikes me as too specific.  There are other constructs
>     that are also encoded according to the (presumed) execution encoding;
>     for example, source locations exposed via the __FILE__ macro, function
>     names exposed via __func__, etc.
>
>     We don't know at compile time how encoded literals will be used at
>     run time.  They may be passed to the locale-sensitive character
>     conversion functions, used as filenames, written to a terminal, etc.
>     None of these encodings is known until run time.  I kind of like the
>     use of "presumed execution encoding" as indicating a compatible subset
>     of all of the encodings used at run time.
>
>     >
>     > "Execution encoding" made sense when a program was, say, written in
>     > Krasnoyarsk and intended to be executed in Kuala Lumpur. A Krasnoyarsk
>     > machine used the Krasnoyarsk encoding for everything, and a Kuala Lumpur
>     > machine used the Kuala Lumpur encoding for everything. Hence source and
>     > execution.
>
>     It still very much makes sense when cross-compiling today.
>
>     Tom.
>
>     _______________________________________________
>     Core mailing list
>     Core at lists.isocpp.org
>     Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>     Link to this post: http://lists.isocpp.org/core/2019/08/7062.php
>
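
The gap Tom describes, between literals encoded at compile time and encodings only known at run time, can be checked directly. The bytes of a narrow literal or of __func__ are chosen by the compiler (GCC, for instance, selects them with -fexec-charset), while the C library interprets them with the encoding of the run-time locale. The following is a minimal sketch, assuming a hosted C++ implementation; decodes_in_current_locale and current_function_name are hypothetical names for illustration. It only tests whether the compile-time bytes are valid under the run-time encoding, which is necessary for the "compatible subset" idea but does not prove that they denote the same characters.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper (illustration only): reports whether a narrow string
// whose bytes were fixed when the program was compiled (a literal, __FILE__,
// __func__, ...) is also a valid multibyte sequence in the encoding of the
// current run-time locale.
bool decodes_in_current_locale(const char* s) {
    std::mbstate_t state{};
    const char* src = s;  // std::mbsrtowcs advances this pointer
    // With a null destination, std::mbsrtowcs converts without storing
    // anything and ignores the length argument; it returns
    // static_cast<std::size_t>(-1) if it meets an invalid sequence.
    return std::mbsrtowcs(nullptr, &src, 0, &state)
           != static_cast<std::size_t>(-1);
}

// Hypothetical helper: __func__ is a narrow string encoded by the compiler,
// just like a string literal.
const char* current_function_name() {
    return __func__;
}
```

In the default "C" locale, plain ASCII names such as the one returned by current_function_name() decode successfully; whether non-ASCII literals do depends on both the compiler's choice of bytes and the locale selected at run time.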
