[SG16-Unicode] Do we really need basic_text_view?

Tom Honermann tom at honermann.net
Sat Aug 4 21:39:29 CEST 2018


On 08/03/2018 10:41 PM, Lyberta wrote:

>
>> I think the type aliases are useful for non-deduced contexts.  For
>> example, when declaring function parameters.
> Right, then we need some good names. I think we should break the
> convention established by basic_string. I suggest these:
>
> ecs_text_view, wecs_text_view, utf8_text_view, utf16_text_view,
> utf32_text_view. That is assuming the paper that establishes UTF-16 and
> UTF-32 as encoding for char16/32_t literals is accepted.

Strong motivation would be needed to break with existing conventions.  
Support for CTAD might be enough to consider renaming 'basic_text_view' 
to 'text_view' and renaming the 'text_view' type alias to 'ntext_view', 
but I think such naming decisions should be made with LEWG guidance.  I 
don't see motivation for breaking with the common 'w', 'u8', 'u16', and 
'u32' prefixed names.
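
To make the naming question concrete, here is a minimal sketch of the basic_string-style alias pattern and of an alias used as a function parameter (a non-deduced context). Everything in it is a simplified assumption for illustration: the encoding tag types, the single-parameter 'basic_text_view' stand-in, and the 'code_unit_count' helper are hypothetical and are not the proposed interface.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical encoding tags; each one only names its code unit type.
struct execution_encoding      { using code_unit = char; };
struct wide_execution_encoding { using code_unit = wchar_t; };
struct utf8_encoding           { using code_unit = char; };  // assumes no char8_t
struct utf16_encoding          { using code_unit = char16_t; };
struct utf32_encoding          { using code_unit = char32_t; };

// Simplified stand-in for the class template under discussion.
template <class Encoding>
struct basic_text_view {
    std::basic_string_view<typename Encoding::code_unit> code_units;
};

// Aliases following the existing 'w', 'u8', 'u16', 'u32' prefix convention.
using text_view    = basic_text_view<execution_encoding>;
using wtext_view   = basic_text_view<wide_execution_encoding>;
using u8text_view  = basic_text_view<utf8_encoding>;
using u16text_view = basic_text_view<utf16_encoding>;
using u32text_view = basic_text_view<utf32_encoding>;

// A function parameter is a non-deduced context, so a short alias is
// convenient here (hypothetical helper, counts code units only).
constexpr std::size_t code_unit_count(text_view tv) {
    return tv.code_units.size();
}
```

Note that 'text_view' and 'u8text_view' are distinct types in this sketch even though both use 'char' code units, because the alias names an encoding rather than a character type.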

>
>> I don't think it is feasible to avoid the execution character encoding
>> given that it is the encoding used for I/O.  Eventually, we may be able
>> to add I/O interfaces that implicitly transcode at program boundaries,
>> but we don't have that yet.  I think beginners should be able to write
>> hello world without having to (explicitly) deal with transcoding.  For
>> many applications, the execution character encoding is the right
>> encoding to target.
> I think we should carefully consider what a modern I/O library should
> look like and then design for it. I think I/O should be in terms of
> std::byte. I hope integers will be 2s complement soon so serialization
> of integers won't be a problem. Since code units are just integers, we
> should just work on top of that.

I don't think redefining I/O in terms of std::byte would help solve 
text-related problems.  For console-based programs, stdin and stdout will 
continue to have an associated encoding that is necessarily determined 
(for interoperability purposes) by the environment the program is 
running in.  We could, of course, design an I/O library that implicitly 
transcodes from the externally determined encoding to a program 
determined internal encoding.  Whether that would be a good thing to do 
or not is not something I've developed strong opinions about yet.  There 
are significant challenges here since native I/O on most platforms uses 
the execution character encoding, but Windows' native I/O uses the wide 
execution character encoding (narrow interfaces implicitly transcode, in 
ways that don't always work as expected).  Bridging these differences 
may require defining a "native" or "system" encoding that is used for 
stdin, stdout, environment variables, command line options, etc.  
Separate encodings may be necessary for file names and text file 
contents since those may differ from other I/O.
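
As a rough illustration of what transcoding at a program boundary involves, the following is a minimal sketch that converts input from an externally determined encoding to an internal one. It assumes, purely for the example, that the external encoding is ISO-8859-1 and the internal encoding is UTF-8; the function name 'latin1_to_utf8' is hypothetical. A real I/O library would have to discover the external encoding from the environment and handle many more encodings and error cases.

```cpp
#include <string>
#include <string_view>

// Hypothetical boundary helper: transcode ISO-8859-1 (assumed external
// encoding) to UTF-8 (assumed internal encoding).
inline std::string latin1_to_utf8(std::string_view external) {
    std::string internal;
    internal.reserve(external.size());
    for (char ch : external) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            // ASCII range: identical in both encodings.
            internal.push_back(static_cast<char>(b));
        } else {
            // U+0080..U+00FF: encoded as a two-byte UTF-8 sequence.
            internal.push_back(static_cast<char>(0xC0 | (b >> 6)));
            internal.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return internal;
}
```

Even this simple case shows that the result depends entirely on knowing the external encoding: the same input bytes would need a different conversion under another code page.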

Tom.

