[SG16-Unicode] Unicode streams

Tom Honermann tom at honermann.net
Sun Oct 20 22:03:17 CEST 2019


On 10/19/19 12:53 PM, Victor Zverovich wrote:
> STATICALLY-WIDEN is not a hack, it's just a convenient pseudo-function 
> that simplifies and formalizes a bunch of wording that originated in 
> P0355. In the case of chrono it will work identically for Unicode, and 
> the only reason we didn't mention charX_t is that neither std::format 
> nor iostreams support those. As Corentin wrote, you could provide 
> formatter specializations, including for Unicode types, and once we have 
> charX_t overloads of std::format (I plan to propose these for C++23) it 
> will just pick up those specializations. In your case STATICALLY-WIDEN 
> won't help because you have a different representation of units for 
> different character types.

I think STATICALLY-WIDEN will work for his use case as he intends to 
provide distinct partial specializations for char, wchar_t, and 
charN_t.  Basically, Mat needs a STATICALLY-WIDEN-UTF variant that takes 
a string literal of some kind (presumably UTF-8) and "widens" it to 
UTF-16 or UTF-32.  Note the char partial specialization below (that I 
think should not be parameterized on CharT, just Traits).
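
To make the parenthetical concrete, here is a minimal sketch of the 
specializations with CharT fixed and only Traits left as a template 
parameter. The names prefix_symbol and std::micro come from Mat's message 
below; si_prefix is a hypothetical empty tag type standing in for whatever 
his library uses, and the empty default in the primary template is likewise 
an assumption made so the sketch is self-contained.

```cpp
#include <ratio>
#include <string_view>

// Hypothetical tag type; the real library's prefix family may look different.
struct si_prefix {};

// Primary template: an empty symbol unless a specialization provides one
// (assumed default, chosen only to keep the sketch self-contained).
template<typename CharT, typename Traits, typename Prefix, typename Ratio>
inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol{};

// char: ASCII fallback. CharT is fixed, so only Traits is a parameter.
template<typename Traits>
inline constexpr std::basic_string_view<char, Traits>
    prefix_symbol<char, Traits, si_prefix, std::micro> = "u";

// wchar_t, char16_t, char32_t: each needs its own literal prefix.
template<typename Traits>
inline constexpr std::basic_string_view<wchar_t, Traits>
    prefix_symbol<wchar_t, Traits, si_prefix, std::micro> = L"\u00b5";

template<typename Traits>
inline constexpr std::basic_string_view<char16_t, Traits>
    prefix_symbol<char16_t, Traits, si_prefix, std::micro> = u"\u00b5";

template<typename Traits>
inline constexpr std::basic_string_view<char32_t, Traits>
    prefix_symbol<char32_t, Traits, si_prefix, std::micro> = U"\u00b5";
```

Each specialization compiles because the literal's character type matches 
the fixed CharT. The cost is the one Mat describes: one specialization per 
character type per symbol.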

What isn't clear to me is how implementors will implement 
STATICALLY-WIDEN.  Victor, do you know what techniques implementors are 
expected to employ?
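
For reference, one technique that seems plausible is sketched below. It is 
a sketch under assumptions, not a claim about what any standard library 
actually does. A macro pastes the encoding prefixes onto the literal, and a 
helper returns the spelling whose character type matches CharT. The names 
choose_literal and STATICALLY_WIDEN are hypothetical, and char8_t is left 
out so that the sketch stays valid C++17.

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper: receives every spelling of the same literal and
// returns the one whose character type is CharT.
template<typename CharT>
constexpr std::basic_string_view<CharT> choose_literal(
    std::string_view narrow, std::wstring_view wide,
    std::u16string_view utf16, std::u32string_view utf32)
{
    static_assert(std::is_same_v<CharT, char> || std::is_same_v<CharT, wchar_t> ||
                  std::is_same_v<CharT, char16_t> || std::is_same_v<CharT, char32_t>,
                  "unsupported character type");
    if constexpr (std::is_same_v<CharT, char>) return narrow;
    else if constexpr (std::is_same_v<CharT, wchar_t>) return wide;
    else if constexpr (std::is_same_v<CharT, char16_t>) return utf16;
    else return utf32;
}

// Token pasting turns "ms" into L"ms", u"ms" and U"ms" at preprocessing time,
// so the argument must be a string literal token, not a variable.
#define STATICALLY_WIDEN(CharT, lit) \
    choose_literal<CharT>(lit, L##lit, u##lit, U##lit)
```

With this, STATICALLY_WIDEN(char32_t, "ms") yields a std::u32string_view 
over U"ms". Note that the narrow spelling is still compiled for every use, 
so a literal containing characters outside the narrow execution character 
set remains a problem for the char case.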

Tom.

>
> Cheers,
> Victor
>
> On Thu, Oct 17, 2019 at 12:21 PM Mateusz Pusz <mateusz.pusz at gmail.com> 
> wrote:
>
>     Hi everyone,
>
>     Right now I am in the process of designing and implementing a
>     Physical Units library that hopefully will be a start for having
>     such a feature in the C++ Standard Library. You can find more info
>     on the library here: https://github.com/mpusz/units.
>
>     Recently, I started to work on the text output of quantities.
>     Quantities consist of value and a unit symbol. The latter is a
>     perfect use case for Unicode. Consider:
>
>     10 us        vs  10 μs
>     2 kg*m/s^2   vs  2 kg⋅m/s²
>
>     Before C++20 we could get away with a hack by providing Unicode
>     characters to `char`-based types and streams, but with the
>     introduction of `char8_t` in C++20 it seems it will be a bigger
>     issue from now on. Library implementors will have to provide two
>     separate implementations:
>     1. For `char`-based types (string_view, ostream), without Unicode
>     characters
>     2. For Unicode character types
>
>     However, there are a few issues here:
>     1. As of now, we do not have std::u8cout or even std::u8ostream.
>     So there is really no easy way to create and use a stream for
>     Unicode characters. So even if I implement
>
>     template<class CharT, class Traits>
>     friend std::basic_ostream<CharT, Traits>&
>     operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)
>
>     correctly, we do not have an easy way to use it.
>
>     2. In order to implement the above, I could imagine such an
>     interface for a symbol prefix:
>
>     template<typename CharT, typename Traits, typename Prefix,
>     typename Ratio>
>     inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;
>
>     and its partial specializations for different prefixes/ratios:
>
>     template<typename CharT, typename Traits>
>     inline constexpr std::basic_string_view<char, Traits>
>     prefix_symbol<char, Traits, si_prefix, std::micro> = "u";
>     template<typename CharT, typename Traits>
>     inline constexpr std::basic_string_view<CharT,
>     Traits> prefix_symbol<CharT, Traits, si_prefix, std::micro> =
>     u8"\u00b5";  // µ
>     template<typename CharT, typename Traits>
>     inline constexpr std::basic_string_view<CharT,
>     Traits> prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";
>
>     The problem is that the above code will not compile. A
>     specialization that is generic over `CharT` cannot be initialized
>     from a narrow literal like "m". Also, there is no generic
>     mechanism to initialize all Unicode-based versions of the type
>     with the same literal, as each of them requires a different prefix
>     (u8, u, U). Providing a specialization for every character type
>     here is going to be a nightmare for library authors.
>
>     To solve the second problem, fmt and chrono defined something
>     called STATICALLY-WIDEN (http://wg21.link/time.general), but it
>     seems to be more of a specification hack than an implementation
>     technique. I call it a hack because it currently addresses only
>     `char` and `wchar_t` and does not mention Unicode character types
>     at all.
>
>     Dear SG16 members, do you have any BKMs or suggestions on how to
>     write a library that is Unicode aware and safe in an easy and
>     approachable way? Should we strive to provide a nice-looking
>     representation of units for outputs that support Unicode (console,
>     files, etc) or should we, as ever before, just support only `char`
>     and `wchar_t` and ignore the existence of Unicode in C++?
>
>     Please keep in mind that the library is hoped to target C++23.
>
>     Best
>
>     Mat
>     _______________________________________________
>     SG16 Unicode mailing list
>     Unicode at isocpp.open-std.org
>     http://www.open-std.org/mailman/listinfo/unicode
>
>

