<div dir="ltr"><div>> do you know what techniques
implementors are expected to employ?</div><div><br></div><div>One obvious technique that comes to mind is to specialize a class template, parameterized on the character type, whose static members return the necessary strings (the general solution). If the wide and narrow encodings have compatible representations of the characters used, then all of this can be replaced with a simple copy. It might be worth checking what the date library does.</div><div><br></div><div>- Victor<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Oct 20, 2019 at 1:03 PM Tom Honermann <<a href="mailto:tom@honermann.net">tom@honermann.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div>On 10/19/19 12:53 PM, Victor Zverovich
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>STATICALLY-WIDEN is not a hack; it's just a convenient
pseudo-function that simplifies and formalizes a bunch of
wording that originated in P0355. In the case of chrono it will work
identically for Unicode, and the only reason we didn't mention
charX_t is that neither std::format nor iostreams support
those. As Corentin wrote, you could provide formatter
specializations, including for Unicode types, and once we have
charX_t overloads of std::format (I plan to propose these for
C++23) it will just pick up those specializations. In your case
STATICALLY-WIDEN won't help because you have a different
representation of units for different character types.</div>
</div>
</blockquote>
<p>I think STATICALLY-WIDEN will work for his use case as he intends
to provide distinct partial specializations for char, wchar_t, and
charN_t. Basically, Mat needs a STATICALLY-WIDEN-UTF variant that
takes a string literal of some kind (presumably UTF-8) and
"widens" it to UTF-16 or UTF-32. Note the char partial
specialization below (that I think should not be parameterized on
CharT, just Traits).<br>
</p>
<p>What isn't clear to me is how implementors will implement
STATICALLY-WIDEN. Victor, do you know what techniques
implementors are expected to employ?</p>
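<p>One possible answer, sketched here only as an illustration, is a class template specialized on the character type, with static members that return the strings. The names `widened`, `milli`, `micro` and `micro_suffix` below are hypothetical; they are not taken from the standard or from any implementation.</p>

```cpp
#include <string_view>

// Hypothetical trait: one explicit specialization per supported character
// type. The primary template is left undefined, so an unsupported CharT
// fails to compile instead of silently picking a wrong string.
template<typename CharT>
struct widened;

template<>
struct widened<char> {
  static constexpr std::string_view milli() { return "ms"; }
  static constexpr std::string_view micro() { return "us"; }
};

template<>
struct widened<wchar_t> {
  static constexpr std::wstring_view milli() { return L"ms"; }
  static constexpr std::wstring_view micro() { return L"us"; }
};

// Generic code asks the trait for the string in its own character type.
template<typename CharT>
constexpr std::basic_string_view<CharT> micro_suffix()
{
  return widened<CharT>::micro();
}

// Compile-time checks of the selection.
static_assert(micro_suffix<char>() == "us");
static_assert(micro_suffix<wchar_t>() == L"us");
```

<p>Every string is written once per character type, so this is the general solution but also the verbose one. Adding charN_t support under this scheme would mean adding one more specialization per type.</p>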
<p>Tom.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Cheers,</div>
<div>Victor<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Oct 17, 2019 at 12:21
PM Mateusz Pusz <<a href="mailto:mateusz.pusz@gmail.com" target="_blank">mateusz.pusz@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hi everyone,
<div><br>
</div>
<div>Right now I am in the process of designing and
implementing a Physical Units library that will hopefully
be a starting point for adding such a feature to the C++ Standard
Library. You can find more info on the library here: <a href="https://github.com/mpusz/units" target="_blank">https://github.com/mpusz/units</a>.</div>
<div><br>
</div>
<div>Recently, I started to work on the text output of
quantities. A quantity consists of a value and a unit symbol.
The latter is a perfect use case for Unicode. Consider:</div>
<div><br>
</div>
<div><font face="monospace" color="#660000">10 us vs
10 μs</font></div>
<div><font face="monospace" color="#660000">2 kg*m/s^2 vs
2 kg⋅m/s²<br>
</font></div>
<div><br>
</div>
<div>Before C++20 we could get away with a hack by passing
Unicode characters to `char`-based types and streams, but
with the introduction of `char8_t` in C++20 it seems this
will be a bigger issue from now on. Library
implementors will have to provide two separate
implementations:</div>
<div>1. For `char`-based types (string_view, ostream),
without Unicode characters</div>
<div>2. For types based on Unicode character types</div>
<div><br>
</div>
<div>However, there are a few issues here:</div>
<div>1. As of now, we do not have <font face="monospace">std::u8cout</font>
or even <font face="monospace">std::u8ostream</font>. So
there is really no easy way to create and use a stream for
Unicode characters. So even if I implement</div>
<div><br>
</div>
<div><font face="monospace" color="#660000">template<class
CharT, class Traits><br>
friend std::basic_ostream<CharT, Traits>&
operator<<(std::basic_ostream<CharT,
Traits>& os, const quantity& q)<br>
</font></div>
<div><br>
</div>
<div>correctly, we do not have an easy way to use it.</div>
<div><br>
</div>
<div>2. In order to implement the above, I could imagine
such an interface for a symbol prefix:</div>
<div><br>
</div>
<div><font face="monospace" color="#660000">template<typename
CharT, typename Traits, typename Prefix, typename
Ratio><br>
inline constexpr std::basic_string_view<CharT,
Traits> prefix_symbol;<br>
</font></div>
<div><br>
</div>
<div>and its partial specializations for different
prefixes/ratios:</div>
<div><br>
</div>
<div><font face="monospace" color="#660000">template<typename
CharT, typename Traits></font></div>
<div><font face="monospace" color="#660000">inline constexpr
std::basic_string_view<char, Traits> </font><span style="color:rgb(102,0,0);font-family:monospace">prefix_symbol<</span><span style="color:rgb(102,0,0);font-family:monospace">char,
Traits, </span><span style="color:rgb(102,0,0);font-family:monospace">si_prefix,
std::micro> = "u";</span></div>
<div><span style="color:rgb(102,0,0);font-family:monospace">template<typename
CharT, typename Traits></span><br>
</div>
<div><font face="monospace" color="#660000">inline constexpr
std::basic_string_view<CharT,
Traits> prefix_symbol<CharT, Traits, si_prefix,
std::micro> = u8"\u00b5"; // µ</font></div>
<div><font face="monospace" color="#660000">template<typename
CharT, typename Traits></font></div>
<div><font face="monospace" color="#660000">inline constexpr
std::basic_string_view<CharT,
Traits> prefix_symbol<CharT, Traits, si_prefix,
std::milli> = "m";</font></div>
<div><br>
</div>
<div>The problem is that the above code will not compile.
A specialization that is generic over `CharT` cannot be
initialized with a narrow literal like "m". Also, there is no
generic mechanism to initialize all Unicode-based versions
of the type with the same literal, as each of them requires
a different prefix (u8, u, U). Providing a specialization
for every character type here is going to be a nightmare
for library authors.</div>
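<div>One way to avoid a hand-written specialization per character type, shown only as a sketch: a macro pastes each encoding prefix onto a single literal, and a helper returns the copy that matches `CharT`. The names `select_symbol`, `UNIT_SYMBOL` and `micro_symbol` are hypothetical, and `wchar_t` is left out for brevity (it would need one more argument with an `L` prefix).</div>

```cpp
#include <string_view>
#include <type_traits>

// Hypothetical helper, not part of any library: returns the argument whose
// character type is CharT. U8 is deduced from the u8 literal, which has
// element type char8_t in C++20 and plain char before that.
template<typename CharT, typename U8>
constexpr std::basic_string_view<CharT> select_symbol(
    std::string_view ascii, const U8* utf8,
    std::u16string_view utf16, std::u32string_view utf32)
{
  if constexpr (std::is_same_v<CharT, char>) {
    return ascii;   // plain char gets the ASCII fallback
  } else if constexpr (std::is_same_v<CharT, U8>) {
    return utf8;    // reached only when u8 literals are char8_t
  } else if constexpr (std::is_same_v<CharT, char16_t>) {
    return utf16;
  } else {
    static_assert(std::is_same_v<CharT, char32_t>,
                  "unsupported character type");
    return utf32;
  }
}

// Writes the Unicode spelling once and pastes the u8, u and U prefixes.
#define UNIT_SYMBOL(CharT, ascii, unicode) \
  select_symbol<CharT>(ascii, u8##unicode, u##unicode, U##unicode)

// A single definition now serves every supported character type.
template<typename CharT>
inline constexpr std::basic_string_view<CharT> micro_symbol =
    UNIT_SYMBOL(CharT, "u", "\u00b5");

static_assert(micro_symbol<char> == "u");
static_assert(micro_symbol<char16_t> == u"\u00b5");
static_assert(micro_symbol<char32_t> == U"\u00b5");
```

<div>The token pasting only works when the macro argument is a string literal written in place, so a symbol cannot be passed through another macro or a variable.</div>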
<div><br>
</div>
<div>To solve the second problem, fmt and chrono defined
something called STATICALLY-WIDEN (<a href="http://wg21.link/time.general" target="_blank">http://wg21.link/time.general</a>),
but it seems to be more of a specification hack
than an implementation technique. I call it a hack because it
currently addresses only `char` and `wchar_t` and does not
mention Unicode character types at all.</div>
<div><br>
</div>
<div>Dear SG16 members, do you have any BKMs (best known methods) or suggestions
on how to write a library that is Unicode-aware and safe
in an easy and approachable way? Should we strive to
provide a nice-looking representation of units for outputs
that support Unicode (console, files, etc.), or should we,
as before, support only `char` and `wchar_t` and
ignore the existence of Unicode in C++?</div>
<div><br>
</div>
<div>Please keep in mind that we hope the library will target
C++23.</div>
<div><br>
</div>
<div>Best</div>
<div><br>
</div>
<div>Mat</div>
</div>
_______________________________________________<br>
SG16 Unicode mailing list<br>
<a href="mailto:Unicode@isocpp.open-std.org" target="_blank">Unicode@isocpp.open-std.org</a><br>
<a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
</blockquote>
</div>
<br>
</blockquote>
<p><br>
</p>
</div>
</blockquote></div>