<div dir="ltr">Awesome, thanks!<div><br></div><div>Please note that this is not a thread about the Physical Units library in general. We already have one for that on the SG6 reflector, started after the evening session in Cologne. Also, I am bringing a big paper about it to Belfast (P1935R0), but due to some technical issues it did not land in the initial Belfast mailing. It should be added by Hal soon.</div><div><br></div><div>Let's focus on Unicode-related issues here.<br><div><br></div><div>Best</div></div><div><br></div><div>Mat</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 18 Oct 2019 at 11:17, Corentin Jabot <<a href="mailto:corentinjabot@gmail.com">corentinjabot@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Also adding Vincent Reverdy, who seems to be working in the same area (cf. <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1930r0.pdf" target="_blank">http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1930r0.pdf</a> )</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 17 Oct 2019 at 22:30, Corentin Jabot <<a href="mailto:corentinjabot@gmail.com" target="_blank">corentinjabot@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Adding Victor directly<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 17 Oct 2019 at 21:21, Mateusz Pusz <<a href="mailto:mateusz.pusz@gmail.com" target="_blank">mateusz.pusz@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi everyone,<div><br></div><div>Right now I am in the process of designing and implementing a Physical Units library that hopefully will 
be a start for having such a feature in the C++ Standard Library. You can find more info on the library here: <a href="https://github.com/mpusz/units" target="_blank">https://github.com/mpusz/units</a>.</div><div><br></div><div>Recently, I started to work on the text output of quantities. A quantity consists of a value and a unit symbol. The latter is a perfect use case for Unicode. Consider:</div><div><br></div><div><font face="monospace" color="#660000">10 us vs 10 μs</font></div><div><font face="monospace" color="#660000">2 kg*m/s^2 vs 2 kg⋅m/s²<br></font></div><div><br></div><div>Before C++20 we could get away with a hack by providing Unicode characters to `char`-based types and streams, but with the introduction of `char8_t` in C++20 it seems it will be a bigger issue from now on. Library implementors will have to provide 2 separate implementations:</div><div>1. For `char`-based types (string_view, ostream) without Unicode characters</div><div>2. For Unicode character types</div></div></blockquote><div><br></div><div>Yes, with the caveat that you can only output UTF-8 to a sink that expects it, and conversion from Unicode to anything that is not Unicode will lose information</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>However, there are a few issues here:</div><div>1. As of now, we do not have <font face="monospace">std::u8cout</font> or even <font face="monospace">std::u8ostream</font>, so there is really no easy way to create and use a stream for Unicode characters. So even if I implement</div><div><br></div><div><font face="monospace" color="#660000">template<class CharT, class Traits><br>friend std::basic_ostream<CharT, Traits>& operator<<(std::basic_ostream<CharT, Traits>& os, const quantity& q)<br></font></div><div><br></div><div>correctly, we do not have an easy way to use it.</div><div><br></div><div>2. 
In order to implement the above, I could imagine an interface like this for a prefix symbol:</div><div><br></div><div><font face="monospace" color="#660000">template<typename CharT, typename Traits, typename Prefix, typename Ratio><br>inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol;<br></font></div><div><br></div><div>and its partial specializations for different prefixes/ratios:</div><div><br></div><div><font color="#660000" face="monospace">template<typename Traits></font></div><div><font color="#660000" face="monospace">inline constexpr std::basic_string_view<char, Traits> </font><span style="color:rgb(102,0,0);font-family:monospace">prefix_symbol<</span><span style="color:rgb(102,0,0);font-family:monospace">char, Traits, </span><span style="color:rgb(102,0,0);font-family:monospace">si_prefix, std::micro> = "u";</span></div><div><span style="color:rgb(102,0,0);font-family:monospace">template<typename CharT, typename Traits></span><br></div><div><font color="#660000" face="monospace">inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol<CharT, Traits, si_prefix, std::micro> = u8"\u00b5"; // µ</font></div><div><font color="#660000" face="monospace">template<typename CharT, typename Traits></font></div><div><font color="#660000" face="monospace">inline constexpr std::basic_string_view<CharT, Traits> prefix_symbol<CharT, Traits, si_prefix, std::milli> = "m";</font></div><div><br></div><div>The problem is that the above code will not compile. A specialization that covers every `CharT` cannot be initialized with a narrow literal like "m". Also, there is no generic mechanism to initialize all Unicode-based versions of the type with the same literal, as each of them requires a different prefix (u8, u, U). 
Providing a specialization for every character type here is going to be a nightmare for library authors.</div><div><br></div><div>To solve the second problem, fmt and chrono defined something called STATICALLY-WIDEN (<a href="http://wg21.link/time.general" target="_blank">http://wg21.link/time.general</a>), but it seems to be more of a specification hack than an implementation technique. I call it a hack as it currently addresses only `char` and `wchar_t` and does not mention Unicode characters at all.</div><div><br></div><div>Dear SG16 members, do you have any BKMs or suggestions on how to write a library that is Unicode-aware and safe in an easy and approachable way? Should we strive to provide a nice-looking representation of units for outputs that support Unicode (console, files, etc.) or should we, as before, support only `char` and `wchar_t` and ignore the existence of Unicode in C++?</div></div></blockquote><div><br></div><div>I would forgo iostream and provide formatters for format.</div><div>All of that is locale specific (so the approach you describe above does not work in the general case, for example cm² will be τ.εκ. in Greek [1])</div><div>Which means ICU.</div><div>The documentation is sparse [2], but you can play around with some test code</div><div><a href="https://github.com/unicode-org/icu/blob/e25796f6e545082af74f0017d55ec2d915c40a3d/icu4c/source/test/intltest/measfmttest.cpp" target="_blank">https://github.com/unicode-org/icu/blob/e25796f6e545082af74f0017d55ec2d915c40a3d/icu4c/source/test/intltest/measfmttest.cpp</a><br></div><div><br></div><div>OS X provides something similar: <a href="https://developer.apple.com/documentation/foundation/nsmeasurementformatter?language=objc" target="_blank">https://developer.apple.com/documentation/foundation/nsmeasurementformatter?language=objc</a></div><div><br></div><div>It seems easy enough for simple units.</div><div>For more complicated things such as compound units, for example grams per cm², the formatting might be a bit hairy.</div><div><br></div><div>Ideally at a high level, </div><div><br></div><div>std::format(u8"{}", some_unit, std::locale("el_CY"));<br></div><div><br></div><div>would do the right thing.</div><div><br></div><div>I am not aware of SG16 discussing measurements yet.</div><div><br></div><div>It's a bigger design space than just providing u8 overloads.</div><div>The question is not to provide a "nice" representation but the representation users expect in 
their preferred locale.</div><div>I don't think the committee should be in the business of specifying notation.</div><div><br></div><div><br></div><div>[1] <a href="https://www.unicode.org/cldr/charts/36/summary/root.html" target="_blank">https://www.unicode.org/cldr/charts/36/summary/root.html</a> You can explore the CLDR data to list units</div><div>[2] <a href="https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1MeasureFormat.html" target="_blank">https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1MeasureFormat.html</a></div><div><br></div><div><br></div><div>Sorry to drop a massive curve ball on you</div><div><br></div><div>Regards,</div><div><br></div><div>Corentin</div><div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>Please keep in mind that the library is hoped to target C++23.</div><div><br></div><div>Best</div><div><br></div><div>Mat</div></div>
</blockquote></div></div>
</blockquote></div>
</blockquote></div>
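<div><br></div><div>For reference, on the STATICALLY-WIDEN question quoted above: one possible way to turn the idea into an implementation technique is to spell each symbol once through a macro that pastes the encoding prefixes onto the literal, and then pick the literal whose character type matches `CharT`. The following is only a minimal sketch, assuming C++17 or later. The names `select_literal`, `UNITS_U8_LITERAL`, `UNITS_SYMBOL` and `micro_symbol` are hypothetical, invented for this illustration, and are not taken from mpusz/units, fmt, or chrono. It also assumes an ASCII fallback ("u") is wanted for `char` and `wchar_t`.</div>
<pre>
```cpp
#include <string_view>
#include <tuple>

// Hypothetical helper (an assumption for this sketch, not an existing API):
// given one literal per character type, return the one whose type is CharT.
template<typename CharT, typename... Ts>
constexpr std::basic_string_view<CharT> select_literal(const Ts*... literals)
{
  // std::get by type finds the single pointer of type const CharT*.
  return std::get<const CharT*>(std::tuple<const Ts*...>{literals...});
}

// char8_t is a distinct type only from C++20 on; before that, u8"" literals
// are plain char arrays and would collide with the narrow literal.
#if defined(__cpp_char8_t)
#define UNITS_U8_LITERAL(s) , u8##s
#else
#define UNITS_U8_LITERAL(s)
#endif

// ASCII spelling for char and wchar_t, Unicode spelling for the UTF types.
#define UNITS_SYMBOL(CharT, ascii, unicode) \
  select_literal<CharT>(ascii, L##ascii, u##unicode, U##unicode UNITS_U8_LITERAL(unicode))

// One definition serves every character type.
template<typename CharT>
inline constexpr std::basic_string_view<CharT> micro_symbol =
    UNITS_SYMBOL(CharT, "u", "\u00b5");

static_assert(micro_symbol<char> == "u");
static_assert(micro_symbol<char32_t> == U"\u00b5");
#if defined(__cpp_char8_t)
static_assert(micro_symbol<char8_t> == u8"\u00b5");
#endif
```
</pre>
<div>With this, `micro_symbol<char>` and `micro_symbol<wchar_t>` hold "u", while the `char16_t`, `char32_t` and (in C++20) `char8_t` versions hold µ. It does not address the locale-specific notation discussed above. It only avoids writing one specialization per character type.</div>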