[SG16-Unicode] code_unit_sequence and code_point_sequence

R. Martinho Fernandes rmf at rmf.io
Wed Jun 20 11:10:19 CEST 2018



On June 20, 2018 7:52:00 AM GMT+02:00, Lyberta <lyberta at lyberta.net> wrote:
>The idea is that programmers won't need to.
>
>std::text t = u8"Hello";
>
>Type of text will be
>std::text<std::code_point_sequence<std::code_unit_sequence<std::utf8,
>    std::endian::native, std::no_bom>>>;
>
>Here the standard library has chosen native endianness and no reading or
>writing of a BOM, a sane default. Then we provide helpers such as:
>
>auto t = std::make_text<std::endian::big, std::bom>(u8"Hello");
>
>Type of text will be
>std::text<std::code_point_sequence<std::code_unit_sequence<std::utf8,
>    std::endian::big, std::bom>>>;
>
>Here the programmer has explicitly requested BE with reading and writing
>of a BOM. std::bom and std::no_bom are just placeholders; this should be
>an enum class.

I'm sorry, but these examples are bonkers again. They are not convincing because you used UTF-8. What does big-endian UTF-8 even mean? UTF-8 code units are single bytes, so there is no byte order to choose. Can you write the same examples with, e.g., the UTF-16 variants instead? Those would make much better examples. I've been trying, but I don't understand what, e.g., this should mean:

auto t = std::make_text<std::endian::big, std::bom>(u"Hello");
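
For reference, here is a minimal sketch of the one place where byte order and a BOM do have a clear meaning for UTF-16: turning code units into a byte stream. The names byte_order and encode_utf16 are hypothetical and made up for this illustration; they are not part of any proposal or of the standard library. A u"Hello" literal in memory is already a sequence of char16_t values with no BOM, which is why it is unclear to me what the endian and BOM parameters would describe before any such serialization step.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical names for illustration only; not proposed std:: API.
enum class byte_order { big, little };

// Serialize UTF-16 code units to bytes in the requested byte order,
// optionally preceded by a byte order mark (U+FEFF).
std::vector<std::uint8_t> encode_utf16(std::u16string_view units,
                                       byte_order order, bool write_bom) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve((units.size() + (write_bom ? 1 : 0)) * 2);

    // Split one 16-bit code unit into two bytes and append them in order.
    auto put = [&](char16_t u) {
        const auto hi = static_cast<std::uint8_t>(u >> 8);
        const auto lo = static_cast<std::uint8_t>(u & 0xFF);
        if (order == byte_order::big) {
            bytes.push_back(hi);
            bytes.push_back(lo);
        } else {
            bytes.push_back(lo);
            bytes.push_back(hi);
        }
    };

    if (write_bom) {
        put(char16_t{0xFEFF});
    }
    for (char16_t u : units) {
        put(u);
    }
    return bytes;
}
```

Under these assumptions, encode_utf16(u"Hi", byte_order::big, true) yields the bytes FE FF 00 48 00 69, and encode_utf16(u"Hi", byte_order::little, false) yields 48 00 69 00. The same function for UTF-8 would have nothing to reorder, since each code unit is already a single byte.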

