[SG16-Unicode] [ #embed_str ] Unicode Input

JeanHeyd Meneide phdofthehouse at gmail.com
Fri Nov 8 00:48:19 CET 2019


On Wed, Nov 6, 2019 at 10:28 PM Thiago Macieira <thiago at macieira.org> wrote:

> On Wednesday, 6 November 2019 12:34:23 PST JeanHeyd Meneide wrote:
> > It is not exactly trivial for #embed or #embed_str. #embed generates a
> > brace-delimited list of the bytes. It's as if the contents are directly
> > replaced by:
> >
> >      { 102, 111, 111 }
> >
> >      You cannot "just append" a null terminator in there, so it would
> > require a copy. If that's okay (copying things), then we can throw
> > #embed_str out the window. As far as requiring bytes, you would need to
> > generate a brace-delimited list with all of the entries cast to the right
> > type, because each of those entries is not trivially convertible to a
> > std::byte: https://godbolt.org/z/NRkSfK
>
> It's easy to add the terminating null with constexpr. And that function
> should be provided. Similarly, it should be easy to concatenate such arrays.
>

Arrays in C++ (and C) do not have any syntax or behavior for compile-time
concatenation. String literals get away with it by having "foo" "bar" be
acceptable syntax, meaning someone could add a null terminator with "\0"
for #embed_str, but not #embed.
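
As a rough illustration of the copy-based approach discussed above, here is a minimal C++17 sketch. The helper names append_null and concat are hypothetical and not part of any proposal, and the raw array is only a stand-in for what #embed would expand to for a file containing "foo". Both helpers build a new std::array, which is the copy mentioned earlier in the thread.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: copies N elements into a new array and leaves one
// value-initialized element at the end to act as the terminator.
template <typename T, std::size_t N>
constexpr std::array<T, N + 1> append_null(const T (&src)[N]) {
    std::array<T, N + 1> out{};  // value-initialized, so out[N] is already T{}
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = src[i];
    }
    return out;
}

// Hypothetical helper: concatenates two std::array objects into a new one.
template <typename T, std::size_t N, std::size_t M>
constexpr std::array<T, N + M> concat(const std::array<T, N>& a,
                                      const std::array<T, M>& b) {
    std::array<T, N + M> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = a[i];
    }
    for (std::size_t i = 0; i < M; ++i) {
        out[N + i] = b[i];
    }
    return out;
}

// Stand-in for the brace-delimited list #embed would produce for "foo".
constexpr unsigned char raw[] = { 102, 111, 111 };

constexpr auto terminated = append_null(raw);
constexpr auto joined = concat(terminated, terminated);

static_assert(terminated.size() == 4);
static_assert(terminated[3] == 0);
static_assert(joined.size() == 8);
```

Everything here is evaluated at compile time, but the result is a separate object from the embedded data. There is still no way to extend the original braced list in place.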

> It should be easy to import non-terminated byte data, null-terminated byte
> data and UTF-8 text.
>
> SG16 should also provide a way to constexpr-time convert UTF-8 text to
> UTF-16 or UTF-32
>

That is something I am already working on (and a separate proposal); all of
the UTF-8/16/32 encoding objects are constexpr, and one of Corentin's
upcoming papers is a consteval way to detect the compile-time literal
encoding. That should be enough.
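
To make the idea of a compile-time UTF-8 to UTF-32 conversion concrete, here is a minimal C++17 sketch. It is a hand-rolled decoder and not the encoding objects from the proposal mentioned above. The names utf32_buffer and utf8_to_utf32 are hypothetical. It assumes the input is well-formed UTF-8 and does no validation (no checks for overlong forms, surrogates, or truncated sequences).

```cpp
#include <array>
#include <cstddef>

// Hypothetical result type: N UTF-8 code units decode to at most N code
// points, so a fixed-capacity buffer plus a length is sufficient.
template <std::size_t N>
struct utf32_buffer {
    std::array<char32_t, N> units{};
    std::size_t size = 0;
};

// Hypothetical helper: decodes well-formed UTF-8 into UTF-32 code units.
// Assumption: the input is valid; malformed input is not diagnosed.
template <std::size_t N>
constexpr utf32_buffer<N> utf8_to_utf32(const unsigned char (&in)[N]) {
    utf32_buffer<N> out{};
    std::size_t i = 0;
    while (i < N) {
        const unsigned char lead = in[i];
        // Sequence length is determined by the high bits of the lead byte.
        const std::size_t len = lead < 0x80        ? 1
                              : (lead >> 5) == 0x6 ? 2
                              : (lead >> 4) == 0xE ? 3
                                                   : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        out.units[out.size++] = cp;
        i += len;
    }
    return out;
}

// "A", U+00E9 and U+20AC encoded as UTF-8.
constexpr unsigned char utf8_text[] = { 0x41, 0xC3, 0xA9, 0xE2, 0x82, 0xAC };
constexpr auto decoded = utf8_to_utf32(utf8_text);

static_assert(decoded.size == 3);
static_assert(decoded.units[1] == 0x00E9);
static_assert(decoded.units[2] == 0x20AC);
```

A real facility would also need to report errors and size its output exactly. The sketch only shows that the decoding itself is expressible in a constant expression.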

I think this is highlighting that #embed is the only thing we need, that
#embed_str's only real benefit is a null-terminating code unit, and that
there should be better ways to provide that to the user.