<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 12/5/18 10:33 PM, Steve Downey
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJEGDKq97e9PXwJ7ZNo=iK9WuUzyKNizozURc3EMMzO2UmDjPA@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
All of the u8 strings I saw contained no escape sequences.
<div>Not that \u escapes would change the argument. They work
identically in source and explicit encoding.</div>
<div>Right now, u8"" means transcode from source encoding to UTF-8
rather than to execution encoding. </div>
<div>I suspect there are often errors where, if the source
encoding were not UTF-8, the resulting string would not be the
intended one. <br>
</div>
</blockquote>
<p>Would not be the intended one because the actual source encoding
doesn't match the encoding the compiler uses to read the source?
I'm not sure how to interpret "if the source encoding was not
UTF-8".<br>
</p>
<p>I think you're describing a situation like this: the actual
source file encoding is UTF-8, but the compiler reads the source
as "8-bit ASCII" and passes non-ASCII code unit values through
unchanged (transcoding ASCII to UTF-8 is a no-op when non-ASCII
values are not checked). The u8 literals then happen to have the
UTF-8 contents the programmer expects despite the source encoding
mismatch. In this particular case, though, correcting the encoding
mismatch would produce the same results: always for u8 literals,
and for ordinary literals iff the presumed execution encoding is
also UTF-8.<br>
</p>
<p>Tom.<br>
</p>
<blockquote type="cite"
cite="mid:CAJEGDKq97e9PXwJ7ZNo=iK9WuUzyKNizozURc3EMMzO2UmDjPA@mail.gmail.com">
<div><br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Wed, Dec 5, 2018, 22:19 Tom Honermann <<a
href="mailto:tom@honermann.net" moz-do-not-send="true">tom@honermann.net</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div class="m_-6168981040550869947moz-cite-prefix">On
12/5/18 8:31 PM, Markus Scherer wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">On Wed, Dec 5, 2018 at 3:34 PM Steve
Downey <<a href="mailto:sdowney@gmail.com"
target="_blank" moz-do-not-send="true">sdowney@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">How
many contain text that is not already UTF-8?</blockquote>
<div><br>
</div>
<div>I am not sure what you are asking. Most of the
u8"literals" I am seeing contain non-ASCII
characters: many as literal characters, a bunch as
\uhhhh escapes, and a few as \U00hhhhhh escapes.</div>
</div>
</div>
</blockquote>
<p>I was likewise uncertain about this question.<br>
</p>
<p>Steve, I'm guessing the question you're getting at
is: would there be any behavioral difference if the u8
prefix were simply dropped? I think this is equivalent
to asking whether the source files for these examples
are encoded as UTF-8 and whether the compiler is invoked
such that the source encoding and presumed execution
encoding are both UTF-8 (always the case for Clang, the
default for gcc unless -finput-charset or -fexec-charset
is used, and not the case for MSVC unless /utf-8 is
used).</p>
</div>
<div bgcolor="#FFFFFF" text="#000000">
<p>Tom.<br>
</p>
</div>
</blockquote>
</div>
</div>
</blockquote>
</body>
</html>