<br><br><div class="gmail_quote"><div dir="ltr">On Wed, Aug 14, 2019, 5:59 PM Davis Herring <<a href="mailto:herring@lanl.gov">herring@lanl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> u8"é" is ambiguous. Both people and the compiler may interpret that in a<br>
> variety of ways. Notably if I have utf-8 in that file, which I wrote on<br>
> Linux, but then the MSVC compiler thinks it's Windows-1252...<br>
> Mojibake.<br>
<br>
We have a recursive example of bytes/characters confusion here. If you <br>
want to say that the bytes 75 38 22 c3 a9 22 (because you "have utf-8 in <br>
that file") are ambiguous, of course they are, but so is 5c 41 unless <br>
you restrict to ASCII/Latin-*/UTF-8. You always have to arrange for <br>
your compiler to know which characters are signified by the bytes in <br>
your source file, and having some of them be non-ASCII doesn't <br>
fundamentally change anything (even though in practice it makes it harder).<br>
<br>
Your message doesn't contain those bytes anyway; since it contains a header<br>
<br>
Content-Type: text/plain; charset="UTF-8"<br>
<br>
it's appropriate to say that you wrote 5 (abstract) characters: LATIN <br>
SMALL LETTER U, DIGIT EIGHT, QUOTATION MARK, LATIN SMALL LETTER E WITH <br>
ACUTE, and QUOTATION MARK again. (Of course, you could also have <br>
written LATIN SMALL LETTER E and COMBINING ACUTE ACCENT; that's a <br>
different sort of ambiguity.)<br></blockquote></div><div><br></div><div>Yet there was no ambiguity because, as you mentioned, the encoding information was not lost.</div><div>But yes, I have a tendency to assume UTF-8 :/</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Davis<br>
<br>
</blockquote></div>
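<div><br></div><div>For illustration only, here is a small Python sketch of the bytes-versus-characters point discussed above. It is not part of either message: the variable names and the helper <code>code_points</code> are made up for this example, and Python's <code>cp1252</code> codec stands in for a compiler that assumes Windows-1252. It decodes the six quoted bytes under both encodings and then shows the precomposed versus combining-accent spelling.</div>

```python
import unicodedata

# The six bytes quoted in the message: 75 38 22 c3 a9 22.
raw = bytes.fromhex("75 38 22 c3 a9 22")

# Same bytes, two different assumptions about the source encoding.
as_utf8 = raw.decode("utf-8")      # the reading the author intended
as_cp1252 = raw.decode("cp1252")   # the reading under a Windows-1252 assumption


def code_points(text):
    """Hypothetical helper: list the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Five characters under UTF-8, six (mojibake) under Windows-1252.
print(code_points(as_utf8))
print(code_points(as_cp1252))

# The abstract character names for the UTF-8 reading.
names = [unicodedata.name(ch) for ch in as_utf8]
for name in names:
    print(name)

# The "different sort of ambiguity": precomposed versus decomposed forms.
precomposed = "\u00e9"                                  # one code point
decomposed = unicodedata.normalize("NFD", precomposed)  # base letter + combining accent
print(code_points(decomposed))
print(decomposed.encode("utf-8").hex(" "))
```

<div>Under the UTF-8 assumption the bytes c3 a9 become the single code point U+00E9, while under the Windows-1252 assumption they become U+00C3 followed by U+00A9, which is the mojibake described in the original message. The bytes never change; only the declared or assumed encoding does.</div>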