<div dir="auto">For formats that, for legacy reasons, support multiple encodings, the benefit is that the BOM unambiguously signals UTF-8. For UTF-8-only formats, the benefit of not treating the BOM as an error is that it allows authoring with tools designed for the kind of formats where the BOM actually distinguishes UTF-8 from other possible encodings.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Aug 19, 2019, 04:08 Tony V E <<a href="mailto:tvaneerd@gmail.com">tvaneerd@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Why bother then?<br>
<br>
What's the benefit of a BOM for UTF8? <br>
<br>
<br>
Sent from my BlackBerry portable Babbage Device<br>
Original Message <br>
From: Tom Honermann<br>
Sent: Sunday, August 18, 2019 8:06 PM<br>
To: Henri Sivonen<br>
Cc: <a href="mailto:ben.boeckel@kitware.com" target="_blank" rel="noreferrer">ben.boeckel@kitware.com</a>; <a href="mailto:unicode@isocpp.open-std.org" target="_blank" rel="noreferrer">unicode@isocpp.open-std.org</a><br>
Subject: Re: [SG16-Unicode] BOM in JSON (was: Re: SG16 meeting summary for July 31st, 2019)<br>
<br>
+ Ben.<br>
<br>
Thank you, Henri! This is very helpful!<br>
<br>
Ben, the context here is whether we’re ok with producers of the dependency format you specified producing a UTF-8 BOM. It looks like we should be ok to allow them to (optionally) do so given that both of the specs below allow consumers to remove one. <br>
<br>
Tom.<br>
<br>
> On Aug 15, 2019, at 2:12 PM, Henri Sivonen <<a href="mailto:hsivonen@hsivonen.fi" target="_blank" rel="noreferrer">hsivonen@hsivonen.fi</a>> wrote:<br>
> <br>
>> On Thu, Aug 15, 2019 at 5:16 AM Tom Honermann <<a href="mailto:tom@honermann.net" target="_blank" rel="noreferrer">tom@honermann.net</a>> wrote:<br>
>> - Are we ok with allowing a BOM (JSON doesn't permit one)?<br>
> <br>
> Consuming JSON from a byte source in the Web Platform only supports<br>
> UTF-8 but removes the BOM if there is one. There is no corresponding<br>
> authoring conformance requirement in the Infra Standard, but the<br>
> practical effect is that the BOM does not fail to parse but doesn't<br>
> signal anything.<br>
> <br>
> <a href="https://infra.spec.whatwg.org/#parse-json-from-bytes" rel="noreferrer noreferrer" target="_blank">https://infra.spec.whatwg.org/#parse-json-from-bytes</a><br>
> <br>
> The IETF wording requires producers to use UTF-8 without a BOM but<br>
> allows consumers to remove the BOM if it's there, so the Infra<br>
> Standard language and the IETF RFC are compatible on this point.<br>
> <br>
> <a href="https://tools.ietf.org/html/rfc8259#section-8.1" rel="noreferrer noreferrer" target="_blank">https://tools.ietf.org/html/rfc8259#section-8.1</a><br>
> <br>
> (Apologies if this distinction between producer and consumer<br>
> conformance requirements was already made in the meeting.)<br>
> <br>
> -- <br>
> Henri Sivonen<br>
> <a href="mailto:hsivonen@hsivonen.fi" target="_blank" rel="noreferrer">hsivonen@hsivonen.fi</a><br>
> <a href="https://hsivonen.fi/" rel="noreferrer noreferrer" target="_blank">https://hsivonen.fi/</a><br>
> _______________________________________________<br>
> SG16 Unicode mailing list<br>
> <a href="mailto:Unicode@isocpp.open-std.org" target="_blank" rel="noreferrer">Unicode@isocpp.open-std.org</a><br>
> <a href="http://www.open-std.org/mailman/listinfo/unicode" rel="noreferrer noreferrer" target="_blank">http://www.open-std.org/mailman/listinfo/unicode</a><br>
<br>
</blockquote></div>
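<div dir="auto">The producer/consumer split discussed in this thread (consumers may drop a leading UTF-8 BOM, producers do not emit one) can be sketched roughly as follows. This is only an illustration in Python, not a transcription of the Infra Standard or RFC 8259 algorithms; the function names <code>parse_json_from_bytes</code> and <code>serialize_json_to_bytes</code> are hypothetical, and the strict UTF-8 decoding is a choice made for this sketch.</div>

```python
import codecs
import json


def parse_json_from_bytes(data: bytes):
    """Consumer side (hypothetical helper): tolerate one leading UTF-8 BOM.

    The BOM is removed if present and otherwise carries no meaning:
    the input is always treated as UTF-8.
    """
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Assumption of this sketch: invalid UTF-8 raises UnicodeDecodeError.
    return json.loads(data.decode("utf-8"))


def serialize_json_to_bytes(value) -> bytes:
    """Producer side (hypothetical helper): emit UTF-8 without a BOM."""
    # The plain "utf-8" codec never prepends a BOM when encoding.
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b'{"rules": []}'
    without_bom = b'{"rules": []}'
    # Both inputs parse to the same value; the BOM signals nothing.
    print(parse_json_from_bytes(with_bom) == parse_json_from_bytes(without_bom))
```

<div dir="auto">With this shape, a file saved by an editor that insists on writing a BOM still parses, while output written by the tool itself stays BOM-free.</div>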