[SG16-Unicode] BOM in JSON (was: Re: SG16 meeting summary for July 31st, 2019)

Henri Sivonen hsivonen at hsivonen.fi
Mon Aug 19 20:36:38 CEST 2019


On Mon, Aug 19, 2019, 15:30 Ben Boeckel <ben.boeckel at kitware.com> wrote:

> On Mon, Aug 19, 2019 at 08:16:26 +0300, Henri Sivonen wrote:
> > For formats that, for legacy reasons, support multiple encodings, the
> > benefit is that the BOM unambiguously signals UTF-8. For UTF-8-only
> > formats, the benefit of not treating the BOM as an error is to allow
> > authoring with tools designed for the kind of formats where the BOM
> > actually signals UTF-8 relative to other possibilities.
>
> The format specifies that it only accepts UTF-8. Within that context, is
> it sensible to expect implementations to handle a BOM? Remember that it is
> mostly a format between tools and it is JSON because being able to debug
> it is very useful (without mandating even more code for tools to inspect
> yet another container format). These things should not be written by
> hand or edited manually, so what does one gain by allowing an encoded
> BOM?
>

Presumably the reason to use JSON instead of a custom format is to make the
format consumable with JSON libraries. Therefore, it makes sense for the
format not to define a restricted profile of JSON but to work with
off-the-shelf libraries. I haven't actually surveyed JSON libraries for UTF-8
BOM acceptance, but there are three reasons why UTF-8 BOM acceptance makes
sense for a general-purpose JSON parsing library:

1. Compatibility with Windows-ish text editors for those JSON formats that
_are_ edited with text editors.
2. Consistency with Web browsers.
3. Taking the MAY in RFC 8259 (parsers MAY ignore a leading BOM rather than
treat it as an error) aligns with Postel's Law (which admittedly has lost
quite a bit of its charm).
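For illustration only, here is a minimal Python sketch (not from either
participant; the function name `parse_utf8_json` and the `allow_bom` flag are
hypothetical) of a UTF-8-only JSON reader that can either take the RFC's MAY
and skip a leading UTF-8 BOM, or treat the BOM as an error as a strict profile
would:

```python
import codecs
import json
from typing import Any


def parse_utf8_json(data: bytes, *, allow_bom: bool = True) -> Any:
    """Parse a JSON document that is required to be UTF-8.

    allow_bom=True: ignore one leading UTF-8 BOM (the RFC's MAY).
    allow_bom=False: reject a leading BOM (a stricter profile).
    """
    if data.startswith(codecs.BOM_UTF8):
        if not allow_bom:
            raise ValueError("UTF-8 BOM not permitted in this format")
        # Drop exactly one BOM; anything left over is ordinary content.
        data = data[len(codecs.BOM_UTF8):]
    # Decode strictly as UTF-8 so non-UTF-8 input fails instead of being guessed.
    text = data.decode("utf-8")
    return json.loads(text)


if __name__ == "__main__":
    print(parse_utf8_json(b'\xef\xbb\xbf{"version": 1}'))  # {'version': 1}
```

The lenient default accepts files saved by Windows-ish editors that prepend a
BOM, while the strict variant shows what a format that profiles JSON would
have to enforce on top of an off-the-shelf parser.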



More information about the Unicode mailing list