[SG16-Unicode] BOM in JSON

Tom Honermann tom at honermann.net
Mon Aug 19 23:25:23 CEST 2019


On 8/19/19 4:52 PM, Tony V E wrote:
> https://en.wikipedia.org/wiki/Byte_order_mark#Usage
>
> There is some pertinent advice on that page.

Indeed, some of which would benefit from a citation :)

Tom.

> There is also a note that Visual Studio uses/used the BOM to see if a
> file is UTF-8 vs whatever else.
>
>
> On Mon, Aug 19, 2019 at 3:46 PM Ben Boeckel <ben.boeckel at kitware.com>
> wrote:
>
>     On Mon, Aug 19, 2019 at 22:25:05 +0300, Henri Sivonen wrote:
>     > On Mon, Aug 19, 2019 at 9:57 PM Ben Boeckel
>     > <ben.boeckel at kitware.com> wrote:
>     > > Notepad?
>     >
>     > Yes, Notepad. It's generally easier to make parsers of all kinds
>     > (XML before, JSON later) accept the UTF-8 BOM than to fight Notepad.
>     > It'll take a long time for the existing installed base to get
>     > replaced with the newest:
>     > https://mobile.twitter.com/JenMsft/status/1163474010509701120
>
>     BOMs only make sense in a JSON file at rest in storage that the
>     parser reads directly. Given a string, a JSON parser should
>     *certainly* not accept a leading BOM.
>
>     Quick survey:
>
>         % echo $'\xEF\xBB\xBF{}' > bom.json
>
>       - jsoncpp: no mention of a BOM in the source, probably unhappy about
>         it
>       - jq: fine
>       - python3:
>         json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using
>         utf-8-sig): line 1 column 1 (char 0)
>       - ruby:
>         /usr/share/ruby/json/common.rb:156:in `parse': 765: unexpected
>         token at '\xEF\xBB\xBF{}' (JSON::ParserError)
>       - C#: https://jimmybogard.com/the-curious-case-of-the-json-bom/
>
>     Based on this short survey, I don't think BOM support is actually
>     all that widespread in readers. And the solution seems to be "don't
>     write the BOM" where the problem is encountered.
>
>     I think those sticking to their Notepad guns are just going to have
>     to wait for something better, because waiting for the libraries to
>     catch up (and the relevant fixes to be backported to declared
>     minimum supported versions) is likely going to take *even longer*.
>     Or they can download a real editor and actually contribute to
>     whatever codebase they're trying to build.
>
>     > > > 2. Consistency with Web browsers.
>     > >
>     > > I don't see why a web browser would care about these files.
>     >
>     > Maybe not _these_ JSON files, but a general-purpose JSON parser can
>     > still care about consistency with Web browsers.
>
>     That's fine. They can then accept the BOM-less files that every
>     writer for this format would produce, just like all the other
>     BOM-less network-transferred JSON content in the world.
>
>     --Ben
>
>
>
> -- 
> Be seeing you,
> Tony
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode at isocpp.open-std.org
> http://www.open-std.org/mailman/listinfo/unicode
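
For reference, the python3 result in the survey above can be reproduced, and worked around, along the lines the error message itself suggests. This is only a rough sketch: `loads_bytes` and its `accept_bom` flag are hypothetical names made up for illustration, not part of the json module. It relies on the standard `utf-8-sig` codec, which drops a leading UTF-8 BOM when decoding and otherwise behaves like `utf-8`.

```python
import json

# The three bytes Notepad-style writers put in front of UTF-8 content.
UTF8_BOM = b"\xef\xbb\xbf"


def loads_bytes(data: bytes, accept_bom: bool = True):
    """Parse JSON from raw bytes (hypothetical helper, for illustration).

    With accept_bom=True the bytes are decoded with "utf-8-sig", which
    strips a leading BOM if one is present. With accept_bom=False plain
    "utf-8" is used, so the BOM survives as U+FEFF and json.loads()
    rejects the resulting string.
    """
    codec = "utf-8-sig" if accept_bom else "utf-8"
    return json.loads(data.decode(codec))


if __name__ == "__main__":
    data = UTF8_BOM + b"{}"  # same content as bom.json in the survey
    print(loads_bytes(data))
    try:
        loads_bytes(data, accept_bom=False)
    except json.JSONDecodeError as exc:
        print("rejected:", exc.msg)
```

Whether a reader should default to the tolerant path is exactly the question being argued above; the sketch only shows that both behaviors are a one-word change at the point where bytes become a string.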

