<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Thank you for writing this up, Thiago!<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 9/5/19 12:12 AM, Thiago Macieira
wrote:
</div>
<blockquote type="cite" cite="mid:1925531.6oPTPzp6tz@tjmaciei-mobl1">
<pre class="moz-quote-pre" wrap="">== Transport ==
P1689 suggests using JSON. I'm comparing that in the context of the three
options with a binary format (CBOR).
One thing SG16 is completely in agreement on is that if you go with JSON, you
must obey RFC 8259: there must not be a BOM and the file must be encoded in
UTF-8.</pre>
</blockquote>
<p>We haven't polled anything, so saying we're all in agreement is
premature. Additionally, we discussed this further in the SG16
meeting yesterday and I think we determined that a BOM *may* be
present.</p>
<p>RFC 8259 section 8.1 states (emphasis mine):</p>
<blockquote>
<p>JSON text exchanged between systems <b>that are not part of a
closed ecosystem</b> MUST be encoded using UTF-8 [RFC3629].<br>
<br>
Previous specifications of JSON have not required the use of
UTF-8 when transmitting JSON text. However, the vast majority
of JSON-based software implementations have chosen to use the
UTF-8 encoding, to the extent that it is the only encoding that
achieves interoperability.<br>
<br>
Implementations MUST NOT add a byte order mark (U+FEFF) to the
beginning of a <b>networked-transmitted JSON text</b>. In the
interests of interoperability, implementations that parse JSON
texts <b>MAY ignore the presence of a byte order mark</b>
rather than treating it as an error.<br>
</p>
</blockquote>
<p>My reading of this is that RFC 8259 permits use of non-UTF-8
encodings in some situations. Whether the situation that P1689 is
defined for qualifies is something that could be debated. If we
consider the build system and compiler invocations to form a
closed ecosystem, then the dependency file could, for example, be
EBCDIC-encoded JSON and still conform to RFC 8259. I'm not
arguing for or against such a position at this time, but rather
noting that, if SG15 requires UTF-8-encoded JSON, that requirement
is arguably more restrictive than what RFC 8259 requires.</p>
<p>My reading of the BOM requirement is that it applies only to
UTF-8 JSON text transmitted over a network, and that a BOM at the
start of a file's contents is permitted.</p>
<p>ECMA-404 does not specify any requirements on the encoding of
JSON text, nor on the presence or absence of a BOM.</p>
<p>If we adopt either RFC 8259 or ECMA-404 as the JSON
specification to defer to, and do not add further restrictions, my
conclusions are that:</p>
<ol>
<li>Implementations could choose whatever encoding they like for
the JSON file.</li>
<li>Implementations could choose whether to produce and consume a
BOM.</li>
</ol>
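<p>For illustration only, here is a minimal sketch of a consumer
that takes the option RFC 8259 section 8.1 offers parsers, namely
ignoring a leading BOM rather than treating it as an error. It is
written in Python and assumes the file contents are UTF-8; the
function name <code>load_json_tolerating_bom</code> and the
<code>{"version": 1}</code> payload are hypothetical and are not
taken from P1689.</p>
<pre>
```python
import codecs
import json


def load_json_tolerating_bom(data: bytes):
    """Hypothetical helper: parse JSON bytes that may start with a UTF-8 BOM."""
    # Strip a single leading UTF-8 BOM (EF BB BF) if one is present,
    # i.e. "ignore the presence of a byte order mark" per RFC 8259 8.1.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Assumes the remaining bytes are UTF-8; other encodings are not handled.
    return json.loads(data.decode("utf-8"))


with_bom = codecs.BOM_UTF8 + b'{"version": 1}'
without_bom = b'{"version": 1}'
print(load_json_tolerating_bom(with_bom) == load_json_tolerating_bom(without_bom))  # True
```
</pre>
<p>A consumer written this way accepts output from producers that
do and do not emit a BOM. By contrast, a consumer that decodes the
bytes with the plain <code>"utf-8"</code> codec and passes the
resulting string to <code>json.loads</code> would reject the
BOM-prefixed input, which is the kind of divergence that leaving the
choice to implementations permits.</p>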
<p>Tom.</p>
</body>
</html>