<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Feb 24, 2019, 7:39 PM Ben Boeckel <<a href="mailto:ben.boeckel@kitware.com">ben.boeckel@kitware.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
I have GCC writing out JSON-like syntax right now. It isn't 100% valid<br>
since it isn't UTF-8, but I don't want *that* in these files either.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">It seems reasonable to have non-ASCII in user-provided data fields. We should figure out how to handle the case where the user's path is invalid UTF-8, such as on Linux, where it can be an arbitrary bag of bytes, or on UCS-2 platforms that allow mismatched surrogates. If the compiledb format already handles these cases, we should probably just do whatever it does.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Well, you can't know until you actually compile the BMI whether it has<br>
changed or not. The best we can ask for is "only update if contents are<br>
unchanged". Getting that for .o files would be nice as well. Ninja can<br>
then optimize no-change compilations via `restat`.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I didn't just mean for the scan phase. The BMI can change in ways that don't require the downstream stuff to be recompiled, e.g. when a comment changes on a line of source that is included in the BMI only for better error reporting. Similarly, I could see something like that happening with the .o and split-dwarf / osx-style unsplit-split-dwarf.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> And for the love of $diety, don't put any locale- sensitive strings in this<br>
> metadata!<br>
<br>
I'd rather have it just be "a series of bytes that is a valid lookup on<br>
the filesystem". The `\` and `"` characters are escaped using `\` for<br>
obvious reasons. Maybe we do it for control characters as well. Is that<br>
good enough for a specification?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">I think I made my point poorly and was misinterpreted. I was just making a joke about /showIncludes, whose output is localized. The equivalent behavior here would be to make the field names in the JSON file match the user's language. I hope no vendor is mean enough to actually do that! Obviously users need to be able to use their language in their files and paths. I'm not suggesting we limit that in any way, just that the field names should be predictable.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
</blockquote></div></div></div>
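<div dir="auto"><br></div><div dir="auto">For concreteness, here is a rough sketch of the escaping rule quoted above: treat the path as raw bytes and backslash-escape `\` and `"`. The function names are made up for illustration, and writing control characters and non-ASCII bytes as `\xNN` is purely my assumption so that the output stays plain ASCII; it is not something GCC or any existing format is known to emit.</div>

```python
# Hypothetical sketch: paths are raw bytes, not text in any encoding.
PRINTABLE = range(0x20, 0x7F)  # printable ASCII, excluding DEL


def escape_path(raw):
    """Return an ASCII-only string for an arbitrary byte sequence."""
    out = []
    for b in raw:
        if b in (0x5C, 0x22):       # backslash or double quote
            out.append("\\" + chr(b))
        elif b in PRINTABLE:
            out.append(chr(b))
        else:                       # assumed form for everything else
            out.append("\\x%02x" % b)
    return "".join(out)


def unescape_path(text):
    """Invert escape_path, recovering the original bytes."""
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch != "\\":
            out.append(ord(ch))
            continue
        nxt = next(chars)
        if nxt == "x":              # two hex digits follow
            out.append(int(next(chars) + next(chars), 16))
        else:                       # escaped backslash or quote
            out.append(ord(nxt))
    return bytes(out)
```

<div dir="auto">Because the round trip works on bytes, a Linux path that is not valid UTF-8 survives unchanged, and the field names around it can stay fixed ASCII regardless of the user's locale.</div>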