<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi everyone!<br class=""><br class="">I work on the CLion IDE at JetBrains, and, if everything goes well, I hope to attend the San Diego meeting, which would be my first meeting ever. I'm mostly interested in tooling, and I have several ideas about how the C++ tooling landscape could be improved, so I'd like some feedback on whether these problems hit home for anyone, and whether it's worth it for me to put together a paper.<br class=""><br class="">One of the more painful tasks when building a source-level tool (e.g. an IDE or a source-to-source utility) is collecting the compiler/toolchain information required to properly analyze the source:<br class="">        - header search paths<br class="">        - built-in and user-defined preprocessor definitions<br class="">        - available language features (e.g. for this specific gcc version with `-std=gnu2a`, is `requires` a keyword or still an identifier?)<br class="">        - compiler intrinsics (e.g. what is `__builtin_types_compatible_p` and how should we parse it?)<br class=""><br class="">Here is a tangentially related thread on cfe-dev, which describes some of the problems and partial solutions: <a href="http://lists.llvm.org/pipermail/cfe-dev/2018-April/057683.html" class="">http://lists.llvm.org/pipermail/cfe-dev/2018-April/057683.html</a><br class=""><br class=""><div class="">Figuring everything out requires either:<br class="">        1. intimate knowledge of the various compiler drivers (how to query each of them for features and extensions), or<br class="">        2. a pre-populated database of such information, so you can pick an entry and hope it's correct (e.g. 
you can try to guess the proper clang target triple for a given toolchain, but you can't know beforehand whether it exists and whether it actually matches the toolchain you're given).<br class=""><br class="">You might say: "everyone is using real compilers to parse C++, they know it all anyway". But they only know it about themselves, or about the toolchains they're able to cross-compile to. For example:<br class="">- a clang-based tool might have trouble with more exotic compilers like Intel or Green Hills, or with a clang version more recent than the one used in the tool<br class="">- IntelliSense in MSVC (which is, AFAIK, EDG-based) might have trouble with remote projects using a fairly old or fairly new gcc<br class=""><br class="">Another related problem is that it is currently very complicated to reason about code that is conditionally compiled out if you don't have access to the required toolchain:<br class=""><br class="">        #ifdef _WIN32<br class="">                 int x = foo(); // it's hard to find this usage when cross-referencing the `foo` symbol if you're in an IDE on Linux<br class="">        #endif<br class=""><br class="">So here's what I'm thinking of: it would be great to have a standardized and universally agreed-upon way to describe everything that is needed to parse a C++ file. 
This description could either be generated on demand by the actual compiler used for a specific file, or even be distributed with a project (for toolchains that the IDE/tool might not have access to).<br class=""><br class="">As a very rough draft, it could look like a JSON object like:<br class=""><br class=""> {<br class=""> "file_path": "file.cpp",<br class=""> "user_macros": [<br class=""> { "X" : "" },<br class=""> { "Y" : "1" },<br class=""> ...<br class=""> ],<br class=""> "builtin_macros": [<br class=""> { "__GNUC__" : "4" },<br class=""> ...<br class=""> ],<br class=""> "builtin_macro_predicates": [<br class=""> {<br class=""> "__has_feature" : ["cxx_lambdas", "cxx_modules"],<br class=""> "__has_extension" : ["cxx_lambdas", "cxx_modules"],<br class=""> "__has_builtin" : ["__type_pack_element"]<br class=""> },<br class=""> ...<br class=""> ],<br class=""> "function_like_builtins": ["__builtin_offsetof", ...],<br class=""> "template_alias_like_builtins": ["__type_pack_element", ...],<br class=""> "features": { "exceptions" : true, "concepts" : false, ... },<br class=""><br class=""> "type_sizes" : {<br class=""> "int" : 4,<br class=""> "long": 8,<br class=""> "char": 1,<br class=""> ...<br class=""> },<br class=""><br class=""> "header_search_paths": [<br class=""> { "path": "target/p1", "builtin": 1, "quote": 0 },<br class=""> { "path": "target/p2", "builtin": 0, "quote": 1 }<br class=""> ],<br class=""><br class=""> "compiler_version": "...",<br class=""> "compiler_executable": "...",<br class=""> "working_directory": "..."<br class=""> }<br class=""><br class="">Of course, it would be much bigger (it would contain roughly everything required for a syntax-only pass of a compiler frontend).<div class=""><br class=""></div><div class="">An interesting question is what to do with the various intrinsics and builtins. For example, they could be listed and also annotated with some properties (e.g. 
this one is function-like, and that one is a "function" that takes types and returns a value). That way, if a tool/IDE knows exactly how to handle a builtin, it will; if not, it can at least recover during parsing far more gracefully than by treating it as an unknown identifier.<br class=""><br class="">Q: How can we get such data?<br class=""><br class="">1. New and cooperating compilers can produce it themselves (e.g. this is a step in that direction: <a href="https://reviews.llvm.org/rL333653" class="">https://reviews.llvm.org/rL333653</a>)<br class="">2. For older or non-cooperating compilers, there could be a community-maintained tool which would aggregate all the knowledge about their possible arguments, driver quirks, output formats, available builtins, etc. (I have a private prototype of such a tool which I'm trying to use for IDE regression tests; however, it's very far from being useful yet. I hope to open-source it sooner or later.)<br class=""><br class="">TLDR: What are the benefits?<br class=""><br class="">1. An arbitrary IDE would be able to work with an arbitrary compiler (provided the compiler supplies all the required info, or someone, e.g. the compiler authors themselves, has contributed everything required to a community-maintained tool). This would, hopefully, lead to better tool adoption, and would share some of the tool authors' maintenance burden with the rest of the community :)<br class="">2. It opens up the possibility of proper code insight for configurations you're not able to build locally.</div><div class=""><br class="">What do you think? Does it all make sense? Should I put more effort into it and try to write a paper?</div><div class=""><br class=""></div></div><div class="">
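<div class="">To make the rough draft above a bit more concrete, here is a minimal Python sketch of how a tool could consume such a description to answer what an `#ifdef` or a `__has_feature(...)` check evaluates to for a configuration it cannot build locally. The field names are the ones from the draft; the helper names (`defined_macros`, `is_defined`, `predicate`) and the rule that user macros override built-in ones are only assumptions for illustration, not part of any agreed format.<br class=""><br class=""></div>
<pre class="">
```python
import json

# Hypothetical description, trimmed from the rough draft in this message.
# The field names come from that draft and are not an agreed format.
DESCRIPTION = json.loads("""
{
  "file_path": "file.cpp",
  "user_macros": [{"X": ""}, {"Y": "1"}],
  "builtin_macros": [{"__GNUC__": "4"}],
  "builtin_macro_predicates": [
    {"__has_feature": ["cxx_lambdas", "cxx_modules"],
     "__has_builtin": ["__type_pack_element"]}
  ],
  "features": {"exceptions": true, "concepts": false}
}
""")


def defined_macros(description):
    """Merge built-in and user macros into one name-to-value mapping."""
    macros = {}
    # Assumption: built-ins are applied first, so user definitions win.
    for key in ("builtin_macros", "user_macros"):
        for entry in description.get(key, []):
            macros.update(entry)
    return macros


def is_defined(description, name):
    """Answer the question an #ifdef asks for this configuration."""
    return name in defined_macros(description)


def predicate(description, predicate_name, argument):
    """Evaluate e.g. __has_feature(cxx_lambdas) from the recorded lists."""
    for group in description.get("builtin_macro_predicates", []):
        if argument in group.get(predicate_name, []):
            return True
    return False


if __name__ == "__main__":
    print(is_defined(DESCRIPTION, "__GNUC__"))
    print(is_defined(DESCRIPTION, "_WIN32"))
    print(predicate(DESCRIPTION, "__has_feature", "cxx_lambdas"))
```
</pre>
<div class="">With this particular description, `is_defined(DESCRIPTION, "_WIN32")` is false, so a tool would treat the `#ifdef _WIN32` branch as inactive; a description recorded from a Windows toolchain would be expected to list `_WIN32`, and the same query would then mark that branch as active.<br class=""><br class=""></div>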
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">Best regards,<br class="">Dmitry Kozhevnikov<br class=""></div></div></body></html>