[Tooling] Modules and tooling: Resolving module import declarations

Tom Honermann tom at honermann.net
Wed Aug 29 19:06:46 CEST 2018


Per P0804 <http://wg21.link/p0804>, I've been investigating options for 
how tool implementors can work with the proposed C++ modules design.  
Consider the following code:

import std.core;
import widgets;

std::vector<widget>
get_some_widgets() {
   /* ... */
}

Now, consider what a tool, such as an editor, an indexer, a formatter, a 
static analyzer, a translation tool such as SWIG, a documentation 
generator, or any other tool that requires a semantic representation of 
source code, will require in order to perform its intended job.  How 
will such a tool parse this code?  Specifically, how will it resolve the 
module import declarations for std.core and widgets such that 
declarations for std::vector and widget are available in order to 
successfully parse the remainder of the code?  This email thread 
explores a few possible answers to this question with the intent of 
starting a discussion that, hopefully, will identify a common approach 
that all compiler and tool implementors can agree to implement (while 
still allowing for compiler/tool specific optimizations when available).

The TL;DR summary of the remainder of this email is:

  * The modules TS doesn't (can't) specify how module imports are
    resolved, leaving room for several implementation strategies.
  * Many tools can't require explicit integration with build systems or
    environment configuration.
  * Many tools can't depend on compiler specific module support (e.g.,
    won't be able to consume module artifacts produced by other tools).
  * Having to individually configure every tool with details of
    individual module requirements would be ... bad.
  * An industry standard mechanism for describing how to resolve module
    import declarations could foster tool support and ease migration to
    a modules enabled world.

The modules TS was designed to grant considerable implementation freedom 
in how module import declarations are resolved.  There are two basic models:

 1. Module import declarations are resolved to module interface unit
    source files that are then translated on demand.
 2. Module import declarations are resolved to module artifacts produced
    by a prior compilation of the module interface unit source code for
    the imported modules.

Such implementation freedom has benefits, but it comes with a cost.  If 
each tool imposes its own requirements for how module imports are 
resolved, what does that imply for their use?  Each tool will require an 
answer to "where is the module interface unit source code for module X 
and what preprocessor and language dialect options do I use to translate 
it (for build mode Y)?", or "where is my cached module artifact for 
module X (for build mode Y)?".  The answers to these questions will have 
to be supplied by a build system, a (generic or tool specific) 
environment configuration, or tool specific invocation options.

Build system support is a reasonable requirement for compilation, but is 
not a reasonable requirement for many other tools.  For example, it 
strikes me as unreasonable to require build systems to be augmented with 
explicit support for each of Vim, Emacs, Visual C++, VS Code, Xcode, 
CLion, Cevelop, Eclipse, etc... in order for the maintainers of any 
particular code base to use their preferred editor with advanced 
features like code completion.  Likewise, it seems unreasonable to 
require tools like editors to be able to query any particular build system.

I asked the Xcode and Visual C++ developers how their respective editors 
would handle the code above.  For Xcode, the answer is that, for 
features like code completion that depend on semantic analysis, the 
project will have to have been built first, and the editor will consume 
module artifacts produced during compilation; in other words, such 
features will only work when the code has been built and was built with 
a supported version of Clang.  Visual C++ will likewise support 
consumption of module artifacts produced by the Microsoft compiler, but 
will additionally support configuration options to resolve module import 
declarations without the need for module artifacts. Should we expect 
editors like Vim, Emacs, CLion, Cevelop, etc... to be able to consume 
module artifacts?  If so, for which (versions of which) compilers?

Some modules proponents have argued for a standardized module format 
that all tools could consume.  So far, only Microsoft has invested in 
such an effort.  Clang and gcc have both moved ahead with their own 
module file formats, each highly optimized for its internal 
representation.  Concerns have been expressed regarding the viability of a 
common format due to performance requirements and the fidelity of the 
saved semantic model.  Portions of the C++ language are implementation 
defined, so the semantic model stored by a producer may not match the 
model required by a consumer.  Tool requirements also differ; compilers 
require a semantic description of exported entities and sufficient 
detail to emit useful diagnostics, but tools like static analyzers 
require comments, accurate and precise source location ranges including 
macro expansion contexts, locations of macro (un)definitions, locations 
of redundant and unused declarations, and much more (and yes, this 
information will be required for imported modules; the form of the 
declaration affects the analysis).  A single format, even if limited in 
what it stores with fallback to textual analysis, is unlikely to be the 
best solution for all tools.  My personal impression of the SG15 evening 
session in Jacksonville earlier this year is that this direction will 
not have consensus.

It has been suggested that a standardized API might overcome some of the 
concerns expressed over a standardized format. However, I would expect 
the same concerns regarding performance and semantic models to apply 
here.  To my knowledge, no designs for such an API have been made 
public, nor has a collective effort to design such an API materialized.

I believe sharing module artifacts, in any form, will prove to be 
infeasible.  For tools that already have an established internal 
representation for C++ code, the cost of translating the internal 
representation of another implementation, whether via API or a common 
format, is very high (we know this from experience at Coverity).  For 
those familiar with the internal representations used by gcc and Clang, 
consider what it would take to translate one to the other.  If I were 
assigned such a task, the approach I would take is to use the internal 
representation to generate source that closely reflects the original 
source and that is then compiled by the other (this would not be an easy 
task, nor is it necessarily possible without loss of some information).  
I believe source code is a better portable format than any binary format.

The LSP (language server protocol; https://langserver.org) provides a 
tool agnostic approach to avoiding the parsing question altogether by 
providing a protocol by which a client can request some semantic 
information such as code completion, hover text, and location 
information.  The server (likely closely tied to a particular compiler) 
responds with information collected during a build (whether cached or on 
demand).  Vim, Emacs, VS Code, CLion, and other editors have added or 
are adding support for it.  While the LSP is useful for language 
agnostic tools, it isn't something that can scale to meet the semantic 
detail and performance requirements of language specific tools like 
static analyzers.
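
For instance, an editor requesting code completion from an LSP server 
sends a JSON-RPC message like the one constructed below.  This is a 
minimal sketch; the file URI and cursor position are illustrative, 
though the method name and message shape follow the published LSP 
specification (positions are zero-based).

```python
import json

# Minimal JSON-RPC 2.0 request asking the language server for
# completion candidates at line 4, column 7 of a source file.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/completion",
    "params": {
        "textDocument": {"uri": "file:///src/widgets.cpp"},
        "position": {"line": 4, "character": 7},
    },
}
print(json.dumps(request, indent=2))
```

The point of the protocol is visible in the sketch: the client never 
parses C++ at all, so the module resolution problem is pushed entirely 
onto the server.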

Many tools depend on the ability to consume standard library 
implementations produced by other vendors.  The C++ standard will 
eventually prescribe modules such as std.core for standard library 
components, but these modules may be composed from many dependent 
modules, the structure of which is an implementation detail.  A separate 
configuration approach for each tool might require that each tool be 
configured for the internal module topology for each of the Microsoft, 
libstdc++, libc++, etc... standard library implementations.  Such an 
approach matches how we handle header files today; tools must be 
configured with include paths that include implementation dependent 
paths.  But what if an implementor were to make their standard library 
modules only available via module artifacts (as Microsoft does today, 
though this is expected to change)?  The Modules TS specifies (5.2 
[lex.phases] p7) "It is implementation-defined whether the source for 
module interface units for modules on which the current translation unit 
has an interface dependency (10.7.3) is required to be available".  It 
seems to me that withholding standard library module interface unit 
source code would be rather user hostile, and I don't expect any 
implementation to do so; I believe that provision in the Modules TS is 
intended more to allow build system flexibility.  Nevertheless, the potential 
for module interface unit source code to be absent is a concern for 
tools that are unable to consume module artifacts.

Historically, we've taken the individual tool configuration approach for 
support of header files and, despite limitations, it has sufficed.  
However, the introduction of modules changes one critical aspect of 
such configuration.  
Previously, header files needed to be consumable with the same set of 
include paths and macro definitions as is used for the primary source 
file.  Translating module interface unit source code may require 
different, even conflicting, include paths and macro definitions.  Thus, 
configuration will become more challenging.  I think we should strive 
for a better solution for modules.

If we can't require build system integration for all tools, and we can't 
rely on sharing module artifacts, and separate configuration for each 
tool would be challenging, where does this leave us?

I think we need an industry standard, tool agnostic solution that works 
for common environments (e.g., non-exotic environments in which source 
code is stored in files) and is supported by all compilers and tools.  
Tools can always offer opt-in features for build optimization that 
require build system augmentation (analogous to use of precompiled 
headers today).

What might such an industry standard approach look like?  Here is a 
sketch of a design:

 1. A (set of) module description file(s) that specifies:
     1. A map from a module name to the file name for the module
        interface unit source code.  A default naming convention could
        also be adopted, though we already have two competing
        conventions (.cppm vs .ixx).
     2. A set of requirements for translating the module interface unit
        source code (for one or more variations or build modes).  This
        includes preprocessor information (include paths, macro
        definitions, macro undefinitions), and, potentially, language
        dialect requirements (specified in a generic form and, perhaps,
        with the ability to customize for specific tools).
 2. A method of specifying a path to search for module description
    files, similar to existing include paths.
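
To make the sketch concrete, a module description might carry content 
like the following.  This is a hypothetical example only: it is 
encoded here as a Python dict purely for illustration, and the key 
names, the "debug" build mode, and the flag values shown are all 
invented; the actual serialization format and schema would be for the 
specification to decide.

```python
# Hypothetical module description data; serialization format and
# key names are invented for illustration.
module_descriptions = {
    "widgets": {
        "interface-source": "src/widgets.cppm",
        "build-modes": {
            "debug": {
                "include-paths": ["include", "third_party/include"],
                "macro-definitions": {"WIDGETS_DEBUG": "1"},
                "macro-undefinitions": ["NDEBUG"],
                "dialect": "c++17",
            },
        },
    },
}

def translation_requirements(module_name, build_mode):
    """Look up the module interface unit source file and the options
    a tool needs in order to translate it for the given build mode."""
    desc = module_descriptions[module_name]
    return desc["interface-source"], desc["build-modes"][build_mode]

source, options = translation_requirements("widgets", "debug")
```

A tool that can answer this lookup for every imported module has 
everything it needs to parse the example code at the top of this 
email, regardless of which compiler produced the description.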

Note that such module description files need not be statically written 
and maintained.  They could be generated directly by a build system, or 
as a side effect of compilation.  If generated, tools that depend on 
them would require a (partial) build to have been completed, just as is 
the case today for build systems that generate header files.

Clearly, such a specification falls outside the scope of the C++ 
standard.  However, we could provide a specification in the form of a TS 
that implementors can adhere to.

So, what do you think?  Do you agree that there is a problem worth 
solving here?  Is a common specification a feasible solution?  Is 
standardizing such a specification useful and desirable?  What 
requirements should be placed on the design?  If you are a compiler or 
tool implementor, have you already been working on modules support?  If 
so, what approaches have you been considering?  Are they captured 
above?  What is your preferred solution?

Thank you to Gabriel Dos Reis, Nathan Burgers, Dmitry Kozhevnikov, 
Manuel Klimek, Peter Sommerlad, and Ville Voutilainen for corrections 
and suggestions they provided on preview drafts of this email.  (This 
thank you is in no way intended to reflect their support, or lack 
thereof, for anything suggested in this email).

Tom.