1. Effects of This Paper
This paper makes the following ill-formed by forbidding macro expansion in the name of module declarations.
version.h:
#ifndef VERSION_H
#define VERSION_H
#define VERSION libv5
#endif
lib.cppm:
module;
#include "version.h"
export module VERSION;
This is still valid in import declarations, as are macros in the attribute following a module declaration.
2. The Issue
Given an import declaration, the implementation needs to know which TU contains the corresponding module declaration. There are many possible ways to do this, but the current specification makes this difficult in the general case.
module;
#include <ponies.h>
export module creature;
// ...
In this example the implementation must either have an oracle or preprocess up to the module directive to determine which module this TU defines, because the pp-tokens that make up the module name are themselves subject to macro replacement ([cpp.module]/2), including any macros brought in by #include <ponies.h>.
This means that build systems must do one of the following:
- Preprocess up front to determine where modules are defined, adding latency to the build
- Require explicit metadata for modules, even for modules local to the project
- Require module names to match file names
- Not support such cases
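To make the macro dependence concrete, the sketch below models only object-like macro replacement over identifier and punctuator tokens; it is a simplified, hypothetical model, not a real preprocessor. Assume, purely for illustration, that ponies.h contains #define creature pony.v2. Under that assumption the example above declares the module pony.v2, not creature, and nothing in the source file itself reveals this. The names Tokens, MacroTable, expand, and join are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: tokens are strings, and each object-like
// macro maps an identifier to its replacement list.
using Tokens = std::vector<std::string>;
using MacroTable = std::map<std::string, Tokens>;

// Replace object-like macros in `in`, rescanning each replacement list.
// A macro is not re-expanded inside its own expansion ([cpp.rescan]).
Tokens expand(const Tokens& in, const MacroTable& macros,
              const std::set<std::string>& disabled = {}) {
  Tokens out;
  for (const auto& tok : in) {
    auto it = macros.find(tok);
    if (it == macros.end() || disabled.count(tok) != 0) {
      out.push_back(tok);  // not a macro, or currently being expanded
      continue;
    }
    std::set<std::string> inner = disabled;
    inner.insert(tok);
    Tokens replaced = expand(it->second, macros, inner);
    out.insert(out.end(), replaced.begin(), replaced.end());
  }
  return out;
}

// Spell a module name by concatenating its tokens, e.g. "pony" "." "v2".
std::string join(const Tokens& toks) {
  std::string name;
  for (const auto& t : toks) name += t;
  return name;
}
```

With the assumed header's macro table the tokens after module spell pony.v2; with an empty table they spell creature. A tool can only know which applies by processing the #include.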
2.1. Sketch of a Simple Build System
For a more concrete example of where this becomes a problem, here’s a sketch of a simple build system using ninja.
As input you have 100 source files, a mix of ordinary TUs and importable TUs, and a ninja build file with rules for building each TU but without module dependencies.
If you started the build with 16 parallel jobs, 16 of those TUs would begin building and start hitting import declarations that need to be resolved. However, there are still 84 TUs that haven't started building yet, and they likely contain the module declarations needed to resolve these imports.
If we want as close to a zero-configuration build system as possible without also restricting module names, we must add a module discovery phase that runs before the first dependent import is resolved. This phase can be either explicit in the build system or part of the module mapper. Currently this discovery phase has to preprocess each TU, which adds a delay before any real compilation can begin.
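A minimal sketch of such a discovery phase follows, assuming the rule proposed in this paper (module names are never macro-expanded), so that a textual scan of each file is enough to build an index from module name to file. The helper names declared_interface and discover_modules are hypothetical, only interface units are indexed, and the line-based regular expression ignores comments and conditional groups, which a real scanner would have to handle.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical discovery step. Assumes module names are never macro-expanded
// (the rule proposed here), so the declared name can be read from the text.
// Simplification: matches line by line and ignores comments and conditional
// groups, which a real scanner must handle.
std::optional<std::string> declared_interface(const std::string& source) {
  // name = identifier ( . identifier )*
  static const std::string name = "[A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*)*";
  // export module name [:name] followed by ';' or an attribute
  static const std::regex decl("\\s*export\\s+module\\s+(" + name + "(?::" +
                               name + ")?)\\s*[;\\[].*");
  std::istringstream in(source);
  std::string line;
  std::smatch m;
  while (std::getline(in, line))
    if (std::regex_match(line, m, decl)) return m[1].str();
  return std::nullopt;
}

// Build the index a module mapper would consult: module name -> file path.
std::map<std::string, std::string> discover_modules(
    const std::map<std::string, std::string>& files) {  // path -> contents
  std::map<std::string, std::string> index;
  for (const auto& [path, contents] : files)
    if (auto found = declared_interface(contents)) index.emplace(*found, path);
  return index;
}
```

Resolving an import then becomes a lookup in this index instead of a preprocessing run, which is exactly what the current macro-replacement rule prevents.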
2.2. Caching Build Systems
Another case where latency is particularly important is caching build systems. Assume the same collection of 100 TUs as before, but this time the build system can return cached results for compilations. To do this reproducibly, the cache key must depend on the full input to each compilation: all source files and the modules it depends on, and, recursively, how those modules are built.
In a non-modules world this key can be computed with minimal preprocessing. With modules, however, resolving imports to module declarations is not needed to discover direct dependencies, but it is needed to determine the cache key for a compilation. Latency matters here because time spent discovering module declarations delays the time to first byte for any cache hits.
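The recursive structure of the cache key can be sketched as follows. This is a hypothetical model: TranslationUnit, mix, and cache_key are illustrative names, std::hash stands in for the cryptographic digest a real cache would use, and the map from module name to TU is exactly the result that module declaration discovery must produce before any key can be computed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of a TU's build inputs.
struct TranslationUnit {
  std::string contents;              // source text (stand-in for all inputs)
  std::string flags;                 // how the TU is built
  std::vector<std::string> imports;  // names of imported modules
};

// Boost-style hash mixing; std::hash is used only for illustration and is not
// a stable or collision-resistant digest.
std::size_t mix(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// key(TU) = H(contents, flags, key(import_1), ..., key(import_n)).
// `modules` maps each module name to the TU that declares it, i.e. the result
// of module declaration discovery. Module imports cannot form cycles, so the
// recursion terminates; `memo` avoids recomputing shared dependencies.
std::size_t cache_key(const TranslationUnit& tu,
                      const std::map<std::string, TranslationUnit>& modules,
                      std::map<std::string, std::size_t>& memo) {
  std::hash<std::string> hash;
  std::size_t key = mix(hash(tu.contents), hash(tu.flags));
  for (const auto& name : tu.imports) {
    auto found = memo.find(name);
    if (found == memo.end()) {
      std::size_t dep = cache_key(modules.at(name), modules, memo);
      found = memo.emplace(name, dep).first;
    }
    key = mix(key, found->second);
  }
  return key;
}
```

Note that modules.at(name) cannot be evaluated until every import has been mapped to its declaring TU, so discovery latency sits directly in front of the first cache lookup.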
3. Module Declaration Discovery
Due to the structure of a preprocessing-file, the pp-module line is discoverable at the start of phase 4 of translation without processing any #include directives or resolving any preprocessing conditionals. In some environments this can be done without a command line at all, or with only a partial one. The only thing preventing this is that the module-name and module-partition tokens may be subject to macro replacement.
If this were not the case, a reasonably simple parser could determine the module-name and module-partition of a source file without calling out to compiler-specific tooling.
4. Compatibility
This is a breaking change relative to C++20 and C++23. However, given the limited deployment of modules so far and the rarity of such use cases, the breakage is expected to be minimal.
5. Wording
Apply the following wording as a DR:
Modify Module directive [cpp.module] by inserting a paragraph after paragraph 1 as follows:
pp-module:
    export_opt module pp-tokens_opt ; new-line

1 A pp-module shall not appear in a context where module or (if it is the first token of the pp-module) export is an identifier defined as an object-like macro.

2 The pp-tokens, if any, of a pp-module shall be of the form:
    pp-module-name pp-module-partition_opt pp-tokens_opt
where the pp-tokens (if any) shall not begin with a ( preprocessing token and the grammar non-terminals are defined as:
    pp-module-name:
        pp-module-name-qualifier_opt identifier
    pp-module-partition:
        : pp-module-name-qualifier_opt identifier
    pp-module-name-qualifier:
        identifier .
        pp-module-name-qualifier identifier .
No identifier in the pp-module-name or pp-module-partition shall currently be defined as an object-like macro.
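To show what the new paragraph 2 requires, the following hypothetical checker takes the pp-tokens between module and ; together with the set of identifiers currently defined as object-like macros. It accepts a dotted module name and an optional partition, rejects a ( immediately after them (presumably so that a function-like macro invocation cannot follow the name), and rejects any name or partition identifier that is an object-like macro, while leaving trailing tokens such as an attribute unconstrained. The names PPTokens, consume_name, and conforms are illustrative only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using PPTokens = std::vector<std::string>;

bool is_identifier(const std::string& tok) {
  return !tok.empty() &&
         (std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_');
}

// Consume pp-module-name-qualifier_opt identifier, i.e. ident ( . ident )*,
// recording whether any consumed identifier is an object-like macro.
bool consume_name(const PPTokens& t, std::size_t& i,
                  const std::set<std::string>& object_macros, bool& uses_macro) {
  if (i >= t.size() || !is_identifier(t[i])) return false;
  for (;;) {
    if (object_macros.count(t[i]) != 0) uses_macro = true;
    ++i;
    if (i + 1 < t.size() && t[i] == "." && is_identifier(t[i + 1])) {
      ++i;  // step over '.', then loop on the next identifier
    } else {
      return true;
    }
  }
}

// `t` holds the pp-tokens between `module` and `;`. Returns true if they have
// the form pp-module-name pp-module-partition_opt pp-tokens_opt, the trailing
// pp-tokens do not begin with '(', and no name or partition identifier is
// currently defined as an object-like macro.
bool conforms(const PPTokens& t, const std::set<std::string>& object_macros) {
  if (t.empty()) return true;  // "if any"
  std::size_t i = 0;
  bool uses_macro = false;
  if (!consume_name(t, i, object_macros, uses_macro)) return false;
  if (i < t.size() && t[i] == ":") {
    ++i;
    if (!consume_name(t, i, object_macros, uses_macro)) return false;
  }
  if (i < t.size() && t[i] == "(") return false;
  return !uses_macro;  // trailing tokens, e.g. an attribute, are unchecked
}
```

With VERSION defined as an object-like macro, the tokens of export module VERSION; from section 1 are rejected, while a macro used inside a trailing attribute is still accepted, matching the effects described at the start of this paper.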