Document | P1842R0 |
---|---|
Audience | SG15 |
Authors | Boris Kolpackov (Code Synthesis / build2) |
Reply-To | boris@codesynthesis.com |
Date | 2019-08-04 |
Abstract
This paper suggests generalizing the module mapper protocol described in P1184 to also handle headers as well as potential future translation unit dependencies such as std::embed (P1040).
1 Background
Because header units affect the preprocessor, they significantly complicate dependency graph discovery (refer to P1184 for details). We know of only one approach that deals with this complication in the general case: a dynamic module mapper. It works without relying on a pre-compilation step or manual dependency specification. It also does not require an additional mechanism in the compiler, such as the ability to preprocess textual headers in isolation in lieu of loading BMIs. As a result, in build2 we have decided to use the module mapper approach to handle header units and include translation.
Our initial attempt used GCC's module mapper to discover and handle header unit importation and the -M option family for header dependency discovery. However, it quickly became clear that the two mechanisms overlap significantly. In fact, because of include translation, the mapper is notified about most headers reported by -M. The only exceptions are the predefined (forced) and command line (-include) headers.
More importantly, the mapper approach seemed like a promising way to resolve many long-standing issues with handling auto-generated headers. To give some background, with the -M option family, auto-generated headers are normally handled using -MG, which instructs the compiler not to fail on encountering non-existent headers. The build system then detects such headers in the -M output, generates them, and re-executes the compiler.
However, besides being inefficient, this approach also has many issues and corner cases (listed in order of increasing difficulty):
- Outdated header: The build system has to detect when an auto-generated header exists but is out of date, update it, and, again, re-execute the compiler.
- Wrong header: If the auto-generated header does not exist, the compiler may find and include an identically-named but unrelated header from one of the subsequent -I directories.
- Outdated/wrong header causes an error: Including an outdated or wrong header may trigger a fatal preprocessor error (e.g., via an #error directive) that would disappear if only the header could be regenerated.
In contrast, the mapper approach would have the ability to sidestep all these issues because it would give the build system a chance to act before preprocessing a header.
Finally, the mapper can also be easily extended to handle potential future dependencies of translation units, such as those in the std::embed proposal (P1040).
The following sections describe the generalized module mapper (now more accurately called the dependency mapper). We have implemented it in GCC and then used it in build2 with good results.
2 Communication
GCC currently supports several module mapper communication media:
- File.
- Pipe (including the compiler's stdin/stdout).
- Program to spawn and then communicate with via its stdin/stdout.
- Socket/port to connect to (UNIX, IP).
The last two communication media may understandably raise security concerns. We, however, believe they can be omitted or made optional by an implementation if non-intrusive support for legacy build systems is not a priority.
To elaborate, in our experience, the most natural way to integrate the module mapper functionality into a build system is to use the first two media (file and/or pipe). Since the build system spawns the compiler process, a pipe is the most straightforward and efficient way to establish bi-directional communication. Other communication media should only be necessary when the build system cannot be easily modified.
3 Protocol
For the remainder of the paper, we refer to the file-based mapper as static and to the rest as dynamic. The dynamic mapper uses a line-based request-response protocol. Due to its nature, the static mapper has a separate, more limited protocol. Refer to P1184 for the protocol basics and to the following sections for the generalizations.
Theoretically, a static mapper can be implemented via something other than a file. For example, the compiler may read the static mapping from its stdin.
An implementation can reasonably be expected to support multiple static mappers and a single dynamic mapper for the same compilation.
One notable protocol feature described in P1184 is request batching in the dynamic mapper. However, the preamble rules around macro importation have since been relaxed. As a result, the compiler can now request multiple mappings in parallel only for contiguous non-header unit imports. It is therefore unclear whether the now limited benefit justifies the extra complexity, both in the compiler and in the build system. We therefore propose that, if implemented, this feature be made optional and its use negotiated via the impl-extra field in the HELLO request/response (see below).
3.1 Dynamic Mapper
The generalized protocol uses quoting to distinguish between modules and headers. The "" and <> quoting styles correspond to the same styles of include and import directives. The '' quoting is used for predefined (forced) and command line inclusions as well as in contexts where translation or re-search is not allowed. In other words, '' implies final/immutable inclusion/importation.
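The following is a minimal C++ sketch of how a mapper or compiler might classify a request operand by its quoting. It is not taken from GCC or build2. The names `name_kind`, `classify`, and `unquote` are hypothetical, and the sketch assumes the operand has already been extracted as a single word.

```cpp
#include <cassert>
#include <string_view>

// How an operand of a dependency mapper request is interpreted,
// based on its quoting (hypothetical names).
enum class name_kind
{
  module,       // unquoted: module or module partition name
  angle_header, // <hdr-name>: re-search (and, for INCLUDE, translation) allowed
  quote_header, // "hdr-name": re-search (and, for INCLUDE, translation) allowed
  final_header  // 'hdr-name': final, no re-search or translation
};

// Classify a single, already extracted operand (must not be empty).
name_kind classify(std::string_view n)
{
  assert(!n.empty());

  if (n.size() >= 2)
  {
    char f = n.front(), b = n.back();
    if (f == '<' && b == '>')   return name_kind::angle_header;
    if (f == '"' && b == '"')   return name_kind::quote_header;
    if (f == '\'' && b == '\'') return name_kind::final_header;
  }
  return name_kind::module;
}

// Strip the quotes from a header name; module names are returned as is.
std::string_view unquote(std::string_view n)
{
  return classify(n) == name_kind::module ? n : n.substr(1, n.size() - 2);
}
```

For example, `classify("'stdc-predef.h'")` yields `final_header`, while `classify("hello.core")` yields `module`.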
Protocol synopsis (a leading > marks a request from the compiler to the mapper and a leading < marks a response):
> HELLO ver kind ident [impl-extra...]
< HELLO ver kind ident [impl-extra...]
< ERROR msg
> EXPORT mod-name
> EXPORT 'hdr-name'
< EXPORT bmi
< ERROR msg
> DONE mod-name
> DONE 'hdr-name'
> IMPORT mod-name
> IMPORT <hdr-name> [hdr-path]
> IMPORT "hdr-name" [hdr-path]
> IMPORT 'hdr-name' hdr-path
< SEARCH
< IMPORT [bmi]
< ERROR msg
> INCLUDE <hdr-name> [hdr-path]
> INCLUDE "hdr-name" [hdr-path]
> INCLUDE 'hdr-name' hdr-path
< SEARCH
< INCLUDE
< IMPORT [bmi]
< ERROR msg
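A request line can be split into words with quote awareness, so that a quoted header name stays in one word even if it contains spaces. This is a hypothetical sketch that makes several assumptions. The leading > and < markers in the synopsis are taken to be notation only and not transmitted. Words are assumed to be separated by spaces. Only the EXPORT, DONE, IMPORT, and INCLUDE requests are handled, since HELLO carries a different set of fields. `split_line`, `request`, and `parse_request` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Split a protocol line into words. A word starting with <, " or ' extends
// to the matching closing character (which is kept), so quoted header names
// may contain spaces. Returns nullopt if a quoted word is unterminated.
std::optional<std::vector<std::string>> split_line(std::string_view l)
{
  std::vector<std::string> r;
  std::size_t i = 0, n = l.size();

  for (;;)
  {
    while (i != n && l[i] == ' ') ++i; // skip separators
    if (i == n) break;

    char c = l[i];
    char close = c == '<' ? '>' : (c == '"' || c == '\'') ? c : '\0';

    std::size_t e;
    if (close != '\0')
    {
      e = l.find(close, i + 1);
      if (e == std::string_view::npos) return std::nullopt;
      ++e; // include the closing character
    }
    else
    {
      e = l.find(' ', i);
      if (e == std::string_view::npos) e = n;
    }

    r.emplace_back(l.substr(i, e - i));
    i = e;
  }
  return r;
}

// An EXPORT, DONE, IMPORT, or INCLUDE request (HELLO is not handled).
struct request
{
  std::string cmd;                 // request name, e.g., IMPORT
  std::string name;                // mod-name or hdr-name, quotes kept
  std::optional<std::string> path; // hdr-path, if present
};

std::optional<request> parse_request(std::string_view l)
{
  auto w = split_line(l);
  if (!w || w->size() < 2 || w->size() > 3) return std::nullopt;

  request r{(*w)[0], (*w)[1], std::nullopt};
  if (w->size() == 3) r.path = (*w)[2];
  return r;
}
```

Keeping the quotes in `name` preserves the import/include style and the "final" status, which the response logic depends on. The command word is not validated here.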
Example exchange translating <stdio.h> inclusion to an import:
> HELLO 0 GCC main.cxx
< HELLO 0 build2 .
> INCLUDE 'stdc-predef.h' /usr/include/stdc-predef.h
< INCLUDE
> INCLUDE <stdio.h> /usr/include/stdio.h
< IMPORT
> IMPORT '/usr/include/stdio.h'
< IMPORT stdio.gcm
Example exchange importing an auto-generated header:
> HELLO 0 GCC main.cxx
< HELLO 0 build2 .
> INCLUDE 'stdc-predef.h' /usr/include/stdc-predef.h
< INCLUDE
> IMPORT <foo/data.h>
< SEARCH
> IMPORT <foo/data.h> libfoo/foo/data.h
< IMPORT libfoo/foo/data.gcm
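The following hypothetical sketch shows the mapper side of an auto-generated header exchange. It illustrates how the SEARCH response lets the build system act before a header is preprocessed. It assumes the build system keeps a table of headers it knows how to generate, keyed by the quoted header name. `mapper`, `generated_header`, and `generate()` are made-up names. Actually running the generator and building the BMI are left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// What a (hypothetical) build system knows about a header it can generate.
struct generated_header
{
  std::string path;        // where the generated header ends up
  std::string bmi;         // where its BMI is written
  bool up_to_date = false; // (re)generated during this build?
};

struct mapper
{
  // Keyed by the quoted header name as it appears in the request.
  std::map<std::string, generated_header> headers;

  // Handle IMPORT <hdr-name> [hdr-path] and return the response line.
  std::string import_header(const std::string& name,
                            const std::optional<std::string>& path)
  {
    auto i = headers.find(name);
    if (i == headers.end())
      return "ERROR unknown header " + name; // real mappers handle more

    generated_header& h = i->second;

    // Missing or outdated: regenerate before the compiler preprocesses
    // anything, then let it search again.
    if (!h.up_to_date)
    {
      generate(h);
      return "SEARCH";
    }

    // Still not found, or some other identically-named header was found.
    if (!path || *path != h.path)
      return "ERROR " + name + " not resolved to " + h.path;

    return "IMPORT " + h.bmi;
  }

  // Placeholder: a real build system would run the generator here.
  void generate(generated_header& h) { h.up_to_date = true; }
};
```

The sketch regenerates on the first request and replies SEARCH. This covers both a missing header and an outdated one, because the compiler only re-searches after the header has been (re)generated.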
3.1.1 IMPORT
> IMPORT mod-name
> IMPORT <hdr-name> [hdr-path]
> IMPORT "hdr-name" [hdr-path]
> IMPORT 'hdr-name' hdr-path
< SEARCH
< IMPORT [bmi]
< ERROR msg
The first form of the IMPORT request is made when importing a module or a module partition. Valid responses are IMPORT and ERROR.
The next two forms are used for header units imported using the <> and "" styles, respectively. If the compiler was able to resolve the header name to a header path, then this path is included in the request as hdr-path; otherwise, hdr-path is absent. Valid responses for these two forms are SEARCH, IMPORT, and ERROR. The SEARCH response causes the compiler to re-search the header name and re-issue the IMPORT request with the (presumably) new header path.
Instead of requesting the compiler to re-search the header, the response could have included the desired header path directly. The difficulty with supporting this is that the returned path would have to be reverse-mapped to an include directory. Otherwise, mechanisms such as #include_next and system header status would not work correctly. It seems the only reliable way to do this would be to search the include directories for a file that matches the returned path. The match would have to be established in the same heavy-handed way as #pragma once (comparing file contents, etc.).
If the header is not found (hdr-path is absent), then the IMPORT response should cause the compiler to issue the usual "header not found" diagnostics. In this case, the bmi field is ignored and can be omitted.
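A compiler could interpret responses to the <> and "" forms of IMPORT along these lines. This hypothetical sketch assumes the response has already been split into its command word and the remainder of the line. It treats an IMPORT response without a BMI for a found header as an error. That is our own assumption, since the text above does not cover the case. All names are illustrative.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct research  {};                   // SEARCH: re-search, re-issue IMPORT
struct not_found {};                   // issue "header not found" diagnostics
struct load_bmi  { std::string bmi; }; // IMPORT bmi
struct failed    { std::string msg; }; // ERROR msg or protocol violation

using import_action = std::variant<research, not_found, load_bmi, failed>;

// Interpret the response to IMPORT <hdr-name> or IMPORT "hdr-name".
// have_path: whether hdr-path was included in the request.
// cmd/arg:   the response word and the rest of the line (possibly empty).
import_action on_import_response(bool have_path,
                                 const std::string& cmd,
                                 const std::string& arg)
{
  if (cmd == "SEARCH")
    return research{};

  if (cmd == "IMPORT")
  {
    if (!have_path)
      return not_found{};     // the bmi, if any, is ignored
    if (arg.empty())          // our assumption: a BMI is required here
      return failed{"IMPORT response without a BMI"};
    return load_bmi{arg};
  }

  if (cmd == "ERROR")
    return failed{arg};

  return failed{"unexpected response: " + cmd};
}
```

A real compiler would probably also limit how many times it honors SEARCH for the same header, to avoid looping on a misbehaving mapper.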
The last form is used to import header units that cannot be re-searched. For example, this form of the IMPORT request is issued for include directives that have been translated to import (see below).
3.1.2 INCLUDE
> INCLUDE <hdr-name> [hdr-path]
> INCLUDE "hdr-name" [hdr-path]
> INCLUDE 'hdr-name' hdr-path
< SEARCH
< INCLUDE
< IMPORT [bmi]
< ERROR msg
The first two forms of the INCLUDE request are analogous to the corresponding IMPORT forms. The INCLUDE response signals that the header should be textually included, while the IMPORT response signals that the inclusion should be translated to an import. The IMPORT response may optionally specify the BMI. If the BMI is omitted, then the compiler should issue a separate IMPORT request.
Replying with IMPORT but without a BMI could be useful if, for example, the mapping is split between the dynamic and static mappers.
Similar to IMPORT, if the header is not found (hdr-path is absent), then the INCLUDE or IMPORT response should cause the compiler to issue the usual "header not found" diagnostics. In this case, the bmi field in the IMPORT response is ignored and can be omitted.
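The following hypothetical compiler-side sketch interprets INCLUDE responses. It also enforces the rule that the '' form can neither be re-searched nor translated. The response is again assumed to be pre-split into its command word and the rest of the line. Any response that is invalid for the request form is lumped together with ERROR. `include_action` and `on_include_response` are illustrative names.

```cpp
#include <cassert>
#include <string>

enum class include_action
{
  research,       // SEARCH: re-search, re-issue INCLUDE
  include_text,   // INCLUDE: include textually
  import_bmi,     // IMPORT bmi: translate to an import of this BMI
  import_request, // IMPORT: translate, then issue a separate IMPORT request
  not_found,      // issue "header not found" diagnostics
  error           // ERROR msg or a response invalid for this form
};

// final_name: the request used the 'hdr-name' form.
// have_path:  hdr-path was included in the request.
// cmd/arg:    the response word and the rest of the line (possibly empty).
include_action on_include_response(bool final_name, bool have_path,
                                   const std::string& cmd,
                                   const std::string& arg)
{
  if (cmd == "INCLUDE")
    return have_path ? include_action::include_text
                     : include_action::not_found;

  // The 'hdr-name' form can neither be re-searched nor translated.
  if (!final_name)
  {
    if (cmd == "SEARCH")
      return include_action::research;

    if (cmd == "IMPORT")
    {
      if (!have_path)
        return include_action::not_found; // the bmi, if any, is ignored
      return arg.empty() ? include_action::import_request
                         : include_action::import_bmi;
    }
  }

  return include_action::error;
}
```

The `import_request` outcome corresponds to the case where the BMI is omitted. In that case, the BMI could come from a static mapper consulted by the subsequent IMPORT request.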
The last form is used to include headers that can neither be re-searched nor translated.
3.2 Static Mapper
The static mapper specifies one module-to-BMI or header-to-BMI mapping per line in one of the following forms:
[prefix] mod-name bmi
[prefix] 'hdr-path' bmi
[prefix] !'hdr-path' [bmi]
Note that the same format is used both to provide the input mapping for imported modules/headers as well as the output mapping for writing a module/header BMI.
A line prefix may be specified in an implementation-defined manner (for example, as part of the command line option that specifies the mapper file). If specified, then only lines that begin with this prefix are considered (the prefix itself is ignored). Leading (after the line prefix, if any) and trailing whitespace as well as blank lines are ignored.
Specifying the line prefix is supported by GCC but this functionality is not described in P1184.
The line prefix allows reusing existing files, such as the venerable .d file, for storing the module mapping information.
The last form (with the leading !) is used to signal that including this header should be translated to an import. In this form, specifying the BMI is optional.
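The static mapper format is simple enough to parse line by line. The sketch below is hypothetical and is not GCC's parser. It assumes fields are separated by spaces or tabs and that hdr-path contains no single quotes. It returns `std::nullopt` both for ignored lines and for malformed ones, whereas a real implementation would diagnose the latter. `static_entry` and `parse_static_line` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct static_entry
{
  std::string name; // mod-name or hdr-path (quotes removed)
  bool header;      // 'hdr-path' or !'hdr-path' form
  bool translate;   // !'hdr-path': translate inclusion to import
  std::string bmi;  // empty if omitted (only allowed with !)
};

static std::string_view trim(std::string_view s)
{
  const char* ws = " \t\r";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string_view::npos) return {};
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Parse one static mapper line. Returns nullopt for lines to be ignored
// (blank, or not starting with prefix) as well as for malformed lines.
std::optional<static_entry> parse_static_line(std::string_view l,
                                              std::string_view prefix = {})
{
  if (l.substr(0, prefix.size()) != prefix) return std::nullopt;
  l = trim(l.substr(prefix.size()));
  if (l.empty()) return std::nullopt;

  static_entry r{};
  if (l.front() == '!') { r.translate = true; l.remove_prefix(1); }

  std::string_view bmi;
  if (!l.empty() && l.front() == '\'')
  {
    std::size_t e = l.find('\'', 1);
    if (e == std::string_view::npos) return std::nullopt;
    r.header = true;
    r.name = std::string(l.substr(1, e - 1));
    bmi = trim(l.substr(e + 1));
  }
  else
  {
    if (r.translate) return std::nullopt; // ! only applies to headers
    std::size_t e = l.find_first_of(" \t");
    if (e == std::string_view::npos) return std::nullopt; // BMI required
    r.name = std::string(l.substr(0, e));
    bmi = trim(l.substr(e));
  }

  if (bmi.empty() && !r.translate) return std::nullopt; // BMI required
  if (bmi.find_first_of(" \t") != std::string_view::npos) return std::nullopt;
  r.bmi = std::string(bmi);
  return r;
}
```

Because the same format serves as both input and output mapping, one parser like this could handle both directions. Whether a given entry is read or written is determined by how the compiler uses it.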
It may be desirable to allow separating the specification of header to BMI mapping and include translation, for example, in different mapper files. At the same time we expect it to be common for these specifications to be combined.
4 Questions and Answers
4.1 Is there implementation experience?
Yes, an implementation is available in the boris/c++-modules-ex GCC branch.
4.2 Is there usage experience?
Yes, the build2 build system implements support for modules and header units (including include translation) in GCC using this generalized mapper.
5 Acknowledgments
This work is based on Nathan Sidwell's P1184 and module mapper implementation in GCC. The module mapper idea was originally conceived (according to P1184) in a discussion between Nathan Sidwell, Richard Smith, and David Blaikie.