1. Current Semantics
s in module names currently have no semantics. There is no relation defined
by the standard between
and
or
. The
only meaning they have is the implied hierarchy that developers read into them.
2. Usage
At the SG15 meeting at CppCon the following poll was taken:
Tooling suffers if we remove dots from module namesSF F N A SA 4 7 5 0 0 Attendees: 18
Described below are a few current uses of
s in module names that I am aware of.
2.1. Implied Structure
Module authors may use
s to communicate with developers by implying some structure and relationship between module names with
a common prefix. The general approach is that a module
should
all modules with names that start with
. For example:
export module std ; export import std . vector ; export import std . algorithm ; ...
2.2. Filesystem Mapping
The next is for mapping module names to filesystem paths. Here
is used
as a proxy for
. For example,
could map to
.
2.3. Naming Uniqueness
Dots are a useful tool to avoid naming conflicts, as they form a good
separator.
is unsuitable for this role as it is often used in names, given
that it is a valid identifier character.
2.4. [P1767R0] Packaging C++ Modules
This proposal uses a
naming scheme to deal
with module name collisions.
3. History
Modules have had a long history in C++, dating back to at least 2004. Every paper until recently had an idea for something similar to submodules. It’s useful to explore this history to see which directions we may want to go in the future.
First, let’s start with Daveed Vandevoorde’s 2004 paper: [N1736]
In this paper, submodules are known as module partitions. These are different from the module partitions we have today in that they are externally visible. The syntax started as
, but in 2007 moved to
. In this model, partitions export a subset of names from their parent module. Additionally, non-exported names from a partition are visible to any other partition of the same module. Note that this proposal also supports
in module names. It has no semantic meaning, and is for the purpose of allowing module names to match the namespace they define.
Next, let’s look at Doug Gregor’s 2013 SG2 presentation, which also had submodules.
This was a description of Clang modules and additional syntax. In this model, a module exports all of its submodules as defined in a module map. Today submodules are used in Clang for two primary purposes. The first is to restrict which names are visible when importing a submodule to those in the submodule. The second is to allow interdependencies between submodules without causing a cycle. Clang compiles a single module together along with all of its submodules as a single translation unit, and keeps track of which names are visible from which submodules.
Next is Microsoft’s 2014 modules proposal, which also had submodules as part of the design: [N4214]
Section 4.1.1 Module Names and Filenames We propose a hierarchical naming scheme for the namespace of module-name in support of submodules
Section 4.5 Submodules A submodule can serve as cluster of translation units sharing implementation detail information (within a module) that is not meant to be accessible to outside consumers of the parent module.
The design was intended to allow control of visibility of names in submodules. This functionality never made it into the wording. The design didn’t go into enough detail about how it would be implemented to determine if adding this functionality would be a breaking change.
Last, we have Google’s ATOM proposal: [P0947r1]
This proposal adds the module partitions we have today, but keeps
in module names. A key part of module partitions is that the partition name is not visible outside of the module in which they are defined.
4. Problems
There are two main issues with keeping the
in module names.
4.1. User Confusion
In C++ the
syntax is used by every developer. It has very specific
semantic meanings, but even at the highest level it always establishes some form
of hierarchy. We’ve already seen people be confused about C++ modules having no
hierarchy, and even today a search for subpackages in Java leads to questions
and answers centered on this confusion.
Developers will use
to communicate something to their users, but will they
communicate the same thing? We will end up with different behaviors in different
libraries, which will cause additional confusion.
4.2. Walling Off the Future
By allowing
in module names without semantics, we potentially prevent giving
them semantics in the future as it may be a breaking change.
5. Possible Semantics
5.1. Other Languages
Java/Groovy:
in package and module names only impacts where the classloader (basically the runtime dynamic linker) looks up
files (Java bytecode) on the filesystem (or in
s). They have no semantic meaning in the source code.
Python: Modules cannot have
in their names, instead
s are used for packages and subpackages (which both contain modules). Packages are determined by filesystem layout and directories are accessed using
. The syntax
is controlled by the
variable in the package’s
and can select which, if any, modules from that package are imported. Additionally, there are relative imports using
which go up the package hierarchy.
C#/VB.NET: No modules/packages, just namespaces. Namespaces are separated by
when referencing them, and are hierarchical.
JavaScript: Modules don’t have names; they have paths represented by strings. Paths can contain
s and don’t mean anything special.
is special, as it is a directory separator.
Objective-C{,++}: Uses Clang modules. Module names are separated by
into submodules. Submodules are used to control visibility of names and are hierarchical.
Delphi/Object Pascal: Namespaces can have
s in their names and there’s no hierarchy.
Go: Package names cannot contain
, no subpackages.
Ruby: Modules in Ruby are closer to namespaces, not really a modules system, and can’t contain
s. Ruby uses library names represented by strings for loading other code; these can also be absolute paths.
Swift: Module names cannot contain
s. Can import Objective-C (Clang) modules and submodules.
MATLAB: Package names can’t contain
s. Subpackages are hierarchical and are accessed via
.
R: All identifiers can contain
s.
is used as the namespace separator.
Perl: Uses
and are translated to filesystem paths, can’t contain
.
Rust: No
in module names.
is used as a crate and file system separator.
Of these languages, only two have a
symbol that means something in normal code but means nothing in a package/module name. One is from the 80s and the other is from the 90s. Additionally, every reference I found for Java was either someone confused about what
meant, or someone explaining what it meant to people who were confused. We shouldn’t follow Java’s example here.
5.2. Private Submodules
One problem not solved by partitions is that of restricting visibility between modules. If I have some private details shared by two separate pieces of code, I only have two options: Make it a separate module with no way to restrict access to it; or combine both separate pieces of code into a single module, restricting build paralellism. A proper submodules system could resolve this by allowing us to control which modules a submodule is visible to.
6. Design Tradeoffs
When choosing a syntax, there is always a design tradeoff. In this case, that tradeoff has three major factors: utility, understandability, and extensibility.
There are valid usecases for
, as it does provide additonal ways for a module author to communicate to their
users a relationship between modules. We could have restricted identifiers to
just a and b and have equal semantic power, but we didn’t because that would
severely limit communication.
We want new syntax to be understandable to existing and new C++ users. We often do this by reusing or mimicking existing syntax when it is close enough in semantics that we want that knowledge to carry over. We also choose different syntaxes when we want to avoid conflating two different concepts.
C++ cares about backwards compatibility, even in edge cases (due to Hyrum’s law). When we choose a syntax, we’re pretty much stuck with that syntax and have great difficulty changing what it means. We should have a high bar for the benefit we get from a syntax due to this.
Given these tradeoffs, I think that for C++20 the risks to understandability and
extensibility far outweigh the utility gained by allowing
s in module names.
Over the next few years, we should get to know modules better, how they are
used, and what changes we really want before closing this door.
7. Wording
7.1. [module.unit]
module - declaration : export opt module module - name module - partition opt attribute - specifier - seq opt ; module - name : module - name - qualifier opt identifier module - partition : : module - name - qualifier opt identifier module - name - qualifier : identifier . module - name - qualifier identifier .