1. Revision History
1.1. Revision 6 - March 2nd, 2020
- Add new section § 2 Relevant Polls.
- Add new section § 5.3.4 Dependency-Scanning Friendly with #depend.
- Add new section § 5.3.5 Modules.
- Improve section § 5.3.6 Statically Polymorphic.
- Add new section § 5.3.7 Optional Limit.
- Add new section § 5.3.8 UTF-8 Only.
- Add new section § 6 Previous Implementations.
- Improve wording and add static version (thanks, @lichray).
1.2. Revision 5 - January 13th, 2020
- Split #embed into a new paper.
- Add memory and time benchmarks from various implementation strategies in the new Current Practice section.
- Address concerns for a generic API and similar in the new Results Analysis section.
- Retarget to EWG and SG 7.
1.3. Revision 4 - November 26th, 2018
- Wording is now relative to [n4778].
- Minor typo and tweak fixes.
1.4. Revision 3 - November 26th, 2018
- Change to using consteval.
- Discuss potential issues with accessing resources after full semantic analysis is performed. Prepare to poll Evolution Working Group. Reference new paper, [p1130], about resource management.
1.5. Revision 2 - October 10th, 2018
- Destroy embed_options and alignment options: if the function is materialized only at compile-time through constexpr or the upcoming "immediate functions" (constexpr!), there is no reason to make this part of the function. Instead, the user can choose their own alignment when they pin this down into a std::array or some form of C array / C++ storage.
1.6. Revision 1 - June 10th, 2018
- Create future directions section, follow up on Library Evolution Working Group comments.
- Change std::embed_options::null_terminated to std::embed_options::null_terminate.
- Add more code demonstrating the old way and motivating examples.
- Incorporate LEWG feedback, particularly alignment requirements illuminated by Odin Holmes and Niall Douglass. Add a feature macro on top of having __has_include(<embed>).
1.7. Revision 0 - May 11th, 2018
Initial release.
2. Relevant Polls
The following polls are shaping the current design. Votes are in the form of SF (Strongly in Favor), F (in Favor), N (Neutral), A (Against), SA (Strongly Against).
- We would like to have this feature in C++(Something) and spend time figuring out the details.

SF | F | N | A | SA |
---|---|---|---|---|
14 | 13 | 2 | 0 | 0 |

Consensus: Do more work.
- We want recursive globs (recursively searching through directories) for #depend.

SF | F | N | A | SA |
---|---|---|---|---|
2 | 3 | 9 | 12 | 4 |

Consensus: Do not want.

Vote Commentary:
- A: Complexity?
- SA: Windows is slow with recursive globs.
- It should be mandatory that EVERY file for std::embed is specified by a #depend.

SF | F | N | A | SA |
---|---|---|---|---|
4 | 12 | 5 | 6 | 3 |

Consensus: Split, no consensus. Add why/why not.

Vote Commentary:
- SF: This MUST exist. Both compiler and build system authors. (Implementers.)
- SA: Can make user experience sad face for common case. Build system should scream at you for making the mistake instead.
- #depend should form a Virtual File System / String Table State that constrains the search and should be passed to std::embed.

SF | F | N | A | SA |
---|---|---|---|---|
3 | 11 | 7 | 2 | 2 |

Consensus: Do it.

Vote Commentary:
- SA: Hell to implement. (This was the author.)
- Make std::embed ill-formed inside of a module interface (with a plan to revisit later).

SF | F | N | A | SA |
---|---|---|---|---|
4 | 2 | 7 | 1 | 1 |

Consensus: Yes, but Meh.

Vote Commentary:
- SA: Modules are important; we should make sure it interacts well with modules (figure it out now).
- SF: How does this work with #depend?
- SF: std::embed is basically a #include -- why would we want it in an interface? Just focus on getting the feature working and doing it well.
- A: Jumping the gun. Space needs more exploration.
- N: We are highly undecided -- need to answer more questions (especially about Modules).
3. Motivation
I’m very keen on std::embed. I’ve been hand-embedding data in executables for NEARLY FORTY YEARS now. — Guy "Hatcat" Davidson, June 15, 2018
Currently | With Proposal |
---|---|
… | … (Works here.) |
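To make the contrast concrete, here is a small, hedged sketch of the two columns; the file names, the generated header, and the array name are illustrative stand-ins, and <embed> is the header this paper proposes rather than an existing facility.

```cpp
// Currently (sketch): run a tool such as
//   xxd -i pixel_data.bin > pixel_data.h
// as an extra build step, then include the generated array:
//
//   #include "pixel_data.h"   // unsigned char pixel_data_bin[]; ...
//
// With the proposal (sketch): pull the resource in directly during
// constant evaluation, with no generated source file or extra build step.
#include <embed>   // proposed header
#include <cstddef>
#include <span>

constexpr std::span<const std::byte> pixel_data = std::embed(u8"pixel_data.bin");

static_assert(!pixel_data.empty());
```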
A very large number of C and C++ programmers -- at some point -- attempt to #include large chunks of non-C++ data into their code. Of course, #include expects the format of the data to be source code, and thus the program fails with spectacular lexer errors. Thus, many different tools and practices were adapted to handle this, as far back as 1995 with the xxd tool. Many industries need such functionality, including (but hardly limited to):
- Financial Development
  - representing coefficients and numeric constants for performance-critical algorithms;
- Game Development
  - assets that do not change at runtime, such as icons, fixed textures and other data;
  - shader and scripting code;
- Embedded Development
  - storing large chunks of binary, such as firmware, in a well-compressed format;
  - placing data in memory on chips and systems that do not have an operating system or file system;
- Application Development
  - compressed binary blobs representing data;
  - non-C++ script code that is not changed at runtime;
- Server Development
  - configuration parameters which are known at build-time and are baked in to set limits and give compile-time information to tweak performance under certain loads;
  - SSL/TLS certificates hard-coded into your executable (requiring a rebuild and potential authorization before deploying new certificates); and
- Static Analyzers
  - static analyzers suffer -- much like their binary-code-generating friends -- from having to parse extremely large array literals;
  - reduces memory pressure and enables better information tracking and potential sanitization (the file source is not lost in the build system).
In the pursuit of this goal, these tools have proven to have inadequacies and contribute poorly to the C++ development cycle as it continues to scale up for larger and better low-end devices and high-performance machines, bogging developers down with menial build tasks and trying to cover up disappointing differences between platforms. It also absolutely destroys state-of-the-art compilers due to the extremely high memory overhead of producing an Abstract Syntax Tree for a braced initializer list of several tens of thousands of integral constants with numeric values of 255 or less.
The request for some form of this functionality dates back quite a long time, with one of the oldest Stack Overflow questions asked-and-answered about it dating back nearly 10 years. Predating even that is a plethora of mailing list posts and forum posts asking how to get script code and other things that are not likely to change into the binary.
This paper proposes std::embed to make this process much more efficient, portable, and streamlined.
4. Scope and Impact
std::embed is an extension to the language proposed entirely as a library construct. The goal is to have it implemented with compiler intrinsics, builtins, or other suitable mechanisms. It does not affect the language. The proposed header to expose this functionality is <embed>, making the feature entirely opt-in by checking if either the proposed feature test macro or the header exists.
5. Design Decisions
std::embed avoids using the preprocessor or defining new string literal syntax like its predecessors, preferring the use of a free function in the std namespace. This gives std::embed a greater degree of power and advantage over those earlier designs. std::embed's design is derived heavily from community feedback plus the rejection of the prior art up to this point, as well as the community needs demonstrated by existing practice and its pitfalls.
5.1. Implementation Experience & Current Practice
Here, we examine current practices, their benefits, and their pitfalls. There are a few cross-platform (and not-so-cross-platform) paths for getting data into an executable. We also scrutinize the performance, with numbers for both memory overhead and speed overhead available at the repository that houses the current implementation. For ease of access, the numbers as of January 2020 with the latest versions of the indicated compilers and tools are replicated below.
All three major implementations were explored, plus an early implementation of this functionality in GCC. A competing implementation in a separate C++-like meta language called Circle was also looked at, at the behest of Study Group 7.
5.1.1. Speed Results
Below are timing results for a file of random bytes using a specific strategy. The file is of the size specified at the top of the column. Files are kept the same between strategies and tests.
- Intel Core i7-6700HQ @ 2.60 GHz
- 24.0 GB RAM, 2952 MHz
- Debian Sid or Windows 10
- Method: gather timings from the time *nix program or Measure-Command { ... } in PowerShell; compute the mean
Strategy | 4 bytes | 40 bytes | 400 bytes | 4 kilobytes |
---|---|---|---|---|
#embed GCC | 0.201 s | 0.208 s | 0.207 s | 0.218 s |
std::embed GCC | 0.709 s | 0.724 s | 0.711 s | 0.715 s |
xxd-generated GCC | 0.225 s | 0.215 s | 0.237 s | 0.247 s |
xxd-generated Clang | 0.272 s | 0.275 s | 0.272 s | 0.272 s |
xxd-generated MSVC | 0.204 s | 0.229 s | 0.209 s | 0.232 s |
Circle @array | 0.353 s | 0.359 s | 0.361 s | 0.361 s |
Circle @embed | 0.199 s | 0.208 s | 0.204 s | 0.368 s |
ld (linker) | 0.501 s | 0.482 s | 0.519 s | 0.527 s |
Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes |
---|---|---|---|---|
#embed GCC | 0.236 s | 0.231 s | 0.300 s | 1.069 s |
std::embed GCC | 0.705 s | 0.713 s | 0.772 s | 1.135 s |
xxd-generated GCC | 0.406 s | 2.135 s | 23.567 s | 225.290 s |
xxd-generated Clang | 0.366 s | 1.063 s | 8.309 s | 83.250 s |
xxd-generated MSVC | 0.552 s | 3.806 s | 52.397 s | Out of Memory |
Circle @array | 0.353 s | 0.363 s | 0.421 s | 0.585 s |
Circle @embed | 0.238 s | 0.199 s | 0.219 s | 0.368 s |
ld (linker) | 0.500 s | 0.497 s | 0.555 s | 2.183 s |
Strategy | 400 megabytes | 1 gigabyte |
---|---|---|
#embed GCC | 9.803 s | 26.383 s |
std::embed GCC | 4.170 s | 11.887 s |
xxd-generated GCC | Out of Memory | Out of Memory |
xxd-generated Clang | Out of Memory | Out of Memory |
xxd-generated MSVC | Out of Memory | Out of Memory |
Circle @array | 2.655 s | 6.023 s |
Circle @embed | 1.886 s | 4.762 s |
ld (linker) | 22.654 s | 58.204 s |
5.1.2. Memory Size Results
Below is the peak memory usage (heap usage) for a file of random bytes using a specific strategy. The file is of the size specified at the top of the column. Files are kept the same between strategies and tests.
- Intel Core i7-6700HQ @ 2.60 GHz
- 24.0 GB RAM, 2952 MHz
- Debian Sid or Windows 10
- Method: /usr/bin/time -v, or execute the command hundreds of times and stare at Task Manager
Strategy | 4 bytes | 40 bytes | 400 bytes | 4 kilobytes |
---|---|---|---|---|
#embed GCC | 17.26 MB | 17.26 MB | 17.26 MB | 17.27 MB |
std::embed GCC | 38.82 MB | 38.77 MB | 38.80 MB | 38.80 MB |
xxd-generated GCC | 17.26 MB | 17.26 MB | 17.26 MB | 17.27 MB |
xxd-generated Clang | 35.12 MB | 35.22 MB | 35.31 MB | 35.88 MB |
xxd-generated MSVC | < 30.00 MB | < 30.00 MB | < 33.00 MB | < 38.00 MB |
Circle @array | 53.56 MB | 53.60 MB | 53.53 MB | 53.88 MB |
Circle @embed | 33.35 MB | 33.34 MB | 33.34 MB | 33.35 MB |
ld (linker) | 17.32 MB | 17.31 MB | 17.31 MB | 17.31 MB |
Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes |
---|---|---|---|---|
#embed GCC | 17.26 MB | 17.96 MB | 53.42 MB | 341.72 MB |
std::embed GCC | 38.80 MB | 40.10 MB | 59.06 MB | 208.52 MB |
xxd-generated GCC | 24.85 MB | 134.34 MB | 1,347.00 MB | 12,622.00 MB |
xxd-generated Clang | 41.83 MB | 103.76 MB | 718.00 MB | 7,116.00 MB |
xxd-generated MSVC | ~48.60 MB | ~477.30 MB | ~5,280.00 MB | Out of Memory |
Circle @array | 53.69 MB | 54.73 MB | 65.88 MB | 176.44 MB |
Circle @embed | 33.34 MB | 33.34 MB | 39.41 MB | 113.12 MB |
ld (linker) | 17.31 MB | 17.31 MB | 17.31 MB | 57.13 MB |
Strategy | 400 megabytes | 1 gigabyte |
---|---|---|
#embed GCC | 3,995.34 MB | 9,795.31 MB |
std::embed GCC | 1,494.66 MB | 5,279.37 MB |
xxd-generated GCC | Out of Memory | Out of Memory |
xxd-generated Clang | Out of Memory | Out of Memory |
xxd-generated MSVC | Out of Memory | Out of Memory |
Circle @array | 1,282.34 MB | 3,199.28 MB |
Circle @embed | 850.40 MB | 2,128.36 MB |
ld (linker) | 425.77 MB | 1,064.74 MB |
5.1.3. Results Analysis
The above clearly demonstrates the superiority of std::embed over the latest optimized trunk builds of various compilers. It is also notable that originally the Circle language did not have an @embed keyword; it was added in December 2019. When the compiler author was spoken to about Study Group 7’s aspirations for a more generic way of representing data from a file, the ultimate response was this:
I’ll add a new @embed keyword that takes a type and a file path and loads the file and embeds it into an array prvalue of that type. This will cut out the interpreter and it’ll run at max speed. Feed back like this is good. This is super low-hanging fruit.
It was Circle’s conclusion that a generic API was unsuitable and suffered from the same performance pitfalls that plague current-generation compilers today. And it was SG7’s insistence that a more generic API would be suitable, modeled on Circle’s principles. Given that thorough exploration of the design space in Circle led to the same conclusion this proposal is making, and given the wide variety of languages providing a similar interface (D, Nim, Rust, etc.), it is clear that a more generic API is not desirable for functionality as fundamental and simple as this. This does not preclude a more generic solution being created, but it does prioritize the "Bird in the Hand" approach that the Direction Group and Bjarne Stroustrup have advocated for many times.
Furthermore, inspecting compiler bug reports around this subject area reveals that this is not the first time GCC has suffered monumental memory blowup over an unoptimized representation of data. In fact, this is a 16+ year old problem that GCC has been struggling with for a long time now (C++ version here). That the above numbers are nearing the best that can be afforded by some of the most passionate volunteers and experts curating an extremely large codebase should be a testament to how hard this area of the language is for compiler developers, and how painful it is for regular developers using their tools.
Clang, while having a better data representation and more optimized structures at its disposal, is similarly constrained. Even with significant implementation work, its developers are deeply constrained in what they can do:
It might be possible to introduce some sort of optimized representation specifically for initializer lists. But it would be a big departure from existing AST handling. And it wouldn’t really open up new use cases, given that string literal handling is already reasonably efficient.
Is this really the best use of compiler developer energy?
To provide a backdrop against which such a big departure from current AST handling can be compared: an implementation of the built-in necessary for this proposal is -- for an experienced developer -- at most a few days' work in either GCC or Clang. Other compiler engineers have reported similar ease of implementation and integration. Should this really be delegated to Quality of Implementation, to be solved N times over by every implementation in its own particularly special way? Chipping away at what is essentially a fundamental inefficiency required by C++'s inescapable tokenization model from the preprocessor, plus the sheer cost of an ever-growing language that makes simple constructs like a brace initializer list of integer constants expensive, is, in this paper’s demonstrated opinion, incredibly unwise.
5.1.4. Manual Work
Many developers also hand-wrap their files in (raw) string literals or similar to massage their data -- binary or not -- into a conforming representation that can be parsed as source code:
- Have a file data.json with some data, for example:

  { "Hello": "World!" }

- Mangle that file with raw string literals, and save it as raw_include_data.h:

  R"json({ "Hello": "World!" })json"

- Include it into a variable, optionally made constexpr, and use it in the program:

  #include <iostream>
  #include <string_view>

  int main () {
      constexpr std::string_view json_view =
  #include "raw_include_data.h"
      ;

      // { "Hello": "World!" }
      std::cout << json_view << std::endl;
      return 0;
  }
This happens often in the case of people who have not yet taken the "add a build step" mantra to heart. The biggest problem is that the above C++-ready source file is no longer valid in its original representation, meaning the file as-is cannot be passed to any validation tools, schema checkers, or otherwise. This hurts the portability and interop story of C++ with other tools and languages.
Furthermore, if the string literal is too big, vendors such as VC++ will hard-error the build (example from Nonius, a benchmarking framework).
5.1.5. Processing Tools
Other developers use pre-processors for data that can’t be easily hacked into a C++ source-code-appropriate state (e.g., binary). The most popular one is xxd, which outputs an array in a file which developers then include. This is problematic because it turns binary data into C++ source. In many cases, this results in a larger file due to having to restructure the data to fit grammar requirements. It also results in needing an extra build step, which throws any programmer immediately at the mercy of build tools and project management. An example and further analysis can be found in the § 8.1.1 Pre-Processing Tools Alternative and the § 8.1.2 python Alternative sections.
5.1.6. ld, resource files, and other vendor-specific link-time tools
Resource files and other "link time" or post-processing measures have one benefit over the previous method: they are fast to perform in terms of compilation time. An example can be seen in the § 8.1.3 ld Alternative section.
5.1.7. The incbin tool
There is a tool called [incbin] which is a 3rd party attempt at pulling files in at "assembly time". Its approach is incredibly similar to the ld approach above, with the caveat that files must be shipped with the binary. It unfortunately falls prey to the same cross-platform woes when dealing with VC++, requiring additional pre-processing to work out in full.
5.2. Prior Art
There has been a lot of discussion over the years in many arenas, from Stack Overflow to mailing lists to meetings with the Committee itself. The latest advancement that had been brought to WG21’s attention was p0373r0 - File String Literals. It proposed a new form of string literal (with both a text flavor and a binary flavor), plus a few other amenities, to load files at compilation time. The following is an analysis of the previous proposal.
5.2.1. Literal-Based, constexpr
A user could reasonably assign (or want to assign) the resulting array to a constexpr variable, as it is expected to be handled like most other string literals. This allowed some degree of compile-time reflection. It is entirely helpful that such file contents be assignable to constexpr: e.g., string literals of JSON being loaded at compile time to be parsed by Ben Deane and Jason Turner in their CppCon 2017 talk, constexpr All The Things.
5.2.2. Literal-Based, Null Terminated (?)
It is unclear whether the resulting array of characters or bytes was to be null terminated. The usage and expression imply that it will be, due to its string-like appearance. However, is adding an additional null terminator fitting for the desired usage? From the existing tools and practice (e.g., xxd or linking a data-dumped object file), the answer is no: but the string-literal syntax makes the answer seem like a "yes". This is confusing: either the user should be given an explicit choice or the feature should be entirely opt-in.
5.2.3. Encoding
Because the proposal used a string literal, several questions came up as to the actual encoding of the returned information. The author provided both a binary form and a text form of the literal to separate binary versus string-based arrays of returns. Not only did this conflate issues with expectations in the previous section, it also became a heavily contested discussion in both the mailing list group discussion of the original proposal and in the paper itself. This is likely one of the biggest pitfalls of separating "binary" data from "string" data: imbuing an object with string-like properties at translation time provides for all the same hairy questions around the source/execution character set and the contents of a literal.
5.3. Design Goals
Because of the aforementioned reasons, it seems more prudent to take a "compiler intrinsic"/"magic function" approach. The function overloads take the form:

template <typename T = byte>
consteval span<const T> embed( u8string_view resource_identifier );

template <size_t N, typename T = byte>
consteval span<const T, N> embed( u8string_view resource_identifier );

template <typename T = byte>
consteval span<const T> embed( u8string_view resource_identifier, size_t limit );

template <size_t N, typename T = byte>
consteval span<const T, N> embed( u8string_view resource_identifier, size_t limit );

resource_identifier is a u8string_view processed in an implementation-defined manner to find and pull resources into C++ at constexpr time. limit is the maximum number of elements the function call can produce (but it may produce fewer). The most obvious source will be the file system, with the intention of having this evaluated as a core constant expression. We do not attempt to restrict the resource_identifier to a specific subset: whatever the implementation accepts (typically expected to be a relative or absolute file path, but it can be any other identification scheme), the implementation should use. The N template parameter is for specifying that the returned span has at least N elements (to fit in the statically-sized span<const T, N>).
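For illustration, a minimal usage sketch of the four overloads above, assuming the proposed <embed> header; the resource name art/icon.bin is made up for this example.

```cpp
#include <embed>   // proposed header
#include <cstddef>
#include <span>

// Whole resource, element type defaults to std::byte (first overload).
constexpr std::span<const std::byte> whole = std::embed(u8"art/icon.bin");

// Whole resource, but the span is statically sized: at least 16 elements
// must exist in the resource (second overload).
constexpr std::span<const std::byte, 16> first16 = std::embed<16>(u8"art/icon.bin");

// Up to (but not more than) 64 elements of the resource (third overload).
constexpr std::span<const std::byte> prefix = std::embed(u8"art/icon.bin", 64);

// Exactly 16 elements: at least 16 via the template parameter, no more
// than 16 via the limit argument (fourth overload).
constexpr std::span<const std::byte, 16> magic = std::embed<16>(u8"art/icon.bin", 16);
```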
5.3.1. Implementation Defined
Calls to std::embed -- whatever form the resource identifier takes -- are meant to be evaluated in a constexpr context (with "core constant expressions" only), where the behavior is implementation-defined. The function has unspecified behavior when evaluated in a non-constexpr context (with the expectation that the implementation will provide a failing diagnostic in these cases). This is similar to how include paths work, albeit #include interacts with the programmer through the preprocessor.
There is precedent for specifying library features that are implemented only through compile-time compiler intrinsics (type traits and similar utilities). Core -- for other proposals such as p0466r1 - Layout-compatibility and Pointer-interconvertibility Traits -- indicated their preference for a magic function implemented by an intrinsic in the standard library over some form of core-language construct. However, it is important to note that [p0466r1] proposes type traits, whereas this has entirely different functionality, and so its reception and opinions may be different.
Finally, we use "implementation-defined" so that compilers can provide implementation-defined search paths for their translation units and modules during compilation with flags. The current implementation uses a command-line flag to indicate this, but when it is standardized there will probably be a dedicated resource-path flag or similar instead. It is up to the implementation to pick what works for their platform.
5.3.2. Binary Only
Creating two separate forms or options for loading data that is meant to be a "string" always fuels controversy and debate about what the resulting contents should be. The problem is sidestepped entirely by demanding that the resource loaded by std::embed represents the bytes exactly as they come from the resource. This prevents encoding confusion, conversion issues, and other pitfalls related to trying to match the user’s idea of "string" data or non-binary formats. Data is received exactly as it is from the resource as defined by the implementation, whether it is a supposed text file or otherwise. A call for a supposedly "text" resource and a call for a "binary" resource behave exactly the same concerning their treatment of the resource.
5.3.3. Constexpr Compatibility
The entire implementation must be usable in a constexpr context. This is not just for the purposes of processing the data at compile time, but because it matches existing implementations that store strings and huge array literals into a variable via #include. These variables can be constexpr: to not have a constexpr implementation is to leave many of the programmers who utilize this behavior without a proper standardized tool.
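As a small sketch of why constexpr usability matters (assuming the proposed interface; the resource name and the format check are illustrative), the embedded bytes can be validated during constant evaluation rather than at runtime:

```cpp
#include <embed>   // proposed header
#include <cstddef>
#include <span>

// Check the first two bytes of the PNG signature (0x89 'P') at compile time.
consteval bool looks_like_png(std::span<const std::byte> data) {
	return data.size() >= 2
		&& data[0] == std::byte{0x89}
		&& data[1] == std::byte{0x50};
}

constexpr std::span<const std::byte> icon = std::embed(u8"art/icon.png");
static_assert(looks_like_png(icon), "art/icon.png does not look like a PNG resource");
```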
5.3.4. Dependency-Scanning Friendly with #depend
One of the biggest hurdles to generating consensus was the deep-seated issues with dependency scanning. The model of dealing only with #include as a way of adding extra code or data-as-code means that all dependencies -- at the time of compiler invocation -- can be completely and statically known by the end of Phase 4 of compilation. This greatly aided speedy dependency pulling. std::embed throws a wrench in that model, as it can affect the generation of code at Phase 7, which is when std::embed is evaluated. By taking an arbitrary u8string_view, it makes it impossible to know all the files which may be used by a translation or module unit.
For this purpose, a new #depend preprocessor directive was introduced. It is intended to inform the compiler which resources a translation unit depends on, in order to make it simpler to retrieve all necessary dependencies. An advanced implementation can also replace all #depend directives with magical compiler builtins which instruct the compiler to cache the file’s data as part of the translation unit.
This makes it possible to send minimal compiler reproductions and test cases for bug vetting, as well as allow distributed systems to continue to use dependency-emitting flags such as -M (or similar preprocessor flags). For example, the GCC and Clang implementations linked above have mechanisms that allow single translation units to be self-sufficient, while the compiler built-ins behind std::embed will search the internally populated data cache from these entries in order to furnish data. This allows a complete system that keeps current tools working exactly as expected without any issues at all.
The #depend directive works by allowing a user to specify a single-dependency or a family-dependency. They are analogous to a single resource name or a "glob" resource name. These would, in file system terms, name a single file or all the files in a directory (subject to pattern matching). There used to also be a "recursive-family-dependency" for specifying searching for files in a directory, recursively. They looked like this:

// single-dependency directives
#depend <config/graph.bin>
#depend <foo.txt>

// family-dependencies
// do not "recurse" into directories
#depend "art/*"
#depend "art/mocks/*.json"

// recursive-family-dependency
// recurse through directories and similar
#depend "assets/**"

// mixed: all resources starting with
// "translation/", with all files that end in ".po",
// that have at least one "/" (one directory)
// after the "translation/", found recursively
#depend "translation/**/*.po"
Due to Windows being a pile of garbage for recursive-glob-style directory iteration, it was voted to remove ** as the recursive-family-dependency (see § 2 Relevant Polls). Benchmarks for different styles of iteration may prove helpful here to convince others to maybe not do this.
#depend uses both the chevron-delimited <...> and the quote-delimited "..." formats. This is because one is for resources that are found only on the system / implementation paths -- similar to #include's chevron lookup -- and one uses local lookup plus the aforementioned resource paths. This can be useful for, e.g., files that get embedded from system SDKs, like icons from Windows or Android build-time graphical resources or similar.
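Putting the pieces of this section together, a sketch of a translation unit that declares its resources with #depend and then embeds them; config/graph.bin and art/* are taken from the directive examples above, while art/splash.bin is a made-up member of that family.

```cpp
// Phase 4: declare the resources this translation unit depends on, so that
// dependency scanners can see them before any std::embed call is evaluated.
#depend <config/graph.bin>
#depend "art/*"

#include <embed>   // proposed header
#include <cstddef>
#include <span>

// Phase 7: the constant-evaluated calls that actually pull the data in.
constexpr std::span<const std::byte> graph  = std::embed(u8"config/graph.bin");
constexpr std::span<const std::byte> splash = std::embed(u8"art/splash.bin");
```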
5.3.5. Modules
Modules front-load and bring front-and-center one of the largest problems not with std::embed, but with #depend. #depend is a Compilation Phase 4 entity: that is, it does not exist beyond the preprocessor. In the world of modules, preprocessor commands must be resolved in ways that would not exist in the traditional #include world, or at least would not need immediate answers. In particular, the fact that modules can "save" Phase 4 information before proceeding with the rest of compilation opens interesting questions for Module Interface Units (AKA, the public-facing portion of modules). Module implementation units and other module-related bits are, thankfully, entirely unaffected.
The problem is illustrated most powerfully by the following snippet:
The module interface unit for m0:

export module m0;
#depend "header.bin"
import <embed>;
import combine;

export consteval auto f0 (std::string_view f) {
	const auto h = std::embed("header.bin");
	const auto t = std::embed(f);
	return combine(h, t);
}

The module interface unit for m1:

export module m1;
#depend "default.stuff"
import m0;

export consteval auto f1 (std::string_view f = "default.stuff") {
	return f0(f);
}

export consteval auto getpath () {
	return "default.stuff";
}

A translation unit that imports m1:

import m1;
import <embed>;
#depend "coolstuff.bin"

int main () {
	(f1());                      // [0] OK, fine
	(f1("coolstuff.bin"));       // [1] OK, good
	std::embed("header.bin");    // [2] Should this work?
	std::embed(getpath());       // [3] What about this?
}
In the line marked [0], everything is fine because the eventual call to std::embed originates from m0's module interface unit, which can find "header.bin" and has an explicit #depend dependency on it. This is straightforward. Line [1] is fine as well, because there is an explicit #depend "coolstuff.bin" which makes the "coolstuff.bin" resource available from the current translation unit, which works with the std::embed call. The real problem arises, however, when we get to lines [2] and [3].
The way modules and macros work right now means macros do not go in (save from special implementation places, such as the command line/response file) and do not come out (unless explicitly exported). What, then, should happen to #depend and its preprocessor directives? Are the dependencies identified by #depend supposed to cascade through into translation units / modules that import them? This goes against the idea that modules can successfully isolate the preprocessor and suppress its stateful effects on downstream code.
Because the Committee voted to create a strong correlation between std::embed -- a Phase 7, regular C++ entity -- and #depend -- a Phase 4, preprocessor entity -- what used to be a compiler hint that the compiler could treat in any manner it pleased, and could instead gently warn on, must now provide semantic meaning. Saying that lines [2] and [3] should work means acknowledging that #depend is a stateful preprocessor construct, which is something we have been trying to get rid of and reduce the usage of since the push for C++11. It also opens up the door to these questions in general regarding the usage, and requires compilers to answer these questions. If the answer becomes "yes", then programmers are back where we started several years ago with header files and spending many people-months of time advocating for "#include what you use". But even then, the answer to that question is even harder than before: because std::embed is a "real C++ entity" that exists outside of the preprocessor, things like default arguments and call location begin to matter a lot more than they should.
There are 4 ways to go about fixing this problem for std::embed:

0. Keep #depend as a requirement for finding files. Ban usage of the feature in module interface units. This makes it impossible to leak outside of implementation units or translation units because it is a consteval entity.

1. Keep #depend as a requirement for finding files. Answer "no" to lines [2] and [3] above. This means that code that imports modules can no longer take an accidental dependency on things previous modules declared as their dependencies. It becomes less convenient if a path is provided by a library that needs to be used by std::embed, because then the user may have to duplicate #depend information.

2. Keep #depend as a requirement for finding files. Answer "yes" to lines [2] and [3] above. This means the #depend directive becomes entirely stateful and someone may accidentally rely on a dependency declaration in a module interface unit they use, and those module interface units can break downstream code by changing their #depend dependencies.

3. Remove the normative teeth of #depend for finding files. Do not let #depend leak out of modules. This follows the current module rules and does not create accidental library-user dependencies, while not impacting the usability of users who specify -resource-path= flags and such appropriately.
#0 is way too big of a hammer to swing at this problem and drastically reduces the friendliness of the feature. It also taints the usability of both modules and std::embed, which is nowhere near a desirable outcome.
#1 seems like a decent solution, but still requires an excessive amount of client-side verbosity for individuals who write wrapper functions around std::embed. If a user relies on a function outside of their code to retrieve the path at constexpr-time, then they would need to duplicate, as the user, whatever dependency information the library has declared. Choosing this option makes the case for the string table / virtual file system approach in § 6.2 String Table / Virtual File System more useful, since the fixed set of strings can be presented to the user in some fashion. (That is, if we conveniently ignore that recursive dependency specifications in any form still make this untenable.)
#2 introduces a heinous problem where users can be broken by library developers who rely on things local to their library but that other developers "happen" to catch hold of, making things miserable for both users and library developers. But it does create a world in which simple things logically work, even if a std::embed call is hidden inside another function.
#3 makes it so a library and a user can have code that does not force a user to properly give all dependencies to outside build tools in this corner case in particular (at least, portably: warnings, warnings as errors, etc.). It does not require a strong coupling between the Phase 4 entity and the Phase 7 entity, eliminating the need for this question at all (and making it a matter of "can I, the compiler, find the file, or not").
This paper chooses #3. It takes the normative teeth out of #depend and leaves it to an implementation to warn when files are not explicitly depended upon. This means that a Phase 4 entity no longer has a tight coupling with a Phase 7 entity, keeping things separate. The modules space beyond Phase 4 is no longer complicated. Note that recompilation due to a change in any of the dependencies under this model still works without creating breaking problems in the program of a library user. If a used library’s module interface unit has a #depend and a user changes the file that the module interface unit depends on, it triggers a recompile for that module interface unit. By triggering that recompilation, _any downstream importer that depends on that module_ now must recompile. Now, how the compiler handles the #depend remains up to its implementation, but it still has full rein to get that work done.
5.3.6. Statically Polymorphic
While returning span<const byte> is valuable, it is impossible to reinterpret_cast or memcpy certain things at compile time. This makes it impossible in a constexpr context to retrieve the actual data from a resource without tremendous boilerplate and work that every developer will have to do. While compile-time serialization is an important facet of this feature, there are many types where developers will end up just doing their own memcpy-style copy of the bits to the target bits: to save on boilerplate, we instead offer the ability to call std::embed<T>, where T can be anything that satisfies trivial destructibility (e.g., std::is_trivially_destructible_v<T> is true). This allows all C types and many types which are mirrored almost exactly in binary form to be pulled effortlessly into code.
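A sketch of the statically polymorphic form, assuming the proposed interface; the record layout and the resource name are illustrative.

```cpp
#include <embed>   // proposed header
#include <span>
#include <type_traits>

// A trivially destructible record that mirrors the on-disk layout.
struct Vec3 {
	float x, y, z;
};
static_assert(std::is_trivially_destructible_v<Vec3>);

// View the resource's bytes directly as Vec3 records instead of hand-rolling
// a memcpy-style conversion from std::byte data.
constexpr std::span<const Vec3> positions = std::embed<Vec3>(u8"models/positions.bin");
```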
5.3.7. Optional Limit
Consider some file-based resources that are otherwise un-sizeable and un-seek/tellable in various implementations, such as /dev/urandom. Telling the compiler to go fetch data from such a resource without bound can result in compiler lockups or worse: therefore, the user can specify an additional limit argument in the function call. The limit here is a size_t parameter counted in elements of T, and it allows users to ask for up to but not more than limit elements in the returned span.
Note that, as per § 5.3.6 Statically Polymorphic, the limit is specified in terms of T s, not bytes. This means limit * sizeof(T) * CHAR_BIT bits are requested, and the implementation is mandated to provide up to but not more than that many.
Additionally, a user can provide a template argument N of type size_t to return a span<const T, N>. This requires at _least_ N elements, but maybe more can exist. One can use both of these to request "exactly this many" elements: e.g., a call std::embed<32>(u8"…", 32) requires up to but not more than 32 elements (the limit parameter passed to the function) and that the returned span has at least 32 elements (the template parameter passed between the < and the >). Similarly, one could combine the N and limit arguments with a non-byte element type T. This fully covers the design space for what making a subset of the data is currently used for (e.g., prefixed data in a common model format that then has "back references" to the data contained in the first N elements).
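A small worked sketch of the limit arithmetic, assuming the proposed interface: with T = std::uint32_t and a limit of 8, the implementation is asked for no more than 8 * sizeof(std::uint32_t) * CHAR_BIT = 256 bits of the (potentially unsized) resource.

```cpp
#include <embed>   // proposed header
#include <cstdint>
#include <span>

// Up to, but not more than, 8 32-bit records from an un-seekable resource.
constexpr std::span<const std::uint32_t> seed = std::embed<std::uint32_t>(u8"/dev/urandom", 8);

static_assert(seed.size() <= 8);
```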
5.3.8. UTF-8 Only
This is related to a serious problem for string literals, particularly those of Translation Phase 7. When a user writes a string literal, that string literal -- at Phase 5 of compilation -- gets translated from an idealized internal compiler character set to the execution character set. What this means is that the compiler is allowed to perform an implementation-defined translation from the string literal’s ideal compiler representation to the execution character set (often, the presumed target execution character set). While this is not a problem for Clang -- which always uses UTF-8 -- and GCC -- which always uses UTF-8 unless someone specifically passes -fexec-charset -- other compilers will take the string and mangle it. This mangling does not have to come with warnings: in fact, MSVC will oftentimes replace characters it cannot translate to the execution character set with either mojibake or a ? character.
The solution that Study Group 16 recommended was to allow u8string_view and ONLY u8string_view as the parameter type. Others in SG-16 voiced concern that this would hamper general usability, but u8 string literals and char8_t were put in the standard for reasons exactly such as this. The execution character set is an unknown and often lossy encoding on legacy systems: requiring UTF-8 with u8string_view matches most internal compiler representations of idealized text storage and provides a lossless way to work with the resource system.
We note that it would be "maximally nice" to bring the full family of string_view overloads to the table, which would allow individuals working in this space their choice of presumed encoding to work in, but we deem that the additional flexibility is likely not worth the additional overloads (even if those overloads are cheap to implement).
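A sketch of what the UTF-8-only parameter means at the call site, under the proposed signature; the resource name is illustrative.

```cpp
#include <embed>   // proposed header
#include <cstddef>
#include <span>

// OK: a u8"" literal is char8_t-based UTF-8 text, which matches the
// u8string_view parameter losslessly regardless of the execution character set.
constexpr std::span<const std::byte> ok = std::embed(u8"데이터/config.bin");

// Ill-formed under this design: an ordinary narrow literal lives in the
// (possibly lossy) execution character set and does not convert to u8string_view.
// constexpr std::span<const std::byte> bad = std::embed("데이터/config.bin");
```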
6. Previous Implementations
This section primarily addresses feedback from polls wherein different forms and implementation strategies were asked for by the Evolution Working Group and other implementers. A tour of the design and implementation of these cases helps show what has been considered.
6.1. #depend: Soft Warnings, Hard Errors?
The current specification makes it a hard error if a file has not been identified previously by a #depend directive. This makes the simple case of including a single file a bit more annoying than it needs to be, and also makes the case of general development with a non-distributed build system a pain. To be perfectly transparent, the author of this paper and almost 100% of the author’s clients do not use distributed anything to build their code: errors at "file not found" are generally useful enough. Making the lack of a matching #depend a hard error seems like pushing an important but nevertheless Committee-over-represented concern (i.e., distributed build tool users) onto all C++ users.
While the author feels a lot better about it being a soft warning that can be turned into a hard error (via something like -Werror) for distributed build users, we leave the specification to strongly encourage implementations to hard error on an inability to not only find the resource but also match it with a previous #depend declaration.
6.2. String Table / Virtual File System
This implementation idea was floated twice, once during SG-7 discussion at the November 2019 Belfast meeting and again during the February 2020 Prague meeting. The crux of this version of std::embed is that it does not take resource identifiers related to, e.g., the file system directly: a directive is used to build a "String Table" or "Translation Table" from system-specific paths to "friendly" names:

// map "foo.txt" to file name "foo"
#depend_name "foo.txt" "foo"

#ifdef _WIN32
	// map Windows-specific resource
	// "win/bazunga.bin" to file name "baz"
	#depend_name "win/bazunga.bin" "baz"
#else
	// map Unix-specific resource
	// "nix/bazinga.bin" to file name "baz"
	#depend_name "nix/bazooka.bin" "baz"
#endif

#include <embed>

int main () {
	// pulls foo.txt
	constexpr std::span<std::byte> foo = std::embed("foo");
	// pulls either bazunga or bazooka
	constexpr std::span<std::byte> baz = std::embed("baz");
	return foo[0] == 'f' && baz[2] == '\x3';
}
On the tin, this seems to bring nice properties to the table. We get to "erase" platform-specific paths and give them common names, we have a static list of names that we always pull from, and more. However, there are several problems with this approach. Consider one of the primary use cases for std::embed as a consteval function: reading a resource and, based on its contents, std::embed-ing other resources.
This becomes a problem: if "foo.txt" contains a line of text naming another resource, we read that line from "foo", and we then attempt to std::embed the named resource, we get an error because we did not give that resource a string table name. This means we need to go back and not only mark it as a dependency with #depend_name, but also give it a static string-table-based name.
Even conquering that problem, we face another: resource files -- JSON, SPIR-V, Wavefront OBJ, HLSL/GLSL, python, Lua, HSAIL, whatever -- do not speak "C++ String Table" to name their identifiers. Generally, these speak "File System": we are adding a level of indirection that makes it _impossible_ to keep working with the file system, especially when it comes to working with external resources. Most tools communicate and traffic interchange information via URIs or relative / local file paths. Forcing users to either adapt their code to play nice with the C++ file system or maintain a translation table to and from "C++ String Table" and "C++ File System" is an order of magnitude of additional complexity that is both unnecessary and painful.
There is absolutely room for a (potentially constexpr) Virtual File System in C++. This is not the feature that should bring us there. As the current implementation does, manipulating #depend similar to the way #include is handled alongside -I flags is a much better use of not only implementer time, but user time. It fits perfectly into the mental model ("include, but for resources") and lets users keep their file names as the unit of interchange as far as names go.
C++ does not need to make for itself a reputation of trying to be an extremely unique snowflake at the cost of usability and user friendliness.
7. Changes to the Standard
Wording changes are relative to [n4842].
7.1. Intent
The intent of the wording is to provide a function that:
- handles the provided resource-identifying u8string_view in an implementation-defined manner;
- and returns the specified constexpr span representing either the bytes of the resource or the bytes viewed as the type T.
The wording also explicitly disallows the usage of the function outside of a core constant expression by marking it consteval, meaning it is ill-formed if it is attempted to be used at not-constexpr time (std::embed calls should not show up as a function in the final executable or in generated code). The program may pin the data returned by std::embed through the span into the executable if it is used outside a core constant expression.
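As a sketch of the "pinning" behavior described above (assuming the proposed interface; the resource name and sizes are illustrative): copying the consteval-produced view into an object with static storage duration is what actually places the bytes into the program image.

```cpp
#include <embed>   // proposed header
#include <array>
#include <cstddef>
#include <span>

// The std::embed call itself never exists at runtime; this array is what
// ends up in the final executable.
constexpr std::array<std::byte, 16> pinned = [] {
	std::span<const std::byte, 16> src = std::embed<16>(u8"art/icon.bin");
	std::array<std::byte, 16> out{};
	for (std::size_t i = 0; i < out.size(); ++i) {
		out[i] = src[i];
	}
	return out;
}();
```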
7.2. Proposed Feature Test Macro
The proposed feature test macros are __cpp_lib_embed for the library and __cpp_pp_depend for the preprocessor functionality.
7.3. Proposed Wording
Append to §14.8.1 Predefined macro names [cpp.predefined]'s Table 16 one additional entry:

Macro name | Value |
---|---|
__cpp_pp_depend | 202006L |
Add a new section §15.4 Dependency [cpp.depend]:
15.4 Dependency [cpp.depend]

1 A #depend directive establishes inputs or a family of inputs upon which a translation unit depends.

2 A preprocessing directive of the form

	# depend < h-char-sequence > new-line

or

	# depend " q-char-sequence " new-line

provides a dependency name. If an implementation does not find meaning in the quote-delimited q-char-sequence, it may reprocess this directive and treat it as a

	# depend < h-char-sequence > new-line

directive using the same q-char-sequence, including any < or > characters.

3 The q-char-sequence or h-char-sequence may have one of two meanings, depending on the use of * within the sequence.
- If the sequence contains a *, it denotes a dependency-family.
- Otherwise, it denotes a single-dependency.

4 [ Example —

#depend "art.png"       // this translation unit depends on 'art.png'
#depend <config/*.json> // this translation unit depends on all resources
                        // the implementation can find that
                        // end in ".json" and start with "config/".
#depend <data/*/*.bin>  // this translation unit depends on all resources
                        // the implementation can find that
                        // end in ".bin", start with "data/"
                        // and contain a single "/" in-between.

— end Example ]

5 Each of the dependency-family and single-dependency shall have an implementation-defined meaning which establishes search information for implementation-defined resources (e.g., for 19.20 [const.res]).
Append to §16.3.1 General [support.limits.general]'s Table 35 one additional entry:

Macro name | Value |
---|---|
__cpp_lib_embed | 202006L |
Append to §19.1 General [utilities.general]'s Table 38 one additional entry:

Subclause | Header(s) |
---|---|
19.20 Constant Resources | <embed> |
Add a new section §19.20 Constant Resources [const.res]:
19.20 Constant Resources [const.res]

19.20.1 In general [const.res.general]

1 Constant resources allow the implementation to retrieve data from a variety of sources -- including implementation-defined places -- and allow their processing during constant evaluation.

19.20.2 Header <embed> synopsis [embed.syn]

namespace std {
	template <typename T = byte>
	consteval span<const T> embed( u8string_view resource_identifier ) noexcept;

	template <typename T = byte>
	consteval span<const T> embed( u8string_view resource_identifier, size_t limit ) noexcept;

	template <size_t N, typename T = byte>
	consteval span<const T, N> embed( u8string_view resource_identifier ) noexcept;

	template <size_t N, typename T = byte>
	consteval span<const T, N> embed( u8string_view resource_identifier, size_t limit ) noexcept;
}

19.20.3 Function template embed [const.embed]

namespace std {
	template <typename T = byte>
	consteval span<const T> embed( u8string_view resource_identifier ) noexcept;

	template <typename T = byte>
	consteval span<const T> embed( u8string_view resource_identifier, size_t limit ) noexcept;

	template <size_t N, typename T = byte>
	consteval span<const T, N> embed( u8string_view resource_identifier ) noexcept;

	template <size_t N, typename T = byte>
	consteval span<const T, N> embed( u8string_view resource_identifier, size_t limit ) noexcept;
}

1 Mandates: The implementation-defined bit size of the resource is a multiple of sizeof(T) * CHAR_BIT and std::is_trivial_v<T> is true. If the limit parameter is specified for the second or fourth overload, then the implementation-defined bit size of the resource must be less than or equal to limit * sizeof(T) * CHAR_BIT and must be a multiple of sizeof(T) * CHAR_BIT. For the third and fourth overloads, the implementation-defined bit size of the resource must be at least sizeof(T) * CHAR_BIT * N.

3 [ Note — std::is_trivial_v<T> being true provides that non-trivial destructors do not need to be run for the implementation-provided static storage duration objects. — end Note ]

4 If the implementation cannot find the resource specified after exhausting the sequence of implementation-defined search locations, or if the implementation finds the resource specified but that same resource was not named by a previous #depend directive in some manner, then the program is ill-formed.

5 Returns: A read-only view to a unique resource identified by the resource_identifier, over a contiguous sequence of T objects with static storage duration. The mapping from the contents of the resource to the contiguous sequence of T objects is implementation-defined.

6 Ensures: For the second overload, let r denote the result of the function call. r.size() <= limit is true.

7 Remarks: The value of resource_identifier is used to search a sequence of implementation-defined places for a resource identified uniquely by resource_identifier. The mapping of the resource to the sequence of T is implementation-defined. [ Note — Implementations should provide a mechanism similar but not identical to #include (15.3 [cpp.include]) for finding the specified resource, in coordination with #depend (15.4 [cpp.depend]). — end Note ]
8. Appendix
8.1. Alternative
Other techniques used include pre-processing data, link-time based tooling, and assembly-time runtime loading. They are detailed below, for a complete picture of today’s sad landscape of options.
8.1.1. Pre-Processing Tools Alternative
- Run the tool over the data (xxd -i xxd_data.bin > xxd_data.h) to obtain the generated file (xxd_data.h):

  unsigned char xxd_data_bin[] = {
      0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x57,
      0x6f, 0x72, 0x6c, 0x64, 0x0a
  };
  unsigned int xxd_data_bin_len = 13;
- Compile main.cpp:

  #include <iostream>
  #include <string_view>

  // prefix as constexpr,
  // even if it generates some warnings in g++/clang++
  constexpr
  #include "xxd_data.h"
  ;

  template <typename T, std::size_t N>
  constexpr std::size_t array_size (const T (&)[N]) {
      return N;
  }

  int main () {
      static_assert(xxd_data_bin[0] == 'H');
      static_assert(array_size(xxd_data_bin) == 13);

      std::string_view data_view(
          reinterpret_cast<const char*>(xxd_data_bin),
          array_size(xxd_data_bin));
      std::cout << data_view << std::endl; // Hello, World!
      return 0;
  }
Others still use python or other small scripting languages as part of their build process, outputting data in the exact C++ format that they require.
There are problems with the xxd or similar tool-based approach. Lexing and parsing data-as-source-code adds an enormous overhead to actually reading and making that data available.
Binary data as C(++) arrays carries the overhead of having to comma-delimit every single byte present, and it also requires that the compiler verify every entry in that array is a valid literal or entry according to the C++ language.
This scales poorly with larger files, and build times suffer for any non-trivial binary file, especially when it scales into megabytes in size (e.g., firmware and similar).
8.1.2. python Alternative
Other companies are forced to create their own ad-hoc tools to embed data and files into their C++ code. MongoDB uses a custom python script, just to get their data into C++:
import os
import sys

def jsToHeader(target, source):
    outFile = target
    h = [
        '#include "mongo/base/string_data.h"',
        '#include "mongo/scripting/engine.h"',
        'namespace mongo {',
        'namespace JSFiles{',
    ]

    def lineToChars(s):
        return ','.join(str(ord(c)) for c in (s.rstrip() + '\n')) + ','

    for s in source:
        filename = str(s)
        objname = os.path.split(filename)[1].split('.')[0]
        stringname = '_jscode_raw_' + objname

        h.append('constexpr char ' + stringname + "[] = {")
        with open(filename, 'r') as f:
            for line in f:
                h.append(lineToChars(line))
        h.append("0};")
        # symbols aren't exported w/o this
        h.append('extern const JSFile %s;' % objname)
        h.append('const JSFile %s = { "%s", StringData(%s, sizeof(%s) - 1) };' %
                 (objname, filename.replace('\\', '/'), stringname, stringname))

    h.append("} // namespace JSFiles")
    h.append("} // namespace mongo")
    h.append("")

    text = '\n'.join(h)

    with open(outFile, 'wb') as out:
        try:
            out.write(text)
        finally:
            out.close()

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("Must specify [target] [source] ")
        sys.exit(1)
    jsToHeader(sys.argv[1], sys.argv[2:])
MongoDB were brave enough to share their code with me and make public the things they have to do: other companies have shared many similar concerns, but do not have the same bravery. We thank MongoDB for sharing.
8.1.3. ld Alternative
A full, compilable example (except on Visual C++):
- Have a file ld_data.bin with the contents Hello, World!.
- Run ld -r -b binary -o ld_data.o ld_data.bin.
- Compile the following main.cpp with c++ -std=c++17 ld_data.o main.cpp:
#include <iostream>
#include <string_view>

#ifdef __APPLE__
#include <mach-o/getsect.h>

#define DECLARE_LD(NAME) extern const unsigned char _section$__DATA__##NAME[];
#define LD_NAME(NAME) _section$__DATA__##NAME
#define LD_SIZE(NAME) (getsectbyname("__DATA", "__" #NAME)->size)

#elif (defined __MINGW32__) /* mingw */

#define DECLARE_LD(NAME)                                \
	extern const unsigned char binary_##NAME##_start[]; \
	extern const unsigned char binary_##NAME##_end[];
#define LD_NAME(NAME) binary_##NAME##_start
#define LD_SIZE(NAME) ((binary_##NAME##_end) - (binary_##NAME##_start))

#else /* gnu/linux ld */

#define DECLARE_LD(NAME)                                 \
	extern const unsigned char _binary_##NAME##_start[]; \
	extern const unsigned char _binary_##NAME##_end[];
#define LD_NAME(NAME) _binary_##NAME##_start
#define LD_SIZE(NAME) ((_binary_##NAME##_end) - (_binary_##NAME##_start))
#endif

DECLARE_LD(ld_data_bin);

int main () {
	// impossible
	//static_assert(xxd_data_bin[0] == 'H');
	std::string_view data_view(
		reinterpret_cast<const char*>(LD_NAME(ld_data_bin)),
		LD_SIZE(ld_data_bin)
	);
	std::cout << data_view << std::endl; // Hello, World!
	return 0;
}
This scales a little bit better in terms of raw compilation time, but is shockingly OS-, vendor- and platform-specific in ways that novice developers would not be able to handle fully. The macros are required to erase differences, lest subtle differences in naming destroy one’s ability to use these macros effectively. We omitted the code for handling VC++ resource files because it is excessively more verbose than what is present here.
N.B.: Because these declarations are extern, the values in the array cannot be accessed at compilation/translation-time.
9. Acknowledgements
A big thank you to Andrew Tomazos for replying to the author’s e-mails about the prior art. Thank you to Arthur O’Dwyer for providing the author with incredible insight into the Committee’s previous process for how they interpreted the Prior Art.
A special thank you to Agustín Bergé for encouraging the author to talk to the creator of the Prior Art and getting started on this. Thank you to Tom Honermann for direction and insight on how to write a paper and apply for a proposal.
Thank you to Arvid Gerstmann for helping the author understand and use the link-time tools.
Thank you to Tony Van Eerd for valuable advice in improving the main text of this paper.
Thank you to Lilly (Cpplang Slack, @lillypad) for the valuable bikeshed and hole-poking in original designs, alongside Ben Craig who very thoroughly explained his woes when trying to embed large firmware images into a C++ program for deployment into production. Thank you to Elias Kounen and Gabriel Ravier for wording review.
For all this hard work, it is the author’s hope to carry this into C++. It would be the author’s distinct honor to make development cycles easier and better with the programming language we work in and love. ♥