1. Summary
Reading uninitialized values is undefined behavior which leads to well-known security issues [CWE-457], from information leaks to attacker-controlled values being used to control execution, leading to an exploit. This can occur when:
- The compiler re-uses stack slots, and a value is used uninitialized;
- The compiler re-uses a register, and a value is used uninitialized;
- Stack structs / arrays / unions with padding are copied; and
- A heap-allocated value isn’t initialized before being used.
This proposal only addresses the above issues when they occur for automatic storage duration objects (on the stack). The proposal does not tackle objects with dynamic storage duration. Objects with static or thread-local storage duration are not problematic because they are already zero-initialized if they cannot be constant-initialized, even if they are non-trivial and have a constructor execute after this initialization.
static char global[128];        // already zero-initialized
thread_local int thread_state;  // already zero-initialized

int main() {
  char buffer[128];             // not zero-initialized before this proposal
  struct {
    char type;
    int payload;
  } padded;                     // padding zero-initialized with this proposal
  union {
    char small;
    int big;
  } lopsided = { '?' };         // padding zero-initialized with this proposal
  int size = getSize();
  char vla[size];               // in C code, VLAs are zero-initialized with this proposal
  char* allocated = malloc(128); // unchanged
  int* new_int = new int;       // unchanged
  char* new_arr = new char[128]; // unchanged
}
This proposal therefore transforms some runtime undefined behavior into well-defined behavior.
Adopting this change would mitigate or extinguish around 10% of exploits against security-relevant codebases (see numbers from [AndroidSec] and [InitAll] for a representative sample, or [KCC] for a long list).
We propose to zero-initialize all objects of automatic storage duration, making C++ safer by default.
This was implemented as an opt-in compiler flag in 2018 for LLVM [LLVMReview] and MSVC [WinKernel], and in 2021 for GCC [GCCReview]. Since then it has been deployed to:
- The OS of every desktop, laptop, and smartphone that you own;
- The web browser you’re using to read this paper;
- Many kernel extensions and userspace programs in your laptop and smartphone; and
- Likely your favorite videogame console.
Unless of course you like esoteric devices.
The above codebases contain many millions of lines of code of C, C++, Objective-C, and Objective-C++.
The performance impact is negligible (less than 0.5% regression) to slightly positive (that is, some code gets faster by up to 1%). The code size impact is negligible (smaller than 0.5%). Compile-time regressions are negligible. Were overheads to matter for particular coding patterns, compilers would be able to obviate most of them.
The only significant performance/code size regressions occur when code has very large automatic storage duration objects. We provide an attribute to opt out of zero-initialization of objects of automatic storage duration. We then expect that programmers can audit their code for this attribute, and ensure that the unsafe subset of C++ is used in a safe manner.
This change was not possible 30 years ago because optimizations simply were not as good as they are today, and the costs were too high. The costs are now negligible. We provide details in § 5 Performance.
Overall, this proposal is a targeted fix which stands on its own, but should be considered as part of a wider push for type and resource safe C++ such as in [P2687r0].
2. Opting out
In deploying this scheme, we’ve found that some codebases see unacceptable performance / size regressions because they allocate large values on the stack. We therefore propose standardizing an attribute which denotes intent, [[uninitialized]]. When put on automatic variables, no initialization is performed. The programmer is then responsible for what happens because they are back to C++23 behavior. This makes C++ safe by default for this class of bugs, and provides an escape hatch when the safety is too onerous.
An [[uninitialized]] attribute was proposed in [P0632r0].
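For illustration, here is a minimal sketch of how the opt-out might look, assuming the attribute is spelled [[uninitialized]] as in [P0632r0] (the exact spelling is for the committee to decide); fill_packet is a hypothetical function that is responsible for fully initializing the buffer:
int fill_packet(char* buffer, unsigned long size); // hypothetical
void handle_packet() {
  [[uninitialized]] char scratch[64 * 1024]; // large buffer: opt out of zero-initialization
  int bytes = 0;                             // still zero-initialized by default
  bytes = fill_packet(scratch, sizeof scratch);
  // ... only the first `bytes` bytes of scratch may be read ...
}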
As a quality of implementation tool, and to ease transitions, some implementations might choose to diagnose on large stack values that might benefit from this attribute. We caution implementations to avoid doing so in their front-end, but rather to do this as a post-optimization annotation to reduce noise. This was done in LLVM [AutoInitSummary].
3. Security details
What exactly are the security issues?
Currently, using an uninitialized stack variable is undefined behavior. The typical outcome is that the program reads stale stack or register values, that is, whatever value the compiler happened to leave in the same stack slot or register. This leads to bugs. Here are the potential outcomes:
- Benign: in the best case the program causes an unexpected result.
- Exploit: in the worst case it leads to an exploit. There are three outcomes that can lead to an exploit:
  - Read primitive: this exploit could be formed from reading leaked secrets that reside on the stack or in registers, and using this information to exploit another bug. For example, leaking an important address to defeat ASLR.
  - Write primitive: an uninitialized value is used to perform a write to an arbitrary address. Such a write can be transformed into an execute primitive. For example, a stack slot is used on a particular execution path to determine the address of a write. The attacker can control the previous stack slot’s content, and therefore control where the write occurs. Similarly, an attacker could control which value is written, but not where it is written.
  - Execute primitive: an uninitialized value is used to call a function.
Here are a few examples:
int get_hw_address(struct device* dev, struct user* usr) {
  unsigned char addr[MAX_ADDR_LEN];          // leak this memory 😢
  if (!dev->has_address)
    return -EOPNOTSUPP;
  dev->get_hw_address(addr);                 // if this doesn’t fill addr
  return copy_out(usr, addr, sizeof(addr));  // copies all
}
Example from [LinuxInfoleak]. What can this leak? ASLR information, or anything from another process. An ASLR leak can enable Return Oriented Programming (ROP) or make an arbitrary write effective (e.g. figure out where high-value data structures are). Even padding could leak secrets.
int queue_manage() {
  struct async_request* backlog;              // uninitialized
  if (engine->state == IDLE)                  // usually true
    backlog = get_backlog(&engine->queue);
  if (backlog)
    backlog->complete(backlog, -EINPROGRESS); // oops 🔥🔥🔥
  return 0;
}
Example from [LinuxUse]. The attacker can control the value of backlog by making a previous function call, and ensuring the same stack slot gets reused.
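A minimal C++ sketch of that mechanism (all functions hypothetical): the attacker first calls seed() with a chosen value, then triggers victim(); if the compiler assigns poison and backlog the same stack slot, the "uninitialized" pointer holds attacker-chosen bytes:
using callback = void (*)();
callback get_backlog();                   // hypothetical
void seed(unsigned long attacker_value) {
  volatile unsigned long poison = attacker_value; // dies here, bytes remain in the stack slot
  (void)poison;
}
void victim(bool idle) {
  callback backlog;                       // uninitialized: may reuse seed()'s dead slot
  if (idle)                               // attacker arranges for this to be false
    backlog = get_backlog();
  if (backlog)                            // reads stale stack bytes when !idle
    backlog();                            // execute primitive; zero-initialization makes this branch not taken
}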
Security-minded folks think this is a good idea. The Microsoft Windows security team [WinKernel] say:
Between 2017 and mid 2018, this feature would have killed 49 MSRC cases that involved uninitialized struct data leaking across a trust boundary. It would have also mitigated a number of bugs involving uninitialized struct data being used directly. To date, we are seeing noise level performance regressions caused by this change. We accomplished this by improving the compilers ability to kill redundant stores. While everything is initialized at declaration, most of these initializations can be proven redundant and eliminated.
They use pure zero initialization, and claim to have taken the overheads down to within noise. They provide more details in [CppConMSRC] and [InitAll]. Don’t just trust Microsoft’s Windows security team though: here’s one of the upstream Linux kernel security developers asking for this [CLessDangerous], and Linus agreeing [Linus]. [LinuxExploits] is an overview of a real-world execution control exploit using an uninitialized stack variable on Linux. This has been proposed for GCC [init-local-vars] and LLVM [LocalInit] before.
In Chrome we find 1100+ security bugs for “use of uninitialized value” [ChromeUninit].
12% of all exploitable bugs in Android are of this kind according to [AndroidSec].
Kostya Serebryany has a very long list of issues caused by usage of uninitialized stack variables [KCC].
4. Alternatives
When this is discussed, the discussion often goes to a few places.
First, people will suggest using tools such as memory sanitizer or valgrind. Yes, they should be used, but these tools:
- Require compiling everything in the process with memory sanitizer.
- Can’t be deployed in production.
- Only find issues in code that is actually executed in testing.
- Are roughly 3× slower and memory-hungry.
The next suggestion is usually couldn’t you just test / fuzz / code-review better or just write perfect code? However:
- We are (well, except the perfect code part).
- It’s not sufficient, as evidenced by the exploits.
- Attackers find what programmers don’t think of, and what fuzzers don’t hit.
The annoyed suggester then says "couldn’t you just use -Wuninitialized and fix everything it complains about?" This is similar to the [CoreGuidelines] recommendation. You are beginning to expect shortcomings, in this case:
- Too much code to change.
- Current -Wuninitialized is low quality, has false positives and false negatives. We could get better analysis if we had a compiler-frontend-based IR, but it still wouldn’t be noise-free (see the example after this list).
- Similarly, static analysis isn’t smart enough.
- The value chosen to initialize doesn’t necessarily make sense, meaning that code might move from unsafe to merely incorrect.
- Code doesn’t express intent anymore, which leads readers to assume that the value is sensible when it might have simply been there to address the diagnostic.
- It prevents checkers (static analysis, sanitizers) from diagnosing code because it was given semantics which aren’t intended.
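To illustrate the noise problem, here is a small sketch with a hypothetical function init(): whether out is written depends on init()'s run-time behavior, which a front-end diagnostic cannot see, so -Wuninitialized must either stay silent (a false negative if the contract is broken) or warn on correct code (a false positive):
int init(int* p);         // hypothetical: writes *p only when it returns 0
int consume() {
  int out;                // intentionally left uninitialized
  if (init(&out) == 0)    // only read `out` when init() reports success
    return out;           // -Wuninitialized cannot reason about this contract
  return -1;
}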
Couldn’t you just use definitive initialization in the language? For example as explained in [CppConDefinite] and [CppConComplexity]. This one is interesting. Languages such as C# and Swift allow uninitialized values, but only if the compiler can prove that they’re not used uninitialized; otherwise they must be initialized. It’s similar to enforcing -Wuninitialized, depending on what is standardized. We then have a choice: standardize a simple analysis (such as "must be initialized within the block or before it escapes") and risk needless initializations (with mitigations explained in the prior references), or standardize a more complex analysis. We would likely want to deploy a low-code-churn change, and therefore a high-quality analysis. This would require that the committee standardize the analysis, lest compilers diverge. But a high-quality analysis tends to be complex, and to change as it improves, which is not sensible to standardize. This would leave us standardizing a high-code-churn analysis with much more limited visibility. Further, with definitive initialization it’s then unclear whether an explicit = 0 lends meaning to the value, or whether it’s just there to shut that warning up.
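A small example of why the choice of analysis matters (compute and fallback are hypothetical): a simple "must initialize at the point of declaration" rule forces a needless write here, whereas a flow-sensitive definite-assignment analysis, as in C# or Swift, accepts the code unchanged:
int compute();   // hypothetical
int fallback();  // hypothetical
int branchy(bool flag) {
  int value;              // no initializer
  if (flag)
    value = compute();
  else
    value = fallback();
  return value;           // definitely assigned on every path
}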
An alternative to the [[uninitialized]] attribute is to use a keyword. We can argue about which keyword is right. However, maybe through [P0146r1], we’ll eventually agree that void is the right name for "This variable is not a variable of honor... no highly esteemed deed is commemorated here... nothing valued is here." This approach meshes well with a definitive initialization approach because we would keep the semantics of "no value was intended for this variable".
One alternative that was suggested instead of zero-initialization is to value-initialize. This would invoke constructors and therefore be a no-go as a significant behavior change. That said, some form of "trivial enough" initialization might be an acceptable approach to increase correctness (not merely safety) for types for which zero isn’t a valid value.
We could instead keep the out-of-band option that is already implemented in all major compilers. That out-of-band option is opt-in and keeps the status quo where "C++ is unsafe by default". Further, many compiler implementors are dissatisfied with this option because they believe it’s effectively a language fork that users of the out-of-band option will come to rely on (this risk is mitigated in MSVC by using a non-zero pattern in debug builds, according to [InitAll]). These developers would rather see the language standardize the option officially.
Rather than zero-initializing, we could use another value. For example, the author implemented a mode where the fundamental types dictate the value used:
- Integral → repeated 0xAA
- Pointers → repeated 0xAA
- Floating-point → repeated 0xFF
This is advantageous on 64-bit platforms because all pointer uses will trap with a very recognizable value, though most modern platforms will also trap for large offsets from the null pointer. Floating-point values are NaNs with payload, which propagate on floating-point operations and are therefore easy to root-cause. It is problematic for some integer values that represent sizes in a bounds check, because such a number is effectively larger than any bound and would potentially permit out-of-bounds accesses that a zero initialization would not.
Unfortunately, this scheme optimizes much more poorly and ends up being a roughly 1.5% size and performance regression on some workloads according to [GoogleNumbers] and others.
Using repeated bytes for initialization is critical to effective code generation because instruction sets can materialize repeated byte patterns much more efficiently than larger values. Any approach other than the above is bound to have a worse performance impact. That is why choosing implementation-defined values or runtime random values isn’t a useful approach.
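A rough illustration of the codegen argument: a repeated byte can be splatted into a register or emitted as a plain memset, whereas a non-repeating wide constant has to be materialized and stored piecewise:
#include <cstring>
void init_with_pattern(unsigned char (&buf)[256]) {
  std::memset(buf, 0xAA, sizeof buf);   // repeated byte: one cheap, well-optimized call
  // Filling the same buffer with a non-repeating 64-bit pattern would require
  // materializing that constant and storing it element by element (or via a
  // constant pool), which is what makes non-repeated patterns costlier.
}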
Another alternative is to guarantee that accessing uninitialized values is implementation-defined behavior, where implementations must choose to either, on a per-value basis:
- zero-initialize; or
- trap (we can argue separately as to whether "trap" is immediate termination, Itanium [NaT], std::terminate, or something else).
Implementations would then do their best to trap on uninitialized value access, but when they can’t prove that an access is uninitialized through whatever optimization scheme they use, they would revert to zero-initialization. This is an interesting approach which Richard Smith has implemented a prototype of in [Trap]. However, the author’s and others' experience is that such a change is rejected by teams because it cannot be deployed to codebases that seem to work just fine. The Committee might decide to go with this approach, and see whether the default in various implementations is "always zero" or "sometimes trap".
We could also standardize a hybrid approach that mixes some of the above, creating implementation-defined alternatives which are effectively different build modes that the standard allows. This is what [CarbonInit] does.
5. Performance
Previous publications [Zeroing] have cited average overheads of 2.7% to 4.5%. This was true because, in the author’s experience implementing this in LLVM [LLVMJFB], the compiler simply didn’t perform sufficient optimizations. Deploying this change required a variety of optimizations to remove needless work and reduce code size. These optimizations were generally useful for other code, not just automatic variable initialization.
As stated in the introduction, the performance impact is now negligible (less than 0.5% regression) to slightly positive (that is, some code gets faster by up to 1%). The code size impact is negligible (smaller than 0.5%).
Why do we sometimes observe a performance improvement with zero-initialization? There are a few potential reasons:
- Zeroing a register or stack value can break dependencies, allowing more Instructions Per Cycle (IPC) in an out-of-order CPU by unblocking scheduling of instructions that were waiting on what seemed like a long critical path of instructions but were in fact only false-sharing registers. The register can be renamed to the hardware-internal zero register, and instructions scheduled.
- Zeroing of entire cache entries is magical, though for the stack this is rarely the case. There are questions of temporality, special instructions, and special logic to support zeroed cachelines.
Here are a few of the optimizations which have helped reduce overheads in LLVM:
- CodeGen: use non-zero memset for automatic variables D49771
- Merge clang’s isRepeatedBytePattern with LLVM’s isBytewiseValue D51751
- SelectionDAG: Reuse bigger constants in memset D53181
- ARM64: improve non-zero memset isel by ~2x D51706
- Double sign-extend stores not merged going to stack D54846 and D54847
- LoopUnroll: support loops w/ exiting headers & uncond latches D61962
- ConstantMerge: merge common initial sequences D50593
- IPO should track pointer values which are written to, enabling DSO
- Missed dead store across basic blocks pr40527
This doesn’t cover all relevant optimizations. Some optimizations will be architecture-specific, meaning that deploying this change to new architectures will require some optimization work to reduce cost.
New codebases using automatic variable initialization might trigger missed-optimizations in compilers and experience a performance or size regression. The wide deployment of the opt-in compiler flag means that this is unlikely, but were this to happen then a bug should be filed against the compiler to fix the missed optimization.
For example, here is Android’s experience on performance costs [AndroidCosts].
Some of the performance costs are hidden by modern CPUs being heavily superscalar. Embedded CPUs are often in-order, and code running on them might therefore see a performance regression roughly proportional to the code size increase. Here too, compiler optimizations can significantly reduce these costs.
6. Caveats
There are a few caveats to this proposal.
Making all automatic variables zero by default means that developers will come to rely on it. The current status quo forces developers to express intent, and new code might decide not to do so and simply use the zero that they know to be there. This would then make it impossible to distinguish "purposeful use of the uninitialized zero" from "accidental use of the uninitialized zero". Tools such as memory sanitizers and valgrind would therefore be unable to diagnose correctness issues (but we would have removed the security issues). It should still be best practice to only assign a value to a variable when this value is meaningful, and only use an "uninitialized" value when meaning has been given to it.
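A small illustration of the concern: after this proposal both functions below return 0, so a tool can no longer tell the deliberate zero from the forgotten one:
int deliberate() {
  int counter = 0;   // intent expressed explicitly: still best practice
  return counter;
}
int forgotten() {
  int counter;       // author forgot to assign a meaningful value
  return counter;    // now silently 0: a correctness bug, but no longer a security bug
}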
This proposal means that objects with dynamic storage duration become uniquely different from objects of all other storage durations, in that they alone are uninitialized. We could explore guaranteeing initialization of these objects, but this is a much newer area of research and deployment, as shown in [XNUHeap]. The strategy for automatically initializing objects of dynamic storage duration depends on a variety of factors, and there is no agreed-upon solution that is right for all codebases. Each implementation approach has tradeoffs that warrant more study.
Some types might not have a valid zero value, for example an enum might not assign meaning to zero. Initializing this type to zero is then safe, but not correct. However, C++ has constructors to deal with this type of issue.
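For example, in the sketch below zero is not an enumerator of Color, so zero-initialization is memory-safe yet meaningless; a default member initializer (or constructor) expresses the intended value:
enum class Color { Red = 1, Green = 2, Blue = 3 }; // no enumerator has value 0
struct Pixel {
  Color c = Color::Red;  // correct by construction, rather than relying on an implicit zero
};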
For some types, the zero value might be the "dangerous" one, for example a null pointer might be a special sentinel value saying "has all privileges" in a kernel. A concrete example is on Linux, where a uid of zero means root; uninitialized reads of zero have been the source of CVEs on Linux before.
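A hedged illustration with a made-up credential struct: uid == 0 is the privileged case, so a zero-initialized read is memory-safe yet grants maximum privilege:
struct Cred { unsigned uid; };     // by convention, uid 0 means root
bool may_administer(const Cred& c) {
  return c.uid == 0;               // a forgotten initialization now answers "yes"
}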
Some people think that reading uninitialized stack variables is a good source of randomness to seed a random number generator, but these people are wrong [Randomness].
As an implementation concern for LLVM (and maybe others), not all variables of automatic storage duration are easy to properly zero-initialize. Variables declared in unreachable code, and used later, aren’t currently initialized. This includes goto, Duff’s device, and other objectionable uses of switch. Such things should rather be a hard error in any serious codebase.
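A sketch of the wrinkle (use is hypothetical): the unconditional goto makes the declaration unreachable, yet buf is still in scope after the label, and the compiler-inserted zeroing attached to the declaration never runs:
void use(char*);    // hypothetical
void tricky() {
  goto after;       // unconditionally skips the declaration (legal: no initializer)
  char buf[16];     // declared in unreachable code; zeroing would be emitted here
after:
  use(buf);         // buf is in scope, but the zeroing was never executed
}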
volatile stack variables are weird. That’s pre-existing, it’s really the language’s fault and this proposal keeps it weird. In LLVM, the author opted to initialize all volatile stack variables to zero. This is technically acceptable because they don’t have any special hardware or external meaning, they’re only used to block the optimizer. It’s technically incorrect because the compiler shouldn’t be performing extra accesses to volatile variables, but one could argue that the compiler is zero-initializing the storage right before the volatile object is constructed. This would be a pointless discussion which the author is happy to have if the counterparty is buying.
7. Wording
Wording for this proposal will be provided once an approach is agreed upon.
The author expects that the solution that is agreed upon maintains the property that:
- Taking the address, passing this address, and manipulating the address of data not explicitly initialized remains valid.
- memcpy of data not explicitly initialized (including padding) remains valid, but reading from the data or the memcpy’d data is invalid.