Document #: | P3402R1 |
Date: | 2024-10-11 |
Project: | Programming Language C++ |
Audience: | SG23 |
Reply-to: | Marc-André Laverdière, Black Duck Software <marc-andre.laverdiere@blackduck.com>; Christopher Lapkowski, Black Duck Software <redacted@blackduck.com>; Charles-Henri Gros, Black Duck Software <redacted@blackduck.com> |
We propose an attribute that specifies that every verified class has all its data members initialized to determinate values, assuming that the data used for construction is itself initialized properly.
This safety profile restricts what kind of code a constructor can have. Existing code bases are likely to violate these constraints, and thus this feature is opt-in.
There is a growing push towards greater memory safety and memory-safe languages. While C++ is not memory-safe, it is desirable to specify an opt-in mechanism allowing a subset of C++ features that would result in memory-safe programs. This has been termed ‘profiles’ ([P3274R0]), and would be specified at the TU level using an attribute. In this paper, we propose [[Profiles::enable(initialization)]].
Industry compliance standards, such as CERT C++ [CERT], forbid access to uninitialized memory (Rule EXP53-CPP). While they imply complete initialization, they do not specify how a good constructor would achieve that objective.
However, the automobile safety industry desires fully initialized class objects. As part of the MISRA C++ standard [MISRA], there are two rules that specifically advise proper initialization of class objects.
MISRA C++2023 Rule 15.1.2 “All constructors of a class should explicitly initialize all of its virtual base classes and immediate base classes”
MISRA C++2023 Rule 15.1.4 “All direct, non-static data members of a class should be initialized before the class object is accessible”
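As an illustration of what these two rules ask for, the following minimal sketch shows a constructor that explicitly initializes its immediate base class and every direct non-static data member. The type names Sensor and CalibratedSensor are hypothetical and are taken neither from MISRA nor from this proposal.

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
struct Sensor {
  int id;
  explicit Sensor(int id) : id(id) {}
};

struct CalibratedSensor : public Sensor {
  int offset;
  int gain;
  // The immediate base class is initialized explicitly (Rule 15.1.2) and
  // every direct non-static data member is initialized before the object
  // becomes accessible (Rule 15.1.4).
  explicit CalibratedSensor(int id) : Sensor(id), offset(0), gain(1) {}
};

// Usage: every member has a determinate value right after construction.
inline void calibrated_sensor_demo() {
  CalibratedSensor s(7);
  assert(s.id == 7);
  assert(s.offset == 0);
  assert(s.gain == 1);
}
```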
Every class under the purview of the profile attribute is guaranteed to have all its data members properly initialized once the object is constructed, assuming that the data used for construction is itself initialized properly. If a class inherits from one or more classes, all its base classes must be compliant with this profile.
In this paper, we give examples with the profile attribute attached to specific classes. We do so to make it clear which classes are verified classes and which ones aren’t, since some examples have a mix of them.
Example:
struct [[Profiles::enable(initialization)]] parent1 {
  int i;
  parent1() = default; //non-compliant, i is default initialized
};
struct [[Profiles::enable(initialization)]] child1 : public parent1 {
  int j;
  child1() : parent1(), j(42) {} //child is compliant, but parent isn't
};
//Not a verified class, but would be compliant if it were
struct parent2 {
  int i = 0;
  parent2() = default;
};
struct [[Profiles::enable(initialization)]] child2 : public parent2 {
  int j;
  child2() : parent2(), j(42) {} //child is not compliant, because parent2 is not a verified class
};
A verified class is a class that is affected by the profile attribute or a POD.
An object parameter is either the this pointer or an explicit object parameter ([dcl.fct]).
A verified data member is a data member that is not exempted from verification.
Acceptable inputs are:
The non-exempt transitive closure of X means the set of symbols that are reachable from X using the dot and arrow operators, and which are not exempt from verification.
Some data members are exempt from verification, either due to intrinsic properties, or due to explicit opt-out from the developer.
[P3274R0] mentions that performance critical applications won’t initialize output buffers at first and mentions a few possibilities: “suppression, an uninitialized attribute, and/or by specific uninitialized types.”
This profile allows a pointer to be initialized to any value, whether that’s a nullptr, the return value of a new, or even a hardcoded address. It does not require that the memory space pointed to by the pointer is set to any value. This is an allowance for systems programming, which sometimes has buffers pointing to hardcoded addresses, which are used for interacting with devices. We therefore exempt dynamically allocated memory from initialization in this profile. Note that this profile prohibits spreading this uninitialized memory to verified non-static data members through requirement [only.acceptable.in].
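To make this allowance concrete, here is a small sketch of a class whose pointer member receives a determinate value while the storage it points to is left untouched. The names DeviceBuffer and kSize are hypothetical, and the profile attribute appears only in a comment because it is merely proposed here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class. The attribute is commented out because no compiler is
// assumed to implement the proposed profile.
struct /* [[Profiles::enable(initialization)]] */ DeviceBuffer {
  static constexpr std::size_t kSize = 64;
  unsigned char* buf;
  std::size_t size;

  // buf receives a determinate pointer value, which is all the profile asks
  // for. The kSize bytes it points to are NOT initialized here.
  DeviceBuffer() : buf(new unsigned char[kSize]), size(kSize) {}
  ~DeviceBuffer() { delete[] buf; }

  DeviceBuffer(const DeviceBuffer&) = delete;
  DeviceBuffer& operator=(const DeviceBuffer&) = delete;
};

// Usage: the pointer is safe to inspect, the pointee must be written first.
inline void device_buffer_demo() {
  DeviceBuffer d;
  assert(d.buf != nullptr);  // the pointer itself is determinate
  d.buf[0] = 0x2A;           // write before any read of the pointee
  assert(d.buf[0] == 0x2A);
}
```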
A cleaner solution for uninitialized buffer memory would be to use a specialized type. We envision a class named std::RawBuffer<T>, which would record which regions of the buffer have been written previously, and prohibit reads outside of those regions. This class would have a use beyond this profile, making it a more generic solution.
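One possible shape for such a type is sketched below. This is not a proposed interface for std::RawBuffer: the name RawBufferSketch, its member functions, the per-slot bookkeeping and the choice to throw on an invalid read are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical sketch; not the interface of any proposed std::RawBuffer.
template <typename T>
class RawBufferSketch {
  std::unique_ptr<T[]> data_;  // storage, left uninitialized for trivial T
  std::vector<bool> written_;  // written_[i] is true once slot i was written

public:
  explicit RawBufferSketch(std::size_t n)
      : data_(new T[n]), written_(n, false) {}

  std::size_t size() const { return written_.size(); }

  // Writing marks the slot as readable from now on.
  void write(std::size_t i, const T& value) {
    if (i >= size()) throw std::out_of_range("write outside buffer");
    data_[i] = value;
    written_[i] = true;
  }

  bool is_written(std::size_t i) const { return i < size() && written_[i]; }

  // Reading a slot that was never written is refused instead of returning
  // an indeterminate value.
  const T& read(std::size_t i) const {
    if (!is_written(i)) throw std::logic_error("read of unwritten slot");
    return data_[i];
  }
};

// Usage: a read is only accepted after the matching write.
inline void raw_buffer_demo() {
  RawBufferSketch<int> rb(4);
  rb.write(2, 42);
  assert(rb.read(2) == 42);
  assert(!rb.is_written(0));
}
```

Tracking one flag per slot keeps the sketch short. A real design would more likely track written ranges to keep the bookkeeping overhead low.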
We leave the general question of pointer safety to [[Profiles::enable(Pointers)]].
Developers could exempt specific data members from verification using the [[indeterminate]] attribute from [P2795R5].
struct [[Profiles::enable(initialization)]] HighPerformance {
  std::byte* buf [[indeterminate]];
  int sz = -1;
  void fill(/*...*/);
};
The static data members ([class.static.data]) in a verified class are allowed in this profile, as long as they are initialized solely using constant or zero initialization. This includes constexpr static data members.
Static data members have either static storage duration ([basic.stc.static]) or thread storage duration ([basic.stc.thread]). They are guaranteed to be initialized with constant initialization ([basic.start.static]). However, they can be reassigned during dynamic initialization ([basic.start.dynamic]).
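The difference between the two initialization phases can be sketched as follows. The names Counters and computeAtStartup are hypothetical, and the comments on what the profile would allow reflect our reading of the rule stated above rather than normative wording.

```cpp
#include <cassert>

// Hypothetical function standing in for any call that is not a constant
// expression.
inline int computeAtStartup() { return 7; }

struct Counters {
  static constexpr int kLimit = 10;  // constant initialization
  static int zeroed;                 // zero initialization, see below
  static int fixed;                  // constant initialization, see below
  static int dynamic;                // dynamic initialization, see below
};

int Counters::zeroed;                        // allowed: zero-initialized
int Counters::fixed = 3;                     // allowed: constant expression
int Counters::dynamic = computeAtStartup();  // would not be allowed: the
                                             // initializer is not a constant
                                             // expression

// Usage: the statically initialized members have their values before any
// dynamic initializer could interfere.
inline void static_members_demo() {
  static_assert(Counters::kLimit == 10, "usable in constant expressions");
  assert(Counters::zeroed == 0);
  assert(Counters::fixed == 3);
}
```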
Dynamic initialization can lead to subtle bugs, such as:
We illustrate how uninitialized memory can affect static data members with dynamic initialization below.
int randomInt() {
  int therandomint;
  return therandomint;
}
struct [[Profiles::enable(initialization)]] WithStaticUninit1 {
  WithStaticUninit1() = default; //Not a POD
  static int thestatic;
};
int WithStaticUninit1::thestatic = randomInt(); //non-compliant
struct [[Profiles::enable(initialization)]] GetsCorrupted {
  GetsCorrupted() : thefield(0) {} //compliant
  int thefield;
};
struct [[Profiles::enable(initialization)]] Wrapper {
  Wrapper() = default; //Not a POD
  static GetsCorrupted wrapped;
};
GetsCorrupted corruptingFactory() {
  GetsCorrupted ret{}; //All initialized, good
  ret.thefield = randomInt(); //Now, some uninitialized memory snuck in
  return ret;
}
GetsCorrupted Wrapper::wrapped = corruptingFactory(); //Non-compliant
All constructors of a verified class must satisfy the following properties:
A data member is considered read whenever it is present in the function, except when:
This definition implies the following:
[...] buf(new char[BUF_SIZE])) and be compliant. We discuss this above.
[...] operator*, operator-> and operator->* are not allowed.
[...] *v, &v, v[]) to acceptable inputs is not allowed.
We restrict function calls because we want to keep the analysis intraprocedural. Verifying initialization that occurs in a member function or through a function’s return value, or verifying that arguments are not tampered with, would require interprocedural analysis. Keeping the analysis intraprocedural would facilitate adoption.
Examples:
struct [[Profiles::enable(initialization)]] clazz1 {
  int i;
  int j;
  int z = 0;
  clazz1() {
    i = 123;
    if (nondet) {
      j = 456;
    }
    //non-compliant: [init.all.paths]
  }
};
struct [[Profiles::enable(initialization)]] clazz2 {
  int i;
  int j;
  clazz2() : i(j), j(42) {} //non-compliant: [init.before.read]
};
struct [[Profiles::enable(initialization)]] clazz3 {
  int i;
  int j;
  clazz3() { //compliant, but bad form
    this->i = 0;
    this->j = 42;
  }
};
struct pod {
int i;
int j;
};
struct [[Profiles::enable(initialization)]] clazz4 {
  pod p;
  clazz4() = default; //non-compliant: non-initializing default initialization [init.all.paths]
};
struct [[Profiles::enable(initialization)]] clazz5 {
  pod p{};
  clazz5() = default; //compliant, p is value-initialized
};
struct [[Profiles::enable(initialization)]] clazz6 {
  pod p;
  pod podFactory() {
    // non-initializing default initialization
    pod p; return p;
  }
  clazz6() : p(podFactory()) {} //non-compliant: [only.acceptable.in]
};
struct [[Profiles::enable(initialization)]] clazz7 {
  int i;
  int j;
  clazz7(int i) : i(i), j() {} //compliant, j is value-initialized
};
struct [[Profiles::enable(initialization)]] clazz8 {
  int i;
  int j;
  void utility_function() const;
  clazz8(int i) : i(i), j() {
    utility_function(); //non-compliant: [no.args]
  }
};
struct [[Profiles::enable(initialization)]] clazz9 {
  int i;
  int j;
  void mutating(int&) const;
  clazz9(int i) : i(i), j() {
    mutating(j); //non-compliant: [no.args]
  }
};
struct [[Profiles::enable(initialization)]] clazz10 {
  int i;
  int j;
  void mutating();
  clazz10(int i) : i(i), j() {
    mutating(); //non-compliant: [no.args]
  }
};
struct [[Profiles::enable(initialization)]] clazz11 {
  int i;
  int j;
  void mutating() const {
    int uninit;
    const_cast<clazz11*>(this)->j = uninit;
  }
  clazz11(int i) : i(i), j() {
    mutating(); //non-compliant: [no.args]
  }
};
struct [[Profiles::enable(initialization)]] clazz12 {
  std::byte* buf [[indeterminate]];
  size_t buf_size;
  int i;
  clazz12() : i(std::to_integer<int>(buf[0])) {} //non-compliant: [only.acceptable.in]
};
struct [[Profiles::enable(initialization)]] clazz13 {
  static unsigned num_allocations;
  clazz13() {
    ++num_allocations; //compliant: [only.acceptable.in]
  }
};
unsigned clazz13::num_allocations = 0;
Please note that this profile will mark some correct code as non-compliant. This is unavoidable.
struct [[Profiles::enable(initialization)]] clazz14 {
  std::byte* buf [[indeterminate]];
  size_t buf_size;
  int i;
  clazz14(size_t sz) : buf_size(sz) {
    if (sz > 0) {
      buf = new std::byte[sz];
      std::fill(buf, buf + sz, std::byte{0});
      i = std::to_integer<int>(buf[0]); //safe, but non-compliant: [only.acceptable.in]
    } else {
      buf = nullptr;
      i = -1;
    }
  }
};
In the case of templated classes, the property is verified during template instantiation.
class NotAnnotated{/**/};
class [[Profiles::enable(initialization)]] Annotated {/**/};
template<typename T>
class [[Profiles::enable(initialization)]] AnnotatedTemplate {
  T field = T();
};
void foo() {
  AnnotatedTemplate<NotAnnotated> nat {}; //non-compliant, calling the constructor to a non-verified class
  AnnotatedTemplate<Annotated> at {}; //compliant
}
A recent SG23 mailing list discussion highlighted that delegating initialization to a non-constructor member function is idiomatic in C++. Supporting this idiom would make this profile more useful.
The bit of code that triggered the discussion is the following:
basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
: _M_dataplus(_M_local_data(), __a)
{
  //...
  _M_construct(__s, __end, forward_iterator_tag());
}
In this case, _M_local_data() returns a const pointer to a data member (_M_local_buf) and passes it to the _M_dataplus data member. The initialization is then done by _M_construct. This code would be reported as violating the safety profile as we specify it in this draft, since the constructor does not initialize _M_dataplus directly.
There are a few solutions to this problem:
[...] [[must_init]].
struct DelegatingInit {
  int member;
  DelegatingInit() {
    internal_init(&member);
  }
  void internal_init([[must_init]] int* p);
};
This option is less intrusive than option 1, and would be simpler to verify than option 2, simply because the scope of the analysis becomes well-bounded. As such, it is worth considering.
Nonetheless, we consider it undesirable for the following reasons:
[...] initialize() member function.
[...] [[must_init]] attribute can delegate further to other [[must_init]] functions, this can lead to recursion.
[...] const& and const* Parameters
While C++ allows stripping const-ness through casting, this practice is uncommon. It would be preferable to allow function calls in constructors to pass acceptable inputs by const reference or const pointer. However, we need a way to ensure that the callees do not use shenanigans allowing them to modify the state of acceptable inputs.
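The kind of shenanigans we have in mind can be as small as the following sketch. The names sneaky and Holder are hypothetical. The cast is well-defined here only because the referenced object was not itself declared const.

```cpp
#include <cassert>

// Hypothetical callee: its signature promises not to modify value, but the
// body strips const-ness and writes anyway.
inline void sneaky(const int& value) {
  const_cast<int&>(value) = -1;
}

struct Holder {
  int i;
  explicit Holder(int init) : i(init) {
    // i holds a determinate value before the call. After the call, an
    // intraprocedural analysis can no longer tell what i holds.
    sneaky(i);
  }
};

// Usage: the callee overwrote the member through a const reference.
inline void const_tampering_demo() {
  Holder h(5);
  assert(h.i == -1);
}
```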
There are a few solutions to this problem:
[...] [[const_is_const]]) at the function declaration that indicates to the analyzer that a variant of criterion [effective.const] must be verified for the function.
[...] const-ness, unless they have the novel [[unsafe]] attribute.
Regarding solution 1, there are edge cases with virtual member functions and we wouldn’t allow verified classes’ constructors to call non-member functions.
Regarding solution 2, we don’t like adding additional attributes, which may hamper adoption. It does however keep the analysis simple.
Regarding solution 3, it widens the scope of the profile. While that is a good thing, it may be challenging to communicate exactly what the scope is to developers.
Regarding solution 4, it is the most obvious solution, but is restricted to cases where the call target can be determined accurately. It would also increase the complexity of the implementation and, in turn, risk hampering adoption.
Many restrictions stem from the fact that default initialization sometimes means that no initialization is performed ([dcl.init]). Prohibiting its use in functions called from verified classes’ constructors could be beneficial. However, it would expand the scope of the profile.
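The distinction can be illustrated with a short sketch. The names Pod, makeZeroed and makeAssigned are hypothetical. The second function assigns both members before returning so that the example never reads an indeterminate value.

```cpp
#include <cassert>

struct Pod {
  int i;
  int j;
};

// Empty-brace initialization: both members end up zero.
inline Pod makeZeroed() {
  Pod p{};
  return p;
}

// Default initialization of the same type performs no initialization for an
// automatic variable, so every member must be assigned before any read.
inline Pod makeAssigned() {
  Pod p;    // p.i and p.j are indeterminate at this point
  p.i = 1;  // assigned explicitly so that the return below is well-defined
  p.j = 2;
  return p;
}

// Usage: only the first form yields determinate values without assignments.
inline void initialization_forms_demo() {
  Pod zeroed = makeZeroed();
  assert(zeroed.i == 0 && zeroed.j == 0);
  Pod assigned = makeAssigned();
  assert(assigned.i == 1 && assigned.j == 2);
}
```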
This draft materially deviates from [P3274R0] in the following ways:
The proposed attribute could be misunderstood to mean that all variables in all the code in the scope of the attribute are properly initialized. This may be addressed in a future version of this profile, or in another profile.
We also observe that profiles imply a constraint on what types can be used in a template. This hints at a new concept. A future revision of this paper would explore this further.
In this paper, we propose a safety profile that guarantees that any class affected by the profile attribute will have all its data members initialized, assuming that the data used for construction is itself initialized properly. The profile does not depend on the presence of specific modern C++ features and can thus be applied to legacy code bases.
Q1: Should an attribute be used to exempt data members from initialization?
Q2: Should specific types be used to exempt data members from initialization?
Q3: Should this profile handle initialization outside of constructors?
Q4: If this profile were to handle initialization outside of constructors, should the profile rely on an attribute on parameters that indicates what the function is responsible for initializing?
Q5: If this profile were to allow verified inputs to be passed by const& or const*, how would it ensure that they are not tampered with?
Q6: For a given scope of applicability (e.g. translation unit), should this profile prohibit the use of default initialization altogether? Or prohibit the use of default initialization leading to no initialization?
Q7: Given that this profile does not consider initialization at large, should we rename it to class_initialization?