Document #: | P3402R0 |
Date: | 2024-09-17 |
Project: | Programming Language C++ |
Audience: | SG23 |
Reply-to: | Marc-André Laverdière, Black Duck Software <marc-andre.laverdiere@blackduck.com>; Christopher Lapkowski, Black Duck Software <redacted@blackduck.com>; Charles-Henri Gros, Black Duck Software <redacted@blackduck.com> |
We propose an attribute that specifies that every verified class has all its data members initialized to determinate values, assuming that the data used for construction is itself initialized properly. Thus, when the assumptions are validated, none of its data members can cause undefined or erroneous behavior.
This safety profile restricts what kind of code a constructor can contain. Existing code bases are likely to violate these constraints, so this feature is opt-in.
There is a growing push towards memory-safe languages. While C++ is not memory-safe, it is desirable to specify an opt-in mechanism that allows only a subset of C++ features and thereby yields a memory-safe language. Such mechanisms have been termed ‘profiles’ ([P3274R0]), and would be specified at the TU level using an attribute. In this paper, we propose [[Profiles::enable(initialization)]].
Every class under the purview of the profile attribute has the guarantee that all its data members are properly initialized once the object is constructed, assuming that the data used for construction is itself initialized properly. If a class inherits from one or more classes, all its base classes must also be compliant with this profile.
Example:
struct [[Profiles::enable(initialization)]] parent1 {
  int i;
  parent1() = default; //non-compliant, i is default initialized
};
struct [[Profiles::enable(initialization)]] child1 : public parent1 {
  int j;
  child1() : parent1(), j(42) {} //child is compliant, but parent isn't
};
//Not a verified class, but would be compliant if it were
struct parent2 {
  int i = 0;
  parent2() = default;
};
struct [[Profiles::enable(initialization)]] child2 : public parent2 {
  int j;
  child2() : parent2(), j(42) {} //child is not compliant, because parent2 is not a verified class
};
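For contrast, a hierarchy in which both classes would satisfy the profile might look like the following sketch. The names parent3 and child3 are hypothetical, and the INIT_PROFILE macro is only a placeholder for the proposed attribute, which current compilers do not recognize.

```cpp
#include <cassert>

// Placeholder for the proposed [[Profiles::enable(initialization)]] attribute.
// Assumption: it expands to nothing so that the sketch compiles today.
#define INIT_PROFILE

struct INIT_PROFILE parent3 {
  int i = 0;            // default member initializer: i is never indeterminate
  parent3() = default;
};

struct INIT_PROFILE child3 : public parent3 {
  int j;
  child3() : parent3(), j(42) {}  // verified base, and j is initialized
};
```

Both classes are verified here, so the requirement on base classes is met as well.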
A verified class is a class that is affected by the profile annotation, or a scalar type.
An object parameter is either this or an explicit object parameter.
A verified data member is a data member that is not exempted from verification.
In this paper, we give examples with the profile annotation attached to specific classes. This deviates from [P3274R0], which suggested annotations at the translation unit level. We do so to make it clear which classes are verified classes and which ones aren’t, since some examples have a mix of them. Our proposal is orthogonal to the location of the annotation.
Some data members are exempt from verification, either due to intrinsic properties, or due to explicit opt-out from the developer.
Static data members ([class.static.data]) are allowed in a verified class, but the profile offers only minimal guarantees for them.
Static data members have either static storage duration ([basic.stc.static]) or thread storage duration ([basic.stc.thread]). They are guaranteed to be initialized with constant initialization ([basic.start.static]). However, they can be reassigned during dynamic initialization ([basic.start.dynamic]).
Dynamic initialization can lead to subtle bugs, such as:
All static data members belonging to verified classes that are initialized solely using constant initialization are exempted from verification. All other static data members are non-compliant with this profile.
int randomInt() {
  int therandomint;
  return therandomint;
}
struct [[Profiles::enable(initialization)]] WithStaticUninit1 {
  WithStaticUninit1() = default; //Not a POD
  static int thestatic;
};
int WithStaticUninit1::thestatic = randomInt(); //non-compliant
struct [[Profiles::enable(initialization)]] GetsCorrupted {
  GetsCorrupted() : thefield(0) {} //compliant
  int thefield;
};
struct [[Profiles::enable(initialization)]] Wrapper {
  Wrapper() = default; //Not a POD
  static GetsCorrupted wrapped;
};
GetsCorrupted corruptingFactory() {
  GetsCorrupted ret{}; //All initialized, good
  ret.thefield = randomInt(); //Now, some uninitialized memory snuck in
  return ret;
}
GetsCorrupted Wrapper::wrapped = corruptingFactory(); //Non-compliant
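By comparison, a static data member whose initializer is a constant expression would fall under the exemption. The sketch below is one way this could look; WithStaticInit and defaultLimit are hypothetical names, and INIT_PROFILE is a placeholder for the proposed attribute. In C++20, the constinit specifier could additionally be used to have the compiler reject any initializer that would require dynamic initialization.

```cpp
#include <cassert>

// Placeholder for the proposed attribute (assumption: expands to nothing).
#define INIT_PROFILE

// constexpr, so a call to it can appear in a constant expression.
constexpr int defaultLimit() { return 64; }

struct INIT_PROFILE WithStaticInit {
  WithStaticInit() = default;
  static int limit;
};

// The initializer is a constant expression, so the member is
// constant-initialized and no dynamic initialization takes place.
int WithStaticInit::limit = defaultLimit();
```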
[P3274R0] mentions that performance critical applications won’t initialize output buffers at first and mentions a few possibilities: “suppression, an uninitialized annotation, and/or by specific uninitialized types.”
This profile allows a pointer to be initialized to any value, whether that’s a nullptr, the return value of a new, or even a hardcoded address. It does not require that the memory pointed to by the pointer is set to any value. This is an allowance for systems programming, which sometimes has buffers pointing to hardcoded addresses that are used for interacting with devices. We leave the question of pointer safety to [[Profiles::enable(Pointers)]].
Developers who need to exempt specific data members from verification may use the [[indeterminate]] annotation from [P2795R5].
A better solution is to use a specialized type that offers restricted access to memory regions. We envision a class named std::RawBuffer<T>, which would record which regions of the buffer have been written previously, and prohibit reads outside of those regions. This class would have uses beyond this profile, making it a more general solution.
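To make the idea more concrete, here is a minimal sketch of such a type. It is not the interface of std::RawBuffer<T>, which this paper leaves unspecified: the name RawBufferSketch, the per-element bookkeeping, and the choice to throw on an invalid read are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a buffer that tracks which elements were written.
template <typename T>
class RawBufferSketch {
  std::unique_ptr<T[]> storage_;  // elements are default-initialized only
  std::vector<bool> written_;     // written_[k] is true once element k was stored

public:
  explicit RawBufferSketch(std::size_t n)
      : storage_(new T[n]), written_(n, false) {}

  std::size_t size() const { return written_.size(); }

  void write(std::size_t pos, const T& value) {
    if (pos >= size()) throw std::out_of_range("write past end of buffer");
    storage_[pos] = value;
    written_[pos] = true;
  }

  // Reads are only allowed for elements that were written before.
  const T& read(std::size_t pos) const {
    if (pos >= size()) throw std::out_of_range("read past end of buffer");
    if (!written_[pos]) throw std::logic_error("read of unwritten element");
    return storage_[pos];
  }
};
```

A real design would likely track contiguous regions rather than single elements to keep the bookkeeping cheap; the per-element flags are used here only because they are the simplest thing that demonstrates the check.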
All constructors of a verified class must satisfy the following properties:
A non-static data member is considered read whenever it is present in the function, except when:
This definition implies the following:
buf(new char[BUF_SIZE]))
)
and be compliant. We discuss this above.
We restrict function calls after initialization because we want to keep the analysis intraprocedural. Initialization that occurs in a member function, or that comes from a function’s return value, would require interprocedural analysis. Keeping the analysis intraprocedural would facilitate adoption.
Examples:
struct [[Profiles::enable(initialization)]] clazz1 {
  int i;
  int j;
  int z = 0;
  clazz1() {
    i = 123;
    if (nondet) {
      j = 456;
    }
    //non-compliant: j is not defined on all paths
  }
};
struct [[Profiles::enable(initialization)]] clazz2 {
  int i;
  int j;
  clazz2() : i(j), j(42) {} //non-compliant: j is read before it is initialized
};
struct [[Profiles::enable(initialization)]] clazz3 {
  int i;
  int j;
  clazz3() { //compliant, but bad form
    this->i = 0;
    this->j = 42;
  }
};
struct pod {
  int i;
  int j;
};
struct [[Profiles::enable(initialization)]] clazz4 {
  pod p;
  clazz4() = default; //non-compliant, p is default initialized
};
struct [[Profiles::enable(initialization)]] clazz5 {
  pod p{};
  clazz5() = default; //compliant, p is value-initialized
};
struct [[Profiles::enable(initialization)]] clazz6 {
  pod p;
  pod podFactory() {
    //p is default-initialized
    pod p; return p;
  }
  clazz6() : p(podFactory()) {} //non-compliant, p initialized from a function's return value
};
struct [[Profiles::enable(initialization)]] clazz7 {
  int i;
  int j;
  clazz7(int i) : i(i), j() {} //compliant, j is value-initialized
};
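The non-compliant constructors above could plausibly be repaired as in the following sketch. The class names ending in Fixed are hypothetical, the bool parameter stands in for the unspecified nondet condition, and INIT_PROFILE is a placeholder for the proposed attribute.

```cpp
#include <cassert>

// Placeholder for the proposed attribute (assumption: expands to nothing).
#define INIT_PROFILE

struct pod {
  int i;
  int j;
};

struct INIT_PROFILE clazz1Fixed {
  int i;
  int j;
  int z = 0;
  explicit clazz1Fixed(bool flag) : i(123), j(0) {
    if (flag) {
      j = 456;  // on the other path, j keeps the value from the initializer list
    }
  }
};

struct INIT_PROFILE clazz2Fixed {
  int i;
  int j;
  clazz2Fixed() : i(42), j(42) {}  // no member is read before it is initialized
};

struct INIT_PROFILE clazz6Fixed {
  pod p;
  clazz6Fixed() : p{} {}  // value-initialized in place, no factory call needed
};
```

In each case the fix keeps the initialization visible inside the constructor itself, which is what an intraprocedural check can verify.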
In the case of templated classes, the property is verified during template instantiation.
class NotAnnotated{/**/};
class [[Profiles::enable(initialization)]] Annotated {/**/};
template<typename T>
class [[Profiles::enable(initialization)]] AnnotatedTemplate {
  T field = T();
};
void foo() {
  AnnotatedTemplate<NotAnnotated> nat {}; //non-compliant, calling the constructor to a non-verified class
  AnnotatedTemplate<Annotated> at {}; //compliant
}
A recent discussion on the SG23 mailing list highlighted that it is idiomatic in C++ to delegate initialization to a non-constructor member function. Supporting this idiom would make this profile more useful.
The bit of code that triggered the discussion is the following:
basic_string(const _CharT* __s, const _Alloc& __a = _Alloc())
: _M_dataplus(_M_local_data(), __a)
{
  //...
  _M_construct(__s, __end, forward_iterator_tag());
}
In this case, _M_local_data() returns a const pointer to a data member (_M_local_buf), which is passed to the _M_dataplus data member. The initialization is then done by _M_construct. This code would be reported as violating the safety profile as we specify it in this draft, since the constructor does not initialize _M_dataplus directly.
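As an illustration of the tension, the sketch below shows a small class that keeps the delegation idiom yet gives every member a determinate value in the constructor before the helper runs. SmallText and its members are hypothetical, INIT_PROFILE is a placeholder for the proposed attribute, and the sketch assumes that the profile permits member function calls once all data members have been initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Placeholder for the proposed attribute (assumption: expands to nothing).
#define INIT_PROFILE

class INIT_PROFILE SmallText {
  static constexpr std::size_t kCapacity = 16;
  char buf_[kCapacity];
  std::size_t len_;

  // Helper in the spirit of _M_construct: it overwrites members that the
  // constructor has already given determinate values.
  void construct(const char* s) {
    std::size_t n = std::strlen(s);
    if (n >= kCapacity) n = kCapacity - 1;  // truncate to fit the buffer
    std::memcpy(buf_, s, n);
    buf_[n] = '\0';
    len_ = n;
  }

public:
  explicit SmallText(const char* s) : buf_{}, len_(0) {
    construct(s);  // called only after every member is initialized
  }

  std::size_t size() const { return len_; }
  const char* c_str() const { return buf_; }
};
```

The cost of this approach is that the members are written twice, which is exactly the overhead that performance-sensitive library code tries to avoid.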
There are a few solutions to this problem:
[[must_init]]
.
This option is less intrusive than option 1, but adding another annotation is undesirable. It would nonetheless be simpler to verify than option 2, simply because the scope of the analysis becomes well-bounded. As such, it is worth considering.
It would be a good idea to have a straw poll about which of these options the community considers best suited.
The proposed attribute could be misunderstood to mean that all variables in all the code in the scope of the annotation are properly initialized. This may be addressed in a future version of this profile, or in another profile.
We also observe that profiles imply a constraint on what types can be used in a template. This hints at a new concept. A future revision of this paper would explore this further.
In this paper, we propose a safety profile that guarantees that any class affected by the profile annotation will have all its data members initialized, assuming that the data used for construction is itself initialized properly. The profile does not depend on the presence of specific modern C++ features and can thus be applied to legacy code bases.