2023-12-10
document number | date | comment |
---|---|---|
n3185 | 202312 | this paper, original proposal |
CC BY, see https://creativecommons.org/licenses/by/4.0
Dynamic initialization of global data is often a difficult task in C and C++. Until recently, C had no mandatory tool to handle it. C++ has constructor calls for static data, but these do not easily respect a dependency order between different translation units (TU).
There is one particular compiler extension that is meant to deal with these problems, namely the [[gnu::constructor]] attribute, which is widely implemented in the field. In its general form (which has issues with some compilers) it accepts a numerical priority as a parameter, so that different TU are initialized according to their priority. This feature is difficult to handle in larger projects, because dependencies are not made explicit and because priorities have to be assigned and reassigned to TU (much like line numbers in BASIC) as a project grows.
Also, this feature has the possible disadvantage of initializing unconditionally, even for program parts that might eventually not be used. In contrast to that, the C standard has tools that must be triggered explicitly and thus may avoid expensive initialization for unused parts of a program or library.
In C23 we now have three mandatory functions that handle initialization and cleanup, namely call_once, atexit and at_quick_exit. The basic level of this proposal uses these exclusively to create a feature that solves user-triggered initialization, associated cleanup and initialization dependencies à la C in a simplistic and unexciting way.
A second level then builds upon the first and adds unconditional initialization. This second level would need implementation-specific tools, such as the above-mentioned vendor attribute or a C++ constructor for static data.
We present two levels of specification. A first that provides initialization that is triggered by a userspace macro and that only uses C23 standard features under the hood. Then a second level provides unconditional mandatory initialization for TUs that request it. Only that second level needs compiler magic for its implementation.
Note also that this proposal does not handle thread-specific initialization and cleanup. For these the C standard provides the tss_t type and the tss_create etc. functions. Wrapping these in more convenient interfaces could be the subject of a different proposal.
The basic feature provides four macro interfaces which we hope are easy to comprehend and only generate minimal overhead.
ONCE_DEFINE
Any invocation of this definition macro should only be compiled once, so it would typically be used in a .c file. Its syntax is a macro invocation followed by a compound statement that forms the body of an internal initialization function:
ONCE_DEFINE(identifier) compound-statement
It defines one function with a signature that takes no arguments and returns no value (void). This function will later be called under the hood, similar to call_once, in places that must ensure that a global initialization of the feature has taken place.
The generated name should be unique and not conflict with any other user space name or other once-feature that has been defined elsewhere, as long as the used identifier is unique within a project.
An example could look as follows:
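A minimal sketch of such a definition, assuming simplified stand-ins for ONCE_DEFINE and ONCE_DEPEND. The generated names (logger_init, logger_user), the use of tmpfile, the counter logger_inits and the helper log_line are hypothetical and chosen only for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the proposed macros (an assumption made for
   this sketch, not the reference implementation). */
#define ONCE_DEFINE(NAME)                       \
    static void NAME##_user(void);              \
    void NAME##_init(void) {                    \
        static once_flag flag = ONCE_FLAG_INIT; \
        call_once(&flag, NAME##_user);          \
    }                                           \
    static void NAME##_user(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

/* Global state that needs dynamic initialization. */
static FILE *logfile;
static int logger_inits; /* counts how often the initializer has run */

/* The compound statement becomes the body of the generated initializer. */
ONCE_DEFINE(logger) {
    logfile = tmpfile(); /* hypothetical; a real logger would open a named file */
    ++logger_inits;
}

/* Hypothetical helper: triggers the initialization, writes one line and
   returns the number of times the initializer has run so far. */
int log_line(const char *msg) {
    ONCE_DEPEND(logger);
    if (logfile) fputs(msg, logfile);
    return logger_inits;
}
```

However often log_line is called, the body of ONCE_DEFINE(logger) runs a single time.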
This registers the depending code to be executed much as a function would be called when using call_once. But note that the user here does not have to specify a once_flag, nor do they have to invent a naming convention that ties such a flag and the function together.
ONCE_DEPEND
This macro hides a function call to the function that was defined by ONCE_DEFINE somewhere in the program, not necessarily in the same TU. It has to appear in block scope at a place where several declarations and statements can be placed. Typically it would appear at the beginning of functions that depend on the proper initialization of the feature.
In particular it can be used to express dependencies between different TU when placed into the initialization code of another once feature.
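Both uses can be sketched together, assuming simplified stand-ins for the macros. For brevity the two groups logger and tracker live in one file here, whereas in a real project they would typically sit in different TU. The generated names, the buffer init_order and the helper track_event are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the proposed macros (assumption). */
#define ONCE_DEFINE(NAME)                       \
    static void NAME##_user(void);              \
    void NAME##_init(void) {                    \
        static once_flag flag = ONCE_FLAG_INIT; \
        call_once(&flag, NAME##_user);          \
    }                                           \
    static void NAME##_user(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

static FILE *logfile;
static char init_order[3]; /* records initialization order, one letter per group */
static int inits;

ONCE_DEFINE(logger) {
    logfile = tmpfile(); /* hypothetical stand-in for opening a log file */
    init_order[inits++] = 'L';
}

ONCE_DEFINE(tracker) {
    /* Dependency between groups: logger is initialized before the rest
       of this initializer runs, so logfile can already be used. */
    ONCE_DEPEND(logger);
    if (logfile) fputs("tracker: starting\n", logfile);
    init_order[inits++] = 'T';
}

/* A function that depends on tracker marks this at its beginning. */
const char *track_event(const char *what) {
    ONCE_DEPEND(tracker);
    if (logfile) fprintf(logfile, "event: %s\n", what);
    return init_order;
}
```

In this sketch track_event returns the recorded order, which lists logger before tracker even though only tracker was requested explicitly.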
Now whenever a user uses ONCE_DEPEND(tracker) in their code, the initialization of logger is launched as well. In particular, logger is initialized before the rest of the initializer of tracker is executed, so that code can already rely upon logger and, e.g., use logfile.
If the initialization code itself needs interfaces from another TU, it is important that this dependency is marked inside the code of ONCE_DEFINE as shown above; thereby it is guaranteed that the two initialization codes are chained in the correct order, regardless of the circumstances in which initialization is triggered.
ONCE_ATEXIT
The use of this macro is optional, but if it is used it must be located in the same TU as the corresponding ONCE_DEFINE. It provides a way to specify cleanup code that is executed as if by an atexit handler. The syntax is similar to the definition syntax, a macro invocation followed by a compound statement that makes up the body of an internal function:
ONCE_ATEXIT(identifier) compound-statement
Thus, for example, a cleanup block for logger whose body calls fclose on logfile executes that call at any regular program termination.
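A minimal sketch of such a cleanup block, assuming simplified stand-ins for the macros that follow the same idea as the reference implementation (a const function pointer that stays null unless ONCE_ATEXIT is used). The generated names, the counter logger_inits and the helper log_line are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the proposed macros (assumption). */
#define ONCE_DEFINE(NAME)                                 \
    static void NAME##_user(void);                        \
    static void (*const NAME##_cleanup)(void);            \
    static void NAME##_internal(void) {                   \
        NAME##_user();                                    \
        /* register cleanup only after initialization */  \
        if (NAME##_cleanup) atexit(NAME##_cleanup);       \
    }                                                     \
    void NAME##_init(void) {                              \
        static once_flag flag = ONCE_FLAG_INIT;           \
        call_once(&flag, NAME##_internal);                \
    }                                                     \
    static void NAME##_user(void)

#define ONCE_ATEXIT(NAME)                                         \
    static void NAME##_atexit_code(void);                         \
    static void (*const NAME##_cleanup)(void) = NAME##_atexit_code; \
    static void NAME##_atexit_code(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

static FILE *logfile;
static int logger_inits;

ONCE_DEFINE(logger) {
    logfile = tmpfile(); /* hypothetical stand-in for opening a log file */
    ++logger_inits;
}

/* Runs as an atexit handler, but only if logger was ever initialized. */
ONCE_ATEXIT(logger) {
    if (logfile) fclose(logfile);
}

int log_line(const char *msg) {
    ONCE_DEPEND(logger);
    if (logfile) fputs(msg, logfile);
    return logger_inits;
}
```

If log_line is never called, neither the initializer nor the cleanup handler runs.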
The order in which these handlers are executed is the reverse of the order in which the initializations have been called dynamically. So in our example above, if the chaining was triggered by a call to ONCE_DEPEND(tracker), we would see the following ordering:
ONCE_DEPEND(tracker)
  → ONCE_DEPEND(logger)
      → // initialization code of logger
  → // initialization code of tracker
…
exit
  → // atexit code of tracker
  → // atexit code of logger
This order is robust, even if ONCE_DEPEND(logger) is called first in some other part of the executable that is independent of tracker.
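The ordering can be observed with a small sketch, again assuming simplified stand-ins for the macros. The flags tracker_up and tracker_down and the helpers use_logger and use_tracker are hypothetical. The assertion inside the cleanup code of logger checks that tracker, if it was ever initialized, has already been cleaned up:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the proposed macros (assumption). */
#define ONCE_DEFINE(NAME)                           \
    static void NAME##_user(void);                  \
    static void (*const NAME##_cleanup)(void);      \
    static void NAME##_internal(void) {             \
        NAME##_user();                              \
        if (NAME##_cleanup) atexit(NAME##_cleanup); \
    }                                               \
    void NAME##_init(void) {                        \
        static once_flag flag = ONCE_FLAG_INIT;     \
        call_once(&flag, NAME##_internal);          \
    }                                               \
    static void NAME##_user(void)

#define ONCE_ATEXIT(NAME)                                         \
    static void NAME##_atexit_code(void);                         \
    static void (*const NAME##_cleanup)(void) = NAME##_atexit_code; \
    static void NAME##_atexit_code(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

static int tracker_up, tracker_down;

ONCE_DEFINE(logger) { puts("init logger"); }
ONCE_ATEXIT(logger) {
    /* logger is cleaned up last: tracker must already be down */
    assert(!tracker_up || tracker_down);
    puts("atexit logger");
}

ONCE_DEFINE(tracker) {
    ONCE_DEPEND(logger);
    puts("init tracker");
    tracker_up = 1;
}
ONCE_ATEXIT(tracker) {
    puts("atexit tracker");
    tracker_down = 1;
}

/* Two independent entry points into the program. */
void use_logger(void)  { ONCE_DEPEND(logger); }
void use_tracker(void) { ONCE_DEPEND(tracker); }
```

Calling use_logger() and then use_tracker() from main, or only use_tracker(), leads to the same sequence in this sketch: init logger, init tracker, and at exit first the handler of tracker, then the handler of logger.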
ONCE_AT_QUICK_EXIT
The use of this macro is optional and works analogously to ONCE_ATEXIT:
ONCE_AT_QUICK_EXIT(identifier) compound-statement
The only difference is that the depending block forms the body of a function that is handed to at_quick_exit instead of atexit.
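A minimal sketch of a quick-exit handler, assuming simplified stand-ins for the macros. Since quick_exit is not required to flush open streams, the hypothetical handler below flushes the log explicitly. The generated names and the helper log_line are illustrative only:

```c
#include <stdio.h>
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the proposed macros (assumption). */
#define ONCE_DEFINE(NAME)                                  \
    static void NAME##_user(void);                         \
    static void (*const NAME##_quick)(void);               \
    static void NAME##_internal(void) {                    \
        NAME##_user();                                     \
        if (NAME##_quick) at_quick_exit(NAME##_quick);     \
    }                                                      \
    void NAME##_init(void) {                               \
        static once_flag flag = ONCE_FLAG_INIT;            \
        call_once(&flag, NAME##_internal);                 \
    }                                                      \
    static void NAME##_user(void)

#define ONCE_AT_QUICK_EXIT(NAME)                              \
    static void NAME##_quick_code(void);                      \
    static void (*const NAME##_quick)(void) = NAME##_quick_code; \
    static void NAME##_quick_code(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

static FILE *logfile;
static int logger_inits;

ONCE_DEFINE(logger) {
    logfile = tmpfile(); /* hypothetical stand-in for opening a log file */
    ++logger_inits;
}

/* Handed to at_quick_exit once logger has been initialized. */
ONCE_AT_QUICK_EXIT(logger) {
    if (logfile) fflush(logfile);
}

int log_line(const char *msg) {
    ONCE_DEPEND(logger);
    if (logfile) fputs(msg, logfile);
    return logger_inits;
}
```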
Unconditional initialization, that is, initialization that is not necessarily triggered explicitly, needs additional support that goes beyond C23, for example the GNU attribute mentioned above. We propose that, in addition to the macros above, two supplementary macros are provided.
ONCE_DEFINE_STRONG
This macro is similar to ONCE_DEFINE but guarantees unconditional initialization, if the platform supports such a thing. If this variant is used it is important to maintain dependencies between TU by means of the ONCE_DEPEND macro. This then still guarantees that the initialization code is called in the right order: whichever initialization code is called first by the system at startup, a marked dependency will trigger the other TU before executing the remainder.
ONCE_DEPEND_WEAK
When using strong initialization, marking dependencies in code outside initialization is actually not necessary. To address this possible optimization a second dependency macro can be used. In contexts that support unconditional initialization it basically does nothing. Otherwise, it falls back to the full dependency macro ONCE_DEPEND.
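One way the two supplementary macros might be layered on top of the basic ones is sketched below. This is an assumption, not part of the proposal: it uses the GNU spelling __attribute__((constructor)) where __GNUC__ is defined and otherwise falls back to the conditional macros. The generated names and the example group config are hypothetical:

```c
#include <stdlib.h>
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

/* Simplified stand-ins for the basic macros (assumption). */
#define ONCE_DEFINE(NAME)                       \
    static void NAME##_user(void);              \
    void NAME##_init(void) {                    \
        static once_flag flag = ONCE_FLAG_INIT; \
        call_once(&flag, NAME##_user);          \
    }                                           \
    static void NAME##_user(void)

#define ONCE_DEPEND(NAME)          \
    extern void NAME##_init(void); \
    NAME##_init()

#if defined(__GNUC__)
/* Unconditional initialization is available: a constructor function
   triggers the group before main starts. */
#define ONCE_DEFINE_STRONG(NAME)                                    \
    void NAME##_init(void);                                         \
    __attribute__((constructor)) static void NAME##_startup(void) { \
        NAME##_init();                                              \
    }                                                               \
    ONCE_DEFINE(NAME)
/* The group is already initialized, so the weak dependency is a no-op. */
#define ONCE_DEPEND_WEAK(NAME) ((void)0)
#else
/* No platform support: fall back to conditional initialization. */
#define ONCE_DEFINE_STRONG(NAME) ONCE_DEFINE(NAME)
#define ONCE_DEPEND_WEAK(NAME)   ONCE_DEPEND(NAME)
#endif

static int config_value;

ONCE_DEFINE_STRONG(config) {
    config_value = 42; /* hypothetical startup work */
}

int get_config(void) {
    ONCE_DEPEND_WEAK(config); /* free where strong initialization exists */
    return config_value;
}
```

In both branches get_config sees the initialized value; only the moment at which the initializer runs differs.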
The following reference implementation of the first level of macros is header-only, simple, efficient and in essence fits on one page. A trade-off between some form of efficiency and the number of visible external names is chosen. We don’t think that efficiency is really of high importance here. The “critical” part would be ONCE_DEPEND, which in this implementation results in two nested function calls. But if it turns out to be critical for user code, this could be reduced to just one function call (by playing some inline games) or just one atomic exchange (a bit more involved and needs more system support such as futex).
First, the tools that we need are already grouped in a single header, <stdlib.h>, so we propose that we also target that one for the additions.
#include <stdlib.h>
#if __STDC_VERSION_STDLIB_H__ < 202311L
#include <threads.h>
#endif
#define ONCE_NAME(NAME) NAME ## _init_generated_once
#define ONCE_NAME_USER(NAME) NAME ## _user_generated_once
#define ONCE_NAME_INTERNAL(NAME) NAME ## _internal_generated_once
#define ONCE_NAME_FLAG(NAME) NAME ## _flag_generated_once
#define ONCE_NAME_ATEXIT_INTERNAL(NAME) NAME ## _atexit_internal_generated_once
#define ONCE_NAME_ATEXIT(NAME) NAME ## _atexit_generated_once
#define ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME) NAME ## _at_quick_exit_internal_generated_once
#define ONCE_NAME_AT_QUICK_EXIT(NAME) NAME ## _at_quick_exit_generated_once
Note that call_once is only in <stdlib.h> since C23. Before that it was only in the optional header <threads.h>, which we include as a fallback. It should easily be possible to define other fallbacks, for example by using POSIX threads.
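A POSIX fallback might be sketched as follows, assuming pthread_once as the underlying primitive. The names fallback_once_flag, FALLBACK_ONCE_FLAG_INIT and fallback_call_once are hypothetical and deliberately differ from the standard names, so that the sketch does not clash with a C library that already provides call_once:

```c
#include <pthread.h>

/* Hypothetical fallback layer on top of POSIX threads. */
typedef pthread_once_t fallback_once_flag;
#define FALLBACK_ONCE_FLAG_INIT PTHREAD_ONCE_INIT

static inline void fallback_call_once(fallback_once_flag *flag,
                                      void (*func)(void)) {
    /* pthread_once runs func exactly once per flag; its return value
       only reports errors and is ignored here, like call_once does. */
    (void)pthread_once(flag, func);
}

static int setup_runs;
static void setup(void) { ++setup_runs; }

/* Usage in the same style as the generated entry function. */
int ensure_setup(void) {
    static fallback_once_flag flag = FALLBACK_ONCE_FLAG_INIT;
    fallback_call_once(&flag, setup);
    return setup_runs;
}
```

On some platforms such code has to be linked against the threads library, for example with -pthread.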
Note also that for convenience we use macros that implement the internal naming convention that is used here. These could easily be adapted as needed.
ONCE_DEPEND has no surprises.
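A plausible shape for this macro, given as an assumption rather than the verbatim listing, is a block-scope declaration of the entry function followed by a call to it. The group counter, its hand-written entry function and the helper next_ticket are hypothetical and only serve to exercise the macro:

```c
#include <threads.h> /* call_once; also declared in <stdlib.h> since C23 */

#define ONCE_NAME(NAME) NAME ## _init_generated_once

/* Sketch: declare the external entry point, then call it. The trailing
   semicolon is supplied at the point of use. */
#define ONCE_DEPEND(NAME)              \
    extern void ONCE_NAME(NAME)(void); \
    ONCE_NAME(NAME)()

/* Hand-written stand-in for what ONCE_DEFINE(counter) would generate. */
static int counter_inits;
static void counter_user(void) { ++counter_inits; }
void counter_init_generated_once(void) {
    static once_flag flag = ONCE_FLAG_INIT;
    call_once(&flag, counter_user);
}

int next_ticket(void) {
    ONCE_DEPEND(counter); /* a declaration plus a statement */
    static int ticket;
    return ++ticket;
}
```

Because the expansion in this sketch consists of a declaration and a statement, it only works where both are permitted, which matches the placement rule for ONCE_DEPEND stated earlier.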
ONCE_DEFINE is slightly more complicated. In addition to the function with external linkage that we have declared, it defines several symbols with internal linkage.
#define ONCE_DEFINE(NAME) \
/* Forward declarations. */ \
static void ONCE_NAME_USER(NAME)(void); \
static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void); \
static void (*const ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME))(void); \
/* This function is used with call_once */ \
static void ONCE_NAME_INTERNAL(NAME)(void) { \
ONCE_NAME_USER(NAME)(); \
if (ONCE_NAME_ATEXIT_INTERNAL(NAME)) { \
atexit(ONCE_NAME_ATEXIT_INTERNAL(NAME)); \
} \
if (ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME)) { \
at_quick_exit(ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME)); \
} \
} \
/* This is the function called by ONCE_DEPEND */ \
void ONCE_NAME(NAME)(void) { \
/* The once flag is hidden inside */ \
static once_flag ONCE_NAME_FLAG(NAME) = ONCE_FLAG_INIT; \
call_once(&ONCE_NAME_FLAG(NAME), ONCE_NAME_INTERNAL(NAME)); \
} \
/* This has the user code for initialization */ \
static void ONCE_NAME_USER(NAME)(void)
The single point of entry ONCE_NAME(NAME) ensures that the linkage namespace is not polluted with more than one symbol and that the once_flag and the called user functions are glued together without possibility of bypass.
The fact that ONCE_NAME_ATEXIT_INTERNAL(NAME) is a static function pointer variable comes into play if a ONCE_ATEXIT definition is provided by the user.
#define ONCE_ATEXIT(NAME) \
static void ONCE_NAME_ATEXIT(NAME)(void); \
static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void) \
= ONCE_NAME_ATEXIT(NAME); \
static void ONCE_NAME_ATEXIT(NAME)(void)
This now defines and initializes the ONCE_NAME_ATEXIT_INTERNAL(NAME) variable with a pointer to a static function that holds the user code for cleanup. Above, the pointer is only passed as an argument to atexit if it is non-null. Since it is const qualified and static, any decent compiler should be able to optimize that code efficiently:
- If no ONCE_ATEXIT definition is provided, the pointer is null and the atexit call should be optimized out.
- If a ONCE_ATEXIT definition is provided, the pointer is non-null and atexit should be called unconditionally.
The same mechanism is used to define and register code that would be provided for at_quick_exit.
#define ONCE_AT_QUICK_EXIT(NAME) \
static void ONCE_NAME_AT_QUICK_EXIT(NAME)(void); \
static void (*const ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME))(void) \
= ONCE_NAME_AT_QUICK_EXIT(NAME); \
static void ONCE_NAME_AT_QUICK_EXIT(NAME)(void)
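The null-pointer mechanism behind these two macros relies on tentative definitions in C and can be illustrated in isolation. The sketch below is an assumption-laden illustration with hypothetical names (alpha, beta): a const pointer that is only tentatively defined ends up null, whereas one that is later defined with an initializer points to the cleanup code. This is valid C but not valid C++.

```c
#include <stddef.h>

/* Tentative definitions, as emitted by the definition macro. */
static void (*const alpha_cleanup)(void);
static void (*const beta_cleanup)(void); /* never completed: stays null */

/* Code compiled before the final definitions are seen can already test
   the pointers; the compiler knows their values at the end of the TU. */
int alpha_registers(void) { return alpha_cleanup != NULL; }
int beta_registers(void)  { return beta_cleanup != NULL; }

/* What a cleanup macro would add for alpha only. */
static void alpha_cleanup_code(void) { /* cleanup code would go here */ }
static void (*const alpha_cleanup)(void) = alpha_cleanup_code;
```

Some compilers may warn about the const object without initializer; this sketch assumes such diagnostics are not treated as errors.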
In the implementation on which this proposal is based we have already added a marking of the compiled TU by means of [[maybe_unused]] static strings. This makes it possible to extract an initialization dependency graph from the generated executable.
More generally, implementations in a compiler itself (not via macros) could detect initialization loops and stop translation if any are found.
7.24.4.9 Initialization, cleanup and dependency between translation units
Synopsis
#include <stdlib.h>
ONCE_DEFINE(identifier) compound-statement
ONCE_DEPEND(identifier);
ONCE_ATEXIT(identifier) compound-statement
ONCE_AT_QUICK_EXIT(identifier) compound-statement
ONCE_DEFINE_STRONG(identifier) compound-statement
ONCE_DEPEND_WEAK(identifier);
Description
The macros in this clause provide means of executing the compound statements either at program startup or at program termination just as if called or registered with the call_once, atexit or at_quick_exit library functions. These calls can be triggered in an application-controlled way by using the macros for dependencies. In particular, with these macros applications are able to mark dependencies in initialization between different translation units.
Each identifier that is used with ONCE_DEFINE or ONCE_DEFINE_STRONG identifies a specific initialization group. An invocation of ONCE_DEPEND(ID) within the compound statement of an invocation ONCE_DEFINE(JE) or ONCE_DEFINE_STRONG(JE) constitutes a direct initialization dependency from group JE to group ID. The transitive closure of the direct initialization dependency relation shall form an acyclic directed graph.
7.24.4.9.1 Conditional initialization
7.24.4.9.1.1 The ONCE_DEFINE macro
The ONCE_DEFINE macro registers its argument ID as a name of an initialization group that is valid within the whole program and associates the compound statement that is to be executed when the initialization of the group ID is requested.
Any invocation of this macro shall be located in file scope. For any identifier ID, at most one invocation of either ONCE_DEFINE(ID) or ONCE_DEFINE_STRONG(ID) shall be present in the whole program. The effect is the same as the definition of a function that has the compound statement as the function body, that has external linkage and that has an implementation-defined name that uses the identifier ID to create a unique reserved identifier that does not collide with any identifier specified by the application. Two invocations with different identifiers shall use different such generated names.
The group ID shall be initialized when and only if an invocation of the macro ONCE_DEPEND(ID) is executed.
7.24.4.9.1.2 The ONCE_DEPEND macro
Invocations of the ONCE_DEPEND(ID) macro shall be placed in block scope at a point where several declarations and statements are permitted by the syntax. An invocation of ONCE_DEPEND shall not appear in the compound statement that is associated to an invocation of ONCE_ATEXIT or ONCE_AT_QUICK_EXIT.
When an invocation of the ONCE_DEPEND macro is met during program execution it triggers the initialization of the group ID. Similar to a call to call_once, this initialization shall be performed at most once per program execution. For any evaluation that is sequenced after such an invocation this initialization shall have been performed in its entirety and all side effects shall be visible. After the initialization of group ID has been completed, subsequent calls ONCE_DEPEND(ID) have no effect.
7.24.4.9.1.3 The ONCE_ATEXIT macro
The ONCE_ATEXIT macro associates the compound statement to be executed when the atexit handler for the group ID is triggered on termination of the program.
Any invocation of this macro shall be located in file scope. For any identifier ID, at most one invocation ONCE_ATEXIT(ID) shall be present in the whole program. For each invocation ONCE_ATEXIT(ID) there shall be an invocation ONCE_DEFINE(ID) or ONCE_DEFINE_STRONG(ID) that is situated in the same translation unit.
The effect is the same as the following.
- A function is defined that has the compound statement as the function body, that has internal linkage and that has an implementation-defined name that uses the identifier ID to create a unique reserved identifier that does not collide with any identifier specified by the application.
- That function is registered by a call to atexit if and when the initialization function of the group ID terminates, that is, after the compound statement that is registered for initialization of the group is executed.
7.24.4.9.1.4 The ONCE_AT_QUICK_EXIT macro
This macro is the same as ONCE_ATEXIT, only that the code is registered with at_quick_exit instead of atexit.
7.24.4.9.2 Unconditional initialization
The following macros describe groups and dependencies for which the intent is that the initialization code is executed unconditionally at program startup. Whether or not such an unconditional initialization is supported is implementation-defined. Nevertheless these macros are mandatory.
7.24.4.9.2.1 The ONCE_DEFINE_STRONG macro
Similar to ONCE_DEFINE, the ONCE_DEFINE_STRONG macro registers its argument ID as a name of an initialization group that is valid within the whole program and associates the compound statement that is to be executed when the initialization of the group ID is performed.
If application code is executed that depends upon the registered initialization code for ID, either the implementation shall support unconditional initialization, or an invocation of ONCE_DEPEND(ID) or ONCE_DEPEND_WEAK(ID) shall have been sequenced before. In particular, if ONCE_ATEXIT(ID) or ONCE_AT_QUICK_EXIT(ID) are present within the same program this initialization shall be sequenced before any call to exit or quick_exit, respectively.
7.24.4.9.2.2 The ONCE_DEPEND_WEAK macro
An invocation of ONCE_DEPEND_WEAK shall not appear in the compound statement that is associated to an invocation of ONCE_DEFINE, ONCE_DEFINE_STRONG, ONCE_ATEXIT or ONCE_AT_QUICK_EXIT.
This macro is the same as ONCE_DEPEND, only that the implementation may remove any effect of this macro if it supports unconditional initialization.