Clarifying Memory Allocation

ISO/IEC JTC1 SC22 WG14 N1634 - 2012-09-23
ISO/IEC JTC1 SC22 WG21 N3433 = 12-0123 - 2012-09-23

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Chandler Carruth, chandlerc@google.com

Introduction
Problem
Solution
    Memory
    Data Races
C Wording
    7.1.4 Use of library functions
    7.22.3 Memory management functions
C++ Wording
    3.7.4 Dynamic storage duration [basic.stc.dynamic]
    3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]
    5.3.4 New [expr.new]
    17.6.4.6 Replacement functions [replacement.functions]
    18.6.1.4 Data races [new.delete.dataraces]
References

Introduction

The allocation and deallocation of memory has become a significant expense in modern systems. The optimization of that process is important to good performance. However, it is important to distinguish between micro-optimization of the calls and macro-optimization of the allocation strategy. In particular, good system performance may well require adapting the allocation stragegy to the dynamic behavior of the application, or even to hints provided by the application.

Problem

As strict reading of the current C and C++ standards may lead one to conclude that the allocation strategy shall not consider any information not derivable from the sequence of allocation and deallocation calls. In essence, the standards may exclude macro-optimization of allocation.

On the other hand, a strict reading of the standards may lead one to conclude that the implementation must make an external function call for each and every nominal call. This reading may exclude micro-optimization of allocation.

Solution

We propose to replace existing mechanistic wording with wording more precisely focused on essential requirements. The intent is to enable behavior that some existing memory allocators already have. For example, see TCMalloc [TCM].

Memory

An essential requirement on implementations is that they deliver usable memory, not that they have a particular sequence of calls. We propose to explicitly decouple the implementation calls from the nominal calls.

  1. The number of implementation calls is not part of the observable behavior of the program.

  2. The parameters and return values of implementation calls is not part of the observable behavior of the program, except that the sum of the size parameters of live implementation allocation calls shall not exceed the sum of the size parameters, adjusted for alignment, of live nominal allocation calls. (That is, implementations may "round up" size parameters, as they already do.)

Together these changes enable implementations to reduce the number of malloc calls by avoiding them or fusing them. However, it would only enable fusing mallocs together into larger mallocs provided it can prove that both mallocs have overlapping lifetimes (ended by corresponding calls to free) such that the peak allocated memory of the program remains unchanged.

Because C++ class-specific memory allocators are often tuned to specific class sizes, we do not apply this relaxation to those allocators.

Data Races

An essential requirement on implementations is that they be data-race free, yet the standards do not say so directly. We propose to replace the current wording with direct wording, thus explicitly enabling an implementation to consider information beyond the strict sequence of allocation and deallocation calls.

C Wording

The wording in this section is relative to WG14 N1570.

7.1.4 Use of library functions

Paragraph 4 is unchanged.

The functions in the standard library are not guaranteed to be reentrant and may modify objects with static or thread storage duration. [Footnote: Thus, a signal handler cannot, in general, call standard library functions. —end footnote]

Paragraph 5 is unchanged, though we note in passing that the wording may need improvement because it seems to fail to distinguish between atomic objects and regular objects, and to make access under mutual exclusion a normative exception. Despite that, we believe the intent of this paragraph is correct.

Unless explicitly stated otherwise in the detailed descriptions that follow, library functions shall prevent data races as follows: A library function shall not directly or indirectly access objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's arguments. A library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments. [Footnote: This means, for example, that an implementation is not permitted to use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads. Similarly, an implementation of memcpy is not permitted to copy bytes beyond the specified length of the destination object and then restore the original values because it could cause a data race if the program shared those bytes between threads. —end footnote] Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.

Paragraph 6 is unchanged.

Unless otherwise specified, library functions shall perform all operations solely within the current thread if those operations have effects that are visible to users. [Footnote: This allows implementations to parallelize operations if there are no visible side effects. —end footnote]

7.22.3 Memory management functions

Edit paragraph 1 as follows.

The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. A live allocation call is one in which the corresponding deallocation call has not occured. The number of calls to the implementation of these functions may be less than the number of nominal calls, provided that the sum of the size parameters of the implementation calls shall not exceed the sum of the size parameters of live nominal allocation calls, where each parameter is rounded up to a multiple of the corresponding alignment. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

Paragraph 2 is unchanged.

For purposes of determining the existence of a data race, memory allocation functions behave as though they accessed only memory locations accessible through their arguments and not other static duration storage. the provisions of 7.1.4 apply. These functions may, however, visibly modify the storage that they allocate or deallocate. A call to free or realloc that deallocates a region p of memory synchronizes with any allocation call that allocates all or part of the region p. This synchronization occurs after any access of p by the deallocating function, and before any such access by the allocating function.

C++ Wording

The wording in this section is relative to WG21 N3337.

3.7.4 Dynamic storage duration [basic.stc.dynamic]

Paragraph 2 is unchanged.

The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable (18.6.1). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library (17.6.4.6). The following allocation and deallocation functions (18.6) are implicitly declared in global scope in each translation unit of a program.

void* operator new(std::size_t);
void* operator new[](std::size_t);
void operator delete(void*);
void operator delete[](void*);

These implicit declarations introduce only the function names operator new, operator new[], operator delete, and operator delete[]. [Note: The implicit declarations do not introduce the names std, std::size_t, or any other names that the library uses to declare these names. Thus, a new-expression, delete-expression or function call that refers to one of these functions without including the header <new> is well-formed. However, referring to std or std::size_t is ill-formed unless the name has been declared by including the appropriate header. —end note] Allocation and/or deallocation functions can also be declared and defined for any class (12.5).

Paragraph 3 is unchanged.

Any allocation and/or deallocation functions defined in a C++ program, including the default versions in the library, shall conform to the semantics specified in 3.7.4.1 and 3.7.4.2.

3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]

Edit paragraph 2 as follows.

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. A live allocation call is one in which the corresponding deallocation call has not occured. The number of calls to the implementation of the global allocation functions may be less than the number of nominal calls, provided that the sum of the size parameters of the live implementation calls shall not exceed the sum of the size parameters of live nominal calls, where each parameter is rounded up to a multiple of the corresponding alignment. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function). Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (4.10) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to an operator delete. The effect of dereferencing a pointer returned as a request for zero size is undefined. [Footnote: The intent is to have operator new() implementable by calling std::malloc() or std::calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer. —end footnote]

5.3.4 New [expr.new]

Paragraph 8 is unchanged, but note that the number of allocation implementation calls may be less than the number of nominal calls specified below via the provision of 3.7.4.1/2.

A new-expression obtains storage for the object by calling an allocation function (3.7.4.1). If the new-expression terminates by throwing an exception, it may release storage by calling a deallocation function (3.7.4.2). If the allocated type is a non-array type, the allocation function's name is operator new and the deallocation function's name is operator delete. If the allocated type is an array type, the allocation function's name is operator new[] and the deallocation function's name is operator delete[]. [Note: an implementation shall provide default definitions for the global allocation functions (3.7.4, 18.6.1.1, 18.6.1.2). A C++ program can provide alternative definitions of these functions (17.6.4.6) and/or class-specific versions (12.5). —end note]

Paragraph 8 is unchanged, but note that the number of allocation implementation calls may be less than the number of nominal calls specified below via the provision of 3.7.4.1/2.

A new-expression passes the amount of space requested to the allocation function as the first argument of type std::size_t. That argument shall be no less than the size of the object being created; it may be greater than the size of the object being created only if the object is an array. For arrays of char and unsigned char, the difference between the result of the new-expression and the address returned by the allocation function shall be an integral multiple of the strictest fundamental alignment requirement (3.11) of any object type whose size is no greater than the size of the array being created. [Note: Because allocation functions are assumed to return pointers to storage that is appropriately aligned for objects of any type with fundamental alignment, this constraint on array allocation overhead permits the common idiom of allocating character arrays into which objects of other types will later be placed. —end note]

17.6.4.6 Replacement functions [replacement.functions]

Paragraph 2 is unchanged.

A C++ program may provide the definition for any of eight dynamic memory allocation function signatures declared in header <new> (3.7.4, 18.6):

Paragraph 3 is unchanged.

The program's definitions are used instead of the default versions supplied by the implementation (18.6). Such replacement occurs prior to program startup (3.2, 3.6). The program's definitions shall not be specified as inline. No diagnostic is required.

18.6.1.4 Data races [new.delete.dataraces]

Edit paragraph 1 as follows.

For purposes of determining the existence of data races, the library versions of operator new, user replacement versions of global operator new, and the C standard library functions calloc and malloc shall behave as though they accessed and modified only the storage referenced by the return value. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. The library versions of operator delete, user replacement versions of operator delete, and the C standard library function free shall behave as though they accessed and modified only the storage referenced by their first argument. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. The C standard library function realloc shall behave as though it accessed and modified only the storage referenced by its first argument and by its return value. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. Calls to these functions that allocate or deallocate a particular unit of storage p shall occur in a single total order, and each such deallocation call shall happen before the next allocation (if any) in this order. These functions may, however, visibly modify the storage that they allocate or deallocate. A call that deallocates a region p of memory shall synchronize after any access to p. A call that allocates a region p of memory shall synchronize before any access to p.

References

[TCM]
TCMalloc : Thread-Caching Malloc, http://goog-perftools.sourceforge.net/doc/tcmalloc.html.