Doc No: X3J16/94-0144 WG21/N0531 Date: 7/28/94 Project: Programming Language C++ Reply To: J. Lawrence Podmolik Andersen Consulting jlp@chi.andersen.com Removing ptr_dyn_array Introduction Chapter 17 of the current Working Paper contains two classes, dyn_array and ptr_dyn_array, that implement variable-size sequences of types T and T*, respectively. With the acceptance of the STL proposal [4] at the Waterloo meeting, dyn_array is essentially superceded by the STL vector class. A separate proposal (X3J16/94-XXXX:WG21/NXXXX) therefore recommends removing dyn_array from the standard library, possibly incorporating some of its features into vector. How to handle ptr_dyn_array is a somewhat different matter. First, this class has no equivalent in STL. Second, if ptr_dyn_array is retained (presumably reshaped as ptr_vector), it follows that we should provide pointer-based versions of the other STL containers, at least those parameterized on a single element type. Therefore, there are two basic choices: o7 Remove ptr_dyn_array from the library, or o7 Standardize pointer versions of the following STL template classes: vector, list, deque, set and multiset. This proposal advocates the former approach: eliminating pointer-based collections from the standard library entirely. 1.0 Motivation for ptr_dyn_array The ptr_dyn_array class in the current library is intended to be semantically equivalent to dyn_array. That is, unlike some popular commercial libraries [3] that provide pointer collections that depend on some properties of T, ptr_dyn_array does not reference values of type T in any way. It is provided purely to optimize instantiation time and the amount of object code generated for programs that use vectors of different pointer types. Currently, ptr_dyn_array is implemented via public derivation from dyn_array, which presumably could be pre-instantiated in a library. All member functions of ptr_dyn_array are then written as simple inline forwarding functions to the dyn_array base class, fixing up the pointer types as necessary. 2.0 Existing practice Specialized pointer-based collections appear in several current commercial C++ libraries. The USL C++ Standard Components [1] implements pointer-based lists and sets (though, curiously, not vectors) using essentially the same technique described above. However, some of the USL pointer versions have subtle interface differences that, for example, require Set and Set_of_p to be used in slightly different ways. The USL library also requires separate iterator types to support these pointer-based collections. The Rogue Wave Tools.h++ library [3] uses a somewhat more elaborate scheme based on private inheritance and "intrusive" list base classes. Also, the Rogue Wave library uses a different model for some of its pointer containers: even though pointers are stored in the containers, some operations on type T (such as operator=()) are used to implement certain container member functions. Thus to achieve "pure" lists of arbitrary pointer types, one uses RWTValDlist instead of RWTPtrDlist, which lessens the degree of code sharing. Like USL, the Rogue Wave value-based and pointer-based containers require different iterator types. 3.0 How much template overhead is there? As discussed below, providing separate pointer containers increases the size and complexity of the standard library and presents additional type-related issues for users. Since pointer containers are defined specifically to reduce object code bloat, a fundamental question to answer is: how much overhead is incurred by each additional instantiation of a container with a different pointer type? The results naturally depend on the compiler and platform, as well as on the design of the specific container itself. If a container consists primarily of small inline functions, then the additional space required is minimal. 3.1 Algorithms and containers in STL In STL, much of the executable code resides in its algorithms, not in member functions specific to the individual containers. Almost all of the algorithms are generic, in that they are implemented as freestanding template functions that operate on iterators, rather than directly on containers. In this way, for example, the same sort() or merge() functions can be applied to any type of container. However, this also means that even if pointer-based containers were provided in STL, users might see only modest savings in object code for algorithms used in their programs. That is, since most of the member functions in the STL container classes are simple inline functions, the benefits of pointer versions that share a void* implementation are minimized. While some savings would be achieved for STL container member functions that are implemented using non-member algorithms, user-level code would not benefit since it does not interact directly with the void* container representation. To achieve increased savings in object code size when manipulating pointers of various types, pointer versions of these algorithms would also be needed, a clearly impractical approach. 3.2 Code size example: the STL vector class The following table shows the amount of object code generated for each the non-inline member functions of the STL vector class (instantiated with char*), plus for any non-member functions used by the implementation of these members. These numbers were produced using the current HP implementation of STL, compiled on a Sun SPARCstation 10 using Sun's C++ 4.0 native compiler. Function Bytes --------------------------------------------------------- ----- allocator::allocate(unsigned int) 579 allocator::init_page_size(void) 243 copy(char* const*, char* const*, char**) 191 copy_backward(char**, char**, char**) 189 fill(char**, char**, char* const&) 183 uninitialized_copy(char**, char**, char**) 261 uninitialized_copy(char* const*, char* const*, char**) 264 uninitialized_fill_n(char**, unsigned int, char* const&) 262 vector::insert(char**, char* const*, char* const*) 1308 vector::insert(char**, unsigned int, char* const&) 1254 vector::insert_aux(char**, char* const&) 993 vector::static_allocator.o 302 Although the vector class interface consists of almost 30 member functions, only the three insert() functions are non-inline and therefore stand to gain much from code sharing. In this example, the amount of object code that might be generated for each additional complete instantiation of vector is approximately 6K bytes: vector executable instances size ---------- ---------- 1 116032 2 121636 3 127632 4 133240 Note that a program will incur this total amount of space overhead only if it actually uses each of these member functions and/or algorithms. [ I will provide more comprehensive object code numbers in subsequent revisions of this paper. ] 4.0 Issues with pointer-based collections This section examines some of the problems inherent in using specialized pointer collections. 4.1 Additional complexity One obvious problem with providing pointer containers is that it effectively doubles the number of containers in the library. That is, if we can justify ptr_vector, it would be inconsistent not to provide prt_list, ptr_deque and ptr_set and ptr_multiset as well. Overall, then, this dramatically increases the size of the library while adding no additional functionality for users. This should be considered only if the savings in generated template object code are significant. More importantly, adding pointer containers to the library will confuse many users. Some users will clearly want to write code that operates on specific container types rather than on iterators alone. Should a pointer or non-pointer version of the container be assumed? The issue becomes even more complex when generic template functions with container arguments are involved. 4.2 Type system problems A more subtle problem with pointer containers is that they don't fit neatly into the type system. A vector and a ptr_vector, while conceptually the same, are unrelated types and cannot be used interchangeably. For example, most template functions simply cannot be written to accept both pointer and non-pointer versions of a given collection. For example, template void print(ostream& os, const vector& vec) { vector::const_iterator iter = vec.begin(); while (iter != vec.end()) os << *iter++ << " "; } Since vector and ptr_vector are unrelated types, a different version of print() would have to be written to accommodate each pointer collection. In large systems, the potential for such mismatches across subsystem boundaries is considerable. The current ptr_dyn_array class tries to overcome some of these problems by publicly deriving ptr_dyn_array from dyn_array. This arrangement allows a ptr_dyn_array to be used in some places where a dyn_array was expected (although the reverse is not true). Unfortunately, such an arrangement it not type safe because it permits a user to erroneously add a pointer of the wrong type [2]: void insert_y(vector& vec) { vec.push_back(new Y); } ptr_vector vec; insert_Y(vec); // insert_y((vector&)vec) Moreover, the conversion to vector doesn't work at all with template functions that contain other arguments dependent on T: template void add(vector&, T*); ptr_vector pvi; add(pvi, new int); // error This fragment doesn't compile since the ptr_vector "decays" into vector, not vector. 5.0 Summary While some savings in template instantiation time and object code size may be achieved by adding pointer-based containers to the standard library, these savings come at significant costs: o7 Increased size and complexity of the standard library. o7 Arguable real savings in object code size. o7 Usability and type safety problems when combining pointer and non-pointer containers. I do not believe that on balance the benefits outweigh the costs. As always, users with special needs (or vendors with such customers) may choose to implement pointer containers themselves. However, we need to demonstrate clear advantages before we add them to the standard. Therefore, I propose we remove ptr_dyn_array from the standard library and not consider adding pointer-based versions of any of the other STL containers. References 1. Carroll, M. and J. Isner. The Design of the C++ Standard Components. AT&T and Unix System Laboratories, Inc., 1992. 2. Carroll, M. and M. A. Ellis. Reducing Instantiation Time. C++ Report, 6(6), July-August 1994. 3. Keffer, T. Tools.h++ Introduction and Reference Manual. Rogue Wave Software, 1993. 4. Stepanov, A. and M. Lee. The Standard Template Library. X3J16/94-0140:WG21/N0527.