Document number: N1418 = 02-0076
Date: November 11, 2002
Project: Programming Language C++
Reference: ISO/IEC IS 14882:1998(E)
Reply to: Pete Becker, Dinkumware, Ltd., petebecker@acm.org
Overview · Linking and Libraries · Usage Models · Semantic Issues
Many operating systems today support applications consisting of an executable file and one or more dynamic libraries[1]. Compilers for such operating systems typically provide language extensions that support fine-grained control over the process of creating such applications. The C and C++ language standards, however, say nothing about dynamic libraries, so it is difficult to write portable applications that use them. This paper provides background material needed to better understand some of the problems posed by applications that use dynamic libraries.
Perhaps the most important impediment to discussion of dynamic libraries is differing notions of what the term "dynamic library" means. Systems programmers know the details of how dynamic libraries are loaded and how names defined in dynamic libraries are resolved; application programmers know what they want to do with dynamic libraries. Designing a model of dynamic libraries that is suitable for standardization requires exposure to both domains. The systems programming aspects of dynamic libraries are discussed in Linking and Libraries, and the application programming aspects are discussed in Usage Models. Finally, there are several decisions that must be made concerning what the language should say about dynamic libraries. These are discussed in Semantic Issues.
Formally, a program in C or C++ consists of one or more translation units which are compiled separately. The resulting object files are then linked together to produce an executable file:
/home/pete$ cc -c test.cpp
/home/pete$ cc -c helper.cpp
/home/pete$ cc test.o helper.o
In practice the link step is often handled by a script or by the compiler itself, so an application can be compiled and linked with a single command:
/home/pete$ cc test.cpp helper.cpp
Compilers also support the use of libraries. A library is nothing more than a set of object files grouped together into one file. A library is linked to an application by putting its name on the command line:
C:\work> cc test.obj helper.obj mylib.lib
Despite the apparent similarities, though, there is usually an important difference between linking an object file into an application and linking a library into an application. With most implementations the linker puts all of the functions and data objects from each object file into the application. When linking a library into an application, however, only the parts of the library that are needed by the application are linked in. The linker scans the library for object files that define names needed by other parts of the application, and links those object files into the application. The rest of the library isn't used. For example:
#include <stdlib.h>
#include <stdio.h>

int main()
{
    puts("Hello, world");
    exit(0);
}
When the compiler compiles this translation unit it produces an object file that defines the symbol main[2] and has internal notes that tell the linker that the object file needs definitions for the symbols puts and exit. When the linker links this object file to produce an application it looks through the standard library[3] for an object file that defines one or both of those names and links it into the application, adding any names that that object file needs definitions for into the list of names that it is searching for. The link step is complete only when the linker has resolved all of the symbols needed by the object files that constitute the application and all of the symbols needed by the object files that it linked in from libraries.
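The search described above can be modelled in a few lines of C++. This is only a sketch of the idea, not a description of how any real linker is implemented: the ObjectFile type, the link_with_library function, and the member names used with them are all hypothetical, and a real object file records far more than two sets of names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an object file: just the names it defines and the
// names it needs. Real object formats carry much more than this.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

// Links every file in 'objects' unconditionally, then pulls in only those
// members of 'library' that define a name still being searched for.
// Returns the names of the members that were pulled in; names that nothing
// defines are collected in 'unresolved'.
std::vector<std::string> link_with_library(const std::vector<ObjectFile>& objects,
                                           const std::vector<ObjectFile>& library,
                                           std::set<std::string>& unresolved)
{
    std::set<std::string> defined;
    std::vector<std::string> pending;   // names still needing a definition
    for (const ObjectFile& obj : objects) {
        defined.insert(obj.defines.begin(), obj.defines.end());
        pending.insert(pending.end(), obj.needs.begin(), obj.needs.end());
    }

    std::vector<std::string> pulled_in;
    unresolved.clear();
    while (!pending.empty()) {
        const std::string symbol = pending.back();
        pending.pop_back();
        if (defined.count(symbol) != 0)
            continue;                   // already resolved
        bool found = false;
        for (const ObjectFile& member : library) {
            if (member.defines.count(symbol) == 0)
                continue;
            // The whole member is linked in, and its own needs join the search.
            defined.insert(member.defines.begin(), member.defines.end());
            pending.insert(pending.end(), member.needs.begin(), member.needs.end());
            pulled_in.push_back(member.name);
            found = true;
            break;
        }
        if (!found)
            unresolved.insert(symbol);
        assert(found == (defined.count(symbol) != 0));
    }
    return pulled_in;
}
```

Given an object file that needs puts and exit, and a library whose members define puts, exit, write, and qsort, the member that defines qsort is never selected, because nothing in the application asks for that name.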
To run an application the operating system calls the program loader and gives it the name of the executable file. The program loader finds memory space for the application and copies the application's executable code into the memory space. Then it does any adjustments to the executable code that are needed to make it ready to run. For example, memory addresses in the executable code might need to be adjusted to reflect the actual location in memory where the program has been loaded. Once these loader fixups have been made the loader turns execution over to the application.
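As a rough illustration of the address adjustment mentioned above, the following sketch treats an executable image as a byte array plus a table of offsets at which an address was stored as if the image were loaded at address zero. The Image type, the 64-bit slot width, and the function names are assumptions made for the example; real executable formats and loaders differ considerably in the details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical image: raw bytes, plus the offsets of the slots that hold
// addresses computed as if the image were loaded at address zero.
struct Image {
    std::vector<unsigned char> bytes;
    std::vector<std::size_t> fixup_offsets;
};

// Reads the 64-bit address stored at 'offset'.
std::uint64_t read_address(const Image& image, std::size_t offset)
{
    assert(offset <= image.bytes.size() &&
           image.bytes.size() - offset >= sizeof(std::uint64_t));
    std::uint64_t value;
    std::memcpy(&value, image.bytes.data() + offset, sizeof value);
    return value;
}

// Stores a 64-bit address at 'offset'.
void write_address(Image& image, std::size_t offset, std::uint64_t value)
{
    assert(offset <= image.bytes.size() &&
           image.bytes.size() - offset >= sizeof(std::uint64_t));
    std::memcpy(image.bytes.data() + offset, &value, sizeof value);
}

// Adds the actual load address to every slot named in the fixup table.
// Returns false, leaving the image untouched, if any entry lies outside
// the image.
bool apply_fixups(Image& image, std::uint64_t load_base)
{
    for (std::size_t offset : image.fixup_offsets) {
        if (offset > image.bytes.size() ||
            image.bytes.size() - offset < sizeof(std::uint64_t))
            return false;
    }
    for (std::size_t offset : image.fixup_offsets)
        write_address(image, offset, read_address(image, offset) + load_base);
    return true;
}
```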
When an application uses dynamic libraries the picture changes. The code contained in the dynamic library is not linked into the application; in fact, the code often doesn't even have to be present on the system when the application is linked. The linker just makes notes in the executable file about symbols that it thinks will be resolved by dynamic libraries[4].
To run an application that uses dynamic libraries the program loader has a great deal more work to do: in addition to loading the executable file into memory it has to find all of the dynamic libraries that the executable file depends on, including those needed by other dynamic libraries that it has loaded. After a dynamic library has been loaded, each function call from the executable file or from another dynamic library into that dynamic library has to be fixed up. These fixups can't be done any sooner, because it is only at load time that the locations in memory of the dynamic libraries that constitute the application are known. Thus, some of the work that the linker does when linking to a static library is deferred until load time when linking to dynamic libraries.
It is also possible to manually load a dynamic library at runtime. This is done by passing the name of the dynamic library to a system function that loads the library and returns a handle that the application can use to refer to the code in the library. After successfully loading the library the application can get the addresses of symbols defined in the dynamic library by calling another system function and passing it the handle for the library and the name of a symbol.
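On POSIX systems the two system functions are dlopen and dlsym, declared in <dlfcn.h>; Windows provides LoadLibrary and GetProcAddress for the same purpose. The sketch below uses the POSIX calls. The LoadedSymbol type and the load_symbol and unload functions are hypothetical helpers introduced for the example, and treating a null result from dlsym as a failure is a simplification (a symbol's value can legitimately be null; dlerror distinguishes the two cases). On some systems the program must also be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>   // POSIX: dlopen, dlsym, dlclose

// Hypothetical result type: the library handle and the address of one symbol.
struct LoadedSymbol {
    void* handle;    // null if the library could not be loaded
    void* address;   // null if the symbol could not be found
};

// Loads the named library and looks up one symbol in it. On any failure both
// members of the result are null and the library is not left loaded.
LoadedSymbol load_symbol(const char* library_path, const char* symbol_name)
{
    assert(symbol_name != nullptr);
    LoadedSymbol result = {nullptr, nullptr};
    result.handle = dlopen(library_path, RTLD_NOW);
    if (result.handle == nullptr)
        return result;
    result.address = dlsym(result.handle, symbol_name);
    if (result.address == nullptr) {   // simplification: see the text above
        dlclose(result.handle);
        result.handle = nullptr;
    }
    return result;
}

// Releases the library. The symbol address must not be used afterwards.
void unload(LoadedSymbol& symbol)
{
    if (symbol.handle != nullptr)
        dlclose(symbol.handle);
    symbol.handle = nullptr;
    symbol.address = nullptr;
}
```

To call a function found this way, the application converts the returned address to a pointer to function of the type documented for that symbol. Nothing checks that the type is right, which is one reason the interface between an application and a manually loaded library has to be documented carefully.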
For the designer of an application there are three usage models for dynamic libraries, reflecting the three forms of linking and loading discussed in the previous section.
A monolithic application is an application that doesn't explicitly use any dynamic libraries[5]. All of the application's code must be present when the application is built, and all of the code is statically linked into the application. This is the traditional C and C++ program model. It imposes the tightest coupling among an application's components, which reduces flexibility and increases robustness.
A closed application uses dynamic libraries but doesn't manually load any dynamic libraries at runtime. The application designer determines what the application will be able to do and distributes the application's code among the executable file and the application's dynamic libraries. This allows for the possibility of upgrading the application by distributing an updated executable file or updated dynamic libraries while leaving the unaffected code in place. Such an application is less tightly coupled than a monolithic application, and requires more care to ensure that new components work correctly with older versions.
An application that supports plug-ins manually loads dynamic libraries to supplement its capabilities. Plug-ins often come from the application implementor, but they can also come from third-party developers. To support the latter, the application implementor documents the interface that a plug-in must support, and it provides, documents, and maintains services needed by plug-ins. Unlike a closed application, an application that supports plug-ins permits extensions that were not designed into the application. In this sense such an application is less tightly coupled than a closed application; however, this flexibility comes at a price: new versions of the application must continue to provide the old version's support services so that existing plug-ins will continue to work.
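One common way to express such a documented interface is a pair of plain structures: one describing the services the application offers, and one that each plug-in fills in to describe itself. The sketch below is an illustration only. HostServices, PluginDescriptor, kHostInterfaceVersion, host_accepts, and run_plugin are hypothetical names, and the versioning rule shown (the application keeps supporting every interface version from 1 up to its current one) is just one possible policy.

```cpp
#include <cassert>

// Hypothetical: the newest interface version this application provides.
// It is assumed to keep supporting every earlier version as well.
const int kHostInterfaceVersion = 2;

// Hypothetical table of services that the application hands to each plug-in.
struct HostServices {
    int version;
    void (*log)(const char* message);
};

// Hypothetical descriptor that every plug-in provides to the application.
struct PluginDescriptor {
    const char* name;
    int built_for_version;                       // interface version the plug-in expects
    int (*run)(const HostServices* services);    // entry point
};

// A plug-in is accepted if it has an entry point and was built for an
// interface version that the application still provides.
bool host_accepts(const PluginDescriptor& plugin)
{
    return plugin.run != nullptr
        && plugin.built_for_version >= 1
        && plugin.built_for_version <= kHostInterfaceVersion;
}

// Runs an accepted plug-in, passing it the application's services.
int run_plugin(const PluginDescriptor& plugin, const HostServices& services)
{
    assert(host_accepts(plugin));
    return plugin.run(&services);
}
```

A plug-in built against version 1 is still accepted by an application at version 2, which is the maintenance cost described above; a plug-in built against a version the application does not yet provide is rejected.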
Under Windows, when a dynamic library is built, symbols that are intended to be used by code that uses the dynamic library must be marked as exported. Further, in code that uses symbols from a dynamic library, each such symbol must be marked as imported. This marking is done by adding implementation-specific keywords to the declarations of these symbols. Each such symbol is modified by a macro that expands to the appropriate keyword for an exported symbol when the dynamic library is being built and to the appropriate keyword for an imported symbol when the dynamic library is being used:
#ifndef MY_HEADER
#define MY_HEADER

#if BUILD_MY_LIBRARY
#define MY_LIBRARY_DECL __declspec(dllexport)
#else
#define MY_LIBRARY_DECL __declspec(dllimport)
#endif

MY_LIBRARY_DECL void f(int);

#endif /* MY_HEADER */
Under Unix, the default is that when a dynamic library is built all symbols with external linkage are made available to code that uses the dynamic library. Nothing has to be done to the source code to make symbols available from a dynamic library or to use symbols defined in a dynamic library.
Both of these approaches have problems. The Windows approach requires careful maintenance of the macros that describe the dynamic library. Moving code from one dynamic library to another requires changing the controlling macros so that the symbols will be marked as exported from the new library (e.g., the macro BUILD_MY_LIBRARY in the example above would have to be changed to a name that was defined when building the new library). The Unix approach, simply put, does too much. It exposes internal details that the designers of a dynamic library would prefer to keep private. Unix compilers address this problem from outside the language through a text file that tells the linker which names to make available from a dynamic library. This is obviously awkward, and some compilers are moving toward a keyword-based approach.
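As an illustration of what a keyword-based approach can look like, GCC-compatible compilers accept a visibility attribute on declarations, usually combined with the -fvisibility=hidden command-line option so that symbols are private unless marked otherwise. The sketch below extends the macro from the earlier example to cover both families of compilers. MY_LIBRARY_LOCAL, public_api, and internal_helper are hypothetical names, and BUILD_MY_LIBRARY is defined in the source here only to keep the example self-contained; it would normally be set by the build system.

```cpp
// This file plays the role of the library's own source, so the controlling
// macro is defined here; normally the build system would define it.
#define BUILD_MY_LIBRARY 1

#if defined(_WIN32)
#  if BUILD_MY_LIBRARY
#    define MY_LIBRARY_DECL __declspec(dllexport)
#  else
#    define MY_LIBRARY_DECL __declspec(dllimport)
#  endif
#  define MY_LIBRARY_LOCAL                 // not exported unless marked
#elif defined(__GNUC__)
#  define MY_LIBRARY_DECL  __attribute__((visibility("default")))
#  define MY_LIBRARY_LOCAL __attribute__((visibility("hidden")))
#else
#  define MY_LIBRARY_DECL                  // unknown compiler: no marking
#  define MY_LIBRARY_LOCAL
#endif

// Internal detail: has external linkage, but is not made available to
// users of the dynamic library.
MY_LIBRARY_LOCAL int internal_helper(int x)
{
    return x * 2;
}

// Part of the library's public interface.
MY_LIBRARY_DECL int public_api(int x)
{
    return internal_helper(x) + 1;
}
```

The macro still has to be maintained by hand, so this addresses the Unix problem of exposing too much without removing the bookkeeping that the Windows approach requires.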
Overall, it looks like some form of language support is needed to provide fine-grained control over which symbols are made available by dynamic libraries. There doesn't appear to be any technical barrier to simply marking a symbol as exported (with whatever syntax is deemed appropriate); with that information the compiler can generate whatever information is needed when it sees the definition of that symbol and when it sees a use of that symbol[6].
The syntax for declaring exported symbols ought to be simple. One possibility that has been discussed on the mail reflector is extending the syntax for a linkage-specification, so that a symbol that is defined in a dynamic library could be marked with something like extern "library"[7].
The following is not intended to be a proposal, merely a survey of the issues presented.
Ordinary functions and data objects can be marked in the same way as they can be labeled extern "C":
extern "library" {
    int i;                  // i is defined in a dynamic library
    void f();               // f is defined in a dynamic library
}

extern "library" double d;  // d is defined in a dynamic library
The symbols defined by a class consist of its member functions and its static data members. Putting the implementation of a class into a dynamic library requires being able to make all of those symbols available:
extern "library" {
    class C
    {
    public:
        void f();       // C::f is defined in a dynamic library
        static int i;   // C::i is defined in a dynamic library
    };
}
Templates are patterns for creating functions and classes. They are not, in themselves, code or data. Thus, they do not need any special handling for dynamic libraries. Rather, it is template instances that must be labeled when their code and data are in a dynamic library:
template <class T>
struct C
{
    void set(const T& tt) { t = tt; }
    T get() { return t; }
private:
    T t;
};

extern "library" {
    template struct C<int>;     // C<int>::set and C<int>::get
                                // are defined in a dynamic library
}
The compiler also generates data that is used by the implementation, such as the data that supports runtime type information. For applications that support plug-ins it may be important to control the availability of such data, since writers of plug-ins may rely on the availability of type information for some of the application's types. This poses a problem, since the name of the data structure that holds type information[8] is not usually known to the user. Some other syntax would be needed to support control of this data.
There are several semantic issues that the standard would have to address, mostly turning on the applicability of the one-definition rule when dynamic libraries are used. What should an implementation be required to do if two dynamic libraries export the same symbol? What should an implementation be required to do if two dynamic libraries define the same symbol as a symbol with external linkage but do not export it? What should an implementation be required to do if two dynamic libraries define the same type? (For example, when code in a dynamic library throws an exception, should code which called that code from another dynamic library be able to catch that exception?)
1. In Windows they're known as DLLs; in Unix they're shared libraries. Throughout this paper they are referred to as "dynamic libraries", in the hope that the name suggests that the two models are similar and that they are different.
2. This discussion ignores name mangling.
3. Although it generally doesn't appear on the command line, the standard library is usually no different from a user-defined library except that the compiler knows its name and passes that name to the linker even if it isn't mentioned on the command line.
4. This is deliberately vague, because the details vary fairly widely from system to system.
5. An application that doesn't explicitly use any dynamic libraries will often use dynamic libraries anyway -- the standard library for C and C++ is often packaged as a dynamic library. However, this is usually not something that the application designer need be concerned with; the implementor will make it work.
6. For Windows programmers this is a simplification; for Unix programmers it is a complication.
7. The use of "library" here is intended only as an aid to exposition, not as a recommendation.
8. If there is, in fact, such a name at all. Some implementations store this information in the vtable.