
rtld(M)


rtld -- runtime (dynamic) linker

Description

The dynamic linker (or RTLD) is the software that coordinates the shared libraries and the original dynamic executable that together form the user-level portion of a typical process. For a dynamic executable, the kernel passes control first to RTLD, which then bootstraps the process, and passes control off to the caller of main. Bootstrapping involves mapping in all additional start-up shared libraries, fixing all relocations, and calling all shared library initialization routines in reverse dependency order.

When a shared library needs to be mapped into the process, if the pathname for the library includes a slash, it will be used as is, except for $ORIGIN replacement. If the pathname includes no slash, it is searched for in the following places (in order):

The above list of search directories is modified when a LD_ROOT environment variable is present. LD_ROOT takes the form of a colon-separated list of absolute directories (starts with a slash) and for each one specified, in order, the above list of search directories is checked with each absolute path prefixed by that LD_ROOT directory. If after all the LD_ROOT directories have been run through, no suitable shared library has been found, then the above list of search directories is checked with no modifications.
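The LD_ROOT ordering described above can be sketched in C as follows. This is only an illustration: the function name ld_root_candidates, the buffer size, and the directory list passed in by the caller are hypothetical, and the sketch assumes every search directory is an absolute path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CAND_MAX 256   /* hypothetical per-pathname buffer size */

/*
 * Hypothetical helper: fill out[] with candidate pathnames for lib.
 * Every search directory is first tried under each LD_ROOT entry (in
 * order), and the unmodified directories come last.  Returns the
 * number of candidates written (at most max).
 */
size_t ld_root_candidates(const char *ld_root, const char *const *dirs,
                          size_t ndirs, const char *lib,
                          char out[][CAND_MAX], size_t max)
{
    size_t n = 0;
    const char *p = ld_root ? ld_root : "";

    assert(dirs != NULL && lib != NULL && out != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, ":");          /* one LD_ROOT entry */
        if (len > 0 && p[0] == '/') {          /* roots must be absolute */
            for (size_t i = 0; i < ndirs && n < max; i++)
                snprintf(out[n++], CAND_MAX, "%.*s%s/%s",
                         (int)len, p, dirs[i], lib);
        }
        p += len;
        if (*p == ':')
            p++;
    }
    for (size_t i = 0; i < ndirs && n < max; i++)   /* unprefixed fallback */
        snprintf(out[n++], CAND_MAX, "%s/%s", dirs[i], lib);
    return n;
}
```

With LD_ROOT set to /alt:/test and two search directories, the sketch yields six candidates: two under /alt, two under /test, and finally the two unprefixed ones.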

Unless the process has gained privilege, when the pathname includes a $ORIGIN or a ${ORIGIN}, that portion of the string is replaced with the full pathname (without any . or .. or use of symbolic links) of the directory of the invoking object. For processes that have gained privilege, any potential pathname that contains a $ORIGIN or a ${ORIGIN} is skipped.
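The $ORIGIN substitution can be illustrated with a small C sketch. The function expand_origin and its parameters are hypothetical, and the caller is assumed to supply origin_dir already resolved to a full pathname; the sketch does not show how RTLD itself determines that directory.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy path into out, replacing each $ORIGIN or
 * ${ORIGIN} with origin_dir.  Returns 0 on success, or -1 when the
 * pathname must be skipped (privileged process and the token is
 * present) or when out is too small.
 */
int expand_origin(const char *path, const char *origin_dir, int privileged,
                  char *out, size_t outsz)
{
    size_t o = 0;
    size_t dirlen;

    assert(path != NULL && origin_dir != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    dirlen = strlen(origin_dir);

    while (*path != '\0') {
        size_t tok = 0;
        if (strncmp(path, "${ORIGIN}", 9) == 0)
            tok = 9;
        else if (strncmp(path, "$ORIGIN", 7) == 0)
            tok = 7;

        if (tok != 0) {
            if (privileged)
                return -1;              /* such pathnames are skipped */
            if (o + dirlen >= outsz)
                return -1;
            memcpy(out + o, origin_dir, dirlen);
            o += dirlen;
            path += tok;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *path++;
        }
    }
    out[o] = '\0';
    return 0;
}
```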

RTLD also comes into play when any of the dl... functions are called and when a lazy-bound routine is called from a shared library or the dynamic executable for the first time. References to global functions in dynamic executables and shared libraries generally require runtime binding by RTLD. It is possible to perform this binding on first reference (or ``lazily'') by having the target of the call be an indirect jump instruction (in the Procedure Linkage Table, or PLT) which is initially constructed to cause entry to RTLD. As it is most likely that only a fraction of the total functions referenced are ever called, lazy binding provides less overhead to the process as well as reducing the start-up cost. Thus, lazy binding is the default, but by setting the LD_BIND_NOW environment variable to a nonempty string, RTLD will cause all references to functions from the dynamic executable and within the start-up shared libraries to be bound before reaching main.
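The idea behind lazy binding can be simulated in portable C with a function pointer standing in for one PLT slot. This is a conceptual sketch only: real PLT entries are machine code maintained by the linker, and all names here (plt_square, resolver_stub, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

static int resolve_count;               /* times the "linker" was entered */

static int real_square(int x) { return x * x; }

static int resolver_stub(int x);        /* forward declaration */

/* One simulated PLT slot: it initially points at the resolver. */
static func_t plt_square = resolver_stub;

/* Entered on the first call only: bind the slot, then finish the call. */
static int resolver_stub(int x)
{
    resolve_count++;
    plt_square = real_square;           /* later calls go straight through */
    return plt_square(x);
}

/* Callers always jump indirectly through the slot. */
int call_square(int x)
{
    assert(plt_square != NULL);
    return plt_square(x);
}

int lazy_resolve_count(void) { return resolve_count; }
```

However many times call_square is invoked, the resolver runs once. Binding everything up front, as LD_BIND_NOW requests, would correspond to filling in every slot before main is reached.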

Listing of Dynamic Dependencies

The ldd(C) command actually invokes its subject command, but does so having exported LD_TRACE_LOADED_OBJECTS (set to a nonempty string), which causes RTLD to list the loaded shared libraries but exit prior to execution of any application code. If ldd(C) is invoked with -d or -r, it also exports LD_WARN (set to a nonempty string), which causes RTLD to check its startup relocations as well. The -r option also causes LD_BIND_NOW to be exported with a nonempty string value, so that any missing functions are also reported.

Adding Shared Libraries

You can cause extra shared libraries to be loaded into the process address space through use of LD_PRELOAD and LD_INSERT. The former is a colon-separated list of shared libraries which will be loaded at the head of the library list for the dynamic process. LD_INSERT's value takes the form of a list of semicolon-separated pairs of strings separated by a colon:
   lib1:lib2;lib3:lib4;...
The first string of each colon-separated pair specifies a shared library to be loaded just before the second is to be loaded. The spelling of the string to be matched must be exact, so if there is more than one way to reach a particular shared library, multiple string pairs should be used.
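The LD_INSERT value format can be illustrated by a small parser. This is a sketch under stated assumptions: struct insert_pair, parse_ld_insert, and the name-length limit are hypothetical, and skipping malformed pairs is a choice made for the example rather than documented RTLD behavior.

```c
#include <assert.h>
#include <string.h>

#define INS_NAME_MAX 128   /* hypothetical limit on a library string */

struct insert_pair {
    char before[INS_NAME_MAX];   /* library to load ... */
    char target[INS_NAME_MAX];   /* ... just before this one */
};

/*
 * Hypothetical helper: split "lib1:lib2;lib3:lib4" into pairs.
 * Pairs with a missing or overlong half are skipped (an assumption).
 * Returns the number of pairs stored (at most max).
 */
size_t parse_ld_insert(const char *value, struct insert_pair *pairs, size_t max)
{
    size_t n = 0;

    assert(value != NULL && pairs != NULL);
    while (*value != '\0' && n < max) {
        size_t plen = strcspn(value, ";");            /* one "a:b" pair */
        const char *colon = memchr(value, ':', plen);

        if (colon != NULL) {
            size_t alen = (size_t)(colon - value);
            size_t blen = plen - alen - 1;

            if (alen > 0 && blen > 0 &&
                alen < INS_NAME_MAX && blen < INS_NAME_MAX) {
                memcpy(pairs[n].before, value, alen);
                pairs[n].before[alen] = '\0';
                memcpy(pairs[n].target, colon + 1, blen);
                pairs[n].target[blen] = '\0';
                n++;
            }
        }
        value += plen;
        if (*value == ';')
            value++;
    }
    return n;
}
```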

Both forms of library insertion are disabled for processes which have gained privilege. Also, care should be taken with use of these environment variables as they can easily cause dynamic applications to fail in unexpected ways.

A typical use for adding shared libraries is to provide an alternate implementation of one or more routines, as a preloaded library will be searched first (after the main executable), and inserted libraries will be searched just before the second library named in the pair. (Thus LD_INSERT disrupts the overall namespace in a less jarring manner, for example when there are already multiple definitions of a routine present in the linked-in shared libraries.)

Application Tracing

For processes which have not gained privilege, RTLD provides library and application routine tracing similar to truss(C) by exploiting the runtime binding of functions. When LD_TRACE is present in the environment, RTLD announces the call to each function that uses runtime binding. This is accomplished by RTLD creating an alternate stub function as the target of each PLT indirect jump which emits the call information and then jumps to the intended function.

The basic tracing information is a line of the form

   sym(arg1,arg2,arg3) from addr
where addr (and by default the args) are in hexadecimal. Additional tracing information can be obtained through comma-separated keywords in the LD_TRACE environment variable:

lib
Causes a @lib to be appended to both sym and addr, giving the containing object filename.

ret
Causes RTLD also to create a return stub that emits a line of the form
   => sym returned val
when a traced function returns. Again by default val is shown in hexadecimal.

sym
Causes RTLD to emit name+offset instead of addr where name is the closest exported symbol whose address is not greater than addr and offset is the distance (in bytes) from name to addr. Note that (particularly for a.outs) it can easily be that name is not the routine calling sym since RTLD only can see symbols present in dynamic symbol tables. [See dladdr(S).]

tim
Causes at sec.usec to be added on the end of the call trace line, representing the time in seconds and microseconds since the start of the process. Note that this information is not available until after all shared library _init routines have been run.

hitim
Causes at ticks to be added on the end of the call trace line, where ticks is the number of clock ticks since the start of the process given by the Pentium rdtsc instruction. The tim-like form of this information will be presented when the LD_TRACE_SCALE environment variable is set to a decimal integer giving the clock rate of your CPU in MHz, 133 for a 133 MHz Pentium, for example. Note that you must be on a Pentium (or later) CPU and you must be permitted to use the rdtsc instruction -- see the USER_RDTSC setting on idtune(ADM). If either of these conditions is not met, your process will get a SIGSEGV signal for hitim. Note further that timings can be inaccurate on a multi-processor system, since the processor timers are not synchronized.
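The arithmetic implied by LD_TRACE_SCALE is simple: a clock rate of N MHz means N ticks per microsecond. The following C sketch shows that conversion only; struct trace_time and scale_ticks are hypothetical names, and nothing here is taken from RTLD's own code.

```c
#include <assert.h>

struct trace_time {
    unsigned long long sec;
    unsigned long usec;
};

/*
 * Hypothetical helper: convert a tick count to seconds and
 * microseconds, given the clock rate in MHz (for example 133).
 */
struct trace_time scale_ticks(unsigned long long ticks, unsigned long mhz)
{
    struct trace_time t;
    unsigned long long total_usec;

    assert(mhz != 0);
    total_usec = ticks / mhz;            /* mhz ticks per microsecond */
    t.sec = total_usec / 1000000ULL;
    t.usec = (unsigned long)(total_usec % 1000000ULL);
    return t;
}
```

For instance, 199500000 ticks at a scale of 133 corresponds to 1.500000 seconds.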

For C library routines and some other shared libraries, the tracing output reflects some knowledge of arguments and return values. This is provided to the RTLD tracing code through an exported weak symbol named __arglist. If a symbol's address corresponds to a __arglist entry (as defined by the object containing the symbol) its tracing information is presented according to the __arglist entry.

RTLD expects __arglist to have the following shape:

   #pragma weak __arglist
   extern const struct {
   	unsigned long nentries; // arglist[] length
   	struct {
   		void *funcaddr;
   		unsigned long format;
   	} arglist[]; // in increasing funcaddr order
   } __arglist;
but in practice it is usually defined in assembly, as that makes it easier to include the appropriate relocations, since funcaddr for shared libraries is expected to be relative to the start of the library. (Relative addresses mean that no runtime relocations are needed for the __arglist entries.) Note that the array is required to be sorted by increasing funcaddr values, which means that __arglist.arglist[0].funcaddr is the routine with the lowest address and __arglist.arglist[__arglist.nentries-1].funcaddr is the routine with the highest.
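Because the entries are sorted by funcaddr, a lookup can use binary search. The sketch below is an illustration, not RTLD's code: struct arg_entry and arglist_lookup are hypothetical, funcaddr is held as an integer offset rather than a void *, and an entry is assumed to "correspond" to an address only on an exact match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one arglist[] element. */
struct arg_entry {
    unsigned long funcaddr;   /* library-relative address (assumption) */
    unsigned long format;
};

/*
 * Binary search over entries sorted by increasing funcaddr.
 * Returns the matching entry, or NULL if addr has no entry.
 */
const struct arg_entry *arglist_lookup(const struct arg_entry *list,
                                       unsigned long nentries,
                                       unsigned long addr)
{
    unsigned long lo = 0, hi = nentries;

    assert(list != NULL || nentries == 0);
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;

        if (list[mid].funcaddr == addr)
            return &list[mid];
        if (list[mid].funcaddr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```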

The format encodes basic argument and return value information using four bits (a nibble) for each item, giving room for at least seven arguments along with a return value for each entry, since a long is at least 32 bits. The low nibble holds the first argument's information, the next nibble holds the second's and so on; the high order four bits always hold the return value's information. The values are the following:

   #define F_N	0x0	// none -- stop processing args
   #define F_D	0x1	// decimal (signed)
   #define F_U	0x2	// decimal (unsigned)
   #define F_X	0x3	// hexadecimal: 0x...
   #define F_B	0x4	// bit list: 0b...
   #define F_S	0x5	// string: "..." or NULL
   #define F_C	0x6	// char: '...'
   #define F_LD	0x7	// long long decimal (signed)
   #define F_LU	0x8	// long long decimal (unsigned)
   #define F_LX	0x9	// long long hexadecimal
   #define F_FF	0xa	// single precision (float)
   #define F_FD	0xb	// double precision (double)
   #define F_FL	0xc	// double extended (long double)
   #define F_NT	0xf	// no tracing (return or at all)
where many of these assume that all integer arguments with types smaller than long long are passed in a one-size-fits-all container.

Of the above format encodings, F_S is the one that can easily cause RTLD to misbehave, since RTLD takes it as specifying a pointer to a null-terminated sequence of chars. Although there is a special case made for the null pointer, a ``garbage'' value will most likely result in a SIGSEGV signal sent to the process. The LD_TRACE_MAXSTR environment variable can be used to control the display of strings. By setting it to zero, all nonnull F_S items will be displayed as hex addresses. Otherwise, the (decimal) value of LD_TRACE_MAXSTR will be used as the maximum length of string to display, capped at slightly more than 100 bytes. The default limit is 20 bytes.

Certain routines step outside the bounds of regular C or C++. Two examples are setjmp and getcontext. Attempts to trace the return value for these two functions would result in using a stale return address stub. (These functions potentially return more than once from ``the same'' call.) This is why F_NT exists. If it is used as the return value encoding, no return value tracing will be performed. Similarly, if the low nibble is F_NT, then no tracing of this routine will be done at all.
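The nibble layout of format can be decoded with a few shifts and masks. The following C sketch is illustrative: the helper names are hypothetical, only F_N and F_NT are repeated from the list above, and "high order four bits" is taken to mean the top nibble of an unsigned long on the machine compiling the code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define F_N   0x0   /* none -- stop processing args */
#define F_NT  0xf   /* no tracing (return or at all) */

/* Width of format in bits; the top nibble is the return value's. */
#define FMT_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Encoding of argument i (zero-based), taken from nibble i. */
unsigned fmt_arg(unsigned long format, unsigned i)
{
    assert(i < FMT_BITS / 4 - 1);       /* top nibble is not an argument */
    return (unsigned)((format >> (4 * i)) & 0xfUL);
}

/* Encoding of the return value, taken from the high order nibble. */
unsigned fmt_ret(unsigned long format)
{
    return (unsigned)((format >> (FMT_BITS - 4)) & 0xfUL);
}

/* Number of arguments described: nibbles up to the first F_N. */
unsigned fmt_argc(unsigned long format)
{
    unsigned max = (unsigned)(FMT_BITS / 4 - 1);
    unsigned n = 0;

    while (n < max && fmt_arg(format, n) != F_N)
        n++;
    return n;
}

/* F_NT in the low nibble disables tracing of the routine entirely. */
int fmt_is_traced(unsigned long format)
{
    return fmt_arg(format, 0) != F_NT;
}
```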

If there is no __arglist in the defining object for a routine, or the routine's address does not correspond to an __arglist entry, then the default tracing form is used, in which three hexadecimal integer arguments are shown, and if tracing returns, a hexadecimal integer is printed on its return. The number of integer arguments used by default can be specified by setting the LD_TRACE_ARGNO environment variable to the (decimal) number of integer arguments.

The tracing information by default is sent to file descriptor 2, the standard error output stream. A specific file descriptor number can be given by setting the LD_TRACE_FILENO environment variable to the (decimal) file descriptor number desired.

Tracing can be filtered through use of two other environment variables: LD_TRACE_LIBRARY and LD_TRACE_ROUTINE. LD_TRACE_LIBRARY optionally starts with a ! and is a comma-, colon- or semicolon-separated list of library pathnames. The pathnames must match exactly the pathname used by RTLD. See ldd(C). RTLD only traces those routines defined in the dynamic executable, and either those defined in the named shared libraries or, if it starts with a !, those not defined in the named shared libraries. Similarly, LD_TRACE_ROUTINE optionally starts with a ! and is a comma-separated list of routine names. RTLD only traces either those routines named or, if LD_TRACE_ROUTINE starts with a !, those routines not named. Note that the ``mangled'' name must be used for C++ routines.
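The LD_TRACE_ROUTINE matching rule can be sketched as a small predicate. The function name routine_is_traced is hypothetical, and treating an unset variable (a NULL filter) as "trace everything" is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a routine would be traced for a
 * given LD_TRACE_ROUTINE value.  A leading '!' inverts the sense of
 * the comma-separated list.  Returns 1 to trace, 0 otherwise.
 */
int routine_is_traced(const char *filter, const char *name)
{
    int negate = 0;
    int listed = 0;
    size_t nlen;

    assert(name != NULL);
    if (filter == NULL)
        return 1;                       /* assumption: no filter set */
    if (*filter == '!') {
        negate = 1;
        filter++;
    }
    nlen = strlen(name);
    while (*filter != '\0') {
        size_t len = strcspn(filter, ",");   /* one routine name */

        if (len == nlen && strncmp(filter, name, len) == 0)
            listed = 1;
        filter += len;
        if (*filter == ',')
            filter++;
    }
    return negate ? !listed : listed;
}
```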

An alternate style of RTLD tracing is also available through the LD_TRACE_STACK environment variable. This variable is a comma-separated list of routine names. When present it causes RTLD to emit a stack trace when any of the named routines are called. No tracing occurs for any other routines. The sym and lib LD_TRACE keywords affect all routines in the stack trace, but the other keywords only affect the named routines. The stack trace depth can be limited by setting LD_TRACE_FRAMES to a (decimal) maximum number, where zero (the default) means to display the entire stack.

The stack trace display looks reminiscent of debug(CP) and uses one of its simple stack tracing algorithms. It works in most cases, but it fails with functions that return structures. Also note that tracing through signal handlers is likely to be problematic.

RTLD Configuration File

In addition to the above environment variables, RTLD also takes settings for a subset of these controls through lines found in /etc/default/rtld. Except for LD_BIND_NOW and the controls which begin with LD_TRACE, lines of the form
   NAME=value
will behave the same as if
   LD_NAME=value
were present in the environment, except that settings will still be used even if the process has gained privilege.
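The NAME=value mapping can be sketched as follows. This is an illustration under assumptions: rtld_default_line is a hypothetical function, and rejecting BIND_NOW and TRACE... lines outright is one reading of the exception stated above, not documented parsing behavior.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: map one "NAME=value" line to the equivalent
 * environment variable name ("LD_NAME") and a pointer to its value.
 * Returns 0 on success, or -1 for malformed lines, for names excluded
 * from this mapping, and when name_out is too small.
 */
int rtld_default_line(const char *line, char *name_out, size_t name_sz,
                      const char **value_out)
{
    const char *eq;
    size_t nlen;

    assert(line != NULL && name_out != NULL && value_out != NULL);
    eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return -1;
    nlen = (size_t)(eq - line);

    if (nlen == 8 && strncmp(line, "BIND_NOW", 8) == 0)
        return -1;                      /* excluded from the mapping */
    if (nlen >= 5 && strncmp(line, "TRACE", 5) == 0)
        return -1;                      /* tracing controls excluded */
    if (nlen + 4 > name_sz)             /* "LD_" + name + terminator */
        return -1;

    memcpy(name_out, "LD_", 3);
    memcpy(name_out + 3, line, nlen);
    name_out[3 + nlen] = '\0';
    *value_out = eq + 1;
    return 0;
}
```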

A ROOT or LIBRARY_PATH setting from /etc/default/rtld will be overridden by a LD_ROOT or LD_LIBRARY_PATH environment variable respectively if the process has not gained privilege; otherwise the /etc/default/rtld setting will be used.

There can be multiple PRELOAD and INSERT settings in /etc/default/rtld, adding more shared libraries to load. A LD_PRELOAD or LD_INSERT, if not ignored due to privilege gain, will add on to any PRELOAD or INSERT settings respectively found in /etc/default/rtld.

Files


/etc/default/rtld
RTLD configuration file

/usr/lib/libc.so.1
RTLD and the runtime C shared library.

/usr/lib
default shared library directory.

References

cc(CP), CC(CP), ld(CP), ldd(C), truss(C), dladdr(S), dlclose(S), dlerror(S), dlopen(S), dlsym(S), malloc(S)
© 2007 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 05 June 2007