HP-UX Linker and Libraries User's Guide
Hewlett-Packard
Improving Your Application Performance

The linker provides several ways to improve your application performance, as described in the sections that follow.

Linker Optimizations

The linker supports the -O option that performs the following optimizations at link time:

  • optimizes references to data by removing unnecessary ADDIL instructions from the object code (PA-RISC only).

  • removes procedures that can never be reached.

These optimizations can be enabled or disabled individually with the +O[no]fastaccess and +O[no]procelim options, respectively. The -O linker option simply enables both at once. For example, the following ld command enables linker optimizations and results in a smaller, faster executable:

$ ld -O -o prog /usr/ccs/lib/crt0.o prog.o -lm -lc
	

To enable one or the other optimization only, use the appropriate +O option:

$ ld +Ofastaccess -o prog /usr/ccs/lib/crt0.o prog.o -lm -lc
$ ld +Oprocelim -o prog /usr/ccs/lib/crt0.o prog.o -lm -lc
	

Invoking Linker Optimizations from the Compile Line

The compilers automatically call the linker with the +Ofastaccess and +Oprocelim options when you compile at optimization level 4. For example, the following cc command invokes full compiler optimization as well as linker optimization:

	$ cc -o prog +O4 prog.c       //O4 invokes +Ofastaccess and +Oprocelim
	

If invoked with +O4, the compilers generate object code in such a way that code optimization is done at link time. Thus, the linker does a better job of optimizing code that was compiled with +O4.

When the compile and link phases are invoked by separate commands, specify +O4 on both command lines. For example:

	$ cc -c +O4 prog.c            //invokes compiler optimizations
	$ cc -o prog +O4 prog.o       //invokes linker optimizations
	


Note

You can also invoke linker optimizations at levels 2 and 3 by using the +Ofastaccess or +Oprocelim option.


See also

For a brief description of compiler optimization options, see Selecting an Optimization Level with PBO. For a complete description, see your compiler documentation.


Incompatibilities with Other Options

The -O, +Ofastaccess, and +Oprocelim options are incompatible with the following linker options:

-b

These options have no effect on position-independent code, so they are not useful when building shared libraries with ld -b.

-A

(PA-32 only) Dynamic linking is incompatible with link-time optimization.

-r

Relocatable linking is incompatible with link-time optimization.

-D

Setting the offset of the data space is incompatible with link-time optimization.

The linker issues a warning when such conflicts occur. If you require any of these features, do not use the linker optimization options.


Unused Procedure Elimination with +Oprocelim

Unused or "dead" procedure elimination is the process of removing unreferenced procedures from the $TEXT$ space (or .text section in the case of ELF object file) of an executable or shared library to reduce the size of the program or library.

Dead procedure elimination is performed after all symbols have been resolved and prior to any relocation. It works on a per-subspace basis (per-section in the case of an ELF object file). That is, only entire subspaces are removed, and only if all procedures in the subspace are unreferenced. Typically, if a relocatable link (ld -r) has not been performed and the code is not written in assembly, every procedure is in its own subspace. Relocatable links may merge subspaces, and merged subspaces can prevent the removal of dead procedures. Therefore, it is optimal to have each procedure in its own subspace.

The +Oprocelim option removes unreferenced data.

If your program does symbol binding at run-time, rather than at link-time, be cautious about using the +Oprocelim option. The +Oprocelim option works on most compiler-generated subspace/sections.
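For illustration, here is a minimal sketch of the risky pattern, assuming HP-UX's shl_findsym() interface from <dl.h>; the function report() and its export are hypothetical. Because nothing references report() at link time, +Oprocelim could eliminate its subspace and break the run-time lookup:

	/* report() has no link-time references; it is found only at run time.
	   For the lookup to succeed at all, the symbol must be exported
	   (for example, by linking with -E). */
	#include <stdio.h>
	#include <dl.h>

	void report(void) { puts("called via run-time lookup"); }

	int main(void) {
	    void (*fp)(void) = NULL;
	    shl_t prog = PROG_HANDLE;  /* search the program's own symbols */
	    if (shl_findsym(&prog, "report", TYPE_PROCEDURE, &fp) == 0 && fp != NULL)
	        fp();                  /* fails if +Oprocelim removed report() */
	    return 0;
	}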

+Onoprocelim is the linker's default. The compilers often pass +Oprocelim to the linker automatically at higher levels of optimization; for details on when they do so, refer to your compiler documentation.

Complete Executables

For complete executables, dead procedure elimination removes any text subspaces that are not referenced from another subspace. Self-references, such as recursive procedures or subspaces containing multiple procedures that call each other, are not considered outside references, so such subspaces remain candidates for removal.

If the address of a procedure is taken, the subspace within which it resides is not removed. If a subspace is referenced in any way by a fixup representing a reference other than a PC-relative or absolute call, it is not removed.

Incomplete Executables

For incomplete executables, dead procedure elimination works in the same way as for complete executables, except that exported symbols and their dependencies are never removed. If an incomplete executable contains a symbol that is to be referenced by a shared library but is not exported, the symbol is removed if the other conditions discussed above hold.

Shared Libraries

In shared libraries, only symbols that are neither referenced nor exported are removed. Because all symbols in a shared library that are not of local scope are exported, only unreferenced, locally scoped symbols are removed.

Relocatable Objects

When performing a relocatable link with the -r option, dead procedure elimination is disabled since the only possible gain is the removal of unreferenced local procedures. Objects resulting from a relocatable link are subject to dead procedure elimination upon a final link.

Effects on Symbolic Debugging

Any procedure that has symbolic debug information associated with it is not removed. Procedures that do not have symbolic debug information associated with them but are included in a debug link are removed if they are not referenced.

Options to Improve TLB Hit Rates

To improve Translation Lookaside Buffer (TLB) hit rates in an application running on an Itanium-based or a PA 8000-based system, use the following linker or chatr virtual memory page setting options:

  • +pd size - requests a specified data page size of 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, or L. Use L to specify the largest page size available. The actual page size may vary if the requested size cannot be fulfilled.

  • +pi size - requests a specified instruction page size. (See +pd size for size values.)

The default data and instruction page size is 4K bytes on Itanium and PA-RISC systems.

The Itanium architecture supports multiple page sizes from 4K to 4G (4K, 8K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, and 4G). The PA-RISC 2.0 architecture supports multiple page sizes, from 4K bytes to 64M bytes, in multiples of four. This enables large contiguous regions to be mapped into a single TLB entry. For example, if a contiguous 4MB of memory is actively used, 1024 TLB entries are needed if the page size is 4K, but only 64 if the page size is 64K.

As application and benchmark working-set sizes continue to grow, the linker and chatr TLB page setting options can help boost performance by improving TLB hit rates.

Some scientific applications benefit from large data pages, while some commercial applications benefit from large instruction pages.

Examples
  • To set the virtual memory page size by using the linker:

    		$ ld +pd 64K +pi 16K /opt/langtools/lib/crt0.o myprog.o -lc 
    		
  • To set the page size from HP C and HP Fortran:

    		$ cc -Wl,+pd,64K,+pi,16K myprog.c
    		$ f90 -Wl,+pd,64K,+pi,16K myprog.f
    		
  • To set the page size by using chatr:

    		$ chatr +pd 64K +pi 16K a.out 
    		
Profile-Based Optimization (Itanium)

For information on Profile-Based Optimization on Itanium systems, see +Oprofile=collect and +Oprofile=use in the C/C++ help document.

Profile-Based Optimization (PA-RISC)

In profile-based optimization (PBO), the compiler and linker work together to optimize an application based on profile data obtained from running the application on a typical input data set. For instance, if certain procedures call each other frequently, the linker can place them close together in the a.out file, resulting in fewer instruction cache misses, TLB misses, and memory page faults when the program runs. Similar optimizations can be done at the basic block levels of a procedure. Profile data is also used by the compiler for other general tasks, such as code scheduling and register allocation.


This discussion covers the following topics:

  • General Information about PBO

  • Using PBO


Note

The compiler interface to PBO is currently supported only by the C, C++, and FORTRAN compilers.



When to Use PBO

Profile-Based Optimization must be the last level of optimization you use when building an application. As with other optimizations, it must be performed after an application has been completely debugged.

Most applications benefit from PBO. The two types of applications that benefit the most from PBO are:

  • Applications that exhibit poor instruction memory locality. These are usually large applications in which the most common paths of execution are spread across multiple compilation units. The loops in these applications typically contain large numbers of statements, procedure calls, or both.

  • Applications that are branch-intensive. The operations performed in such applications are highly dependent on the input data. User interface managers, database managers, editors, and compilers are examples of such applications.

The best way to determine whether PBO improves an application's performance is to try it.


Note

Under some conditions, PBO is incompatible with programs that explicitly load shared libraries. Specifically, PBO does not function properly if the shl_load routine has either the BIND_FIRST or the BIND_NOSTART flag set. For more information about explicit loading of shared libraries, see The shl_load and cxxshl_load Routines.
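For reference, a sketch of the problematic call form, using the shl_load() interface from <dl.h> (the library name is illustrative):

	#include <stdio.h>
	#include <dl.h>

	int main(void) {
	    /* BIND_FIRST (or BIND_NOSTART) here makes the process unsuitable for PBO. */
	    shl_t lib = shl_load("./mylib.sl", BIND_IMMEDIATE | BIND_FIRST, 0L);
	    if (lib == NULL) {
	        perror("shl_load");
	        return 1;
	    }
	    return 0;
	}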



How to Use PBO

Profile-based optimization involves these steps:

  1. Instrument the application - prepare the application so that it generates profile data.

  2. Profile the application - create profile data that can be used to optimize the application.

  3. Optimize the application - generate optimized code based on the profile data.

A Simple Example

Suppose you want to apply PBO to an application called sample, which is built from a C source file sample.c. The steps involved in optimizing the application are discussed below.

Step 1 Instrument

First, compile the application for instrumentation and level 2 optimization:

	$ cc -v -c +I -O sample.c
	/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
	/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
	$ cc -v -o sample.inst +I -O sample.o
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main \
	 -o sample.inst sample.o -I -lc
	

At this point, you have an instrumented program called sample.inst.

Step 2 Profile

Assume you have two representative input files to use for profiling, input.file1 and input.file2. Now execute the following three commands:

	$ sample.inst < input.file1
	$ sample.inst < input.file2
	$ mv flow.data sample.data
	

The first invocation of sample.inst creates the flow.data file and places an entry for that executable file in the data file. The second invocation increments the counters for sample.inst in the flow.data file. The third command renames the flow.data file to sample.data.

Step 3 Optimize

To perform profile-based optimizations on this application, relink the program as follows:

	$ cc -v -o sample.opt +P +pgm sample.inst \
	  +df sample.data sample.o
	/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o sample.opt \
	 +pgm sample.inst +df sample.data sample.o -P -lc
	

Note that it is not necessary to recompile the source file. The +pgm option is used because the executable name used during instrumentation, sample.inst, does not match the current output file name, sample.opt. The +df option is necessary because the profile database file for the program has been moved from flow.data to sample.data.


Instrumenting (+I/-I)

Although you can use the linker alone to perform PBO, the best optimizations result if you use the compiler as well; this section describes this approach.

To instrument an application (with C, C++, and FORTRAN), compile the source with the +I compiler command line option. This causes the compiler to generate a .o file containing intermediate code, rather than the usual object code. (Intermediate code is a representation of your code that is lower-level than the source code, but higher level than the object code.) A file containing such intermediate code is referred to as an I-SOM file.

After creating an I-SOM file for each source file, the compiler invokes the linker as follows:

  1. In 32-bit mode, instead of using the startup file /usr/ccs/lib/crt0.o, the compiler specifies a special startup file named /opt/langtools/lib/icrt0.o. When building a shared library, the compiler uses /usr/ccs/lib/scrt0.o. In 64-bit mode, the startup file is unchanged; the linker automatically adds /usr/ccs/lib/pa20_64/fdp_init.o (or /usr/ccs/lib/pa20_64/fdp_init_sl.o for a shared library) to the link when it detects the -I option.

  2. The compiler passes the -I option to the linker, causing it to place instrumentation code in the resulting executable.

You can see how the compiler invokes the linker by specifying the -v option. For example, to instrument the file sample.c, to name the executable sample.inst, to perform level 2 optimizations (the compiler option -O is equivalent to +O2), and to see verbose output (-v):

	$ cc -v -o sample.inst +I -O sample.c
	/opt/langtools/lbin/cpp sample.c /var/tmp/ctm123
	/opt/ansic/lbin/ccom /var/tmp/ctm123 sample.o -O2 -I
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o \
	 sample.inst sample.o -I -lc
	

Notice in the linker command line (starting with /usr/ccs/bin/ld), the application is linked with /opt/langtools/lib/icrt0.o and the -I option is given.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in Specifying a Different flow.data with FLOW_DATA.

The Startup File icrt0.o

The icrt0.o startup file uses the atexit system call to register the function that writes out profile data. (For 64-bit mode, the initialization code is in /usr/ccs/lib/pa20_64/fdp_init.o.) That function is called when the application exits.

The atexit system call allows a fixed number of functions to be registered from a user application. Instrumented applications (those linked with -I) have one less atexit call available. One or more instrumented shared libraries together use a single additional atexit call. Therefore, an instrumented application that contains any number of instrumented shared libraries uses two of the available atexit calls.

For details on atexit, see atexit(2).
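The following minimal C sketch shows the same registration pattern; the handler name is illustrative (the real routine in icrt0.o writes the flow.data file):

	#include <stdio.h>
	#include <stdlib.h>

	static void write_profile_data(void) {
	    puts("flushing profile counters...");   /* stands in for the real writer */
	}

	int main(void) {
	    if (atexit(write_profile_data) != 0) {  /* fails if the table is full */
	        fprintf(stderr, "atexit registration failed\n");
	        return 1;
	    }
	    return 0;   /* handler runs when the program exits normally */
	}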

The -I Linker Option

When invoked with the -I option, the linker instruments all the specified object files. Note that the linker instruments regular object files as well as I-SOM files; however, with regular object files, only procedure call instrumentation is added. With I-SOM files, additional instrumentation is done within procedures.

For instance, suppose you have a regular object file named foo.o created by compiling without the +I option, and you compile a source file bar.c with the +I option and specify foo.o on the compile line:

	$ cc -c foo.c
	$ cc -v -o foobar -O +I bar.c foo.o
	/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
	/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
	 bar.o foo.o -I -lc
	

In this case, the linker instruments both bar.o and foo.o. However, since foo.o is not an I-SOM file, only its procedure calls are instrumented; basic blocks within procedures are not instrumented. To instrument foo.c to the same extent, you must compile it with the +I option - for example:

	$ cc -v -c +I -O foo.c
	/opt/langtools/lbin/cpp foo.c /var/tmp/ctm432
	/opt/ansic/lbin/ccom /var/tmp/ctm432 foo.o -O2 -I
	$ cc -v -o foobar -O +I bar.c foo.o
	/opt/langtools/lbin/cpp bar.c /var/tmp/ctm456
	/opt/ansic/lbin/ccom /var/tmp/ctm456 bar.o -O2 -I
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
	 bar.o foo.o -I -lc
	

A simpler approach is to compile foo.c and bar.c with a single cc command:

	$ cc -v +I -O -o foobar bar.c foo.c
	/opt/langtools/lbin/cpp bar.c /var/tmp/ctm352
	/opt/ansic/lbin/ccom /var/tmp/ctm352 bar.o -O2 -I
	/opt/langtools/lbin/cpp foo.c /var/tmp/ctm456
	/opt/ansic/lbin/ccom /var/tmp/ctm456 foo.o -O2 -I
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o foobar \
	 bar.o foo.o -I -lc
	
Code Generation from I-SOMs

As discussed in Looking "inside" a Compiler , a compiler driver invokes several phases. The last phase before linking is code generation. When using PBO, the compilation process stops at an intermediate code level. The PA-RISC code generation and optimization phase is invoked by the linker. The code generator is /opt/langtools/lbin/ucomp.


Note

Since the code generation phase is delayed until link time with PBO, linking can take much longer than usual when using PBO. Compile times are faster than usual, since code generation is not performed.


Building Portable Code with Linker Optimization

To build executables on a PA-RISC 2.0 system that also run on 1.1 systems when compiling for optimization with +O4, +P, or +I, explicitly compile those components with +DAportable or +DA1.1. This is necessary because of the code generation that the linker invokes at link time for optimization. When you compile with +O4, +P, or +I, your compiler builds an I-SOM (Intermediate code-System Object Module) file instead of a SOM file at compile time. (See Instrumenting (+I/-I) for more information.) At link time, the linker invokes the code generator (ucomp) to generate SOM files from the I-SOM files and to complete the optimization. If you did not build the I-SOM file with +DAportable or +DA1.1, ucomp generates a SOM file that contains code for the PA-RISC architecture of the machine on which you are building.

For example, if you build an archive library on a 1.1 system with +O4, +P, or +I, without specifying the architecture, the I-SOM files in the library do not contain a specific option for 1.1 code generation. If you move the archive library to a 2.0 system and use it to build an executable, the executable is built as a 2.0 executable because of the link-time code generation. To build a 1.1 executable, rebuild the archive library with +DAportable or +DA1.1.

Another approach is to combine objects that have been compiled with +O4, +P, or +I into a merged object file with the linker -r option: the -r option produces an object file (SOM) not an I-SOM file. Since code generation occurs when the merged file is built, if this file is built on a 1.1 system, the file is safe to ship to other systems for building 1.1 applications.

To determine if an object file is an I-SOM file, use the size(1) command. I-SOM files have zero listed for the size of all the sections (text, data and bss (uninitialized data)):

	$ size foo.o
	0 + 0 + 0 = 0
	


Profiling

After instrumenting a program, you can run it one or more times to generate profile data, which is ultimately used to perform the optimizations in the final step of PBO.

This section discusses choosing input data, the flow.data file, and how profile data is stored and shared among programs and processes.

Choosing Input Data

For best results from PBO, use representative input data when running an instrumented program. Input data that represents rare cases or error conditions is usually not effective for profiling. Run the instrumented program with input data that closely resembles the data in a typical user's environment. Then, the optimizer focuses its efforts on the parts of the program that are critical to performance in the user's environment. You need not do a large number of profiling runs before the optimization phase. Usually it is adequate to select a small number of representative input data sets.

The flow.data File

When an instrumented program terminates with the exit(2) system call, special code in the 32-bit icrt0.o startup file or the 64-bit /usr/ccs/lib/pa20_64/fdp_init.o file writes profile data to a file called flow.data in the current working directory. This file contains binary data, which cannot be viewed or updated with a text editor. The flow.data file is not updated when a process terminates without calling exit. That happens, for example, when a process aborts because of an unexpected signal, or when the program calls exec(2) to replace itself with another program.

There are also certain non-terminating processes (such as servers, daemons, and operating systems) which never call exit. For these processes, you must programmatically write the profile data to the flow.data file. In order to do so, a process must call a routine called _write_counters(). This routine is defined in the icrt0.o file. A stub routine with the same name is present in the crt0.o file so that the source does not have to change when instrumentation is not being done.
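A minimal sketch of such a server follows; the signal handling and shutdown logic are illustrative. Because crt0.o provides a stub _write_counters(), the same source links with or without instrumentation:

	#include <signal.h>
	#include <unistd.h>

	extern void _write_counters(void);   /* real in icrt0.o, stub in crt0.o */

	static volatile sig_atomic_t shutting_down = 0;
	static void on_term(int sig) { (void)sig; shutting_down = 1; }

	int main(void) {
	    signal(SIGTERM, on_term);
	    while (!shutting_down) {
	        sleep(1);                    /* ... serve requests ... */
	    }
	    _write_counters();               /* flush profile data explicitly */
	    _exit(0);                        /* bypasses exit(), so the explicit call above matters */
	}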

If flow.data does not exist, the program creates it. If flow.data exists, the program updates the profile data.

As an example, suppose you have an instrumented program named prog.inst, and two representative input data files named input_file1 and input_file2. Then the following lines create a flow.data file:

	$ prog.inst < input_file1
	$ ls flow.data
	  flow.data
	$ prog.inst < input_file2
	

The flow.data file includes profile data from both input files.

To save the profile data to a file other than flow.data in the current working directory, use the FLOW_DATA environment variable as described in Specifying a Different flow.data with FLOW_DATA.

Storing Profile Information for Multiple Programs

A single flow.data file can store information for multiple programs. This allows an instrumented program to spawn other instrumented programs, all of which share the same flow.data file.

To allow multiple programs to save their data in the same flow.data file, a program's profile data is uniquely identified by the executable's basename (see basename(1)), the executable's file size, and the time the executable was last modified.

Instead of using the executable's basename, you can specify a different basename by setting the environment variable PBO_PGM_PATH. This is useful when several program names are actually links to the same instrumented executable.

For example, consider profiling the ls, lsf, and lsx commands (lsx is ls with the -x option, and lsf is ls with the -F option). Because the three commands can be links to the same instrumented executable, the developer may want to collect profile data under a single basename by setting PBO_PGM_PATH=ls. If PBO_PGM_PATH is not set, the profile data is saved under the ls, lsf, and lsx basenames.
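In shell terms, a sketch of this setup (paths and arguments are illustrative):

$ PBO_PGM_PATH=ls
$ export PBO_PGM_PATH            //Bourne and Korn shells.
$ lsf /tmp                       //Profile data recorded under the basename ls.
$ lsx /tmp                       //Counters accumulate under ls as well.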

When an instrumented program begins execution, it checks whether the basename, size, and time-stamp match those in the existing flow.data file. If the basename matches but the size or time-stamp does not match, that probably means that the program has been relinked since it last created profile data. In this case, the following error message is issued:

	program: Can't update counters.  Profile data exists
	         but does not correspond to this executable.  Exit.
	

You can fix this problem in any one of the following ways:

  • Remove or rename the existing flow.data file.

  • Run the instrumented program in a different working directory.

  • Set the FLOW_DATA environment variable so that profile data is written to a file other than flow.data.

  • Rename the instrumented program.

Sharing the flow.data File Among Multiple Processes

A flow.data file can potentially be accessed by several processes at the same time. For example, this can happen when you run more than one instrumented program at the same time in the same directory, or when profiling one program while linking another with -P.

Such asynchronous access to the file can potentially corrupt the data. To prevent simultaneous access to the flow.data file in a particular directory, a lock file called flow.lock is used. Instrumented programs that need to update the flow.data file and linker processes that need to read it must first obtain access to the lock file. Only one process can hold the lock at any time. As long as the flow.data file is being actively read and written, a process will wait for the lock to become available.

A program that terminates abnormally can leave the flow.data file inactive but locked. A process that tries to access an inactive but locked flow.data file gives up after a short period of time. In such cases, you may need to remove the flow.lock file.

If an instrumented program fails to obtain the database lock, it writes the profile data to a temporary file and displays a warning message containing the name of the file. You could then use the +df option along with the +P option while optimizing, to specify the name of the temporary file instead of the flow.data file.

If the linker fails to obtain the lock, it displays an error message and terminates. In such cases, wait until all active processes that are reading or writing a profile database file in that directory have completed. If no such processes exist, remove the flow.lock file.

Forking an Instrumented Application

When instrumenting an application that creates a copy of itself with the fork system call, you must ensure that the child process calls a special function named _clear_counters(), which clears all internal profile data. If you don't do this, the child process inherits the parent's profile data, updating the data as it executes, resulting in inaccurate (exaggerated) profile data when the child terminates. The following code segment shows a valid way to call _clear_counters:

	if ((pid = fork()) == 0) /* this is the child process */
	  {
	    _clear_counters();     /* reset profile data for child */
	         . . .             /* other code for the child */
	  }
	
The function _clear_counters is defined in icrt0.o. It is also defined as a stub (an empty function that does nothing) in crt0.o. This allows you to use the same source code without modification in the instrumented and un-instrumented versions of the program.


Optimizing Based on Profile Data (+P/-P)

The final step in PBO is optimizing a program using profile data created in the profiling phase. To do this, rebuild the program with the +P compiler option. As with the +I option, the +P option causes the compiler to generate an I-SOM .o file, rather than the usual object code, for each source file.

Note that it is not really necessary to recompile the source files; you could, instead, specify the I-SOM .o files that were created during the instrumentation phase. For instance, suppose you have already created an I-SOM file named foo.o from foo.c using the +I compiler option; then the following commands are equivalent in effect:

	$ cc +P foo.c
	$ cc +P foo.o
	

Both commands invoke the linker, but the second command doesn't compile before invoking the linker.

The -P Linker Option

After creating an I-SOM file for each source file, the compiler driver invokes the linker with the -P option, causing the linker to optimize all the .o files. As with the +I option, the driver uses /opt/langtools/lbin/ucomp to generate code and perform various optimizations.

To see how the compiler invokes the linker, specify the -v option when compiling. For instance, suppose you have instrumented prog.c and gathered profile data into flow.data. The following example shows how the compiler driver invokes the linker when +P is specified:

	$ cc -o prog -v +P prog.o
	/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
	 prog.o -P -lc
	

Notice how the program is now linked with /usr/ccs/lib/crt0.o instead of /opt/langtools/lib/icrt0.o because the profiling code is no longer needed.

Using The flow.data File

By default, the code generator and linker look for the flow.data file in the current working directory. In other words, the flow.data file created during the profiling phase should be located in the directory where you relink the program.

Specifying a Different flow.data File with +df

What if you want to use a flow.data file from a different directory than where you are linking? Or what if you have renamed the flow.data file - for example, if you have multiple flow.data files created for different input sets? The +df option allows you to override the default +P behavior of using the file flow.data in the current directory. The compiler passes this option directly to the linker.

For example, suppose after collecting profile data, you decide to rename flow.data to prog.prf. You could then use the +df option as follows:

	$ cc -v -o prog +P +df prog.prf prog.o
	/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o prog \
	 +df prog.prf prog.o -P -lc
	

The +df option overrides the effects of the FLOW_DATA environment variable.

Specifying a Different flow.data with FLOW_DATA

The FLOW_DATA environment variable provides another way to override the default flow.data file name and location. If set, this variable defines an alternate file name for the profile data file.

For example, to use the file /home/adam/projX/prog.data instead of flow.data, set FLOW_DATA:

	$ FLOW_DATA=/home/adam/projX/prog.data
	$ export FLOW_DATA                               //Bourne and Korn shells.

	$ setenv FLOW_DATA /home/adam/projX/prog.data    //C shell.
	
Interaction between FLOW_DATA and +df

If an application is linked with +df and -P, the FLOW_DATA environment variable is ignored. In other words, +df overrides the effects of FLOW_DATA.

Specifying a Different Program Name (+pgm)

When retrieving a program's profile data from the flow.data file, the linker uses the program's basename as a lookup key. For instance, if a program were compiled as follows, the linker would look for the profile data under the name foobar:

	$ cc -v -o foobar +P foo.o bar.o
	/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main -o foobar \
	 foo.o bar.o -P -lc
	

This works fine as long as the name of the program is the same during the instrumentation and optimization phases. But what if the name of the instrumented program is not the same as the name of the final optimized program? What does the linker do?

Let us say, for example, you want the name of the instrumented application to be different from the optimized application. So, you use the following compiler commands:

$ cc -O +I -o prog.inst prog.c   //Instrument prog.inst.

$ prog.inst < input_file1     //Profile it, storing the data under 
                                 the name prog.inst.
$ prog.inst < input_file2

$ cc +P -o prog.opt prog.c       //Optimize it, but name it prog.opt.

The linker is unable to find the program name prog.opt in the flow.data file and issues the error message:

	No profile data found for the program prog.opt in flow.data
	

To get around this problem, the compilers and linker provide the +pgm name option, which allows you to specify a program name to look for in the flow.data file. For instance, to make the above example work properly, you would include +pgm prog.inst on the final compile line:

	$ cc +P -o prog.opt +pgm prog.inst prog.c
	

Like the +df option, the +pgm option is passed directly to the linker.


Selecting an Optimization Level with PBO

When -P is specified, the code generator and linker perform profile-based optimizations on any I-SOM or regular object files found on the linker command line. In addition, optimizations will be performed according to the optimization level you specified with a compiler option when you instrumented the application. Briefly, the compiler optimization options are:

+O0

Minimal optimization. This is the default.

+O1

Basic block level optimization.

+O2

Full optimization within each procedure in a file. (Can also be invoked as -O.)

+O3

Full optimization across all procedures in an object file. Includes subprogram inlining.

+O4

Full optimization across entire application, performed at link time. (Invokes ld +Ofastaccess +Oprocelim.) Includes inlining across multiple files.


Note

The +O3 and +O4 options are incompatible with symbolic debugging. The only compiler optimization levels that allow for symbolic debugging are +O2 and lower.

For more detailed information on compiler optimization levels, see your compiler documentation.


PBO has the greatest impact when it is combined with level 2 or greater optimizations. For instance, this compile command combines level 2 optimization with PBO (note that the compiler options +O2 and -O are equivalent):

	$ cc -v -O +I -c prog.c
	/opt/langtools/lbin/cpp prog.c /var/tmp/ctm123
	/opt/ansic/lbin/ccom /var/tmp/ctm123 prog.o -O2 -I
	$ cc -v -O +I -o prog prog.o
	/usr/ccs/bin/ld /opt/langtools/lib/icrt0.o -u main -o prog \
	 prog.o -I -lc
	

The optimizations are performed along with instrumentation. However, profile-based optimizations are not performed until you compile later with +P:

	$ cc -v +P -o prog prog.o
	/usr/ccs/bin/ld /usr/ccs/lib/crt0.o -u main \
	     -o prog prog.o -P -lc
	

Using PBO to Optimize Shared Libraries

Beginning with the HP-UX 10.0 release, the -I linker option can be used with -b to build a shared library with instrumented code. Also, the -P, +df, and +pgm command-line options are compatible with the -b option.

To profile shared libraries, you must set the environment variable SHLIB_FLOW_DATA to the file that receives profile data. Unlike FLOW_DATA, SHLIB_FLOW_DATA has no default output file. If SHLIB_FLOW_DATA is not set, profile data is not collected. This allows you to activate or suspend the profiling of instrumented shared libraries.

Note that you can set SHLIB_FLOW_DATA to flow.data, which is the same file as the default setting for FLOW_DATA. But, again, profile data can be collected from shared libraries only if you explicitly set SHLIB_FLOW_DATA to some output file.

The following is an example for instrumenting, profiling, and optimizing a shared library:

$ cc +z +I -c -O libcode.c             //Create I-SOM files.

$ ld -b -I libcode.o -o mylib.inst.sl  //Create instrumented sl.

$ cc main.c mylib.inst.sl              //Create executable a.out file.

$ export SHLIB_FLOW_DATA=./flow.data   //Specify output file for profile data

$ a.out < input_file                //Run instrumented executable with 
                                       representative input data.

$ ld -b -P +pgm mylib.inst.sl \        //Perform PBO.
  libcode.o -o mylib.sl

Note that the name used in the database is the output pathname specified when the instrumented library is linked (mylib.inst.sl in the example above), regardless of how the library might be moved or renamed after it is created.


Using PBO with ld -r

Beginning with the HP-UX 10.0 release, you can take greater advantage of PBO on merged object files created with the -r linker option.

Briefly, ld -r combines multiple .o files into a single .o file. It is often used in large product builds to combine objects into more manageable units. It is also often used in combination with the linker -h option to hide symbols that may conflict with other subsystems in a large application. (See Hiding Symbols with -h for more information on ld -h.)

In HP-UX 10.0, the subspaces in the merged .o file produced by ld -r are relocatable, which allows for greater optimization.

The following is an example of using PBO with ld -r:

$ cc +I -c file1.c file2.c             //Create individual I-SOM files

$ ld -r -I -o reloc.o file1.o file2.o  //Build relocatable, merged file

$ cc +I -o a.out reloc.o               //Create instrumented executable file.

$ a.out < input_file                 //Run instrumented executable
                                       with representative input data.
	
$ ld -r -P +pgm a.out -o reloc.o \
	  file1.o file2.o            //Rebuild relocatable file for PBO.

$ cc +P -o a.out reloc.o               //Perform PBO on the final executable
                                       file.

Notice, in the example above, that the +pgm option was necessary because the output file name differs from the instrumented program file name.


Note

If you are using -r and C++ templates, check "Known Limitations" in the HP C++ Release Notes for possible limitations.



Restrictions and Limitations of PBO

This section describes restrictions and limitations you must be aware of when using Profile-Based Optimization.


Note

PBO calls malloc() during the instrumentation (+I) phase. If you replace libc malloc(3C) calls with your own version of malloc(), use the same parameter list (data types, order, number, and meaning of parameters) as the HP version. (For information on malloc(), see malloc(3C).)


Temporary Files

The linker does not modify I-SOM files. Rather, it compiles, instruments, and optimizes the code, placing the resulting temporary object file in a directory specified by the TMPDIR environment variable. If PBO fails due to inadequate disk space, try freeing up space on the disk that contains the $TMPDIR directory. Or, set TMPDIR to a directory on a disk with more free space.

Source Code Changes and PBO

To avoid the potential problems described later in this section, use PBO only during the final stages of application development and performance tuning, when source code changes are least likely to be made. Whenever possible, re-profile an application after source code changes have been made.

What happens if you attempt to optimize a program using profile data that is older than the source files? For example, this can occur if you change source code and recompile with +P, but don't gather new profile data by re-instrumenting the code.

In that sequence of events, optimizations are still performed. However, full profile-based optimizations will be performed only on those procedures whose internal structure has not changed since the profile data was gathered. For procedures whose structure has changed, the following warning message is generated:

	ucomp warning: Code for name changed since profile
	database file flow.data built.  Profile data for name
	ignored.  Consider rebuilding flow.data.

Note that it is possible to make a source code change that does not affect the control flow structure of a procedure, but which does significantly affect the profiling data generated for the program. In other words, a very small source code change can dramatically affect the paths through the program that are most likely to be taken. For example, changing the value of a program constant that is used as a parameter or loop limit value may have this effect. If the user does not re-profile the application after making source code changes, the profile data in the database does not reflect the effects of those changes. Consequently, the transformations made by the optimizer can degrade the performance of the application.

Profile-Based Optimization (PBO) and High-Level Optimization (HLO)

High-level optimization, or HLO, consists of a number of optimizations, including inlining, that are automatically invoked with the +O3 and +O4 compiler options. (Inlining is an optimization that replaces each call to a routine with a copy of the routine's actual code.) +O3 performs HLO on each module while +O4 performs HLO over the entire program and removes unnecessary ADDIL instructions. Since HLO distorts profile data, it is suppressed during the instrumentation phases of PBO.

When +I is specified along with +O3 or +O4, an I-SOM file is generated. However, HLO is not performed during I-SOM generation. When the I-SOM file is linked, using the +P option to do PBO, HLO is performed, taking advantage of the profile data.

Example

The following example illustrates high-level optimization with PBO:

	
$ cc +I +O3 -c file.c  //Create I-SOM for instrumentation.
	
$ cc +I +O3 file.o     //Link with instrumentation.
	
$ a.out < input_file   //Run instrumented executable with 
                       representative input data.
	
$ cc +P +O3 file.o     //Perform PBO and HLO.

Replace +O3 with +O4 in the above example to get HLO over the entire program and ADDIL elimination. (You may see a warning when using +O4 at instrumentation indicating that the +O4 option is being ignored. You can ignore this warning.)

I-SOM File Restrictions

For the most part, there are not many noticeable differences between I-SOM files and ordinary object files. Exceptions are noted below.

ld

Linking object files compiled with the +I or +P option takes much longer than linking ordinary object files, because in addition to the work that the linker already does, the code generator must be run on the intermediate code in the I-SOM files. On the other hand, compiling a file with +I or +P is relatively fast, because code generation is delayed until link time.

All options to ld work normally with I-SOM files with the following exceptions:

-r

The -r option works with both -I and -P. However, it produces an object file, not an I-SOM file. In 64-bit mode, use -I, -P, or the +nosectionmerge option on a -r linker command to allow procedures to be positioned independently. Without these options, a -r link merges procedures into a single section.

-s

Do not use this option with -I. However, there is no problem using this option with -P.

-G

Do not use this option with -I. However, there is no problem using this option with -P.

-A

Do not use this option with -I or -P.

-N

Do not use this option with -I or -P.

nm

The nm command works on I-SOM files. However, because code generation has not yet been performed, some of the imported symbols that would appear in an ordinary relocatable object file do not appear in an I-SOM file.

ar

I-SOM files can be manipulated with ar in exactly the same way that ordinary relocatable files can be.

size

To determine if an object file is an I-SOM file, use the size(1) command. I-SOM files have zero listed for the size of all the sections (text, data, and bss (uninitialized data)):

	$ size foo.o
	0 + 0 + 0 = 0
	
strip

Do not run strip on files compiled with +I or +P. Doing so results in an object file that is essentially empty.

Compiler Options

Except as noted below, all cc, CC, and f77 compiler options work as expected when specified with +I or +P:

-g

This option is incompatible with +I and +P.

-G

This option is incompatible with +I, but compatible with +P (as long as the insertion of the gprof library calls does not affect the control flow graph structure of the procedures).

-p

This option is incompatible with the +I option, but is compatible with +P (as long as the insertion of the prof code does not affect the control flow graph structure of the procedures).

-s

You must not use this option together with +I. Doing so results in an object file that is essentially empty.

-S

This option is incompatible with the +I and +P options because assembly code is not generated by the compiler in these situations. Currently, it is not possible to get assembly code listings of code generated by +I and +P.

-y/+y

The same restrictions that were noted for -g above apply to these options.

+o

This option is incompatible with +I and +P. Currently, you cannot get code offset listings for code generated by +I and +P.


Compatibility with 9.0 PBO

PBO is largely compatible between the 9.0 and 10.0 releases of HP-UX.

I-SOM files created under 9.0 are completely acceptable in the 10.0 environment.

However, it is advantageous to re-profile programs under 10.0 in order to achieve improved optimization. Although you can use profile data in flow.data files created under 9.0, the resulting optimization will not take advantage of 10.0 enhancements. In addition, a warning is generated stating that the profile data is from a previous release. See the section called Profiling in this chapter for more information.


Incremental Linking

In the edit-compile-link-debug development cycle, link time is a significant component. The incremental linker can reduce the link time by taking advantage of the fact that you can reuse most of the previous version of the program and that the unchanged object files need not be processed. The incremental linker allows you to insert object code into an output file (executable or shared library) that you created earlier, without relinking the unmodified object files. The time required to relink after the initial incremental link depends on the number of modules you modify.

You can debug the resulting executable or shared library produced by the incremental linker using the gdb debugger with incremental-linking support.

The linker performs the following different modes of linking:

  • Normal link: the default operation mode in which the linker links all modules.

  • Initial incremental link: the mode entered when you request an incremental link, but the output module created by the incremental linker does not exist, or it exists but the incremental linker is unable to perform an incremental update.

  • Incremental link: the mode entered when you request an incremental link, an output module created by the incremental linker exists, and the incremental linker does not require an initial incremental link.

Incremental links are usually much faster than regular links. On the initial link, the incremental linker requires about the same amount of time as a normal link, but subsequent incremental links can be much faster. After a change to one object file in a moderate-size link (tens of files, several megabytes total), an incremental relink is normally about 10 times faster than a regular ld link. The incremental linker performs as many incremental links as allocated padding space and other constraints permit. The cost of the reduced link time is an increase in the size of the executable or shared library.

The incremental linker allocates padding space for all components of the program. Padding makes modules larger than those modules linked by ld. As object files increase in size during successive incremental links, the incremental linker can exhaust the available padding. If this occurs, it displays a warning message and does a complete initial incremental link of the module. When an object file changes, the incremental linker not only replaces the content of that file in the executable or shared library being linked, but also adjusts references to all symbols defined in the object file and referenced by other objects. This is done by looking at relocation records saved in the incrementally linked executable or shared library.

On the initial incremental link, the linker processes the input object files and libraries in the same way as a normal link. In addition to the normal linking process, the incremental linker performs these additional actions:

  • Saves information about all the object files processed during the linking process.

  • Saves information about all global symbols for recreating the linker symbol table.

  • Saves relocations to keep track of symbolic references.

  • Pads text, data, bss, and other sections in the output file with additional space for future expansion.

On subsequent incremental links, the linker uses timestamps and file sizes to determine which object files have changed. It then performs the following actions:

  • Removes the old versions of the modified object files from the output file, vacating their space.

  • Adds the contents of the modified object files into the space vacated, or into the padding space when needed.

  • Uses the saved relocation information to patch the symbolic references in the rest of the output file.

Under certain conditions, the incremental linker cannot perform incremental links. When this occurs, the incremental linker automatically performs an initial incremental link to restore the process. In the following situations, the linker automatically performs an initial incremental link of the output file:

  • Changed linker command line, where the command line does not match the one stored in the output file (with the exception of the verbose and tracing options).

  • Exhausted padding, where any of the padding spaces has been used up.

  • Stripped modules, where modules have been modified by the ld -s or ld -x options or by tools such as strip(1). Incremental linking requires the parts of the output load module that these options strip out.

  • Incompatible incremental linker version, when you run a new version of the incremental linker on an executable created by an older version.

  • New working directory, where the current directory has changed since the previous link.

  • Archive or shared libraries added to or removed from the linker command line.

  • Objects added to or removed from the linker command line.


Using Incremental Linking Options

To use incremental linking from your HP C (cc) or HP aC++ (aCC) compiler, specify the +ild option on your compiler command line.

If the output file does not already exist or if it was created without the +ild option, the linker performs an initial incremental link. The output file produced is suitable for subsequent incremental links. The incremental link option is valid for both executable and shared library links. The +ild option is not valid for relocatable links, options (or tools) that strip the output module, and certain optimization options.

The incremental linker supports the +ildrelink option, which instructs it to ignore the existing output load module and perform an initial incremental link. In certain situations (for example, when internal padding space is exhausted), the incremental linker is forced to perform an initial incremental link. You can avoid such unexpected initial incremental links by periodically rebuilding the output file with the +ildrelink option.

The ld command supports additional options with +ild. The +ildnowarn option suppresses all incremental-linking-related warning messages. The +ildpad percentage option controls the amount of padding (as a percentage) the incremental linker allocates. You can pass these options from the compiler with the -Wl,arg1... option.
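For example, a sketch of a typical edit-and-relink cycle (file names are illustrative):

$ cc +ild -o prog f1.o f2.o f3.o       //Initial incremental link.
$ cc -c f2.c                           //Modify and recompile one module.
$ cc +ild -o prog f1.o f2.o f3.o       //Fast incremental relink.
$ cc +ildrelink -o prog f1.o f2.o f3.o //Periodic rebuild; restores padding.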

See ld(1) for more information.


Archive Library Processing

The incremental linker searches an archive library if there are unsatisfied symbols (unsats). It extracts all archive members that satisfy unsats and processes them as new object files. If an archive library is modified, the linker reverts to an initial incremental link.

An object file extracted from an archive library in the previous link remains in the output load module even if all references to symbols defined in the object file have been removed. The linker removes these object files when it performs the next initial incremental link.


Shared Library Processing

In an initial incremental link, the linker scans shared library symbol tables and resolves unsats the same way it does in a regular link. In incremental links, the linker does not process shared libraries and their symbol tables at all and does not report shared library unsats. The dynamic loader detects them at run time. If any of the shared libraries on the command line was modified, the linker reverts to an initial incremental link.


Performance

Performance of the incremental linker may suffer greatly if you change a high percentage of object files.

The incremental linker may not link small programs much faster, and the relative increase in size of the executable is greater than that for larger programs.

Generally, a linker must scan all shared libraries on a link line to determine all the unsats, even in incremental links, and this process would slow down incremental links. The incremental linker therefore does not scan shared libraries, leaving detection of shared library unsats to the dynamic loader.

It is not recommended that you use the incremental linker to create final production modules. Because it reserves additional padding space, modules created by the incremental linker are considerably larger than those created in regular links.


Note

Any program that modifies an executable (for example, strip(1)) may affect the ability of ld to perform an incremental link. When this happens, the incremental linker issues a message and performs an initial incremental link.

Third-party tools that work on object files may have unexpected results on modules produced by the incremental linker.


Reusing Compiled Object Files (PA-RISC)

You can improve compile time performance by using a feature that reuses compiled object files resulting from intermediate object code (I-SOM) generation, incrementally recompiling only those intermediate object code files changed since the prior build. Using the +Oreusedir=dir option, with C, C++, or the linker, enables this feature. The default behavior is no reuse of object files.

The +Oreusedir=dir option specifies a directory where the linker can save object files created from intermediate object files when using +O4 or profile-based optimization. When you compile with +I, +P, or +O4, the compiler generates intermediate code in the object file; otherwise, it generates regular object code. When you link, the linker first compiles the intermediate object code to regular object code, then links the object code. With this option you can reduce link time on subsequent links by avoiding recompilation of intermediate object files that have already been compiled to regular object code and have not changed.

The dir argument specifies the directory for the reuse repository, where the linker saves object files created from intermediate object files. It can be an absolute path name or a path relative to the directory in which the linker was invoked. If dir does not exist, the reuse mechanism creates it. If dir exists and is readable, the mechanism reuses the object files deposited there. If dir exists and is writable, the mechanism stores object files there for reuse whenever an I-SOM is compiled to a regular object file.
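For example, a sketch of the reuse cycle (the directory name and source files are illustrative):

$ cc -c +O4 +Oreusedir=./reuse part1.c part2.c       //Create I-SOM files.
$ cc -o prog +O4 +Oreusedir=./reuse part1.o part2.o  //Link; compiled objects saved in ./reuse.
$ cc -o prog +O4 +Oreusedir=./reuse part1.o part2.o  //Relink; unchanged I-SOMs are reused.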

The reuse repository can reside on a short file name filesystem, where file names are truncated to 14 characters. You can move or copy the reuse repository to another location, including another filesystem, and reuse the object files from the new location by compiling with the new dir argument to +Oreusedir. Moving the repository between filesystems with different file name length limits forces recompilation of any object file whose full name is longer than 14 characters.

When you change a source file or command-line options and recompile, a new intermediate object file is created and compiled to regular object code in the specified directory. The reuse mechanism does not remove the previous object file from the directory. Periodically remove the directory's contents, since old object files cannot be reused and are not automatically removed. You can also delete an object file in the reuse repository to force it to be recompiled from an I-SOM.

The reuse mechanism does not rebuild an object file when the flow.data file changes. You must delete the object files stored in the reuse repository if the profile changes significantly enough to degrade the run-time performance of your code.

Improving Performance with the Global Symbol Table

The global symbol table mechanism is a performance enhancement option. Enabling it creates a global symbol table that speeds up symbol lookup by eliminating the need to scan all loaded libraries to find a symbol. This is particularly effective for applications with large numbers of shared libraries. The mechanism is off by default.

The global symbol table is implemented using a hash table. Under this mechanism, whenever a library is loaded (either implicitly or by using dlopen() or shl_load()), the mechanism hashes the library's exports and places them into this table. When a library is unloaded, the mechanism looks up the library's exports in the table and removes them.

The hash table does not contain entries for symbols defined by shl_definesym(). User-defined symbols must therefore be handled separately. Enabling the mechanism causes the dynamic loader to use more memory and impacts the performance of the dlopen(), dlclose(), shl_load(), and shl_unload() API calls.

With the global symbol table, the dynamic loader may need to perform a large number of hashing operations to locate symbols. Computing the hash function can cost considerable time, especially when symbol names are very long, as in C++ programs. To speed up dld, the computation of hash values can be off-loaded to the linker.

Use the +gst family of options (+gst, +gstbuckets (PA-32 only), +gstsize, +nodynhash (PA-64 and IPF only), and +plabel_cache (PA-32 only)) to control the behavior of the global symbol table hash mechanism. See the ld(1) and chatr(1) manpages for information on these options.

With these options, you can tune the size of the hash table and the number of buckets per entry to balance performance and memory use. To maximize performance, tune the table size for an average chain length of one. To minimize memory use, at the expense of performance, tune the table size to minimize the number of empty entries. In general, use prime numbers for the table size. The defaults are a table size of 1103 and 3 buckets per entry.
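For example, a sketch of tuning with chatr (the values are illustrative; see chatr(1) for the exact syntax on your release):

$ chatr +gst enable a.out        //Enable the global symbol table mechanism.
$ chatr +gstsize 2111 a.out      //Prime table size larger than the 1103 default.
$ chatr +gstbuckets 1 a.out      //Buckets per entry (the default is 3).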

To get statistical information about hash table performance, set the environment variable _HP_DLDOPTS to contain the -symtab_stat option. This option provides a message for each library that contains the following information:

  • Operation (load/unload)

  • Name of library

  • Number of exports

  • Number of entries in table with no stored symbols

  • Average length of non-zero chains

  • Calculated performance of the hash table

  • Amount of memory used by the hash table
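For example (the program name is illustrative):

$ export _HP_DLDOPTS=-symtab_stat      //Bourne and Korn shells.
$ ./myapp                              //Statistics print for each library load/unload.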

Improving Performance with Function Symbol Aliasing

The +afs option supports function symbol aliasing. User programs often contain functions that exactly duplicate the functionality of optimized library functions under a different name, and these user-defined functions are usually called frequently. With the +afs option, you can gain significant performance by replacing all references to a user-defined function with references to a tuned library function at link time, optimizing these functions with just a relink.

The +afs func_sym_x=func_sym_y option instructs the linker to replace the function symbol func_sym_x with the alternate function symbol func_sym_y in shared library and executable file links.

Both functions must define the same number and type of parameters, and return a value of the same type. If they do not match, the results can be unpredictable, and the linker does not generate a warning message.

Example:

	$ ld  ... +afs func_sym1=func_sym2 ...
	

In the example, the linker replaces all references to the function symbol func_sym1 with references to func_sym2. The func_sym2 symbol must be a normal, unaliased symbol; it cannot appear on the left-hand side of "=" in another +afs option.

You can specify more than one function symbol alias on the command line with multiple option-symbol pairs. That is, each symbol pair you specify must be preceded by the +afs option.
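For example, assuming hypothetical user functions my_memcpy and my_strlen whose parameter and return types exactly match the library's memcpy and strlen:

	$ ld ... +afs my_memcpy=memcpy +afs my_strlen=strlen ...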

Improving Shared Library Start-Up Time with fastbind

The fastbind tool improves the start-up time of programs that use shared libraries. When fastbind is invoked, it caches relocation information inside the executable file. The next time the executable file runs, the dynamic loader uses this cached information to bind the executable instead of searching for symbols.

The syntax for fastbind is:

	fastbind  [-n] [-u] incomplete executable...
	

where:

-n

Removes fastbind data from the executable.

-u

Performs fastbind even when unresolved symbols are found. (By default, fastbind stops when it cannot resolve symbols.)


Using fastbind

You can create and delete fastbind information for an executable file after it has been linked with shared libraries. You can invoke fastbind from the linker or use the fastbind tool directly. You can set the _HP_DLDOPTS environment variable to find out if fastbind information is out-of-date and to turn off fastbind at run time.


Invoking the fastbind Tool

To invoke fastbind on an incomplete executable file, verify that the executable file is writable (because fastbind writes to the file), and then run fastbind:

	$ ls -l main
	-rwxrwxrwx   1 janet     191          28722 Feb 20 09:11 main
	$ fastbind main
	

The fastbind tool generates fastbind information for main and rewrites main to contain this information.


Invoking fastbind from the Linker

To invoke fastbind from ld, pass the request to the linker from your compiler by using the -Wl,+fb option. Example:

	$ ld -b convert.o volume.o -o libunits.sl  //Build the shared library.
	$ cc -Aa -Wl,+fb main.c -o main \          //Link main with the shared library
	  libunits.sl -lc                          //and perform fastbind.
	

The linker performs fastbind after it creates the executable file.


How to Tell if fastbind Information is Current

By default, when the dynamic loader finds that fastbind information is out-of-date, it silently reverts back to the standard method for binding symbols. To find out if an executable file has out-of-date fastbind information, set the _HP_DLDOPTS environment variable as follows:

	$ export _HP_DLDOPTS=-fbverbose
	$ main
	/usr/lib/dld.sl: Fastbind data is out of date
	

The dynamic loader provides a warning when the fastbind information is out-of-date.


Removing fastbind Information from a File

To remove fastbind information from a file, use the fastbind tool with the -n option. Example:

	$ fastbind -n main                //Remove fastbind information from main. 
	

Turning off fastbind at Run Time

To use the standard search method for binding symbols instead of the fastbind information in an executable file, set the _HP_DLDOPTS environment variable as follows:

	export _HP_DLDOPTS=-nofastbind    //Turns off fastbind at run time. 
	
For More Information

See the fastbind(1) man page.