HP-UX Linker and Libraries User's Guide
Hewlett-Packard
Writing and Generating Position-Independent Code

This chapter is useful mainly to programmers who want to write position-independent assembly language code, or who want to convert existing assembly language programs to be position independent. It is also of interest to compiler developers. This chapter assumes you have a good understanding of virtual memory concepts and memory management.

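This chapter discusses the following topics:

What is Relocatable Object Code?
What is Absolute Object Code?
What is Position-Independent Code?
Generating Position-Independent Code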


Note

This chapter applies only to PA-RISC 32-bit applications.

Throughout this chapter, examples of position-independent code (PIC) are shown in assembly code.

For the corresponding information for 64-bit mode, see 64-bit Runtime Architecture for PA-RISC 2.0 available at:

http://devresource.hp.com/drc/STK/docs/archive/pa64rt.pdf

For the corresponding information for IPF applications, see Itanium Software Conventions and Runtime Architecture available at:

http://devresource.hp.com/drc/resources/ia64rt-12-gen.pdf


What is Relocatable Object Code?

Relocatable object code is machine code that is generated by compilers and assemblers and stored in relocatable object files or .o files. A relocatable object file contains symbolic references to locations defined within the compilation unit as well as symbolic references to locations defined outside the compilation unit. The object file also contains relocation information. The linker uses this information to replace the symbolic references with actual addresses.

For example, if you write a program that references the external variable errno, the object code created by the compiler contains only a symbolic reference to errno because errno is not defined in your program. Only when the linker links this object code does the reference to errno change (relocate) to an absolute address in virtual memory.

If your program defines a global variable, the compiler assigns a relocatable address to that variable. The compiler also marks all references to that variable as relocatable. The linker replaces the references to the variable with the absolute address of the variable.
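
As an illustration, the following C fragment contains both kinds of reference described above. It is a minimal sketch: the names call_count and record_error are hypothetical, and on many current systems errno is a macro rather than a plain external variable, so treat it here only as a stand-in for any symbol defined outside the compilation unit.

```c
#include <errno.h>

/* Defined in this compilation unit: the compiler assigns it a
 * relocatable address and marks every reference as relocatable. */
int call_count = 0;

/* Hypothetical helper, named only for this illustration. */
int record_error(int code)
{
    errno = code;      /* symbolic reference to a symbol defined elsewhere */
    call_count++;      /* relocatable reference to a local definition */
    return errno;
}
```

In the resulting .o file, neither reference holds a final address; both are resolved when the linker builds the program file.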

What is Absolute Object Code?

Absolute object code is machine code that contains references to actual addresses within the program's address space. When the linker combines relocatable object files to build a program file (a.out file), it writes absolute object code into the file. Thus, when the program is executed, its routines and data reside at the addresses determined by the linker.

Note that absolute object code does not contain physical addresses. Physical addresses refer to exact locations in physical memory. Instead, absolute object code contains virtual addresses within a process's address space. These virtual addresses are mapped to physical addresses by the HP-UX virtual memory management system.

Program files contain absolute virtual addresses. Hence exec, the HP-UX program loader, must always load the code and data into the same location within a process's address space. Because this code always resides at the same location within the address space, and because it contains virtual addresses, it is not suitable for shared libraries, although it can be shared by several processes running the same program.

What is Position-Independent Code?

Position-independent code (PIC) is a form of absolute object code that does not contain any absolute addresses, and therefore does not depend on where it is loaded in the process's virtual address space. This is an important property for building shared libraries.

In order for the object code in a shared library to be fully shareable, it must not depend on its position in the virtual address space of any particular process. The object code in a shared library may be attached at different points in different processes, so it must work independent of being located at any particular position. Hence the term position-independent code.

Position independence is achieved by two mechanisms: First, PC-relative addressing is used wherever possible for branches within modules. Second, indirect addressing through a per-process linkage table is used for all accesses to global variables, or for inter-module procedure calls and other branches and literal accesses where PC-relative addressing cannot be used. Global variables must be accessed indirectly because they may be allocated in the main program's address space, and even the relative position of the global variables may vary from one process to another.

The HP-UX dynamic loader (see dld.sl(5)) and the virtual memory management system work together to find free space at which to attach position-independent code within a process's address space. The dynamic loader also resolves any virtual addresses that might exist in the library.

Calls to PIC routines are accomplished through a procedure linkage table (PLT), which is built by the linker. Similarly, references to data are accomplished through a data linkage table (DLT). Both tables reside in a process's data segment. The dynamic loader fills in these tables with the absolute virtual addresses of the routines and data in a shared library at run time (known as binding). Because of this, PIC can be loaded and executed anywhere that a process has free space.
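
The idea of reaching data through a per-process table can be modeled in C. The sketch below is not how the dynamic loader is implemented; the structure, slot numbers, and function names are all hypothetical and only show why code that goes through a table does not care where its data lives.

```c
/* Hypothetical slot numbers in a data linkage table (DLT). */
enum { DLT_COUNTER = 0, DLT_LIMIT = 1, DLT_SLOTS = 2 };

/* One table per process; each slot holds the address of a variable. */
struct linkage_table {
    int *dlt[DLT_SLOTS];
};

/* Stand-in for the binding step performed by the dynamic loader. */
void bind_data(struct linkage_table *lt, int *counter, int *limit)
{
    lt->dlt[DLT_COUNTER] = counter;
    lt->dlt[DLT_LIMIT] = limit;
}

/* "Library" code: it names only table slots, never an absolute address,
 * so the same code works wherever the variables happen to live. */
int bump_counter(const struct linkage_table *lt)
{
    int *counter = lt->dlt[DLT_COUNTER];
    int *limit = lt->dlt[DLT_LIMIT];

    if (*counter < *limit)
        (*counter)++;
    return *counter;
}
```

Two tables bound to different variables can be passed to the same bump_counter code, which mirrors two processes sharing one copy of the library text.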

On compilers that support PIC generation, the +z and +Z options cause the compiler to create PIC relocatable object code.

Generating Position-Independent Code

To be position-independent, the object code must restrict all references to code and data to either PC-relative or indirect references, where all indirect references are collected in a single linkage table that can be initialized on a per-process basis by dld.sl.

Register 19 (%r19) is the designated pointer to the linkage table. The linker generates stubs that ensure %r19 always points to the correct value for the target routine and that handle the inter-space calls needed to branch between shared libraries.

The linker generates an import stub for each external reference to a routine. The call to the routine is redirected to branch to the import stub, which obtains the target routine address and the new linkage table pointer value from the current linkage table; it then branches to an export stub for the target routine. In 32-bit mode, the linker generates an export stub for each externally visible routine in a shared library or program file. The export stub is responsible for trapping the return from the target routine in order to handle the inter-space call required between shared libraries and program files.


Note

The 64-bit mode linker does not require or support export stubs.


Shown below is the PIC code generated for import and export stubs. Note that this code is generated automatically by the linker. You do not have to generate the stubs yourself.

	;Import Stub (Incomplete Executable)
	X':  ADDIL  L'lt_ptr+ltoff,%dp   ; get procedure entry point
	     LDW    R'lt_ptr+ltoff(%r1),%r21
	     LDW    R'lt_ptr+ltoff+4(%r1),%r19  ; get new r19 value.
	     LDSID  (%r21),%r1
	     MTSP   %r1,%sr0
	     BE     0(%sr0,%r21)     ; branch to target
	     STW    %rp,-24(%sp)     ; save rp
	 
	;Import Stub (Shared Library)
	X':  ADDIL  L'ltoff,%r19    ; get procedure entry point
	     LDW    R'ltoff(%r1),%r21
	     LDW    R'ltoff+4(%r1),%r19   ; get new r19 value
	     LDSID  (%r21),%r1
	     MTSP   %r1,%sr0
	     BE     0(%sr0,%r21)     ; branch to target
	     STW    %rp,-24(%sp)     ; save rp
	 
	;Export Stub (Shared libs and Incomplete Executables)
	X':  BL,N   X,%rp ; trap the return
	     NOP
	     LDW    -24(%sp),%rp     ; restore the original rp
	     LDSID  (%rp),%r1
	     MTSP   %r1,%sr0
	     BE,N   0(%sr0,%rp) ; inter-space return
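
The data flow of an import stub can be sketched in C. This is a simplified model under stated assumptions: it ignores space registers and the export stub, the variable r19 merely stands in for the register of the same name, and plt_entry, call_through_plt, lib_table, and lib_add_base are hypothetical names.

```c
#include <stddef.h>

typedef int (*routine_fn)(int);

/* Two-word PLT entry, mirroring the two LDW instructions in the stubs:
 * the target's entry point and the target's linkage table pointer. */
struct plt_entry {
    routine_fn target;
    const int *ltp;
};

/* Models %r19, the linkage table pointer register. */
const int *r19 = NULL;

/* Models a call through an import stub, plus the caller's
 * restore of its own %r19 after the call returns. */
int call_through_plt(const struct plt_entry *entry, int arg)
{
    const int *saved_r19 = r19;   /* caller's copy (real code keeps it at %sp-32) */
    int result;

    r19 = entry->ltp;             /* stub loads the target's linkage table pointer */
    result = entry->target(arg);  /* stub branches to the target */
    r19 = saved_r19;              /* caller restores its own pointer */
    return result;
}

/* Hypothetical library routine that reads its data through r19. */
const int lib_table[1] = { 100 };

int lib_add_base(int x)
{
    return x + r19[0];
}
```

The target sees its own linkage table while it runs, and the caller's pointer is back in place once the call returns.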
	
For More Information:

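The remainder of this section describes how compilers generate PIC for the following addressing situations:

Long Calls
Long Branches and Switch Tables
Assigned GOTO Statements
Literal References
Global and Static Variable References
Procedure Labels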

You can use these guidelines to write assembly language programs that generate PIC object code. For details on assembly language, see the Assembler Reference Manual and PA-RISC 2.0 Architecture.


PIC Requirements for Compilers and Assembly Code

All PIC routines must store the linkage table pointer register, %r19, at %sp-32. This can be done once on procedure entry. A PIC routine must also restore %r19 on return from each procedure call it makes, either from %sp-32 or from a callee-saves register. If the PIC routine makes several procedure calls, it copies %r19 into a callee-saves register as well, which avoids a memory reference each time %r19 is restored. As with %r27 (%dp), the compilers treat %r19 as a reserved register whenever PIC mode is in effect.

In general, references to code are handled by the linker, and the compilers act differently only in the few cases where they would have generated long calls or long branches. References to data, however, need a new fixup request to identify indirect references through the linkage table, and the code generated changes slightly.


Note

Any code that is PIC or that makes calls to PIC must follow the standard procedure call mechanism.


When linking files produced by the assembler, the linker exports only those assembly language routines that have been explicitly exported as entry (that is, symbols of type ST_ENTRY). Compiler-generated assembly code does not explicitly export routines with the entry type specified. So, the assembly language programmer must ensure that this is done with the .EXPORT pseudo-op.

For example, in assembly language, a symbol is exported using

	 .EXPORT foo, type
	

where type can be code, data, entry, and others. To ensure that foo is exported from a shared library, the assembly statement must be:

	.EXPORT foo,entry
	

Long Calls

Normally, the compilers generate a single-instruction call sequence using the BL instruction. The compilers can be forced to generate a long call sequence when the module is so large that the BL is not guaranteed to reach the beginning of the subspace. In the latter case, the linker can insert a stub. The existing long call sequence is three instructions, using an absolute target address:

	     LDIL    L'target,%r1
	     BLE     R'target(%sr4,%r1)
	     COPY    %r31,%rp            ; BLE leaves the return address in r31
	

When the PIC option is in effect, the compilers must generate the following instruction sequence, which is PC-relative:

	      BL      .+8,%rp                    ; get pc into rp
	      ADDIL   L'target - $L0 + 4, %rp   ; add pc-rel offset to rp
	      LDO     R'target - $L1 + 8(%r1), %r1
	$L0:  LDSID   (%r1), %r31
	$L1:  MTSP    %r31, %sr0
	      BLE     0(%sr0,%r1)
	      COPY    %r31,%rp
	

Long Branches and Switch Tables

Long branches are similar to long calls, but are only two instructions because the return pointer is not needed:

	     LDIL    L'target,%r1
	     BE      R'target(%sr4,%r1)
	

For PIC, these two instructions must be transformed into four instructions, similar to the long call sequence:

	     BL      .+8,%r1              ; get pc into r1
	     ADDIL   L'target-L,%r1       ; add pc-relative offset
	L:   LDO     R'target-L(%r1),%r1  ; add pc-relative offset
	     BV,N    0(%r1)               ; and branch
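
The arithmetic behind the ADDIL and LDO pair can be checked with a short C model. It is a sketch under stated assumptions: L' and R' are treated as a plain split into the high-order 21 bits and low-order 11 bits, the privilege-level bits that BL leaves in the low-order bits of the link register are ignored, and all function names are hypothetical.

```c
#include <stdint.h>

/* L' selector: the high-order 21 bits (low 11 bits cleared). */
uint32_t left21(uint32_t v)  { return v & ~UINT32_C(0x7FF); }

/* R' selector: the low-order 11 bits. */
uint32_t right11(uint32_t v) { return v & UINT32_C(0x7FF); }

/* Rebuilds the branch target from the address of label L, as the
 * ADDIL/LDO pair does. label_addr stands for the value BL leaves in %r1. */
uint32_t pic_branch_target(uint32_t label_addr, uint32_t target_addr)
{
    uint32_t offset = target_addr - label_addr;   /* target-L, fixed at link time */
    uint32_t r1 = label_addr + left21(offset);    /* ADDIL L'target-L,%r1 */
    return r1 + right11(offset);                  /* LDO R'target-L(%r1),%r1 */
}

/* Same code loaded load_delta bytes higher: only the BL result
 * changes; the link-time offset stays the same. */
uint32_t relocated_target(uint32_t label_addr, uint32_t target_addr,
                          uint32_t load_delta)
{
    uint32_t offset = target_addr - label_addr;
    return (label_addr + load_delta) + left21(offset) + right11(offset);
}
```

Because the only run-time input is the address obtained from BL, moving the code moves the computed target by the same amount, which is what makes the sequence position independent.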
	

The only problem with this sequence occurs when the long branch is in a switch table, where each switch table entry is restricted to two words. A long branch within a switch table must allocate a linkage table entry and make an indirect branch:

	     LDW     T'target(%r19),%r1  ; load LT entry
	     BV,N    0(%r1)             ; branch indirect
	

Here, the T' operator indicates a new fixup request supported by the linker for linkage table entries.


Assigned GOTO Statements

ASSIGN statements in FORTRAN must be converted to a PC-relative form. The existing sequence forms the absolute address in a register before storing it in the variable:

	     LDIL    L'target,tmp
	     LDO     R'target(tmp),tmp
	

This must be transformed into the following four-instruction sequence:

	     BL      .+8,tmp         ; get pc into tmp
	     DEPI    0,31,2,tmp      ; zero out low-order 2 bits
	L:   ADDIL   L'target-L,tmp  ; get pc-rel offset
	     LDO     R'target-L(%r1),tmp
	

Literal References

References to literals in the text space are handled exactly like ASSIGN statements (shown above). The LDO instruction can be replaced with LDW as appropriate.

An opportunity for optimization in both cases is to share a single label (L) throughout a procedure and let the result of BL become a common subexpression. Then only the first literal reference within a procedure expands to the full four-instruction sequence; each later reference needs only the two-instruction ADDIL and LDO (or LDW) pair.


Global and Static Variable References

References to global or static variables currently require two instructions either to form the address of a variable, or to load or store the contents of the variable:

	     ; to form the address of a variable
	     ADDIL   L'var-$global$+x,%dp
	     LDO     R'var-$global$+x(%r1),tmp
	     ; to load the contents of a variable
	     ADDIL   L'var-$global$+x,%dp
	     LDW     R'var-$global$+x(%r1),tmp
	

These sequences must be converted to equivalent sequences using the linkage table pointer in %r19:

	     ; to form the address of a variable
	     LDW     T'var(%r19),tmp1
	     LDO     x(tmp1),tmp2    ; omit if x == 0
	     ; to load the contents of a variable
	     LDW     T'var(%r19),tmp1
	     LDW     x(tmp1),tmp2
	

Note that the T' fixup on the LDW instruction allows for a 14-bit signed offset, which restricts the DLT to 16 KB. Because %r19 points to the middle of the DLT, both positive and negative offsets can be used. The T' fixup specifier must generate a DLT_REL fixup preceded by an FSEL override fixup. If the FSEL override fixup is not generated, the linker assumes that the fixup mode is LD/RD for DLT_REL fixups. A larger DLT requires a long form of the data reference. If the DLT grows beyond the 16 KB limit, the linker emits an error indicating that the user must recompile using the +Z option, which produces the following long-load sequences for data references:

	     ; form the address of a variable
	     ADDIL   LT'var,%r19
	     LDW     RT'var(%r1),tmp1
	     LDO     x(tmp1),tmp2    ; omit if x == 0
	     ; load the contents of a variable
	     ADDIL   LT'var,%r19
	     LDW     RT'var(%r1),tmp1
	     LDW     x(tmp1),tmp2
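
The 16 KB limit on the short form follows directly from the width of the displacement field. The C sketch below works through the numbers; the assumption that each DLT entry occupies one 4-byte word is mine and is made only for the entry count, and the function names are hypothetical.

```c
/* Range of a 14-bit signed displacement, in bytes. */
enum { DISP14_MIN = -8192, DISP14_MAX = 8191 };

/* Assumption for this sketch: each DLT entry is one 4-byte word. */
enum { DLT_ENTRY_BYTES = 4 };

/* Nonzero if a byte offset from %r19 is reachable by the short form. */
int fits_short_form(long byte_offset)
{
    return byte_offset >= DISP14_MIN && byte_offset <= DISP14_MAX;
}

/* Number of word-sized entries reachable with the short form. */
int short_form_entries(void)
{
    return (DISP14_MAX - DISP14_MIN + 1) / DLT_ENTRY_BYTES;
}
```

With %r19 in the middle of the table, half of the reachable entries sit at negative offsets and half at non-negative offsets; anything beyond that range needs the ADDIL-based long form.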
	

Procedure Labels

The compilers already mark procedure label constructs so that the linker can process them properly. No changes are needed to the compilers.

When building shared libraries and incomplete executables, the linker modifies the plabel calculation (produced by the compilers in both shared libraries and incomplete executables) to load the contents of a DLT entry, which is built for each symbol associated with a CODE_PLABEL fixup.

In shared libraries and incomplete executables, a plabel value is the address of a PLT entry for the target routine, rather than a procedure address. Hence, $$dyncall must be used when calling a routine with a procedure label. The linker sets the second-to-last bit in the procedure label to flag this as a special PLT procedure label. The $$dyncall routine checks this bit to determine which type of procedure label has been passed, and calls the target procedure accordingly.

In order to generate a procedure label that can be used for shared libraries and incomplete executables, assembly code must specify that a procedure address is being taken (and that a plabel is wanted) by using the P' assembler fixup mode. For example, to generate an assembly plabel, the following sequence must be used:

	LDIL LP'function,%r1
	LDO RP'function(%r1), %r22
	; Now to call the routine
	BL $$dyncall, %r31 ; r22 is the input register for $$dyncall
	COPY %r31, %r2
	

This code sequence generates the necessary PLABEL fixups that the linker needs in order to generate the proper procedure label. The $$dyncall millicode routine in /usr/lib/milli.a must be used to call a procedure using this type of procedure label (that is, a BL or BV does not work).
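
The flag test that $$dyncall performs on a procedure label can be modeled in C. This is only a sketch of the bit manipulation, not the millicode itself; the function names are hypothetical, and clearing the two low-order bits to recover the PLT entry address is an assumption made for illustration.

```c
#include <stdint.h>

/* The second-to-last bit of a 32-bit procedure label. */
#define PLABEL_PLT_FLAG UINT32_C(0x2)

/* Nonzero if the label refers to a PLT entry rather than a procedure address. */
int plabel_is_plt(uint32_t plabel)
{
    return (plabel & PLABEL_PLT_FLAG) != 0;
}

/* Assumption: the PLT entry address is the label with its two
 * low-order bits cleared. */
uint32_t plabel_plt_entry(uint32_t plabel)
{
    return plabel & ~UINT32_C(0x3);
}
```

A label with the flag set cannot be used as a branch target directly, which is why a plain BL or BV on such a value does not work.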