The SH FDPIC ABI

Joseph Myers
CodeSourcery, Inc.
May 11, 2010
Version 1.0

Based on FR-V FDPIC ABI Version 1.0a by Kevin Buettner, Alexandre Oliva
and Richard Henderson, adapted for SH by Joseph Myers.

Introduction
------------

This document describes extensions (and some minor changes) to the
existing SH ELF ABI (as used on GNU/Linux) required to support the
implementation of shared libraries on a system whose OS (and hardware)
require that processes share a common address space. This document
will also attempt to explore the motivations behind and the
implications of these extensions.

One of the primary goals in using shared libraries is to reduce the
memory requirements of the overall system. Thus, if two processes use
the same library, the hope is that at least some of the memory pages
will be shared between the two processes resulting in an overall
savings.

To realize these savings, tools used to build a program and library
must identify which sections may be shared and which must not be
shared. The shared sections, when grouped together, are commonly
referred to as the "text segment" whereas the non-shared (grouped)
sections are commonly referred to as the "data segment". The text
segment is read-only and is usually comprised of executable code and
read-only data. The data segment must be writable and it is this fact
which makes it non-sharable.

Systems which utilize disjoint address spaces for their processes are
free to group the text and data segments in such a way that they may
always be loaded with fixed relative positions of the text and data
segments. I.e., for a given load object, the offset from the start of
the text segment to the start of the data segment is constant. This
property greatly simplifies the design of the shared library
machinery. The design of the shared library mechanism described in
this document does not (and cannot) have this property.
Due to the fact that all processes share a common address space, the
text and data segments will be placed at arbitrary locations relative
to each other and will therefore need a mechanism whereby executable
code will always be able to find its corresponding data. One of the
CPU's registers is typically dedicated to hold the base address of the
data segment. This register will be called the "FDPIC register" in
this document. Such a register is sometimes used in systems with
disjoint address spaces too, but this is for efficiency rather than
necessity.

The fact that the locations of the text and data segments are at
non-constant offsets with respect to each other also complicates
function pointer representation. As noted above, executable code must
be able to find its corresponding data segment. When making an
indirect function call, it is therefore important that both the
address of the function and the base address of the data segment are
available. This means that a function pointer needs to be represented
as the address of a "function descriptor" which contains the address
of the actual code to execute as well as the corresponding data (FDPIC
register) address.

FDPIC Register
--------------

The FDPIC register is used as a base register for accessing the global
offset table (GOT) and function descriptors. Since both code and data
are relocatable, executable code may not contain any instruction
sequences which directly encode a pointer's value. Instead, pointers
to global data are indirectly referenced via the global offset table.
At load time, pointers contained in the global offset table are
relocated by the dynamic linker to point at the correct locations.

Register R12 is used as the FDPIC register; in this specification it
is caller-save, not callee-save, to avoid problems with PLT entries
needing to save the register.

Upon entry to a function, the caller-saved register R12 is the FDPIC
register. As described above, it contains the GOT address for that
function.
R12 obtains its value in one of three ways:

    1) By being inherited from the calling function in the case of a
       direct call to a function within the same load module.

    2) By being set either in a PLT entry or in inlined PLT code.

    3) By being set from a function descriptor as part of an indirect
       call.

The specifics associated with each of these cases are covered in
greater detail in "Procedure Linkage Table (PLT)" and "Function
Calls", below.

The prologue code of a non-leaf function should save R12 either on the
stack or in one of the callee-saved registers. After each function
call, R12 must be restored if it is needed later on in the function.
Direct calls to functions in the same load module and direct calls
which are routed through a PLT entry require that R12 be restored.
Calls which use inlined PLT code and indirect calls may be able to
avoid using R12; such calls will need to use some other register in
which the GOT address has been saved, however. A leaf function makes
no calls and need not save R12.

Note that once a function has moved R12 to one of its callee saved
registers, the function is then free to use that register as the FDPIC
register for accessing data. This is why the sections describing
relocations are careful to specify FDPIC-relative references instead
of R12-relative references.

It's envisioned (though not mandated) that the GOT entries are located
at positive FDPIC-based offsets and that function descriptors are
found at negative offsets to FDPIC.

Function Descriptors
--------------------

A number of programs assume that pointers to functions are as wide as
pointers to data, even though programming languages don't require
this. However, two words are needed to represent a function pointer
meaningfully: not only is the function's entry point required, but
also some context information that enables the function to find the
corresponding data segment in the current process.
Such context information is given in the form of a pointer to the GOT
in FDPIC (which is R12 upon entry to a function).

In order to keep pointers to functions as 32-bit values, while adding
context information to them, we introduce function descriptors, such
that, when the address of a function is taken, the address of its
descriptor is obtained. As shown below, the descriptor contains
pointers to both the function's entry point and its GOT.

A load module will also likely contain a number of private function
descriptors which are used in conjunction with a corresponding PLT
entry (or inlined PLT code) for calling a function.

A function descriptor consists of two 4-byte words:

    1) The "entry point" at offset 0 contains the text address of the
       function. This is the address at which to start executing the
       function.

    2) The "GOT address" at offset 4 contains the value to which the
       FDPIC register must be set when executing the function.

Each direct function call requiring a PLT entry (or which uses inlined
PLT code) requires a function descriptor stored in the data segment.
Each private function descriptor needs to be initialized using a
64-bit relocation which fills in both the function entry point and GOT
address. The R_SH_FUNCDESC_VALUE relocation is used for this purpose.

Function Addresses
------------------

When a function address is required, the address of an "official" (or
canonical) function descriptor is used. Descriptors corresponding to
static, non-overridable functions are allocated by the link editor and
are initialized at load time via the R_SH_FUNCDESC_VALUE relocation.
The dynamic linker is responsible for allocating and initializing all
other "official" function descriptors.

As described above, a function's address is actually the address of a
function descriptor, not that of the function's entry point. As is the
case with other kinds of pointers, executable code obtains the values
of pointer constants via the global offset table.
The R_SH_FUNCDESC relocation (see below) is used in global offset
table entries and initialized data to obtain the addresses of function
descriptors used for representing function addresses.

Note: This document borrows many of the concepts and terminology
related to function addresses and their descriptors from the IA-64
System V ABI [1, 2].

Procedure Linkage Table (PLT)
-----------------------------

In order to make direct calls to a function external to a given load
module, the branch instruction's target is a PLT entry. (Calls to
internal, but overridable functions also need PLT entries.) The PLT
entry contains instructions for fetching the function's start address
and global pointer value from a function descriptor associated with
the function in question. The function descriptor will be located at a
fixed offset from the address specified by the FDPIC register.

The instructions in a PLT entry could look like this:

    plt(foo):
        mov.l   .L1, r0
        mov.l   @(r0,r12), r1
        add     #4,r0
        jmp     @r1
        mov.l   @(r0,r12), r12
        nop
    .L1:
        .long   foo@GOTOFFFUNCDESC

The function address is loaded into r1, then the function is called,
with the new FDPIC register value being loaded in the branch delay
slot. If the caller needs its FDPIC register value again, it must save
and restore it around the call.

On the SH-2A, the movi20 instruction supports 20-bit sign-extended
immediate operands, and so the following may be used:

    plt(foo):
        movi20  #foo@GOTOFFFUNCDESC, r0
        mov.l   @(r0,r12), r1
        add     #4,r0
        jmp     @r1
        mov.l   @(r0,r12), r12

In order to accomplish "lazy dynamic linking" (see below), r1 must be
set to the entry point address found in the function descriptor. The
PLT entry must load the entry address from the descriptor before
loading the FDPIC register, to avoid a race condition (see below).

Dynamic Linker Reserve Area
---------------------------

The linker reserves three words starting at the location pointed to by
the FDPIC register for use by the dynamic linker.
The first two words comprise a function descriptor for invoking the
resolver used in lazy dynamic linking. The third (at R12+8) is used by
the dynamic linker and the debugger to obtain access to information
regarding the loaded module and the amount that each segment has been
relocated by.

Lazy Procedure Linkage
----------------------

Lazy procedure linkage requires an additional PLT fragment for each
dynamic function that requires a local descriptor in the module. These
entries are not large, but their aggregate will increase the size of
the text segment. For this reason, the use of lazy dynamic linking is
optional. (Implementation of lazy dynamic linking in the dynamic
linker is mandatory, however.)

A lazy PLT fragment looks like this:

        .long   funcdesc_value_reloc_offset(foo)
    lazy_plt(foo):
        bra     resolverStub
        nop

The code for "resolverStub" looks like this:

    resolverStub:
        mov.l   @r12, r0
        jmp     @r0
        mov.l   @(4,r12), r3

The link editor adds as many "resolverStub" fragments as necessary to
ensure that the branch in each lazy PLT fragment is within range.

It is also possible to inline the resolverStub instructions as
follows:

        .long   funcdesc_value_reloc_offset(foo)
    lazy_plt(foo):
        mov.l   @r12, r0
        jmp     @r0
        mov.l   @(4,r12), r3
        nop

Lazy PLT fragments have word (32-bit) alignment.

Function descriptors residing in the GOT are initialized so that the
entry point is that of the corresponding lazy PLT entry address. The
function descriptor's GOT address is initialized to the GOT address
for the load module itself. These initializations occur as the result
of the dynamic linker performing R_SH_FUNCDESC_VALUE relocations
(located in the .rel.plt section) at load time.

Thus a function call to an unresolved function will go through the
lazy PLT fragment for that function as a result of picking up the lazy
PLT entry point from the function descriptor.
The lazy PLT fragment immediately branches to "resolverStub", a
special PLT entry which uses the dynamic linker reserve area (see
above) to cause execution to be transferred to the actual resolver
without disturbing either R1 or R12. Branches always go to PLT
entries, not directly to the resolver stubs.

Upon entry to the actual (lazy) resolver, the following register
values are important:

    R0  -- the address of the resolver itself
    R3  -- the GOT address (FDPIC value) for the resolver's GOT
    R1  -- the address of the lazy PLT entry being resolved
    R12 -- the GOT address for the caller's GOT or sometimes for the
           called function's GOT (see below)

The resolver must take care not to modify the argument registers or
the callee-saved registers, or if it does, to restore them to their
original state when it's done.

The resolver uses the word at R1 - 4 (that is @(-4,r1) ) which is an
offset to a R_SH_FUNCDESC_VALUE relocation. This offset is relative to
the value (address) associated with the DT_JMPREL tag in the dynamic
section. (Tags related to DT_JMPREL are DT_PLTRELSZ and DT_PLTREL. The
value associated with DT_PLTRELSZ provides the size of this section.
The value associated with DT_PLTREL must be set to DT_RELA indicating
that Elf32_Rela structs are used to hold the relocation information.)
The R_SH_FUNCDESC_VALUE relocation provides the offset to the function
descriptor to update and the symbol table index of the function to
resolve.

Assuming the resolver completes successfully, it will perform the
following actions prior to transferring control to the entry point of
the resolved function:

    1) Fill in the function descriptor in the caller's GOT so that the
       entry point and GOT address are correct for the next call of
       the resolved function.

       As in the Blackfin FDPIC ABI, there is a race condition between
       both words getting written and some other thread attempting to
       read them, and no atomic 64 bit load/store instruction that
       could be used to prevent it. To avoid problems arising from
       this race, when function descriptors are read the entry point
       must be read before the FDPIC pointer, and when the resolver
       writes them it must write the new FDPIC pointer before writing
       the new entry point. This leaves the possibility of a lazy PLT
       entry (and so the resolver) being called with the FDPIC
       register pointing to the GOT for the load module containing the
       called function instead of the load module containing the call
       and GOT and PLT entries, if the call is made after the resolver
       was interrupted between updating the two words of the function
       descriptor; the resolver must allow for this possibility. In
       addition, the resolver may need to use locking to ensure that
       two different threads are not updating a function descriptor at
       the same time to point to functions in two different load
       modules.

    2) Set R12 to the GOT address of the resolvee's GOT.

Function Calls
--------------

Direct function calls are performed as follows:

    "set up arguments as on GNU/Linux with MMU"
    "load function address into a register"
    "call loaded address"
    "restore any needed "caller saves" registers"

The "call loaded address" pseudo-instruction will either transfer
control directly to the function's entry point (for calls to functions
in the same load module) or will transfer control to the function's
PLT entry if one is needed. Since PLT entries reference R12, a
function must ensure that R12 is set correctly prior to making a
function call.

Inlined PLT code may be able to make use of the FDPIC value stored in
another register - thus avoiding the need for setting R12. However, it
would significantly enlarge the code size.

Indirect calls are performed by loading the entry point and GOT
address from the function descriptor into R1 and R12, respectively.
The same atomicity issues apply as when these are loaded from a PLT
entry, so again the entry point address must be loaded first.
Control is transferred via a jsr or jsr/n instruction to the
function's entry point, possibly a lazy PLT fragment.

The call site for an indirect function call might look like this:

    "set up arguments as on GNU/Linux with MMU"
    "load function descriptor address into a register"
    "load entry point and GOT address from function descriptor
     into R1 and R12"
    "call loaded entry point"
    "restore any needed "caller saves" registers"

Global Data and the Global Offset Table (GOT)
---------------------------------------------

As noted earlier, position independent code must not contain any
instruction sequences which directly encode a reference to global
data. If they did so, load time relocations would be necessary to
adjust these addresses. Also, any reference to an address in a
non-shared segment would force the executable segment in question to
be non-sharable.

The global offset table (GOT) contains words which hold the addresses
of global data. In order to access these global data, position
independent code must first use an FDPIC-relative load instruction to
fetch the data address from the GOT. The data structure is then
accessed as necessary using the address obtained from the GOT.

It is envisioned that the various GOT related structures might look
something like this:

                 +-----------------------+ <--------------------\
                 |           .           |                      |
                 |           .           |                      |
                 |           .           |                      |
                 +-----------------------+                      |
                 |                       |                      |
                 +-    Func Descr #2    -+                      |
                 |                       |                      |
                 +-----------------------+                      |
                 |                       |                      |
                 +-    Func Descr #1    -+                      |
                 |                       |                      |
                 +-----------------------+ <---\                |
    FDPIC -----> |                       |     |                |
                 +- Resolver Descriptor -+  Dynamic Linker      |
                 |                       |  Reserve Area        |
                 +-----------------------+     |                |
                 |   link_map pointer    |     |                |
                 +-----------------------+ <---/              Global
                 |  Global Data Addr #1  |                    Offset
                 +-----------------------+                    Table
                 |  Global Data Addr #2  |                    (GOT)
                 +-----------------------+                      |
                 |  Global Data Addr #3  |                      |
                 +-----------------------+                      |
                 |           .           |                      |
                 |           .           |                      |
                 |           .           |                      |
                 +-----------------------+ <--------------------/

The link-editor is responsible for determining the precise layout of
the GOT. The only hard requirements are the following:

    (a) FDPIC must point at the first word of the dynamic linker
        reserve area.

    (b) The global offset table must reside in a non-shared segment.

In the picture above, function descriptors are placed at negative
offsets relative to R12 and the GOT data address entries are placed at
positive offsets relative to R12. The link editor is free to place
either the function descriptors at positive offsets or the data
address entries at negative offsets. Also, note that there is no
requirement that the function descriptors or data address entries have
any particular grouping.

GOT initialization is performed at load time by the dynamic linker. In
order to accomplish these initializations, the dynamic linker uses
relocations that have been placed in the object file by the link
editor. These relocations (as already defined for non-FDPIC) may cause
addresses of other global data in other load modules to be resolved or
the relocation may refer to data within the same load module. (For
function descriptors, the R_SH_FUNCDESC_VALUE relocation is used. This
relocation is described in greater detail below.)

Each load module has a symbol _GLOBAL_OFFSET_TABLE_ which resolves to
the GOT address for that load module. The DT_PLTGOT dynamic section
entry in each load module contains the GOT address also. The GOT
address points to the dynamic linker reserve area.
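
The three-word reserve area and the two-word function descriptor can be
pictured as C structures. The following is only a sketch for reading the
layout: the type and member names (fdpic_funcdesc_model,
fdpic_reserve_model, resolver, link_map) are hypothetical and are not
defined by this ABI, and uint32_t stands in for a 4-byte target word so
that the offsets can be checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a function descriptor: two 4-byte words. */
struct fdpic_funcdesc_model {
    uint32_t entry;   /* offset 0: text address of the function */
    uint32_t got;     /* offset 4: FDPIC register value for the function */
};

/* Hypothetical model of the dynamic linker reserve area, i.e. the
   three words starting at the address held in the FDPIC register. */
struct fdpic_reserve_model {
    struct fdpic_funcdesc_model resolver; /* FDPIC+0 and FDPIC+4 */
    uint32_t link_map;                    /* FDPIC+8: link_map pointer */
};

/* Compile-time checks that the model matches the offsets in the text. */
static_assert(offsetof(struct fdpic_funcdesc_model, got) == 4,
              "GOT address word sits at offset 4");
static_assert(offsetof(struct fdpic_reserve_model, link_map) == 8,
              "link_map pointer sits at FDPIC+8");
static_assert(sizeof(struct fdpic_reserve_model) == 12,
              "reserve area is three words");
```

Everything else in the GOT (further function descriptors and data
address words) is laid out at the link editor's discretion, so the
sketch deliberately models nothing beyond the reserve area.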
The simplest way to load the address of a data object, on all SH
variants, is:

        mov.l   .L1, r0
        mov.l   @(r0,r12), rN
    .L1:
        .long   foo@GOT

On SH-2A, the movi20 instruction may be used:

        movi20  #foo@GOT, r0
        mov.l   @(r0,r12), rN

If data symbol bar is known to be local to the translation unit, or to
have internal, hidden or protected (but not global) visibility,
different sequences can be used that assume the symbol to be located
at a fixed offset within the text or data segments. These sequences
avoid the need for a GOT entry for bar. If the symbol is known to be
in the .data section, the following sequence computes the address of
bar:

        mov.l   .L1, rN
        add     r12, rN
    .L1:
        .long   bar@GOTOFF

On SH-2A, the following may be used:

        movi20  #bar@GOTOFF, rN
        add     r12, rN

If the symbol is known to be in the .rodata section (that is mapped to
the text segment), PC-relative relocations have to be used instead.
The @PCREL assembler operator is defined for this purpose. For
example:

        mova    .L1, r0
        mov.l   .L1, rN
        add     r0, rN
    .L1:
        .long   foo@PCREL

Taking the address of a function descriptor can be accomplished with
the following sequences:

        mov.l   .L1, r0
        mov.l   @(r0,r12), rN
    .L1:
        .long   foo@GOTFUNCDESC

or on SH-2A:

        movi20  #foo@GOTFUNCDESC, r0
        mov.l   @(r0,r12), rN

If the function is local to a translation unit, or is known to have
internal or hidden (but not protected or global) visibility, the
canonical function descriptor of the function will be in the module,
so it is possible to avoid the need for a GOT entry containing the
address of the function descriptor, by using code sequences like:

        mov.l   .L1, rN
        add     r12, rN
    .L1:
        .long   foo@GOTOFFFUNCDESC

or on SH-2A:

        movi20  #foo@GOTOFFFUNCDESC, rN
        add     r12, rN

A global-scope variable initialized with a pointer to a function
causes code like this to be generated:

    bar:
        .long   foo@FUNCDESC

Variables initialized with pointers (to data or code) must not be
assigned to read-only segments; the dynamic linker will need to set up
the pointers at module load time.
Preexisting Relocation Types
----------------------------

The existing relocations implemented by the GNU linker may be used
with FDPIC code with their existing semantics, although some may not
be useful in this context. When an existing relocation is applied to a
function symbol, it is taken to refer to the function entry point
(possibly a PLT entry), not to a function descriptor.

New Relocations
---------------

The following are new relocation types for supporting position
independent code with function descriptors.

    Name                    Value   Meaning
    ----                    -----   -------
    R_SH_GOT20              201     Used for the FDPIC-relative offset
                                    to a GOT entry for a symbol in a
                                    movi20 instruction.
    R_SH_GOTOFF20           202     Used for the FDPIC-relative offset
                                    to a data object in a movi20
                                    instruction.
    R_SH_GOTFUNCDESC        203     Used for the FDPIC-relative offset
                                    to a GOT entry containing a pointer
                                    to a function descriptor for a
                                    symbol.
    R_SH_GOTFUNCDESC20      204     Likewise, in a movi20 instruction.
    R_SH_GOTOFFFUNCDESC     205     Used for the FDPIC-relative offset
                                    to the function descriptor itself.
    R_SH_GOTOFFFUNCDESC20   206     Likewise, in a movi20 instruction.
    R_SH_FUNCDESC           207     Used for a pointer to an "official"
                                    function descriptor, in both GOT
                                    entries and user-initialized data.
    R_SH_FUNCDESC_VALUE     208     Used to fill in function entry
                                    point and GOT address in private
                                    function descriptors

The dynamic loader needs to adjust or "fix up" portions of the data
segment due to it being dynamically located. The various dynamic
relocation entries tell the dynamic loader how to do this. The text
segment is dynamically located too, but it is read-only and must not
have any relocation entries associated with it.

New dynamic relocations have the following types: R_SH_FUNCDESC and
R_SH_FUNCDESC_VALUE. The precise interpretation given to these
relocation types by the dynamic linker is described in the following
paragraphs.
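
As a quick reference, the relocation numbers from the table can be
written down in C together with a check for the two dynamic types. This
is a sketch, not part of the ABI: the helper names (sh_r_info_type,
sh_r_info_sym, sh_fdpic_is_dynamic_reloc) are hypothetical, and the
split of "r_info" into a type in the low 8 bits and a symbol index in
the upper 24 bits is assumed from the generic 32-bit ELF encoding.

```c
#include <stdint.h>

/* Relocation numbers as listed in the table above. */
enum sh_fdpic_reloc {
    R_SH_GOT20            = 201,
    R_SH_GOTOFF20         = 202,
    R_SH_GOTFUNCDESC      = 203,
    R_SH_GOTFUNCDESC20    = 204,
    R_SH_GOTOFFFUNCDESC   = 205,
    R_SH_GOTOFFFUNCDESC20 = 206,
    R_SH_FUNCDESC         = 207,
    R_SH_FUNCDESC_VALUE   = 208
};

/* Assumed generic ELF32 encoding: relocation type in the low byte. */
uint32_t sh_r_info_type(uint32_t r_info)
{
    return r_info & 0xffu;
}

/* Assumed generic ELF32 encoding: symbol index in the upper 24 bits. */
uint32_t sh_r_info_sym(uint32_t r_info)
{
    return r_info >> 8;
}

/* Nonzero if r_info names one of the two new dynamic relocation
   types, i.e. one the dynamic linker has to process. */
int sh_fdpic_is_dynamic_reloc(uint32_t r_info)
{
    uint32_t type = sh_r_info_type(r_info);

    return type == R_SH_FUNCDESC || type == R_SH_FUNCDESC_VALUE;
}
```

The remaining six types are resolved by the link editor and so should
not reach the dynamic linker.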
R_SH_FUNCDESC
-------------

The R_SH_FUNCDESC relocation is used to obtain the address of an
"official" function descriptor from the dynamic linker.

The "r_offset" field contains the location (offset) of the word which
must receive this address. The "r_info" field contains an encoding of
the symbol table index corresponding to the function to resolve.

The dynamic linker resolves the function and determines the address of
the corresponding official descriptor, allocating and initializing it
as necessary. (It is the dynamic linker's responsibility to allocate
and initialize all official descriptors.) The address of the official
descriptor is written to the location specified by "r_offset".

Note: This relocation is always expected to reference symbols for
which the dynamic linker is expected to create an "official
descriptor". References to descriptors (for static or hidden
functions) which are allocated and initialized by the link editor are
handled via pre-existing relocations.

R_SH_FUNCDESC_VALUE
-------------------

The R_SH_FUNCDESC_VALUE relocation is used to initialize both words of
a function descriptor. The "r_offset" member (in an Elf32_Rel struct)
specifies the location of the descriptor to initialize. The "r_info"
member encodes both the number associated with the R_SH_FUNCDESC_VALUE
type and a symbol table index.

Support for lazy binding is accomplished by R_SH_FUNCDESC_VALUE
relocations residing in the .rel.plt section. The symbol index encoded
in "r_info" corresponds to the symbol to resolve. In the descriptor
itself, the link editor sets the low word to the address of the lazy
PLT entry which, when executed, will ultimately resolve the symbol.
The high word is set to the index of the segment containing the lazy
PLT code.
Relocations in .rel.plt are potentially processed twice, once at load
time to fix up the offset so that the function descriptor really
points at the lazy PLT entry, and possibly later on, as a result of
the code in the lazy PLT entry being run, forcing actual binding to be
done.

Note: The environment variable "LD_BIND_NOW" may be set to a non-null
value to force binding to occur at load time. When "LD_BIND_NOW" is
used for this purpose, the descriptor's contents are ignored, and the
relocations are only processed once.

R_SH_FUNCDESC_VALUE relocations found outside of .rel.plt are used
either for non-lazy binding support (forced at compile/link time) or
for static function descriptor initializations. These cases will be
considered separately.

Relocations used for resolving external functions (in a non-lazy
manner) have the symbol index encoded in "r_info" set to correspond to
the symbol to resolve. The descriptor contents are irrelevant and are
ignored. The function corresponding to the symbol index is resolved
and the entry point and GOT address for that function are written to
the descriptor.

The R_SH_FUNCDESC_VALUE relocation is also used to initialize function
descriptors used as addresses for static, non-overridable functions.
When used for this purpose, the "r_info" member encodes the symbol
table index for the section in which the function is found. The low
word of the descriptor contains the offset to the function and the
high word contains the segment index.

The segment index can be used to speed up the computation of the
address of the symbol, if the dynamic linker maintains internally an
array that maps a segment number to the offset by which it was
relocated. Such a map is not required, though, and the dynamic linker
is free to ignore segment index information.

Assembler operators
-------------------

Below is a list of additional operators for writing assembly code.
The existing @GOT and @GOTOFF operators are also extended to be usable
in movi20 instructions.

    Name                Corresponding relocations
    ----                -------------------------
    @PCREL              R_SH_REL32
    @GOTFUNCDESC        R_SH_GOTFUNCDESC, R_SH_GOTFUNCDESC20
    @GOTOFFFUNCDESC     R_SH_GOTOFFFUNCDESC, R_SH_GOTOFFFUNCDESC20
    @FUNCDESC           R_SH_FUNCDESC

ELF Header
----------

The SH processor specific flag for the "e_flags" field in the ELF
header which indicates the use of this ABI is EF_SH_FDPIC. The value
for this flag is 0x00008000.

A flag EF_SH_PIC, value 0x00000100, is also defined. When both
EF_SH_FDPIC and EF_SH_PIC are set, it means each segment of the binary
can be loaded at an arbitrary address, which means sharing of text
segments is possible. If EF_SH_FDPIC is set but EF_SH_PIC is clear,
all segments must be relocated by the same amount. The linker should
warn and clear EF_SH_PIC when linking FDPIC binaries if it finds any
inter-segment relocation, and set it otherwise. Examples of
inter-segment relocations are a PC-relative relocation referencing a
symbol that is not in the text segment, or a GOTOFF relocation
referencing a symbol that is not in the data segment.

Start up
--------

At the program's entry point, the stack pointer must be set to an
address close to the end of the stack segment. The size of the stack
segment is specified by the PT_GNU_STACK program header, and is
derived from the value of the symbol __stacksize, which can be defined
to an absolute value when linking a program. The default stack size is
128 KB. Starting at the address pointed to by sp, the program should
be able to find its arguments, environment variables, and auxiliary
vector table and load maps. Here's what the stack looks like:

    sp:             argc
    sp+4:           argv[0]
    ...
    sp+4*argc:      argv[argc-1]
    sp+4+4*argc:    NULL
    sp+8+4*argc:    envp[0]
    ...
                    NULL

The NULL terminator of envp is immediately followed by the Auxiliary
Vector Table.
Each entry is a pair of words, the first being an entry type, the
second being either an integer value or a pointer. An entry type of
value zero (AT_NULL) marks the end of the auxiliary vector.

Load maps go somewhere on the stack. They use the following data
structures:

    /* This data structure represents a PT_LOAD segment. */
    struct elf32_fdpic_loadseg
    {
      /* Core address to which the segment is mapped. */
      Elf32_Addr addr;
      /* VMA recorded in the program header. */
      Elf32_Addr p_vaddr;
      /* Size of this segment in memory. */
      Elf32_Word p_memsz;
    };

    struct elf32_fdpic_loadmap
    {
      /* Protocol version number, must be zero. */
      Elf32_Half version;
      /* Number of segments in this map. */
      Elf32_Half nsegs;
      /* The actual memory map. */
      struct elf32_fdpic_loadseg segs[/*nsegs*/];
    };

At program start-up, register r8 should hold a pointer to a struct
elf32_fdpic_loadmap that describes where the kernel mapped each of the
PT_LOAD segments of the executable. At start-up of an interpreter for
another program (e.g., ld.so), r9 will be set to the load map of the
interpreter, and r10 will be set to a pointer to the PT_DYNAMIC
section of the interpreter, if it was mapped as part of any loadable
segment, or 0 otherwise. In the absence of an interpreter, r9 will be
0, and r10 will be the main program's PT_DYNAMIC address.

All other callee-saved registers are supposed to be initialized to 0
by the kernel before it transfers control to userland, but
applications shouldn't rely on this (except for r11, see below) since
future extensions of the ABI may assign other meanings to these
registers. Caller-saved registers have indeterminate value.

Both static and dynamic executables are responsible for
self-relocating and initializing the PIC register. Self-relocation is
accomplished by adjusting, according to the link map stored in r8,
every pointer in the range [__ROFIXUP_LIST__,__ROFIXUP_END__-4).
The addresses of __ROFIXUP_LIST__ and __ROFIXUP_END__ can be computed
by means of PC-relative addressing, since they are known to be in the
text segment. The pointers in the .rofixup section are created by the
linker; FDPIC object files should not contain .rofixup sections. The
linker emits rofixup entries in static or dynamic executables that are
not linked with -pie wherever it would emit a dynamic relocation in
PIEs or dynamic libraries.

The linker also emits, as the last entry of the .rofixup section, the
value of the _GLOBAL_OFFSET_TABLE_ symbol. The code that performs
self-relocation should not dereference this last entry to relocate its
contents; instead, it should simply compute the relocated value of the
entry itself, thus obtaining the PIC register value without using any
non-PIC or inter-segment relocation, that would force the executable
to relocate as a unit.

In case a dynamic loader is used, it may set r11 to the address of a
function descriptor that represents a function to be called at program
termination time. The dynamic loader, however, must not depend on this
function being called for proper termination.

The dynamic loader may change the stack pointer such that it is not
aligned to a double-word boundary, but rather to a single-word
boundary. It is recommended that every program's start up code adjusts
the stack pointer after obtaining the program arguments from the top
of the stack.

Chunks of code inserted in .init and .fini sections (_init and _fini
functions, respectively) must not assume r12 to hold the value of the
PIC register. _init and _fini prologues are expected to save the
initial r12 at @(fp,8), the initial r14 at @(fp,4) and the initial pr
at @(fp,0).
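
The address adjustment at the heart of self-relocation can be sketched
in C as a lookup in the load map. This is a host-side illustration
under stated assumptions, not the start-up code of any real C library:
the names loadseg_model, loadmap_model, fdpic_relocate and
fdpic_got_from_rofixup are hypothetical, fixed-width integers stand in
for the Elf32 types, the segment array is reached through a pointer
rather than a flexible array member, and a return value of 0 is chosen
here to mean "address not covered by any segment".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of struct elf32_fdpic_loadseg with fixed-width members. */
struct loadseg_model {
    uint32_t addr;     /* core address to which the segment is mapped */
    uint32_t p_vaddr;  /* VMA recorded in the program header */
    uint32_t p_memsz;  /* size of the segment in memory */
};

/* Model of struct elf32_fdpic_loadmap (pointer instead of a flexible
   array, to keep the sketch easy to initialize). */
struct loadmap_model {
    uint16_t version;  /* protocol version, must be zero */
    uint16_t nsegs;    /* number of entries in segs */
    const struct loadseg_model *segs;
};

/* Translate a link-time address into the address it has after
   loading.  Returns 0 if no segment covers vaddr (sketch convention). */
uint32_t fdpic_relocate(const struct loadmap_model *map, uint32_t vaddr)
{
    assert(map != NULL && map->version == 0);

    for (uint16_t i = 0; i < map->nsegs; i++) {
        const struct loadseg_model *seg = &map->segs[i];

        if (vaddr >= seg->p_vaddr && vaddr - seg->p_vaddr < seg->p_memsz)
            return seg->addr + (vaddr - seg->p_vaddr);
    }
    return 0;
}

/* The last rofixup entry holds the link-time value of
   _GLOBAL_OFFSET_TABLE_; relocating the entry itself, without
   dereferencing it, yields the PIC register value. */
uint32_t fdpic_got_from_rofixup(const struct loadmap_model *map,
                                const uint32_t *rofixup, size_t count)
{
    assert(count > 0);
    return fdpic_relocate(map, rofixup[count - 1]);
}
```

Real start-up code would additionally walk the entries before the last
one, relocate each to find the word it designates, and then relocate
the pointer stored in that word; that step is left out because it
needs access to the target's memory.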
Debugger Support - Overview
---------------------------

Debugger support is substantially different from what is normally done
on GNU/Linux for the following reasons:

    1) The usual method for finding the dynamic linker data structures
       won't work since the text and data area for the main program
       itself are dynamically located. Normally, the debugger is able
       to find the address of the executable's sections by looking in
       the executable itself. This, in turn allows the debugger to
       find the dynamic section in which it looks for the value of the
       DT_DEBUG tag. The DT_DEBUG value provides the debugger with the
       address of the r_debug struct which, in turn, provides access
       to the necessary relocation information for shared objects.
       But, since none of this will work, an alternate method must be
       found for locating the dynamic linker data structures.

    2) The debugger must relocate different sections by different
       amounts due to the fact that the text and data areas (and
       perhaps other sections too) are relocated independently. The
       dynamic linker's debug interface must allow the debugger to
       find out how much each section has been relocated by.

    3) It must be possible for the debugger to attach to a process at
       an arbitrary point of its execution.

    4) Text areas are truly shared among processes which means there
       must be some sort of kernel level support for breakpoints.

Debugger Support - Locating the Dynamic Linker's Data Structures
----------------------------------------------------------------

In a given process, for all possible values of FDPIC (which is in R12
at function entry time), the word at FDPIC+8 - which is in the dynamic
linker reserve area - contains a pointer to the dynamic linker's data
structures. This means that each data area for a shared library or the
main executable in a given process contains a pointer to dynamic
linker data structures describing the various load objects and their
relocations.
Unfortunately, R12 may not keep its value throughout the execution of a function. It may be overwritten and used for any other computation. If it's needed again, it can be copied to another register or to a stack slot. It might be possible for the debugger to locate the PIC value at such alternate locations by using call-frame debug information, but to do so, it would need the PC value as in the executable, not the relocated PC value in the memory location the kernel chose to map the text segment of the executable, or of any of the shared libraries it may have been linked with.

To enable a debugger to find where an executable is located in memory, the initial load maps that the kernel passes to the program in r8 and r9 are made available with ptrace calls, as described below:

  #define PTRACE_GETFDPIC         31          /* get the ELF fdpic loadmap address */
  #define PTRACE_GETFDPIC_EXEC    ((void*)0)  /* [addr] request the executable loadmap */
  #define PTRACE_GETFDPIC_INTERP  ((void*)1)  /* [addr] request the interpreter loadmap */

  struct elf32_fdpic_loadmap *x;
  ptrace (PTRACE_GETFDPIC, pid, PTRACE_GETFDPIC_EXEC /* or _INTERP */, &x);

With these maps plus the executable (and/or interpreter) symbol table, the debugger can locate the program's GOT in memory, and thus obtain the link_map doubly-linked list (see below), from which it can obtain the loadmaps of all loaded modules.

Obtaining r_debug requires the dynamic loader's link map and symbol tables only, to locate the _dl_debug_addr symbol defined in the dynamic loader. If there is no dynamic loader, or if it hasn't got to the point at which it sets up the main program's GOT reserve area, r_debug won't be available.

Debugger Support - Data structures
----------------------------------

The word at R12+8 is a pointer to a struct of the following form:

  struct link_map
  {
    /* These first few members are part of the protocol with the
       debugger.  This is the same format used in SVR4.  */

    struct elf32_fdpic_loadaddr l_addr;
    char *l_name;                       /* Absolute file name object was found in.  */
    ElfW(Dyn) *l_ld;                    /* Dynamic section of the shared object.  */
    struct link_map *l_next, *l_prev;   /* Chain of loaded objects.  */
  };

Where l_addr's type definition is:

  struct elf32_fdpic_loadaddr
  {
    struct elf32_fdpic_loadmap *map;
    void *got_value;
  };

(struct elf32_fdpic_loadaddr is the type of field dlpi_addr in struct dl_phdr_info as well)

_dl_debug_addr (a global symbol defined in the dynamic loader) is a pointer to the following type:

  struct r_debug
  {
    int r_version;            /* Version number for this protocol.  */

    struct link_map *r_map;   /* Head of the chain of loaded objects.  */

    /* This is the address of a function internal to the run-time linker,
       that will always be called when the linker begins to map in a
       library or unmap it, and again when the mapping change is complete.
       The debugger can set a breakpoint at this address if it wants to
       notice shared object mapping changes.  Being a pointer to a
       function, it is actually a pointer to a function descriptor.  */
    ElfW(Addr) r_brk;

    enum
      {
        /* This state value describes the mapping change taking place when
           the "r_brk" address is called.  */
        RT_CONSISTENT,   /* Mapping change is complete.  */
        RT_ADD,          /* Beginning to add a new object.  */
        RT_DELETE        /* Beginning to remove an object mapping.  */
      } r_state;

    ElfW(Addr) r_ldbase;      /* GOT pointer of the dynamic loader.  */
  };

The version number for this protocol will be 1.

Debugger Support - Finding GOT Addresses
----------------------------------------

The field "got_value" in the link_map struct provides the debugger with the GOT address for all functions in the load module described by that link_map entry.

Debugger Support - Breakpoint Considerations
--------------------------------------------

Debugger applications implement software breakpoints by causing a trap instruction to be written at the address at which a breakpoint is desired.
(The debugger will first fetch the contents of the location under consideration so that it may be restored when the breakpoint is removed.)

In order to implement software breakpoints, the text sections for the process being debugged must reside in writable memory. It is okay for the text section of non-debugged processes to reside in read-only memory, but some provision must be made to run a process being debugged in read/write memory. Furthermore, this determination must be made at the time the process is started. (Trying to migrate a running process from read-only to read/write memory would involve attempting to fix text section pointers on the stack and heap.)

When a process that is being ptrace()d exec()s, the kernel must not share the text segment of the newly-exec()ed program, nor those of an interpreter it might require. Also, the mmap() system call must not share text segments used by libraries of such a process, which it would normally do in response to the presence of MAP_EXECUTABLE and MAP_DENYWRITE in the flags passed to mmap().

This arrangement will not make processes that the debugger attaches to after they are mapped in look like they have independent sets of breakpoints; they may just crash instead, if they reach a breakpoint instruction set with ptrace for another process. The ABI does not specify any support for this case; if required, kernel interfaces to insert or remove a breakpoint at a specified address could be added. The kernel would be responsible for removing and replacing them at context switches, and would refuse to insert breakpoints for code running execute-in-place (XIP) from ROM.

Provisioning for Native POSIX Thread Library
--------------------------------------------

The Native POSIX Thread Library (NPTL) requires a register to be used as the thread context pointer. Register GBR is reserved for this purpose, as on GNU/Linux.
Revision History
----------------

Version 1.0 (11 May 2010):

- Relocation numbers changed to start at 201.
- Debug ABI finalized.
- Various corrections (mainly to assembler examples).

Version 0.2 (17 March 2008):

- Various changes after comments on initial draft.

Version 0.1 (25 February 2008):

- Initial draft for public comment.

References
----------

[1] "IA-64 Software Conventions and Runtime Architecture Guide", Intel, 2000, pp. 8-1 thru 8-4.

[2] "Unix System V Application Binary Interface" (for IA-64), Intel, 2000, pp. 5-4 thru 5-9.

[3] FR-V FDPIC ABI.

[4] Blackfin FDPIC ABI.

Copyright 2008, 2010 CodeSourcery, Inc.

Based on FR-V FDPIC ABI Version 1.0a, Copyright 2004 Red Hat, Inc.

This specification is licensed under the Open Publication License, version 1.0 with the further limitation that distribution of substantively modified versions of this specification is prohibited without the explicit permission of the copyright holder. Adaptation of the specification to a specific processor is not considered a substantive modification, and the copyright holder grants express permission for such adaptations. Such adaptations should be attributed as this specification as adapted for the specific processor. Further, the copyright holder grants permission to copy and modify text from this specification into a new specification so long as the new specification is not identified as being related to or a modification of this specification or in any way endorsed by the copyright holder.