Power Architecture® 32-bit Application Binary Interface Supplement 1.0 - Linux®

Ryan S. Arnold, IBM
Greg Davis, Green Hills
Brian Deitrich, Freescale Semiconductor
Michael Eager, Eager Consulting
Emil Medve, Freescale Semiconductor
Steven J. Munroe, IBM
Joseph S. Myers, CodeSourcery
Steve Papacharalambous, Freescale Semiconductor
Anmol P. Paralkar, Freescale Semiconductor
Katherine Stewart, Freescale Semiconductor
Edmar Wienskoski, Freescale Semiconductor

The ATR-LINUX portions of this document are derived from the 64-bit PowerPC ELF Application Binary Interface Supplement 1.8, originally written by Ian Lance Taylor under contract for IBM, with later revisions by: David Edelsohn, Torbjorn Granlund, Mark Mendell, Kristin Thomas, Alan Modra, Steve Munroe, and Chris Lorenze.

The ATR-TLS and ATR-SECURE-PLT sections of this document are original contributions of IBM written by Alan Modra and Steven Munroe.

The ATR-SPE and ATR-EABI portions of this document are derived from material used to write the E500 ABI and are contributed by Freescale Semiconductor.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is available from http://www.gnu.org/licenses/fdl-1.3.txt.

The following terms are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries: AIX®, PowerPC®, VMX®, POWER™. A full list of U.S. trademarks owned by IBM may be found at http://www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of Freescale Semiconductor in the United States and/or other countries: AltiVec™, e500™. Information on the list of U.S. trademarks owned by Freescale Semiconductor may be found at http://www.freescale.com/files/abstract/help_page/TERMSOFUSE.html.

The following terms are trademarks or registered trademarks of Power.org in the United States and/or other countries: Power ISA™, Power Architecture®. Information on the list of U.S. trademarks owned by Power.org may be found at http://www.power.org/brand_center/home/.

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. Further information on this trademark can be found at http://www.linuxfoundation.org/programs/legal/trademark.

Revision History
Revision 1.0 April 19, 2011 Revised by: Power.org PowerABI TSC
Table of Contents

Preface
  1. How To Read This Document
  2. Section Numbering
1. Introduction
  1.1. Reference Documentation
2. Software Installation
  2.1. Physical Distribution Media and Formats
3. Low Level System Information
  3.1. Machine Interface
    3.1.1. Processor Architecture
    3.1.2. Data Representation
      3.1.2.1. Byte Ordering
      3.1.2.2. Fundamental Types
      3.1.2.3. Aggregates and Unions
      3.1.2.4. Bit-fields
  3.2. Function Calling Sequence
    3.2.1. Registers
      3.2.1.1. Register Roles
      3.2.1.2. Limited-Access Bits
    3.2.2. The Stack Frame
      3.2.2.1. General Stack Frame Requirements
      3.2.2.2. Optional Save Areas
    3.2.3. Parameter Passing
      3.2.3.1. Parameter Passing Register Selection Algorithm
      3.2.3.2. Parameter Passing Examples
    3.2.4. Variable Argument Lists
    3.2.5. Return Values
  3.3. Coding Examples
    3.3.2. Code Model Overview
    3.3.3. Function Prologue and Epilogue
      3.3.3.1. The Purpose of a Function’s Prologue
      3.3.3.2. The Purpose of a Function’s Epilogue
      3.3.3.3. Rules for Prologue and Epilogue Sequences
    3.3.4. Register Saving and Restoring Functions
      3.3.4.1. Details about the Functions
      3.3.4.2. Register Saving and Restoring Functions (Vector)
    3.3.5. Profiling
    3.3.6. Data Objects
    3.3.7. Function Calls
    3.3.8. Branching
    3.3.9. Dynamic Stack Space Allocation
  3.4. DWARF Definition
  3.5. Exception Handling
4. Object Files
  4.3. ELF Header
  4.4. Special Sections
  4.6. Symbol Table
    4.6.1. Symbol Values
  4.7. Small Data Area
    4.7.1. Use of the Small Data Area in Executables
    4.7.2. Use of the Small Data Area in Shared Objects
  4.9. DWARF Additions
  4.10. APU Information Section
  4.13. Relocation Types
    4.13.1. Relocation Fields
    4.13.2. SPE Specific Relocation Fields
    4.13.4. Relocation Notations
    4.13.5. Relocation Types Table
    4.13.6. Relocation Descriptions
  4.15. Thread Local Storage ABI
    4.15.1. TLS Background
    4.15.2. TLS Runtime Handling
    4.15.3. TLS Access Models
      4.15.3.1. General Dynamic TLS Model
      4.15.3.2. Local Dynamic TLS Model
      4.15.3.3. Initial Exec TLS Model
      4.15.3.4. Local Exec TLS Model
    4.15.4. TLS Link Editor Optimizations
      4.15.4.1. General Dynamic to Initial Exec
      4.15.4.2. General Dynamic to Local Exec
      4.15.4.3. Local Dynamic to Local Exec
      4.15.4.4. Initial Exec to Local Exec
    4.15.5. ELF TLS Definitions
5. Program Loading and Dynamic Linking
  5.1. Program Loading
    5.1.1. Addressing Models
  5.2. Dynamic Linking
    5.2.1. Program Interpreter
    5.2.2. Dynamic Section
    5.2.3. Global Offset Table
      5.2.3.1. Global Offset Table Under The Secure-PLT ABI
      5.2.3.2. Global Offset Table Under The BSS-PLT ABI
    5.2.4. Function Addresses
    5.2.5. Procedure Linkage Table
      5.2.5.1. BSS Procedure Linkage Table
      5.2.5.2. Secure Procedure Linkage Table
6. Libraries
  6.1. Library Requirements
    6.1.1. C Library Conformance with Generic ABI
      6.1.1.1. Malloc Routine Return Pointer Alignment
      6.1.1.2. Library Handling of Limited-access Bits in Registers
    6.1.2. Save and Restore Routines
      6.1.2.1. Save and Restore Routine Suffixes
      6.1.2.2. Save and Restore Routine Templates
    6.1.3. Types Defined In Standard Header
A. Taxonomy
B. Attribute Inclusion and ABI Conformance
  B.1. ATR-LINUX Inclusion and Conformance
  B.2. ATR-EABI Inclusion and Conformance
C. APUs and Power ISA Categories

List of Figures

3-1. Structure Smaller Than a Word
3-2. Structure With No Padding
3-3. Structure With Internal Padding
3-4. Structure With Internal and Tail Padding
3-5. Union Allocation
3-6. Simple Bit-field Allocation
3-7. Bit-Field Allocation With Boundary Alignment
3-8. Bit-Field Allocation With Storage Unit Sharing
3-9. Bit-Field Allocation In A Union
3-10. Bit-Field Allocation With Unnamed Bit-Fields
3-11. Stack Frame Organization
3-12. Example Minimum Stack Frame Allocation
3-13. General-Purpose and Floating-Point Register Save Areas
3-15. CR Save Area
3-16. CR Save Area With Floating-Point Save Area
3-17. VRSAVE and Vector Register Save Areas
3-18. SPE 64-bit General-Purpose Register Save Area
3-19. Parameter Save Area and Local Variable Space
3-20. Parameter Passing Example
3-21. Vector Parameter Passing Example
3-22. SPE Parameter Passing Example
3-23. Decimal Floating-Point Parameter Passing Example
3-24. Profiling Example
3-25. Absolute Load and Store Example
3-26. Small Model Position-Independent Load and Store
3-27. Large Model Position-Independent Load and Store
3-28. Direct Function Call
3-29. Absolute Indirect Function Call
3-30. Small Model Position-Independent Indirect Function Call
3-31. Large Model Position-Independent Indirect Function Call
3-32. Before Dynamic Stack Allocation
3-33. Example code to allocate n bytes:
3-34. After Dynamic Stack Allocation
4-1. Section Ordering Under the BSS-PLT
4-2. Section Ordering Under the Secure-PLT
4-4. Thread Pointer Addressable Memory
4-5. TLS Block Diagram
4-6. Local Exec TLS Model Sequences
5-1. File Image to Process Memory Image Mapping
5-2. Loading the Address of _GLOBAL_OFFSET_TABLE_ Under the Secure-PLT ABI
5-3. Loading the Address of _GLOBAL_OFFSET_TABLE_ Under the BSS-PLT ABI
5-4. Example BSS-PLT .plt Section Implementation
5-5. Example BSS-PLT Entries Post Resolution
A-1. Taxonomy

List of Tables

3-1. Bit and Byte Numbering in Halfwords
3-2. Bit and Byte Numbering in Words
3-3. Bit and Byte Numbering in Doublewords
3-4. Bit and Byte Numbering in Quadwords
3-5. Fundamental Types
3-6. SPE Types
3-7. Vector Types
3-8. Decimal Floating-Point Types
3-9. IBM® AIX® Long Double 128 Type
3-10. Long Double Is Double Type
3-11. Bit-Field Types
3-12. Bit Numbering for 0x01020304
3-13. Register Roles
3-14. TLS ABI Register Role for General-Purpose Register 2
3-15. Register Roles for the _Complex float and _Complex double Types
3-16. Register Roles for the _Complex Long Double Type
3-17. Secure-PLT Register Role for General-Purpose Register 30
3-18. Floating-Point Register Roles for Binary Floating-Point Types
3-19. Floating-Point Register Roles for Decimal Floating-Point Types
3-20. Soft-Float General-Purpose Register Roles for Binary Floating-Point Types
3-21. Soft-Float General-Purpose Register Roles for Decimal Floating-Point Types
3-22. Vector Register Roles
3-23. SPE Register Roles
3-24. Parameter Passing Using IBM AIX 128-bit Long Double
3-25. Parameter Passing Using IBM AIX 128-bit Long Double and Soft-Float
3-26. Parameter Passing Using long double is double
3-27. Parameter Passing Using long double is double and Soft-Float
3-28. Parameter Passing of Vector Data Types
3-29. Parameter Passing of SPE Data Types
3-30. Decimal Floating-Point Parameter Passing on Classic Power Architecture (with FPU)
3-31. Decimal Floating-Point Parameter Passing with Soft-Float (without FPU)
3-32. SPE Save And Restore Rules
3-33. Register Mappings
4-1. e_flags Bit Masks
4-2. _#_ev64Opaque Support
4-3. Typical Elf Note Section Format
4-4. Object File a.o
4-5. Object File b.o
4-6. Merged Object File b.o
4-7. APU Identifiers
4-8. Object File b.o
4-9. Relocation Table
4-10. Relocation Table - Continued
4-11. General Dynamic Initial Relocations
4-12. General Dynamic Outstanding Relocations
4-13. Local Dynamic Initial Relocations
4-14. Local Dynamic Outstanding Relocations
4-15. Local Dynamic Outstanding Relocations

1. How To Read This Document

Implementations of this *Power Architecture 32-bit Application Binary Interface Supplement* should indicate which *ABI software features* (see Appendix A) and Power ISA™ *categories* are implemented. The reader should refer to those constraints and read this text selectively based upon them.

Appendix A provides a taxonomy of the information in this ABI document. The core of the ABI is common to all implementations and appears as nonconditional text, tables, and graphics.

Optional *ABI software feature* text or Power ISA category specific text is represented in the taxonomy as conditional attributes of the form ATR-FOO (where “FOO” is one of the attributes described in Appendix A). These attributes are used in the ABI text as element tags which aid in selective reading (and the generation) of this ABI document. These attributes describe the relationship of the optional elements of this document to a specific implementation.

This version of the *Power Architecture 32-bit Application Binary Interface Supplement* may take one of the following forms:

**Linux & Embedded**

The unified ABI document contains all text from all implementations of the ABI.

**Linux**

The technical conditions governing implementations of the Linux ABI are described by attribute conformance and inclusion rules in Appendix B, Section B.1. The attribute tags described in that part of the appendix are used to conditionally generate the Linux ABI variant of this document.

**Embedded**

The technical conditions governing implementations of the Embedded ABI are described by attribute conformance and inclusion rules in Appendix B, Section B.2. The attribute tags described in that part of the appendix are used to conditionally generate the Embedded ABI variant of this document.

Document elements representing *Categories* of the Power ISA are required for a software implementation based upon the implementation's conformance with either Book III-S or Book III-E of the Power ISA.

The following bounding box exemplifies a document element which corresponds to a *category* of the Power ISA.

---

**ATR-SPE**

This is an example of conditional text that applies to implementations that support the Signal Processing Engine (SPE) ABI, an optional *category* of the Power ISA.

This document also contains elements that correspond to optional ABI software features that may or may not be present in specific implementations. The main distinction is between software features used in embedded environments and those used in server environments, e.g., support for threading as defined by the Thread Local Storage ABI, support for the secure-PLT, or support for dynamic linking.

This is an example of conditional text that applies to an implementation which does not support a specific software feature.

2. Section Numbering

The subsection numbering of the unified Linux & Embedded version of the Power Architecture 32-bit Application Binary Interface Supplement is sequential and does not skip digits between sibling subsections since it contains all of the text, tables, and graphics available.

The individual Linux and Embedded versions of the Power Architecture 32-bit Application Binary Interface Supplement contain a subset of the text, tables, and graphics available. To prevent confusion during cross-referencing, the subsection numbers of these subset documents remain congruent with those of the Linux & Embedded version of the Power Architecture 32-bit Application Binary Interface Supplement (and with each other where they overlap). As a result, subsection numbering can appear to skip digits between sibling subsections.

Chapter 1. Introduction

The Executable and Linkable Format (ELF) defines a linking interface for executables and shared objects in two parts. The first part is the generic System V ABI. The second part is a processor-specific supplement.

This document is the processor-specific supplement for use with ELF on 32-bit Power Architecture processor systems. This is not a complete System V Application Binary Interface Supplement because it does not define any library interfaces.

Furthermore, this document establishes both big-endian and little-endian application binary interfaces (see Section 3.1.2.1). Processors in the 32-bit Power Architecture can execute in either big-endian or little-endian mode. In general, executables and executable-generated data that use one byte ordering are not portable to a system running in the other mode.

Note: This ABI specification does not address little-endian byte ordering prior to Power ISA 2.03.
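
Because both byte orderings are defined, a tool that inspects an object file usually needs to determine which one the file uses. The following is a minimal sketch, not part of this ABI, that classifies the leading bytes of a file using the identification fields of the generic System V ELF header (the magic number, the file class, the data encoding, and the e_machine value EM_PPC). The function name classify_ppc32_elf and the ppc32_kind result codes are hypothetical names chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the generic System V ABI (ELF) specification. */
enum {
    EI_CLASS_IDX = 4,   /* e_ident index of the file class */
    EI_DATA_IDX  = 5,   /* e_ident index of the data encoding */
    CLASS_32     = 1,   /* ELFCLASS32 */
    DATA_LSB     = 1,   /* ELFDATA2LSB: little-endian */
    DATA_MSB     = 2,   /* ELFDATA2MSB: big-endian */
    MACHINE_PPC  = 20   /* EM_PPC */
};

/* Hypothetical result codes for this sketch. */
enum ppc32_kind {
    PPC32_NOT_MATCHING = 0,
    PPC32_BIG_ENDIAN,
    PPC32_LITTLE_ENDIAN
};

/* Classify the first bytes of a file image held in buf. */
enum ppc32_kind classify_ppc32_elf(const unsigned char *buf, size_t len)
{
    unsigned data;
    uint16_t machine;

    /* Need e_ident (16 bytes), e_type (2 bytes), and e_machine (2 bytes). */
    if (len < 20)
        return PPC32_NOT_MATCHING;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return PPC32_NOT_MATCHING;
    if (buf[EI_CLASS_IDX] != CLASS_32)
        return PPC32_NOT_MATCHING;

    data = buf[EI_DATA_IDX];
    if (data != DATA_LSB && data != DATA_MSB)
        return PPC32_NOT_MATCHING;

    /* e_machine is a 16-bit field at byte offset 18, stored in the
       byte order that the file itself declares. */
    machine = (data == DATA_MSB)
        ? (uint16_t)((buf[18] << 8) | buf[19])
        : (uint16_t)((buf[19] << 8) | buf[18]);
    if (machine != MACHINE_PPC)
        return PPC32_NOT_MATCHING;

    return (data == DATA_MSB) ? PPC32_BIG_ENDIAN : PPC32_LITTLE_ENDIAN;
}
```

The sketch decodes e_machine byte by byte instead of casting the buffer to a header structure, so it gives the same answer on a host of either byte order.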

The Power Architecture 32-bit Application Binary Interface Supplement is not the same as the 64-bit PowerPC ELF ABI.

The Power Architecture 32-bit Application Binary Interface Supplement is intended to use the same structural layout now followed in practice by other processor-specific ABIs.

1.1. Reference Documentation

The archetypal ELF ABI is described by the System V ABI. Supersessions and addenda that are 32-bit Power Architecture processor-specific are described in this document.

The following cited documents are complementary to this document and equally binding:


ATR-SPE


ATR-VECTOR

ATR-DFP


ATR-CXX


ATR-TLS


The following documents are of interest for their historical information but are not normative in any way.

• The 32-bit AIX ABI.
• The PowerOpen ABI.
Chapter 2. Software Installation

2.1. Physical Distribution Media and Formats

This document does not specify any physical distribution media or formats. Any agreed-upon distribution media may be used.
Chapter 3. Low Level System Information

3.1. Machine Interface

3.1.1. Processor Architecture

This Application Binary Interface (ABI) is not explicitly predicated on a minimum Power ISA version. All nonoptional instructions that are defined by the Power Architecture® can be assumed to be implemented and work as specified. ABI conforming implementations must provide these instructions through software emulation if they are not provided by the processor.

Note: The exceptions to this rule are the Fixed-point Load and Store Multiple and Fixed-point Move Assist instructions, which are not available in little-endian implementations because they would cause alignment exceptions.

Processors may support additional instructions beyond the published Instruction Set Architecture (ISA) and the Power Architecture optional ones, through Auxiliary Processing Units (APUs). This ABI provides a method for describing the additional instructions in section information (see Section 4.4 and Section 4.10) but does not address these additional instructions directly and executing them may result in undefined behavior.

This ABI does not explicitly impose any performance constraints on systems.

3.1.2. Data Representation

3.1.2.1. Byte Ordering

The following standard data formats are recognized:

- 8-bit byte
- 16-bit halfword
- 32-bit word
- 64-bit doubleword
- 128-bit quadword

In big-endian byte ordering, the most significant byte is located in the lowest addressed byte position in memory (byte 0). This byte ordering is alternately referred to as Most Significant Byte (MSB) ordering.

In little-endian byte ordering, the least significant byte is located in the lowest addressed byte position in memory (byte 0). This byte ordering is alternately referred to as Least Significant Byte (LSB) ordering.

A specific processor implementation must state which type of byte ordering is to be used.
Although it is possible on some processors to map some pages as little-endian, and other pages as big-endian in the same application, such an application does not conform to the ABI.
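
The two byte orderings can be illustrated with a minimal C sketch. The helper names `store_be32`, `store_le32`, and `host_is_big_endian` are hypothetical and are not defined by this ABI; the sketch only shows where each byte of a word lands in memory under each ordering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the ABI): store a 32-bit word so that
   the most significant byte lands at the lowest address (byte 0). */
static void store_be32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)w;
}

/* Hypothetical helper (not part of the ABI): store a 32-bit word so that
   the least significant byte lands at the lowest address (byte 0). */
static void store_le32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)w;
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

/* Returns 1 if the host stores words in big-endian order, 0 otherwise,
   by inspecting the byte at the lowest address of a known word. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

For the word 0x01020304, `store_be32` produces the byte sequence 01 02 03 04 and `store_le32` produces 04 03 02 01.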

Table 3-1, Table 3-2, Table 3-3, and Table 3-4 show the conventions being assumed in big-endian and little-endian byte ordering at the bit and byte levels. These conventions are applied to integer and floating-point data types. Byte numbers are indicated in the upper corners, and bit numbers in the lower corners. Little-endian byte numbers are indicated on the right side; big-endian byte numbers are indicated on the left side.

**Table 3-1. Bit and Byte Numbering in Halfwords**

<table>
<thead>
<tr>
<th>0 1</th>
<th>1 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>msb</td>
<td>lsb</td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
</tr>
</tbody>
</table>

**Table 3-2. Bit and Byte Numbering in Words**

<table>
<thead>
<tr>
<th>0 3</th>
<th>1 2</th>
<th>2 1</th>
<th>3 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>msb</td>
<td></td>
<td></td>
<td>lsb</td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
<td>16 23</td>
<td>24 31</td>
</tr>
</tbody>
</table>

**Table 3-3. Bit and Byte Numbering in Doublewords**

<table>
<thead>
<tr>
<th>0 7</th>
<th>1 6</th>
<th>2 5</th>
<th>3 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>msb</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
<td>16 23</td>
<td>24 31</td>
</tr>
<tr>
<td>4 3</td>
<td>5 2</td>
<td>6 1</td>
<td>7 0</td>
</tr>
<tr>
<td>32 39</td>
<td>40 47</td>
<td>48 55</td>
<td>56 63</td>
</tr>
</tbody>
</table>

Table 3-4. Bit and Byte Numbering in Quadwords

<table>
<thead>
<tr>
<th></th>
<th>0</th>
<th>15</th>
<th>1</th>
<th>14</th>
<th>2</th>
<th>13</th>
<th>3</th>
<th>12</th>
</tr>
</thead>
<tbody>
<tr>
<td>msb</td>
<td>0</td>
<td>7</td>
<td>8</td>
<td>15</td>
<td>16</td>
<td>23</td>
<td>24</td>
<td>31</td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>11</td>
<td>5</td>
<td>10</td>
<td>6</td>
<td>9</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td></td>
<td>32</td>
<td>39</td>
<td>40</td>
<td>47</td>
<td>48</td>
<td>55</td>
<td>56</td>
<td>63</td>
</tr>
<tr>
<td>lsb</td>
<td>8</td>
<td>7</td>
<td>9</td>
<td>6</td>
<td>10</td>
<td>5</td>
<td>11</td>
<td>4</td>
</tr>
<tr>
<td></td>
<td>64</td>
<td>71</td>
<td>72</td>
<td>79</td>
<td>80</td>
<td>87</td>
<td>88</td>
<td>95</td>
</tr>
<tr>
<td></td>
<td>12</td>
<td>3</td>
<td>13</td>
<td>2</td>
<td>14</td>
<td>1</td>
<td>15</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>96</td>
<td>103</td>
<td>104</td>
<td>111</td>
<td>112</td>
<td>119</td>
<td>120</td>
<td>127</td>
</tr>
</tbody>
</table>

Note: In the Power ISA, the figures are generally shown only in big-endian byte order. The bits in these data format specifications are numbered from left to right (MSB to LSB).

ATR-SPE

Note: SPE documentation uses 64-bit numbering throughout, including for registers such as the CR that only contain 32 bits. This numbering can lead to some confusion. For example, although the CR bits are now numbered from 32 to 63, the same assembly instructions still work: `crxor 6,6,6` operates on bit 32 + 6, that is, CR[38]. When discussing register contents, the bits are numbered 0 : 63 for 64-bit registers and 32 : 63 for 32-bit registers. When discussing memory contents, the bits are numbered naturally (for example, 0 : 7 for bits within one byte and 0 : 15 for bits within halfwords).

The bit numbering in the Power ISA is all 64-bit except for the following registers indicated in Power ISA section 1.4:

• Opcodes marking 0-31

ATR-VECTOR

• Vector registers and the VSCR (see Section 3.2.1).

ATR-CLASSIC-FLOAT

• As of Power ISA version 2.05 the FPSCR has been extended from 32 bits to 64 bits. The fields of the original 32-bit FPSCR are now held in bits 32-63 of the 64-bit FPSCR. The assembly instructions that operate upon the 64-bit FPSCR have either had a W instruction field added to select the operative word for the instruction (for example, `mtfsfi`), or have been extended to operate upon the entire 64-bit FPSCR (for example, `mffs`). Fields of the FPSCR, representing one or more bits, are referenced by field number with an indication of the operative word rather than by bit number.

If the Power ISA version 2.05 DFP category is not needed by an implementation the FPSCR may
continue to be referenced as a 32-bit register using the old forms of the instructions to support binary
compatibility of ELF files built against an older Power ISA version. See Section 3.2.1 for more
information on the FPSCR.

3.1.2.2. Fundamental Types

The following tables map the data format specifications described in the Power ISA to ISO C scalar
types. Each scalar type has a required alignment, which is indicated in the alignment column. Usage of
these types in data structures must follow the alignment specified in the order encountered to ensure
consistent mapping. When using variables individually, more strict alignment may be imposed if it has
optimization benefits.

Table 3-5. Fundamental Types

<table>
<thead>
<tr>
<th>Type</th>
<th>ISO C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Boolean</td>
<td>_Bool</td>
<td>1</td>
<td>byte</td>
<td>boolean</td>
</tr>
<tr>
<td>Character</td>
<td>char</td>
<td>1</td>
<td>byte</td>
<td>unsigned byte</td>
</tr>
<tr>
<td></td>
<td>unsigned char</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>signed char</td>
<td>1</td>
<td>byte</td>
<td>signed byte</td>
</tr>
<tr>
<td></td>
<td>short</td>
<td>2</td>
<td>halfword</td>
<td>signed halfword</td>
</tr>
<tr>
<td></td>
<td>signed short</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>unsigned short</td>
<td>2</td>
<td>halfword</td>
<td>unsigned halfword</td>
</tr>
<tr>
<td>Enumeration</td>
<td>signed enum</td>
<td>4</td>
<td>word</td>
<td>signed word</td>
</tr>
<tr>
<td></td>
<td>unsigned enum</td>
<td>4</td>
<td>word</td>
<td>unsigned word</td>
</tr>
<tr>
<td>Integral</td>
<td>int</td>
<td>4</td>
<td>word</td>
<td>signed word</td>
</tr>
<tr>
<td></td>
<td>signed int</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>long int</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>signed long</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>unsigned int</td>
<td>4</td>
<td>word</td>
<td>unsigned word</td>
</tr>
<tr>
<td></td>
<td>unsigned long</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>long long</td>
<td>8</td>
<td>doubleword</td>
<td>signed doubleword</td>
</tr>
<tr>
<td></td>
<td>signed long long</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td>unsigned long long</td>
<td>8</td>
<td>doubleword</td>
<td>unsigned doubleword</td>
</tr>
<tr>
<td>Pointer</td>
<td>any *</td>
<td>4</td>
<td>word</td>
<td>unsigned word</td>
</tr>
<tr>
<td></td>
<td>any (*) ()</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Floating</td>
<td>float</td>
<td>4</td>
<td>word</td>
<td>single-precision float</td>
</tr>
<tr>
<td></td>
<td>double</td>
<td>8</td>
<td>doubleword</td>
<td>double-precision float</td>
</tr>
</tbody>
</table>
A NULL pointer has all bits zero.

Note: A boolean value is represented as a byte with value 0 or 1. If a byte with a value other than 0 or 1 is evaluated as a boolean value (for example, through the use of unions), the behavior is undefined.

Note: If an enumerated type contains a negative value, it is compatible with and has the same representation and alignment as int; otherwise it is compatible with and has the same representation and alignment as unsigned int.

Note: For each real floating-point type there is a corresponding complex type. This has the same alignment as the real type and twice the size; the representation is the real part followed by the imaginary part.
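
The complex representation described in the note above can be sketched in C as follows. The helper name `complex_parts` is hypothetical; the sketch relies only on the ISO C guarantee that a complex type has the same representation as an array of two elements of the corresponding real type.

```c
#include <complex.h>
#include <string.h>

/* A complex type is twice the size of its real type. */
_Static_assert(sizeof(double _Complex) == 2 * sizeof(double),
               "complex double must be two doubles");

/* Hypothetical helper (not part of the ABI): copy the storage of a
   _Complex double into two doubles. parts[0] receives the real part
   and parts[1] the imaginary part, in that order in memory. */
static void complex_parts(double _Complex z, double parts[2])
{
    memcpy(parts, &z, sizeof z);
}
```

Calling `complex_parts(1.0 + 2.0 * I, p)` leaves the real part in `p[0]` and the imaginary part in `p[1]`.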

### Table 3-6. SPE Types

<table>
<thead>
<tr>
<th>Type</th>
<th>SPEIM C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>vector-64</td>
<td><strong>ev64_u16</strong></td>
<td>8</td>
<td>doubleword</td>
<td>vector of four unsigned halfwords</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_s16</strong></td>
<td>8</td>
<td>doubleword</td>
<td>vector of four signed halfwords</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_u32</strong></td>
<td>8</td>
<td>doubleword</td>
<td>vector of two unsigned words</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_s32</strong></td>
<td>8</td>
<td>doubleword</td>
<td>vector of two signed words</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_fs</strong></td>
<td>8</td>
<td>doubleword</td>
<td>vector of two single-precision floats</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_u64</strong></td>
<td>8</td>
<td>doubleword</td>
<td>1 unsigned doubleword</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_s64</strong></td>
<td>8</td>
<td>doubleword</td>
<td>1 signed doubleword</td>
</tr>
<tr>
<td></td>
<td><strong>ev64_opaque</strong></td>
<td>8</td>
<td>doubleword</td>
<td>any of the above</td>
</tr>
</tbody>
</table>
### ATR-VECTOR

**Table 3-7. Vector Types**

<table>
<thead>
<tr>
<th>Type</th>
<th>ALTIVECPIM C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>vector-128</td>
<td>vector unsigned char</td>
<td>16</td>
<td>quadword</td>
<td>vector of sixteen unsigned bytes</td>
</tr>
<tr>
<td></td>
<td>vector signed char</td>
<td>16</td>
<td>quadword</td>
<td>vector of sixteen signed bytes</td>
</tr>
<tr>
<td></td>
<td>vector unsigned short</td>
<td>16</td>
<td>quadword</td>
<td>vector of eight unsigned halfwords</td>
</tr>
<tr>
<td></td>
<td>vector signed short</td>
<td>16</td>
<td>quadword</td>
<td>vector of eight signed halfwords</td>
</tr>
<tr>
<td></td>
<td>vector unsigned int</td>
<td>16</td>
<td>quadword</td>
<td>vector of four unsigned words</td>
</tr>
<tr>
<td></td>
<td>vector signed int</td>
<td>16</td>
<td>quadword</td>
<td>vector of four signed words</td>
</tr>
<tr>
<td></td>
<td>vector float</td>
<td>16</td>
<td>quadword</td>
<td>vector of four single-precision floats</td>
</tr>
</tbody>
</table>

### ATR-SPE & ATR-VECTOR

Note: Availability of Vector data types is subject to conformance to a Power ISA category where the categories “Vector” and “SPE” are mutually exclusive.

### ATR-DFP

**Table 3-8. Decimal Floating-Point Types**

<table>
<thead>
<tr>
<th>Type</th>
<th>ISO TR 24732 C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Decimal Floating</td>
<td>_Decimal32</td>
<td>4</td>
<td>word</td>
<td>single-precision decimal float</td>
</tr>
<tr>
<td></td>
<td>_Decimal64</td>
<td>8</td>
<td>doubleword</td>
<td>double-precision decimal float</td>
</tr>
<tr>
<td></td>
<td>_Decimal128</td>
<td>16</td>
<td>quadword</td>
<td>quad-precision decimal float</td>
</tr>
</tbody>
</table>

### ATR-LONG-DOUBLE-IBM

**Table 3-9. IBM® AIX® Long Double 128 Type**

<table>
<thead>
<tr>
<th>Type</th>
<th>ISO C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>IBM AIX long double</td>
<td>long double</td>
<td>16</td>
<td>quadword</td>
<td>two double-precision floats</td>
</tr>
</tbody>
</table>

### ATR-LONG-DOUBLE-IS-DOUBLE

**Table 3-10. Long Double Is Double Type**

<table>
<thead>
<tr>
<th>Type</th>
<th>ISO C Types</th>
<th>sizeof</th>
<th>Alignment</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>long double is double</td>
<td>long double</td>
<td>8</td>
<td>doubleword</td>
<td>double-precision float</td>
</tr>
</tbody>
</table>

### ATR-LONG-DOUBLE-IBM && ATR-LONG-DOUBLE-IS-DOUBLE

Note: Availability of the long double data type is subject to conformance to a long double standard where the IBM AIX 128-bit Long Double format and the Long Double is Double format are mutually exclusive.

### ATR-LONG-DOUBLE-IS-DOUBLE || ATR-LONG-DOUBLE-IBM

This ABI provides the following choices for implementation of long double in compilers and systems:

**ATR-LONG-DOUBLE-IS-DOUBLE**

- Do not support any floating-point types with greater precision than double. In this case, long doubles and doubles have the same size and precision.

**ATR-LONG-DOUBLE-IBM**

- Provide support for the IBM AIX 128-bit Long Double format. In this format, two double-precision numbers with different magnitudes that do not overlap provide an effective precision of 106 bits. The high-order double-precision value (the one that comes first in storage) must have the larger magnitude. The high-order double-precision value must equal the sum of the two values, rounded to nearest double.

- Extended precision provides the same range as double precision (about $10^{-308}$ to $10^{308}$) but more precision (a variable amount, about 31 decimal digits or more).

- As the absolute value of the magnitude decreases (near the denormal range), the precision available in the low-order double also decreases.

- When the value represented is in the denormal range, this representation provides no more precision than 64-bit (double) floating-point.

- The actual number of bits of precision can vary. If the low-order part is much less than 1 unit of least precision (ULP) of the high-order part, significant bits (either all 0s or all 1s) are implied between the significands of high-order and low-order numbers. Some algorithms that rely on having a fixed number of bits in the significand can fail when using extended-precision.
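
The pairing rule for the two doubles can be illustrated with a minimal sketch. The type `struct dd` and the function `dd_normalize` are hypothetical names, not part of this ABI. The sketch uses the well-known fast two-sum step and assumes that `|a| >= |b|`, that round-to-nearest mode is in effect, and that double arithmetic is not carried out in excess precision.

```c
/* Hypothetical pair type mirroring the storage order:
   the high-order double comes first, the low-order double second. */
struct dd {
    double hi;
    double lo;
};

/* Hypothetical helper: turn the exact sum a + b into a pair whose
   high-order part equals the sum rounded to nearest double and whose
   low-order part holds the rounding error (fast two-sum).
   Assumes |a| >= |b| and round-to-nearest mode. */
static struct dd dd_normalize(double a, double b)
{
    /* volatile discourages the compiler from keeping extra precision */
    volatile double s = a + b;   /* rounded sum */
    volatile double t = s - a;   /* part of b that was absorbed into s */
    struct dd r;
    r.hi = s;
    r.lo = b - t;                /* part of b that was rounded away */
    return r;
}
```

For example, `dd_normalize(1.0, 0x1p-60)` yields a high-order part of 1.0 and a low-order part of 2^-60: the low-order value lies far below one ULP of the high-order value, which is the situation the last item above describes.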

This implementation differs from the IEEE 754 Standard in the following ways:

- The software support is restricted to round-to-nearest mode. Programs that use extended-precision must ensure that this rounding mode is in effect when extended-precision calculations are performed.

- Does not fully support the IEEE special numbers NaN and INF. These values are encoded in the high-order double value only. The low-order value is not significant, but the low-order value of an infinity must be positive or negative zero.

- Does not support the IEEE status flags for overflow, underflow, and other conditions. These flags have no meaning in this format.

---

3.1.2.3. Aggregates and Unions

The following are the rules for aggregates (structures and arrays) and unions that apply to their alignment and size.

- The entire aggregate or union must be aligned to its most strictly aligned member, which corresponds to the member with the largest alignment, including flexible array members.

- Each member is assigned the lowest available offset that meets the alignment requirements of the member. Depending on the previous member, internal padding can be required.

- The entire aggregate or union must have a size that is a multiple of its alignment. Depending on the last member, tail padding can be required.
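
The three rules above can be sketched as a small layout calculator. The types `struct member` and `struct layout` and the functions `round_up` and `layout_struct` are hypothetical names used only for illustration. The sketch covers structures whose members are not bit-fields and assumes every alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one member: size and alignment in bytes. */
struct member {
    size_t size;
    size_t align;
};

/* Resulting size and alignment of the whole structure, in bytes. */
struct layout {
    size_t size;
    size_t align;
};

/* Round n up to the next multiple of align (a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Apply the rules to a structure: offsets[i] receives the byte offset
   assigned to member i. */
static struct layout layout_struct(const struct member *m, size_t count,
                                   size_t *offsets)
{
    struct layout r = { 0, 1 };
    for (size_t i = 0; i < count; i++) {
        assert(m[i].align != 0 && (m[i].align & (m[i].align - 1)) == 0);
        r.size = round_up(r.size, m[i].align);  /* internal padding */
        offsets[i] = r.size;                    /* lowest available offset */
        r.size += m[i].size;
        if (m[i].align > r.align)
            r.align = m[i].align;               /* most strictly aligned member */
    }
    r.size = round_up(r.size, r.align);         /* tail padding */
    return r;
}
```

With the members of Figure 3-4 (`char`, `double`, `short`, that is, size and alignment pairs {1,1}, {8,8}, {2,2}), the sketch assigns offsets 0, 8, and 16 and reports doubleword alignment with a size of 24.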

For the following figures, the big-endian byte offsets are located in the upper left corners, and the little-endian byte offsets are located in the upper right corners.
Figure 3-1. Structure Smaller Than a Word

```c
struct {
    char c;
};
```

byte aligned, sizeof is 1

<table>
<thead>
<tr>
<th>0</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td></td>
</tr>
</tbody>
</table>

Figure 3-2. Structure With No Padding

```c
struct {
    char c;
    char d;
    short s;
    int n;
};
```

word-aligned, sizeof is 8

**little-endian**

<table>
<thead>
<tr>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>s</td>
<td>d</td>
<td>c</td>
</tr>
<tr>
<td colspan="3">n (byte offset 4)</td>
</tr>
</tbody>
</table>

**big-endian**

<table>
<thead>
<tr>
<th>0</th>
<th>1</th>
<th>2</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>d</td>
<td>s</td>
</tr>
<tr>
<td colspan="3">n (byte offset 4)</td>
</tr>
</tbody>
</table>

Figure 3-3. Structure With Internal Padding

```c
struct {
    char c;
    short s;
};
```

halfword-aligned, sizeof is 4

**little-endian**

<table>
<thead>
<tr>
<th>2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>s</td>
<td>pad</td>
<td>c</td>
</tr>
</tbody>
</table>
big-endian

<table>
<thead>
<tr>
<th>0</th>
<th>1</th>
<th>2</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>pad</td>
<td>s</td>
</tr>
</tbody>
</table>

**Figure 3-4. Structure With Internal and Tail Padding**

```c
struct {
    char c;
    double d;
    short s;
};
```

doubleword-aligned, sizeof is 24

little-endian and big-endian

The member byte offsets are the same in both byte orders; only the order of the bytes within d and s differs.

<table>
<thead>
<tr>
<th>Byte offset</th>
<th>Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>c</td>
</tr>
<tr>
<td>1 to 7</td>
<td>pad</td>
</tr>
<tr>
<td>8 to 15</td>
<td>d</td>
</tr>
<tr>
<td>16 to 17</td>
<td>s</td>
</tr>
<tr>
<td>18 to 23</td>
<td>pad</td>
</tr>
</tbody>
</table>
Figure 3-5. Union Allocation

```c
union {
    char  c;
    short s;
    int   j;
};
```

word-aligned, sizeof is 4

little-endian and big-endian

Every member starts at byte offset 0 in both byte orders.

<table>
<thead>
<tr>
<th>Member</th>
<th>Byte offset</th>
<th>Size in bytes</th>
<th>Pad bytes following the member</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>0</td>
<td>1</td>
<td>3</td>
</tr>
<tr>
<td>s</td>
<td>0</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>j</td>
<td>0</td>
<td>4</td>
<td>0</td>
</tr>
</tbody>
</table>

3.1.2.4. Bit-fields

Bit-fields can be present in definitions of C structures and unions. These bit-fields define whole objects within the structure or union where the number of bits in the bit-field is specified.

In the following table, a signed range goes from \(-2^{(w-1)}\) to \(2^{(w-1)} - 1\) and an unsigned range goes from 0 to \(2^w - 1\).
Table 3-11. Bit-Field Types

<table>
<thead>
<tr>
<th>Bit-field Type</th>
<th>Width (w)</th>
</tr>
</thead>
<tbody>
<tr>
<td>_Bool</td>
<td>1</td>
</tr>
<tr>
<td>signed char</td>
<td>1 to 8</td>
</tr>
<tr>
<td>unsigned char</td>
<td></td>
</tr>
<tr>
<td>signed short</td>
<td>1 to 16</td>
</tr>
<tr>
<td>unsigned short</td>
<td></td>
</tr>
<tr>
<td>signed int</td>
<td>1 to 32</td>
</tr>
<tr>
<td>signed long</td>
<td></td>
</tr>
<tr>
<td>unsigned int</td>
<td></td>
</tr>
<tr>
<td>unsigned long</td>
<td></td>
</tr>
<tr>
<td>enum</td>
<td></td>
</tr>
<tr>
<td>signed long long</td>
<td>1 to 64</td>
</tr>
<tr>
<td>unsigned long long</td>
<td></td>
</tr>
</tbody>
</table>

Bit-fields can be signed or unsigned of type short, int, long, or long long. However, bit-fields shall have the same range for each corresponding type; for example, signed short must have the same range as unsigned short. All members of structures and unions must comply with the size and alignment rules including bit-fields. The following list of size and alignment rules additionally apply to bit-fields:

- The allocation of bit-fields is determined by the system endianness. For little-endian implementations the bit allocation is from the least significant (right) end to the most significant (left) end. The reverse is true for big-endian implementations; the bit allocation is from the most significant (left) end to the least significant (right) end.
- A bit-field cannot cross its unit boundary; it must occupy the storage unit allocated for its declared type.
- If there is enough space within a storage unit, bit-fields must share the storage unit with other structure members, including members that are not bit-fields. Clearly all the structure members occupy different parts of the storage unit.
- The types of unnamed bit-fields have no effect on the alignment of a structure or union. However the offsets of an individual bit-field’s member must comply with the alignment rules. An unnamed bit-field of zero width causes sufficient padding (possibly none) to be inserted for the next member, or the end of the structure if there are no more nonzero width members, to have an offset from the start of the structure that is a multiple of the size of the declared type of the zero-width member.

The byte offsets for structure and union members are shown in the examples below. The little-endian byte offsets are given in the upper right corners, and the big-endian byte offsets are given in the upper left corners. The bit numbers are given in the lower corners.

Table 3-12. Bit Numbering for 0x01020304

<table>
<thead>
<tr>
<th>0 3</th>
<th>1 2</th>
<th>2 1</th>
<th>3 0</th>
</tr>
</thead>
<tbody>
<tr>
<td>01</td>
<td>02</td>
<td>03</td>
<td>04</td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
<td>16 23</td>
<td>24 31</td>
</tr>
</tbody>
</table>
Figure 3-6. Simple Bit-field Allocation

```c
struct {
    int j : 5;
    int k : 6;
    int m : 7;
};
```

word-aligned, sizeof is 4

little-endian

<table>
<thead>
<tr>
<th>pad</th>
<th>m</th>
<th>k</th>
<th>j</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 13</td>
<td>14 20</td>
<td>21 26</td>
<td>27 31</td>
</tr>
</tbody>
</table>

big-endian

<table>
<thead>
<tr>
<th>j</th>
<th>k</th>
<th>m</th>
<th>pad</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 4</td>
<td>5 10</td>
<td>11 17</td>
<td>18 31</td>
</tr>
</tbody>
</table>
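
The allocation direction used in Figure 3-6 can be sketched as follows. The type `struct bit_range` and the function `place_bit_field` are hypothetical names. Bit numbers follow the Power ISA convention in which bit 0 is the most significant bit of the storage unit, and `used_bits` is assumed to be the number of bits already allocated in that unit.

```c
#include <assert.h>

/* First and last bit occupied by a field, numbered with bit 0 as the
   most significant bit of the storage unit. */
struct bit_range {
    unsigned first;
    unsigned last;
};

/* Hypothetical helper: place a bit-field of `width` bits into a storage
   unit of `unit_bits` bits in which `used_bits` bits are already taken.
   Big-endian allocation starts at the most significant end;
   little-endian allocation starts at the least significant end. */
static struct bit_range place_bit_field(int big_endian, unsigned unit_bits,
                                        unsigned used_bits, unsigned width)
{
    struct bit_range r;
    assert(width > 0 && used_bits + width <= unit_bits);
    if (big_endian) {
        r.first = used_bits;
        r.last = used_bits + width - 1;
    } else {
        r.last = unit_bits - 1 - used_bits;
        r.first = r.last - width + 1;
    }
    return r;
}
```

For the structure in Figure 3-6, placing j (5 bits), k (6 bits), and m (7 bits) in that order gives bits 0 to 4, 5 to 10, and 11 to 17 in big-endian, and bits 27 to 31, 21 to 26, and 14 to 20 in little-endian.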

Figure 3-7. Bit-Field Allocation With Boundary Alignment

```c
struct {
    short s : 9;
    int j : 9;
    char c;
    short t : 9;
    short u : 9;
    char d;
};
```

word-aligned, sizeof is 12

Bit numbers count from the most significant bit (bit 0) of each word.

<table>
<thead>
<tr>
<th>Word at byte offset</th>
<th>little-endian</th>
<th>big-endian</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>c (byte 3): 0 to 7; pad: 8 to 13; j: 14 to 22; s: 23 to 31</td>
<td>s: 0 to 8; j: 9 to 17; pad: 18 to 23; c (byte 3): 24 to 31</td>
</tr>
<tr>
<td>4</td>
<td>pad: 0 to 6; u (byte 6): 7 to 15; pad: 16 to 22; t: 23 to 31</td>
<td>t: 0 to 8; pad: 9 to 15; u (byte 6): 16 to 24; pad: 25 to 31</td>
</tr>
<tr>
<td>8</td>
<td>pad: 0 to 23; d (byte 8): 24 to 31</td>
<td>d (byte 8): 0 to 7; pad: 8 to 31</td>
</tr>
</tbody>
</table>

**Figure 3-8. Bit-Field Allocation With Storage Unit Sharing**

```c
struct {
    char  c;
    short s : 8;
};
```

halfword-aligned, sizeof is 2

little-endian

<table>
<thead>
<tr>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>s</td>
<td>c</td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
</tr>
</tbody>
</table>

big-endian

<table>
<thead>
<tr>
<th>0</th>
<th>1</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>s</td>
</tr>
<tr>
<td>0 7</td>
<td>8 15</td>
</tr>
</tbody>
</table>

**Figure 3-9. Bit-Field Allocation In A Union**

```c
union {
    char  c;
    short s : 8;
};
```

halfword-aligned, sizeof is 2
Figure 3-10. Bit-Field Allocation With Unnamed Bit-Fields

```
struct {
    char c;
    int : 0;
    char d;
    short : 9;
    char e;
};
```

byte aligned, sizeof is 9

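little-endian and big-endian

The byte offsets are the same in both byte orders. The 9 unnamed bits are allocated from the least significant end of the halfword at offset 6 in little-endian and from the most significant end in big-endian.

<table>
<thead>
<tr>
<th>Byte offset</th>
<th>Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>c</td>
</tr>
<tr>
<td>1 to 3</td>
<td>pad (inserted by the zero-width int bit-field)</td>
</tr>
<tr>
<td>4</td>
<td>d</td>
</tr>
<tr>
<td>5</td>
<td>pad</td>
</tr>
<tr>
<td>6 to 7</td>
<td>unnamed 9-bit field and 7 bits of pad</td>
</tr>
<tr>
<td>8</td>
<td>e</td>
</tr>
</tbody>
</table>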


3.2. Function Calling Sequence

The standard sequence for function calls is outlined in this section. The layout of the stack frame, the parameter passing convention, and the register usage are also detailed in this section. Standard library functions use these conventions, except as documented for the register save and restore functions.

The conventions given in this chapter are adhered to by C programs. Further information on the implementation of C is given in Section 3.3.

Note: While it is recommended that all functions use the standard calling sequence, the requirements of the standard calling sequence are only applicable to global functions. Different calling sequences and conventions can be employed by local functions which cannot be reached from other compilation units, if they comply with the stack back trace requirements.

ATR-LONG-DOUBLE-IS-DOUBLE

Note: If long double has the same representation as double, then all statements about how double values are passed to and returned from functions also apply to long double, and all statements about how _Complex double values are passed to and returned from functions also apply to _Complex long double.

3.2.1. Registers

Programs and compilers may freely use all registers except those reserved for system use. The system signal handlers are responsible for preserving the original values upon return to the original execution path. Signals that can interrupt the original execution path are documented in (BA-OS) in the System V Interface Definition.

The tables in *Section 3.2.1.1* give an overview of the registers that are global during program execution. The tables use three terms to describe register *Preservation Rules*:

**nonvolatile**

A *caller* can expect that the contents of all registers marked *nonvolatile* are valid after control returns from a function call.

A *callee* shall save the contents of all registers marked *nonvolatile* prior to modification. The callee must restore the contents of all such registers before returning to its caller.

**volatile**

A *caller* cannot trust that the contents of registers marked *volatile* have been preserved across a function call.

A *callee* need not save the contents of registers marked *volatile* before modification.

**limited-access**

The contents of registers marked *limited-access* have special preservation rules. These registers have mutability restricted to certain bit-fields as defined by the Power ISA. The individual bits of these bit-fields are defined by this ABI to be *limited-access*.

Under normal conditions a *caller* can expect that these bits have been preserved across a function call. Under the special conditions indicated in *Section 3.2.1.2*, a *caller shall expect* that these bits will have changed across function calls even if they have not.

A *callee* may only permanently modify these bits without preserving the state upon entrance to the function if the *callee* satisfies the special conditions indicated in *Section 3.2.1.2*; otherwise, these bits must be preserved before modification and restored before returning to the caller.

### 3.2.1.1. Register Roles

In the 32-bit Power Architecture, there are always 32 general-purpose registers, each 32 bits wide. Throughout this document the symbol $rN$ is used, where $N$ is a register number, to refer to general-purpose register $N$. 
Table 3-13. Register Roles

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r0</td>
<td>volatile</td>
<td>Optional in function linkage</td>
</tr>
<tr>
<td>r1</td>
<td>nonvolatile</td>
<td>Stack frame pointer</td>
</tr>
<tr>
<td>r2</td>
<td>nonvolatile</td>
<td>See the following table</td>
</tr>
<tr>
<td>r3-r6</td>
<td>volatile</td>
<td>Parameter and return value</td>
</tr>
<tr>
<td>r7-r10</td>
<td>volatile</td>
<td>Additional function parameters</td>
</tr>
<tr>
<td>r11-r12</td>
<td>volatile</td>
<td>Optional in function linkage</td>
</tr>
<tr>
<td>r13</td>
<td>nonvolatile</td>
<td>Small data area pointer</td>
</tr>
<tr>
<td>r14-r31</td>
<td>nonvolatile</td>
<td>Local variables</td>
</tr>
<tr>
<td>LR</td>
<td>volatile</td>
<td>Link register</td>
</tr>
<tr>
<td>CTR</td>
<td>volatile</td>
<td>Loop count register</td>
</tr>
<tr>
<td>XER</td>
<td>volatile</td>
<td>Fixed point exception register</td>
</tr>
<tr>
<td>CR0-CR1</td>
<td>volatile</td>
<td>Condition register fields</td>
</tr>
<tr>
<td>CR2-CR4</td>
<td>nonvolatile</td>
<td>Condition register fields</td>
</tr>
<tr>
<td>CR5-CR7</td>
<td>volatile</td>
<td>Condition register fields</td>
</tr>
</tbody>
</table>
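
The general-purpose register rows of Table 3-13 can be expressed as a small lookup, which may be convenient when writing tools that check generated code. The enumeration `enum preservation` and the function `gpr_preservation` are hypothetical names; the classification itself is taken directly from the table.

```c
#include <assert.h>

/* Preservation rules that apply to general-purpose registers. */
enum preservation {
    VOLATILE,
    NONVOLATILE
};

/* Hypothetical helper: preservation rule for general-purpose register rN,
   following Table 3-13. r1, r2, and r13-r31 are nonvolatile; r0 and
   r3-r12 are volatile. */
static enum preservation gpr_preservation(unsigned n)
{
    assert(n < 32);
    if (n == 1 || n == 2 || n >= 13)
        return NONVOLATILE;
    return VOLATILE;
}
```

A callee that modifies r14, for instance, must save and restore it, whereas r3 through r12 may be overwritten freely.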

**Optional Function Linkage**

A function cannot depend on the values of those registers optional in the function linkage (r0, r11, and r12) because they may be altered by inter-library calls.

**Stack Frame Pointer**

The stack pointer always points to the lowest allocated valid stack frame. It must maintain quadword alignment and grow toward the lower addresses. The contents of the word at that address always point to the previously allocated stack frame. A called function is permitted to decrement it if required. See Section 3.3.9 for additional information.
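
The quadword alignment requirement can be sketched with simple address arithmetic. The function names `frame_size_aligned` and `push_frame` are hypothetical, and 32-bit addresses are modeled as `uint32_t`; the sketch does not write the back chain word.

```c
#include <stdint.h>

/* Hypothetical helper: round a requested frame size up to a multiple of
   16 bytes (one quadword). */
static uint32_t frame_size_aligned(uint32_t bytes)
{
    return (bytes + 15u) & ~UINT32_C(15);
}

/* Hypothetical helper: stack pointer value after allocating a frame of
   at least `bytes` bytes. The stack grows toward lower addresses, so
   the new value is below the old one and stays quadword aligned when
   the old value was aligned. */
static uint32_t push_frame(uint32_t sp, uint32_t bytes)
{
    return sp - frame_size_aligned(bytes);
}
```

A request for 40 bytes is rounded up to 48, so an aligned stack pointer remains aligned after the frame is allocated.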

**Small Data Area Pointer**

Register r13 is the small data area pointer. Process start up code for executables that reference data in the small data area with 16-bit offset addressing relative to r13 must load the base of the small data area (the value of the dynamic linker-defined symbol _SDA_BASE_) into r13. Shared objects shall not alter the value in r13. See Section 4.7 for more details.

**Link Register**

The link register contains the address to which a called function normally returns. It is volatile across function calls.

**Condition Register Fields**

In the condition register, the bit-fields CR2, CR3, and CR4 are nonvolatile and the value on entry must be restored on exit. The other bit-fields are volatile. The bit-field CR6 shall be set by the caller of a variable argument list function as described in Section 3.2.4.
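The preservation rule for the condition register can be expressed as a bit mask. The following C sketch assumes a 32-bit image of the condition register as read by `mfcr`, with CR0 in the four most significant bits. The function names are hypothetical and are not defined by this ABI.

```c
#include <stdint.h>

/* 32-bit CR image as read by mfcr: CR0 occupies the four most
 * significant bits, CR7 the four least significant bits. */
static uint32_t cr_field_mask(unsigned field)   /* field in 0..7 */
{
    return UINT32_C(0xF) << (28u - 4u * field);
}

/* Mask covering the nonvolatile fields CR2, CR3, and CR4. */
uint32_t cr_nonvolatile_mask(void)
{
    return cr_field_mask(2) | cr_field_mask(3) | cr_field_mask(4);
}

/* Nonzero when the nonvolatile fields are identical on entry and exit. */
int cr_preserved(uint32_t cr_on_entry, uint32_t cr_on_exit)
{
    return ((cr_on_entry ^ cr_on_exit) & cr_nonvolatile_mask()) == 0;
}
```

Under these assumptions `cr_nonvolatile_mask()` evaluates to `0x00FFF000`, so a callee may leave any other bits of the condition register changed.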

---

**ATR-TLS**

Table 3-14. TLS ABI Register Role for General-Purpose Register 2

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r2</td>
<td>nonvolatile</td>
<td>Thread pointer</td>
</tr>
</tbody>
</table>

---

**ATR-PASS-COMPLEX-IN-GPRS**

Table 3-16. Register Roles for the _Complex float and _Complex double Types

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r3-r10</td>
<td>volatile</td>
<td>Used for _Complex float and _Complex double parameters and return values.</td>
</tr>
</tbody>
</table>

---

**ATR-PASS-COMPLEX-IN-GPRS & ATR-LONG-DOUBLE-IBM**

Table 3-17. Register Roles for the _Complex Long Double Type

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r3-r10</td>
<td>volatile</td>
<td>Used for the _Complex long double parameters and return values.</td>
</tr>
</tbody>
</table>

---

**ATR-SECURE-PLT**

Under the Secure-PLT ABI, when using the Position-Independent Code (PIC) addressing model, register r30 is used (by convention between compiler & link editor) in nonleaf functions to hold the Global Offset Table (GOT) pointer. See Section 5.2.5.2 for information on the Secure-PLT.

Table 3-18. Secure-PLT Register Role for General-Purpose Register 30

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r30</td>
<td>nonvolatile</td>
<td>GOT pointer under the Secure-PLT with the Position-Independent Code (PIC) addressing model</td>
</tr>
</tbody>
</table>
Chapter 3. Low Level System Information

ATR-CLASSIC-FLOAT

On Power Architecture processors that support Power ISA category Floating-point, there are always 32 floating-point registers, each 64 bits wide, and an associated special-purpose register to provide floating-point status and control. Throughout this document the symbol $fN$ is used, where $N$ is a register number, to refer to floating-point register $N$.

Table 3-19. Floating-Point Register Roles for Binary Floating-Point Types

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>$f0$</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>$f1$</td>
<td>volatile</td>
<td>Used for parameter passing and return values of binary float types.</td>
</tr>
<tr>
<td>$f2-f8$</td>
<td>volatile</td>
<td>Used for parameter passing of binary float types.</td>
</tr>
<tr>
<td>$f9-f13$</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>$f14-f31$</td>
<td>nonvolatile</td>
<td></td>
</tr>
<tr>
<td>FPSCR</td>
<td>limited-access</td>
<td>Floating point status and control register limited-access bits. Preservation rules governing the limited-access bits for the bit-fields [VE], [OE], [UE], [ZE], [XE], and [RN] are presented in Section 3.2.1.2.</td>
</tr>
</tbody>
</table>

ATR-CLASSIC-FLOAT & ATR-DFP

The ISA Decimal Floating-Point category extends the Power Architecture by adding a decimal floating-point unit. It uses the existing 64-bit floating-point registers and extends the FPSCR register to 64 bits, where it defines a decimal rounding-control field in the extended space.

Table 3-20. Floating-Point Register Roles for Decimal Floating-Point Types

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>$f0$</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>$f1$</td>
<td>volatile</td>
<td>Used for parameter passing and return values of single-precision and double-precision decimal floating-point types.</td>
</tr>
<tr>
<td>$f2-f8$</td>
<td>volatile</td>
<td>Used for parameter passing and return values of quad-precision decimal floating-point types.</td>
</tr>
<tr>
<td>$f9-f13$</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>$f14-f31$</td>
<td>nonvolatile</td>
<td></td>
</tr>
<tr>
<td>FPSCR</td>
<td>limited-access</td>
<td>Floating point status and control register limited-access bits. Preservation rules governing the limited-access bits for the bit-field [DRN] are presented in Section 3.2.1.2.</td>
</tr>
</tbody>
</table>
### ATR-SOFT-FLOAT

#### Table 3-21. Soft-Float General-Purpose Register Roles for Binary Floating-Point Types

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r3-r10</td>
<td>volatile</td>
<td>Volatile parameter and return value registers for float, double, and long double binary floating-point types. If the parameters are within the first eight words of the parameter list:
<ul>
<li>Float values occupy a single GPR.</li>
<li>Double values occupy adjacent GPRs.</li>
<li>Long double values occupy four adjacent GPRs.</li>
</ul>
There are special rules governing how parameters that span multiple GPRs should be split between registers and the parameter save area outlined in Section 3.2.3.</td>
</tr>
</tbody>
</table>

### ATR-SOFT-FLOAT & ATR-DFP

#### Table 3-22. Soft-Float General-Purpose Register Roles for Decimal Floating-Point Types

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>r3-r10</td>
<td>volatile</td>
<td>Volatile parameter and return value registers for _Decimal32, _Decimal64, and _Decimal128 decimal floating-point types. If the parameters are within the first eight words of the parameter list:
<ul>
<li>_Decimal32 values occupy a single GPR.</li>
<li>_Decimal64 values occupy adjacent GPRs.</li>
<li>_Decimal128 values occupy four adjacent GPRs.</li>
</ul>
There are special rules governing how parameters that span multiple GPRs should be split between registers and the parameter save area outlined in Section 3.2.3.</td>
</tr>
</tbody>
</table>

ATR-VECTOR

The ISA Vector category extends the Power Architecture and provides 32 vector registers, each 128 bits wide, a special-purpose register VRSAVE, and a special-purpose register VSCR. Throughout this document the symbol vN is used, where N is a register number, to refer to vector register N.

Table 3-23. Vector Register Roles

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>v0-v1</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>v2</td>
<td>volatile</td>
<td>Used for parameter passing and return values</td>
</tr>
<tr>
<td>v3-v13</td>
<td>volatile</td>
<td>Used for parameter passing</td>
</tr>
<tr>
<td>v14-v19</td>
<td>volatile</td>
<td></td>
</tr>
<tr>
<td>v20-v31</td>
<td>nonvolatile</td>
<td></td>
</tr>
<tr>
<td>VRSAVE</td>
<td>nonvolatile</td>
<td>32-bit VR Save Register.</td>
</tr>
<tr>
<td>VSCR</td>
<td>limited-access</td>
<td>32-bit vector status and control register. Preservation rules governing the limited-access bits for the bit-field [NJ] are presented in Section 3.2.1.2.</td>
</tr>
</tbody>
</table>

ATR-SPE

The ISA Signal Processing Engine (SPE) category provides upper words for the 32 general-purpose registers, thus allowing them to be used in SPE APU operations to hold two 32-bit words. The Signal Processing Engine category also provides several special-purpose registers. The volatility of all 64-bit registers is the same for the upper and lower word. If only the lower word is modified by a function, only the lower word need be saved and restored.

Table 3-24. SPE Register Roles

<table>
<thead>
<tr>
<th>Register</th>
<th>Preservation Rules</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPEFSCR</td>
<td>limited-access</td>
<td>Signal processing and embedded floating-point status and control register. Preservation rules governing the limited-access bits for the bit-fields [FINXE], [FINVE], [FDBZE], [FUNFE], [FOVFE], and [FRMC] are presented in Section 3.2.1.2.</td>
</tr>
<tr>
<td>ACC</td>
<td>volatile</td>
<td>64-bit SPE accumulator register.</td>
</tr>
</tbody>
</table>

3.2.1.2. Limited-Access Bits

The Power ISA identifies a number of registers which have mutability limited to the specific bit-fields indicated in the following list:
### ATR-CLASSIC-FLOAT

**FPSCR [VE]**

The *Floating-Point Invalid Operation Exception Enable* bit [VE] of the FPSCR register.

---

**FPSCR [OE]**

The *Floating-Point Overflow Exception Enable* bit [OE] of the FPSCR register.

---

**FPSCR [UE]**

The *Floating-Point Underflow Exception Enable* bit [UE] of the FPSCR register.

---

**FPSCR [ZE]**

The *Floating-Point Zero Divide Exception Enable* bit [ZE] of the FPSCR register.

---

**FPSCR [XE]**

The *Floating-Point Inexact Exception Enable* bit [XE] of the FPSCR register.

---

**FPSCR [RN]**

The *Binary Floating-Point Rounding Control* field [RN] of the FPSCR register.

-- ATR-DFP --

FPSCR [DRN]

The *DFP Rounding Control* field [DRN] of the 64-bit FPSCR register.

-- ATR-VECTOR --

VSCR [NJ]

The *Vector Non-Java Mode* field [NJ] of the VSCR register.

-- ATR-SPE --

SPEFSCR [FINXE]

The *Embedded Floating-Point Round (Inexact) Exception Enable* field [FINXE] of the SPEFSCR register.

-- ATR-SPE --

SPEFSCR [FINVE]

The *Embedded Floating-Point Invalid Operation/Input Error Exception Enable* field [FINVE] of the SPEFSCR register.

-- ATR-SPE --

SPEFSCR [FDBZE]

The *Embedded Floating-Point Divide By Zero Exception Enable* field [FDBZE] of the SPEFSCR register.

-- ATR-SPE --

SPEFSCR [FUNFE]

The *Embedded Floating-Point Underflow Exception Enable* field [FUNFE] of the SPEFSCR register.

### ATR-SPE

**SPEFSCR [FOVFE]**

The *Embedded Floating-Point Overflow Exception Enable* field [FOVFE] of the SPEFSCR register.

---

### ATR-SPE

**SPEFSCR [FRMC]**

The *Embedded Floating-Point Rounding Mode Control* field [FRMC] of the SPEFSCR register.

The bits composing these bit-fields are identified as *limited-access* because this ABI manages how they are to be modified and preserved across function calls.

*Limited-access* bits may be changed across function calls only if the called function has specific permission to do so as indicated by the following conditions.

A function without permission to change the *limited-access* bits across a function call shall save the value of the register before modifying the bits and restore it before returning to its calling function.

**Limited-Access Conditions**

- Standard library functions expressly defined to change the state of limited-access bits are not constrained by nonvolatile preservation rules, e.g., the *fesetround()* and *feenableexcept()* functions.

- All other standard library functions shall save the old value of these bits on entry, change the bits for their purpose, and restore the bits before returning.

- Where a standard library function such as *qsort()* calls functions provided by an application the following rules shall be observed:
  - The limited-access bits on entry to the first call to such a callback must have the values they had on entry to the library function.
  - The limited-access bits on entry to a subsequent call to such a callback must have the values they had on exit from the previous call to such a callback.
  - The limited-access bits on exit from the library function must have the values they had on exit from the last call to such a callback.

- The compiler can directly generate code that saves and restores the limited-access bits.

- The values of the limited-access bits are unspecified on entry into a signal handler because a library or user function may have temporarily modified the limited-access bits when the signal was taken.

- When setjmp() returns from a direct invocation, the limited-access bits must have the values they had on entry to setjmp(); when it returns from a call to longjmp(), the limited-access bits must have the values they had on entry to longjmp().

ATR-CLASSIC-FLOAT

- C Library intrinsics, such as _FPU_SETCW(), may modify the limited-access bits of the FPSCR.

ATR-VECTOR

- The AltiVec PIM vec_mtvscr() intrinsic may change the limited-access NJ bit.

ATR-SPE

- The following intrinsics defined by the SPE PIM may change the limited-access bits of the SPEFSCR register:

  __ev_clr_spefscr_sovh() __ev_clr_spefscr_sov() __ev_clr_spefscr_finxs()
  __ev_clr_spefscr_finvs() __ev_clr_spefscr_fdbzs() __ev_clr_spefscr_funfs()
  __ev_clr_spefscr_fovfs() __ev_set_spefscr_frmc()

ATR-SOFT-FLOAT

- Any data stored internally by software floating-point code to describe rounding modes and enabled exceptions is subject to the same rules as limited-access register bits.

Note: The unwinder does not need to make specific allowances for limited-access bits.

3.2.2. The Stack Frame
A function shall establish a stack frame if it requires the use of nonvolatile registers, its local variable usage can’t be optimized into registers, or it calls another function. It need only allocate space for the required stack frame elements, namely the backchain pointer, the LR save area, and padding to the required alignment.

Figure 3-11 shows the relative layout of an allocated stack frame following a nonleaf function call, where the stack pointer points to the backchain word of the caller’s stack frame. In general the stack pointer always points to the backchain word of the most recently allocated stack frame.

In Figure 3-11 the green areas indicate an *optional* save area of the stack frame. Refer to Section 3.2.2.2 for a description of the optional save areas described by this ABI.

### 3.2.2.1. General Stack Frame Requirements

The following general requirements apply to all stack frames:

- The stack shall be quadword-aligned.
- The minimum stack frame size shall be 16 bytes. A minimum stack frame consists of the first two words (*backchain* word and *LR save word*), with padding to meet the 16-byte alignment requirement.
- There is no maximum stack frame defined.
- Padding shall be added to the *local variable space* of the stack frame to maintain the defined stack frame alignment in the absence of register save areas.

- The stack pointer (r1) shall always point to the lowest address word of the most recently allocated stack frame.

- The stack shall start at high addresses and grow downward toward lower addresses.

- The lowest address word (the backchain word in Figure 3-11) shall point to the previously allocated stack frame. An exception occurs with the first stack frame, which shall have a value of 0 (NULL).

- If required, the stack pointer shall be decremented in the called function’s prologue and restored in the called function’s epilogue.

- The stack pointer shall be updated atomically so that, at all times, it points to a valid backchain word. This update may be achieved in a number of ways, as indicated in Section 3.3.3.3.

- Before a function calls any other functions, it shall save the value of the LR register into the LR save area of the caller’s stack frame.

  Note: An optional frame pointer may be created if necessary (e.g., as a result of dynamic allocation on the stack as described in Section 3.3.9) to address arguments or local variables.

Figure 3-12 shows a sample stack frame allocation that meets these requirements.

Figure 3-12. Example Minimum Stack Frame Allocation

stwu 1,-32(1) - Store backchain, decr SP
mflr 0 - Copy LR to R0
stw 0,36(1) - Store LR in previous LR save area
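
The size and alignment requirements of Section 3.2.2.1 reduce to simple arithmetic. The following C sketch rounds a frame up to the required quadword multiple. The function name `frame_size` and the single `extra` parameter (all save areas, parameter save area, and local variables combined) are assumptions made for illustration only.

```c
#include <stdint.h>

#define STACK_ALIGN   16u  /* quadword alignment required for the stack */
#define FRAME_HEADER   8u  /* back chain word plus LR save word */

/* Hypothetical helper: total frame size in bytes for `extra` bytes of
 * save areas, parameter save area, and local variable space. */
uint32_t frame_size(uint32_t extra)
{
    uint32_t raw = FRAME_HEADER + extra;
    return (raw + STACK_ALIGN - 1u) & ~(STACK_ALIGN - 1u);
}
```

With no extra space the result is the 16-byte minimum frame, and any request from 9 to 24 extra bytes yields the 32-byte frame allocated by the `stwu` in Figure 3-12.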

3.2.2.2. Optional Save Areas

This ABI provides a stack frame with a number of optional save areas. This section will indicate the relative position of these save areas in relation to each other and the primary elements of the stack frame.

Because the back chain word of a stack frame must maintain quadword alignment the following save area diagrams indicate that an optional special purpose padding element might be necessary near the low-address end of a stack frame (above the link register save).

An optional alignment padding to quadword boundary element might be necessary near the high-address end of the stack in order to quadword-align the low-address beginning of a register save area immediately below it, e.g., Figure 3-18.

Register Save Areas

ATR-CLASSIC-FLOAT

Floating-Point Register Save Area

If a function is to change the value in any nonvolatile floating-point register f*n*, it shall first save the value of f*n* in the Floating-Point Register Save Area, in a doubleword located 8 × (32 - n) bytes before the back chain word of the previous frame, as shown in Figure 3-13.

ATR-CLASSIC-FLOAT

General-Purpose Register Save Area (with floating-point registers available)

If a function is to change the value in any nonvolatile general-purpose register \( r_n \), it shall first save the value of \( r_n \) in the general register save area, in a word located \( 4 \times (32 - n) \) bytes before the low-addressed end of the Floating-Point Register Save Area, as shown in Figure 3-13.
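
The two placement rules can be written as offset calculations. This C sketch is illustrative only: the function names are hypothetical, offsets are expressed in bytes relative to the back chain word of the previous frame, and the Floating-Point Register Save Area is assumed to be contiguous from the lowest-numbered saved register up to f31.

```c
/* Offsets are in bytes relative to the back chain word of the previous
 * frame; negative values lie at lower addresses. */

/* Save slot for nonvolatile floating-point register f<n>, 14 <= n <= 31. */
int fpr_save_offset(int n)
{
    return -8 * (32 - n);
}

/* Save slot for nonvolatile general-purpose register r<n>, 14 <= n <= 31.
 * first_fpr is the lowest-numbered FPR that is saved, or 32 when no
 * floating-point registers are saved. */
int gpr_save_offset(int n, int first_fpr)
{
    int fpr_area_low = -8 * (32 - first_fpr);  /* low end of the FPR area */
    return fpr_area_low - 4 * (32 - n);
}
```

For example, f31 is saved at offset -8, and r31 is saved at offset -4 when no floating-point registers are saved, or at offset -20 when f30 and f31 are saved.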

---

CR Save Area

CR Save-Register Save Area

If a function changes the value in any nonvolatile field of the condition register, it shall first save the value in all the nonvolatile fields of the condition register in the CR Save Area, which is the word below the low address end of the general register save area, as shown in Figure 3-15.

**Figure 3-15. CR Save Area**

![Diagram of CR Save Area](image1)

**Figure 3-16. CR Save Area With Floating-Point Save Area**

![Diagram of CR Save Area with Floating-Point Save Area](image2)

*Figure 3-16 shows the location of the CR save area when a floating-point save area is present.*

Category Specific Save-Register Save Area

ATR-VECTOR

VRSAVE Register Save Area

Functions must ensure that the appropriate bits in the VRSAVE register are set for any vector registers they use. A function that changes the value of the VRSAVE register shall save the original value of VRSAVE into the VRSAVE save area. If the CR save area is present, the VRSAVE save area is located in the word below the CR save area. Otherwise, the VRSAVE save area is located in the word below the low address end of the general register save area. Both options are shown in Figure 3-17.

Figure 3-17. VRSAVE and Vector Register Save Areas

[Figure: stack frame layout, listed from high address to low address. Caller's stack frame: Back Chain. Current stack frame: Floating-Point Register Save Area (8 x (32 - n) bytes); General-Purpose Register Save Area (4 x (32 - n) bytes); optional CR Save Word (4 bytes); VRSAVE Save Word (4 bytes); Alignment Padding to Quadword Boundary; Vector Register Save Area (vr(31) down to vr(n), 16 bytes each, 16 x (32 - n) bytes in total, starting on a quadword boundary); Special Purpose Padding; LR Save Area; Back Chain, addressed by the stack pointer (GPR1).]

Category-Specific Register Save Areas

**ATR-VECTOR**

**Vector Register Save Area**

If a function changes the value in any nonvolatile vector register \( \text{vr}_n \), it shall first save the value of \( \text{vr}_n \) in the **Vector Register Save Area**, in a quadword located 16 \( \times \) (32 - \( n \)) bytes before the low-addressed end of the **VRSAVE save area** (plus any required padding), as shown in Figure 3-17. The **Vector Register Save Area** shall have quadword alignment.
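
The position of a vector register save slot depends on every save area above it. The C sketch below combines the rules stated so far. It is a simplified model under several assumptions: the function names are hypothetical, offsets are relative to the caller's back chain word (itself quadword-aligned), each register save area is contiguous up to register 31, and the VRSAVE save word is always present.

```c
/* Round a non-positive byte offset toward lower addresses to a
 * multiple of 16. */
static int align_down16(int off)
{
    return -(((-off) + 15) & ~15);
}

/* Save slot for nonvolatile vector register v<n>, 20 <= n <= 31.
 * first_fpr and first_gpr are the lowest-numbered saved FPR and GPR,
 * or 32 when none are saved. has_cr_save is nonzero when the CR save
 * word is present. */
int vr_save_offset(int n, int first_fpr, int first_gpr, int has_cr_save)
{
    int low = -8 * (32 - first_fpr) - 4 * (32 - first_gpr);

    if (has_cr_save)
        low -= 4;            /* optional CR save word */
    low -= 4;                /* VRSAVE save word (assumed present) */
    low = align_down16(low); /* alignment padding to quadword boundary */
    return low - 16 * (32 - n);
}
```

With no other save areas, the VRSAVE word and its padding occupy 16 bytes, so v31 is saved at offset -32.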

---

**Additional Category Specific Register Save Areas**

**ATR-SPE**

**SPE 64-bit General-Purpose Register Save Area**

If a function changes the value in the upper word of any nonvolatile general-purpose register \( \text{r}_n \), it shall first save the value of \( \text{r}_n \) in the **64-bit general-purpose register save area**, in a doubleword located 8 \( \times \) (32 - \( n \)) bytes before the low-addressed end of the **CR save area** (plus any required padding) if the **CR Save Area** is present. Otherwise, it is located in a doubleword 8 \( \times \) (32 - \( n \)) bytes before the low-addressed end of the **General-Purpose Register Save Area** (plus any required padding). The **64-bit General-Purpose Save Area** shall have quadword alignment. While not technically necessary, quadword alignment is required for congruence with AltiVec and VMX technology.

Figure 3-18. SPE 64-bit General-Purpose Register Save Area

Note: The purpose of providing both 32-bit and 64-bit general register save areas is to reduce the stack usage for routines that use only the lower word of some nonvolatile registers, and both the lower and upper word of some other nonvolatile registers. A compiler may choose to save and restore all 64 bits of each modified nonvolatile general-purpose register, as long as the debugging information reflects this choice.

Figure 3-19. Parameter Save Area and Local Variable Space

Parameter Save Area

The Parameter Save Area shall be allocated by the caller, and shall be large enough to contain the parameters needed by the caller. The calling function cannot expect that the contents of this save area are valid when returning from the callee. Refer to Figure 3-19 for information on the location of this space.

Local Variable Space

The Local Variable Space is used for allocation of local variables. If the Parameter Save Area is in use, the Local Variable Space is located immediately above it, at a higher address, otherwise it is located immediately above the LR Save word. There is no restriction on the size of this area. Refer to Figure 3-19 for information on the location of this space.

3.2.3. Parameter Passing

For the Power Architecture, it is more efficient to pass arguments to functions in registers than through memory. The following parameters can be passed in registers:

- Up to eight arguments can be passed in general-purpose registers r3 through r10

<table>
<thead>
<tr>
<th>ATR-SPE</th>
</tr>
</thead>
<tbody>
<tr>
<td>Up to eight 64-bit doubleword vector arguments are passed in general-purpose registers.</td>
</tr>
</tbody>
</table>

ATR-CLASSIC-FLOAT

• Up to eight floating-point arguments can be passed in floating-point registers f1 through f8.

ATR-CLASSIC-FLOAT && ATR-DFP

• Up to eight single-precision or double-precision decimal floating-point arguments can be passed in floating-point registers f1 through f8.

ATR-CLASSIC-FLOAT && ATR-DFP

• Up to three quad-precision decimal floating-point arguments can be passed in even-odd floating-point register pairs f2 through f7.

ATR-VECTOR

• Up to 12 vector parameters can be passed in v2 through v13.

If fewer arguments are needed, then the unused registers defined previously will contain undefined values on entry to the called function.

If there are more arguments than registers, then a function must provide space for the arguments in its stack frame. When this happens, only the minimum storage needed to contain the extra arguments needs to be allocated in the stack frame.

The following algorithm describes where arguments are passed for the C language. In this algorithm, arguments are assumed to be ordered from left (first argument) to right. The actual order of evaluation for arguments is unspecified.

gr contains the number of the next available general-purpose register.

fr contains the number of the next available floating-point register.

3.2.3.1. Parameter Passing Register Selection Algorithm

Note: The following types refer to the type of the argument as declared by the function prototype. The argument values will be converted (if necessary) to the types of the prototype arguments before passing them to the called function.

If a prototype is not present, or it is a variable argument prototype and the argument is after the ellipsis, the type refers to the type of the data objects being passed to the called function.

- **INITIALIZE**: If the function return type requires a storage buffer, set gr = 4, else set gr = 3.

  **ATR-CLASSIC-FLOAT**

  Set fr = 1

  **ATR-VECTOR**

  Set vr = 2

  Set starg to the address of parameter word 1.

- **SCAN**: If there are no more arguments, terminate. Otherwise, select one of the following depending on the type of the next argument:

**SINGLE_GP:**

- A single integer no more than 32 bits

**ATR-SOFT-FLOAT**

- A single-precision floating-point value if prototype is present

**ATR-SPE**

- A 64-bit vector if the called function is not a variable-argument function

- A pointer to a data object

- A struct or union that shall be treated as a pointer to the data object, or to a copy of the data object when necessary to enforce call-by-value semantics. Only if the caller can ascertain that the data object is constant can it pass a pointer to the data object itself.

**ATR-SOFT-FLOAT & ATR-DFP**

- A single-precision decimal float

If \( gr > 10 \), go to OTHER. Otherwise, load the argument value into general-purpose register \( gr \), set \( gr = gr + 1 \), and go to SCAN. Values shorter than 32 bits are sign-extended or zero-extended, depending on whether they are signed or unsigned.

**DUAL_GP:**

- A 64-bit integer

**ATR-SOFT-FLOAT**

- A double-precision floating-point value

**ATR-SPE**

- A 64-bit vector being passed to a variable-argument function

**ATR-PASS-COMPLEX-IN-GPRS**

- A complex single-precision float

**ATR-SOFT-FLOAT & ATR-DFP**

- A double-precision decimal float

If \( gr > 9 \), go to OTHER. If \( gr \) is even, set \( gr = gr + 1 \). Load the lower-addressed word of the argument into \( gr \) and the higher-addressed word into \( gr + 1 \), set \( gr = gr + 2 \), and go to SCAN.

**QUAD_GP:**

<table>
<thead>
<tr>
<th><strong>ATR-PASS-COMPLEX-IN-GPRS</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- A complex double-precision float</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>ATR-SOFT-FLOAT &amp;&amp; ATR-LONG-DOUBLE-IBM</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- A long double type of IBM AIX 128-bit Long Double format when no floating-point unit is present.</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>ATR-SOFT-FLOAT &amp;&amp; ATR-DFP</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- A quad-precision decimal float</td>
</tr>
</tbody>
</table>

If $gr > 7$, go to OTHER. Load the words of the argument, in memory-address order, into $gr$, $gr + 1$, $gr + 2$ and $gr + 3$, set $gr = gr + 4$, and go to SCAN.

<table>
<thead>
<tr>
<th><strong>ATR-LONG-DOUBLE-IBM</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- <strong>EIGHT_GP:</strong></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>ATR-PASS-COMPLEX-IN-GPRS</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- A complex long double type of IBM AIX 128-bit Long Double format.</td>
</tr>
</tbody>
</table>

If $gr > 3$, go to OTHER. Load the words of the argument, in memory-address order, into $gr$ through $gr + 7$, set $gr = gr + 8$, and go to SCAN.

<table>
<thead>
<tr>
<th><strong>ATR-CLASSIC-FLOAT</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- <strong>SINGLE_FP:</strong></td>
</tr>
<tr>
<td>- A single-precision floating-point value or a double-precision floating-point value</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th><strong>ATR-DFP</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>- A single-precision decimal floating-point value or a double-precision decimal floating-point value</td>
</tr>
</tbody>
</table>

If \( fr > 8 \), go to OTHER. Otherwise, load the argument into register \( fr \), set \( fr = fr + 1 \), and go to SCAN.

---

**ATR-LONG-DOUBLE-IBM || ATR-DFP**

**DOUBLE_FP:**

**ATR-LONG-DOUBLE-IBM**

- An extended-precision floating-point value of IBM AIX 128-bit Long Double format

**ATR-DFP**

- A quad-precision decimal floating-point value

If \( fr > 7 \), go to OTHER.

**ATR-DFP**

If the argument is a quad-precision decimal floating-point value and \( fr > 6 \), go to OTHER.

**ATR-DFP**

If the argument is a quad-precision decimal floating-point value and \( fr \) is odd, set \( fr = fr + 1 \), load the argument into \( fr \) [even] and \( fr + 1 \) [odd], set \( fr = fr + 2 \), and go to SCAN.

Otherwise, load the argument into \( fr \) and \( fr + 1 \), set \( fr = fr + 2 \), and go to SCAN.

---

**ATR-VECTOR**

**SINGLE_VR:**

- A 128-bit vector type, unless being passed as one of the variable arguments to a variable-argument function.

If \( vr > 13 \), go to OTHER. Otherwise, load the argument into register \( vr \), set \( vr = vr + 1 \), and go to SCAN.

---

OTHER:

- Arguments not otherwise handled are passed in the parameter save area of the caller’s stack frame. Most of the types handled in SINGLE_GP, as defined previously, are considered to have 4-byte size and alignment, with simple integer types shorter than 32 bits sign- or zero-extended to 32 bits. Long long arguments are considered to have 8-byte size and alignment. The same 8-byte arguments that must go in aligned pairs of registers are 8-byte aligned on the stack.

ATR-PASS-COMPLEX-IN-GPRS

Complex single-precision float arguments are considered to have 8-byte size and alignment.

ATR-LONG-DOUBLE-IBM & ATR-CLASSIC-FLOAT

A long double type of IBM AIX 128-bit Long Double format is considered to have 8-byte alignment.

ATR-DFP & ATR-CLASSIC-FLOAT

Decimal floating-point data types _Decimal128, _Decimal64, and _Decimal32 are considered to have 8-byte, 8-byte, and 4-byte alignment respectively.

ATR-SPE

64-bit vector arguments are considered to have 8-byte size and alignment.

Round $starg$ up to a multiple of the alignment requirement of the argument and copy the argument byte-for-byte, beginning with its lowest addressed byte, into $starg$, ..., $starg + size - 1$. Set $starg$ to $starg + size$, and go to SCAN.
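
The rounding step can be sketched in C as follows. The function name `place_on_stack` is hypothetical, `starg` is modeled as a byte offset rather than an address, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>

/* Hypothetical helper: reserve space for one argument in the parameter
 * save area. *starg is the next free byte offset. The return value is
 * the offset at which the argument's lowest-addressed byte is stored. */
size_t place_on_stack(size_t *starg, size_t size, size_t align)
{
    size_t start = (*starg + align - 1) & ~(align - 1);  /* round up */

    *starg = start + size;   /* starg = starg + size */
    return start;
}
```

For instance, with `starg` at offset 28, an 8-byte argument with 8-byte alignment is placed at offset 32 and `starg` advances to 40.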

Types handled in QUAD_GP, as defined previously, are only 4-byte aligned when passed on the stack.

ATR-LONG-DOUBLE-IBM

Complex long double values of IBM AIX 128-bit Long Double format are only 4-byte aligned when passed on the stack.

If $fr > 7$ and the type is DOUBLE_FP, then set $fr = 9$ (to prevent subsequent SINGLE_FPs from being placed in registers after DOUBLE_FP arguments that would no longer fit in the registers).

If $gr > 9$ and the type is DUAL_GP, or $gr > 7$ and the type is QUAD_GP, or $gr > 3$ and the type is EIGHT_GP, then set $gr = 11$ (to prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers).
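
The selection algorithm can be modeled as a small state machine. The C sketch below is a simplified illustration, not a complete implementation: it covers only SINGLE_GP, DUAL_GP, QUAD_GP, SINGLE_FP, and DOUBLE_FP, omits the vector, EIGHT_GP, and decimal floating-point cases, treats every SINGLE_FP value as an 8-byte double, and assumes that parameter word 1 lies at byte offset 8 of the parameter save area numbering. All type and function names are hypothetical.

```c
enum arg_class { SINGLE_GP, DUAL_GP, QUAD_GP, SINGLE_FP, DOUBLE_FP };

struct arg_state {
    int gr;     /* next available general-purpose register */
    int fr;     /* next available floating-point register */
    int starg;  /* next free byte offset in the parameter save area */
};

/* INITIALIZE: gr = 4 when the return type requires a storage buffer. */
void arg_init(struct arg_state *st, int returns_in_buffer)
{
    st->gr = returns_in_buffer ? 4 : 3;
    st->fr = 1;
    st->starg = 8;  /* assumed offset of parameter word 1 */
}

/* One SCAN step. Returns the first register used (a GPR or FPR number
 * depending on the class), or -1 when the argument is passed in the
 * parameter save area, in which case *offset receives its byte offset. */
int arg_assign(struct arg_state *st, enum arg_class cls, int *offset)
{
    int reg, size, align;

    switch (cls) {
    case SINGLE_GP:
        if (st->gr <= 10)
            return st->gr++;
        size = 4; align = 4;
        break;
    case DUAL_GP:
        if (st->gr <= 9) {
            if (st->gr % 2 == 0)
                st->gr++;            /* pairs start on an odd register */
            reg = st->gr;
            st->gr += 2;
            return reg;
        }
        st->gr = 11;                 /* no later SINGLE_GP in registers */
        size = 8; align = 8;
        break;
    case QUAD_GP:
        if (st->gr <= 7) {
            reg = st->gr;
            st->gr += 4;
            return reg;
        }
        st->gr = 11;
        size = 16; align = 4;        /* only 4-byte aligned on the stack */
        break;
    case SINGLE_FP:
        if (st->fr <= 8)
            return st->fr++;
        size = 8; align = 8;
        break;
    default:                         /* DOUBLE_FP */
        if (st->fr <= 7) {
            reg = st->fr;
            st->fr += 2;
            return reg;
        }
        st->fr = 9;                  /* no later SINGLE_FP in registers */
        size = 16; align = 8;
        break;
    }

    /* OTHER: round starg up to the alignment, then advance past it. */
    *offset = (st->starg + align - 1) & ~(align - 1);
    st->starg = *offset + size;
    return -1;
}
```

Starting from `arg_init(&st, 0)` and feeding the classes SINGLE_GP, SINGLE_FP, SINGLE_GP, DOUBLE_FP, SINGLE_GP, SINGLE_FP, SINGLE_GP, SINGLE_GP, SINGLE_FP in that order yields r3, f1, r4, f2 (with f3), r5, f4, r6, r7, and f5.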

3.2.3.2. Parameter Passing Examples

The following section provides some examples using the algorithm described in Section 3.2.3.1.

```c
typedef struct {
    int    a;
    double dd;
} sparm;

sparm s, t;
int    c, d, e;
long double ld;
double ff, gg, hh;

x = func(c, ff, d, ld, s, gg, t, e, hh);
```
### ATR-CLASSIC-FLOAT & ATR-LONG-DOUBLE-IBM

Table 3-25. Parameter Passing Using IBM AIX 128-bit Long Double

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>f1</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d</td>
<td>r4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ld</td>
<td>f2, f3</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r5</td>
<td>(not stored)</td>
</tr>
<tr>
<td>gg</td>
<td>f4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>r6</td>
<td>(not stored)</td>
</tr>
<tr>
<td>e</td>
<td>r7</td>
<td>(not stored)</td>
</tr>
<tr>
<td>hh</td>
<td>f5</td>
<td>(not stored)</td>
</tr>
</tbody>
</table>

### ATR-SOFT-FLOAT && ATR-LONG-DOUBLE-IBM

Table 3-26. Parameter Passing Using IBM AIX 128-bit Long Double and Soft-Float

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>r5, r6</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d</td>
<td>r7</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ld</td>
<td>(none)</td>
<td>08-23 (stored in parameter save area)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>(none)</td>
<td>24-27 (stored)</td>
</tr>
<tr>
<td>gg</td>
<td>(none)</td>
<td>32-39 (stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>(none)</td>
<td>40-43 (stored)</td>
</tr>
<tr>
<td>e</td>
<td>(none)</td>
<td>44-47 (stored)</td>
</tr>
<tr>
<td>hh</td>
<td>(none)</td>
<td>48-55 (stored)</td>
</tr>
</tbody>
</table>

ATR-CLASSIC-FLOAT && ATR-LONG-DOUBLE-IS-DOUBLE

Table 3-27. Parameter Passing Using long double is double

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>f1</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d</td>
<td>r4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ld</td>
<td>f2</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r5</td>
<td>(not stored)</td>
</tr>
<tr>
<td>gg</td>
<td>f3</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>r6</td>
<td>(not stored)</td>
</tr>
<tr>
<td>e</td>
<td>r7</td>
<td>(not stored)</td>
</tr>
<tr>
<td>hh</td>
<td>f4</td>
<td>(not stored)</td>
</tr>
</tbody>
</table>

ATR-SOFT-FLOAT && ATR-LONG-DOUBLE-IS-DOUBLE

Table 3-28. Parameter Passing Using long double is double and Soft-Float

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>r5,r6</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d</td>
<td>r7</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ld</td>
<td>r9,r10</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>(none)</td>
<td>08-11 (stored in parameter save area)</td>
</tr>
<tr>
<td>gg</td>
<td>(none)</td>
<td>16-23 (stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>(none)</td>
<td>24-27 (stored)</td>
</tr>
<tr>
<td>e</td>
<td>(none)</td>
<td>28-31 (stored)</td>
</tr>
<tr>
<td>hh</td>
<td>(none)</td>
<td>32-39 (stored)</td>
</tr>
</tbody>
</table>
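
The register and stack assignments in Table 3-28 can be reproduced with a small simulation. The following is only a sketch under stated assumptions: it models just the SINGLE_GP and DUAL_GP classes, assumes that `gr` starts at 3 and `starg` at byte offset 8, and assumes that a DUAL_GP argument must start in an odd-numbered register no higher than r9. The names `arg_class`, `arg_loc`, and `assign_soft_float_args` are hypothetical and are not part of this ABI.

```c
#include <stddef.h>

/* Argument classes modeled by this sketch (names follow the algorithm text). */
enum arg_class { SINGLE_GP, DUAL_GP };

/* Where one argument ends up: a first GPR number or a stack offset. */
struct arg_loc {
    int first_reg;     /* 3..10 for r3..r10, or 0 if passed on the stack */
    int stack_offset;  /* byte offset in the parameter save area, or -1 */
};

/* Assign a location to each of n arguments (hypothetical helper). */
void assign_soft_float_args(const enum arg_class *cls, size_t n,
                            struct arg_loc *out)
{
    int gr = 3;      /* next free general-purpose register (assumed start) */
    int starg = 8;   /* next free byte offset on the stack (assumed start) */

    for (size_t i = 0; i < n; i++) {
        int words = (cls[i] == DUAL_GP) ? 2 : 1;
        int last_start = (cls[i] == DUAL_GP) ? 9 : 10;

        /* A DUAL_GP pair starts in an odd-numbered register. */
        if (cls[i] == DUAL_GP && gr <= last_start && gr % 2 == 0)
            gr++;

        if (gr <= last_start) {
            out[i].first_reg = gr;
            out[i].stack_offset = -1;
            gr += words;
        } else {
            int align = 4 * words;   /* 4 for SINGLE_GP, 8 for DUAL_GP */
            starg = (starg + align - 1) / align * align;
            out[i].first_reg = 0;
            out[i].stack_offset = starg;
            starg += 4 * words;
            if (cls[i] == DUAL_GP)
                gr = 11;   /* later SINGLE_GPs must not use leftover GPRs */
        }
    }
}
```

Feeding the classes of `c, ff, d, ld, s, gg, t, e, hh` (with `s` and `t` passed as pointers, and `long double` treated as `double`) through this helper gives r3, r5, r7, r9 and then stack offsets 8, 16, 24, 28, and 32.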

ATR-VECTOR

Figure 3-21. Vector Parameter Passing Example

```c
typedef struct {
    int    a;
    double dd;
} sparm;

sparm s, t;
int    c;
vector int va, vb;
long double ld;
double ff, gg, hh;

x = func(c, ff, va, ld, s, gg, t, vb, hh);
```

## ATR-VECTOR

Table 3-29. Parameter Passing of Vector Data Types

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>f1</td>
<td>(not stored)</td>
</tr>
<tr>
<td>va</td>
<td>v2</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ld</td>
<td>f2, f3</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>gg</td>
<td>f4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>r5</td>
<td>(not stored)</td>
</tr>
<tr>
<td>vb</td>
<td>v3</td>
<td>(not stored)</td>
</tr>
<tr>
<td>hh</td>
<td>f5</td>
<td>(not stored)</td>
</tr>
</tbody>
</table>

## ATR-SPE

Figure 3-22. SPE Parameter Passing Example

```c
typedef struct {
    int    a;
    double dd;
} sparm;

sparm s;
int c;
__ev64_opaque__ va, vb;
float ff;
double gg;

x = func(c, ff, va, gg, vb, s);
```

Table 3-30. Parameter Passing of SPE Data Types

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>c</td>
<td>r3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>ff</td>
<td>r4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>va</td>
<td>r5</td>
<td>(not stored)</td>
</tr>
<tr>
<td>gg</td>
<td>r7, r8</td>
<td>(not stored)</td>
</tr>
<tr>
<td>vb</td>
<td>r9</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r10</td>
<td>(not stored)</td>
</tr>
</tbody>
</table>

Figure 3-23. Decimal Floating-Point Parameter Passing Example

```c
typedef struct {
    _Decimal32  df;
    _Decimal64  dd;
    _Decimal128 dl;
} sparm;

sparm s, t;
_Decimal32 d32;
_Decimal64 d64, e64;
_Decimal128 d128, e128;

x = func(d128, d64, d32, s, t, d128, e64, e128);
```

Table 3-31. Decimal Floating-Point Parameter Passing on Classic Power Architecture (with FPU)

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>d128</td>
<td>f2-f3</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>d64</td>
<td>f4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d32</td>
<td>f5</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r3</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>r4</td>
<td>(not stored)</td>
</tr>
<tr>
<td>e64</td>
<td>f6</td>
<td>(not stored)</td>
</tr>
<tr>
<td>e128</td>
<td>(none)</td>
<td>08-23 (stored in parameter save area)</td>
</tr>
</tbody>
</table>

### ATR-SOFT-FLOAT && ATR-DFP

Table 3-32. Decimal Floating-Point Parameter Passing with Soft-Float (without FPU)

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Register</th>
<th>Byte Offset In Parameter Save Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>d128</td>
<td>r3-r6</td>
<td>(not stored in parameter save area)</td>
</tr>
<tr>
<td>d64</td>
<td>r7-r8</td>
<td>(not stored)</td>
</tr>
<tr>
<td>d32</td>
<td>r9</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to s</td>
<td>r10</td>
<td>(not stored)</td>
</tr>
<tr>
<td>ptr to t</td>
<td>(none)</td>
<td>08-11 (stored in parameter save area)</td>
</tr>
<tr>
<td>e64</td>
<td>(none)</td>
<td>12-19 (stored)</td>
</tr>
<tr>
<td>e128</td>
<td>(none)</td>
<td>20-35 (stored)</td>
</tr>
</tbody>
</table>

### 3.2.4. Variable Argument Lists

C programs that are intended to be portable across different compilers and architectures must use the header file `<stdarg.h>` to deal with variable argument lists. This header file contains a set of macro definitions that define how to step through an argument list. The implementation of this header file may vary across different architectures, but the interface is the same.

C programs that do not use this header file, and instead assume that all arguments are passed on the stack in increasing address order, are not portable, especially to architectures that pass some arguments in registers. The Power Architecture is one such architecture.
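
As an illustration of the portable approach, the following minimal sketch walks a variable argument list using only the standard `<stdarg.h>` macros. The function name `sum_ints` is hypothetical and is used here only for the example.

```c
#include <stdarg.h>

/* Sum `count` int arguments; `count` tells the callee how many follow. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument portably */
    va_end(ap);
    return total;
}
```

Because the code never inspects registers or stack addresses directly, it behaves the same whether the arguments arrive in registers or in the parameter save area. For example, `sum_ints(3, 1, 2, 3)` evaluates to 6.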

### ATR-CLASSIC-FLOAT

CR bit 6 must be set by a variable argument list function caller that passes any arguments in floating-point registers. The recommended instruction to achieve this is: `creqv 6,6,6`. It is recommended that CR bit 6 be cleared by variable argument list function callers that do not pass any arguments in floating-point registers, using the instruction `crxor 6,6,6`.

The parameter list may be zero length and is only allocated when parameters are spilled.

### ATR-SPE

For variable argument functions, 64-bit vectors (both before and after the ellipsis) are passed in the low words of two consecutive registers, in the same manner as long long variables.

### 3.2.5. Return Values

**ATR-CLASSIC-FLOAT**

Functions that return float or double values shall place the result in register f1. The float values will be rounded to single-precision.

---

**ATR-CLASSIC-FLOAT && ATR-DFP**

Functions that return single-precision or double-precision decimal floating-point values shall return the result in register f1. Functions that return quad-precision decimal floating-point values shall return the result in the register pair f2 and f3.

---

**ATR-CLASSIC-FLOAT && ATR-LONG-DOUBLE-IBM**

Functions that return long double values shall place the result in registers f1 and f2.

---

**ATR-SOFT-FLOAT**

Functions shall return single-precision float values in r3, and double-precision values shall be returned with the low addressed word in r3 and the higher in r4.

---

**ATR-SOFT-FLOAT && ATR-DFP**

Functions shall return single-precision decimal floating-point values in r3, double-precision decimal float values in r3 and r4, and quad-precision decimal floating-point values in r3 through r6.

---

**ATR-SOFT-FLOAT && ATR-LONG-DOUBLE-IBM**

Functions shall return long double values in r3 through r6.

---

**ATR-SPE**

Functions shall return values of 64-bit vector types in r3.

---

**ATR-VECTOR**

When the Vector facility is supported, functions shall return vector data type values in v2.

---

Functions that return values of the following list of types shall place the result in register r3 as signed or unsigned integers as appropriate, sign extended or zero extended to 32 bits where necessary:

- char
- enum
- short
- int
- long
- pointer to any type.
- _Bool

Aggregates or unions of any length will be returned in a storage buffer allocated by the caller. The caller will pass the address of this buffer as a hidden first argument in r3, causing the first explicit argument to be passed in r4. This hidden argument is treated as a normal formal parameter, and corresponds to the first doubleword of the parameter save area.
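
The effect of the hidden first argument can be pictured at the source level. The sketch below is only an illustration of the idea, not compiler output: `make_triple` is an ordinary function returning an aggregate, and `make_triple_lowered` is a hypothetical hand-written equivalent in which the caller-allocated buffer is passed explicitly as the first argument.

```c
struct triple { int a, b, c; };

/* Source-level form: returns an aggregate by value. */
struct triple make_triple(int x)
{
    struct triple t = { x, x + 1, x + 2 };
    return t;
}

/* Hand-lowered equivalent (hypothetical): the caller supplies the buffer,
 * so `result` plays the role of the hidden first argument and `x` becomes
 * the second argument. */
void make_triple_lowered(struct triple *result, int x)
{
    result->a = x;
    result->b = x + 1;
    result->c = x + 2;
}
```

In the lowered form, `result` occupies the position that the ABI assigns to r3, which is why the first explicit argument moves to r4.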

Functions that return values of type long long and unsigned long long shall place the result in registers r3 and r4. The lower addressed word shall be placed in register r3, and the higher addressed word shall be in register r4.

---

**ATR-PASS-COMPLEX-IN-GPRS**

Functions that return values of type _Complex float shall place the results in registers r3 and r4. The lower addressed word shall be placed in r3; the higher addressed word shall be in register r4.

## 3.3. Coding Examples

The following ISO C coding examples illustrate how operations may be done, not how they shall be done, for calling functions, accessing static data, and transferring control from one part of a program to another. They are shown as code fragments, simplified to explain addressing modes, and do not necessarily show optimal code sequences or compiler output. The small data area is not used in any of them.

The previous sections explicitly specify what a program, operating system, and processor may and may
not assume and are the definitive reference to be used.

In these examples, absolute code and position-independent code are referenced.

When instructions hold absolute addresses, a program must be loaded at a specific virtual address in
order to permit the absolute code model to work.

When instructions hold relative addresses, a program can be loaded at various positions in virtual memory; this is referred to as the position-independent code model.

### 3.3.2. Code Model Overview

A shared object file is mapped with virtual addresses to avoid conflicts with other segments in the
process. Because of this mapping, shared objects use position-independent code, which means that the
instructions do not contain any absolute addresses. Avoiding the use of absolute addresses allows shared
objects to be loaded into different virtual address spaces without code modification, which can allow
multiple processes to share the same text segment for a shared object file.

There are two techniques used to deal with position-independent code.

- First, branch instructions use an offset to the current EA (Effective Address) or use registers to hold addresses. The Power Architecture provides both EA-relative branch instructions and branch instructions that use registers. In both cases, absolute addressing is not required.
- Second, when absolute addressing is required, the value can be computed with a Global Offset Table (GOT), which holds the information for address computation. Position-independent executables or shared objects have a GOT in the data segment that holds addresses. When the system creates a memory image from the file, the GOT entries are updated to reflect the absolute virtual addresses that were assigned for the process. These data segments are private, while the text segments are shared. On the Power Architecture, the GOT can be accessed with more efficient code if it is less than 65,536 bytes. A larger GOT requires more general code in order to access all of its entries.

The GOT size gives programs two choices: more efficient code with a size restriction, or less efficient code without size restrictions. In the following sections, the term small model position-independent code refers to the use of efficient code with a smaller GOT (no more than 65,536 bytes), and the term large model position-independent code refers to the use of less efficient code without any restriction on the size of the GOT.

### 3.3.3. Function Prologue and Epilogue

A function’s prologue and epilogue are detailed in this section.

#### 3.3.3.1. The Purpose of a Function’s Prologue

• Create a stack frame when required.
• Save any nonvolatile registers that are used by the function.
• Save any limited-access bits that are used by the function, per the rules described earlier.

#### 3.3.3.2. The Purpose of a Function’s Epilogue

• Restore all registers and limited-access bits that were saved by the function’s prologue.
• Restore the last stack frame.
• Return to the caller.

#### 3.3.3.3. Rules for Prologue and Epilogue Sequences

This ABI does not impose fixed code sequences for function prologues and epilogues. However, several rules must be followed to ensure reliable and consistent call chain backtracing.

• Before a function calls any other function, it shall establish its own stack frame, whose size shall be a multiple of 16 bytes, and shall save the link register at the time of entry in the LR save area of its caller’s stack frame.
• The calling sequence does not restrict how languages leverage the local variable space of the stack frame, and there is no restriction on the size of this section.
• The parameter save area shall be allocated by the caller, and shall be large enough to contain the parameters needed by the caller. Its contents are not saved across function calls.

In instances where a function’s prologue creates a stack frame, the backchain word of the stack frame shall be updated atomically with the value of the stack pointer (r1). This task can be done by using one of the following Store Word with Update instructions:

- Store Word with Update instruction with relevant negative displacement for stack frames that are smaller than 32 KB.
- Store Word with Update Indexed instruction where the two’s complement size of the stack frame has been computed, using addis and addi or ori instructions, and then loaded into a volatile register for stack frames that are 32 KB or greater.
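
The size rules above can be summarized in a short sketch. The helper names `round_frame_size` and `frame_fits_stwu` are hypothetical; the sketch assumes only what the text states, namely that frame sizes are multiples of 16 bytes and that the displacement form is used for frames smaller than 32 KB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round a requested frame size up to the required multiple of 16 bytes. */
uint32_t round_frame_size(uint32_t bytes)
{
    return (bytes + 15u) & ~15u;
}

/* True when the frame is small enough for the displacement form
 * (Store Word with Update); otherwise the indexed form is needed. */
bool frame_fits_stwu(uint32_t frame_size)
{
    return frame_size < 32u * 1024u;
}
```

For example, a function that needs 100 bytes gets a 112-byte frame, which is well within the range of the displacement form.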

The deallocation of a function’s stack frame must be an atomic operation. This task can be accomplished by one of the following methods:

- Increment the stack pointer by the same amount by which it was decremented in the prologue when the stack frame was created.
- Load the stack pointer (r1) with the value in the backchain word in the stack frame.

If the function uses any nonvolatile registers, their contents must be saved into a register save area. See Section 3.2.2.2 for information on all of the optional register save areas. Nonvolatile registers used by the function can be saved and restored with in-line code. Alternatively, one of the system subroutines described in Section 3.3.4 may be more efficient than in-line code, especially when many registers must be saved or restored.

Unlike some other processors that implement the Power Architecture, embedded processors may support the Power Architecture load and store multiple instructions in little-endian mode. On big-endian implementations, these instructions may or may not be slower than register-at-a-time saves, but they reduce the instruction footprint.

Position-independent functions that make external data references need to load a nonvolatile register with a pointer to the Global Offset Table, as shown in Figure 3-26. When external data references are made only from within conditional code, loading the Global Offset Table pointer can be delayed until it is needed.

### 3.3.4. Register Saving and Restoring Functions

This section describes functions that can be used to save and restore contents of nonvolatile registers. The use of these routines, rather than performing these saves and restores inline in the prologue and epilogue of functions, can help reduce code footprint.

The calling conventions of these functions are not standard, and executables or shared objects that use these functions must statically link them. The specific calling convention for each of these functions is described in Section 6.1.2.

---

**ATR-SPE && ATR-SOFT-FLOAT**

The use of a merged register file removes the need for distinct routines for saving and restoring floating-point registers. However, in order to conserve stack space, this ABI describes several new routines to allow the compiler to use the minimum stack space for holding copies of nonvolatile registers. See Section 3.3.4.1 for information on the routines.

---

**ATR-SPE**

For situations where stack space is not at a premium, the compiler can elect to only use the 64-bit save and restore functions for functions that require some use of the upper halves of the registers, and traditional 32-bit save and restore functions for code that uses only classic instructions.

---

There are several cases to consider with respect to saving/restoring nonvolatile registers for a function:

- No nonvolatile registers need saving or restoring.
- Only 32-bit nonvolatile registers need to be saved or restored. In this case, the classic (32-bit) save and restore functions, or the `stmw` and `lmw` instructions, can be used.

---

**ATR-SPE**

- Only 64-bit nonvolatile registers need to be saved or restored. In this case, 64-bit versions of the classic save and restore functions can be used. There is no equivalent to `stmw/lmw` for both halves of a 64-bit register.

- A mixture of 32-bit and 64-bit nonvolatile registers need saving or restoring. To minimize complexity, the 32-bit nonvolatile registers shall be contiguous and at the upper end of the registers (rN - r31). This also allows the `stmw` and `lmw` instructions to still be used, if desired. The 64-bit nonvolatile registers shall also be contiguous (rM - r(N - 1)). The registers are saved or restored by calling both a 32-bit save and restore function and a 64-bit save and restore function.

---

Saving and restoring functions also have variants (_g for register save routines, _x and _t for register restore routines) that bundle some common prologue and epilogue operations to reduce overhead and code footprint by a few instructions. These are described in more detail in the following paragraphs.

The 32-bit save and restore functions save and restore consecutive 32-bit registers from register m through register 31.

---

**ATR-SPE**

The simple 64-bit save and restore functions save and restore consecutive 64-bit registers from register m through register 31. The more complex (CTR-based) 64-bit save and restore functions save and restore consecutive 64-bit registers from register m through register n, and use the value n - m + 1 in the CTR register to determine how many registers to save or restore.

Higher-numbered registers are saved at higher addresses within a save area.

All of the 32-bit save and restore functions in this section expect the address of the backchain word to be contained in r11. The backchain word is the next word after the end of the 32-bit general register save area. r11 is not modified by these functions.

**ATR-SPE**

The value held in r11 for the 64-bit save and restore functions depends on the type of function.

- All the non-CTR 64-bit save and restore functions described in this section expect r11 to contain the address of the backchain word, adjusted by subtracting 144. The adjustment by 144 allows the immediate form of the 64-bit load/store instructions to be used (they have an unsigned immediate).

- The CTR-based 64-bit save and restore functions described in this section expect the CTR to contain the number of registers to save or restore (1 to 18). Register r11 should be calculated by taking the 8-byte aligned address of the doubleword just beyond the 64-bit general register save area, subtracting 8 times the number of the last (highest) 64-bit nonvolatile register to be saved or restored, and adding $8 \times 13 = 104$. These two adjustments keep the offsets positive and place the last register saved directly below the 32-bit general register save area, so that a single routine with fixed offsets can be used across all potential cases. The doubleword beyond the 64-bit general-purpose register save area could be the low word of the 32-bit general-purpose register save area, the CR save word, or a pad word, depending on the number of 32-bit registers saved and the presence or absence of a CR save word.

These rules are summarized in the following table.

**Table 3-33. SPE Save And Restore Rules**

<table>
<thead>
<tr>
<th>Function Type</th>
<th>r11 Contents</th>
</tr>
</thead>
<tbody>
<tr>
<td>save &amp; restore 32-bit values (rM - r31)</td>
<td>address of backchain</td>
</tr>
<tr>
<td>save &amp; restore 64-bit values (rM - r31)</td>
<td>address of backchain (or pad word below CR save word if CR is saved) - 144</td>
</tr>
<tr>
<td>save &amp; restore 64-bit values (rM - rN, where N ≠ 32)</td>
<td>address of low end of 32-bit save area/CR save word/padding, adjusted by subtracting $(8 \times N)$ and adding 104.</td>
</tr>
</tbody>
</table>
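
The r11 arithmetic in Table 3-33 can be checked with a few lines of C. This is a sketch only: the function names are hypothetical, addresses are modeled as 32-bit integers, and the slot calculation assumes that each 64-bit routine stores register n at offset 8 × (n − 14) from r11, as in the example implementation shown in Section 3.3.4.1.

```c
#include <stdint.h>

/* r11 for the non-CTR 64-bit routines: backchain address minus 144. */
uint32_t r11_for_save64(uint32_t backchain_addr)
{
    return backchain_addr - 144u;
}

/* r11 for the CTR-based routines. `beyond` is the 8-byte aligned address
 * of the doubleword just past the 64-bit save area; `last_reg` is N. */
uint32_t r11_for_save64_ctr(uint32_t beyond, unsigned last_reg)
{
    return beyond - 8u * last_reg + 104u;
}

/* Address of the save slot for `reg`, assuming the fixed offset
 * 8 * (reg - 14) used by the example routines. */
uint32_t slot_for_reg64(uint32_t r11, unsigned reg)
{
    return r11 + 8u * (reg - 14u);
}
```

With r14 through r25 saved as 64-bit registers and r26 through r31 as 32-bit registers, the doubleword beyond the 64-bit area lies 24 bytes below the backchain word, so r11 is the backchain address minus 120 and r25 lands in the doubleword directly below the 32-bit save area.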

**3.3.4.1. Details about the Functions**

Each function described in this section is a family of 18 functions with identical behavior except for the number and kind of registers affected.

ATR-SPE

The function names use the notation [32/64] to designate the use of a 32 for the 32-bit general-purpose register functions and a 64 for the 64-bit general-purpose register functions. The suffix _m designates the portion of the name that would be replaced by the first register to be saved. That is, to save registers 18 through 31, call _save32gpr_18().

There are two families of register saving functions:

- The following simple register saving functions save the indicated registers and return:

  - _savegpr_m()
  - _savefpr_m()
  - _save64gpr_m()
  - _save64gpr_ctr_m()

- The following GOT register saving functions (with a suffix of _g) save the indicated registers but do not return directly:

  - _savegpr_m_g()
  - _savefpr_m_g()
  - _save64gpr_m_g()
  - _save64gpr_ctr_m_g()

  Instead these functions branch to _GLOBAL_OFFSET_TABLE_-4, relying on a blrl instruction at that address to return to the caller of the save function with the address of a Global Offset Table in the link register.

There are three families of register restoring functions.

The following simple register restoring functions restore the indicated registers and return:

- _restgpr_m()
- _restfpr_m()

### ATR-CLASSIC-FLOAT

- _restgpr_m()
- _restfpr_m()

### ATR-SPE

- _rest32gpr_m() and _rest32gpr_m_t()
- _rest64gpr_m() and _rest64gpr_ctr_m()

The following exit functions restore the indicated registers and, relying on the registers being restored to be adjacent to the backchain word, restore the link register from the LR save word, remove the stack frame, and return through the link register:

- _restgpr_m_x()
- _restfpr_m_x()

### ATR-CLASSIC-FLOAT

- _restgpr_m_x()
- _restfpr_m_x()

### ATR-SPE

- _rest32gpr_m_x()
- _rest64gpr_m_x()

The following tail functions restore the registers, place the LR save word into r0, remove the stack frame, and return to their caller:

- _restgpr_m_t()
- _restfpr_m_t()

### ATR-CLASSIC-FLOAT

- _restgpr_m_t()
- _restfpr_m_t()

The caller can thus implement a tail call by moving r0 into the link register and branching to the tail function. The tail function then sees the call as coming from the function above the one that made the tail call and, when done, returns directly to it.

**Note:** There are no functions _rest64gpr_ctr_m_x() or _rest64gpr_ctr_m_t(), because the backchain word is not directly above the location of the 64-bit save area in these cases. In this case, the 64-bit registers shall be restored first, followed by a call to _rest32gpr_m_x() or _rest32gpr_m_t().

**Note:** If a CR save word is used, even if only 64-bit registers are saved, _rest64gpr_m_x() and _rest64gpr_m_t() cannot be used, because the backchain word is not directly above the end of the 64-bit save area.

The following assembly code shows an example of an implementation.

```assembly
_save32gpr_14:       stw r14,-72(r11)
_save32gpr_15:       stw r15,-68(r11)
...
_save32gpr_30:       stw r30,-8(r11)
_save32gpr_31:       stw r31,-4(r11)
                     blr

_save64gpr_14:       evstdd r14,0(r11)
_save64gpr_15:       evstdd r15,8(r11)
...
_save64gpr_30:       evstdd r30,128(r11)
_save64gpr_31:       evstdd r31,136(r11)
                     blr

_save64gpr_ctr_14:   evstdd r14,0(r11)
                     bdz _save64gpr_ctr_done
_save64gpr_ctr_15:   evstdd r15,8(r11)
                     bdz _save64gpr_ctr_done
...
_save64gpr_ctr_30:   evstdd r30,128(r11)
                     bdz _save64gpr_ctr_done
_save64gpr_ctr_31:   evstdd r31,136(r11)
_save64gpr_ctr_done: blr
```

The GOT forms of the save routines (with a suffix of `_g`) all replace the `blr` with `b _GLOBAL_OFFSET_TABLE_-4`.

The exit forms of the restore routines (with a suffix of `_x`) perform the following tasks in place of the `blr`:

ATR-CLASSIC-FLOAT

`_rest[fg]pr_m_x` replaces the blr with

```
lwz r0,4(r11)
mr r1,r11
mtlr r0
blr
```

ATR-SPE

`_rest32gpr_m_x` replaces the blr with

```
lwz r0,4(r11)
mr r1,r11
mtlr r0
blr
```

`_rest64gpr_m_x` replaces the blr with

```
lwz r0,148(r11)
addi r1,r11,144
mtlr r0
blr
```

The tail functions (with a suffix of `_t`) are similar to the exit functions, except they skip the `mtlr` instruction.

ATR-SPE

Note: The CTR-based 64-bit restore functions cannot perform the exit and tail optimizations as implemented here, because the address of the backchain word and the return address are not at a fixed offset from `r11`.

Note: For slightly higher performance in the restore function variants, the `lwz` of `r0` and the restore of `r31` could be reordered (but the label for `_rest[32/64]gpr_31*()` shall now point to the `lwz` of `r0`, not the load of `r31`).

ATR-SPE

The following assembly source code provides an example restore function variant using `_rest32gpr_m_x`.

```
...
_rest32gpr_30_x: lwz r30,-8(r11)
_rest32gpr_31_x: lwz r0,4(r11)
                 lwz r31,-4(r11)
                 mtlr r0
                 mr r1,r11        # Change to addi r1,r11,144
                                  # for _rest64gpr*
                 blr
```


### ATR-SPE

The following figure shows sample prologue and epilogue code with full saves of all the nonvolatile general-purpose registers (r14 through r25 as 64-bit, r26 through r31 as 32-bit) and a stack frame size of less than 32 KB. The variable `len` refers to the size of the stack frame. The example assumes that the function does not alter the nonvolatile fields of the CR register and does no dynamic stack allocation.

**Note:** The following code assumes that the size of the executable or shared object in which the code appears is small enough that a relative branch can reach from any part of the text section to any part of the Global Offset Table or the Procedure Linkage Table. Because relative branches can reach ± 32 MB, this restriction is not considered serious. See Chapter 5 for more information.

```assembly
function:
  mflr r0                  # Save return addr in caller's frame
  stw r0,4(r1)             # . . .
  li r0,12                 # Set up CTR with number of 64-bit
                           # registers to save.
  mr r11,r1                # Set up r11 with backchain pointer
  mtctr r0
  stwu r1,-len(r1)         # Establish new frame
  bl _save32gpr_26         # Save 32-bits of some GPRs
  addi r11,r11,-120        # Adjust r11 down 24 bytes to bottom
                           # of 32-bit area, and down another 96
                           # bytes for the offset
  bl _save64gpr_ctr_14_g   # Save 64-bit nonvolatile GPRs and
                           # fetch the GOT ptr
  mflr r31                 # Place GOT ptr in r31
                           # Save CR here if necessary
  ...                      # Body of function
  li r0,12                 # Set up CTR with number of regs to
                           # restore
  mtctr r0
  addi r11,r1,len-120      # Compute offset from low end of
                           # 32-bit save area
  bl _rest64gpr_ctr_14     # Restore 64-bit GPRs
                           # Restore CR here if necessary
  addi r11,r1,len          # Compute backchain word address
  b _rest32gpr_26_x        # Restore 32-bit GPRs and return
```

### ATR-VECTOR

#### 3.3.4.2. Register Saving and Restoring Functions (Vector)

The vector register saving and restoring functions described in this section are not part of the ABI. They are defined here only to encourage uniformity among compilers in the code used to save and restore VRs.

On entry to the functions described in this section, r0 contains the address of the word just beyond the end of the vector register save area, and they leave r0 undisturbed. They modify the value of r12. The following code is an example of saving a vector register.

```assembly
_savevr_20: addi r12, r0, -192
    stvx v20, r12, r0   # save v20
_savevr_21: addi r12, r0, -176
    stvx v21, r12, r0   # save v21
_savevr_22: addi r12, r0, -160
    stvx v22, r12, r0   # save v22
_savevr_23: addi r12, r0, -144
    stvx v23, r12, r0   # save v23
_savevr_24: addi r12, r0, -128
    stvx v24, r12, r0   # save v24
_savevr_25: addi r12, r0, -112
    stvx v25, r12, r0   # save v25
_savevr_26: addi r12, r0, -96
    stvx v26, r12, r0   # save v26
_savevr_27: addi r12, r0, -80
    stvx v27, r12, r0   # save v27
_savevr_28: addi r12, r0, -64
    stvx v28, r12, r0   # save v28
_savevr_29: addi r12, r0, -48
    stvx v29, r12, r0   # save v29
_savevr_30: addi r12, r0, -32
    stvx v30, r12, r0   # save v30
_savevr_31: addi r12, r0, -16
    stvx v31, r12, r0   # save v31
    blr                  # return to prologue
```

The following code shows how to restore a vector register.

```assembly
_restvr_20: addi r12, r0, -192
    lvx v20, r12, r0   # restore v20
_restvr_21: addi r12, r0, -176
    lvx v21, r12, r0   # restore v21
_restvr_22: addi r12, r0, -160
    lvx v22, r12, r0   # restore v22
_restvr_23: addi r12, r0, -144
    lvx v23, r12, r0   # restore v23
_restvr_24: addi r12, r0, -128
    lvx v24, r12, r0   # restore v24
_restvr_25: addi r12, r0, -112
    lvx v25, r12, r0   # restore v25
_restvr_26: addi r12, r0, -96
    lvx v26, r12, r0   # restore v26
_restvr_27: addi r12, r0, -80
    lvx v27, r12, r0   # restore v27
_restvr_28: addi r12, r0, -64
    lvx v28, r12, r0   # restore v28
_restvr_29: addi r12, r0, -48
    lvx v29, r12, r0   # restore v29
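_restvr_30: addi r12, r0, -32
    lvx v30, r12, r0   # restore v30
_restvr_31: addi r12, r0, -16
    lvx v31, r12, r0   # restore v31
    blr                  # return to epilogue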
```
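
The displacements used in the two listings follow a simple pattern, which the sketch below makes explicit. The function name `vr_save_offset` is hypothetical; the formula is inferred from the listings, in which each vector register occupies 16 bytes and v31 sits immediately below the address held in r0.

```c
/* Byte offset, relative to the address in r0, of the save slot for
 * vector register vN (N in 20..31), as used by the listings above. */
int vr_save_offset(int n)
{
    return -16 * (32 - n);
}
```

For example, v20 maps to -192 and v31 to -16, matching the first and last entries of the save listing.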

### 3.3.5. Profiling

This section describes how profiling (counting the number of times that a function is called) can be performed on the Power Architecture. Profiling is not required for ABI compliance. If profiling is supported, the implementation described here is one possible approach.

The code in Figure 3-24 can be inserted at the beginning of any function, before the execution of the prologue code. The following is a high-level explanation of this code.

- The link register is saved in the LR save word of the caller stack frame.
- The register r0 contains the address of the count variable, which is initialized to 0.
- The function `_mcount()` is called. This function increments the count variable. It must also restore the link register to its original value so that it can handle the case where the profiled function does not save the link register itself.

Figure 3-24. Profiling Example

```
.function_mc:
  .data
  .align 2
  .long 0
  .text
function:
  mflr r0
  addis r11,r0,.function_mc@ha
  stw r0,4(r1)
  addi r0,r11,.function_mc@l
  bl _mcount
```

NOTE: In the figure, the assembler expression `symbol@l` represents the low-order 16 bits of the value of `symbol`. The assembler expression `symbol@ha` represents the high-order 16 bits of the value of `symbol`, adjusted so that adding `symbol@l` to `symbol@ha` shifted left by 16 bits yields the correct value of `symbol`. The adjustment is needed because `symbol@l` is treated as a signed value.
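
The adjustment described in the note can be illustrated with a minimal C sketch. The function names `sym_lo`, `sym_ha`, and `sym_join` are hypothetical and are not assembler or ABI names; the sketch only assumes 32-bit addresses and the signed treatment of the low half stated above.

```c
#include <stdint.h>

/* symbol@l: the low 16 bits of the address, read as a signed value. */
int16_t sym_lo(uint32_t addr)
{
    int32_t v = (int32_t)(addr & 0xffffu);
    if (v >= 0x8000)
        v -= 0x10000;          /* sign-extend manually to stay portable */
    return (int16_t)v;
}

/* symbol@ha: the high 16 bits, plus 1 when the low half is negative. */
uint16_t sym_ha(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Recombine the halves: (ha << 16) + sign-extended lo, modulo 2^32. */
uint32_t sym_join(uint16_t ha, int16_t lo)
{
    return ((uint32_t)ha << 16) + (uint32_t)(int32_t)lo;
}
```

For an address such as 0x12348000, the low half reads as -32768, so the high half is bumped from 0x1234 to 0x1235 and the sum still reproduces the address.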
3.3.6. Data Objects

This section describes data objects with static storage duration. Stack-resident data objects are omitted because their virtual addresses are derived relative to the stack or frame pointer.

The only instructions that can access memory in the Power Architecture are load and store instructions. Because Power Architecture instructions cannot hold 32-bit addresses directly, programs typically place the address of a memory location in a register and access the location indirectly through that register. In absolute code, the values of symbols or their absolute virtual addresses are placed directly into the instructions that make symbolic references.

Absolute addresses are not permitted in position-independent instructions. Instead, a position-independent instruction that references a symbol holds the signed offset of that symbol's entry in the Global Offset Table. The absolute address of the table entry is then derived by adding the offset to the Global Offset Table address held in a general-purpose register. Figure 3-26 shows an example of this method, with r31 holding the Global Offset Table address as loaded in the sample prologue.

Examples of absolute and position-independent compilations are shown in the following figures. These examples show the C language statements together with the generated assembly language. The assumption for the following figures is that only executables can use absolute addressing while shared objects can use position-independent code addressing. The figures are intended to demonstrate the compilation of each C statement independent of its context, hence there can be redundant operations in the code.

Figure 3-25. Absolute Load and Store Example

| C code | Assembly code |
|---|---|
| `extern int src;` | `.extern src` |
| `extern int dst;` | `.extern dst` |
| `extern int *ptr;` | `.extern ptr` |
| | `.section ".text"` |
| `dst = src;` | `lis 9,src@ha` |
| | `lwz 0,src@l(9)` |
| | `lis 9,dst@ha` |
| | `stw 0,dst@l(9)` |
| `ptr = &dst;` | `lis 11,ptr@ha` |
| | `lis 9,dst@ha` |
| | `la 0,dst@l(9)` |
| | `stw 0,ptr@l(11)` |
| `*ptr = src;` | `lis 9,ptr@ha` |
| | `lwz 11,ptr@l(9)` |
| | `lis 9,src@ha` |
| | `lwz 0,src@l(9)` |
| | `stw 0,0(11)` |


Note: The assembly syntax `symbol@got` gives the offset in the Global Offset Table at which the value of the symbol (the address of the variable named `symbol`) is stored. The offset for this syntax must fit in 16 bits. When the offset is larger than 16 bits, the following assembly syntax is used:
- High adjusted part of the offset: `symbol@got@ha`
- High part of the offset: `symbol@got@h`
- Low part of the offset: `symbol@got@l`

**Figure 3-26. Small Model Position-Independent Load and Store**

```
C code                  Assembly code
---------------------------------------------
extern int src;         .extern src
extern int dst;         .extern dst
extern int *ptr;        .extern ptr
                        .section ".text"
                        # GOT pointer in r31
dst = src;              lwz 9, src@got(31)
                        lwz 0, 0(9)
                        lwz 9, dst@got(31)
                        stw 0, 0(9)
ptr = &dst;             lwz 9, ptr@got(31)
                        lwz 0, dst@got(31)
                        stw 0, 0(9)
*ptr = src;             lwz 9, ptr@got(31)
                        lwz 11, 0(9)
                        lwz 9, src@got(31)
                        lwz 0, 0(9)
                        stw 0, 0(11)
```

**Figure 3-27. Large Model Position-Independent Load and Store**

```
C code                  Assembly code
---------------------------------------------
extern int src;         .extern src
extern int dst;         .extern dst
extern int *ptr;        .extern ptr
                        .section ".text"
                        # Assumes GOT pointer in r31
dst = src;              addis r6, r31, src@got@ha
                        lwz   r6, src@got@l(r6)
                        addis r7, r31, dst@got@ha
                        lwz   r7, dst@got@l(r7)
                        lwz   r0, 0(r6)
                        stw   r0, 0(r7)
ptr = &dst;             addis r6, r31, dst@got@ha
                        lwz   r0, dst@got@l(r6)
                        addis r7, r31, ptr@got@ha
                        lwz   r7, ptr@got@l(r7)
                        stw   r0, 0(r7)
```


3.3.7. Function Calls

Direct function calls are made with the Power Architecture `bl` instruction. Because the instruction holds a self-relative branch displacement, a `bl` instruction can reach 32 MB backward or forward from the current position. The size of the text segment in an executable or shared object is therefore constrained when a `bl` instruction is used to make a function call. As shown in Figure 3-28, a compiler generally uses the `bl` instruction to call a function. Two possibilities exist for the location of the function with respect to the caller:

- The called function is in the same executable or shared object as the caller. In this case the symbol is resolved by the link editor and the `bl` instruction branches directly to the called function, as in Figure 3-28.

**Figure 3-28. Direct Function Call**

```
C code                                             Assembly code
---------------------------------------------------------------
extern void function();
function();                                        bl function
```

- The called function is not in the same executable or shared object as the caller. In this case the symbol cannot be directly resolved by the link editor. The link editor generates a branch to glue code. Subsequently the dynamic linker changes the glue code to branch to the function requested by the caller. See *Procedure Linkage Table* in Section 5.2.5.
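
The 32 MB reach of `bl` mentioned above can be checked with a short C sketch. The function name `bl_can_reach` is hypothetical, and the sketch assumes a word-aligned, sign-extended 26-bit byte displacement (the 24-bit instruction field followed by two zero bits) and ignores address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a bl located at `from` can branch directly to `to`. */
bool bl_can_reach(uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - (int64_t)from;

    if (disp & 3)
        return false;                 /* branch targets are word aligned */

    /* Signed 26-bit range: -32 MB up to 32 MB minus one word. */
    return disp >= -0x2000000LL && disp <= 0x1fffffcLL;
}
```

A link editor that finds the displacement out of range cannot use a direct `bl` and has to reach the target some other way, for example through glue code.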

For indirect function calls, the address of the function to be called is placed in the CTR register and a `bctrl` instruction is used to perform the indirect branch as shown in Figure 3-29, Figure 3-30, and Figure 3-31.

**Figure 3-29. Absolute Indirect Function Call**

```
C Code                                             Asm Code
---------------------------------------------------------------
extern void function();
extern void (*ptrfunc)();
                                                   .section .text
ptrfunc = function;                                lis   r11,ptrfunc@ha
                                                   lis   r9,function@ha
                                                   la    r0,function@l(r9)
                                                   stw   r0,ptrfunc@l(r11)

return (*ptrfunc)();                               lis   r9,ptrfunc@ha
                                                   lwz   r0,ptrfunc@l(r9)
                                                   mtctr r0
                                                   bctrl
```
Branches less than or equal to ± 64 KB (16-bit signed offset ± 32 KB) may use small model addressing. Figure 3-30 demonstrates how to make an indirect function call using small model position-independent branching.

**Figure 3-30. Small Model Position-Independent Indirect Function Call**

| C Code | Assembly Code |
|---|---|
| `extern void function();` | `.section .text` |
| `extern void (*ptrfunc)();` | `/* GOT pointer is in r11 */` |
| `ptrfunc = function;` | `lwz r9,ptrfunc@got(r11)` |
| | `lwz r0,function@got(r11)` |
| | `stw r0,0(r9)` |
| `return (*ptrfunc)();` | `lwz r9,ptrfunc@got(r11)` |
| | `lwz r0,0(r9)` |
| | `mtctr r0` |
| | `bctrl` |

Branches in excess of ± 64 KB must use large model addressing. Figure 3-31 demonstrates how to make an indirect function call using large model position-independent branching.

**Figure 3-31. Large Model Position-Independent Indirect Function Call**

| C Code | Assembly Code |
|---|---|
| `extern void function();` | `.section .got` |
| `extern void (*ptrfunc)();` | `/* got_base is the start of the .got section */` |
| | `/* offset -0x8000 from the GOT pointer. */` |
| | `got_base = .+32768` |
| | `.ptrfunc .long ptrfunc` |
| | `.function .long function` |
| | `.section ".text"` |
| | `/* GOT pointer in r11 */` |
| `ptrfunc = function;` | `lwz 9,.ptrfunc@got-.got_base(r11)` |
| | `lwz 0,.function@got-.got_base(r11)` |
| | `stw 0,0(9)` |
| `(*ptrfunc)();` | `lwz 9,.ptrfunc@got-.got_base(r11)` |
| | `lwz 0,0(9)` |
| | `mtctr 0` |
| | `bctrl` |

3.3.8. Branching

The flow of execution in a program is controlled by branch instructions. A branch instruction holds a displacement that is relative to the current location of program execution. The architecture defines this displacement to span a 64 MB range, so a branch can reach locations up to 32 MB away in either direction.

The following figure shows the model for branch instructions.

```
C code          Assembly code
label:          .L01:
...             ...
goto label;     b .L01
```

Branch selection is provided in C with switch statements. When the case labels satisfy grouping constraints, the compiler uses an address table to implement the switch statement selections. The examples that follow use these simplifying constraints to omit irrelevant details:

- r12 holds the selection expression.
- Case label constants begin at zero.
- The assembler names .Lcasei, .Ldefault, and .Ltab are used for the case labels, the default, and the address table respectively.

**Absolute Switch Code**

C code
```
switch(j) {
  case 0:
   ...
  case 1:
    ...
  case 3:
    ...
  default:
    ...
}
```

Assembly code
```
  cmplwi r12, 4
  bge    .Ldefault
  slwi   r12, r12, 2
  addis  r12, r12, .Ltab@ha
  lwz    r0, .Ltab@l(r12)
  mtctr  r0
  ...
  .rodata
.Ltab:
  .long  .Lcase0
  .long  .Lcase1
  .long  .Ldefault
  .long  .Lcase3
  .text
```
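
The address-table technique used in the absolute switch code can be mirrored in C with an array of function pointers. This is only an illustrative sketch: the handler functions and their return values are hypothetical, and the array `table` plays the role of `.Ltab`.

```c
typedef int (*case_fn)(void);

/* Hypothetical case bodies; each returns a distinct marker value. */
int case0(void)        { return 10; }
int case1(void)        { return 11; }
int case3(void)        { return 13; }
int case_default(void) { return -1; }

/* One entry per value 0..3; the missing case 2 maps to the default. */
static const case_fn table[4] = { case0, case1, case_default, case3 };

int dispatch(unsigned int j)
{
    if (j >= 4)                 /* unsigned compare, like cmplwi/bge */
        return case_default();
    return table[j]();          /* indexed load, then indirect call */
}
```

The unsigned comparison handles negative selector values as well, because they appear as large unsigned numbers and fall through to the default.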

**Position-Independent Switch Code, All Models**

```
C code            Assembly code
switch(j) {       bl    .L1
  case 0:         .L1: slwi r12, r12, 2
    ...           mflr  r11
  case 1:         addi  r12, r12, .Ltab-.L1
    ...           add   r0, r12, r11
  case 3:         ...
    ...           bc
  default:        ...
    ...           .Ltab:
                  b     .Lcase0
                  ...
```

3.3.9. Dynamic Stack Space Allocation

Once allocated, a stack frame may be grown or shrunk dynamically as many times as necessary during the lifetime of a function. Standard calling conventions must be maintained because a subfunction can be called after the current frame is grown, and that subfunction may allocate, grow, shrink, and tear down its own frame between dynamic stack frame allocations of the caller. The following constraints apply when dynamically growing or shrinking a stack frame:

- Maintain 16-byte alignment.
- Stack pointer adjustments shall be performed atomically so that at all times the value of the backchain word is valid.
- Maintain addressability to the previously allocated local variables.

Note: Using a frame pointer is the recognized method for maintaining addressability to arguments or local variables. For correct behavior in the cases of `setjmp()` and `longjmp()` the frame pointer shall be allocated in a nonvolatile general-purpose register.

Figure 3-32. Before Dynamic Stack Allocation

An example organization of a stack frame before a dynamic allocation.

Figure 3-33. Example code to allocate n bytes:

```
#define n 13
char *a = alloca(n);

rnd(x) = round x to be multiple of stack alignment
psave  = size of parameter save area (may be zero).
p      = rnd(sizeof(psave) + 8) ; Offset to the start of the dynamic allocation

lwz  0,0(1)            ; Load backchain word.
mr   31,1              ; Frame pointer to access previously allocated.
stwu 0,-rnd(n+15)(1)   ; Store new backchain, quadword-aligned.
addi 3,1,p             ; R3 = new data area following parameter save area.
```

Note: Additional instructions might be needed to align the allocated data area or the stack pointer. Additional instructions will be necessary for an allocation of variable size.
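
The 16-byte alignment constraint can be sketched in C as follows. The function names `alloca_growth` and `grow_stack` are hypothetical, and the sketch assumes that the stack grows toward lower addresses and that the incoming stack pointer is already 16-byte aligned.

```c
#include <stdint.h>

/* Bytes by which the frame grows for an n-byte request: n rounded up to 16. */
uint32_t alloca_growth(uint32_t n)
{
    return (n + 15u) & ~15u;
}

/* New stack pointer after the frame is grown for an n-byte request. */
uint32_t grow_stack(uint32_t sp, uint32_t n)
{
    return sp - alloca_growth(n);
}
```

With `n` equal to 13, as in the figure, the frame grows by 16 bytes and the stack pointer stays quadword-aligned.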

Figure 3-34. After Dynamic Stack Allocation

An example organization of a stack frame after a dynamic allocation.

3.4. DWARF Definition

Although this ABI itself does not define a debugging format, DWARF (Debug with Arbitrary Record Format) (see Section 1.1) is defined here for systems that implement the DWARF specification.

The DWARF specification is used by compilers and debuggers to aid source-level or symbolic debugging. However, the format is not biased toward any particular compiler or debugger.
Per the DWARF specification, a mapping from Power Architecture registers to register numbers is required as described in Table 3-34.

Special Purpose Registers or SPRs are mapped into DWARF as 100 plus their SPR number. Performance Monitor Registers or PMRs are mapped into DWARF as 2048 plus the PMR number. Kernel debuggers that display privileged registers are to use the following DWARF register number mapping.

All instances of the Power Architecture use the following mapping for encoding registers into DWARF.

Table 3-34. Register Mappings

<table>
<thead>
<tr>
<th>Register Name</th>
<th>Number</th>
<th>Abbreviation</th>
</tr>
</thead>
<tbody>
<tr>
<td>General-purpose registers</td>
<td>0-31</td>
<td>R0-R31</td>
</tr>
<tr>
<td>Floating-point registers</td>
<td>32-63</td>
<td>F0-F31</td>
</tr>
<tr>
<td>Condition register</td>
<td>64</td>
<td>CR</td>
</tr>
<tr>
<td>Floating-point status and control register</td>
<td>65</td>
<td>FPSCR</td>
</tr>
<tr>
<td>Machine state register</td>
<td>66</td>
<td>MSR</td>
</tr>
<tr>
<td>Accumulator</td>
<td>99</td>
<td>ACC</td>
</tr>
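<tr>
<td>SPRs</td>
<td>100-1123</td>
<td>LR, CTR, etc.</td>
</tr>
<tr>
<td>Vector registers</td>
<td>1124-1155</td>
<td>V0-V31</td>
</tr>
<tr>
<td>Reserved</td>
<td>1156-1199</td>
<td></td>
</tr>
<tr>
<td>SPE high parts of GPRs</td>
<td>1200-1231</td>
<td></td>
</tr>
<tr>
<td>Reserved</td>
<td>1232-2047</td>
<td></td>
</tr>
<tr>
<td>Device control registers</td>
<td>3072-4095</td>
<td>DCRs</td>
</tr>
<tr>
<td>Performance monitor registers</td>
<td>4096-5120</td>
<td>PMRs</td>
</tr>
</tbody>
</table>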
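
The register numbering in Table 3-34 can be expressed as a few small C helpers. This is a sketch only: the function and constant names are hypothetical, and only the ranges whose base values appear in the table are covered. In the Power ISA, LR is SPR 8 and CTR is SPR 9, so they map to 108 and 109.

```c
#include <assert.h>

enum {
    PPC_DWARF_FPR_BASE    = 32,    /* F0-F31               */
    PPC_DWARF_SPR_BASE    = 100,   /* 100 plus SPR number  */
    PPC_DWARF_VR_BASE     = 1124,  /* V0-V31               */
    PPC_DWARF_SPE_HI_BASE = 1200   /* SPE high GPR parts   */
};

unsigned dwarf_gpr(unsigned n)      { assert(n < 32);     return n; }
unsigned dwarf_fpr(unsigned n)      { assert(n < 32);     return PPC_DWARF_FPR_BASE + n; }
unsigned dwarf_spr(unsigned spr)    { assert(spr < 1024); return PPC_DWARF_SPR_BASE + spr; }
unsigned dwarf_vr(unsigned n)       { assert(n < 32);     return PPC_DWARF_VR_BASE + n; }
unsigned dwarf_spe_high(unsigned n) { assert(n < 32);     return PPC_DWARF_SPE_HI_BASE + n; }
```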

3.5. Exception Handling

Where exceptions can be thrown or caught by a function, or thrown through that function, or where a thread can be canceled from within a function, the locations where nonvolatile registers have been saved must be described with unwind information. The format of this information is based on the DWARF Call Frame Information with extensions.

Any implementation that generates unwind information must also provide exception handling functions that are the same as those described in the Itanium C++ ABI, the normative text on the issue. See Section 1.1 for directions on obtaining this information.
Chapter 4. Object Files

4.3. ELF Header

The file class member of the ELF header identification array, e_ident[EI_CLASS], identifies the ELF file as 32-bit encoded by holding the value 1, defined as class ELFCLASS32.

For a big-endian encoded ELF file the data encoding member of the ELF header identification array, e_ident[EI_DATA], holds the value 2, defined as data encoding ELFDATA2MSB. For a little-endian encoded ELF file it holds the value 1, defined as data encoding ELFDATA2LSB.

The ELF header e_flags member may hold the following bit masks that are applicable on the Power Architecture.

Table 4-1. e_flags Bit Masks

<table>
<thead>
<tr>
<th>Mask</th>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>EF_PPC_EMB</td>
<td>0x80000000</td>
<td>Power Architecture Embedded Flag.</td>
</tr>
<tr>
<td>EF_PPC_RELOCATABLE_LIB</td>
<td>0x00008000</td>
<td>Mark ELF file as relocatable (containing Position Independent Code, see Section 5.1.1) and intended for use in a library.</td>
</tr>
<tr>
<td>EF_PPC_RELOCATABLE</td>
<td>0x00010000</td>
<td>Mark ELF file as relocatable (containing Position Independent Code, see Section 5.1.1).</td>
</tr>
</tbody>
</table>

The ELF header e_machine member identifies the architecture of the ELF file as the Power Architecture by holding the value 20, defined as machine name EM_PPC.
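
The header values above can be checked with a minimal C sketch. The function name `is_ppc32_elf` is hypothetical. The sketch assumes the standard ELF layout, in which the magic bytes occupy offsets 0 through 3, `e_ident[EI_CLASS]` is at offset 4, `e_ident[EI_DATA]` is at offset 5, and the 16-bit `e_machine` member is at offset 18.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the first bytes of a file describe a 32-bit Power Architecture ELF. */
bool is_ppc32_elf(const uint8_t *hdr, size_t len)
{
    uint16_t machine;

    if (len < 20)
        return false;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return false;
    if (hdr[4] != 1)                          /* ELFCLASS32 */
        return false;

    if (hdr[5] == 2)                          /* ELFDATA2MSB: big-endian */
        machine = (uint16_t)((hdr[18] << 8) | hdr[19]);
    else if (hdr[5] == 1)                     /* ELFDATA2LSB: little-endian */
        machine = (uint16_t)(hdr[18] | (hdr[19] << 8));
    else
        return false;

    return machine == 20;                     /* EM_PPC */
}
```

The byte order of `e_machine` has to be taken from `e_ident[EI_DATA]` first, because that is the only part of the header that can be read without already knowing the encoding.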

4.4. Special Sections

For the Power Architecture the following special sections with their corresponding section types and attributes apply:

.got

This section holds the Global Offset Table (GOT). Further information on accessing data in the GOT is contained in Section 3.3.6. Information on the layout of the Global Offset Table is in Section 5.2.3.

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.got</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_PROGBITS</td>
</tr>
<tr>
<td>sh_flags</td>
<td>SHF_ALLOC + SHF_WRITE</td>
</tr>
</tbody>
</table>

.plt

This section holds the Procedure Linkage Table (PLT) (see Section 5.2.5).
### ATR-SECURE-PLT

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.plt</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_PROGBITS</td>
</tr>
<tr>
<td>sh_flags</td>
<td>SHF_ALLOC + SHF_WRITE</td>
</tr>
</tbody>
</table>

### ATR-BSS-PLT

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.plt</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_NOBITS</td>
</tr>
<tr>
<td>sh_flags</td>
<td>SHF_ALLOC + SHF_WRITE + SHF_EXECINSTR</td>
</tr>
</tbody>
</table>

### .sdata

Initialized data can be held in this section, which is part of the Small Data Area (SDA). Further information is found in Section 4.7.

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.sdata</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_PROGBITS</td>
</tr>
<tr>
<td>sh_flags</td>
<td>SHF_ALLOC + SHF_WRITE</td>
</tr>
</tbody>
</table>

### .sbss

Uninitialized data (set to zero on program execution) can be held in this section, which is part of the SDA (Small Data Area). Further information is found in Section 4.7.

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.sbss</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_NOBITS</td>
</tr>
<tr>
<td>sh_flags</td>
<td>SHF_ALLOC + SHF_WRITE</td>
</tr>
</tbody>
</table>
.PPC.EMB.apuinfo

If a program requires one or more APUs to execute properly, this section contains records describing which APUs are required. See Section 4.10 for further details.

<table>
<thead>
<tr>
<th>Name</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>sh_name</td>
<td>.PPC.EMB.apuinfo</td>
</tr>
<tr>
<td>sh_type</td>
<td>SHT_NOTE</td>
</tr>
<tr>
<td>sh_flags</td>
<td>0</td>
</tr>
</tbody>
</table>

### 4.6. Symbol Table

#### 4.6.1. Symbol Values

An executable file that contains a symbol reference that is to be resolved dynamically by an associated shared object will have a symbol table entry for that symbol. This entry will identify the symbol as undefined by setting the `st_shndx` member to `SHN_UNDEF`.

An executable file that needs to compare the value of two symbol references will have a symbol table entry for that symbol where the `st_value` member is nonzero.

If the `st_value` of an undefined symbol is nonzero, the loader must resolve every reference to the named symbol to the same value. This ensures that all pointers to the symbol will be identical. If `st_value` is zero, the loader may resolve these symbols to different values, for example, to point directly to the symbol in some cases or into the GOT in other cases. If no PLT entry is allocated for the symbol, then `st_value` is zero.

**ATR-SECURE-PLT**

Under the Secure-PLT ABI, if a PLT entry is allocated for a symbol reference in the executable file the value of this `st_value` member is the address of an executable PLT call code stub. This executable stub is used for branching to the virtual address held by the nonexecutable PLT entry for the symbol. The content of the PLT entry defaults to the address of a PLT symbol resolver stub, which will direct the dynamic linker to resolve the reference to the symbol. Following resolution the PLT entry holds the absolute virtual address of the symbol.

**ATR-BSS-PLT**

Under the BSS-PLT ABI this `st_value` member holds the R_PPC_REL32 relocated address into the `.plt` section for the PLT entry used to resolve the undefined symbol. This PLT entry contains executable code used to dynamically resolve the address of the target symbol. The number of instructions in this code stub varies with the distance to the target.

Referencing GOT nonlocal statics is shown in Figure 3-26 and Figure 3-27. Taking the address of nonstatic function pointers is indicated by `<symbol>@plt`. Figure 3-30 and Figure 3-31 demonstrate how to perform this action.

### 4.7. Small Data Area

The **small data area** resides within the *Data segment*. It is composed of the `.sdata` and `.sbss` sections which contain initialized and uninitialized data items, respectively. The data items in these sections are addressed by 16-bit signed offsets with respect to the base of the small data area.

The use of small data areas for data items typically results in smaller programs and faster program execution.

The small data area is adjacent to the initialized and uninitialized data in the Data segment of both executables and shared objects.

---

**ATR-BSS-PLT**

The typical order of sections in the Data segment (some possibly empty) under the BSS-PLT ABI is shown in *Figure 4-1*.

*Figure 4-1. Section Ordering Under the BSS-PLT*

```
.data
.got
.sdata
.sbss
.plt
.bss
```

---

**ATR-SECURE-PLT**

Under the Secure-PLT ABI, for security reasons, the `.got` and `.plt` may be marked read-only after relocation, which requires placing the `.got` and `.plt` with other sections that are similarly made read-only after relocation, before sections that remain read-write as shown in *Figure 4-2*. If an implementation does not mark the `.got` and `.plt` sections as read-only after relocation it may still reorder the sections as indicated or it may use the section layout as described in *Figure 4-1*. See Section 5.2.5.2 for information on the Secure-PLT ABI.

*Figure 4-2. Section Ordering Under the Secure-PLT*

```
.got
.plt
.data
.sdata
```

The size of the small data area is limited. A compiler that supports small data relative addressing places a data item in the small data area based on the size of the item: all data items up to a specified size (8 bytes is the typical default) are placed in the small data area.

The link editor fails to build the executable file or shared object file if the default or specified size for the placement of items into the small data area results in the small data area being too large to be addressed with 16-bit relative offsets. In such a situation, recompilation with a smaller value for the size criterion must be done.

### 4.7.1. Use of the Small Data Area in Executables

In the case of executable files, the small data area may contain up to 64 KB of data items with local or global scope. The link editor defines the symbol `_SDA_BASE_` (small data area base) to be an address relative to which all data in the `.sdata` and `.sbss` sections may be addressed with 16-bit signed offsets. If there is neither a `.sdata` nor a `.sbss` section, the symbol `_SDA_BASE_` is defined to be 0.

For a data item in the `.sdata` or `.sbss` sections, a compiler may generate short-form, one-instruction references. In an executable file, such a reference is relative to the address of the `_SDA_BASE_` symbol, which is held in the small data area pointer register, r13.

At process initialization time, r13 is loaded with the value of the symbol `_SDA_BASE_`. General-purpose register r13 retains this value for the remainder of execution.
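
The 16-bit signed addressing relative to r13 can be sketched in C. The function names `sda_reachable` and `sda_effective_address` are hypothetical; the sketch only assumes that a short-form reference adds a sign-extended 16-bit displacement to the base register.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` lies within a signed 16-bit displacement of `sda_base`. */
bool sda_reachable(uint32_t sda_base, uint32_t addr)
{
    int64_t off = (int64_t)addr - (int64_t)sda_base;
    return off >= -32768 && off <= 32767;
}

/* Address referenced by a short-form access: r13 plus sign-extended disp. */
uint32_t sda_effective_address(uint32_t r13, int16_t disp)
{
    return r13 + (uint32_t)(int32_t)disp;
}
```

Because the displacement is signed, a base placed 32 KB into the area can reach 32 KB on either side, which accounts for the 64 KB limit stated above.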

### 4.7.2. Use of the Small Data Area in Shared Objects

In a shared object under the Secure-PLT ABI, addressing `.sdata` and `.sbss` using short (16-bit) offsets is not supported and therefore using the small data area in shared objects is not supported, which is a change from the SYSV ABI.

Because the small data area follows the *Global Offset Table* in a shared object, the data in the small data area can be addressed relative to the GOT pointer. For each shared object, the symbol `_SDA_BASE_` shall have the same value as the symbol `_GLOBAL_OFFSET_TABLE_`.

Since the small data area pointer register, r13, holds the value of the executable file’s _SDA_BASE_ symbol, a shared object may not modify r13 and should not attempt to use it for referencing the shared object’s small data area.

The _GLOBAL_OFFSET_TABLE_ and _SDA_BASE_ symbols are relative to each shared object and therefore the small data area of a shared object may only contain data items having local (i.e., non global) scope.

When _GLOBAL_OFFSET_TABLE_ relative addressing is used in a shared object to access the small data area, the small data area can be at most 32 KB in size, and less if the Global Offset Table is large.

A compiler may generate short-form one instruction references relative to a register that contains the address of the shared object’s _SDA_BASE_ symbol.

<table>
<thead>
<tr>
<th>Operation</th>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>DW_OP_ev64_opaque_regn</td>
<td>0xe0-0xff</td>
<td>The data object addressed is in the upper and lower halves of register n, where n is 0 through 31.</td>
</tr>
</tbody>
</table>

4.10. APU Information Section

This section allows disassemblers and debuggers to properly interpret the instructions within the binary, and could also be used by operating systems to provide emulation or error checking of the APU revisions. The format matches that of typical ELF note sections, as shown in Table 4-4.

Table 4-4. Typical ELF Note Section Format

<table>
<thead>
<tr>
<th>length of name (in bytes)</th>
<th>length of data (in bytes)</th>
<th>type</th>
<th>name (null-terminated, padded to 4-byte alignment)</th>
<th>data</th>
</tr>
</thead>
</table>

For the .PPC.EMB.apuinfo section, the name shall be `APUinfo\0`, the type shall be 2, and the data shall contain a series of words containing APU information, one per word as in Table 4-5 and Table 4-6. The APU information contains two unsigned halfwords: the upper half contains the unique APU identifier, and the lower half contains the revision of that APU.

Table 4-5. Object File a.o

<table>
<thead>
<tr>
<th>Offset</th>
<th>Value</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0x00000008</td>
<td>8 bytes in &quot;APUinfo\0&quot;</td>
</tr>
<tr>
<td>4</td>
<td>0x0000000C</td>
<td>12 bytes (3 words) of APU information</td>
</tr>
<tr>
<td>8</td>
<td>0x00000002</td>
<td>NOTE type 2</td>
</tr>
<tr>
<td>12</td>
<td>&quot;APUinfo\0&quot;</td>
<td>string identifying this as APU information</td>
</tr>
<tr>
<td>20</td>
<td>0x00010001</td>
<td>APU #1, revision 1</td>
</tr>
<tr>
<td>24</td>
<td>0x00020003</td>
<td>APU #2, revision 3</td>
</tr>
<tr>
<td>28</td>
<td>0x00040001</td>
<td>APU #4, revision 1</td>
</tr>
</tbody>
</table>

Table 4-6. Object File b.o

<table>
<thead>
<tr>
<th>Offset</th>
<th>Value</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0x00000008</td>
<td>8 bytes in &quot;APUinfo\0&quot;</td>
</tr>
<tr>
<td>4</td>
<td>0x00000008</td>
<td>8 bytes (2 words) of APU information</td>
</tr>
<tr>
<td>8</td>
<td>0x00000002</td>
<td>NOTE type 2</td>
</tr>
<tr>
<td>12</td>
<td>&quot;APUinfo\0&quot;</td>
<td>string identifying this as APU information</td>
</tr>
<tr>
<td>20</td>
<td>0x00010002</td>
<td>APU #1, revision 2</td>
</tr>
<tr>
<td>24</td>
<td>0x00040001</td>
<td>APU #4, revision 1</td>
</tr>
</tbody>
</table>

Linkers shall merge all .PPC.EMB.apuinfo sections in the individual relocatable files into one, with merging of per-APU information as demonstrated in Table 4-7.
Table 4-7. Merged Object File b.o

<table>
<thead>
<tr>
<th>Offset</th>
<th>Value</th>
<th>Comment</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0x00000008</td>
<td>8 bytes in &quot;APUinfo\0&quot;</td>
</tr>
<tr>
<td>4</td>
<td>0x0000000C</td>
<td>12 bytes (3 words) of APU information</td>
</tr>
<tr>
<td>8</td>
<td>0x00000002</td>
<td>NOTE type 2</td>
</tr>
<tr>
<td>12</td>
<td>&quot;APUinfo\0&quot;</td>
<td>string identifying this as APU information</td>
</tr>
<tr>
<td>20</td>
<td>0x00010002</td>
<td>APU #1, revision 2</td>
</tr>
<tr>
<td>24</td>
<td>0x00020003</td>
<td>APU #2, revision 3</td>
</tr>
<tr>
<td>28</td>
<td>0x00040001</td>
<td>APU #4, revision 1</td>
</tr>
</tbody>
</table>

Note: It is assumed that a later revision of any APU is compatible with an earlier one, but the converse is not true. Thus, the resultant .PPC.EMB.apuinfo section requires APU #1 revision 2 or greater to work, and will not work on APU #1 revision 1. If an APU revision breaks backwards compatibility, it must obtain a new unique APU identifier.
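
The merge rule illustrated by Table 4-5 through Table 4-7 can be sketched in C. The function name `apu_merge` is hypothetical, and the sketch works on the APU information words only, without the surrounding note header. For each APU identifier (upper halfword) it keeps the highest revision (lower halfword).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Merge two lists of APU information words into `out`, which must have
 * room for na + nb words. Returns the number of words written.
 */
size_t apu_merge(const uint32_t *a, size_t na,
                 const uint32_t *b, size_t nb, uint32_t *out)
{
    const uint32_t *src[2] = { a, b };
    size_t len[2] = { na, nb };
    size_t n = 0;

    for (int s = 0; s < 2; s++) {
        for (size_t i = 0; i < len[s]; i++) {
            uint32_t w = src[s][i];
            size_t k;

            for (k = 0; k < n; k++) {
                if ((out[k] >> 16) == (w >> 16)) {        /* same APU id */
                    if ((w & 0xffffu) > (out[k] & 0xffffu))
                        out[k] = w;                       /* keep newer revision */
                    break;
                }
            }
            if (k == n)
                out[n++] = w;                             /* first time seen */
        }
    }
    return n;
}
```

Applied to the APU words of a.o and b.o from the tables, the sketch yields the three words shown in Table 4-7.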

Table 4-8. APU Identifiers

<table>
<thead>
<tr>
<th>APU Identifier (16 Bits)</th>
<th>APU/Extension</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x003f</td>
<td>AltiVec</td>
</tr>
<tr>
<td>0x0040</td>
<td>ISEL</td>
</tr>
<tr>
<td>0x0041</td>
<td>PMR (Performance Monitor)</td>
</tr>
<tr>
<td>0x0042</td>
<td>RFMCI (Machine-check)</td>
</tr>
<tr>
<td>0x0043</td>
<td>CACHE_LOCK (Cache-locking)</td>
</tr>
<tr>
<td>0x0100</td>
<td>e500 SPE</td>
</tr>
<tr>
<td>0x0101</td>
<td>e500 SPFP/EFS</td>
</tr>
<tr>
<td>0x0102</td>
<td>e500 BRLOCK/BR_LOCK (Branch-locking/BTB locking)</td>
</tr>
<tr>
<td>0x0104</td>
<td>VLE</td>
</tr>
<tr>
<td>0x0000..0x003E</td>
<td>Reserved for legacy use</td>
</tr>
<tr>
<td>0x0044..0x00FF</td>
<td>Reserved</td>
</tr>
</tbody>
</table>

A link editor may optionally warn when different relocatable objects require different revisions of an APU, because moving the revision up may make the executable no longer work on processors with the older revision of the APU. In this example, the link editor could emit a warning like "Warning:bumping APU #1 revision number to 2, required by b.o."

4.13. Relocation Types

The relocation entries in a relocatable file are used by the link editor to transform the contents of said file into an executable file or shared object file. The application and result of a relocation are similar for both. Several relocatable files may be combined into one output file. The link editor merges the content of the files, sets the value of all function symbols, and performs relocations.

The 32-bit Power Architecture uses Elf32_Rela relocation entries exclusively. A relocation entry may operate upon a halfword, word, or doubleword. The `r_offset` member of the relocation entry designates the first byte of the address affected by the relocation. The subfield of `r_offset` affected by a relocation is implicit in the definition of the applied relocation type. The `r_addend` member of the relocation entry serves as the relocation addend, which is described per relocation formula.

A relocation type defines a set of instructions and calculations necessary to alter the subfield data of a particular relocation field.
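
The members of a relocation entry can be pictured with a C sketch. The type name `Rela32` and the helper functions are hypothetical stand-ins that mirror the `Elf32_Rela` layout from the System V gABI, in which `r_info` holds the symbol table index in its upper 24 bits and the relocation type in its low 8 bits.

```c
#include <stdint.h>

/* Same member order and sizes as Elf32_Rela. */
typedef struct {
    uint32_t r_offset;   /* first byte of the affected address */
    uint32_t r_info;     /* symbol index and relocation type    */
    int32_t  r_addend;   /* explicit addend                     */
} Rela32;

uint32_t rela_sym(const Rela32 *r)  { return r->r_info >> 8; }
uint32_t rela_type(const Rela32 *r) { return r->r_info & 0xffu; }

/* Pack a symbol index and a relocation type into an r_info value. */
uint32_t rela_info(uint32_t sym, uint32_t type)
{
    return (sym << 8) | (type & 0xffu);
}
```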

### 4.13.1. Relocation Fields

The following relocation fields identify a subfield of an address affected by a relocation.

Bit numbers appear at the bottom of the boxes. Byte numbers appear in the top of the boxes; big-endian in the upper left corners and little-endian in the upper right corners. The byte order specified in a relocatable file’s ELF header applies to all the elements of a relocation entry, the relocation field definitions, and relocation type calculations.

**word32**

Specifies a 32-bit bit-field taking up 4 bytes maintaining 4-byte alignment unless otherwise indicated.

Field layout: bytes 0/3, 1/2, 2/1, 3/0 (big-endian/little-endian numbering); `word32` occupies bits 0-31.

**word30**

Specifies a 30-bit bit-field taking up bits 0-29 of a word, maintaining 4-byte alignment unless otherwise indicated.

Field layout: bytes 0/3, 1/2, 2/1, 3/0 (big-endian/little-endian numbering); `word30` occupies bits 0-29, followed by bits 30-31.

**low24**

Specifies a 24-bit bit-field taking up bits 6-29 of a word, maintaining 4-byte alignment. The other bits remain unchanged. A branch instruction is an example of this field.

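Field layout: bytes 0/3, 1/2, 2/1, 3/0 (big-endian/little-endian numbering); `low24` occupies bits 6-29, between bits 0-5 and bits 30-31.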
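
Writing a value into a `low24` field while leaving the other bits unchanged can be sketched in C. The names `LOW24_MASK` and `low24_insert` are hypothetical. The sketch assumes big-endian bit numbering (bit 0 is the most significant bit), so bits 6-29 correspond to the mask 0x03fffffc, and it assumes the value is a word-aligned byte displacement.

```c
#include <stdint.h>

#define LOW24_MASK 0x03fffffcu   /* bits 6-29, with bit 0 as the MSB */

/* Replace the low24 field of an instruction word; other bits are kept. */
uint32_t low24_insert(uint32_t insn, uint32_t value)
{
    return (insn & ~LOW24_MASK) | (value & LOW24_MASK);
}
```

For example, the word 0x48000001 encodes a `bl` with a zero displacement; inserting a displacement of -8 gives 0x4bfffff9, with the opcode and the low two bits untouched.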

**low21**

Specifies a 21-bit bit-field occupying the least significant bits of a word with 4-byte alignment.

Field layout: bytes 0/3, 1/2, 2/1, 3/0 (big-endian/little-endian numbering); `low21` occupies bits 11-31, following bits 0-10.

**low14**

Specifies a 14-bit bit-field taking up bits 16-29 and possibly bit 10 (branch prediction bit) of a word, maintaining 4-byte alignment. The other bits remain unchanged. A conditional branch instruction is an example usage.

Field layout: bytes 0/3, 1/2, 2/1, 3/0 (big-endian/little-endian numbering); `low14` occupies bits 16-29, with bit 10 marked separately and bits 30-31 following the field.

**half16**

Specifies a 16-bit bit-field taking up two bytes, maintaining 2-byte alignment. The immediate field of an Add Immediate instruction is an example of this field.

Field layout: bytes 0 and 1 in big-endian numbering (1 and 0 in little-endian numbering); bits 0-15 hold half16.

**ATR-SPE**

### 4.13.2. SPE Specific Relocation Fields

**mid5**

Specifies a 5-bit bit-field occupying the most significant bits of the least-significant halfword of a word with 4-byte alignment. This relocation field is used primarily for the SPE APU load/store instructions.

Field layout: bits 16-20 hold mid5; bits 0-15 and 21-31 are not part of the field.

**mid10**

Specifies a 10-bit bit-field occupying bits 11 through 20 of a word with 4-byte alignment. This relocation field is used primarily for the SPE APU load/store instructions.

Field layout: bits 11-20 hold mid10; bits 0-10 and 21-31 are not part of the field.

**ATR-SPE**
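
Because the field definitions above number bits from the most significant end of the word (bit 0 is the most significant bit), each 32-bit field can also be read as a mask over the word. The following C sketch is illustrative only; the helper name `be_field_mask` is an assumption of this example and is not defined by the ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits first..last (inclusive) of a 32-bit word, using the
 * numbering of the field definitions: bit 0 is the most significant bit.
 * be_field_mask is a hypothetical helper name, not part of the ABI. */
uint32_t be_field_mask(unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t ones = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    /* Move the run of ones so that its last bit lands on position `last`. */
    return ones << (31 - last);
}
```

With this numbering, `be_field_mask(6, 29)` yields 0x03fffffc for low24, `be_field_mask(16, 29)` yields 0x0000fffc for low14, and `be_field_mask(16, 20)` yields 0x0000f800 for mid5.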

### 4.13.4. Relocation Notations

The following notations are used in the relocation table.

**A**
Represents the addend used to compute the value of the relocatable field.

**B**
Represents the base address at which a shared object file has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different. See Program Header in the System V ABI for more information about the base address.

**G**
Represents the offset into the Global Offset Table, relative to the `_GLOBAL_OFFSET_TABLE_` symbol, at which the address of the relocation entry’s symbol will reside during execution. This implies the creation of a .got section. See Section 3.3 and Section 5.2.3 for more information.

Reference in a calculation to the value G implicitly creates a GOT entry for the indicated symbol.

**L**
Represents the section offset or address of the procedure linkage table entry for the symbol. This implies the creation of a .plt section if one does not already exist. It also implies the creation of a PLT entry for resolving the symbol. For an unresolved symbol the PLT entry points to a PLT resolver stub. For a resolved symbol a Procedure Linkage Table entry holds the final effective address of a dynamically resolved symbol (see Section 5.2.5).

**P**
Represents the place (section offset or address) of the storage unit being relocated (computed using r_offset).

**R**
Represents the offset of the symbol within the section in which the symbol is defined (its section-relative address).

**S**
Represents the value of the symbol whose index resides in the relocation entry.

**+**
Denotes 32-bit modulus addition.

**-**
Denotes 32-bit modulus subtraction.

**>>**
Denotes arithmetic right-shifting.

**#lo(value)**
Denotes the least significant 16 bits of the indicated value, i.e., `#lo(x) = (x & 0xffff)`.

**#hi(value)**
Denotes bits 16 through 31 of the indicated value, i.e., `#hi(x) = ((x >> 16) & 0xffff)`.

**#ha(value)**
Denotes the high adjusted value: bits 16 through 31 of the indicated value, compensating for #lo() being treated as a signed number, i.e., `#ha(x) = (((x >> 16) + ((x & 0x8000) ? 1 : 0)) & 0xffff)`.

**`_SDA_BASE_`**
A symbol defined by the link editor whose value in shared objects is the same as `_GLOBAL_OFFSET_TABLE_`, and in executable programs is an address within the small data area.

**`_BRTAKEN_`**, **`_BRNTAKEN_`**
Specify whether the branch prediction bit (bit 10) should indicate that the branch will be taken or not taken, respectively. For an unconditional branch, the branch prediction bit must be 0.

The following rules apply to the relocation types defined in the relocation table described later:

- For relocation types in which the names contain 14 or 16, the upper 17 bits of the value computed before shifting must all be the same. For relocation types whose names contain 24, the upper 7 bits of the value computed before shifting must all be the same. For relocation types whose names contain 14 or 24, the low 2 bits of the value computed before shifting must all be zero.

- The relocation types whose Field column entry contains an asterisk (*) are subject to failure if the value computed does not fit in the allocated bits.
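
The `#lo`, `#hi`, and `#ha` operators and the range rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the checks model only the two rules quoted in this section (upper bits all the same, low 2 bits zero).

```c
#include <assert.h>
#include <stdint.h>

/* #lo(x): least significant 16 bits. */
uint32_t reloc_lo(uint32_t x) { return x & 0xffff; }

/* #hi(x): bits 16 through 31 of the value. */
uint32_t reloc_hi(uint32_t x) { return (x >> 16) & 0xffff; }

/* #ha(x): like #hi(x), but incremented when #lo(x) will later be
 * sign-extended to a negative number by the consuming instruction. */
uint32_t reloc_ha(uint32_t x)
{
    return ((x >> 16) + ((x & 0x8000) ? 1 : 0)) & 0xffff;
}

/* Rebuild x from #ha and a sign-extended #lo, using 32-bit modular
 * addition; this mirrors what a high/low instruction pair computes. */
uint32_t rebuild_from_ha_lo(uint32_t x)
{
    uint32_t lo = reloc_lo(x);
    uint32_t lo_sext = (lo & 0x8000) ? (lo | 0xffff0000u) : lo;
    return (reloc_ha(x) << 16) + lo_sext;
}

/* True when the top nbits bits of v are all 0 or all 1. */
int upper_bits_same(uint32_t v, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);
    uint32_t top = v >> (32 - nbits);
    uint32_t all_ones = (nbits == 32) ? UINT32_MAX
                                      : ((UINT32_C(1) << nbits) - 1);
    return top == 0 || top == all_ones;
}

/* Names containing 16: the upper 17 bits must all be the same. */
int fits_half16(uint32_t v) { return upper_bits_same(v, 17); }

/* Names containing 24: upper 7 bits the same and low 2 bits zero. */
int fits_low24(uint32_t v) { return upper_bits_same(v, 7) && (v & 3) == 0; }
```

For example, `reloc_ha(0x12348000)` is 0x1235 rather than 0x1234, because the low half 0x8000 is negative once sign-extended; `rebuild_from_ha_lo` then recovers the original value.
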
### 4.13.5. Relocation Types Table

<table>
<thead>
<tr>
<th>Relocation Name</th>
<th>Value</th>
<th>Field</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td>R_PPC_NONE</td>
<td>0</td>
<td>none</td>
<td>none</td>
</tr>
<tr>
<td>R_PPC_ADDR32</td>
<td>1</td>
<td>word32</td>
<td>S + A</td>
</tr>
<tr>
<td>R_PPC_ADDR24</td>
<td>2</td>
<td>low24*</td>
<td>(S + A) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_ADDR16</td>
<td>3</td>
<td>half16*</td>
<td>S + A</td>
</tr>
<tr>
<td>R_PPC_ADDR16_LO</td>
<td>4</td>
<td>half16</td>
<td>#lo(S + A)</td>
</tr>
<tr>
<td>R_PPC_ADDR16_HI</td>
<td>5</td>
<td>half16</td>
<td>#hi(S + A)</td>
</tr>
<tr>
<td>R_PPC_ADDR16_HA</td>
<td>6</td>
<td>half16</td>
<td>#ha(S + A)</td>
</tr>
<tr>
<td>R_PPC_ADDR14</td>
<td>7</td>
<td>low14*</td>
<td>(S + A) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_ADDR14_BRTAKEN</td>
<td>8</td>
<td>low14*</td>
<td>(S + A) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_ADDR14_BRNTAKEN</td>
<td>9</td>
<td>low14*</td>
<td>(S + A) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_REL24</td>
<td>10</td>
<td>low24*</td>
<td>(S + A - P) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_REL14</td>
<td>11</td>
<td>low14*</td>
<td>(S + A - P) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_REL14_BRTAKEN</td>
<td>12</td>
<td>low14*</td>
<td>(S + A - P) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_REL14_BRNTAKEN</td>
<td>13</td>
<td>low14*</td>
<td>(S + A - P) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_GOT16</td>
<td>14</td>
<td>half16*</td>
<td>G</td>
</tr>
<tr>
<td>R_PPC_GOT16_LO</td>
<td>15</td>
<td>half16</td>
<td>#lo(G)</td>
</tr>
<tr>
<td>R_PPC_GOT16_HI</td>
<td>16</td>
<td>half16</td>
<td>#hi(G)</td>
</tr>
<tr>
<td>R_PPC_GOT16_HA</td>
<td>17</td>
<td>half16</td>
<td>#ha(G)</td>
</tr>
<tr>
<td>R_PPC_PLTREL24</td>
<td>18</td>
<td>low24*</td>
<td>(L + A - P) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_COPY</td>
<td>19</td>
<td>none</td>
<td>(see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_GLOB_DAT</td>
<td>20</td>
<td>word32</td>
<td>S + A (see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_JMP_SLOT</td>
<td>21</td>
<td>none</td>
<td>(see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_RELATIVE</td>
<td>22</td>
<td>word32</td>
<td>B + A (see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_LOCAL24PC</td>
<td>23</td>
<td>low24*</td>
<td>(see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_UADDR32</td>
<td>24</td>
<td>word32*</td>
<td>S + A (see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_UADDR16</td>
<td>25</td>
<td>half16*</td>
<td>S + A (see Section 4.13.6)</td>
</tr>
<tr>
<td>R_PPC_REL32</td>
<td>26</td>
<td>word32*</td>
<td>S + A - P</td>
</tr>
<tr>
<td>R_PPC_PLT32</td>
<td>27</td>
<td>word32*</td>
<td>L</td>
</tr>
<tr>
<td>R_PPC_PLTREL32</td>
<td>28</td>
<td>word32*</td>
<td>L - P</td>
</tr>
<tr>
<td>R_PPC_PLT16_LO</td>
<td>29</td>
<td>half16</td>
<td>#lo(L)</td>
</tr>
<tr>
<td>R_PPC_PLT16_HI</td>
<td>30</td>
<td>half16</td>
<td>#hi(L)</td>
</tr>
<tr>
<td>R_PPC_PLT16_HA</td>
<td>31</td>
<td>half16</td>
<td>#ha(L)</td>
</tr>
<tr>
<td>R_PPC_SECTOFF</td>
<td>33</td>
<td>half16*</td>
<td>R + A</td>
</tr>
<tr>
<td>R_PPC_SECTOFF_LO</td>
<td>34</td>
<td>half16</td>
<td>#lo(R + A)</td>
</tr>
<tr>
<td>R_PPC_SECTOFF_HI</td>
<td>35</td>
<td>half16</td>
<td>#hi(R + A)</td>
</tr>
<tr>
<td>R_PPC_SECTOFF_HA</td>
<td>36</td>
<td>half16</td>
<td>#ha(R + A)</td>
</tr>
<tr>
<td>R_PPC_ADDR30</td>
<td>37</td>
<td>word30</td>
<td>(S + A - P) &gt;&gt; 2</td>
</tr>
</tbody>
</table>

Table 4-10. Relocation Table - Continued

<table>
<thead>
<tr>
<th>Relocation Name</th>
<th>Value</th>
<th>Field</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>38 ...</td>
<td></td>
<td>Assigned to the PowerPC 64-bit ABI.</td>
</tr>
<tr>
<td></td>
<td>66 ...</td>
<td></td>
<td>Assigned to the TLS ABI. These relocations are described in the <em>TLS Relocation Table</em> in Section 4.15.</td>
</tr>
<tr>
<td></td>
<td>100 ...</td>
<td></td>
<td>Assigned for embedded system use.</td>
</tr>
<tr>
<td></td>
<td>116 ...</td>
<td></td>
<td>Reserved for future use.</td>
</tr>
<tr>
<td></td>
<td>186 ...</td>
<td></td>
<td>Reserved for future embedded system use.</td>
</tr>
<tr>
<td></td>
<td>200</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**ATR-SPE**

<table>
<thead>
<tr>
<th>Relocation Name</th>
<th>Value</th>
<th>Field</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td>R_PPC_EMB_SPE_DOUBLE</td>
<td>201</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 3</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_WORD</td>
<td>202</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_HALF</td>
<td>203</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 1</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_DOUBLE_SDAREL</td>
<td>204</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA_BASE_)) &gt;&gt; 3</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_WORD_SDAREL</td>
<td>205</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA_BASE_)) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_HALF_SDAREL</td>
<td>206</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA_BASE_)) &gt;&gt; 1</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_DOUBLE_SDA2REL</td>
<td>207</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA2_BASE_)) &gt;&gt; 3</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_WORD_SDA2REL</td>
<td>208</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA2_BASE_)) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_HALF_SDA2REL</td>
<td>209</td>
<td>mid5*</td>
<td>(#lo(S + A - _SDA2_BASE_)) &gt;&gt; 1</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_DOUBLE_SDA0REL</td>
<td>210</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 3</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_WORD_SDA0REL</td>
<td>211</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 2</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_HALF_SDA0REL</td>
<td>212</td>
<td>mid5*</td>
<td>(#lo(S + A)) &gt;&gt; 1</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_DOUBLE_SDA</td>
<td>213</td>
<td>mid10*</td>
<td>Y</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_WORD_SDA</td>
<td>214</td>
<td>mid10*</td>
<td>Y</td>
</tr>
<tr>
<td>R_PPC_EMB_SPE_HALF_SDA</td>
<td>215</td>
<td>mid10*</td>
<td>Y</td>
</tr>
</tbody>
</table>

**ATR-VLE**

216 ... 233: Assigned for VLE use.

**ATR-SECURE-PLT**

<table>
<thead>
<tr>
<th>Relocation Name</th>
<th>Value</th>
<th>Field</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td>R_PPC_REL16</td>
<td>249</td>
<td>half16*</td>
<td>S + A - P</td>
</tr>
<tr>
<td>R_PPC_REL16_LO</td>
<td>250</td>
<td>half16</td>
<td>#lo(S + A - P)</td>
</tr>
<tr>
<td>R_PPC_REL16_HI</td>
<td>251</td>
<td>half16</td>
<td>#hi(S + A - P)</td>
</tr>
<tr>
<td>R_PPC_REL16_HA</td>
<td>252</td>
<td>half16</td>
<td>#ha(S + A - P)</td>
</tr>
</tbody>
</table>

### 4.13.6. Relocation Descriptions

The following list describes relocations which can require special handling or description.

**R_PPC_GOT16\***

These relocation types resemble the corresponding R_PPC_ADDR16* types, except that they refer to the address of the symbol’s *Global Offset Table* entry and additionally instruct the link editor to build a *Global Offset Table*.

**ATR-SECURE-PLT**

**R_PPC_REL16\***

These relocation types are used to compute the distance between a symbol address and the current address. They are used under the Secure-PLT ABI to compute the address of the .got section, because the link editor knows the fixed distance between the `_GLOBAL_OFFSET_TABLE_` symbol and an address in the .text section.

**R_PPC_PLTREL24**

This relocation indicates that a reference to a symbol should be resolved through a call to the symbol’s *Procedure Linkage Table* entry. Additionally, it instructs the link editor to build a procedure linkage table for the executable or shared object if one has not already been created.

**ATR-BSS-PLT**

Under the BSS-PLT ABI this relocation type may be implemented as a direct branch and link into the executable PLT slot which holds the absolute address (after resolution) of the specified symbol. There is an implicit assumption that the *Procedure Linkage Table* for a shared object or executable will be within ± 32 MB of an instruction that branches to it.

**ATR-SECURE-PLT**

Under the Secure PLT ABI this relocation type may be implemented as a branch to a stub used for loading the symbol’s absolute address (after resolution) from its PLT slot. There is an implicit assumption that the address of the PLT entry loading stub be within ± 32 MB of an instruction that branches to it, so that the R_PPC_PLTREL24 relocation type is the only one needed for accessing it.
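
The ± 32 MB limit follows from the low24 field: it holds a 24-bit value that is shifted left by two bits, giving a signed 26-bit byte displacement. The following C sketch shows how a link editor might apply an R_PPC_REL24-style calculation to a branch instruction word. It is an illustration under stated assumptions: `apply_rel24` is a hypothetical name, and the instruction word is handled as a host-order integer, whereas a real link editor reads and writes it in the byte order given by the ELF header.

```c
#include <assert.h>
#include <stdint.h>

#define LOW24_MASK UINT32_C(0x03fffffc)   /* bits 6-29 of the word */

/* Place (S + A - P) >> 2 in the low24 field of *insn and leave every
 * other bit unchanged. Returns 0 on success, or -1 when the value does
 * not fit the field (the failure case marked with an asterisk). */
int apply_rel24(uint32_t *insn, uint32_t S, uint32_t A, uint32_t P)
{
    assert(insn);
    uint32_t v = S + A - P;          /* 32-bit modular arithmetic */
    uint32_t top7 = v >> 25;         /* upper 7 bits must all be the same */
    if ((top7 != 0 && top7 != 0x7f) || (v & 3) != 0)
        return -1;
    /* The field starts two bits above the least significant bit, so
     * storing v >> 2 in it is the same as keeping bits 6-29 of v. */
    *insn = (*insn & ~LOW24_MASK) | (v & LOW24_MASK);
    return 0;
}
```

Applied to a branch-and-link word of 0x48000001 with S = 0x1000, A = 0, and P = 0x0f00, the sketch produces 0x48000101, a forward branch of 0x100 bytes.
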

**R_PPC_COPY**

The link editor creates this relocation type for dynamic linking. Its offset member refers to a location in a writable segment. The symbol table index specifies a symbol that should exist both in the current relocatable file and in a shared object file. During execution, the dynamic linker copies data associated with the shared object’s symbol to the location specified by the offset.

**R_PPC_GLOB_DAT**

This relocation type resembles R_PPC_ADDR32, except that it sets a Global Offset Table entry to the address of the specified symbol. This special relocation type allows determination of the correspondence between symbols and Global Offset Table entries.

**R_PPC_JMP_SLOT**

The link editor creates this relocation type for dynamic linking. Its offset member gives the location of a Procedure Linkage Table entry. The dynamic linker modifies the Procedure Linkage Table entry to transfer control to the designated symbol’s address (see Section 5.2.5).

**R_PPC_RELATIVE**

The link editor creates this relocation type for dynamic linking. Its offset member gives a location within a shared object that contains a value representing a relative address. The dynamic linker computes the corresponding virtual address by adding the virtual address at which the shared object was loaded to the relative address. Relocation entries for this type must specify 0 for the symbol table index.

**R_PPC_LOCAL24PC**

This relocation type resembles R_PPC_REL24, except that it uses the value of the symbol within the object, not an interposed value, for S in its calculation. The symbol referenced in this relocation normally is `_GLOBAL_OFFSET_TABLE_`, which additionally instructs the link editor to build the Global Offset Table.

**R_PPC_UADDR\***

These relocation types are the same as the corresponding R_PPC_ADDR* types, except that the datum to be relocated is allowed to be unaligned.

**ATR-SPE**

**R_PPC_EMB_SDA21**

**ATR-SPE**

The most significant 11 bits at the address pointed to by the relocation entry shall be left unchanged.

If the symbol whose index is in r_info is contained in .sdata or .sbss, then the link editor shall place in the next most significant 5 bits the value 13 (for r13); if the symbol is in .PPC.EMB.sdata2 or .PPC.EMB.sbss2, then the link editor shall place in those 5 bits the value 2 (for r2); if the symbol is in .PPC.EMB.sdata0 or .PPC.EMB.sbss0, then the link editor shall place in those 5 bits the value 0 (for r0); otherwise, the link shall fail. The least significant 16 bits of this field shall be set to the address of the symbol plus the relocation entry’s r_addend value minus the appropriate base for the symbol’s section: `_SDA_BASE_` for a symbol in .sdata or .sbss, `_SDA2_BASE_` for a symbol in .PPC.EMB.sdata2 or .PPC.EMB.sbss2, or 0 for a symbol in .PPC.EMB.sdata0 or .PPC.EMB.sbss0.

Note: The source register in the ori, oris, xor, and xoris instructions (bits 6-10) is encoded differently than in the addi, addis, ld, and st instructions (bits 11-15). This relocation type is appropriate for add and ld instructions, but not for or and xor instructions.
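
The R_PPC_EMB_SDA21 rules can be sketched in C as follows. This is an illustration, not a definitive implementation: the enum and function names are hypothetical, the caller is assumed to have already classified the symbol's section (the "link shall fail" case is not modeled), and the word is treated as a host-order integer.

```c
#include <assert.h>
#include <stdint.h>

/* Small data area that holds the symbol (hypothetical classification). */
enum sda_kind { SDA_R13, SDA2_R2, SDA0_R0 };

/* Patch a 32-bit instruction word: keep the most significant 11 bits,
 * set the next 5 bits to the base register number, and set the least
 * significant 16 bits to S + A - base. */
uint32_t apply_sda21(uint32_t insn, enum sda_kind kind,
                     uint32_t S, uint32_t A,
                     uint32_t sda_base, uint32_t sda2_base)
{
    assert(kind == SDA_R13 || kind == SDA2_R2 || kind == SDA0_R0);
    uint32_t reg, base;
    switch (kind) {
    case SDA_R13: reg = 13; base = sda_base;  break; /* .sdata, .sbss */
    case SDA2_R2: reg = 2;  base = sda2_base; break; /* .PPC.EMB.sdata2, .PPC.EMB.sbss2 */
    default:      reg = 0;  base = 0;         break; /* .PPC.EMB.sdata0, .PPC.EMB.sbss0 */
    }
    uint32_t low16 = (S + A - base) & 0xffff;
    return (insn & 0xffe00000u) | (reg << 16) | low16;
}
```

For a load word instruction encoded as 0x80600000 (target register 3) and a symbol 16 bytes past `_SDA_BASE_`, the sketch yields 0x806d0010, that is, a load from offset 16 off r13.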

**ATR-SPE**

**R_PPC_EMB_MRKREF**

The symbol whose index is in r_info shall be in a different section from the section associated with the relocation entry itself. The relocation entry’s r_offset and r_addend fields shall be ignored. Unlike other relocation types, the link editor shall not apply a relocation action to a location because of this type. This relocation type is used to prevent a link editor that does section garbage collecting from deleting an important but otherwise unreferenced section.

**ATR-SPE**

**R_PPC_EMB_BIT_FLD**

The most significant 16 bits of the relocation entry’s r_addend field shall be a value between 0 and 31, representing a big-endian bit position within the entry’s 32-bit location (e.g., 6 means the sixth most significant bit). The least significant 16 bits of r_addend shall be a value between 1 and 32, representing a length in bits. The sum of the bit position plus the length shall not exceed 32. The link editor shall replace bits starting at the bit position for the specified length with the value of the symbol, treated as a signed entity.
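
A short C sketch of the bit-field replacement described above follows. It makes two assumptions that the text leaves open: bit positions are zero-based with bit 0 as the most significant bit, and the symbol value is simply truncated to the field width without a signed range check. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Replace `len` bits of `word`, starting at big-endian bit position
 * `pos`, with the low `len` bits of the symbol value S. The position is
 * taken from the upper half of r_addend and the length from the lower
 * half. Returns the patched word (host-order integer). */
uint32_t apply_bit_fld(uint32_t word, uint32_t r_addend, uint32_t S)
{
    unsigned pos = r_addend >> 16;       /* start bit, 0..31 */
    unsigned len = r_addend & 0xffff;    /* field length, 1..32 */
    assert(pos <= 31 && len >= 1 && len <= 32 && pos + len <= 32);

    uint32_t ones = (len == 32) ? UINT32_MAX
                                : ((UINT32_C(1) << len) - 1);
    unsigned shift = 32 - (pos + len);   /* distance of the field from bit 31 */
    uint32_t mask = ones << shift;
    return (word & ~mask) | ((S << shift) & mask);
}
```

With a position of 6 and a length of 24, the mask is 0x03fffffc, the same bits as the low24 field.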

**ATR-SPE**

**R_PPC_EMB_RELSDA**

The link editor shall set the 16 bits at the address pointed to by the relocation entry to the address of the symbol whose index is in r_info plus the value of r_addend minus the appropriate base for the section containing the symbol: `_SDA_BASE_` for a symbol in .sdata or .sbss, `_SDA2_BASE_` for a symbol in .PPC.EMB.sdata2 or .PPC.EMB.sbss2, or 0 for a symbol in .PPC.EMB.sdata0 or .PPC.EMB.sbss0. If the symbol is not in one of those sections, the link shall fail.

## 4.15. Thread Local Storage ABI

The document *ELF Handling for Thread-Local Storage* (see Section 1.1) is the authoritative TLS ABI specification that defines the context in which information in this 32-bit Power Architecture TLS ABI must be viewed. In order to maintain congruence with that document, in this section the term module refers to an executable or shared object since both are treated similarly.

### 4.15.1. TLS Background

Most C/C++ implementations support (as a proposed extension to the language) the keyword __thread (the ISO C1X draft uses _Thread_local as the keyword, while C++0X uses thread_local) to be used as a storage-class specifier in variable declarations and definitions of data objects with thread storage duration. A variable declared in this manner is automatically allocated local to each thread and its lifetime is defined to be the entire execution of the thread. Any initialization value is assigned once before thread startup.

### 4.15.2. TLS Runtime Handling

A thread-local variable is completely identified by the module in which it is defined, along with the offset of the variable relative to the start of the TLS block for the module. A module is referenced by its index (an integer starting with 1, assigned by the run-time environment) into the Dynamic Thread Vector. The offset of the variable is kept in the st_value field of the TLS variable’s symbol table entry.

The TLS data structures follow variant I of the ELF TLS ABI. For the 32-bit Power Architecture, the specific organization of the data structures is as follows.

The Thread Control Block (TCB) is 8 bytes long, with its first 4 bytes containing the pointer to the Dynamic Thread Vector (DTV). Modules that will not be unloaded will be present at startup time; the TLS blocks for these are created consecutively and immediately follow the TCB. The offset of the TLS block of an initially available module from the TCB remains fixed after program start.

The tlsoffset(m) values for a module with index m, where m ranges 1 through M, M being the total number of modules, are computed as follows.

\[
\text{tlsoffset}(1) = \text{round}(16, \text{align}(1))
\]

\[
\text{tlsoffset}(m + 1) = \text{round}(\text{tlsoffset}(m) + \text{tlssize}(m), \text{align}(m + 1))
\]

- The function round() returns its first argument rounded up to the next multiple of its second argument:

\[
\text{round}(x, y) = y \cdot \text{ceiling}(x / y)
\]

- The function ceiling() returns the smallest integer greater than or equal to its argument, where \( n \) is an integer satisfying \( n - 1 < x \leq n \):

\[
\text{ceiling}(x) = n
\]
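
The recurrence above can be sketched in C as follows. This is a minimal illustration: the function names are hypothetical, array index 0 stands for the module with index 1, and alignments are assumed to be nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* round(x, y): x rounded up to the next multiple of y. */
size_t round_up(size_t x, size_t y)
{
    assert(y > 0);
    return y * ((x + y - 1) / y);   /* integer form of y * ceiling(x / y) */
}

/* Fill tlsoffset[0..n-1] for modules 1..n from their TLS block sizes
 * and alignments, following the two formulas above. */
void compute_tlsoffsets(const size_t *tlssize, const size_t *align,
                        size_t n, size_t *tlsoffset)
{
    if (n == 0)
        return;
    tlsoffset[0] = round_up(16, align[0]);
    for (size_t m = 1; m < n; m++)
        tlsoffset[m] = round_up(tlsoffset[m - 1] + tlssize[m - 1], align[m]);
}
```

For three modules with sizes 20, 8, and 4 bytes and alignments 8, 16, and 4, the sketch gives offsets 16, 48, and 56.
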
In the case of Dynamic Shared Objects (DSO), TLS blocks are allocated on an as-needed basis, with the details of allocation abstracted away by the `__tls_get_addr()` function which is used to retrieve the address of any TLS variable.

The prototype for the `__tls_get_addr()` function is defined as follows.

```c
typedef struct
{
    unsigned long int ti_module;
    unsigned long int ti_offset;
} tls_index;

extern void *__tls_get_addr (tls_index *ti);
```

The Thread Pointer (TP) is held in r2 and is used to access the TCB. The TP is initialized to point 0x7000 bytes past the end of the TCB. The TP offset allows for efficient addressing of the TCB and up to 4K-8B of other thread library information (placed before the TCB).

The following diagram shows the region of memory before and after the TCB that can be efficiently addressed by the TP:

**Figure 4-4. Thread Pointer Addressable Memory**

Each DTV pointer points 0x8000 bytes past the start of each TLS block. (For implementation reasons, the actual value stored in the DTV may point to the start of a TLS block, however values returned by accessor functions will be offset by 0x8000 bytes). This offset allows the first 64 KB of each block to be addressed from a DTV pointer using fewer machine instructions.

**Figure 4-5. TLS Block Diagram**

TLS[m] denotes the TLS block for the module with index m

DTV[m] denotes the DTV pointer for the module with index m
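
The effect of the 0x7000 and 0x8000 offsets can be checked with a little arithmetic. The C sketch below assumes that "efficient addressing" means a single instruction with a signed 16-bit displacement, which reaches from 0x8000 bytes below a base register to 0x7fff bytes above it. All names are hypothetical, and addresses are plain 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

#define TCB_SIZE 8u        /* TCB length in bytes */
#define TP_BIAS  0x7000u   /* TP points this far past the end of the TCB */
#define DTV_BIAS 0x8000u   /* DTV pointer points this far past the block start */

/* Thread pointer value for a TCB that ends at address tcb_end. */
uint32_t tp_from_tcb_end(uint32_t tcb_end) { return tcb_end + TP_BIAS; }

/* DTV pointer value for a TLS block that starts at block_start. */
uint32_t dtv_pointer(uint32_t block_start) { return block_start + DTV_BIAS; }

/* Lowest and highest addresses reachable from base with a signed
 * 16-bit displacement. */
uint32_t reach_low(uint32_t base)  { return base - 0x8000u; }
uint32_t reach_high(uint32_t base) { return base + 0x7fffu; }

/* Bytes located before the start of the TCB that the TP still reaches. */
uint32_t bytes_before_tcb(uint32_t tcb_end)
{
    assert(tcb_end >= TCB_SIZE);
    uint32_t tcb_start = tcb_end - TCB_SIZE;
    return tcb_start - reach_low(tp_from_tcb_end(tcb_end));
}
```

Under these assumptions `bytes_before_tcb` returns 4096 - 8, matching the 4K-8B figure above, and a DTV pointer reaches exactly from the start of its TLS block to 0xffff bytes past it, the first 64 KB.
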

### 4.15.3. TLS Access Models

TLS data access is categorized into the following models:

- **General Dynamic TLS Model**
- **Local Dynamic TLS Model**
- **Initial Exec TLS Model**
- **Local Exec TLS Model**

Examples for each access model are provided in the following TLS Model sub-sections.

For these examples, register r31 holds the address of the symbol `_GLOBAL_OFFSET_TABLE_` in the Global Offset Table. A different register may be used for this purpose as well.

### 4.15.3.1. General Dynamic TLS Model

Given the following code fragment, to determine the address of a thread-local variable `x`, the `__tls_get_addr()` function is called with one parameter, which is a pointer to a data object of type `tls_index`.

```c
extern __thread int x;
&x;
```

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 3,31,x@got@tlsgd</td>
<td>R_PPC_GOT_TLSGD16</td>
<td>x</td>
</tr>
<tr>
<td>bl __tls_get_addr(x@tlsgd)</td>
<td>R_PPC_TLSGD</td>
<td>x</td>
</tr>
<tr>
<td></td>
<td>R_PPC_REL24</td>
<td>__tls_get_addr</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_DTPMOD32</td>
<td>x</td>
</tr>
<tr>
<td>GOT[n+1]</td>
<td>R_PPC_DTPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The relocation specifier `@got@tlsgd` causes the link editor to create a data object of type `tls_index` in the GOT. The address of this data object is loaded into the first argument register with the `addi` instruction, and a standard function call is made.

### 4.15.3.2. Local Dynamic TLS Model

For the Local Dynamic TLS Model two different relocation sequences may be used, depending on the size of the offset to the variable. For the following code sequence a different relocation sequence is used for each variable.

```c
static __thread int x1;
static __thread int x2;
&x1;
```

Table 4-14. Local Dynamic Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 3,31,x1@got@tlsld</td>
<td>R_PPC_GOT_TLSLD16</td>
<td>x1</td>
</tr>
<tr>
<td>bl __tls_get_addr(x1@tlsld)</td>
<td>R_PPC_TLSLD</td>
<td>x1</td>
</tr>
<tr>
<td></td>
<td>R_PPC_REL24</td>
<td>__tls_get_addr</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>addi 9,3,x1@dtprel</td>
<td>R_PPC_DTPREL16</td>
<td>x1</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>addis 9,3,x2@dtprel@ha</td>
<td>R_PPC_DTPREL16_HA</td>
<td>x2</td>
</tr>
<tr>
<td>addi 9,9,x2@dtprel@l</td>
<td>R_PPC_DTPREL16_LO</td>
<td>x2</td>
</tr>
</tbody>
</table>

Table 4-15. Local Dynamic Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_DTPMOD32</td>
<td>x1</td>
</tr>
<tr>
<td>GOT[n+1]</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>

The relocation specifier @got@tlsld in the first instruction causes the link editor to generate a tls_index data object in the GOT with a fixed 0 offset. The code shown assumes that x1 is in the first 64k of the thread storage block, while x2 is not. To load the values of x1 and x2 instead of the address, access int variables with the following.

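<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>lwz 0,x1@dtprel(3)</td>
<td>R_PPC_DTPREL16</td>
<td>x1</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>addis 9,3,x2@dtprel@ha</td>
<td>R_PPC_DTPREL16_HA</td>
<td>x2</td>
</tr>
<tr>
<td>lwz 0,x2@dtprel@l(9)</td>
<td>R_PPC_DTPREL16_LO</td>
<td>x2</td>
</tr>
</tbody>
</table>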

### 4.15.3.3. Initial Exec TLS Model

Given the following code fragment the relocation sequence in Table 4-16 is used for the Initial Exec TLS Model.

extern __thread int x;
&x;

Table 4-16. Initial Exec Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>lwz 9,x@got@tprel(31)</td>
<td>R_PPC_GOT_TPREL16</td>
<td>x</td>
</tr>
<tr>
<td>add 9,9,x@tls</td>
<td>R_PPC_TLS</td>
<td>x</td>
</tr>
</tbody>
</table>
Table 4-17. Initial Exec Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_TPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The relocation specifier @got@tprel in the first instruction causes the link editor to generate a GOT entry with a relocation that the dynamic linker will replace with the offset for x relative to the thread pointer. The relocation specifier x@tls tells the assembler to use an r2 form of the instruction, i.e., add 9,9,2 in this case, and tag the instruction with a relocation that indicates it belongs to a TLS sequence. This relocation specifier can be used later by the link editor when optimizing TLS code.

To read the contents of the variable instead of calculating its address, the `add 9,9,x@tls` instruction might be replaced with `lwzx 0,9,x@tls`.

### 4.15.3.4. Local Exec TLS Model

Given the following code fragment, two different relocation sequences may be used, depending on the size of the offset to the variable. The sequence in Table 4-18 handles offsets within 60KB relative to the end of the TCB (where r2 points 28KB past the end of the TCB, which is immediately before the first TLS block). The sequence in Table 4-19 handles offsets past 60KB relative to the end of the TCB.

static __thread int x;
&x;

The following diagram illustrates which sequence is used:

Figure 4-6. Local Exec TLS Model Sequences
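
The choice between the two sequences can be sketched in C. The sketch assumes that the TP-relative value of a variable is its distance from the end of the TCB minus the 0x7000 thread pointer offset, and that the single-instruction form applies whenever this value fits a signed 16-bit immediate. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TP_BIAS 0x7000   /* r2 points 28 KB past the end of the TCB */

/* TP-relative value of a variable located `off` bytes past the end of
 * the TCB. */
int32_t tprel_value(uint32_t off)
{
    assert(off <= INT32_MAX);
    return (int32_t)off - TP_BIAS;
}

/* Returns 1 when a single addi with R_PPC_TPREL16 reaches the variable,
 * or 2 when the addis/addi pair with R_PPC_TPREL16_HA and
 * R_PPC_TPREL16_LO is needed. */
int local_exec_sequence(uint32_t off)
{
    int32_t v = tprel_value(off);
    return (v >= -0x8000 && v <= 0x7fff) ? 1 : 2;
}
```

Under these assumptions the switch from sequence 1 to sequence 2 happens at an offset of 0xf000 bytes, which is the 60 KB boundary described above.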


Table 4-18. Local Exec Initial Relocations (Sequence 1)

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 9,2,x@tprel</td>
<td>R_PPC_TPREL16</td>
<td>x</td>
</tr>
</tbody>
</table>

Table 4-19. Local Exec Initial Relocations (Sequence 2)

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addis 9,2,x@tprel@ha</td>
<td>R_PPC_TPREL16_HA</td>
<td>x</td>
</tr>
<tr>
<td>addi 9,9,x@tprel@l</td>
<td>R_PPC_TPREL16_LO</td>
<td>x</td>
</tr>
</tbody>
</table>

### 4.15.4. TLS Link Editor Optimizations

When the link editor knows if the code being generated is for an executable file or for a shared object file, or when a reference to a thread-local variable in the executable is unconditionally satisfied by a definition in the executable itself, the link editor can optimize the computation of a variable’s address provided the compiler emits code sequences as described.

The following TLS link editor transformations are provided as optimizations to convert between specific TLS Access Models:

- *General Dynamic to Initial Exec*
- *General Dynamic to Local Exec*
- *Local Dynamic to Local Exec*
- *Initial Exec to Local Exec*

### 4.15.4.1. General Dynamic to Initial Exec

Table 4-20. General Dynamic to Initial Exec Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 3,31,x@got@tlsgd</td>
<td>R_PPC_GOT_TLSGD16</td>
<td>x</td>
</tr>
<tr>
<td>bl __tls_get_addr(x@tlsgd)</td>
<td>R_PPC_TLSGD</td>
<td>x</td>
</tr>
<tr>
<td></td>
<td>R_PPC_REL24</td>
<td>__tls_get_addr</td>
</tr>
</tbody>
</table>

Table 4-21. General Dynamic to Initial Exec Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_DTPMOD32</td>
<td>x</td>
</tr>
<tr>
<td>GOT[n+1]</td>
<td>R_PPC_DTPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The preceding relocations are replaced by the following relocations.

Table 4-22. General Dynamic to Initial Exec Replacement Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>lwz 3,x@got@tprel(31)</td>
<td>R_PPC_GOT_TPREL16</td>
<td>x</td>
</tr>
<tr>
<td>add 3,3,2</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 4-23. General Dynamic to Initial Exec Replacement Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_TPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

### 4.15.4.2. General Dynamic to Local Exec

Table 4-24. General Dynamic to Local Exec Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 3,31,x@got@tlsgd</td>
<td>R_PPC_GOT_TLSGD16</td>
<td>x</td>
</tr>
<tr>
<td>bl __tls_get_addr(x@tlsgd)</td>
<td>R_PPC_TLSGD</td>
<td>x</td>
</tr>
<tr>
<td></td>
<td>R_PPC_REL24</td>
<td>__tls_get_addr</td>
</tr>
</tbody>
</table>

Table 4-25. General Dynamic to Local Exec Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_DTPMOD32</td>
<td>x</td>
</tr>
<tr>
<td>GOT[n+1]</td>
<td>R_PPC_DTPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The preceding initial relocations are replaced by the following initial relocations. This optimization does not replace the preceding outstanding relocations.

Table 4-26. General Dynamic to Local Exec Replacement Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addis 3,2,x@tprel@ha</td>
<td>R_PPC_TPREL16_HA</td>
<td>x</td>
</tr>
<tr>
<td>addi 3,3,x@tprel@l</td>
<td>R_PPC_TPREL16_LO</td>
<td>x</td>
</tr>
</tbody>
</table>

### 4.15.4.3. Local Dynamic to Local Exec

Under this TLS linker optimization, the function call is replaced with an equivalent code sequence. As shown, the following dtprel sequences are left unchanged.

Table 4-27. Local Dynamic To Local Exec Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addi 3,31,x1@got@tlsld</td>
<td>R_PPC_GOT_TLSLD16</td>
<td>x1</td>
</tr>
<tr>
<td>bl __tls_get_addr(x1@tlsld)</td>
<td>R_PPC_TLSLD</td>
<td>x1</td>
</tr>
<tr>
<td></td>
<td>R_PPC_REL24</td>
<td>__tls_get_addr</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>addi 9,3,x1@dtprel</td>
<td>R_PPC_DTPREL16</td>
<td>x1</td>
</tr>
<tr>
<td>...</td>
<td></td>
<td></td>
</tr>
<tr>
<td>addis 9,3,x2@dtprel@ha</td>
<td>R_PPC_DTPREL16_HA</td>
<td>x2</td>
</tr>
<tr>
<td>addi 9,9,x2@dtprel@l</td>
<td>R_PPC_DTPREL16_LO</td>
<td>x2</td>
</tr>
</tbody>
</table>

Table 4-28. Local Dynamic To Local Exec Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_DTPMOD32</td>
<td>x1</td>
</tr>
<tr>
<td>GOT[n+1]</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

The preceding relocations are replaced by the following relocations. This optimization does not replace the preceding outstanding relocations.

Table 4-29. Local Dynamic To Local Exec Replacement Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addis 3,2,L@tprel@ha</td>
<td>R_PPC_TPREL16_HA</td>
<td>link editor generated local symbol</td>
</tr>
<tr>
<td>addi 3,3,L@tprel@l</td>
<td>R_PPC_TPREL16_LO</td>
<td>link editor generated local symbol</td>
</tr>
<tr>
<td>addi 9,3,x1@dtprel</td>
<td>R_PPC_DTPREL16</td>
<td></td>
</tr>
<tr>
<td>addis 9,3,x2@dtprel@ha</td>
<td>R_PPC_DTPREL16_HA</td>
<td></td>
</tr>
<tr>
<td>addi 9,9,x2@dtprel@l</td>
<td>R_PPC_DTPREL16_LO</td>
<td></td>
</tr>
</tbody>
</table>

The link editor generated local symbol points to the start of the thread storage block plus 0x7000 bytes. In practice, a section symbol with a suitable offset will be used.

4.15.4.4. Initial Exec to Local Exec

Table 4-30. Initial Exec to Local Exec Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>lwz 9,x@got@tprel(31)</td>
<td>R_PPC_GOT_TPREL16</td>
<td>x</td>
</tr>
<tr>
<td>add 9,9,x@tls</td>
<td>R_PPC_TLS</td>
<td>x</td>
</tr>
</tbody>
</table>

Table 4-31. Initial Exec to Local Exec Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_TPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The preceding relocations are replaced by the following relocations. This optimization does not replace the preceding outstanding relocations.

Table 4-32. Initial Exec to Local Exec Replacement Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addis 9,2,x@tprel@ha</td>
<td>R_PPC_TPREL16_HA</td>
<td>x</td>
</tr>
<tr>
<td>addi 9,9,x@tprel@l</td>
<td>R_PPC_TPREL16_LO</td>
<td>x</td>
</tr>
</tbody>
</table>

Other sizes and types of thread-local variables may use any of the X-form indexed load or store instructions. In this case, the compiler may insert other code between the `lwz` and `add` instructions.

Table 4-33 shows how to access the contents of a variable using the X-form indexed load and store instructions.

Table 4-33. Initial Exec to Local Exec X-form Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>lwz 9,x@got@tprel(31)</td>
<td>R_PPC_GOT_TPREL16</td>
<td>x</td>
</tr>
<tr>
<td>lbzx 10,9,x@tls</td>
<td>R_PPC_TLS</td>
<td>x</td>
</tr>
<tr>
<td>addi 10,10,1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>stbx 10,9,x@tls</td>
<td>R_PPC_TLS</td>
<td>x</td>
</tr>
</tbody>
</table>

Table 4-34. Initial Exec to Local Exec X-form Outstanding Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>GOT[n]</td>
<td>R_PPC_TPREL32</td>
<td>x</td>
</tr>
</tbody>
</table>

The preceding relocations are replaced by the following relocations. This optimization does not replace the preceding outstanding relocations.

Table 4-35. Initial Exec to Local Exec X-form Replacement Initial Relocations

<table>
<thead>
<tr>
<th>Code Sequence</th>
<th>Relocation</th>
<th>Symbol</th>
</tr>
</thead>
<tbody>
<tr>
<td>addis 9,2,x@tprel@ha</td>
<td>R_PPC_TPREL16_HA</td>
<td>x</td>
</tr>
<tr>
<td>lbz 10,x@tprel@l(9)</td>
<td>R_PPC_TPREL16_LO</td>
<td>x</td>
</tr>
<tr>
<td>addi 10,10,1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>stb 10,x@tprel@l(9)</td>
<td>R_PPC_TPREL16_LO</td>
<td>x</td>
</tr>
</tbody>
</table>

4.15.5. ELF TLS Definitions

The result of performing a relocation for a TLS symbol is the module ID and the symbol's offset within the TLS block. These values are stored in the Global Offset Table (see Section 5.2.3), later obtained by the dynamic linker at run time, and passed to __tls_get_addr(), which returns the address of the variable for the current thread.

The following notation is used to explain the expressions in Table 4-36:

**S**

Represents the value of the symbol whose index resides in the relocation entry.

**A**

Represents the addend used to compute the value of the relocatable field.

**tp**

The value of the thread pointer in general-purpose register 2 (r2).

**TLS_TP_OFFSET**

The constant value 0x7000, representing the offset (in bytes) of the location the thread pointer is initialized to point to, relative to the start of the thread local storage for the first initially available module.

**TCB_LENGTH**

The constant value 0x8, representing the length of the TCB in bytes.

**tcb**

Represents the base address of the TCB.

tcb = (tp - (TLS_TP_OFFSET + TCB_LENGTH))

**dtv**

Represents the base address of the DTV.

dtv = tcb[0]

**dtpmod**

Represents the load module index of the load module that contains the definition of the symbol being relocated and is used to index the DTV.

**dtprel**

Represents the offset of the symbol being relocated relative to the value of dtv[dtpmod].

dtv[dtpmod] + dtprel = (S + A)

**tprel**

Represents the offset of the symbol being relocated relative to tp.

tp + tprel = (S + A)
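The relations among tp, tcb, and tprel can be checked with a few lines of arithmetic. The sketch below is illustrative only: the TLS block address 0x10020000 used in the usage note is a hypothetical value, and the function names are not part of this ABI.

```python
TLS_TP_OFFSET = 0x7000
TCB_LENGTH = 0x8

def thread_pointer(first_tls_block):
    """tp points TLS_TP_OFFSET bytes past the start of the first TLS block."""
    return first_tls_block + TLS_TP_OFFSET

def tcb_address(tp):
    """tcb = (tp - (TLS_TP_OFFSET + TCB_LENGTH))"""
    return tp - (TLS_TP_OFFSET + TCB_LENGTH)

def tprel(s, a, tp):
    """Solve tp + tprel = (S + A) for tprel."""
    return (s + a) - tp
```

With a first TLS block at the hypothetical address 0x10020000, tp is 0x10027000, the TCB occupies the 8 bytes immediately below the block, and a variable 0x10 bytes into the block has a tprel of -0x6FF0.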

**tlsgd**

Allocates two contiguous entries in the GOT to hold a tls_index structure, with values dtpmod and dtprel, and computes the offset of the first entry within the GOT.

If n is the offset computed:

_GLOBAL_OFFSET_TABLE_[n] = dtpmod

_GLOBAL_OFFSET_TABLE_[n + 1] = dtprel

The call to __tls_get_addr() would happen as:

__tls_get_addr((tls_index *) &_GLOBAL_OFFSET_TABLE_[n])

**tlsld**

Allocates two contiguous entries in the GOT to hold a tls_index structure, with values dtpmod and zero, and computes the offset of the first entry within the GOT.

If n is the offset computed:

_GLOBAL_OFFSET_TABLE_[n] = dtpmod

_GLOBAL_OFFSET_TABLE_[n + 1] = 0

The call to __tls_get_addr() would happen as:

__tls_get_addr((tls_index *) &_GLOBAL_OFFSET_TABLE_[n])
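The difference between the tlsgd and tlsld GOT pairs can be modeled as follows. This is a simplified sketch based only on the relation dtv[dtpmod] + dtprel = (S + A) given above. The `TlsIndex` tuple and `tls_get_addr` function are hypothetical stand-ins for the tls_index structure and __tls_get_addr(), and a real run-time implementation may bias the stored values.

```python
from collections import namedtuple

# Mirrors the two contiguous GOT words: GOT[n] and GOT[n + 1].
TlsIndex = namedtuple("TlsIndex", ["module", "offset"])

def tls_get_addr(dtv, ti):
    """Return dtv[dtpmod] + offset for the calling thread's DTV."""
    return dtv[ti.module] + ti.offset

def general_dynamic(dtv, dtpmod, dtprel):
    """tlsgd: the GOT pair holds (dtpmod, dtprel); one call per variable."""
    return tls_get_addr(dtv, TlsIndex(dtpmod, dtprel))

def local_dynamic(dtv, dtpmod, dtprels):
    """tlsld: the GOT pair holds (dtpmod, 0); one call yields the module
    base, and each variable is then reached by adding its own dtprel."""
    base = tls_get_addr(dtv, TlsIndex(dtpmod, 0))
    return [base + d for d in dtprels]
```

The local dynamic form pays for a single call and then addresses any number of variables in the same module with plain additions.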

**tprelg**

Allocates an entry in the GOT with value tprel, and computes the offset of the entry within the GOT.

If n is the offset computed:

_GLOBAL_OFFSET_TABLE_[n] = tprel

The value of tprel is loaded into a register from the location (_GLOBAL_OFFSET_TABLE_ + n) to be used in an r2 form instruction.

Note: Relocations not using the #ha(), #hi(), and #lo() modifiers (those flagged with an asterisk (*)) will trigger a relocation failure if the value computed does not fit in the field specified.

Table 4-36. TLS Relocation Table

<table>
<thead>
<tr>
<th>Relocation Name</th>
<th>Value</th>
<th>Field</th>
<th>Expression</th>
</tr>
</thead>
<tbody>
<tr>
<td>R_PPC_TLS</td>
<td>67</td>
<td>none</td>
<td>none</td>
</tr>
<tr>
<td>R_PPC_DTPMOD32</td>
<td>68</td>
<td>word32</td>
<td>dtpmod</td>
</tr>
<tr>
<td>R_PPC_TPREL16</td>
<td>69</td>
<td>half16*</td>
<td>tprel</td>
</tr>
<tr>
<td>R_PPC_TPREL16_LO</td>
<td>70</td>
<td>half16</td>
<td>#lo(tprel)</td>
</tr>
<tr>
<td>R_PPC_TPREL16_HI</td>
<td>71</td>
<td>half16</td>
<td>#hi(tprel)</td>
</tr>
<tr>
<td>R_PPC_TPREL16_HA</td>
<td>72</td>
<td>half16</td>
<td>#ha(tprel)</td>
</tr>
<tr>
<td>R_PPC_TPREL32</td>
<td>73</td>
<td>word32</td>
<td>tprel</td>
</tr>
<tr>
<td>R_PPC_DTPREL16</td>
<td>74</td>
<td>half16*</td>
<td>dtprel</td>
</tr>
<tr>
<td>R_PPC_DTPREL16_LO</td>
<td>75</td>
<td>half16</td>
<td>#lo(dtprel)</td>
</tr>
<tr>
<td>R_PPC_DTPREL16_HI</td>
<td>76</td>
<td>half16</td>
<td>#hi(dtprel)</td>
</tr>
<tr>
<td>R_PPC_DTPREL16_HA</td>
<td>77</td>
<td>half16</td>
<td>#ha(dtprel)</td>
</tr>
<tr>
<td>R_PPC_DTPREL32</td>
<td>78</td>
<td>word32</td>
<td>dtprel</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSGD16</td>
<td>79</td>
<td>half16*</td>
<td>tlsgd</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSGD16_LO</td>
<td>80</td>
<td>half16</td>
<td>#lo(tlsgd)</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSGD16_HI</td>
<td>81</td>
<td>half16</td>
<td>#hi(tlsgd)</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSGD16_HA</td>
<td>82</td>
<td>half16</td>
<td>#ha(tlsgd)</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSLD16</td>
<td>83</td>
<td>half16*</td>
<td>tlsld</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSLD16_LO</td>
<td>84</td>
<td>half16</td>
<td>#lo(tlsld)</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSLD16_HI</td>
<td>85</td>
<td>half16</td>
<td>#hi(tlsld)</td>
</tr>
<tr>
<td>R_PPC_GOT_TLSLD16_HA</td>
<td>86</td>
<td>half16</td>
<td>#ha(tlsld)</td>
</tr>
<tr>
<td>R_PPC_GOT_TPREL16</td>
<td>87</td>
<td>half16*</td>
<td>tprelg</td>
</tr>
<tr>
<td>R_PPC_GOT_TPREL16_LO</td>
<td>88</td>
<td>half16</td>
<td>#lo(tprelg)</td>
</tr>
<tr>
<td>R_PPC_GOT_TPREL16_HI</td>
<td>89</td>
<td>half16</td>
<td>#hi(tprelg)</td>
</tr>
<tr>
<td>R_PPC_GOT_TPREL16_HA</td>
<td>90</td>
<td>half16</td>
<td>#ha(tprelg)</td>
</tr>
<tr>
<td></td>
<td>91-94</td>
<td></td>
<td>Reserved for future TLS ABI use.</td>
</tr>
<tr>
<td>R_PPC_TLSGD</td>
<td>95</td>
<td>none</td>
<td>none</td>
</tr>
<tr>
<td>R_PPC_TLSLD</td>
<td>96</td>
<td>none</td>
<td>none</td>
</tr>
<tr>
<td></td>
<td>97</td>
<td></td>
<td>Reserved for future TLS ABI use.</td>
</tr>
</tbody>
</table>

TLS Relocation Descriptions

R_PPC_TLS
R_PPC_TLSGD
R_PPC_TLSLD

These are marker relocations that tie together instructions in TLS code sequences. They allow the link editor to reliably optimize TLS code. R_PPC_TLSGD and R_PPC_TLSLD shall be emitted immediately before their associated __tls_get_addr call relocation.

ATR-TLS
Chapter 5. Program Loading and Dynamic Linking

5.1. Program Loading

A number of criteria constrain the mapping of an executable file or shared object file to virtual memory segments. During mapping, the operating system may employ delayed physical reads to improve performance, which necessitates that file offsets and virtual addresses are congruent, modulo the page size.

Page size must be less than or equal to the operating system implemented congruency. This ABI defines 64 KB congruency as the minimum allowable. To maintain interoperability between operating system implementations, 64 KB congruency is recommended.

Note: There is historical precedent for 64 KB congruency in that there is synergy with the Power Architecture instruction set whereby high and high adjusted relocations can be easily performed using addi or addis instructions.

The value of the p_align member of the program header struct must be 0x10000, which indicates that segments are aligned on 64 KB boundaries. The size of each segment is defined to be a positive, integral power of two, but no less than 64 KB.

The following program header information illustrates an application that is mapped with a base address of 0x10000000:

<table>
<thead>
<tr>
<th>Header Member</th>
<th>Text Segment</th>
<th>Data Segment</th>
</tr>
</thead>
<tbody>
<tr>
<td>p_type</td>
<td>PT_LOAD</td>
<td>PT_LOAD</td>
</tr>
<tr>
<td>p_offset</td>
<td>0x000000</td>
<td>0x000af0</td>
</tr>
<tr>
<td>p_vaddr</td>
<td>0x10000000</td>
<td>0x10010af0</td>
</tr>
<tr>
<td>p_paddr</td>
<td>0x10000000</td>
<td>0x10010af0</td>
</tr>
<tr>
<td>p_filesz</td>
<td>0x00af0</td>
<td>0x00124</td>
</tr>
<tr>
<td>p_memsz</td>
<td>0x00af0</td>
<td>0x00128</td>
</tr>
<tr>
<td>p_flags</td>
<td>R-E</td>
<td>RW-</td>
</tr>
<tr>
<td>p_align</td>
<td>0x10000</td>
<td>0x10000</td>
</tr>
</tbody>
</table>

Note: For the PT_LOAD entry describing the data segment, p_memsz may be greater than p_filesz. The difference is the size of the .bss section. On implementations that use virtual memory file mapping, only the portion of the file from the .data p_offset (rounded down to the nearest page) to p_offset + p_filesz (rounded up to the next page boundary) is included. If the distance between p_offset + p_filesz and p_offset + p_memsz crosses a page boundary, then additional memory must be allocated out of anonymous memory to include data through p_vaddr + p_memsz.
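The congruency requirement and the .bss size can be verified directly from the example program header values. The following Python sketch uses only the numbers in the table above. The dictionary layout and function names are illustrative and not part of this ABI.

```python
P_ALIGN = 0x10000  # p_align from the example program headers

SEGMENTS = {
    "text": {"p_offset": 0x000000, "p_vaddr": 0x10000000,
             "p_filesz": 0x00af0, "p_memsz": 0x00af0},
    "data": {"p_offset": 0x000af0, "p_vaddr": 0x10010af0,
             "p_filesz": 0x00124, "p_memsz": 0x00128},
}

def is_congruent(segment, align=P_ALIGN):
    """File offset and virtual address must agree modulo the alignment."""
    return segment["p_offset"] % align == segment["p_vaddr"] % align

def bss_size(segment):
    """Bytes present in memory but not in the file (the .bss section)."""
    return segment["p_memsz"] - segment["p_filesz"]
```

Both example segments satisfy the congruency rule, and the data segment carries 4 bytes of .bss, matching the .bss range 0xc14 to 0xc18 in the file offset column of Table 5-2.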

Table 5-2 demonstrates a typical mapping of file to memory segments.

Table 5-2. Memory Segment Mappings

<table>
<thead>
<tr>
<th>File</th>
<th>Section</th>
<th>Virtual Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x0</td>
<td>header</td>
<td>0x10000000</td>
</tr>
<tr>
<td>0x100</td>
<td>.text</td>
<td>0x10000100</td>
</tr>
<tr>
<td>0xaf0</td>
<td>.data</td>
<td>0x10010af0</td>
</tr>
<tr>
<td>0xc14</td>
<td>.bss</td>
<td>0x10010c14</td>
</tr>
<tr>
<td>0xc18</td>
<td>.dataend</td>
<td>0x10010c18</td>
</tr>
</tbody>
</table>

Operating systems typically enforce memory permissions at per-page granularity. This ABI maintains that the memory permissions are consistent across each memory segment when a file image is mapped to a process memory segment. The Text Segment and Data Segment require differing memory permissions. To maintain congruency of file offset to virtual address modulo the page size, the system will map the file region holding the overlapped text and data twice, at different virtual addresses for each segment (see Figure 5-1).

**ATR-SECURE-PLT**

Under the Secure-PLT ABI, certain sections of the Data Segment may be protected as read-only after the pages are mapped and relocations are resolved. See Section 5.2.5.2 for more information.

As a result of this mapping, there can be up to four pages of impure text or data in the virtual memory segments for the application, as described in the following list:

1. ELF header information, program headers, and other information will precede the .text section and reside at the beginning of the Text Segment.

2. The last memory page of the Text Segment can contain a copy of the partial, first file image Data page as an artifact of page faulting the last file image Text page from the file image to the Text Segment while maintaining the required offsets as shown in Figure 5-1.

3. Likewise, the first memory page of the Data Segment may contain a copy of the partial, last file image Text page as an artifact of page faulting the first file image Data page from the file image to the Data Segment while maintaining the required offsets.

4. The last faulted Data Segment memory page may contain residual data from the last file image Data page that is not part of the actual file image. The system is required to zero this residual memory after that page is mapped to the Data Segment. If the application requires static data, the remainder of this page is used for that purpose. If the static data requirements exceed the remnant left in the last faulted memory page, additional pages shall be mapped from anonymous memory and zeroed.

Note: The handling of the contents of the first three impure pages is undefined by this ABI.
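Item 4 can be expressed as a small calculation. The sketch below assumes a 64 KB page size and a hypothetical helper name. It returns how many bytes of the last file-backed data page must be zeroed and how many additional anonymous pages are needed.

```python
def data_tail_layout(p_vaddr, p_filesz, p_memsz, page_size=0x10000):
    """Return (zero_fill_bytes, anonymous_pages) for a data segment.

    page_size is an assumption; the operating system chooses the real value.
    """
    file_end = p_vaddr + p_filesz   # first byte past the file-backed data
    mem_end = p_vaddr + p_memsz     # first byte past the whole segment
    # End of the last file-backed page, rounded up to a page boundary.
    page_end = -(-file_end // page_size) * page_size
    zero_fill = page_end - file_end
    # Static data that does not fit in the remnant needs anonymous pages.
    overflow = max(0, mem_end - page_end)
    anonymous_pages = -(-overflow // page_size)
    return zero_fill, anonymous_pages
```

For the example data segment (p_vaddr 0x10010af0, p_filesz 0x124, p_memsz 0x128), the 4 bytes of static data fit within the zeroed remnant, so no anonymous pages are required.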

5.1.1. Addressing Models

When mapping an executable file or shared object file to memory the system can utilize the following addressing models. Each application is allocated its own virtual address space.

- Traditionally, executable files have been mapped to virtual memory using an absolute addressing model, where the mapping of the sections to segments uses the section p_vaddr specified by the ELF header directly as an absolute address.

- The Position-Independent Code (PIC) addressing model allows the file image Text of an executable file or shared object file to be loaded into the virtual address space of a process at an arbitrary starting address chosen by the kernel loader or program interpreter (dynamic linker).

  Note: Shared objects need to use the PIC addressing model so that all references to global variables go through the Global Offset Table.

  Note: Position-independent executables should use the PIC addressing model.

5.2. Dynamic Linking

5.2.1. Program Interpreter

For dynamic linking the standard program interpreter is /lib/ld.so.1.

5.2.2. Dynamic Section

The dynamic section provides information used by the dynamic linker to manage dynamically loaded shared objects, including relocation, initialization, and termination when loaded or unloaded, resolving dependencies on other shared objects, resolving references to symbols in the shared object, and supporting debugging. The following dynamic tags are relevant to this processor specific ABI:

DT_PLTGOT

The `d_ptr` member of this dynamic tag holds the address of the first byte of the *Procedure Linkage Table*.

DT_JMPREL

The `d_ptr` member of this dynamic tag points to the first byte of the table of relocation entries which have a one-to-one correspondence with PLT entries. Any executable or shared object with a PLT must have DT_JMPREL. A shared object containing only data will not have a PLT and thus will not have DT_JMPREL.

5.2.3. Global Offset Table

To support position independent code, a Global Offset Table (GOT) shall be constructed by the link editor in the Data Segment when linking code containing any of the various R_PPC_GOT* relocations or when linking code that references the _GLOBAL_OFFSET_TABLE_ symbol. The link editor shall emit dynamic relocations as appropriate for each entry in the GOT. At runtime, the dynamic linker will apply these relocations once addresses of all memory segments are known (and thus the addresses of all symbols). At that point, the GOT may be considered to be an array of absolute addresses, but note that this ABI does not preclude the GOT containing nonaddress entries.

Absolute addresses are generated for all GOT relocations by the dynamic linker before giving control to any process image code. The dynamic linker is free to choose different memory segment addresses for the executable or shared objects in a different process image. After the initial mapping of the process image by the dynamic linker, memory segments reside at fixed addresses for the life of a process.

The symbol _GLOBAL_OFFSET_TABLE_ may be used to access the GOT or in GOT-relative addressing to other data constructs, such as the Procedure Linkage Table. The symbol may be offset by 0x8000 bytes from the start of the .got section. This offset allows the use of the full (64 KB) signed range of 16-bit displacement fields by using both positive and negative subscripts into the array of addresses.
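The effect of the 0x8000 offset can be illustrated with a short calculation. The sketch assumes that _GLOBAL_OFFSET_TABLE_ is placed exactly 0x8000 bytes past the start of the .got section, which the text above permits but does not require. The function name is hypothetical.

```python
GOT_SYMBOL_BIAS = 0x8000  # assumed offset of _GLOBAL_OFFSET_TABLE_ in .got

def got_displacement(byte_offset_in_got):
    """Signed 16-bit displacement from _GLOBAL_OFFSET_TABLE_ to a GOT byte."""
    displacement = byte_offset_in_got - GOT_SYMBOL_BIAS
    if not -0x8000 <= displacement <= 0x7FFF:
        raise ValueError("entry is not reachable with a 16-bit displacement")
    return displacement
```

Under this assumption, the first word of .got is reached with displacement -0x8000 and the last word of a 64 KB table with 0x7FFC, so the whole table is addressable from a single base register.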

5.2.3.1. Global Offset Table Under The Secure-PLT ABI

Under the Secure-PLT ABI, a writable segment cannot be executable and an executable segment cannot be writable. Therefore, the GOT shall be nonexecutable. A program may calculate the address of the GOT by using the position independent code shown in Table 4-9.

Figure 5-2. Loading the Address of _GLOBAL_OFFSET_TABLE_ Under the Secure-PLT ABI

```
    bcl   20,31,1f
1:  mflr  30
    addis 30,30,(got-1b)@ha
    addi  30,30,(got-1b)@l
```

In Figure 5-2 the computed address of the _GLOBAL_OFFSET_TABLE_ symbol is placed in r30. Using r30 to hold the address of the _GLOBAL_OFFSET_TABLE_ symbol is the current convention used by the compiler and link editor and is only required for nonleaf routines which use the PIC addressing model. Leaf routines or code not using the PIC addressing model may use any available unreserved general-purpose register to hold the address of the _GLOBAL_OFFSET_TABLE_ symbol. See Section 5.2.5.2 for more information on this convention.

Under the Secure-PLT ABI three words in the Global Offset Table are reserved:

- `_GLOBAL_OFFSET_TABLE_[0]`  
  Initialized to the link-time address of the .dynamic section by the link editor.

- `_GLOBAL_OFFSET_TABLE_[1]`  
  Initialized to the address of `dl_runtime_resolve` by the dynamic linker.

- `_GLOBAL_OFFSET_TABLE_[2]`  
  Reserved for use by the dynamic linker. This entry holds a parameter of `dl_runtime_resolve`.

**ATR-SECURE-PLT**

**ATR-BSS-PLT**

5.2.3.2. Global Offset Table Under The BSS-PLT ABI

Under the BSS-PLT ABI four words in the Global Offset Table are reserved:

- `_GLOBAL_OFFSET_TABLE_[-1]`  
  Holds the `blrl` instruction.

- `_GLOBAL_OFFSET_TABLE_[0]`  
  Initialized by the link editor to the address of the .dynamic section. The dynamic linker uses this address (by referencing the symbol _DYNAMIC, which holds the address of the .dynamic section) to determine the run-time load address of shared objects and of the dynamic linker itself.

- `_GLOBAL_OFFSET_TABLE_[1]`  
  Reserved for future use.

- `_GLOBAL_OFFSET_TABLE_[2]`  
  Reserved for future use.

The program text in *Figure 5-3* may be used to load the address of the `_GLOBAL_OFFSET_TABLE_` symbol into a general purpose register (in this case r31).

Figure 5-3. Loading the Address of _GLOBAL_OFFSET_TABLE_ Under the BSS-PLT ABI

```
bl _GLOBAL_OFFSET_TABLE_@local-4
mflr r31
```

5.2.4. Function Addresses

The following requirements concern function addresses.

**When referencing a function address:**

Intraobject executable or shared object function address references may be resolved by the dynamic linker to the absolute virtual address of the symbol.

**ATR-SECURE-PLT**

Function address references from within the executable file to a function defined in a shared object file are resolved by the link editor to the `.text` section address of the Secure-PLT call stub for that function within the executable file.

**ATR-BSS-PLT**

Function address references from within the executable file to a function defined in a shared object file are resolved by the link editor to the address of the PLT entry for that function within the executable file.

**When comparing function addresses:**

The address of a function shall compare to the same value in executables and shared objects.

For intraobject comparisons of function addresses within the executable or shared object the link editor may directly compare the absolute virtual addresses.

**ATR-SECURE-PLT**

For a function address comparison where an executable references a function defined in a shared object, the link editor will place the address of a `.text` section Secure-PLT call stub for that function in the corresponding dynamic symbol table entry’s `st_value` field (see Section 4.6.1).

**ATR-BSS-PLT**

For a function address comparison where an executable references a function defined in a shared object, the link editor will place the address of the PLT entry for that function in the function’s dynamic symbol table entry’s `st_value` field (see Section 4.6.1).

When the dynamic linker loads shared objects associated with an executable and resolves any outstanding relocations into absolute addresses, it will search the dynamic symbol table of the executable for each symbol that needs to be resolved.

If it finds the symbol and the `st_value` of the symbol table entry is nonzero, it shall use the address indicated in the `st_value` entry as the symbol’s address. If the dynamic linker does not find the symbol in the executable’s dynamic symbol table, or the entry’s `st_value` member is zero, the dynamic linker may consider the symbol as undefined in the executable file.
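The lookup rule above can be sketched as follows. The dictionary-based symbol table and the function name are hypothetical simplifications. A real dynamic linker walks ELF hash tables and symbol records instead.

```python
def address_from_executable(dynsym, name):
    """Return the executable's canonical address for name, or None.

    dynsym maps symbol names to records with an 'st_value' field
    (a simplified stand-in for the executable's dynamic symbol table).
    """
    entry = dynsym.get(name)
    if entry is None or entry["st_value"] == 0:
        # Treated as undefined in the executable; the search continues
        # in the shared objects.
        return None
    # Nonzero st_value: the stub or PLT entry address is the function's
    # address for every comparison in the process.
    return entry["st_value"]
```

Because every object resolves the symbol to the same `st_value`, a function pointer taken in the executable compares equal to one taken in a shared object.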

### 5.2.5. Procedure Linkage Table

When the link editor builds an executable file or shared object file it doesn’t know the absolute address of undefined function calls; therefore, it can’t generate code to directly transfer execution to another shared object or executable. For each execution transfer to an undefined function call in the file image the link editor places a relocation against an entry in the *Procedure Linkage Table* (PLT) of the executable or shared object that corresponds to that function call.

Additionally, for all nonstatic functions with standard (nonhidden) visibility in a shared object the link editor will invoke the function through the PLT, even if the shared object defines the function. The same is not true for executables.

The link editor knows the number of functions invoked via the PLT and it reserves space for an appropriately sized `.plt` section.

A unique PLT shall be constructed for the executable and each dependent shared object in the Data segment of the process image at object load time by the dynamic linker using the information about the `.plt` section stored in the file image. The individual PLT entries are populated by the dynamic linker using one of the following binding methods. Execution can then be redirected to a dependent shared object or executable.

**Lazy Binding**

The lazy binding method is the default. It delays the resolution of a PLT entry to an absolute address until the function call is made the first time. The benefit of this method is that the application doesn’t pay the resolution cost until the first time it needs to call the function, if at all.

**Immediate Binding**

The immediate binding method will resolve the absolute addresses of all PLT entries in the executable and dependent shared objects at load time, prior to passing execution control to the application. The environment variable `LD_BIND_NOW` may be set to a nonnull value to signal the dynamic linker that immediate binding is desired at load time, before control is given to the application.

For some performance sensitive situations it may be better to pay the resolution cost to populate the PLT entries upfront rather than during execution.
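The trade-off between the two binding methods can be modeled with a toy PLT. This is a conceptual sketch only: the `ToyPlt` class, the `lookup` callable, and the `bind_now_requested` helper are hypothetical and do not describe the dynamic linker's actual data structures.

```python
def bind_now_requested(environ):
    """True when LD_BIND_NOW is set to a non-empty value."""
    return bool(environ.get("LD_BIND_NOW"))

class ToyPlt:
    def __init__(self, symbols, lookup, bind_now=False):
        self.lookup = lookup                  # name -> absolute address
        self.slots = {name: None for name in symbols}
        self.resolutions = 0                  # number of resolver runs
        if bind_now:                          # immediate binding
            for name in symbols:
                self._resolve(name)

    def _resolve(self, name):
        self.slots[name] = self.lookup(name)
        self.resolutions += 1

    def call_target(self, name):
        """Return the address a call through the PLT would reach."""
        if self.slots[name] is None:          # lazy: resolve on first call
            self._resolve(name)
        return self.slots[name]
```

With lazy binding, a symbol that is never called is never resolved. With immediate binding, all resolution cost is paid before the application starts.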

5.2.5.1. BSS Procedure Linkage Table

Under the BSS-PLT ABI, PLT entries hold executable stubs which transfer program control from the executable or shared object to the requested function once the absolute address of the function has been calculated by the dynamic linker.

The PLT is created in the .plt section of the Data segment at load time by the dynamic linker. It is composed of the following parts:

- The first 18 words (72 bytes) are reserved for the dynamic linker. This space may be used for trampoline code that transfers execution to the runtime resolver in order to resolve PLT relocations into absolute addresses.
- For PLT entries 1 through 8192 the link editor reserves two words.
- For PLT entries 8193 through n the link editor reserves four words.
- The link editor reserves an additional word for each entry in the PLT following the actual entries.
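
The layout rules above determine the size of the .plt section for a given number of entries. The following sketch adds up the four parts. The function and constant names are illustrative only.

```python
RESERVED_WORDS = 18      # dynamic linker trampoline area
SHORT_ENTRY_LIMIT = 8192  # entries 1..8192 use two words each
WORD_BYTES = 4

def bss_plt_size_bytes(n):
    """Size in bytes of a BSS-PLT .plt section holding n entries."""
    short_entries = min(n, SHORT_ENTRY_LIMIT)
    long_entries = max(n - SHORT_ENTRY_LIMIT, 0)
    words = (RESERVED_WORDS
             + 2 * short_entries   # two-word entries
             + 4 * long_entries    # four-word entries
             + n)                  # one additional data word per entry
    return words * WORD_BYTES
```

For example, a section with a single entry occupies 84 bytes: 72 reserved bytes, 8 bytes for the entry, and 4 bytes for its data word.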

Figure 5-4 shows a possible conforming example implementation of a .plt section after an executable or shared object is loaded but before outstanding PLT entry relocations are resolved. This example uses a trampoline to branch to the dynamic linker’s runtime resolver for resolving outstanding PLT entries. This example is for demonstration purposes only since the exact method is not mandated by the ABI.

**Figure 5-4. Example BSS-PLT .plt Section Implementation**

```
.plt
  # Use when the plt entry target address exceeds +/- 32MB.
  # Convert the index into the .plt_datawords array held in
  # r11 into an actual address.
  .plt_farcall:  addis  r11,r11,.plt_datawords@ha
                lwz    r11,.plt_datawords@l(r11)
                mtctr  r11
                bctr
                nop
                nop

  # Subtract .plt_datawords for long entries
  .plt_longbranch: addis  r11,r11,-.plt_datawords@ha
                   addi  r11,r11,-.plt_datawords@l

  # Multiply index of the entry in r11 by 3
  .plt_trampoline: rlwinm r12,r11,1,0,30
                  # Add it to the index in r11 which will then hold the
                  # relocation offset of the corresponding entry in the
                  # relocation table.
                  add   r11,r12,r11
                  # Load the address of dl_runtime_resolve into r12
                  li    r12,dl_runtime_resolve@l
                  addis r12,r12,dl_runtime_resolve@ha
                  mtctr r12
                  # Get the address of the dynamic linker’s link map in
                  # order to later locate the symbol table for the object
                  li    r12,link_map@l
                  addis r12,r12,link_map@ha
                  # Pass execution to the runtime resolver code.
                  bctr
                  nop
                  nop

  # Each entry in .plt_n loads the index of the
  # entry into the PLT entry list into r11
  # .plt_1
                  li    r11,4×0
                  b     .plt_trampoline
  ...
  # .plt_i
                  li    r11,4×i
                  b     .plt_trampoline
  ...
  # Entries 8193 - n use every other slot due to
  # the extra instructions required for branching.
  # .plt_8193
                  lis   r11,8193×4+.plt_datawords@ha
                  lwzu  r12,8193×4+.plt_datawords@l(r11)
                  b     .plt_longbranch
                  bctr
  ...
  # .plt_n
                  lis   r11,n×4+.plt_datawords@ha
                  lwzu  r12,n×4+.plt_datawords@l(r11)
                  b     .plt_longbranch
                  bctr

  # .plt_datawords1
  .plt_datawords: nop
  ...
  # .plt_datawordsi
                  nop
  ...
  # .plt_datawords8193
                  nop
  ...
  # .plt_datawordsn
                  nop
```

The addresses of relocation entries 1 through 8192 are close enough to the address of the runtime resolver trampoline .plt_trampoline to use a relative branch. Relocations 8193 through n must use additional instructions to reach the trampoline code. As a result, PLT entries 8193 through n consist of four words rather than two. These entries branch to .plt_longbranch, which cascades into the trampoline code.

Note: There are exactly 18 instructions between .plt and the first PLT entry indicated by .plt_1. These 18 instructions (including nop instructions) correspond with the space reserved at the head of the .plt section for the dynamic linker trampolines. Following the .plt_n entry there are exactly n word entries in .plt_datawords.

Note: In the case where the address of the runtime resolver is too far away from the .plt_trampoline to use a relative branch, the trampoline code may need to perform additional instructions to pass control to the resolver. This is not shown in Figure 5-4.

When the instructions in a PLT entry are executed for the first time they pass execution to the dynamic linker’s runtime resolver code. The resolver will attempt to find the absolute virtual address of the function associated with the PLT slot and populate the entry with the address.

The DT_JMPREL entry of the _DYNAMIC array holds the address of the relocation table of the shared object or executable. Since PLT entries don’t have symbol names attached to them, the dynamic linker must find the symbol name. There is a one-to-one correspondence between PLT entries and relocation entries, and the dynamic linker uses an offset into the relocation table (held in r11 in Figure 5-4), corresponding to the PLT entry, to find the relocation entry.
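The trampoline arithmetic in Figure 5-4 can be traced in a few lines. The sketch below models the `rlwinm` and `add` instructions and assumes 12-byte relocation entries (three 32-bit words each) and the index convention used by the `li r11,4×i` instruction in the figure. The Python function names are illustrative.

```python
RELA_ENTRY_SIZE = 12  # assumed: r_offset, r_info, r_addend, 4 bytes each
MASK32 = 0xFFFFFFFF

def rlwinm(value, shift, mb, me):
    """Rotate a 32-bit word left, then AND with mask bits mb..me
    (bit 0 is the most significant bit; requires mb <= me)."""
    value &= MASK32
    rotated = ((value << shift) | (value >> (32 - shift))) & MASK32
    mask = 0
    for bit in range(mb, me + 1):
        mask |= 1 << (31 - bit)
    return rotated & mask

def trampoline_reloc_offset(i):
    """Byte offset into the relocation table computed by .plt_trampoline."""
    r11 = 4 * i                    # li     r11,4×i
    r12 = rlwinm(r11, 1, 0, 30)    # rlwinm r12,r11,1,0,30  (r12 = 2 * r11)
    r11 = (r12 + r11) & MASK32     # add    r11,r12,r11     (r11 = 3 * r11)
    return r11
```

Multiplying the word-scaled value by three turns `4×i` into `12×i`, which is the byte offset of relocation entry `i` under the stated entry size.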

The relocation table contains R_PPC_JMP_SLOT relocations. Each of these relocations contains an offset to the corresponding PLT entry from the start of the shared object or executable, followed by the index into the dynamic symbol table for the symbol. The dynamic linker uses this symbol table entry to look up the name of the symbol in a dependent shared library or executable.

After the dynamic linker has resolved the absolute address of the function corresponding to a PLT entry, subsequent execution of the PLT entry will pass control to the target function, either directly or indirectly through the .plt_farcall trampoline.

Figure 5-5 shows an example of how the PLT entries for functions name1 (corresponding to PLT slot 1), name2 (corresponding to PLT slot 2), name8193 (corresponding to PLT slot 8193), and name8194 have been resolved by the dynamic linker after executing the runtime resolver (where [stale] is a comment which indicates that these memory locations in the .plt retained their content after the resolver has run but are unreachable for execution).

Figure 5-5. Example BSS-PLT Entries Post Resolution

```assembly
....
# .plt_1
b <absolute address of name1>
b .plt_trampoline # [stale]
# .plt_2
li r11,4×2
b .plt_farcall
...
# .plt_8193
b <absolute address of name8193>
lwzu r12,8193×4+.plt_datawords@l(r11) # [stale]
b .plt_longbranch # [stale]
bctr # [stale]
...
# .plt_8194
li r11,4×8194
b .plt_farcall
b .plt_longbranch # [stale]
bctr # [stale]
# .plt_datawords1
.plt_datawords: nop
...
# .plt_datawords2
```
The following list explains the resolution of four different PLT entry examples shown in Figure 5-5:

**name1**

The address of function *name1* is within ±32 MB of the address of the .plt_1 PLT entry, so a relative branch to the absolute virtual address of *name1* is possible.

**name2**

The address of function *name2* is beyond ±32 MB of the address of the .plt_2 PLT entry, so a relative branch from .plt_2 to *name2* is impossible. Instead, a relative branch is made to the .plt_farcall trampoline, which loads the absolute virtual address of *name2* into the count register from .plt_datawords2, where the dynamic linker placed it. The bctr instruction is then executed to pass control to *name2*.

**name8193**

The address of function *name8193* is within ±32 MB of the address of the .plt_8193 PLT entry; therefore, a relative branch to the absolute virtual address of *name8193* is possible.

**name8194**

The address of function *name8194* is beyond ±32 MB of the address of the .plt_8194 PLT entry, so a relative branch from .plt_8194 to *name8194* is impossible. Instead, a relative branch is made to the .plt_farcall trampoline, which loads the absolute virtual address of *name8194* into the count register from .plt_datawords8194, where the dynamic linker placed it. The bctr instruction is then executed to pass control to *name8194*.
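
The ±32 MB decision in the four cases above can be sketched in C as follows. This is an illustrative sketch only: the helper names `branch_reaches()` and `encode_relative_branch()` are hypothetical, and the sketch ignores address wrap-around at the 4 GB boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The I-form branch holds a 24-bit word displacement, which gives a signed
   26-bit byte displacement in the range -0x02000000 .. +0x01FFFFFC. */
static bool branch_reaches(uint32_t from, uint32_t to)
{
    int64_t disp = (int64_t)to - (int64_t)from;
    return disp >= -0x02000000LL && disp <= 0x01FFFFFCLL;
}

/* Encode "b target" for an instruction located at address `from`
   (primary opcode 18, AA = 0, LK = 0). */
static uint32_t encode_relative_branch(uint32_t from, uint32_t to)
{
    assert(branch_reaches(from, to));
    assert(((to - from) & 3u) == 0);        /* instructions are word aligned */
    return 0x48000000u | ((to - from) & 0x03FFFFFCu);
}
```

When `branch_reaches()` is false for a PLT entry and its target, the entry would instead branch to the .plt_farcall trampoline as described for name2 and name8194.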

---

### ATR-BSS-PLT

---

### ATR-SECURE-PLT

#### 5.2.5.2. Secure Procedure Linkage Table

Under the Secure-PLT ABI, PLT entries corresponding to function calls hold absolute addresses of those calls that are calculated by the dynamic linker. These PLT entries are nonexecutable and an executable fragment in the object .text section uses the absolute addresses in the PLT entries as the target for indirect function invocation.

*Procedure Linkage Table* (PLT) support under the Secure-PLT ABI is split into the following:

• The **.plt section**, residing in the Data Segment, contains an array of function addresses.

• **Call stubs**, residing in the .text section, use index relative addressing to load an absolute address of a function from a specific .plt slot.

• The **.glink**, residing in the .text section, is a symbol resolver stub.

The **.glink** and **call stubs** are generated by the link editor and placed in the .text section. The **call stubs** need not be adjacent to one another or unique, and they can be scattered throughout the text segment so that they can be reached with a branch and link instruction. The .plt section shall be allocated by the dynamic linker in the Data Segment.

The details of the **call stub** and **.glink** implementation are left to the link editor except for how the symbol resolver stub interfaces with the dynamic linker for lazy PLT resolution. Upon initialization by the dynamic linker, every .plt slot holds the address of the symbol resolver stub that is located in the .glink.

The symbol resolver stub shall call the `dl_runtime_resolve()` function specified by `_GLOBAL_OFFSET_TABLE_[1]` with r11 set to the PLT relocation offset, and r12 set to the value of `_GLOBAL_OFFSET_TABLE_[2]`.

The PIC **call stub** sequence requires that the compiler ensure that the register used to hold the `_GLOBAL_OFFSET_TABLE_` pointer is set before any calls are made from the PLT. The current convention between the compiler and link editor is that r30 be used for this purpose. This is a change from the BSS-PLT ABI, which only required GOT addressing to access static storage.

A possible implementation for PIC code follows, where \( n \) is the \( n \)th **call stub**.

If \((\text{plt} + (n - 1) \times 4 - \text{got})\) is less than 32 KB the following PIC call stub implementation may be used.

```assembly
lwz 11,(plt + (n - 1) × 4 - got)(30)
mtctr 11
bctr
```

Otherwise, the following PIC call stub implementation may be used for greater addressability.

```assembly
addis 11,30,(plt + (n - 1) × 4 - got)@ha
lwz 11,(plt + (n - 1) × 4 - got)@l(11)
mtctr 11
bctr
```
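
The choice between the two stub forms, and the values the `@ha` and `@l` operators contribute, can be sketched in C. This is a hedged illustration: the helper names are hypothetical, and the short form is assumed to apply whenever the offset fits the signed 16-bit displacement of a single `lwz`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* @l: the low 16 bits, which lwz sign-extends when used as a displacement. */
static uint16_t lo16(int32_t x)
{
    return (uint16_t)((uint32_t)x & 0xFFFFu);
}

/* @ha: the high 16 bits, adjusted so that
   (ha << 16) + sign_extend(lo) reproduces x. */
static uint16_t ha16(int32_t x)
{
    return (uint16_t)((((uint32_t)x + 0x8000u) >> 16) & 0xFFFFu);
}

/* Offset of the nth .plt slot (n counts from 1) relative to the GOT pointer. */
static int32_t plt_slot_offset(uint32_t plt, uint32_t got, uint32_t n)
{
    assert(n >= 1);
    return (int32_t)(plt + (n - 1u) * 4u - got);
}

/* True when one lwz with a 16-bit signed displacement can reach the slot. */
static bool fits_short_stub(int32_t off)
{
    return off >= -32768 && off <= 32767;
}
```

For an offset of 0x18000, for example, `ha16()` yields 2 and `lo16()` yields 0x8000, so the `addis` adds 0x20000 and the sign-extended `lwz` displacement subtracts 0x8000.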

For a PIC **.glink** the following implementation may be used.

```assembly
# A table of branches, one for each plt entry.
# The idea is that the plt call stub loads ctr (and r11) with these
# addresses, so (r11 - res_0) gives the plt index × 4.
res_0:  b PLTresolve
res_1:  b PLTresolve
.
# Some number of entries towards the end can be nops
res_n_m3: nop
res_n_m2: nop
res_n_m1:

PLTresolve:
addis 11,11,(1f-res_0)@ha
mflr 0
bcl 20,31,1f
1:  addi 11,11,(1b-res_0)@l
mflr 12
mtlr 0
sub 11,11,12  # r11 = index × 4
addis 12,12,(got+4-1b)@ha
lwz 0,(got+4-1b)@l(12)  # got[1] address of dl_runtime_resolve
lwz 12,(got+8-1b)@l(12)  # got[2] contains the map address
mtctr 0
add 0,11,11
add 11,0,11  # r11 = index × 12 = reloc offset.
bctr
```
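
The arithmetic that PLTresolve performs on r11 can be restated in C. This is a sketch under the assumption that every `res_N` entry is a single 4-byte instruction and that a relocation entry is 12 bytes; the function name and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GLINK_STUB_SIZE 4u   /* each res_N entry is one instruction */
#define RELA_ENTRY_SIZE 12u  /* size of one R_PPC_JMP_SLOT relocation entry */

/* Mirror of the register arithmetic: stub address -> relocation offset. */
static uint32_t reloc_offset_from_stub(uint32_t stub_addr, uint32_t res_0)
{
    assert(stub_addr >= res_0);
    assert((stub_addr - res_0) % GLINK_STUB_SIZE == 0);

    uint32_t r11 = stub_addr - res_0;  /* index * 4 */
    uint32_t r0  = r11 + r11;          /* add 0,11,11: index * 8 */
    r11 = r0 + r11;                    /* add 11,0,11: index * 12 */
    return r11;
}
```

Multiplying by three with two additions avoids a multiply instruction and yields the byte offset of the matching entry in the relocation table.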

For non-PIC code, r30 will not hold the GOT pointer; so the stubs must be different, as shown in the following implementation.

For a non-PIC call stub the following implementation may be used.

```assembly
lis 11,(plt+(i-1)×4)@ha
lwz 11,(plt+(i-1)×4)@l(11)
mtctr 11
bctr
```

For a non-PIC .glink the following implementation may be used.

```assembly
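res_0: b NonPIC_PLTresolve
res_1: b NonPIC_PLTresolve
.
res_n_m3: nop
res_n_m2: nop
res_n_m1:

NonPIC_PLTresolve:
lis 12,(got+4)@ha
addis 11,11,(-res_0)@ha
lwz 0,(got+4)@l(12)  # got[1] address of dl_runtime_resolve
addi 11,11,(-res_0)@l  # r11 = index × 4
mtctr 0
add 0,11,11
lwz 12,(got+8)@l(12)  # got[2] contains the map address
add 11,0,11  # r11 = index × 12 = reloc offset.
bctr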
```

The .plt will be a loaded section following the .got, consisting of an array of addresses. There will also be an array of R_PPC_JMP_SLOT relocations in .rela.plt, with a one-to-one correspondence between elements of each array. Each R_PPC_JMP_SLOT relocation will have r_offset pointing at the .plt word it relocates. To support lazy linking, the link editor will set each .plt word to point to the symbol resolver stub in .glink. On loading a shared library, the dynamic linker should relocate the contents of the .plt section by adding the load address to each word in .plt.
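
The load-time adjustment of the .plt words can be sketched in C. This is a minimal illustration with hypothetical names; `load_bias` stands for the load address that the dynamic linker adds to each link-time value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the load address to every .plt word so that each slot still points
   at its resolver stub in .glink after the library has been mapped. */
static void relocate_plt_words(uint32_t *plt, size_t count, uint32_t load_bias)
{
    assert(plt != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        plt[i] += load_bias;
}
```

After this step each slot holds a run-time address of a .glink entry, and lazy resolution later overwrites individual slots with the resolved function addresses.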

Note: As a security measure, the .got and the .plt may be protected as read-only after relocations are performed. This necessitates that any sections in the Data Segment that can be protected as read-only be grouped together, separate from those that remain read-write. This will affect section ordering in the segment as shown in Figure 4-2.

Note: This ABI does not require a fixed GOT register, or even one register used throughout a binary. Non-PIC code does not set the _GLOBAL_OFFSET_TABLE_ pointer and does not need to reserve a register for that purpose. Code under the PIC addressing model that accesses static storage or calls nonlocal functions will need a register to hold the _GLOBAL_OFFSET_TABLE_ pointer. However, leaf functions or functions that only call other functions which are static (@local) may use any general-purpose register within the constraints for the existing ABI.

PIC-code functions that call nonlocal functions will need to allocate a register to hold the _GLOBAL_OFFSET_TABLE_ pointer which is used by the PLT call stubs. This requires a protocol between the compiler (which generates the function prologue and sets the _GLOBAL_OFFSET_TABLE_ pointer) and the link editor (which generates the PLT call stubs, that use the pointer). Allowing an arbitrary register for the _GLOBAL_OFFSET_TABLE_ pointer will require additional relocations to allow the compiler to communicate which register it is using to the link editor.

Some code, such as that generated by using the large model PIC, does not have a single GOT section but rather implements multiple GOT sections, one per file in .got2. To support multiple GOT pointers, the addend on each R_PPC_PLTREL24 relocation will have the offset within .got2 used as the GOT pointer. The link editor might need to generate multiple PLT call stubs for a given destination.

To allow the dynamic linker to support both old and new shared libraries, a per-library flag that indicates the old or new PLT layout is required. The dynamic tag, DT_PPC_GOT, shall be set to the link-time address of _GLOBAL_OFFSET_TABLE_. This allows the dynamic linker to check at library load and PLT resolve time and perform the appropriate set-up and relocations.
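
A dynamic linker could test for the tag along the following lines. This is a sketch, not a prescribed implementation: `Dyn32` and `uses_secure_plt()` are hypothetical names, and the tag value 0x70000000 (DT_LOPROC + 0) is taken from common ELF headers rather than from this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 32-bit dynamic section entry. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;
} Dyn32;

#define DT_NULL_TAG    0
#define DT_PPC_GOT_TAG 0x70000000  /* assumed value: DT_LOPROC + 0 */

/* Scan a DT_NULL-terminated dynamic array. Returns true and stores the
   link-time GOT address when DT_PPC_GOT is present (new PLT layout). */
static bool uses_secure_plt(const Dyn32 *dyn, uint32_t *got_addr)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL_TAG; dyn++) {
        if (dyn->d_tag == DT_PPC_GOT_TAG) {
            if (got_addr != NULL)
                *got_addr = dyn->d_val;
            return true;
        }
    }
    return false;
}
```

A library without the tag would be treated as using the old BSS-PLT layout.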

Note: The Secure-PLT ABI enabled dynamic linker shall support BSS-PLT ABI libraries as long as the kernel allows the required memory protection states.

The link editor will detect the difference between BSS-PLT relocatable objects and Secure-PLT relocatable objects by looking at relocations. A relocatable object using the Secure-PLT ABI will always have R_PPC_REL16* relocations if it uses the GOT or (potentially) calls from the PLT. BSS-PLT ABI files will not have these R_PPC_REL16 relocations.

The link editor will accept a mix of Secure-PLT ABI and BSS-PLT ABI relocatable objects, but the existence of any BSS-PLT relocatable objects as input will force the resulting executable file or shared object file to use the BSS-PLT ABI.
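
The selection rule for mixed inputs can be sketched in C. The types and function names here are hypothetical, and the sketch simplifies by classifying any input without R_PPC_REL16* relocations as a BSS-PLT object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PLT_ABI_BSS, PLT_ABI_SECURE } PltAbi;

/* Hypothetical summary of one relocatable input object. */
typedef struct {
    bool has_rel16_relocs;  /* contains at least one R_PPC_REL16* relocation */
} InputObject;

static PltAbi classify_input(const InputObject *obj)
{
    return obj->has_rel16_relocs ? PLT_ABI_SECURE : PLT_ABI_BSS;
}

/* Any BSS-PLT input forces the whole output to use the BSS-PLT ABI. */
static PltAbi select_output_abi(const InputObject *objs, size_t count)
{
    assert(objs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (classify_input(&objs[i]) == PLT_ABI_BSS)
            return PLT_ABI_BSS;
    }
    return PLT_ABI_SECURE;
}
```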

**ATR-SECURE-PLT**

**ATR-TLS**
Chapter 6. Libraries

6.1. Library Requirements

This ABI doesn’t specify any additional interfaces for general-purpose libraries. However, certain processor-specific support routines are defined in order to ensure portability between ABI-conforming implementations.

Such processor-specific support definitions concern floating-point alignment, register save/restore routines, variable argument list layout, and a limited set of data definitions.

6.1.1. C Library Conformance with Generic ABI

6.1.1.1. Malloc Routine Return Pointer Alignment

The `malloc()` routine must always return a pointer with the alignment of the largest supported data type from the following list:

<table>
<thead>
<tr>
<th>Data Type</th>
<th>Alignment Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>ATR-LONG-DOUBLE-IBM</td>
<td>At least 16-byte (quadword) aligned, as the required pointer may be used for storing IBM AIX 128-bit Long Double data items that require 16-byte alignment.</td>
</tr>
<tr>
<td>ATR-DFP</td>
<td>At least 16-byte (quadword) aligned, as the required pointer may be used for storing <code>_Decimal128</code> data items that require 16-byte alignment.</td>
</tr>
</tbody>
</table>
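
A program can check the alignment guarantee with a few lines of C. This is an illustrative sketch; `is_quadword_aligned()` and `checked_malloc()` are hypothetical helper names, not interfaces defined by this ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define QUADWORD_ALIGNMENT 16u

/* True when the pointer is suitable for 16-byte aligned data items. */
static bool is_quadword_aligned(const void *p)
{
    return ((uintptr_t)p % QUADWORD_ALIGNMENT) == 0;
}

/* Wrapper that verifies the guarantee on every successful allocation. */
static void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    assert(p == NULL || is_quadword_aligned(p));
    return p;
}
```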

6.1.1.2. Library Handling of Limited-access Bits in Registers

Requirements for the handling of limited-access bits in certain registers by standard library functions are defined in Section 3.2.1.2.

6.1.2. Save and Restore Routines

All of the save and restore routines described in Section 3.3.4 are required. These routines use unusual calling conventions due to their special purpose.

6.1.2.1. Save and Restore Routine Suffixes

The following suffix extensions describe the function templates in Section 6.1.2.2.

_m (save and restore function variable)

The variable _m represents the first register to be saved. For example, to save registers 18 to 31 using 32-bit saves, one would call _save32gpr_18.

ATR-BSS-PLT

_g (save function qualifier)

GOT save functions are represented by the _g qualifier. These functions return to the caller of the save function by branching to the blrl instruction held at _GLOBAL_OFFSET_TABLE_-4.

ATR-SECURE-PLT

_g (save function qualifier)

GOT save functions use the _g qualifier. These functions are illegal to use with the Secure-PLT ABI since the Secure-PLT is not executable.

_x (restore function qualifier)

Exit restore functions are represented by the _x qualifier. These functions restore the specified registers and use the link-register value in the calling function’s LR-save area to return to the caller’s parent function after removing the caller’s stack frame.

_t (restore function qualifier)

Tail restore functions are represented by the _t qualifier. Given the following function call sequence where function3 is a tail-call:

```c
function1()
{
    function2();
    <further calls and code>
    return;
}

function2()
{
    _rest*_t();
    return function3();
}
```

The tail restore functions are called from function2 and prepare the register state in function2 for a tail call to function3 that is to return directly to function1. They restore the specified registers for function1 from function1’s stack frame and save the address of function1 from the LRSAVE word of function1’s stack frame into R0 before returning control to function2. Function2 then sets the LR to the address of function1 held in R0 and calls the tail function function3. Function3 will perform its duty and then return directly to function1 rather than function2.

6.1.2.2. Save and Restore Routine Templates

- _savegpr_m
  
  ATR-CLASSIC-FLOAT

- _savefpr_m
  
  ATR-VECTOR

- _savevr_m
  
  ATR-CLASSIC-FLOAT

- _restfpr_m
  
  ATR-CLASSIC-FLOAT

- _restfpr_m_x
  
  ATR-CLASSIC-FLOAT

- _restfpr_m_t
  
  ATR-CLASSIC-FLOAT

- _restvr_m
  
  ATR-VECTOR

- _restgpr_m
- _restgpr_m_x
- _restgpr_m_t

ATR-SPE
• _save32gpr_m

ATR-SPE
• _save64gpr_m

ATR-SPE
• _rest32gpr_m

ATR-SPE
• _rest64gpr_m

ATR-SPE
• _rest32gpr_m_x

ATR-SPE
• _rest64gpr_m_x

ATR-SPE
• _rest32gpr_m_t

6.1.3. Types Defined In Standard Header

The type va_list shall be defined as follows:

```c
typedef struct __va_list_tag {
   unsigned char gpr;
   unsigned char fpr;
   /* Two bytes padding. */
   char *overflow_arg_area;
   char *reg_save_area;
} va_list[1];
```

The names and types of the elements are not part of the ABI, but the `__va_list_tag` name is part of the ABI (since it affects C++ name mangling), and the structure must have the size, alignment and layout implied by this definition.

- The `gpr` element holds the index of the next general-purpose register saved in this area from which an argument would be retrieved with `va_arg()`, where `gpr == N` corresponds to `r(N+3)`. (If the argument is passed as DUAL_GP and `gpr` is odd, the next argument would be retrieved from `r(N+4)` and `r(N+5)` instead.) If `gpr` is greater than 7, no more arguments will be retrieved from general-purpose registers by `va_arg()`.

- The `fpr` element holds the index of the next floating-point register saved in this area from which an argument would be retrieved with `va_arg()`, where `fpr == N` corresponds to `f(N+1)`. If `fpr` is greater than 7, no more arguments will be retrieved from floating-point registers by `va_arg()`.

- If the argument being passed is `_Decimal128` and `fpr == N` where `N` is even, then `f(N+2)` and `f(N+3)` are referred to instead. If `fpr` is greater than 6, no more arguments will be retrieved from floating-point registers by `va_arg()`.

- `reg_save_area` points to an 8-byte-aligned area where registers `r3` to `r10` are saved, in that order.

  Addresses in the area pointed to by `reg_save_area` that correspond to registers used for passing named arguments, or to unused registers between those used for passing named arguments, need not correspond to allocated memory; those registers need not be saved in this area. `va_arg` shall only access those words required to load the argument of the type passed.

**ATR-SPE**

Only the low 32 bits of each register are saved in this area.

**ATR-CLASSIC-FLOAT**

Registers f1 to f8 immediately follow registers r3 to r10, if CR bit 6 was set when the variable-argument function was called.

- The `overflow_arg_area` element points to the word on the stack at the start of the next argument passed on the stack, or to a prior word that forms part of the padding required for the next argument to have the required alignment. `va_arg` shall only access those words required to load the argument of the type passed; if no arguments were passed on the stack, this area may not be allocated.
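
The way `va_arg()` might consume the `gpr` index and the two areas can be sketched in C. This is a simplified model, not the required implementation: pointers are represented as 32-bit byte offsets into a simulated word array so that the sketch is host independent, all names are hypothetical, and the 8-byte alignment of DUAL_GP arguments in the overflow area is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 32-bit va_list layout (12 bytes, pointers at offsets 4, 8). */
typedef struct {
    uint8_t  gpr;
    uint8_t  fpr;
    uint16_t pad;                /* two bytes of padding */
    uint32_t overflow_arg_area;  /* byte offset into simulated memory */
    uint32_t reg_save_area;      /* byte offset into simulated memory */
} VaListModel;

static_assert(sizeof(VaListModel) == 12, "size implied by the definition");
static_assert(offsetof(VaListModel, overflow_arg_area) == 4, "layout");
static_assert(offsetof(VaListModel, reg_save_area) == 8, "layout");

/* Fetch a one-word argument: r(gpr+3) if available, else the stack. */
static uint32_t next_word_arg(VaListModel *ap, const uint32_t *mem)
{
    if (ap->gpr < 8)
        return mem[ap->reg_save_area / 4 + ap->gpr++];
    uint32_t v = mem[ap->overflow_arg_area / 4];
    ap->overflow_arg_area += 4;
    return v;
}

/* Fetch a DUAL_GP argument (for example long long), high word first. */
static uint64_t next_dual_arg(VaListModel *ap, const uint32_t *mem)
{
    uint32_t hi, lo;
    if (ap->gpr & 1)
        ap->gpr++;               /* odd index: skip to the next register pair */
    if (ap->gpr < 8) {
        hi = mem[ap->reg_save_area / 4 + ap->gpr];
        lo = mem[ap->reg_save_area / 4 + ap->gpr + 1];
        ap->gpr += 2;
    } else {
        /* Assumed: DUAL_GP values are 8-byte aligned in the overflow area. */
        ap->overflow_arg_area = (ap->overflow_arg_area + 7u) & ~7u;
        hi = mem[ap->overflow_arg_area / 4];
        lo = mem[ap->overflow_arg_area / 4 + 1];
        ap->overflow_arg_area += 8;
    }
    return ((uint64_t)hi << 32) | lo;
}
```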

The following integer types are defined in headers required to be provided by freestanding implementations, or have their limits defined in such headers, and shall have the following definitions.

- `typedef int ptrdiff_t;`  
- `typedef unsigned int size_t;`  
- `typedef long wchar_t;`  
- `typedef int sig_atomic_t;`  
- `typedef unsigned int wint_t;`  
- `typedef signed char int8_t;`  
- `typedef short int16_t;`  
- `typedef int int32_t;`  
- `typedef long long int64_t;`  
- `typedef unsigned char uint8_t;`  
- `typedef unsigned short uint16_t;`  
- `typedef unsigned int uint32_t;`  
- `typedef unsigned long long uint64_t;`  
- `typedef signed char int_least8_t;`  
- `typedef short int_least16_t;`  
- `typedef int int_least32_t;`
- `typedef long long int_least64_t;`
- `typedef unsigned char uint_least8_t;`
- `typedef unsigned short uint_least16_t;`
- `typedef unsigned int uint_least32_t;`
- `typedef unsigned long long uint_least64_t;`
- `typedef signed char int_fast8_t;`
- `typedef int int_fast16_t;`
- `typedef int int_fast32_t;`
- `typedef long long int_fast64_t;`
- `typedef unsigned char uint_fast8_t;`
- `typedef unsigned int uint_fast16_t;`
- `typedef unsigned int uint_fast32_t;`
- `typedef unsigned long long uint_fast64_t;`
- `typedef int intptr_t;`
- `typedef unsigned int uintptr_t;`
- `typedef long long intmax_t;`
- `typedef unsigned long long uintmax_t;`
Appendix A. Taxonomy

The following list describes the archetypal ABI attributes used to conditionally define elements of the ABI. The relationship of these attributes is described in the taxonomy diagram in Figure A-1. A combination of these attributes is used to generate the individual Linux and Embedded ABI documents. These combinations are described in Appendix B. Each attribute description indicates whether it is an ABI software feature or an attribute that is tied to a specific Power ISA category.

32-bit PowerPC Archetypal ABI Attributes

ATR-BSS-PLT

(ABI Software Feature)

The text under this attribute defines the BSS Procedure Linkage Table ABI, which has a writable and executable PLT. ATR-BSS-PLT is mutually exclusive with ATR-SECURE-PLT.

ATR-CLASSIC-FLOAT

(Power ISA Category: Floating-Point)

The text under this attribute describes the classic Power Architecture floating-point ABI where there are 64-bit floating-point registers and an instruction set that accompanies them. ATR-CLASSIC-FLOAT is mutually exclusive with ATR-SOFT-FLOAT.

ATR-PASS-COMPLEX-IN-GPRS

(ABI Software Feature)

The text under this attribute describes a method for passing complex data types in GPRs. ATR-PASS-COMPLEX-IN-GPRS is mutually exclusive and incompatible with ATR-PASS-COMPLEX-AS-STRUCT. ATR-PASS-COMPLEX-IN-GPRS is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-PASS-COMPLEX-AS-STRUCT

(ABI Software Feature)

The text under this attribute describes a method for passing complex data types as structures. ATR-PASS-COMPLEX-AS-STRUCT is mutually exclusive and incompatible with ATR-PASS-COMPLEX-IN-GPRS. ATR-PASS-COMPLEX-AS-STRUCT is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-CXX

(ABI Software Feature)

The text under this attribute describes C++ exception support as it impacts this ABI.

ATR-DFP

(Power ISA Category: Decimal Floating-Point)

The text under this attribute describes the Decimal Floating Point ABI as it relates to decimal floating-point registers, alignment, parameter passing, etc. This was introduced in Power ISA 2.05. ATR-DFP is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-EABI

(Power ISA Category: Embedded)
This attribute describes elements that apply to the Embedded ABI as a whole.

ATR-EABI-EXTENDED

(ABI Software Feature)
This attribute describes elements that apply to an implementation of the Embedded ABI with extended conformance such as support for dynamic linking, the GOT, PLT, full relocation support, etc.

ATR-LINUX

(Power ISA Category: Server)
This attribute describes elements that apply to the Linux ABI as a whole.

ATR-LONG-DOUBLE-IBM

(ABI Software Feature)
The text under this attribute describes usage of the AIX 128-bit Long Double format.
ATR-LONG-DOUBLE-IBM is mutually exclusive with ATR-LONG-DOUBLE-IS-DOUBLE.
ATR-LONG-DOUBLE-IBM is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-LONG-DOUBLE-IS-DOUBLE

(ABI Software Feature)
The text under this attribute describes long double ABI when long double is treated as double.
ATR-LONG-DOUBLE-IS-DOUBLE is mutually exclusive with ATR-LONG-DOUBLE-IBM.
ATR-LONG-DOUBLE-IS-DOUBLE is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-SECURE-PLT

(ABI Software Feature)
The text under this attribute describes the Secure Procedure Linkage Table ABI, which has a readable and writable, but nonexecutable PLT. ATR-SECURE-PLT is mutually exclusive with ATR-BSS-PLT.

ATR-SOFT-FLOAT

(ABI Software Feature)
The text under this attribute describes a software emulated 64-bit (double) floating-point ABI which also describes the conventions for Embedded Floating Point in 64-bit GPRs such as SPE-Float.
ATR-SOFT-FLOAT is mutually exclusive with ATR-CLASSIC-FLOAT.

ATR-SPE

(Power ISA Category: SPE)
The text under this attribute describes the Signal Processing Engine ABI for the SPE facility that was introduced in Power ISA v2.03. It is a SIMD instruction set using two-element short vectors within 64-bit GPRs. ATR-SPE is mutually exclusive with ATR-VECTOR. ATR-SPE includes SPE-Float, which leverages ATR-SOFT-FLOAT. Therefore ATR-SPE is predicated on ATR-SOFT-FLOAT and mutually exclusive with ATR-CLASSIC-FLOAT.

ATR-TLS

(ABI Software Feature)

The text under this attribute describes the Thread Local Storage ABI. At the time of this writing ATR-TLS is mutually exclusive with ATR-EABI since ATR-EABI uses the thread local storage register for the SDATA2 pointer.

ATR-VECTOR

(Power ISA Category: Vector)

The text under this attribute describes the AltiVec and VMX float and integer SIMD instruction set ABI. ATR-VECTOR is mutually exclusive with ATR-SPE. ATR-VECTOR is predicated on ATR-CLASSIC-FLOAT or ATR-SOFT-FLOAT.

ATR-VLE

(Power ISA Category: VLE)

The text under this attribute describes the Variable Length Encoding environment as introduced in Power ISA 2.03.
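
The mutual-exclusion and predication rules stated in the attribute descriptions above can be collected into a small C check. This is a sketch only: the flag names and `attributes_consistent()` are hypothetical, and the function checks the attribute set used by a single application rather than everything an implementation may support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    ATR_BSS_PLT            = 1u << 0,
    ATR_SECURE_PLT         = 1u << 1,
    ATR_CLASSIC_FLOAT      = 1u << 2,
    ATR_SOFT_FLOAT         = 1u << 3,
    ATR_SPE                = 1u << 4,
    ATR_VECTOR             = 1u << 5,
    ATR_TLS                = 1u << 6,
    ATR_EABI               = 1u << 7,
    ATR_LONG_DOUBLE_IBM    = 1u << 8,
    ATR_LONG_DOUBLE_IS_DBL = 1u << 9,
    ATR_COMPLEX_IN_GPRS    = 1u << 10,
    ATR_COMPLEX_AS_STRUCT  = 1u << 11,
    ATR_DFP                = 1u << 12,
    ATR_ALL                = (1u << 13) - 1
};

static bool both(uint32_t set, uint32_t a, uint32_t b)
{
    return (set & a) != 0 && (set & b) != 0;
}

static bool attributes_consistent(uint32_t set)
{
    assert((set & ~(uint32_t)ATR_ALL) == 0);  /* reject unknown flags */

    /* Mutually exclusive pairs. */
    if (both(set, ATR_BSS_PLT, ATR_SECURE_PLT)) return false;
    if (both(set, ATR_CLASSIC_FLOAT, ATR_SOFT_FLOAT)) return false;
    if (both(set, ATR_COMPLEX_IN_GPRS, ATR_COMPLEX_AS_STRUCT)) return false;
    if (both(set, ATR_LONG_DOUBLE_IBM, ATR_LONG_DOUBLE_IS_DBL)) return false;
    if (both(set, ATR_SPE, ATR_VECTOR)) return false;
    if (both(set, ATR_SPE, ATR_CLASSIC_FLOAT)) return false;
    if (both(set, ATR_TLS, ATR_EABI)) return false;

    /* Attributes predicated on one of the floating-point attributes. */
    uint32_t needs_float = ATR_DFP | ATR_VECTOR | ATR_LONG_DOUBLE_IBM |
                           ATR_LONG_DOUBLE_IS_DBL | ATR_COMPLEX_IN_GPRS |
                           ATR_COMPLEX_AS_STRUCT;
    bool any_float = (set & (ATR_CLASSIC_FLOAT | ATR_SOFT_FLOAT)) != 0;
    if ((set & needs_float) != 0 && !any_float) return false;

    /* ATR-SPE is predicated on ATR-SOFT-FLOAT. */
    if ((set & ATR_SPE) != 0 && (set & ATR_SOFT_FLOAT) == 0) return false;

    return true;
}
```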

The following taxonomy (described in EBNF) describes the relationship between the aforementioned ABI attributes.

Figure A-1. Taxonomy

\[
\begin{array}{lcll}
\text{ABI} & \rightarrow & \text{CommonCore OperatingEnvironment ISA-Flavor} & \\
\text{CommonCore} & \rightarrow & \text{SYS-V-Without-Float} & \text{/* No attribute. Implicit. */} \\
\text{OperatingEnvironment} & \rightarrow & \text{Linux} & \text{atr = ATR-LINUX} \\
 & \mid & \text{EABI} & \text{atr = ATR-EABI} \\
\text{ISA-Flavor} & \rightarrow & \text{SIMD Encoding Floating-Point} & \\
\text{SIMD} & \rightarrow & \text{Vector} & \text{atr = ATR-VECTOR} \\
 & \mid & \text{SPE} & \text{atr = ATR-SPE} \\
 & \mid & \text{/* Epsilon */} & \text{com = "/* No SIMD. */"} \\
\text{Encoding} & \rightarrow & \text{VLE} & \text{atr = ATR-VLE} \\
\text{Floating-Point} & \rightarrow & \text{Common-Float Long-Double FP-Decimal} & \\
\text{Common-Float} & \rightarrow & \text{Classic-Float-Common} & \text{atr = ATR-CLASSIC-FLOAT} \\
 & \mid & \text{Soft-Float-Common} & \text{atr = ATR-SOFT-FLOAT} \\
\text{Procedure-Linkage-Table} & \rightarrow & \text{BSS-PLT} & \text{atr = ATR-BSS-PLT} \\
 & \mid & \text{Secure-PLT} & \text{atr = ATR-SECURE-PLT} \\
\text{Thread-Local-Storage} & \rightarrow & \text{TLS} & \text{atr = ATR-TLS} \\
\text{Long-Double} & \rightarrow & \text{IBM} & \text{atr = ATR-LONG-DOUBLE-IBM} \\
 & \mid & \text{None} & \text{atr = ATR-LONG-DOUBLE-IS-DOUBLE} \\
\text{FP-Decimal} & \rightarrow & \text{/* Epsilon */} & \text{com = "/* No FP-Decimal */"} \\
 & \mid & \text{DFP} & \text{atr = ATR-DFP} \\
\text{Complex} & \rightarrow & \text{Pass Complex in GPRs} & \text{atr = ATR-PASS-COMPLEX-IN-GPRS} \\
 & \mid & \text{Pass Complex As Struct} & \text{atr = ATR-PASS-COMPLEX-AS-STRUCT} \\
\text{CXX} & \rightarrow & \text{C++ Exception Handling} & \text{atr = ATR-CXX} \\
\text{EABI-Extended} & \rightarrow & \text{/* Epsilon */} & \text{com = "/* No EABI Extended */"} \\
 & \mid & \text{EABI Extended Conformance} & \text{atr = ATR-EABI-EXTENDED}
\end{array}
\]
Appendix B. Attribute Inclusion and ABI Conformance

This appendix describes ABI attribute inclusion and conformance rules. It uses the attribute tags described in Appendix A.

B.1. ATR-LINUX Inclusion and Conformance

Linux ABI Attribute Inclusions:
• ATR-BSS-PLT
• ATR-CLASSIC-FLOAT
• ATR-CXX
• ATR-DFP
• ATR-LONG-DOUBLE-IBM
• ATR-LONG-DOUBLE-IS-DOUBLE
• ATR-SECURE-PLT
• ATR-SOFT-FLOAT
• ATR-SPE
• ATR-TLS
• ATR-VECTOR
• ATR-PASS-COMPLEX-IN-GPRS

Linux ABI Attribute Exclusions:
• ATR-PASS-COMPLEX-AS-STRUCT
• ATR-VLE
• ATR-EABI-EXTENDED

Linux ABI Conformance
• An implementation of the Linux ABI must implement at least one of the following: ATR-SOFT-FLOAT, ATR-CLASSIC-FLOAT.
• If an implementation supports 64-bit vector types on SPE processors, or uses the high parts of registers on such processors, it must implement ATR-SPE.
• An implementation of the Linux ABI must implement ATR-LONG-DOUBLE-IBM and may also implement ATR-LONG-DOUBLE-IS-DOUBLE. A conforming application only uses one or the other.
• An implementation that supports decimal floating point must implement ATR-DFP. Hardware support for DFP requires implementation of ATR-CLASSIC-FLOAT; otherwise ATR-SOFT-FLOAT can provide software emulation.
• An implementation must implement ATR-SECURE-PLT. ATR-BSS-PLT should be supported for binary compatibility with previous versions of this ABI.
• Availability of Vector data types is subject to conformance to a Power ISA category, where the categories Vector and Signal Processing Engine are mutually exclusive. A conforming application only uses ATR-VECTOR or ATR-SPE.

Note: An implementation of this ABI shall indicate explicitly which attributes are supported. Supporting attributes which are mutually exclusive is fine as long as only one is supported at a given time during application execution.
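
The mandatory parts of the Linux conformance rules can be sketched as a C predicate over the set of attributes an implementation declares. This is a rough illustration with hypothetical flag and function names; it covers only the "must" rules and the exclusions listed in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    IMPL_BSS_PLT           = 1u << 0,
    IMPL_SECURE_PLT        = 1u << 1,
    IMPL_CLASSIC_FLOAT     = 1u << 2,
    IMPL_SOFT_FLOAT        = 1u << 3,
    IMPL_LONG_DOUBLE_IBM   = 1u << 4,
    IMPL_DFP               = 1u << 5,
    IMPL_COMPLEX_AS_STRUCT = 1u << 6,
    IMPL_VLE               = 1u << 7,
    IMPL_EABI_EXTENDED     = 1u << 8,
    IMPL_ALL               = (1u << 9) - 1
};

/* True when the declared attribute set satisfies the mandatory rules. */
static bool linux_implementation_conforms(uint32_t impl)
{
    assert((impl & ~(uint32_t)IMPL_ALL) == 0);  /* reject unknown flags */

    /* Attributes excluded from the Linux ABI. */
    if (impl & (IMPL_COMPLEX_AS_STRUCT | IMPL_VLE | IMPL_EABI_EXTENDED))
        return false;
    /* At least one floating-point attribute is required. */
    if ((impl & (IMPL_CLASSIC_FLOAT | IMPL_SOFT_FLOAT)) == 0)
        return false;
    /* ATR-LONG-DOUBLE-IBM and ATR-SECURE-PLT are mandatory. */
    if ((impl & IMPL_LONG_DOUBLE_IBM) == 0)
        return false;
    if ((impl & IMPL_SECURE_PLT) == 0)
        return false;
    return true;
}
```

ATR-BSS-PLT is only recommended for compatibility, so its absence does not make the check fail.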

B.2. ATR-EABI Inclusion and Conformance

**EABI Attribute Inclusions**

- ATR-BSS-PLT
- ATR-CLASSIC-FLOAT
- ATR-EABI-EXTENDED
- ATR-PASS-COMPLEX-AS-STRUCT
- ATR-PASS-COMPLEX-IN-GPRS
- ATR-LONG-DOUBLE-IS-DOUBLE
- ATR-SOFT-FLOAT
- ATR-SPE
- ATR-VLE

**EABI Attribute Exclusions**

- ATR-CXX
- ATR-DFP
- ATR-LONG-DOUBLE-IBM
- ATR-SECURE-PLT
- ATR-TLS
- ATR-VECTOR

**EABI Conformance**

- The EABI does not support thread local storage (ATR-TLS) at this time.
- The EABI does not support ATR-SECURE-PLT at this time.
- The EABI does not support unwind information.
- An implementation of the EABI can implement ATR-PASS-COMPLEX-AS-STRUCT and/or ATR-PASS-COMPLEX-IN-GPRS, but a conforming application shall only use one or the other.
- Conformance with the EABI does not require implementation of ATR-EABI-EXTENDED, which describes implementation of extended conformance such as support for dynamic linking, the GOT, PLT, full relocation support, etc.

Note: An implementation of this ABI shall indicate explicitly which attributes are supported. Supporting attributes which are mutually exclusive is fine as long as only one is supported at a given time during application execution.
Appendix C. APUs and Power ISA Categories

This appendix discusses the relationship between Auxiliary Processing Units (APUs) and Power ISA categories.

APUs are a method used to extend the Power Architecture beyond the facilities described and ratified in the Power ISA. Since the adoption of the Power ISA many technologies that were historically presented as APUs have now been subsumed into the Power ISA as optional categories or phased into the base ISA.

Since this ABI is not predicated on a minimum Power ISA version, it continues to present information on APUs (see Section 4.10) that have been subsumed into the Power ISA. It is up to the implementation whether to follow the Power ISA or the APU designation, based upon compatibility requirements, and to specify APU information as necessary.

The following table identifies APUs and their relationship to the Power ISA.

Table C-1. APU Extensions and Corresponding Power ISA Categories

<table>
<thead>
<tr>
<th>APU Extension</th>
<th>APU Identifier</th>
<th>Power ISA Category</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>AltiVec</td>
<td>0x003f</td>
<td>V</td>
<td>Vector Facility</td>
</tr>
<tr>
<td>PMR</td>
<td>0x0041</td>
<td>E.pm E</td>
<td>Embedded.Performance Monitor</td>
</tr>
<tr>
<td>RFMCI</td>
<td>0x0042</td>
<td>E</td>
<td>Embedded, Return From Machine Check Interrupt instruction</td>
</tr>
<tr>
<td>CACHE_LOCK</td>
<td>0x0043</td>
<td>ECL</td>
<td>Embedded Cache Locking</td>
</tr>
<tr>
<td>SPE</td>
<td>0x0100</td>
<td>SP, SP.FV</td>
<td>Signal Processing Engine, SPE.Embedded Float Vector</td>
</tr>
<tr>
<td>E500 SFFP/EFS</td>
<td>0x0101</td>
<td>SP.fs, SP.fd</td>
<td>Embedded Float Scalar Single, Embedded Float Scalar Double</td>
</tr>
<tr>
<td>VLE</td>
<td>0x0104</td>
<td>VLE</td>
<td>Variable Length Encoding</td>
</tr>
<tr>
<td>ISEL</td>
<td>0x0040</td>
<td>Base</td>
<td>Power ISA Base (mandatory), Integer Select instruction</td>
</tr>
</tbody>
</table>
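
The identifiers in Table C-1 lend themselves to a simple lookup table. The following C sketch is illustrative only; the `ApuInfo` type and `find_apu()` function are hypothetical, and only the identifier and extension name columns are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t    id;    /* APU identifier from Table C-1 */
    const char *name;  /* APU extension name */
} ApuInfo;

static const ApuInfo apu_table[] = {
    {0x003f, "AltiVec"},
    {0x0040, "ISEL"},
    {0x0041, "PMR"},
    {0x0042, "RFMCI"},
    {0x0043, "CACHE_LOCK"},
    {0x0100, "SPE"},
    {0x0101, "E500 SFFP/EFS"},
    {0x0104, "VLE"},
};

/* Return the table entry for an identifier, or NULL if it is not listed. */
static const ApuInfo *find_apu(uint16_t id)
{
    size_t count = sizeof(apu_table) / sizeof(apu_table[0]);
    assert(count > 0);
    for (size_t i = 0; i < count; i++) {
        if (apu_table[i].id == id)
            return &apu_table[i];
    }
    return NULL;
}
```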

The following APUs remain unspecified by the Power ISA (as of version 2.05).

Table C-2. APUs

<table>
<thead>
<tr>
<th>APU Extension</th>
<th>APU Identifier</th>
</tr>
</thead>
<tbody>
<tr>
<td>e500 BRLOCK</td>
<td>0x0102</td>
</tr>
</tbody>
</table>