This document describes the Procedure Call Standard used by the Application Binary Interface (ABI) of the LoongArch Architecture.
Version | Description |
---|---|
20230519 |
initial version, derived from the original LoongArch ELF psABI document. |
20231103 |
revised the parameter passing rules of structures. |
20231219 |
added vector arguments passing rules to the base ABI. |
This document defines the constraints on the program contexts exchanged between the caller and called subroutines or a subroutine and the execution environment. The subroutines following these constraints can be compiled and assembled separately and work together in the same program. The terms "subroutine", "function" and "procedure" may be used interchangeably throughout this document.
That includes constraints on:
-
The initial program context created by the caller for the callee.
-
The program context when the callee finishes its execution and returns to the caller.
-
How subroutine arguments and return values should be encoded in these program contexts.
-
How certain global states may be accessed and preserved by all subroutines.
However, this document does not formally define how entities of standard programming languages other than ISO C should be represented in the machine’s program context, and these language bindings should be described separately if needed.
FP
Floating-point.
GPR
General-purpose register.
FPR
Floating-point register.
FPU
Floating-point unit, containing the floating-point registers.
GAR
General-purpose argument register, belonging to a fixed subset of GPRs.
FAR
Floating-point argument register, belonging to a fixed subset of FPRs.
GRLEN
The bit-width of a general-purpose register of the current ABI variant.
FRLEN
The bit-width of a floating-point register of the current ABI variant
All LoongArch machines have 32 general-purpose registers and optionally 32 floating-point registers. Some of these registers may be used for passing arguments and return values between the caller and callee subroutines.
The bit-width of both general-purpose registers and floating-point registers may be either 32- or 64-bit, depending on whether the machine implements the LA32 or LA64 instruction set, and whether or not do they have a single- or double-precision FPU.
Note
|
In the following text, we use the term "temporary register" for referring to caller-saved registers and "static registers" for callee-saved registers. |
Name | Alias | Meaning | Preserved across calls |
---|---|---|---|
|
|
Constant zero |
(Constant) |
|
|
Return address |
No |
|
|
Thread pointer |
(Non-allocatable) |
|
|
Stack pointer |
Yes |
|
|
Argument registers / return value registers |
No |
|
|
Argument registers |
No |
|
|
Temporary registers |
No |
|
Reserved |
(Non-allocatable) |
|
|
|
Frame pointer / Static register |
Yes |
|
|
Static registers |
Yes |
Name | Alias | Meaning | Preserved across calls |
---|---|---|---|
|
|
Argument registers / return value registers |
No |
|
|
Argument registers |
No |
|
|
Temporary registers |
No |
|
|
Static registers |
Yes |
The memory is byte-addressable for LoongArch machines, and the ordering of the bytes in machine-supported multi-byte data types is little-endian. That is, the least significant byte of a data object is at the lowest byte address the data object occupies in memory.
The least significant byte of a 32-bit GPR / 64-bit GPR / 32-bit FPR / 64-bit GPR
is defined as storing the lowest byte of the data loaded from the memory with one
ld.w
/ ld.d
/ fld.s
/ fld.d
instruction. This byte order is also respected
by other instructions that move typed data across registers such as movgr2fr.d
.
In this document, when referring to a data object (of any type) stored in a register, it is assumed that these objects begin with the least significant byte of the register with no lower-byte paddings.
Depending on the bit-width of the general-purpose registers and the floating-point registers, different ABI variants can be adopted to preserve arguments and return values in the registers as long as it is possible.
Name | Description |
---|---|
|
Uses 64-bit GARs and the stack for passing arguments and return values. Data model is LP64 for programming languages. |
|
Uses 64-bit GARs, 32-bit FARs and the stack for passing arguments and return values. Data model is LP64 for programming languages. |
|
Uses 64-bit GARs, 64-bit FARs and the stack for passing arguments and return values. Data model is LP64 for programming languages. |
|
Uses 32-bit GARs and the stack for passing arguments and return values. Data model is ILP32 for programming languages. |
|
Uses 32-bit GARs, 32-bit FARs and the stack for passing arguments and return values. Data model is ILP32 for programming languages. |
|
Uses 32-bit GARs, 64-bit FARs and the stack for passing arguments and return values. Data model is ILP32 for programming languages. |
Different ABI variants are not expected to be compatible and linking objects in these variants may result in linker errors or run-time failures.
This specification defines machine data types that represents ISO C’s scalar, aggregate (structure and array) and union data types, as well as their layout within the program context when passed as arguments and return values of procedures.
Class | Machine type | Size (bytes) | Natural alignment (bytes) | Note |
---|---|---|---|---|
Integral |
Unsigned byte |
1 |
1 |
Character |
Signed byte |
1 |
1 |
||
Unsigned half-word |
2 |
2 |
||
Signed half-word |
2 |
2 |
||
Unsigned word |
4 |
4 |
||
Signed word |
4 |
4 |
||
Unsigned double-word |
8 |
8 |
||
Signed double-word |
8 |
8 |
||
Pointer |
32-bit data pointer |
4 |
4 |
|
64-bit data pointer |
8 |
8 |
||
Floating Point |
Single precision (fp32) |
4 |
4 |
IEEE 754-2008 |
Double precision (fp64) |
8 |
8 |
||
Quad-precision (fp128) |
16 |
16 |
Note
|
In the following text, the term "integral object" or "integral type" also covers the pointers. |
When passed in registers as subroutine arguments or return values, the unsigned integral objects are zero-extended, and the signed integer data types are sign-extended if the containing register is larger in size.
One exception to the above rule is that in the LP64D ABI, unsigned words,
such as those representing unsigned int
in C,
are stored in general-purpose registers as proper sign extensions of
their 32-bit values.
The following conventional rules are respected:
-
Structures, arrays and unions assume the alignment of their most strictly aligned components (i.e. with the largest natural alignment).
-
The size of any object is always a multiple of its alignment. Tail paddings are applied to structures and unions if it is necessary to comply with this rule. The state of the padding bytes are not defined.
-
Each member within a structure or an array is consecutively assigned to the lowest available offset with the appropriate alignment, in the order of their definitions.
Structures and unions may be passed in registers as arguments or return values. The layout rules of their members within the registers are described in the following section.
Structures and unions may include bit-fields, which are integral values of a declared integral type with a specified bit-width. The specified bit-width of a bit-field may not be greater than the width of its declared type.
A bit-field must be contained in a block of memory that is appropriate to store its declared type, but it can share the same addressable byte with adjacent bitfields in the structure.
When determining the alignment of the structure or the union, only the member bitfields' declared integral types are considered, and their specified widths are irrelevant.
It is possible to define unnamed bit-fields in C. The declared type of these bit-fields do not affect the alignment of a structure or union.
A subroutine as described in this specification may have none or arbitrary number of arguments and one return value. Each argument or return value have exactly one of the machine data types.
The standard calling requirements apply only to functions exported to link-editors and dynamic loaders. Local functions that are not reachable from other compilation units may use other calling conventions.
Empty structure / union arguments and return values should be simply ignored by C compilers which support them as a non-standard extension.
The rationale of the LoongArch procedure calling convention is to pass arguments and return values in registers as long as it is possible, so that memory access and/or cache usage can be reduced to improve program performance.
The registers that can be used for passing arguments and returning values are the argument registers, which include:
-
GARs: 8 general-purpose registers
$a0
-$a7
, where$a0
and$a1
are also used for integral values. -
FARs: 8 floating-point registers
$fa0
-$fa7
, where$fa0
and$fa1
are also used for returning values.
An argument is passed using the stack only when no appropriate argument register is available.
Subroutines should ensure that the initial values of the general-purpose registers
$s0
- $s9
and floating-point registers $fs0
- $fs7
are preserved across
the call.
At the entry of a procedure call, the return address of the call site is stored
in $ra
. A branch jump to this address should be the last instruction executed
in the called procedure.
Each called subroutine in a program may have a stack frame on the run-time stack. A stack frame is a contiguous block of memory with the following layout:
Position | Content | Frame |
---|---|---|
incoming |
(optional padding) |
Previous |
… |
Current |
|
outgoing |
(optional padding) |
The stack frame is allocated by subtracting a positive value from the stack
pointer $sp
. Upon procedure entry, the stack pointer is required to be
divisible by 16, ensuring a 16-byte alignment of the frame.
The first argument object passed on the stack (which may be the argument itself or its on-stack portion) is located at offset 0 of the incoming stack pointer; the following argument objects are stored at the lowest subsequent addresses that meet their respective alignment requirements.
Procedures must not assume the persistence of on-stack data of which the addresses lie below the stack pointer.
When determining the layout of argument data, the arguments should be assigned to their locations in the program context sequentially, in the order they appear in the argument list.
The location of an argument passed by value may be either one of:
-
An argument register.
-
A pair of argument registers with adjacent numbers.
-
A GAR and an FAR.
-
A contiguous block of memory in the stack arguments region, with a constant offset from the caller’s outgoing
$sp
. -
A combination of 1. and 4.
The on-stack part of the structure and scalar arguments are aligned to the greater of the type alignment and GRLEN bits, except when this alignment is larger than the 16-byte stack alignment. In this case, the part of the argument should be 16-byte-aligned.
In a procedure call, GARs / FARs are generally only used for passing non-floating-point / floating-point argument data, respectively. However, the floating-point member of a structure or union argument, or a vector/floating-point argument wider than FRLEN may be passed in a GAR, specifically:
-
A quadruple-precision floating-point argument may be passed or returned in a pair of GARs if the GARs are 64-bit wide, otherwise it would be passed or returned entirely on the stack.
-
An 128-bit vector may be passed in a pair of GARs with adjacent numbers or the combination of a single GAR and a block of memory on the stack if the GARs are 64-bit wide, otherwise it will be passed or returned entirely on the stack.
Note
|
Currently, the following detailed description of parameter passing rules
is only guaranteed to cover the lp64d and lp64s variant, that is, GRLEN is
64 and FRLEN is 64 or 0 .
|
Note
|
In the following text, warg is used for denoting the size of the argument object in bits. And unless otherwise specified, "passed on the stack" implies "passed by value". |
There are two cases:
-
0 < warg ≤ GRLEN
-
The argument is passed in a single argument register, or on the stack if none is available.
-
An fp32 / fp64 argument is passed in an FAR if there is one available. Otherwise, it is passed in a GAR, or on the stack if none of the GARs are available. When passed in registers or on the stack, fp32 / fp64 arguments narrower than GRLEN bits are widened to GRLEN bits, with the upper bits undefined.
-
An integral argument is passed in a GAR if there is one available. Otherwise, it is passed on the stack. If the argument is narrower than the containing GAR, the general rules of integral extensions applies.
-
-
GRLEN < warg ≤ 2 × GRLEN
-
The argument is passed in a pair of GARs with adjacent numbers, with the lower-ordered GRLEN bits in the low-numbered register. If only one GAR is available, the lower-ordered GRLEN bits are passed in this register and the higher-ordered GRLEN bits are passed on the stack. If no GAR is available, the whole argument is passed on the stack.
-
Upon function calls and returns, a structure argument’s storage location is mainly determined by its size and the number of floating-point and/or integer members it contains. A structure argument can be passed in up to two registers if available.
The storage layout of a structure containing other structures or arrays is the same as its flattened counterpart, where the member structures are replaced by its individual members and member arrays of length n (n > 0) are broken down into n consecutive members of its element type.
Note
|
Empty structures or unions are zero-sized in C while they have the size of 1 byte in C++. |
For example, struct { struct { double d[1]; } a[2]; }
and
struct { double d0; double d1; }
should have the same storage layout when
passed as function parameters.
Structures without floating-point members
-
warg > 2 × GRLEN
-
The argument is passed by reference, i.e. replaced in the argument list with its memory address. If there is an available GAR, the address is passed in the GAR, otherwise it is passed on the stack.
-
-
GRLEN < warg ≤ 2 × GRLEN
-
The argument is passed in a pair of GARs with adjacent numbers, with the lower-ordered GRLEN bits in the low-numbered register. If only one GAR is available, the lower-ordered GRLEN bits are passed in this register and the higher-ordered GRLEN bits are passed on the stack. If no GAR is available, the whole argument is passed on the stack.
-
-
0 < warg ≤ GRLEN
-
The argument is passed in a GAR if there is one available with the members laid out as if they were stored in memory. Otherwise, it is passed on the stack.
-
-
warg = 0
-
Zero-sized structure arguments are ignored.
-
Structures with floating-point members
-
If the structure consists of one floating-pointer member within FRLEN bits wide, it is passed in an FAR if available.
-
If the structure consists of two floating-point members both within FRLEN bits wide, it is passed in two FARs if available.
-
If the structure consists of one integer member within GRLEN bits wide and one floating-point member within FRLEN bits wide, it is passed in a GAR and an FAR if available.
-
Additionally, if there are only zero-sized members including structures, arrays or bit-fields, or empty structure in C++, beside the structure members described in the above three rules, these additional members are simply ignored by the compiler, unless the considered additional member is a structure and has a nontrivial destructor or a copy constructor defined in C++.
Note
|
In the above case, non-zero-length arrays of empty structures in C++ are not ignored by the compiler as additional members in the considered structure. |
-
Otherwise, the structure is passed according to the same rule as structures without floating-point members which is described above.
Unions are only passed in the GARs or on the stack, depending on its size.
-
warg > 2 × GRLEN
-
The union is passed by reference and is replaced in the argument list with its memory address. If there is an available GAR, the reference is passed in the GAR, otherwise, the address is passed on the stack.
-
-
GRLEN < warg ≤ 2 × GRLEN
-
The argument is passed in a pair of available GARs, with the low-order bits in the lower-numbered GAR and the high-order bits in the higher-numbered GAR. If only one GAR is available, the low-order bits are in the GAR and the high-order bits are on the stack. The arguments are passed on the stack when no GAR is available.
-
-
0 < warg ≤ GRLEN
-
The argument is passed in a GAR, or on the stack if no GAR is available.
-
-
warg = 0
-
Zero-sized union arguments are ignored.
-
A complex floating-point number, or a structure containing just one complex fp32 / fp64 number, is passed as though it were a structure containing two fp32 / fp64 members.
-
128-bit vector argument
-
An 128-bit vector argument are passed with two GARs with adjacent numbers (if available), with the lower-ordered 64-bit passed in the lower-numbered GAR and the higher-ordered 64-bit passed in the higher-numbered GAR.
-
If only one GAR is available when allocating storage for this argument, the lower-ordered 64-bit goes into the GAR and the higher-ordered 64-bit are passed on the stack.
-
If no GAR is available, the vector argument is passed entirely on the stack.
-
-
256-bit vector argument
-
256-bit vector arguments are passed on the stack, either by reference if there is a GAR available for its address, or by value otherwise.
-
Vector members of structure arguments follow the same rules as above.
A variadic argument list can appear at the end of a procedure’s argument list, which contains argument objects whose number and types are not statically declared with the procedure itself.
A variadic argument’s location is also decided using its bit-width. If one of the variadic arguments is passed on the stack, all subsequent arguments should also be passed on the stack. The variadic arguments never occupy the FARs.
-
warg > 2 × GRLEN
-
The arguments are passed by reference and are replaced in the argument list with the address. If there is an available GAR, the reference is passed in the GAR, and passed on the stack if no GAR is available.
-
-
GRLEN < warg ≤ 2 × GRLEN
-
An argument object in the variadic argument list with 2 × GRLEN alignment and size (e.g. an fp128 object) is passed in a pair of adjacent available GARs of which the first register is even-numbered. If only one GAR is available, the argument is passed on the stack, and this GAR would not be used for passing subsequent argument objects.
-
For other types of argument objects, the variadic arguments are passed in a pair of GARs. If only one GAR is available, the low-order bits are in the GAR and the high-order bits are on the stack.
-
If no GAR is available, the argument object is passed on the stack by value.
-
-
0 < warg ≤ GRLEN
-
The variadic arguments are passed in a GAR, or on the stack by value if no GAR is available.
-
In general, $a0
and $a1
are used for returning non-floating-point values,
while $fa0
and $fa1
are used for returning floating-point values.
Values are returned in the same manner the first named argument
of the same type would be passed. If such an argument would have
been passed by reference, the caller should allocate memory for the
return value, and passes the address as an implicit first argument
that is stored in $a0
.
Note
|
For all base ABI types of LoongArch, the char data type in C is
signed by default.
|
lp64d
lp64f
lp64s
)
Scalar type | Machine type |
---|---|
|
Unsigned byte |
|
Unsigned / signed byte |
|
Unsigned / signed half-word |
|
Unsigned / signed word |
|
Unsigned / signed double-word |
|
Unsigned / signed double-word |
pointer types |
64-bit data pointer |
|
Single precision (IEEE754) |
|
Double precision (IEEE754) |
|
Quadruple precision (IEEE754) |
ilp32d
ilp32f
ilp32s
)
Scalar type | Machine type |
---|---|
|
Unsigned byte |
|
Unsigned / signed byte |
|
Unsigned / signed half-word |
|
Unsigned / signed word |
|
Unsigned / signed word |
|
Unsigned / signed double-word |
pointer types |
32-bit data pointer |
|
Single precision (IEEE754) |
|
Double precision (IEEE754) |
|
Quadruple precision (IEEE754) |