The Cross-ISAs Notebook
~ CuB3y0nd
# ARM

## Data processing

### Data Moving

Similar to amd64, the `mov` instruction can be used. However, literal values must be prefixed with the `#` symbol!

```asm showLineNumbers=false
mov x1, #0x1337
```

aarch64 registers are 64 bits in size, but a single `mov` instruction can, in general, only encode a 16-bit immediate value.

To move larger literal values, `mov` must be combined with one or more `movk` instructions.

`movk` writes a 16-bit value into the destination register at a _specific bit shift_ (0, 16, 32, or 48), retaining all other bits.

```asm showLineNumbers=false
mov x1, #0xbeef
movk x1, #0xdead, lsl #16
```

Results in `x1` containing the value `0xdeadbeef`.
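
The effect of `mov` followed by `movk` can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not an emulator: the helper names `mov_imm16` and `movk` are made up for illustration, and registers are represented as plain integers masked to 64 bits.

```python
MASK64 = (1 << 64) - 1

def mov_imm16(imm16):
    """Model `mov xN, #imm16`: the register holds imm16, upper bits are zero."""
    assert 0 <= imm16 <= 0xFFFF
    return imm16

def movk(reg, imm16, shift=0):
    """Model `movk xN, #imm16, lsl #shift`: replace one 16-bit field, keep the rest."""
    assert 0 <= imm16 <= 0xFFFF and shift in (0, 16, 32, 48)
    cleared = reg & ~(0xFFFF << shift) & MASK64  # zero only the target field
    return cleared | (imm16 << shift)

# Same sequence as the assembly above.
x1 = mov_imm16(0xBEEF)
x1 = movk(x1, 0xDEAD, 16)
print(hex(x1))  # 0xdeadbeef

# A full 64-bit constant needs one mov and three movk.
x2 = mov_imm16(0x7788)
x2 = movk(x2, 0x5566, 16)
x2 = movk(x2, 0x3344, 32)
x2 = movk(x2, 0x1122, 48)
print(hex(x2))  # 0x1122334455667788
```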

### Load / Store

Memory cannot be operated on directly in aarch64. Only registers can be operated on.

Values must be loaded from memory into a register with `ldr` and written back to memory with `str`.

For example, to increment a value located at memory address `0x1337`, the following instructions would be needed:

```asm showLineNumbers=false
mov x1, #0x1337
ldr x0, [x1]
add x0, x0, #1
str x0, [x1]
```

Memory can also be accessed at an offset from a base address. Example:

```asm showLineNumbers=false
mov x1, #0x4000
ldr x0, [x1, #8]
```

This loads the 8 bytes stored at `0x4008` into `x0`.
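
To make the load / modify / store pattern concrete, here is a small Python model of it. It is a sketch under stated assumptions: `memory` is a hypothetical flat, zero-filled address space, the helpers `ldr` and `str_` are illustrative names, values are stored little-endian (the usual aarch64 configuration), and the seeded values are made up.

```python
MASK64 = (1 << 64) - 1
memory = bytearray(0x5000)  # hypothetical flat address space, zero-filled

def ldr(base, offset=0):
    """Model `ldr xT, [xN, #offset]`: read 8 little-endian bytes."""
    addr = base + offset
    return int.from_bytes(memory[addr:addr + 8], "little")

def str_(value, base, offset=0):
    """Model `str xT, [xN, #offset]`: write 8 little-endian bytes."""
    addr = base + offset
    memory[addr:addr + 8] = (value & MASK64).to_bytes(8, "little")

# Seed memory so there is something to increment (made-up value).
str_(41, 0x1337)

# mov x1, #0x1337 / ldr x0, [x1] / add x0, x0, #1 / str x0, [x1]
x1 = 0x1337
x0 = ldr(x1)
x0 = (x0 + 1) & MASK64
str_(x0, x1)
print(ldr(0x1337))  # 42

# mov x1, #0x4000 / ldr x0, [x1, #8]
str_(0xDEADBEEF, 0x4000, 8)  # made-up value placed at 0x4008
x1 = 0x4000
x0 = ldr(x1, 8)
print(hex(x0))  # 0xdeadbeef
```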

Consecutive memory addresses can be loaded and stored in a single instruction as a pair:

```asm showLineNumbers=false
ldp x0, x1, [sp]
stp x0, x1, [sp]
```

The above is equivalent to the following instructions:

```asm showLineNumbers=false
ldr x0, [sp]
ldr x1, [sp, #8]
str x0, [sp]
str x1, [sp, #8]
```

### Stack

aarch64 does not have `push`/`pop` instructions to work with the stack. Instead, you must use `str` and `ldr` to store values on and retrieve values from the stack.

Fortunately, both `ldr` and `str` can update the base address register either before (pre-index) or after (post-index) the access.

This feature can be used to perform the same actions as `push` and `pop`:

Popping the stack would be of the form:

```asm showLineNumbers=false
ldr x1, [sp], #16
```

This loads the value located at the stack pointer into register `x1` and then adds 16 to the stack pointer.

Pushing to the stack would be of the form:

```asm showLineNumbers=false
str x1, [sp, #-16]!
```

This subtracts 16 from the stack pointer and then stores the value in `x1` at `sp`.

:::note
In aarch64, the stack pointer must be 16-byte aligned! Accessing memory through the stack pointer when it is not properly aligned will result in a fault!

There is different syntax for accessing memory at an offset, _pre-indexing_, and _post-indexing_. All of these forms are used extensively in aarch64.
:::
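
The three addressing forms differ only in when (and whether) the base register is updated. The Python sketch below models that difference for `sp`. The `Machine` class, its method names, and the stack top of `0x1000` are all hypothetical, chosen only to illustrate the push and pop sequences above.

```python
MASK64 = (1 << 64) - 1

class Machine:
    """Toy model: a zero-filled memory and a stack pointer (illustrative only)."""

    def __init__(self, stack_top=0x1000):
        self.mem = bytearray(stack_top)
        self.sp = stack_top

    def _read(self, addr):
        return int.from_bytes(self.mem[addr:addr + 8], "little")

    def _write(self, addr, value):
        self.mem[addr:addr + 8] = (value & MASK64).to_bytes(8, "little")

    def ldr_offset(self, offset):
        """ldr xT, [sp, #offset]: sp is left unchanged."""
        return self._read(self.sp + offset)

    def str_pre(self, value, offset):
        """str xT, [sp, #offset]!: update sp first, then store at the new sp."""
        self.sp += offset
        self._write(self.sp, value)

    def ldr_post(self, offset):
        """ldr xT, [sp], #offset: load from sp, then update sp."""
        value = self._read(self.sp)
        self.sp += offset
        return value

m = Machine()
m.str_pre(0x1111, -16)     # push
m.str_pre(0x2222, -16)     # push
peek = m.ldr_offset(16)    # read the older entry without moving sp
first = m.ldr_post(16)     # pop
second = m.ldr_post(16)    # pop
print(hex(peek), hex(first), hex(second), hex(m.sp))
# 0x1111 0x2222 0x1111 0x1000
```

Every update to `sp` in this model is a multiple of 16, which mirrors the alignment requirement from the note.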

## Arithmetic Instructions

Most arithmetic instructions take three operands, a destination and two sources (multiply-accumulate forms such as `madd` take a fourth):

```asm showLineNumbers=false
add x0, x1, x2
```

This is equivalent to `x0 = x1 + x2`.

```asm showLineNumbers=false
madd x0, x0, x1, x2
```

This is equivalent to `x0 = x2 + (x0 * x1)`.

Modulo in aarch64 cannot be done in a single instruction.

`r = a % b` is equivalent to:

```plaintext showLineNumbers=false
q = a / b
r = a - q * b
```

For example, calculate `x0 = x0 % x1`:

```asm showLineNumbers=false
sdiv x2, x0, x1
msub x0, x2, x1, x0
```
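
The `sdiv` / `msub` pair can be checked against a small Python model. This is a sketch rather than a full emulation: the function names are made up, registers are 64-bit integers, and `sdiv` is modelled as rounding toward zero with a result of 0 when dividing by zero. Note that Python's own `//` and `%` round toward negative infinity, so they are deliberately not used on signed values here.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def sdiv(n, m):
    """Model `sdiv xd, xn, xm`: signed divide, truncating toward zero."""
    a, b = to_signed(n), to_signed(m)
    if b == 0:
        return 0  # division by zero yields 0 instead of trapping
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK64

def msub(n, m, a):
    """Model `msub xd, xn, xm, xa`: xd = xa - xn * xm (mod 2**64)."""
    return (a - n * m) & MASK64

def smod(x0, x1):
    x2 = sdiv(x0, x1)        # sdiv x2, x0, x1
    return msub(x2, x1, x0)  # msub x0, x2, x1, x0

print(to_signed(smod(7, 3)))            # 1
print(to_signed(smod(-7 & MASK64, 3)))  # -1
```

The remainder takes the sign of the dividend, as with `%` in C.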

## Branch Instructions

Loops can be created using conditional branch instructions.

The branch instruction in aarch64 is `b`.

To branch conditionally, a dot suffix (ex: `.gt`) is appended, resulting in `b.gt`. This would be equivalent to `jg` in amd64.
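
Conditional branches test the N, Z, C and V flags set by an earlier flag-setting instruction such as `cmp` or `subs`. The sketch below models how `cmp` sets those flags and how the `gt`, `eq` and `ne` conditions read them. The helper names are hypothetical, and operands are treated as 64-bit register values.

```python
MASK64 = (1 << 64) - 1

def cmp_flags(a, b):
    """Model `cmp xa, xb` (a flag-setting subtract); returns (N, Z, C, V)."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    n = result >> 63                      # negative: sign bit of the result
    z = int(result == 0)                  # zero
    c = int(a >= b)                       # carry: no borrow when unsigned a >= b
    sign_a, sign_b = a >> 63, b >> 63
    v = int(sign_a != sign_b and n != sign_a)  # signed overflow
    return n, z, c, v

def cond_gt(flags):
    """b.gt is taken when Z == 0 and N == V (signed greater than)."""
    n, z, c, v = flags
    return z == 0 and n == v

def cond_eq(flags):
    """b.eq is taken when Z == 1."""
    return flags[1] == 1

def cond_ne(flags):
    """b.ne is taken when Z == 0."""
    return flags[1] == 0

print(cond_gt(cmp_flags(5, 3)), cond_gt(cmp_flags(3, 5)))  # True False
print(cond_gt(cmp_flags(-1 & MASK64, 1)))                  # False
```

The last line shows that `gt` is a signed comparison: the bit pattern of -1 is a very large unsigned number, yet the branch is not taken.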

## Functions

Function calls in aarch64 are done with the branch and link instruction `bl`.

The function's return value is stored in register `x0`.

The `bl` instruction:

- does a _PC relative_ jump to the specified location
- and stores the return address in the link register `lr` (aka `x30`)

It is the caller's responsibility to save the existing `lr` value (which `bl` overwrites), the frame pointer, and any needed values in `x0` - `x15`.

Registers `x16` - `x18` will be discussed later.

Registers `x19` - `x28` are callee saved.

The return address is held in a special link register `lr` (aka `x30`).

The frame pointer is held in a special frame register `fp` (aka `x29`).

Given the role of `x29` and `x30`, it is common to see a function prologue similar to:

```asm showLineNumbers=false
stp x29, x30, [sp, #-48]!
mov x29, sp
```

Here, the stack pointer is decremented to create a _function frame_ and `lr` and `fp` are stored on the stack. The last instruction shown sets the frame pointer.

Note that the stack pointer and frame pointer are equal in this case. Local variables are stored **ABOVE** the frame pointer. The stack pointer may decrement further when passing arguments via the stack or for dynamic stack allocations (`alloca`).

Similarly, a function epilogue consists of:

```asm showLineNumbers=false
ldp x29, x30, [sp], #48
ret
```

This restores `lr`, `fp` and the stack pointer before returning.
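
The frame layout produced by this prologue and epilogue can be traced with a short Python model. It is a sketch only: registers live in a dictionary, `mem` is a dictionary keyed by the address of each 8-byte slot, and the starting register values are made up.

```python
FRAME_SIZE = 48  # matches the #-48 / #48 used in the prologue and epilogue

def prologue(regs, mem):
    """Model: stp x29, x30, [sp, #-48]!  followed by  mov x29, sp."""
    regs["sp"] -= FRAME_SIZE
    mem[regs["sp"]] = regs["x29"]      # saved fp at [sp]
    mem[regs["sp"] + 8] = regs["x30"]  # saved lr at [sp, #8]
    regs["x29"] = regs["sp"]

def epilogue(regs, mem):
    """Model: ldp x29, x30, [sp], #48 (ret would then branch to x30)."""
    regs["x29"] = mem[regs["sp"]]
    regs["x30"] = mem[regs["sp"] + 8]
    regs["sp"] += FRAME_SIZE

regs = {"sp": 0x1000, "x29": 0x1040, "x30": 0x400123}  # made-up caller state
mem = {}
before = dict(regs)

prologue(regs, mem)
frame_sp = regs["sp"]
# Space left for locals: from just past the saved pair up to the old sp.
locals_area = (regs["x29"] + 16, regs["x29"] + FRAME_SIZE)
print(hex(frame_sp), [hex(a) for a in locals_area])  # 0xfd0 ['0xfe0', '0x1000']

epilogue(regs, mem)
print(regs == before)  # True
```

The 32 bytes between `fp + 16` and the caller's `sp` are the local variable area, which is why locals sit above the frame pointer.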

### Example 1

Function form `calc_avg(ptr, count)`

where:

- `ptr` is the start of the array
- `count` is the number of 64 bit numbers in the array

```asm showLineNumbers=false
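// Call site: `ptr` is a placeholder for the array address
mov x0, ptr
mov x1, #64
bl calc_avg

calc_avg:
    stp x29, x30, [sp, #-0x10]! // save fp and lr
    mov x29, sp
    mov x2, xzr // sum = 0
    mov x3, x1  // original_count = count
loop:
    cbz x1, done                // stop when no elements remain
    ldr x4, [x0], #0x8          // x4 = *ptr; ptr += 8
    add x2, x2, x4              // sum += x4
    subs x1, x1, #0x1           // count--
    b.ne loop
done:
    sdiv x0, x2, x3             // return sum / original_count
    mov sp, x29
    ldp x29, x30, [sp], #0x10   // restore fp and lr
    ret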
```
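
As a cross-check, the same routine can be written as a Python reference that follows the assembly step by step. This is a sketch under assumptions: the array is a Python list of signed integers, the sum wraps at 64 bits like a register, and the final division truncates toward zero and yields 0 for an empty array, mirroring `sdiv`. The helper names are illustrative.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def sdiv_signed(a, b):
    """Signed divide truncating toward zero; 0 when dividing by zero."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def calc_avg(values):
    total = 0                   # x2: sum
    remaining = len(values)     # x1: count
    original_count = remaining  # x3
    index = 0                   # stands in for the post-incremented pointer x0
    while remaining != 0:       # cbz x1, done
        total = (total + values[index]) & MASK64  # ldr + add
        index += 1              # pointer advanced by 8 bytes per element
        remaining -= 1          # subs x1, x1, #1
    return sdiv_signed(to_signed(total), original_count)

print(calc_avg([1, 2, 3, 4]))  # 2
print(calc_avg([-3, -4]))      # -3
```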

### Example 2

Function form `fib(pos)`

where:

- `pos` is the position in the Fibonacci sequence

```asm showLineNumbers=false
// fib(0) = 0
// fib(1) = 1
// fib(n) = fib(n-1) + fib(n-2)

fib:
    stp x29, x30, [sp, #-0x10]!
    mov x29, sp
    cbz x0, .ret0       // if pos == 0, return 0
    cmp x0, #0x1
    b.eq .ret1          // if pos == 1, return 1

    mov x1, #0x0        // prev = 0x0
    mov x2, #0x1        // curr = 0x1
    mov x3, x0          // counter = pos
.loop:
    add x4, x1, x2      // next = prev + curr
    mov x1, x2          // prev = curr
    mov x2, x4          // curr = next
    sub x3, x3, #0x1    // counter-- (no flags needed here)
    cmp x3, #0x1        // flags for b.gt come from this cmp
    b.gt .loop

    mov x0, x2          // return curr
    mov sp, x29
    ldp x29, x30, [sp], #0x10
    ret
.ret0:
    mov x0, xzr
    mov sp, x29
    ldp x29, x30, [sp], #0x10
    ret
.ret1:
    mov x0, #0x1
    mov sp, x29
    ldp x29, x30, [sp], #0x10
    ret
```
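
The loop above can be mirrored in Python to confirm that the counter logic produces the expected sequence. This is a sketch, assuming `pos` is a non-negative integer and that additions wrap at 64 bits like a register. The variable names follow the comments in the assembly.

```python
MASK64 = (1 << 64) - 1

def fib(pos):
    if pos == 0:            # cbz x0, .ret0
        return 0
    if pos == 1:            # cmp x0, #1 / b.eq .ret1
        return 1
    prev, curr = 0, 1       # x1, x2
    counter = pos           # x3
    while True:
        nxt = (prev + curr) & MASK64  # add x4, x1, x2
        prev, curr = curr, nxt        # mov x1, x2 / mov x2, x4
        counter -= 1
        if not counter > 1:           # b.gt .loop falls through
            break
    return curr

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The loop body always runs at least once, which is why the `pos == 0` and `pos == 1` cases are handled before it.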

## References

- [Learn the Architecture Guides](https://www.arm.com/architecture/learn-the-architecture/)
- [A64 Instruction Set Architecture Guide](https://developer.arm.com/documentation/102374/latest/)

# MIPS

TODO