Native Backend

Overview

The native backend compiles TAC IR directly to machine code bytes. No LLVM, no intermediate assembly — tape emits raw instructions into a code buffer that the binary emitters wrap in PE/ELF/Mach-O format.

Two code generators exist:

| Backend | Architecture | Targets |
| --- | --- | --- |
| x64_codegen | x86-64 | win64, linux, uusi |
| aarch64_codegen | ARM64 | macos-arm64 |

Both produce an x64_module struct (shared output format) that the binary emitters consume.

Register strategy

Both backends use “spill everything” — every TAC virtual register lives at a fixed stack slot. Before each operation, operands are loaded into scratch registers; after, the result is stored back. This is correct-first, fast-later:

plaintext
; TAC: %r5 = add %r3, %r4
mov rax, [rbp - 8*4]    ; load %r3
mov rcx, [rbp - 8*5]    ; load %r4
add rax, rcx
mov [rbp - 8*6], rax    ; store %r5

Stack slot for TAC register N: [RBP - 8*(N+1)] (x64) or [x29 - 8*(N+1)] (ARM64).
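
The slot formula can be written as a small helper. This is a minimal sketch, not the backend's actual code: the names slot_offset and frame_size_for are hypothetical, and the frame-size rule (one 8-byte slot per virtual register, rounded up to 16 bytes) is an assumption.

```c
#include <stdint.h>

/* Byte offset of TAC virtual register n from the frame pointer
   (RBP on x64, x29 on ARM64): register n lives at [fp - 8*(n+1)]. */
static int32_t slot_offset(uint32_t n) {
    return -8 * (int32_t)(n + 1);
}

/* Assumption: the frame holds one 8-byte slot per virtual register and
   is rounded up to a multiple of 16 so the stack pointer stays aligned. */
static uint32_t frame_size_for(uint32_t num_vregs) {
    uint32_t raw = 8u * num_vregs;
    return (raw + 15u) & ~15u;
}
```

For the add example above, slot_offset(3) is -32, which matches the [rbp - 8*4] operand used to load %r3.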

x86-64 scratch registers

| Register | Purpose |
| --- | --- |
| RAX | Primary scratch, return value, left operand |
| RCX | Right operand, shift count, arg passing (Win64) |
| RDX | Division, arg passing |
| R8, R9 | Arg passing |
| R11 | Extra scratch |

ARM64 scratch registers

| Register | Purpose |
| --- | --- |
| x0 | Primary scratch, return value, left operand |
| x1 | Right operand, arg passing |
| x2 | Extra scratch, arg passing |
| x9 | Address calculations |
| x10 | Memcpy loops |
| x16 | Indirect calls (intra-procedure-call scratch) |

Calling convention

The compiler uses the platform ABI for all calls — both tape-to-tape and tape-to-extern go through the same convention:

Win64

  • Args 1–4: RCX, RDX, R8, R9 (positional — float args use XMM0–3 in same position)
  • Args 5+: stack (right-to-left)
  • 32-byte shadow space reserved by caller
  • Return: RAX (or XMM0 for floats)

SysV (Linux, macOS x64)

  • Integer args: RDI, RSI, RDX, RCX, R8, R9 (separate counter)
  • Float args: XMM0–7 (separate counter)
  • Args overflow: stack
  • Return: RAX (or XMM0 for floats)
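
The difference between Win64's positional assignment and SysV's separate counters is easiest to see side by side. The sketch below is illustrative only: the types arg_kind and arg_loc and both function names are hypothetical, and it handles only scalar integer and float arguments (no aggregates, no variadic calls).

```c
#include <stdbool.h>
#include <stddef.h>

/* Where one argument is passed. All names here are hypothetical. */
typedef enum { LOC_INT_REG, LOC_FLOAT_REG, LOC_STACK } arg_kind;
typedef struct { arg_kind kind; int index; } arg_loc;

/* Win64: a single positional counter. Argument i (i < 4) takes integer
   register i (RCX, RDX, R8, R9) or XMM register i, whichever fits its type. */
static void assign_win64(const bool *is_float, size_t n, arg_loc *out) {
    for (size_t i = 0; i < n; i++) {
        if (i < 4) {
            out[i].kind = is_float[i] ? LOC_FLOAT_REG : LOC_INT_REG;
            out[i].index = (int)i;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = (int)(i - 4);
        }
    }
}

/* SysV: integer and float arguments consume registers independently
   (6 integer registers, 8 XMM registers); the rest go to the stack. */
static void assign_sysv(const bool *is_float, size_t n, arg_loc *out) {
    int next_int = 0, next_float = 0, next_stack = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_float[i] && next_float < 8) {
            out[i].kind = LOC_FLOAT_REG;
            out[i].index = next_float++;
        } else if (!is_float[i] && next_int < 6) {
            out[i].kind = LOC_INT_REG;
            out[i].index = next_int++;
        } else {
            out[i].kind = LOC_STACK;
            out[i].index = next_stack++;
        }
    }
}
```

For a call f(int, float, int), Win64 places the float in XMM1 and the last integer in R8, while SysV places them in XMM0 and RSI.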

AAPCS64 (macOS ARM64)

  • Args: x0–x7 (integer), d0–d7 (float)
  • Return: x0 (or d0 for floats)
  • Frame pointer: x29, Link register: x30
  • SP 16-byte aligned at call sites

Planned: A faster internal calling convention (no shadow space, fixed register mapping) is designed for tape-to-tape calls. It would require ABI trampolines at extern boundaries.

Fixups

During code emission, forward references (branch targets, function calls, string addresses) emit placeholder bytes and record a fixup entry:

| Fixup kind | Purpose |
| --- | --- |
| FIXUP_REL32 | Direct relative call to internal function |
| FIXUP_BLOCK | Intra-function branch to a basic block |
| FIXUP_EXTERN_RIP | Indirect call through IAT/GOT (extern functions) |
| FIXUP_STRING_RIP | RIP-relative load of string constant address |

For ARM64, the same fixup kinds map to different encodings: BL (26-bit immediate), B/B.cond (26/19-bit), ADRP+ADD+BLR sequences.

The binary emitters resolve all fixups once final code layout is known.
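
As an illustration of what resolving a fixup involves, here is a sketch of patching an x86-64 rel32 field once the target offset is known. The rel32_fixup struct and apply_rel32 are hypothetical names, not the backend's real data structures, and the sketch assumes the 4-byte displacement is the last field of the instruction, as it is for call rel32 and jmp rel32.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical fixup record for a 32-bit relative reference. */
typedef struct {
    size_t patch_offset;   /* offset of the 4 placeholder bytes in the code buffer */
    size_t target_offset;  /* final offset of the call or branch target */
} rel32_fixup;

/* x86-64 relative displacements are measured from the end of the
   instruction, which here is the end of the 4-byte field itself. */
static void apply_rel32(uint8_t *code, const rel32_fixup *fx) {
    int64_t rel = (int64_t)fx->target_offset - (int64_t)(fx->patch_offset + 4);
    uint32_t bits = (uint32_t)(int32_t)rel;
    for (int i = 0; i < 4; i++) {
        code[fx->patch_offset + i] = (uint8_t)(bits >> (8 * i)); /* little-endian */
    }
}
```

A call at offset 0 (opcode E8, placeholder at offset 1) to a function at offset 16 is patched with the displacement 11, since the instruction ends at offset 5.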

Instruction encoding

x86-64

Direct byte emission: REX prefixes, ModR/M, SIB, displacement, immediates. A small built-in assembler provides helpers like emit_mov_reg_imm64, emit_load_slot, emit_store_slot.
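
To make the byte layout concrete, the following sketch shows how two of the helpers named above could be encoded. The signatures, the code_buf type, and the choice to always use a 32-bit displacement are assumptions for illustration; only the instruction encodings themselves come from the x86-64 instruction set.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical fixed-size code buffer. */
typedef struct { uint8_t bytes[256]; size_t len; } code_buf;

static void emit_u8(code_buf *b, uint8_t v) {
    assert(b->len < sizeof b->bytes);
    b->bytes[b->len++] = v;
}

/* Append the low nbytes of v in little-endian order. */
static void emit_le(code_buf *b, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) {
        emit_u8(b, (uint8_t)(v >> (8 * i)));
    }
}

/* mov r64, imm64: REX.W (plus REX.B for r8..r15), opcode B8+rd, imm64. */
static void emit_mov_reg_imm64(code_buf *b, int reg, uint64_t imm) {
    emit_u8(b, (uint8_t)(0x48 | (reg >> 3)));
    emit_u8(b, (uint8_t)(0xB8 | (reg & 7)));
    emit_le(b, imm, 8);
}

/* mov r64, [rbp + disp32]: REX.W (plus REX.R for r8..r15), opcode 8B,
   ModR/M with mod=10 (disp32), reg=destination, rm=101 (RBP). */
static void emit_load_slot(code_buf *b, int reg, int32_t disp) {
    emit_u8(b, (uint8_t)(0x48 | ((reg >> 3) << 2)));
    emit_u8(b, 0x8B);
    emit_u8(b, (uint8_t)(0x80 | ((reg & 7) << 3) | 5));
    emit_le(b, (uint32_t)disp, 4);
}
```

With reg = 1 (RCX) and disp = -40, emit_load_slot produces 48 8B 8D D8 FF FF FF, the encoding of mov rcx, [rbp - 40].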

ARM64

Fixed 32-bit instruction words. Helpers for MOVZ/MOVK (64-bit immediates), LDR/STR (scaled offset), ADD/SUB, B/BL/B.cond, STP/LDP.
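
The MOVZ/MOVK pattern for 64-bit immediates can be sketched as follows. The function names are hypothetical and the strategy shown (MOVZ for the low 16 bits, then MOVK for each non-zero higher chunk) is one common approach, not necessarily the one the backend uses. The sketch assumes rd < 32 and hw < 4.

```c
#include <stdint.h>
#include <stddef.h>

/* MOVZ Xd, #imm16, LSL #(16*hw): zero the register, then set one 16-bit chunk. */
static uint32_t enc_movz(unsigned rd, uint16_t imm16, unsigned hw) {
    return 0xD2800000u | (hw << 21) | ((uint32_t)imm16 << 5) | rd;
}

/* MOVK Xd, #imm16, LSL #(16*hw): set one 16-bit chunk, keep the others. */
static uint32_t enc_movk(unsigned rd, uint16_t imm16, unsigned hw) {
    return 0xF2800000u | (hw << 21) | ((uint32_t)imm16 << 5) | rd;
}

/* Write the instruction words that load imm into Xd and return how many
   were written (1 to 4). Zero chunks above the lowest are skipped because
   the leading MOVZ already cleared them. */
static size_t emit_mov_imm64(uint32_t out[4], unsigned rd, uint64_t imm) {
    size_t n = 0;
    out[n++] = enc_movz(rd, (uint16_t)imm, 0);
    for (unsigned hw = 1; hw < 4; hw++) {
        uint16_t chunk = (uint16_t)(imm >> (16 * hw));
        if (chunk != 0) {
            out[n++] = enc_movk(rd, chunk, hw);
        }
    }
    return n;
}
```

Loading 0x12340001 into x0 takes two words: 0xD2800020 (MOVZ x0, #1) followed by 0xF2A24680 (MOVK x0, #0x1234, LSL #16).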

Output formats

PE64 (Windows) — pe64_emit / pe64_emit_dll

  • Image base: 0x140000000
  • .text: entry stub + user code
  • .rdata: import directory, IAT, hint/name table, string constants
  • Entry stub: calls main(), then ExitProcess via IAT
  • DLL mode: adds export directory table (sorted by name)

ELF64 Uusi — elf64_emit

  • Virtual base: 0x100000000000
  • Single LOAD segment (flat binary, no dynamic linker)
  • Entry stub: calls main, passes result to int 0x80 exit syscall
  • Static linking only

ELF64 Uusi STO — elf64_emit_sto

  • Shared object (.sto file, ET_DYN)
  • Exports vtable: array of function pointers (one per pub fn)
  • R_X86_64_RELATIVE relocations for the DSO loader to fixup at load time

ELF64 Linux — elf64_emit_linux

  • Virtual base: 0x400000
  • Dynamically linked (ET_EXEC with PT_INTERP → /lib64/ld-linux-x86-64.so.2)
  • GOT/PLT for extern functions (R_X86_64_JUMP_SLOT)
  • Entry stub: _start → calls main, then exit_group syscall (nr 231)
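
A stub of this shape fits in 14 bytes. The sketch below is an assumption about what such a stub could look like, not a dump of the emitter's output: the function name is hypothetical, and the displacement to main is passed in already computed. Syscall number 231 is exit_group on x86-64 Linux.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Write a _start stub to out (at least 14 bytes) and return its size.
   main_rel32 is the distance from the end of the call instruction to main. */
static size_t build_linux_start_stub(uint8_t *out, int32_t main_rel32) {
    static const uint8_t tail[] = {
        0x89, 0xC7,                    /* mov edi, eax  ; exit status = main's result */
        0xB8, 0xE7, 0x00, 0x00, 0x00,  /* mov eax, 231  ; syscall number */
        0x0F, 0x05                     /* syscall */
    };
    uint32_t rel = (uint32_t)main_rel32;
    out[0] = 0xE8;                     /* call rel32 */
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(rel >> (8 * i));
    }
    memcpy(out + 5, tail, sizeof tail);
    return 5 + sizeof tail;
}
```

If main is placed directly after the stub, main_rel32 is 9, the length of the bytes that follow the call.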

ELF64 Object — elf64_emit_obj

  • Relocatable object file (.o) for external linkers

Mach-O (macOS) — macho64_emit / macho64_emit_obj

  • Text vmaddr: 0x100000000
  • Page size: 16KB (0x4000)
  • Segments: __TEXT (code + string constants), __LINKEDIT
  • Dyld chained fixups for extern function binding
  • Object mode: relocatable .o for external linking

Codegen output structure

c
x64_module {
    code           // all function code concatenated (raw bytes)
    functions[]    // per-function: name, code_offset, code_size, frame_size, is_export
    externs[]      // extern functions: name, dll/library, fn_index
    strings[]      // string constants: data pointer + length
    fixups[]       // unresolved references for emitters to patch
    target         // which platform this was generated for
}

Planned: DWARF (ELF/Mach-O) and CodeView/PDB (PE) emission for source-level debugging. Currently no debug information is generated.
