TAC IR
Format
TAC (three-address code) uses virtual registers (%r0, %r1, %r2, …) with explicit assignments:
%r0 = const_int 10
%r1 = const_int 20
%r2 = add %r0, %r1
arg %r2, 0
%r3 = call @print_i64, argc=1
Every instruction produces at most one result in a named register. Instructions without a result (such as arg, store, and jump) have dest = TAC_REG_NONE.
Instruction encoding
Each instruction is a fixed-size struct:
| Field | Type | Purpose |
|---|---|---|
op | tac_opcode | Operation kind |
dest | uint32_t | Destination register (or TAC_REG_NONE) |
a | tac_operand | First operand (register or immediate) |
b | tac_operand | Second operand (register or immediate) |
extra | uint32_t | Opcode-specific data (arg count, else-block, field offset, etc.) |
loc | tape_source_loc | Source location for debug info |
Operands are either a register reference (TAC_OP_REG), an immediate value (TAC_OP_IMM), or absent (TAC_OP_NONE).
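As a rough illustration, the layout above could be declared in C along the following lines. The field names follow the table, but everything else is an assumption made for this sketch: the struct name tac_instr, the operand's internal layout, the stand-in for tape_source_loc, the opcode enum values, the helper functions, and the value chosen for TAC_REG_NONE.

```c
#include <stdint.h>

/* Assumed sentinel; the real value of TAC_REG_NONE is not given in this document. */
#define TAC_REG_NONE UINT32_MAX

typedef enum { TAC_OP_NONE, TAC_OP_REG, TAC_OP_IMM } tac_operand_kind;

/* Hypothetical operand layout: a kind tag plus a register index or an immediate. */
typedef struct {
    tac_operand_kind kind;
    union {
        uint32_t reg;
        int64_t  imm;
    } as;
} tac_operand;

/* Only the opcodes used below; the real enum covers every opcode. */
typedef enum { TAC_CONST_INT, TAC_ADD, TAC_ARG } tac_opcode;

/* Stand-in for tape_source_loc, whose fields are not documented here. */
typedef struct { uint32_t line, col; } tape_source_loc;

/* Fixed-size instruction with the six fields from the table. */
typedef struct {
    tac_opcode      op;
    uint32_t        dest;
    tac_operand     a;
    tac_operand     b;
    uint32_t        extra;
    tape_source_loc loc;
} tac_instr;

static tac_operand tac_reg(uint32_t r) {
    tac_operand o;
    o.kind = TAC_OP_REG;
    o.as.reg = r;
    return o;
}

static tac_operand tac_imm(int64_t v) {
    tac_operand o;
    o.kind = TAC_OP_IMM;
    o.as.imm = v;
    return o;
}

/* %dest = add %ra, %rb */
static tac_instr tac_make_add(uint32_t dest, uint32_t ra, uint32_t rb) {
    tac_instr in = { TAC_ADD, dest, tac_reg(ra), tac_reg(rb), 0, { 0, 0 } };
    return in;
}

/* arg %value_reg, index  (no result, so dest is TAC_REG_NONE) */
static tac_instr tac_make_arg(uint32_t value_reg, int64_t index) {
    tac_instr in = { TAC_ARG, TAC_REG_NONE, tac_reg(value_reg), tac_imm(index), 0, { 0, 0 } };
    return in;
}
```

With these helpers, `tac_make_add(2, 0, 1)` corresponds to `%r2 = add %r0, %r1`, and `tac_make_arg(2, 0)` to `arg %r2, 0`.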
Basic blocks
Instructions are grouped into basic blocks. Each block has a linear array of instructions and ends with a terminator (jump, branch, or ret):
fn example(x: i64) -> void:
bb0:
%r0 = load local[0]
%r1 = const_int 0
%r2 = cmp_gt %r0, %r1
branch %r2, bb1, bb2
bb1:
%r3 = const_str "positive"
arg %r3, 0
call @println, argc=1
jump -> bb3
bb2:
%r4 = const_str "non-positive"
arg %r4, 0
call @println, argc=1
jump -> bb3
bb3:
ret
Opcodes
Constants
| Opcode | Semantics |
|---|---|
const_int | dest = imm (64-bit integer literal) |
const_str | dest = strings[imm] (index into module string table) |
Arithmetic
| Opcode | Semantics |
|---|---|
add | dest = a + b |
sub | dest = a - b |
mul | dest = a * b |
div | dest = a / b |
mod | dest = a % b |
Opcodes are type-agnostic — the register’s tape_rtype_kind determines whether the backend emits integer or float instructions.
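To make the type-driven dispatch concrete, here is a minimal sketch of how an interpreter might evaluate add. It assumes registers are stored as raw 64-bit patterns and that the register's kind selects the interpretation; the enum kind_sketch and all function names are hypothetical, and only two kinds are shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical two-member subset of tape_rtype_kind. */
typedef enum { KIND_I64, KIND_F64 } kind_sketch;

/* Reinterpret between double and its 64-bit pattern without aliasing issues. */
static uint64_t f64_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

static double bits_to_f64(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* One `add` opcode, two behaviours: the register kind picks the arithmetic. */
static uint64_t eval_add(kind_sketch kind, uint64_t a, uint64_t b) {
    switch (kind) {
    case KIND_F64:
        return f64_to_bits(bits_to_f64(a) + bits_to_f64(b));
    case KIND_I64:
        return a + b; /* unsigned wraparound gives the same bits as a two's-complement add */
    }
    return 0;
}
```

A native backend would make the same decision at code-generation time, choosing an integer or floating-point machine instruction instead of branching at run time.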
Bitwise
| Opcode | Semantics |
|---|---|
shl | dest = a << b |
shr | dest = a >> b |
and | dest = a & b |
or | dest = a \| b |
xor | dest = a ^ b |
Comparison
| Opcode | Semantics |
|---|---|
cmp_eq | dest = (a == b) |
cmp_ne | dest = (a != b) |
cmp_lt | dest = (a < b) |
cmp_le | dest = (a <= b) |
cmp_gt | dest = (a > b) |
cmp_ge | dest = (a >= b) |
Memory
| Opcode | Semantics |
|---|---|
load | dest = locals[imm] (load from local slot by index) |
store | locals[imm] = b (store to local slot) |
alloca | dest = stack_alloc(imm bytes) |
local_addr | dest = &locals[imm] (address of local slot) |
load_field | dest = *(a + b) (load from base + byte offset; extra = access size) |
store_field | *(a + extra) = b (store to base + byte offset; extra = offset) |
load_ptr | dest = *a (dereference pointer; extra = element size) |
store_ptr | *a = b (write through pointer; extra = element size) |
heap_alloc | dest = heap_alloc(extra bytes) (zeroed allocation) |
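The field accessors amount to byte-offset reads and writes against a base pointer. The sketch below shows one way to express that in C. It assumes a little-endian host for accesses narrower than 8 bytes, and it passes the store size as a separate parameter because the table gives only the offset for store_field. The function names and signatures are illustrative, not the actual VM API.

```c
#include <stdint.h>
#include <string.h>

/* load_field: dest = *(base + offset), reading `size` bytes (1, 2, 4, or 8).
   Copying into the low bytes of a zeroed word assumes a little-endian host. */
static uint64_t load_field(const uint8_t *base, uint64_t offset, uint32_t size) {
    uint64_t value = 0;
    memcpy(&value, base + offset, size);
    return value;
}

/* store_field: *(base + offset) = value.
   The size parameter is an assumption of this sketch (see the note above). */
static void store_field(uint8_t *base, uint32_t offset, uint64_t value, uint32_t size) {
    memcpy(base + offset, &value, size);
}
```

Using memcpy rather than a pointer cast keeps the accesses well-defined even when the offset is not aligned for the access size.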
Control flow
| Opcode | Semantics |
|---|---|
jump | Unconditional branch to a.imm (block id) |
branch | If a then b.imm else extra (conditional branch to two blocks) |
call | dest = fn[a.imm](...) (extra = argument count) |
call_ind | dest = a(...) (indirect call through function pointer; extra = arg count) |
fn_addr | dest = &fn[a.imm] (take address of function) |
arg | Push a as argument b.imm for next call |
ret | Return a (or void if a is none) |
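The arg/call pairing can be pictured as a small staging area that arg fills and call consumes. This document does not describe how the VM actually implements it, so the following is only a sketch under assumed names (call_frame_sketch, exec_arg, exec_call) with an arbitrary argument limit.

```c
#include <stdint.h>

#define MAX_ARGS 8 /* arbitrary limit for the sketch */

/* Values staged by `arg`, indexed by the instruction's b.imm. */
typedef struct {
    int64_t pending[MAX_ARGS];
} call_frame_sketch;

typedef int64_t (*callee_fn)(const int64_t *args, uint32_t argc);

/* arg: stage `value` as argument number `index` for the next call.
   Returns 0 on success, -1 if the index is out of range. */
static int exec_arg(call_frame_sketch *f, int64_t value, uint32_t index) {
    if (index >= MAX_ARGS) return -1;
    f->pending[index] = value;
    return 0;
}

/* call: pass the first `argc` staged values (extra = argument count) to the callee. */
static int64_t exec_call(const call_frame_sketch *f, callee_fn fn, uint32_t argc) {
    return fn(f->pending, argc);
}

/* Example callee standing in for fn[a.imm]. */
static int64_t sum_args(const int64_t *args, uint32_t argc) {
    int64_t total = 0;
    for (uint32_t i = 0; i < argc; i++) total += args[i];
    return total;
}
```

Because each arg carries its own index, the staging order does not matter; only the count in extra determines how many slots the callee sees.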
String operations
| Opcode | Semantics |
|---|---|
string_copy | dest = deep_copy(a) |
string_drop | free(a) if dynamic (cap > 0) |
string_concat | dest = a + b (always allocates) |
string_append | dest = append(a, b) (reallocs a’s buffer) |
string_eq | dest = (a == b) (byte comparison) |
int_to_str | dest = decimal_repr(a) |
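The ownership rule behind string_drop (free only when cap > 0) is easier to see with a concrete layout. The sketch below assumes a ptr + len + cap struct and treats cap == 0 as "static data, not owned". The type name string_sketch, the helper string_from_literal, and the choice to allocate one spare byte are all assumptions of this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout matching "ptr + len + cap"; cap == 0 marks static data. */
typedef struct {
    char    *ptr;
    uint64_t len;
    uint64_t cap;
} string_sketch;

/* Wrap a C literal without copying: cap stays 0, so string_drop will not free it. */
static string_sketch string_from_literal(const char *s) {
    string_sketch r = { (char *)s, strlen(s), 0 };
    return r;
}

/* string_concat: always allocates a fresh buffer for the result.
   Returns an empty static string if allocation fails. */
static string_sketch string_concat(string_sketch a, string_sketch b) {
    string_sketch r = { NULL, 0, 0 };
    uint64_t total = a.len + b.len;
    char *buf = malloc(total + 1); /* +1 keeps cap > 0 even for an empty result */
    if (!buf) return r;
    memcpy(buf, a.ptr, a.len);
    memcpy(buf + a.len, b.ptr, b.len);
    r.ptr = buf;
    r.len = total;
    r.cap = total + 1;
    return r;
}

/* string_eq: length check followed by a byte comparison. */
static int string_eq(string_sketch a, string_sketch b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* string_drop: free only dynamic strings (cap > 0), then clear the fields. */
static void string_drop(string_sketch *s) {
    if (s->cap > 0) free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
    s->cap = 0;
}
```

Under this convention a literal can be dropped safely any number of times, while a concat result is freed exactly once.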
Type operations
| Opcode | Semantics |
|---|---|
cast | dest = cast(a) (extra = src_type) |
Functions
Each TAC function contains:
| Field | Purpose |
|---|---|
name / name_len | Function name |
param_count | Number of parameters (registers 0..n-1) |
local_count | Number of local variable slots |
next_reg | Next available virtual register |
blocks | Array of basic blocks |
reg_types | Type of each virtual register |
local_types | Type of each local slot |
return_type | Function return type |
link_lib | DLL/library name for extern functions |
symbol_name | Explicit linker symbol (for @export) |
is_export | Whether this function appears in the export table |
Parameters occupy registers %r0 through %r(param_count-1). Local variables are accessed by slot index via load/store.
Module
A tac_module contains:
- functions[]: all compiled functions (own module + dependencies)
- strings[]: deduplicated string literal table
- root_fn_offset / root_fn_count: which functions belong to the root module (vs dependencies)
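Two of these fields lend themselves to a short sketch: deduplicating string literals so that const_str can refer to them by index, and testing whether a function belongs to the root module. The code assumes root functions form one contiguous range starting at root_fn_offset, uses a fixed-size table for brevity, and introduces hypothetical names (module_sketch, module_intern, module_is_root_fn).

```c
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 64 /* arbitrary cap for the sketch */

typedef struct {
    const char *strings[MAX_STRINGS];
    uint32_t    string_count;
    uint32_t    root_fn_offset;
    uint32_t    root_fn_count;
} module_sketch;

/* Return the table index of `s`, adding it if absent.
   Returns UINT32_MAX when the table is full. */
static uint32_t module_intern(module_sketch *m, const char *s) {
    for (uint32_t i = 0; i < m->string_count; i++) {
        if (strcmp(m->strings[i], s) == 0) return i;
    }
    if (m->string_count == MAX_STRINGS) return UINT32_MAX;
    m->strings[m->string_count] = s; /* borrows `s`; the caller keeps it alive */
    return m->string_count++;
}

/* True if fn_index lies in [root_fn_offset, root_fn_offset + root_fn_count). */
static int module_is_root_fn(const module_sketch *m, uint32_t fn_index) {
    return fn_index >= m->root_fn_offset &&
           fn_index - m->root_fn_offset < m->root_fn_count;
}
```

A linear scan is enough to show the idea; a real string table would more likely use a hash map.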
Register typing
Every virtual register and local slot has an associated tape_rtype_kind:
| Kind | Size | Description |
|---|---|---|
void | 0 | No value |
bool | 1 | Boolean |
i8/u8 | 1 | 8-bit integer |
i16/u16 | 2 | 16-bit integer |
i32/u32 | 4 | 32-bit integer |
i64/u64 | 8 | 64-bit integer |
f32 | 4 | 32-bit float |
f64 | 8 | 64-bit float |
ptr | 8 | Pointer |
fn_ptr | 8 | Function pointer |
string | 24 | String (ptr + len + cap) |
slice | 16 | Slice (ptr + len) |
struct | varies | Struct (passed by pointer in registers) |
optional | 16 | Optional (tag + value) |
fallible | 16 | Result type (tag + max(ok, err)) |
tagged | varies | Tagged union |
color | 8 | Color value (packed RGBA) |
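The size column translates directly into a lookup function. In this sketch the enum and its member names are hypothetical stand-ins for tape_rtype_kind, and -1 is an arbitrary marker for the kinds whose size the table lists as "varies".

```c
/* Hypothetical stand-in for tape_rtype_kind; member names are illustrative. */
typedef enum {
    RK_VOID, RK_BOOL,
    RK_I8, RK_U8, RK_I16, RK_U16, RK_I32, RK_U32, RK_I64, RK_U64,
    RK_F32, RK_F64,
    RK_PTR, RK_FN_PTR,
    RK_STRING, RK_SLICE, RK_STRUCT, RK_OPTIONAL, RK_FALLIBLE, RK_TAGGED,
    RK_COLOR
} rtype_kind_sketch;

/* Size in bytes per the table above, or -1 where the size depends on the
   concrete type (struct, tagged). */
static int rtype_size(rtype_kind_sketch kind) {
    switch (kind) {
    case RK_VOID:                       return 0;
    case RK_BOOL: case RK_I8: case RK_U8: return 1;
    case RK_I16: case RK_U16:           return 2;
    case RK_I32: case RK_U32: case RK_F32: return 4;
    case RK_I64: case RK_U64: case RK_F64:
    case RK_PTR: case RK_FN_PTR: case RK_COLOR: return 8;
    case RK_SLICE: case RK_OPTIONAL: case RK_FALLIBLE: return 16;
    case RK_STRING:                     return 24;
    case RK_STRUCT: case RK_TAGGED:     return -1;
    }
    return -1;
}
```

For struct and tagged kinds, the caller would need the concrete type's layout to obtain a size.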
Wide values
Types larger than 64 bits (structs, strings, slices) are stack-allocated via alloca. The register holds a pointer to the stack slot:
%r0 = alloca 24 // allocate 24 bytes for Vec3
store_field %r0, %r1 // off=0: x
store_field %r0, %r2 // off=8: y
store_field %r0, %r3 // off=16: z
Why not stack-based bytecode?
Tape 1.0 used stack-machine bytecode. Problems:
- Reverse-engineering value flow for native codegen
- Phantom operand bugs
- Undebuggable calling conventions
TAC eliminates all of these by making data flow explicit. Both the VM interpreter and native backends read the same representation without needing a decompilation step.