TAC IR

Format

TAC uses virtual registers (%r0, %r1, %r2, …) with explicit assignments:

```plaintext
%r0 = const_int  10
%r1 = const_int  20
%r2 = add        %r0, %r1
      arg        %r2, 0
%r3 = call       @print_i64, argc=1
```

Every instruction produces at most one result in a named register. Instructions without a result (like arg, store, jump) have dest = TAC_REG_NONE.
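To make the register model concrete, here is a minimal sketch of an evaluator for the first three instructions of the listing above. The `mini_instr` struct, the `op_kind` enum, and the value chosen for `REG_NONE` are simplifications invented for this example, not the real encoding.

```c
#include <assert.h>
#include <stdint.h>

#define REG_NONE UINT32_MAX   /* stand-in for TAC_REG_NONE; the value is assumed */

typedef enum { OP_CONST_INT, OP_ADD } op_kind;   /* hypothetical opcode subset */

/* Simplified instruction: for const_int, `a` is the literal;
   for add, `a` and `b` are register numbers. */
typedef struct {
    op_kind  op;
    uint32_t dest;
    int64_t  a, b;
} mini_instr;

/* Execute `n` instructions against the register file `regs`. */
static void run(const mini_instr *code, uint32_t n, int64_t *regs) {
    for (uint32_t i = 0; i < n; i++) {
        const mini_instr *in = &code[i];
        assert(in->dest != REG_NONE);   /* both opcodes here produce a result */
        switch (in->op) {
        case OP_CONST_INT: regs[in->dest] = in->a; break;
        case OP_ADD:       regs[in->dest] = regs[in->a] + regs[in->b]; break;
        }
    }
}
```

Running the three instructions `const_int 10`, `const_int 20`, `add %r0, %r1` through `run` leaves 30 in register 2.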

Instruction encoding

Each instruction is a fixed-size struct:

| Field | Type | Purpose |
|-------|------|---------|
| `op` | `tac_opcode` | Operation kind |
| `dest` | `uint32_t` | Destination register (or `TAC_REG_NONE`) |
| `a` | `tac_operand` | First operand (register or immediate) |
| `b` | `tac_operand` | Second operand (register or immediate) |
| `extra` | `uint32_t` | Opcode-specific data (arg count, else-block, field offset, etc.) |
| `loc` | `tape_source_loc` | Source location for debug info |

Operands are either a register reference (TAC_OP_REG), an immediate value (TAC_OP_IMM), or absent (TAC_OP_NONE).
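The field table above might map onto C roughly as follows. This is a sketch under assumptions: the struct name `tac_instr`, the operand layout (a kind tag plus a union), the value of `TAC_REG_NONE`, the fields of `tape_source_loc`, and the helper functions are all hypothetical. Only the field names, their types, and the three operand kinds come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel: the text names TAC_REG_NONE but not its value. */
#define TAC_REG_NONE UINT32_MAX

/* Hypothetical opcode subset; the real enum is larger. */
typedef enum { TAC_CONST_INT, TAC_ADD, TAC_ARG, TAC_CALL } tac_opcode;

typedef enum { TAC_OP_NONE, TAC_OP_REG, TAC_OP_IMM } tac_operand_kind;

/* Assumed layout: a kind tag plus a register index or an immediate. */
typedef struct {
    tac_operand_kind kind;
    union {
        uint32_t reg;
        int64_t  imm;
    } as;
} tac_operand;

/* Assumed fields; the text only says this carries a source location. */
typedef struct { uint32_t line, column; } tape_source_loc;

typedef struct {
    tac_opcode      op;
    uint32_t        dest;
    tac_operand     a, b;
    uint32_t        extra;
    tape_source_loc loc;
} tac_instr;

/* Build a register operand. */
static tac_operand tac_reg(uint32_t r) {
    assert(r != TAC_REG_NONE);
    tac_operand o = { TAC_OP_REG, { .reg = r } };
    return o;
}

/* Build an immediate operand. */
static tac_operand tac_imm(int64_t v) {
    tac_operand o = { TAC_OP_IMM, { .imm = v } };
    return o;
}

/* True if the instruction writes a destination register. */
static bool tac_has_result(const tac_instr *in) {
    return in->dest != TAC_REG_NONE;
}
```

With these helpers, `arg %r2, 0` would be an instruction whose `dest` is `TAC_REG_NONE`, whose `a` is `tac_reg(2)`, and whose `b` is `tac_imm(0)`.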

Basic blocks

Instructions are grouped into basic blocks. Each block has a linear array of instructions and ends with a terminator (jump, branch, or ret):

```plaintext
fn example(x: i64) -> void:
  bb0:
    %r0 = load       local[0]
    %r1 = const_int  0
    %r2 = cmp_gt     %r0, %r1
          branch     %r2, bb1, bb2

  bb1:
    %r3 = const_str  "positive"
          arg        %r3, 0
          call       @println, argc=1
          jump       -> bb3

  bb2:
    %r4 = const_str  "non-positive"
          arg        %r4, 0
          call       @println, argc=1
          jump       -> bb3

  bb3:
          ret
```
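The terminator rule can be checked mechanically. The sketch below assumes a block is just an array of opcodes (operands omitted), and it also assumes the conventional stricter rule that a terminator may appear only as the last instruction. The text states only that a block ends with one. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode subset; only the terminators matter here. */
typedef enum { OP_CONST_INT, OP_ARG, OP_CALL, OP_JUMP, OP_BRANCH, OP_RET } opcode;

/* Assumed block shape: a linear array of opcodes. */
typedef struct {
    const opcode *instrs;
    size_t        count;
} basic_block;

static bool is_terminator(opcode op) {
    return op == OP_JUMP || op == OP_BRANCH || op == OP_RET;
}

/* A block is well formed if its last instruction is its only terminator. */
static bool block_is_well_formed(const basic_block *bb) {
    assert(bb != NULL);
    if (bb->count == 0) return false;
    for (size_t i = 0; i + 1 < bb->count; i++) {
        if (is_terminator(bb->instrs[i])) return false;
    }
    return is_terminator(bb->instrs[bb->count - 1]);
}
```

A check like this is cheap to run after each lowering pass, since a block that falls through without a terminator has no defined successor.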

Opcodes

Constants

| Opcode | Semantics |
|--------|-----------|
| `const_int` | `dest = imm` (64-bit integer literal) |
| `const_str` | `dest = strings[imm]` (index into module string table) |

Arithmetic

| Opcode | Semantics |
|--------|-----------|
| `add` | `dest = a + b` |
| `sub` | `dest = a - b` |
| `mul` | `dest = a * b` |
| `div` | `dest = a / b` |
| `mod` | `dest = a % b` |

These opcodes are type-agnostic: each register's tape_rtype_kind determines whether the backend emits integer or float instructions.
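As an illustration of type-driven dispatch, the sketch below evaluates `add` on raw 64-bit register contents and picks the integer or float interpretation from the register kind. The `rtype_kind` enum here is a two-value stand-in for `tape_rtype_kind`, and `eval_add` is a hypothetical helper, not part of the real backend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-value subset of tape_rtype_kind. */
typedef enum { KIND_I64, KIND_F64 } rtype_kind;

/* Evaluate `add` on raw register bits, interpreting them by kind. */
static uint64_t eval_add(rtype_kind kind, uint64_t a, uint64_t b) {
    if (kind == KIND_F64) {
        double x, y, sum;
        uint64_t bits;
        memcpy(&x, &a, sizeof x);        /* reinterpret bits as f64 */
        memcpy(&y, &b, sizeof y);
        sum = x + y;
        memcpy(&bits, &sum, sizeof bits);
        return bits;
    }
    assert(kind == KIND_I64);
    return a + b;   /* unsigned add wraps, matching two's-complement i64 */
}
```

The same opcode therefore yields 30 for the integer inputs 10 and 20, and the bit pattern of 3.75 for the float inputs 1.5 and 2.25.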

Bitwise

| Opcode | Semantics |
|--------|-----------|
| `shl` | `dest = a << b` |
| `shr` | `dest = a >> b` |
| `and` | `dest = a & b` |
| `or` | `dest = a \| b` |
| `xor` | `dest = a ^ b` |

Comparison

| Opcode | Semantics |
|--------|-----------|
| `cmp_eq` | `dest = (a == b)` |
| `cmp_ne` | `dest = (a != b)` |
| `cmp_lt` | `dest = (a < b)` |
| `cmp_le` | `dest = (a <= b)` |
| `cmp_gt` | `dest = (a > b)` |
| `cmp_ge` | `dest = (a >= b)` |

Memory

| Opcode | Semantics |
|--------|-----------|
| `load` | `dest = locals[imm]` (load from local slot by index) |
| `store` | `locals[imm] = b` (store to local slot) |
| `alloca` | `dest = stack_alloc(imm bytes)` |
| `local_addr` | `dest = &locals[imm]` (address of local slot) |
| `load_field` | `dest = *(a + b)` (load from base + byte offset; extra = access size) |
| `store_field` | `*(a + extra) = b` (store to base + byte offset; extra = offset) |
| `load_ptr` | `dest = *a` (dereference pointer; extra = element size) |
| `store_ptr` | `*a = b` (write through pointer; extra = element size) |
| `heap_alloc` | `dest = heap_alloc(extra bytes)` (zeroed allocation) |
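The field accessors amount to sized reads and writes at a byte offset from a base pointer. The sketch below shows one way an interpreter could implement them. The function signatures are hypothetical: in the table the offset travels in `b` or `extra`, and how `store_field` determines its access size is not stated, so the explicit `size` parameter is an assumption. The partial-width copies also assume a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* store_field: write the low `size` bytes of `value` at base + offset.
   Assumes a little-endian host for sizes below 8. */
static void store_field(uint8_t *base, uint32_t offset, uint64_t value, uint32_t size) {
    assert(size >= 1 && size <= 8);
    memcpy(base + offset, &value, size);
}

/* load_field: read `size` bytes at base + offset, zero-extended to 64 bits. */
static uint64_t load_field(const uint8_t *base, uint32_t offset, uint32_t size) {
    uint64_t value = 0;
    assert(size >= 1 && size <= 8);
    memcpy(&value, base + offset, size);
    return value;
}
```

Using `memcpy` rather than a pointer cast keeps the accesses free of alignment and strict-aliasing problems, which matters when the offset is arbitrary.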

Control flow

| Opcode | Semantics |
|--------|-----------|
| `jump` | Unconditional branch to `a.imm` (block id) |
| `branch` | If `a` then `b.imm` else `extra` (conditional branch to two blocks) |
| `call` | `dest = fn[a.imm](...)` (extra = argument count) |
| `call_ind` | `dest = a(...)` (indirect call through function pointer; extra = arg count) |
| `fn_addr` | `dest = &fn[a.imm]` (take address of function) |
| `arg` | Push `a` as argument `b.imm` for next call |
| `ret` | Return `a` (or void if `a` is none) |
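The `arg`/`call` pairing can be pictured as a small staging area that `arg` fills and `call` consumes. The following is a minimal sketch of that idea for an interpreter. The `call_state` struct, the `MAX_ARGS` limit, the callee signature, and `sum_args` are all invented for the example, and the real VM may stage arguments differently.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARGS 8   /* assumed limit for this sketch */

typedef struct {
    int64_t  pending[MAX_ARGS];   /* values staged by `arg` */
    uint32_t staged;              /* highest staged index + 1 */
} call_state;

/* Hypothetical callee signature: argument array plus count. */
typedef int64_t (*tac_fn)(const int64_t *args, uint32_t argc);

/* arg: stage `value` as argument number `index` for the next call. */
static void do_arg(call_state *cs, int64_t value, uint32_t index) {
    assert(index < MAX_ARGS);
    cs->pending[index] = value;
    if (index + 1 > cs->staged) cs->staged = index + 1;
}

/* call: invoke `fn` with the staged arguments; `argc` plays the role of `extra`. */
static int64_t do_call(call_state *cs, tac_fn fn, uint32_t argc) {
    int64_t result;
    assert(argc == cs->staged);
    result = fn(cs->pending, argc);
    cs->staged = 0;   /* the staging area is empty again after the call */
    return result;
}

/* Example callee used only for illustration. */
static int64_t sum_args(const int64_t *args, uint32_t argc) {
    int64_t total = 0;
    for (uint32_t i = 0; i < argc; i++) total += args[i];
    return total;
}
```

Because each `arg` carries its own index, the staged values do not depend on the order in which the `arg` instructions were emitted.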

String operations

| Opcode | Semantics |
|--------|-----------|
| `string_copy` | `dest = deep_copy(a)` |
| `string_drop` | `free(a)` if dynamic (`cap > 0`) |
| `string_concat` | `dest = a + b` (always allocates) |
| `string_append` | `dest = append(a, b)` (reallocs `a`'s buffer) |
| `string_eq` | `dest = (a == b)` (byte comparison) |
| `int_to_str` | `dest = decimal_repr(a)` |
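The ownership rule behind `string_drop` (free only when `cap > 0`) is easiest to see in code. The sketch below assumes a `ptr + len + cap` layout and treats `cap == 0` as "bytes not owned", for example a literal in the string table. The struct name `tape_string` and the C function bodies are hypothetical renderings of the opcode semantics, not the runtime's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout following the "ptr + len + cap" description. */
typedef struct {
    char  *ptr;
    size_t len;
    size_t cap;   /* 0 means the bytes are not owned (assumption) */
} tape_string;

/* string_copy: always produce an owned, heap-backed copy. */
static tape_string string_copy(tape_string s) {
    tape_string out = { NULL, s.len, s.len ? s.len : 1 };
    out.ptr = malloc(out.cap);
    assert(out.ptr != NULL);
    if (s.len) memcpy(out.ptr, s.ptr, s.len);
    return out;
}

/* string_drop: free only dynamic strings (cap > 0), then reset the value. */
static void string_drop(tape_string *s) {
    if (s->cap > 0) free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
    s->cap = 0;
}

/* string_eq: length check followed by a byte comparison. */
static int string_eq(tape_string a, tape_string b) {
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

Dropping a literal is then a no-op on the heap, which is why the compiler can emit `string_drop` uniformly without tracking where a string came from.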

Type operations

| Opcode | Semantics |
|--------|-----------|
| `cast` | `dest = cast(a)` (extra = src_type |

Functions

Each TAC function contains:

| Field | Purpose |
|-------|---------|
| `name` / `name_len` | Function name |
| `param_count` | Number of parameters (registers 0..n-1) |
| `local_count` | Number of local variable slots |
| `next_reg` | Next available virtual register |
| `blocks` | Array of basic blocks |
| `reg_types` | Type of each virtual register |
| `local_types` | Type of each local slot |
| `return_type` | Function return type |
| `link_lib` | DLL/library name for extern functions |
| `symbol_name` | Explicit linker symbol (for `@export`) |
| `is_export` | Whether this function appears in the export table |

Parameters occupy registers %r0 through %r(param_count-1). Local variables are accessed by slot index via load/store.
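Since parameters take the first `param_count` registers, a register allocator can be as simple as a counter. The sketch below assumes `next_reg` starts at `param_count`, which the text implies but does not state. `tac_fn_info` and the three helpers are hypothetical names for a cut-down view of a TAC function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down view of a TAC function. */
typedef struct {
    uint32_t param_count;
    uint32_t local_count;
    uint32_t next_reg;
} tac_fn_info;

/* Parameters own %r0..%r(param_count-1), so fresh registers start after them
   (assumed starting point for next_reg). */
static void fn_init(tac_fn_info *fn, uint32_t param_count, uint32_t local_count) {
    fn->param_count = param_count;
    fn->local_count = local_count;
    fn->next_reg    = param_count;
}

/* Hand out the next unused virtual register. */
static uint32_t fn_new_reg(tac_fn_info *fn) {
    assert(fn->next_reg != UINT32_MAX);   /* keep the top value free as a sentinel */
    return fn->next_reg++;
}

/* True if `reg` is one of the parameter registers. */
static int fn_is_param_reg(const tac_fn_info *fn, uint32_t reg) {
    return reg < fn->param_count;
}
```

For a function with two parameters, the first call to `fn_new_reg` returns 2, the first register that is not a parameter.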

Module

A tac_module contains:

  • functions[] — all compiled functions (own module + dependencies)
  • strings[] — deduplicated string literal table
  • root_fn_offset / root_fn_count — which functions belong to the root module (vs dependencies)
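Given an offset and a count, deciding whether a function belongs to the root module is a range check. The sketch below assumes root functions form one contiguous run inside `functions[]`. The struct `tac_module_info`, its `fn_count` field, and `module_is_root_fn` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down view of a tac_module. */
typedef struct {
    uint32_t fn_count;         /* length of functions[] (assumed field name) */
    uint32_t root_fn_offset;
    uint32_t root_fn_count;
} tac_module_info;

/* True if functions[index] belongs to the root module rather than a dependency.
   Assumes the root functions are contiguous. */
static bool module_is_root_fn(const tac_module_info *m, uint32_t index) {
    assert(m->root_fn_offset + m->root_fn_count <= m->fn_count);
    return index >= m->root_fn_offset &&
           index <  m->root_fn_offset + m->root_fn_count;
}
```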

Register typing

Every virtual register and local slot has an associated tape_rtype_kind:

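| Kind | Size (bytes) | Description |
|------|--------------|-------------|
| `void` | 0 | No value |
| `bool` | 1 | Boolean |
| `i8`/`u8` | 1 | 8-bit integer |
| `i16`/`u16` | 2 | 16-bit integer |
| `i32`/`u32` | 4 | 32-bit integer |
| `i64`/`u64` | 8 | 64-bit integer |
| `f32` | 4 | 32-bit float |
| `f64` | 8 | 64-bit float |
| `ptr` | 8 | Pointer |
| `fn_ptr` | 8 | Function pointer |
| `string` | 24 | String (ptr + len + cap) |
| `slice` | 16 | Slice (ptr + len) |
| `struct` | varies | Struct (passed by pointer in registers) |
| `optional` | 16 | Optional (tag + value) |
| `fallible` | 16 | Result type (tag + max(ok, err)) |
| `tagged` | varies | Tagged union |
| `color` | 8 | Color value (packed RGBA) |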
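The fixed sizes in the table lend themselves to a lookup table. In the sketch below the enumerator names are hypothetical stand-ins for `tape_rtype_kind` values. Unsigned kinds are folded into their signed counterparts because they share a size, and `struct` and `tagged` are left out because their sizes vary per type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enumerator names for the fixed-size kinds. */
typedef enum {
    RT_VOID, RT_BOOL, RT_I8, RT_I16, RT_I32, RT_I64, RT_F32, RT_F64,
    RT_PTR, RT_FN_PTR, RT_STRING, RT_SLICE, RT_OPTIONAL, RT_FALLIBLE, RT_COLOR,
    RT_KIND_COUNT
} rtype_kind;

/* Size in bytes of a fixed-size kind, taken from the table above. */
static uint32_t rtype_size(rtype_kind k) {
    static const uint8_t sizes[RT_KIND_COUNT] = {
        [RT_VOID] = 0,      [RT_BOOL] = 1,
        [RT_I8] = 1,        [RT_I16] = 2,
        [RT_I32] = 4,       [RT_I64] = 8,
        [RT_F32] = 4,       [RT_F64] = 8,
        [RT_PTR] = 8,       [RT_FN_PTR] = 8,
        [RT_STRING] = 24,   [RT_SLICE] = 16,
        [RT_OPTIONAL] = 16, [RT_FALLIBLE] = 16,
        [RT_COLOR] = 8,
    };
    assert((unsigned)k < RT_KIND_COUNT);
    return sizes[k];
}
```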

Wide values

Types larger than 64 bits (structs, strings, slices) are stack-allocated via alloca. The register holds a pointer to the stack slot:

```plaintext
%r0 = alloca      24          // allocate 24 bytes for Vec3
      store_field %r0, %r1    // off=0: x
      store_field %r0, %r2    // off=8: y
      store_field %r0, %r3    // off=16: z
```
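In C terms, the sequence above fills a 24-byte stack slot field by field, and the register is simply the address of that slot. The sketch below assumes `Vec3` has three 8-byte float fields, which matches the offsets 0, 8, and 16 in the listing but is not stated in the text. `build_vec3` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: three f64 fields, giving offsets 0, 8, 16 and a size of 24. */
typedef struct { double x, y, z; } Vec3;

_Static_assert(sizeof(Vec3) == 24, "Vec3 is expected to occupy 24 bytes");

/* Mirrors alloca + three store_field instructions: `slot` plays the role of %r0. */
static void build_vec3(unsigned char *slot, double x, double y, double z) {
    assert(slot != NULL);
    memcpy(slot + offsetof(Vec3, x), &x, sizeof x);   /* off=0  */
    memcpy(slot + offsetof(Vec3, y), &y, sizeof y);   /* off=8  */
    memcpy(slot + offsetof(Vec3, z), &z, sizeof z);   /* off=16 */
}
```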

Why not stack-based bytecode?

Tape 1.0 used a stack-machine bytecode, which caused three recurring problems:

  • Native codegen had to reverse-engineer value flow from the stack operations
  • Phantom operand bugs
  • Calling conventions that were effectively undebuggable

TAC eliminates all of these by making data flow explicit. Both the VM interpreter and native backends read the same representation without needing a decompilation step.
