Architecture Overview
Pipeline
```
Source (.tape)
  │
  ▼
Region Splitter (#code, #view, #test markers)
  │
  ├── #code region ──► Scanner ──► Parser (Pratt + recursive descent)
  │                                  │
  ├── #view region ──► Scanner ──► View Parser (component declarations)
  │                                  │
  │                  ┌───────────────┘
  │                  ▼
  │    Merged AST (tape_ast_module)
  │                  │
  │      ┌───────────┼───────────┐
  │      ▼           │           │
  │ Module Resolver  │  (single-module path)
  │ (if imports)     │           │
  │      │           │           │
  │      ▼           ▼           ▼
  │ Comptime Execution (@tape blocks, @emit)
  │                  │
  │                  ▼
  │            Type Checker
  │                  │
  │                  ▼
  │ TAC Lowering (AST → three-address code)
  │                  │
  │                  ▼
  │ Link Check (@link symbol verification)
  │                  │
  │      ┌───────────┴───────────┐
  │      ▼                       ▼
  │ VM Interpreter        Native Codegen
  │ (tape run)           (x86-64 / ARM64)
  │                              │
  │                              ▼
  │                        Binary Emitter
  │                 (PE64 / Mach-O / ELF64 / STO)
  │                              │
  │                              ▼
  │                         Output binary
  │
  └── #test region ──► tape test path (separate pipeline)
```
Key design decisions
- Single IR: both the VM and the native backends consume the same TAC representation.
- Per-region parsers: `#code` uses the full parser, `#view` uses a component-only parser, and `#test` uses a test parser; all produce compatible AST nodes.
- No LLVM: machine code is emitted directly, keeping the compiler self-contained and fast.
- Comptime before typecheck: `@tape` blocks and `@emit` run first, generating AST nodes that the type checker then validates like hand-written code.
Region splitting
The first pass splits the file at region markers (`#code`, `#view`, `#test`). If no markers exist, the entire file is treated as `#code`. A preamble (imports and attributes before the first marker) is prepended to the `#code` region. The `#view` region is parsed separately, and its components are merged into the module’s component list.
Module resolution
When a module has import statements, the compiler enters the multi-module path:
- Resolve each import to a file path (include paths + relative paths)
- Parse each dependency recursively
- Build a dependency order
- Process each module in order: inject extern stubs → comptime → typecheck
- Build a unified dispatch registry across all modules
- Lower each module with shared dispatch info
- Merge all TAC modules into one
Single-module files (no imports) take a simpler path: comptime → typecheck → lower directly.
Comptime execution
After parsing but before type checking, the compiler evaluates @tape blocks and @tape fn calls. This stage can emit new AST nodes (@emit struct, @emit fn, etc.) that become part of the module. The type checker sees these generated declarations identically to hand-written ones.
TAC IR
The intermediate representation is linear three-address code:
```
%1 = load_local "x"
%2 = const_i64 1
%3 = add_i64 %1, %2
store_local "x", %3
```
This sequence computes `x = x + 1`. Data flow is explicit and there is no implicit operand stack, so the IR is easy both to interpret and to compile.
Native backends
Two native code generators read the TAC:
| Backend | Target | Output formats |
|---|---|---|
| `x64_codegen` | x86-64 | PE64 (Windows), ELF64 (Linux, Uusi), STO (Uusi shared) |
| `aarch64_codegen` | ARM64 | Mach-O (macOS), Mach-O object |
The target is selected with --target (win64, linux, macos, macos-arm64, uusi).
VM backend
`tape run` interprets the TAC directly via `tape_vm_run()`. The VM also powers `tape test`: each test function is invoked individually via `tape_vm_run_fn()`, with optional setup and teardown calls bracketing each test.
Link checking
Before codegen, `tape_link_check()` verifies that every `@link` library reference and each of its symbols can be resolved. This catches missing DLLs and undefined symbols at compile time rather than at run time. The check can be skipped with `--skip-link-check`.
Why not stack-based bytecode?
Tape 1.0 used stack-machine bytecode. This caused:
- The native code generator had to reverse-engineer value flow from stack operations
- Phantom operand bugs
- Undebuggable calling conventions
TAC eliminates all of these by making data flow explicit.