Architecture Overview

Pipeline

Source (.tape)
    │
    ▼
Region Splitter (#code, #view, #test markers)
    │
    ├── #code region ──► Scanner ──► Parser (Pratt + recursive descent)
    │                                    │
    ├── #view region ──► Scanner ──► View Parser (component declarations)
    │                                    │
    │                    ┌───────────────┘
    │                    ▼
    │              Merged AST (tape_ast_module)
    │                    │
    │        ┌───────────┼───────────┐
    │        ▼           │           │
    │   Module Resolver  │    (single-module path)
    │   (if imports)     │           │
    │        │           │           │
    │        ▼           ▼           ▼
    │   Comptime Execution (@tape blocks, @emit)
    │                    │
    │                    ▼
    │              Type Checker
    │                    │
    │                    ▼
    │              TAC Lowering (AST → three-address code)
    │                    │
    │                    ▼
    │              Link Check (@link symbol verification)
    │                    │
    │        ┌───────────┴───────────┐
    │        ▼                       ▼
    │   VM Interpreter          Native Codegen
    │   (tape run)              (x86-64 / ARM64)
    │                                │
    │                                ▼
    │                           Binary Emitter
    │                           (PE64 / Mach-O / ELF64 / STO)
    │                                │
    │                                ▼
    │                           Output binary
    │
    └── #test region ──► tape test path
                         (separate pipeline)

Key design decisions

  1. Single IR: both VM and native backends consume the same TAC representation
  2. Per-region parsers: #code uses the full parser, #view uses a component-only parser, #test uses a test parser; all produce compatible AST nodes
  3. No LLVM: the compiler emits machine code directly, which keeps it self-contained and fast
  4. Comptime before typecheck: @tape blocks and @emit run first, generating AST nodes that the type checker then validates like hand-written code

Region splitting

The first pass splits the file at its region markers (#code, #view, #test). If no markers are present, the entire file is treated as #code. A preamble (imports and attributes before the first marker) is prepended to #code. The #view region is parsed separately, and its components are merged into the module’s component list.
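The splitting rule can be sketched in C. This is a minimal illustration, assuming each marker stands alone on its own line and that marker lines themselves are dropped; `region_set`, `split_regions`, and `free_regions` are hypothetical names, not the compiler's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical region ids; REGION_PREAMBLE collects lines before the first marker. */
typedef enum { REGION_PREAMBLE, REGION_CODE, REGION_VIEW, REGION_TEST, REGION_COUNT } region_kind;

typedef struct {
    char *text[REGION_COUNT];  /* heap-allocated, NUL-terminated */
    size_t len[REGION_COUNT];
} region_set;

/* Appends n bytes of src to region k, growing its buffer. */
static bool append(region_set *rs, region_kind k, const char *src, size_t n) {
    char *p = realloc(rs->text[k], rs->len[k] + n + 1);
    if (!p) return false;
    memcpy(p + rs->len[k], src, n);
    rs->len[k] += n;
    p[rs->len[k]] = '\0';
    rs->text[k] = p;
    return true;
}

/* Returns the region a marker line switches to, or -1 for ordinary lines.
   Assumes markers stand alone on their line (trailing whitespace ignored). */
static int marker_kind(const char *line, size_t n) {
    while (n > 0 && (line[n - 1] == '\r' || line[n - 1] == ' ' || line[n - 1] == '\t')) n--;
    if (n != 5) return -1;
    if (memcmp(line, "#code", 5) == 0) return REGION_CODE;
    if (memcmp(line, "#view", 5) == 0) return REGION_VIEW;
    if (memcmp(line, "#test", 5) == 0) return REGION_TEST;
    return -1;
}

void free_regions(region_set *rs) {
    for (int k = 0; k < REGION_COUNT; k++) free(rs->text[k]);
}

/* Splits src into regions and prepends the preamble to #code. With no markers,
   every line lands in the preamble, so the whole file becomes #code.
   Call free_regions() afterwards whether or not this succeeds. */
bool split_regions(const char *src, region_set *out) {
    memset(out, 0, sizeof *out);
    for (int k = 0; k < REGION_COUNT; k++)
        if (!append(out, (region_kind)k, "", 0)) return false;

    region_kind cur = REGION_PREAMBLE;
    for (const char *line = src; *line; ) {
        const char *nl = strchr(line, '\n');
        size_t n = nl ? (size_t)(nl - line) + 1 : strlen(line);  /* includes '\n' */
        int m = marker_kind(line, nl ? n - 1 : n);
        if (m >= 0) cur = (region_kind)m;                  /* marker lines are dropped */
        else if (!append(out, cur, line, n)) return false;
        line += n;
    }

    /* #code = preamble + #code region */
    size_t pre = out->len[REGION_PREAMBLE];
    char *code = malloc(pre + out->len[REGION_CODE] + 1);
    if (!code) return false;
    memcpy(code, out->text[REGION_PREAMBLE], pre);
    memcpy(code + pre, out->text[REGION_CODE], out->len[REGION_CODE] + 1);  /* copies the NUL */
    free(out->text[REGION_CODE]);
    out->text[REGION_CODE] = code;
    out->len[REGION_CODE] += pre;
    return true;
}
```

Because lines before the first marker always accumulate in the preamble and the preamble is always prepended to #code, the no-marker case needs no special handling: the whole file simply ends up in #code.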

Module resolution

When a module has import statements, the compiler enters the multi-module path:

  1. Resolve each import to a file path (include paths + relative paths)
  2. Parse each dependency recursively
  3. Build a dependency order
  4. Process each module in order: inject extern stubs → comptime → typecheck
  5. Build a unified dispatch registry across all modules
  6. Lower each module with shared dispatch info
  7. Merge all TAC modules into one

Single-module files (no imports) take a simpler path: comptime → typecheck → lower directly.
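Step 3 of the multi-module path amounts to a topological sort of the import graph. A minimal depth-first sketch in C follows; `module_graph` and `dependency_order` are hypothetical, and rejecting import cycles is an assumption, since the steps above do not say how cycles are handled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MODULES 16

/* Hypothetical module graph: imports[i][j] is true when module i imports module j. */
typedef struct {
    int count;
    bool imports[MAX_MODULES][MAX_MODULES];
} module_graph;

enum { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first visit: appends every dependency of m to order[] before m itself.
   Returns false if an import cycle is found (rejected here by assumption). */
static bool visit(const module_graph *g, int m, int *state, int *order, int *n) {
    if (state[m] == DONE) return true;
    if (state[m] == IN_PROGRESS) return false;  /* back edge: cycle */
    state[m] = IN_PROGRESS;
    for (int d = 0; d < g->count; d++)
        if (g->imports[m][d] && !visit(g, d, state, order, n)) return false;
    state[m] = DONE;
    order[(*n)++] = m;
    return true;
}

/* Fills order[] with all modules, dependencies first. Returns false on a cycle. */
bool dependency_order(const module_graph *g, int *order) {
    int state[MAX_MODULES] = {0};  /* all UNVISITED */
    int n = 0;
    for (int m = 0; m < g->count; m++)
        if (!visit(g, m, state, order, &n)) return false;
    return true;
}
```

Walking `order` front to back visits every module after the modules it imports, which is the order step 4 processes them in.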

Comptime execution

After parsing but before type checking, the compiler evaluates @tape blocks and @tape fn calls. This stage can emit new AST nodes (@emit struct, @emit fn, etc.) that become part of the module. The type checker sees these generated declarations identically to hand-written ones.
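The key property, that generated declarations are indistinguishable from hand-written ones once type checking runs, can be illustrated with a toy C model. Everything here is hypothetical: the flat declaration list, the simplification that a comptime block emits exactly one function, and the single type-check rule (unique names).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DECLS 32

/* Hypothetical, heavily simplified AST: a module is a flat list of declarations. */
typedef enum { DECL_FN, DECL_STRUCT, DECL_COMPTIME_BLOCK } decl_kind;

typedef struct {
    decl_kind kind;
    const char *name;   /* declared name; NULL for comptime blocks */
    const char *emits;  /* comptime blocks only: name of the fn this block @emits */
} decl;

typedef struct {
    decl decls[MAX_DECLS];
    int count;
} ast_module;

/* Runs each comptime block present before this pass and appends what it emits.
   The appended decl is an ordinary DECL_FN with no "generated" flag. */
bool run_comptime(ast_module *m) {
    int original = m->count;
    for (int i = 0; i < original; i++) {
        if (m->decls[i].kind != DECL_COMPTIME_BLOCK) continue;
        if (m->count == MAX_DECLS) return false;
        m->decls[m->count++] = (decl){ DECL_FN, m->decls[i].emits, NULL };
    }
    return true;
}

/* Toy type-check rule: declared names must be unique. Generated and
   hand-written declarations go through exactly the same loop. */
bool typecheck(const ast_module *m) {
    for (int i = 0; i < m->count; i++) {
        if (m->decls[i].kind == DECL_COMPTIME_BLOCK) continue;
        for (int j = i + 1; j < m->count; j++) {
            if (m->decls[j].kind == DECL_COMPTIME_BLOCK) continue;
            if (strcmp(m->decls[i].name, m->decls[j].name) == 0) return false;
        }
    }
    return true;
}
```

Running comptime first means a generated function that clashes with a hand-written one is caught by the same rule that catches two hand-written clashes.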

TAC IR

The intermediate representation is linear three-address code:

%1 = load_local "x"
%2 = const_i64 1
%3 = add_i64 %1, %2
store_local "x", %3

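Every instruction names its operands explicitly, so data flow is visible without tracking an implicit stack. That makes TAC easy both to interpret and to compile to machine code.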
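To show why this form is easy to interpret, here is a small C interpreter for exactly the four instructions above. The `tac_instr` layout, the enum names, and the name-keyed local table are assumptions made for illustration; the real TAC encoding is not described here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding of the four TAC ops shown above. */
typedef enum { OP_LOAD_LOCAL, OP_CONST_I64, OP_ADD_I64, OP_STORE_LOCAL } tac_op;

typedef struct {
    tac_op op;
    int dst;           /* destination temporary %n (unused by store_local) */
    int a, b;          /* source temporaries */
    int64_t imm;       /* constant for const_i64 */
    const char *name;  /* local name for load_local / store_local */
} tac_instr;

#define MAX_TEMPS 64   /* temporaries are assumed to be numbered below this */
#define MAX_LOCALS 8

typedef struct { const char *name; int64_t value; } local_slot;
typedef struct { local_slot locals[MAX_LOCALS]; int nlocals; } frame;

/* Returns the slot for a local, creating it (zero-initialized) on first use. */
static local_slot *lookup(frame *f, const char *name) {
    for (int i = 0; i < f->nlocals; i++)
        if (strcmp(f->locals[i].name, name) == 0) return &f->locals[i];
    assert(f->nlocals < MAX_LOCALS);
    f->locals[f->nlocals] = (local_slot){ name, 0 };
    return &f->locals[f->nlocals++];
}

/* Executes straight-line TAC. Each operand is a named temporary, so the
   interpreter is a single switch with no operand stack to keep in sync. */
void run_tac(const tac_instr *code, int n, frame *f) {
    int64_t t[MAX_TEMPS] = {0};
    for (int i = 0; i < n; i++) {
        const tac_instr *in = &code[i];
        switch (in->op) {
        case OP_LOAD_LOCAL:  t[in->dst] = lookup(f, in->name)->value; break;
        case OP_CONST_I64:   t[in->dst] = in->imm; break;
        case OP_ADD_I64:     t[in->dst] = t[in->a] + t[in->b]; break;
        case OP_STORE_LOCAL: lookup(f, in->name)->value = t[in->a]; break;
        }
    }
}

/* The TAC example above, encoded with this layout. */
static const tac_instr example_x_plus_1[] = {
    { OP_LOAD_LOCAL,  1, 0, 0, 0, "x" },   /* %1 = load_local "x" */
    { OP_CONST_I64,   2, 0, 0, 1, NULL },  /* %2 = const_i64 1 */
    { OP_ADD_I64,     3, 1, 2, 0, NULL },  /* %3 = add_i64 %1, %2 */
    { OP_STORE_LOCAL, 0, 3, 0, 0, "x" },   /* store_local "x", %3 */
};
```

Running `example_x_plus_1` against a frame where x is 41 leaves x at 42.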

Native backends

Two native code generators read the TAC:

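Backend         | Target | Output formats
----------------|--------|--------------------------------------------------------
x64_codegen     | x86-64 | PE64 (Windows), ELF64 (Linux, Uusi), STO (Uusi shared)
aarch64_codegen | ARM64  | Mach-O (macOS), Mach-O object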

The target is selected with --target (win64, linux, macos, macos-arm64, uusi).

VM backend

tape run interprets the TAC directly via tape_vm_run(). The VM also powers tape test — each test function is invoked individually via tape_vm_run_fn(), with optional setup/teardown calls bracketing each test.
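The per-test bracketing can be sketched in C as below. `vm_call` stands in for a call through tape_vm_run_fn(), whose real signature is not assumed; `test_plan`, `run_tests`, and the demo `fake_vm_call` are hypothetical, and running teardown even after a failure is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for calling one function in the VM by name; returns 0 on success.
   Plays the role of tape_vm_run_fn(), whose real signature is not assumed here. */
typedef int (*vm_call)(const char *fn_name);

typedef struct {
    const char *setup;     /* NULL if there is no setup function */
    const char *teardown;  /* NULL if there is no teardown function */
    const char **tests;
    int ntests;
} test_plan;

/* Invokes each test individually, bracketed by setup and teardown.
   Assumption: teardown still runs when setup or the test fails.
   Returns the number of failed tests. */
int run_tests(const test_plan *plan, vm_call call) {
    int failures = 0;
    for (int i = 0; i < plan->ntests; i++) {
        int ok = 1;
        if (plan->setup && call(plan->setup) != 0) ok = 0;
        if (ok && call(plan->tests[i]) != 0) ok = 0;
        if (plan->teardown && call(plan->teardown) != 0) ok = 0;
        printf("%s %s\n", ok ? "PASS" : "FAIL", plan->tests[i]);
        failures += !ok;
    }
    return failures;
}

/* Demo stand-in for the VM: logs each call and fails functions named "fail...". */
static char call_log[256];
static int fake_vm_call(const char *fn) {
    size_t len = strlen(call_log);
    snprintf(call_log + len, sizeof call_log - len, "%s;", fn);
    return strncmp(fn, "fail", 4) == 0;
}
```

With tests `test_add` and `fail_sub` plus both hooks, the call log reads `setup;test_add;teardown;setup;fail_sub;teardown;`: each test gets a fresh setup, and one failure does not stop the run.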

Link check

Before codegen, tape_link_check() verifies that every @link library reference and the symbols it names can be resolved. This catches missing DLLs and undefined symbols at compile time rather than at runtime. The check can be skipped with --skip-link-check.
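A sketch of the kind of check this performs, in C. It is an assumption-heavy model: the available libraries and their exports are an in-memory table, whereas a real check has to locate and inspect actual library files (how tape_link_check() does that is not described here). `library`, `link_ref`, and `link_check` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of what can be linked: each library and its exported symbols. */
typedef struct {
    const char *name;
    const char **exports;
    int nexports;
} library;

/* One @link reference collected from the program: a library plus a symbol in it. */
typedef struct {
    const char *lib;
    const char *symbol;
} link_ref;

static const library *find_library(const library *libs, int nlibs, const char *name) {
    for (int i = 0; i < nlibs; i++)
        if (strcmp(libs[i].name, name) == 0) return &libs[i];
    return NULL;
}

static int exports_symbol(const library *lib, const char *sym) {
    for (int i = 0; i < lib->nexports; i++)
        if (strcmp(lib->exports[i], sym) == 0) return 1;
    return 0;
}

/* Reports every unresolved reference up front; returns the error count. */
int link_check(const library *libs, int nlibs, const link_ref *refs, int nrefs) {
    int errors = 0;
    for (int i = 0; i < nrefs; i++) {
        const library *lib = find_library(libs, nlibs, refs[i].lib);
        if (!lib) {
            fprintf(stderr, "error: library '%s' not found\n", refs[i].lib);
            errors++;
        } else if (!exports_symbol(lib, refs[i].symbol)) {
            fprintf(stderr, "error: '%s' is not exported by '%s'\n",
                    refs[i].symbol, refs[i].lib);
            errors++;
        }
    }
    return errors;
}
```

Collecting every unresolved reference in one pass, rather than stopping at the first, lets a single build report all missing libraries and symbols together.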

Why not stack-based bytecode?

Tape 1.0 used stack-machine bytecode. This caused:

  • Reverse-engineering value flow for native codegen
  • Phantom operand bugs
  • Undebuggable calling conventions

TAC eliminates all of these by making data flow explicit.
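The first problem above can be made concrete. A backend consuming stack bytecode must simulate the operand stack to learn which value each instruction consumes; the C sketch below does that symbolically, assigning a temporary to each push and recovering the same TAC as the x = x + 1 example in the TAC IR section. The stack instruction set is hypothetical and is not Tape 1.0's actual bytecode.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stack-machine ops; not Tape 1.0's actual bytecode format. */
typedef enum { S_PUSH_LOCAL, S_PUSH_CONST, S_ADD, S_STORE_LOCAL } stack_op;

typedef struct {
    stack_op op;
    const char *name;  /* S_PUSH_LOCAL / S_STORE_LOCAL */
    long imm;          /* S_PUSH_CONST */
} stack_instr;

typedef struct { char *buf; size_t cap, len; } text_out;

/* Appends formatted text, clamping (truncating) if the buffer fills up. */
static void emit(text_out *o, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (w > 0) o->len += (size_t)w;
    if (o->len >= o->cap) o->len = o->cap - 1;
}

#define MAX_STACK 32

/* Recovers explicit data flow by simulating the operand stack with temporary
   numbers instead of runtime values. Writes TAC text into buf (cap > 0).
   Returns 0 on success, -1 if the bytecode under- or overflows the stack. */
int stack_to_tac(const stack_instr *code, int n, char *buf, size_t cap) {
    text_out o = { buf, cap, 0 };
    int stack[MAX_STACK];
    int sp = 0, next = 1;  /* next: number of the next fresh temporary %n */
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        const stack_instr *in = &code[i];
        switch (in->op) {
        case S_PUSH_LOCAL:
        case S_PUSH_CONST:
            if (sp == MAX_STACK) return -1;
            if (in->op == S_PUSH_LOCAL)
                emit(&o, "%%%d = load_local \"%s\"\n", next, in->name);
            else
                emit(&o, "%%%d = const_i64 %ld\n", next, in->imm);
            stack[sp++] = next++;  /* the stack holds a temp id, not a value */
            break;
        case S_ADD: {
            if (sp < 2) return -1;
            int rhs = stack[--sp];
            int lhs = stack[--sp];
            emit(&o, "%%%d = add_i64 %%%d, %%%d\n", next, lhs, rhs);
            stack[sp++] = next++;
            break;
        }
        case S_STORE_LOCAL:
            if (sp < 1) return -1;
            emit(&o, "store_local \"%s\", %%%d\n", in->name, stack[--sp]);
            break;
        }
    }
    return 0;
}
```

Any disagreement between the simulated stack and the real one silently changes which temporary an instruction reads. TAC records that wiring directly, so no backend has to reconstruct it.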
