# Loom: A Scalable Computer Architecture for Looped Transformers
Paper | GitHub | Live Demos
Loom is a general-purpose computer implemented as a looped transformer with analytically derived weights. Programs are written in C, compiled to a 21-opcode ISA, and executed as iterated matrix multiplications through 8 fixed-weight transformer layers.
## Model Description
Loom is not a trained model. Every weight is derived analytically from the ISA specification. The transformer implements a programmable computer: each forward pass executes one compiled instruction, and the model is applied in a loop until the program halts.
The architecture supports 21 opcodes (arithmetic, logic, shifts, comparisons, branches, indirect memory access, conditional moves, and multiply-accumulate) in 8 transformer layers, down from the 10 layers required for the single-instruction baseline of Giannou et al.
### Key Design Choices
- Argmax attention replaces softmax for numerically exact execution over arbitrary step counts.
- Opcode-as-operand-routing maps all 21 operations to operand preparation for a shared subtract core, requiring only one arithmetic layer.
- 6-threshold direct subtraction computes a-b in one layer (replacing the classical 3-layer approach).
- STORE opcode enables indirect memory writes, reducing the Sudoku solver from 1,085 to 284 instructions.
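The operand-routing idea above can be illustrated with a small sketch: every opcode reduces to an operand-preparation step feeding a single shared subtract core. The routing table below is hypothetical and only demonstrates the principle, not Loom's actual wiring.

```python
# Hypothetical illustration of opcode-as-operand-routing: each opcode is
# reduced to operand preparation feeding one shared subtract core.
# The routing table is illustrative, not Loom's actual layer wiring.

def subtract_core(a, b):
    """The single shared arithmetic primitive: a - b."""
    return a - b

# Each opcode maps to a function that prepares (a', b') for the subtract core.
ROUTING = {
    "SUB": lambda a, b: (a, b),    # a - b
    "ADD": lambda a, b: (a, -b),   # a - (-b) = a + b
    "NEG": lambda a, b: (0, a),    # 0 - a = -a
    "MOV": lambda a, b: (b, 0),    # b - 0 = b
    "CMP": lambda a, b: (a, b),    # sign(a - b) drives branches
}

def execute(op, a, b):
    ap, bp = ROUTING[op](a, b)
    return subtract_core(ap, bp)

print(execute("ADD", 3, 4))  # 7
```

Because only operand preparation differs per opcode, a single arithmetic layer suffices for all 21 operations.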
## Configurations
| Config | d_model | n | Memory | Instructions | ONNX Size |
|---|---|---|---|---|---|
| Compact | 146 | 512 | 160 slots | 320 slots | 8.3 MB |
| Standard | 155 | 1,024 | 64 slots | 928 slots | 14.8 MB |
| Large | 164 | 2,048 | 224 slots | 1,792 slots | 28.0 MB |
## ONNX Models

The repository includes pre-exported ONNX models with argmax attention expressed as GPU-native operations (ReduceMax, comparison, division). No TopK or OneHot operators. Full WebGPU acceleration.

- `argmax_146x512.onnx` (8.3 MB): compact config
- `argmax_155x1024.onnx` (14.8 MB): standard config
- `argmax_164x2048.onnx` (28.0 MB): large config
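The ReduceMax/comparison/division decomposition of argmax attention can be sketched in NumPy; shapes here are illustrative, and the function name is not from the repository.

```python
import numpy as np

# Sketch of argmax attention using only the GPU-native ops mentioned above
# (ReduceMax, comparison, division) -- no TopK or OneHot required.

def argmax_attention(scores, values):
    """Hard attention: each query attends only to its max-scoring key(s).

    scores: (queries, keys); values: (keys, d)
    """
    m = scores.max(axis=-1, keepdims=True)             # ReduceMax
    mask = (scores == m).astype(scores.dtype)          # comparison -> {0, 1}
    weights = mask / mask.sum(axis=-1, keepdims=True)  # division (ties averaged)
    return weights @ values

scores = np.array([[0.1, 2.0, -1.0]])
values = np.eye(3)
print(argmax_attention(scores, values))  # [[0. 1. 0.]] -- selects key 1 exactly
```

Unlike softmax, the selected value is copied exactly (weight 1.0), so no rounding error accumulates over long step counts.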
### Input/Output

- Input: `state` tensor of shape `[d_model, n]`, dtype float32
- Output: `new_state` tensor of shape `[d_model, n]`, dtype float32

Each call executes one ISA instruction. Loop until the PC (program counter) reaches 0.
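The calling convention above is a simple state-in/state-out loop. A runnable sketch with a stand-in `step` (the PC row index and halt behavior here are toy assumptions, not the real layout; with a real model, one forward pass replaces `step`):

```python
import numpy as np

# Toy illustration of the execution loop's calling convention: feed `state`,
# get `new_state`, repeat until the program counter reads 0. The stand-in
# `step` just decrements a toy PC; a real run would invoke the ONNX model.

D_MODEL, N = 4, 8   # illustrative; far smaller than the real configs
PC_ROW = 0          # hypothetical location of the PC in this toy state

def step(state):
    """Stand-in for one forward pass: advances the toy PC toward 0."""
    out = state.copy()
    out[PC_ROW, 0] -= 1
    return out

state = np.zeros((D_MODEL, N), dtype=np.float32)
state[PC_ROW, 0] = 5  # "program" takes 5 instructions to halt
steps = 0
while state[PC_ROW, 0] != 0:
    state = step(state)
    steps += 1
print(steps)  # 5
```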
## Usage

### Quick Start (Python)

```python
import torch
from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc, OP_INC, OP_HALT

cfg = LoomConfig(s=32, m=8, n=64, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, [5, 0, 0, 0, 0, 0, 0, 0], [(OP_INC, cfg.s, 0), (OP_HALT, 0, 0)])
with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)
print('mem[0] =', read_memory(X, cfg)[0])  # 6
```
### Full C Compilation

```python
import torch
from loom_v1 import LoomConfig, LoomComputer, init_state, read_memory, get_pc
from c_compiler import compile_c
from subleq import signed_from_bipolar

source = """
int main() {
    int a; int b; int t; int i;
    a = 0; b = 1; i = 0;
    while (i < 10) { t = a + b; a = b; b = t; i = i + 1; }
    return a;
}
"""

cfg, mem, cmds, meta = compile_c(source, s=32, m=160, n=512, N=8)
comp = LoomComputer(cfg)
X = init_state(cfg, mem, cmds)
with torch.no_grad():
    while get_pc(X, cfg) != 0:
        X = comp.step(X)

a = signed_from_bipolar(X[cfg.idx_memory:cfg.idx_memory + cfg.N, cfg.s + meta['variables']['a']])
print(f"fib(10) = {a}")  # 55
```
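The semantics of `signed_from_bipolar` are not spelled out here; a plausible sketch, assuming an N-bit bipolar (±1) column decoded LSB-first into a two's-complement signed integer (bit order and mapping are assumptions, not taken from the source):

```python
# Assumed semantics of signed_from_bipolar (NOT taken from the repository):
# an N-bit column of bipolar values in {-1, +1} decodes, LSB first, to a
# two's-complement signed integer.

def signed_from_bipolar_sketch(bits):
    """bits: iterable of +/-1, LSB first, length N."""
    N = len(bits)
    u = sum(((b + 1) // 2) << i for i, b in enumerate(bits))  # -1 -> 0, +1 -> 1
    return u - (1 << N) if u >= (1 << (N - 1)) else u         # two's complement

print(signed_from_bipolar_sketch([+1, +1, +1, -1, -1, -1, -1, -1]))  # 7
print(signed_from_bipolar_sketch([+1] * 8))                          # -1
```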
## Validation
- 42 opcode unit tests (all pass)
- 19 SWAP integration tests (all pass)
- 50 compiled C program tests including Fibonacci, GCD, sorting, LOAD/STORE round-trips (all pass)
- FPGA hardware verification on Xilinx Alveo U200 (INC test pass)
## Browser Demos
Interactive demos run entirely client-side via ONNX Runtime Web:
- Sorting with real-time architecture visualization (3D layer activations)
- C debugger with source highlighting and variable watch
- Playable Snake game (84 transformer steps per tick)
- 9x9 Sudoku solver
- DOOM raycasting
## Limitations
- 8-bit signed integers (-128 to 127) by default
- One instruction per forward pass (no pipelining)
- No multiplication/division in hardware (software emulation via MULACC)
- FIND requires unique values in the search array
- LOAD/STORE require valid in-range pointers
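To make the multiplication limitation concrete, here is a sketch of how multiplication can be emulated in software given only add/shift primitives, in the spirit of a multiply-accumulate loop, with results wrapped to the 8-bit signed range. This is illustrative only, not Loom's actual MULACC microcode.

```python
# Software multiplication under the 8-bit signed constraint: a shift-and-add
# loop accumulating mod 2^8. Illustrative; not Loom's actual MULACC routine.

def wrap8(x):
    """Wrap to 8-bit two's complement, i.e. into [-128, 127]."""
    return ((x + 128) & 0xFF) - 128

def mul8(a, b):
    """Multiply via repeated shift-and-add, wrapping at each accumulation."""
    acc = 0
    bu = b & 0xFF                  # treat the multiplier as 8 unsigned bits
    for i in range(8):
        if (bu >> i) & 1:
            acc = wrap8(acc + (a << i))
    return acc

print(mul8(7, 6))    # 42
print(mul8(16, 16))  # 0  (256 wraps to 0 in 8 bits)
```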
## Citation

```bibtex
@misc{turkcan2026loomscalableanalyticalneural,
  title={Loom: A Scalable Analytical Neural Computer Architecture},
  author={Mehmet Kerem Turkcan},
  year={2026},
  eprint={2604.08816},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2604.08816}
}
```