Entity
The verified scaffold between you and any LLM. Entity is a CLI/TUI that sits
between you and a model you choose (via /model), applying structured edits and
verification before writing — with a dark-cyan interface and an animated ASCII
octopus mascot. It ships with Entity-Bench, the proof-of-concept benchmark.
Read THESIS.md (the claim + the honest scorecard), ARCHITECTURE.md (design), and BLUEPRINT.md (the full build spec that generated this repo).
Install
pip install -e ".[full]" # from source, with TUI + live LLM calls
# system-wide (Ubuntu/Debian/WSL):
bash packaging/build_deb.sh && sudo dpkg -i dist/entity_0.1.0_all.deb
# Fedora/RHEL or any distro:
sudo bash packaging/install.sh
Core installs dependency-free; [full] adds rich, prompt_toolkit, httpx.
Use
entity # launch the TUI (octopus banner, entity› prompt)
entity --plain # no-TUI line mode (for pipes / dumb terminals)
entity --version
Inside the TUI: /model to connect an LLM, then just chat.
entity› /model
entity› /learner none # parametric learner is OPTIONAL (none|bitnet|lora|custom)
entity› /edit entity-ast
entity› /verifier z3
Full command list: docs/cli.md. The /model page: docs/model-config.md.
Benchmark
entity bench run --dataset mock --n 128 --out runs/a # -> PASS
Metrics & scorecard: docs/metrics.md and THESIS.md. Note: mock/synthetic
datasets are an offline illustration of the pipeline, not evidence — the real
evidence is the real study below.
Documentation, paper & empirical study
MANUAL.md: the complete user manual (install, every slash command, the /model wizard, the verification gate, the benchmark, packaging, FAQ).

paper/entity.pdf: the pre-print "Entity: A Verified Scaffold Between Language Models and Source Code" (compile from paper/entity.tex with tectonic).

experiments/: the reproducible empirical study. The headline evidence is a real study: a real model (Claude) implementing real library functions (benchmarks/real/), judged by a real differential oracle and a real Z3 gate, with output tokens counted by a real tokenizer (tiktoken o200k_base). The model's solutions are archived in benchmarks/real/solutions.jsonl, so the verification half of the study reproduces deterministically with no model or network access. Also included: real Z3 proofs on a contract corpus, real dense-vs-lexical retrieval, a modelled sensitivity analysis (explicitly not evidence), and CodeCarbon energy accounting.
pip install -e ".[study]"
python -m experiments.run_all --out results --seed 0 # writes results/*.json
python -m experiments.figures --results results --out figures # layered SVG + PDF
Headline results — measured, with honest caveats
n = 25 real functions; effect sizes and bootstrap CIs, no p-value theater.
| Result | Value |
|---|---|
| Token economy, whole-entity edit (real tokenizer, real files) | median −53% vs search/replace, −97% vs whole-file rewrite (100% of tasks favour entity). Caveat: a small localized change is cheaper as a unified diff — the win is for whole-entity rewrites. |
| Pass@1 (real model, real differential oracle) | 1.00 under a light battery; 0.96 under a strengthened battery: the honest oracle caught 1 of 25 patches that was shallow (subtly wrong) yet accepted by light testing |
| Invalid patches written past the gate | 0 (invalid_rate = 0.00) |
| Specification coverage (the hard question) | 60% of functions admit a checkable output post-condition; 0% admit a Z3 end-to-end body proof. The formal gate is sound where a spec exists (gate accuracy 1.00 on the contract corpus) but covers a narrow slice of real edits |
| Verified-memory retrieval (disjoint paraphrases) | dense p@1 = 0.50 vs lexical 0.35 vs chance 0.125 (modest, but beats the baselines) |
| Energy / carbon (CodeCarbon) | order ~10⁻⁵ kWh / ~10⁻⁶ kg CO₂eq; see results/summary.json |
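As a rough illustration of what "effect sizes and bootstrap CIs" means for a median such as the token-economy row, the sketch below computes a percentile bootstrap interval for the median of per-task relative savings. It is not the study's code. The function name bootstrap_ci_median is hypothetical, and the 25 input values are synthetic placeholders generated on the spot, not the measured results (those live in results/*.json).

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci_median(
    values: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the median: resample with replacement,
    take the median of each resample, then read off the alpha/2 and
    1 - alpha/2 quantiles of those medians."""
    rng = random.Random(seed)
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


# SYNTHETIC placeholder data: 25 made-up relative token changes (negative = saving).
demo_rng = random.Random(42)
savings = [demo_rng.uniform(-0.70, -0.35) for _ in range(25)]

point = statistics.median(savings)
ci_low, ci_high = bootstrap_ci_median(savings)
print(f"median change {point:+.0%}, 95% CI [{ci_low:+.0%}, {ci_high:+.0%}]")
```

Reporting the point estimate with its interval keeps the focus on how large the effect is and how uncertain it is at n = 25, rather than on a p-value.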
What we removed and why. Earlier versions led with −85% token savings vs
diff, a composite −30% token reduction at p<10⁻¹⁶⁰, and a metric "algebra".
The first was an over-estimate; the second came from a Wilcoxon test on a
hardcoded constant (it measured sample size, not an effect); the third is a
weighted scorecard whose weights are author-chosen. All three are gone. See
THESIS.md and the paper's Limitations section.
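Why a test on a hardcoded constant measures sample size can be seen with a short calculation. The sketch below swaps in the simpler two-sided sign test as a stand-in for the Wilcoxon test named above (an assumption made for clarity, not a reconstruction of the removed analysis): if every paired difference has the same sign, the p-value is 2 · 0.5ⁿ, which depends on n alone. The function name sign_test_log10_p is hypothetical.

```python
from math import log10


def sign_test_log10_p(n: int) -> float:
    """log10 of the two-sided sign-test p-value when all n paired
    differences share one sign: p = 2 * 0.5**n."""
    return (1 - n) * log10(2)


def smallest_n_below(log10_threshold: float) -> int:
    """Smallest n whose p-value drops below 10**log10_threshold."""
    n = 1
    while sign_test_log10_p(n) >= log10_threshold:
        n += 1
    return n


for n in (10, 100, 1000):
    print(f"n = {n:4d}  log10(p) = {sign_test_log10_p(n):.1f}")

n_needed = smallest_n_below(-160)
print(f"p < 1e-160 is reached at n = {n_needed}, whatever the size of the constant")
```

The size of the constant never enters the formula, so an arbitrarily small p-value follows from adding rows, not from a larger effect.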
License
Apache-2.0.