Wave 7: Phase 2-4 of deep work loop — backlog, parallel research, three ADRs
ac4bfb4

Deep Work Loop Log — Composer 2.5 Replication Framework

Started: 2026-05-26
Operator: Codeseys (Hermes Agent autonomous loop)
Skill: deep-work-loop v1.0.0

Vision

Take any HuggingFace model → RL-train it further using:

  1. RLVR (tests-pass reward),
  2. SDPO/hint-distillation (Composer 2.5's "targeted RL with textual feedback"),
  3. multi-teacher trace-replay DPO,

all integrated against TRL/VeRL/OpenEnv with a DiLoCo-style outer-loop sync.
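As a rough illustration of the "tests-pass reward" in item 1, the sketch below scores a rollout by running a test command in its workspace and returning a binary reward. The helper name `tests_pass_reward`, its signature, the default pytest command, and the 0/1 reward shape are all assumptions made for illustration; the log does not specify how the framework computes this reward.

```python
import subprocess
import sys
from typing import Optional, Sequence

# Assumed default: run pytest quietly with the current interpreter.
DEFAULT_TEST_COMMAND = (sys.executable, "-m", "pytest", "-q")


def tests_pass_reward(
    test_command: Sequence[str] = DEFAULT_TEST_COMMAND,
    cwd: Optional[str] = None,
    timeout_s: float = 60.0,
) -> float:
    """Return 1.0 if the test command exits with status 0, else 0.0.

    Hypothetical helper: not the framework's actual API.
    """
    try:
        result = subprocess.run(
            list(test_command),
            cwd=cwd,
            capture_output=True,  # keep test output off the caller's stdout
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable test run is scored as a failure.
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```

A trainer would call this once per rollout, pointing `cwd` at the directory that holds the model's edited code. A binary reward is the simplest choice; a fraction-of-tests-passed variant would need to parse the test runner's output.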

Output: a published, reproducible framework — the "Composer 2.5 replication" the open ecosystem is missing.
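The "DiLoCo-style outer loop sync" named in the vision can be sketched on plain Python lists. Each worker trains locally for some number of inner steps, then the outer step averages the per-worker parameter deltas and applies them to the shared parameters with Nesterov momentum. The function names, the flat-list parameter representation, and the default `outer_lr` and `momentum` values are illustrative assumptions, not settings taken from this log.

```python
from typing import List, Sequence, Tuple


def outer_gradient(
    global_params: Sequence[float],
    worker_params: Sequence[Sequence[float]],
) -> List[float]:
    """Average over workers of (global - worker), one value per parameter."""
    k = len(worker_params)
    return [
        sum(g - w[i] for w in worker_params) / k
        for i, g in enumerate(global_params)
    ]


def outer_step(
    global_params: Sequence[float],
    worker_params: Sequence[Sequence[float]],
    momentum_buf: Sequence[float],
    outer_lr: float = 0.7,   # assumed value for illustration
    momentum: float = 0.9,   # assumed value for illustration
) -> Tuple[List[float], List[float]]:
    """One outer update: SGD with Nesterov momentum on the averaged delta."""
    delta = outer_gradient(global_params, worker_params)
    new_buf = [momentum * b + d for b, d in zip(momentum_buf, delta)]
    new_params = [
        p - outer_lr * (d + momentum * b)
        for p, d, b in zip(global_params, delta, new_buf)
    ]
    return new_params, new_buf
```

With `outer_lr=1.0` and `momentum=0.0` the update reduces to plain averaging of the worker parameters, which is a convenient sanity check for a smoke test.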

Starting state

  • HEAD: 040eff8 (Wave 6: vision validation self-audit, 5/10 scorecard)
  • Tests: 38/38 green in spikes/005-integrated-trainer-skeleton/
  • Working tree: clean

Phase ledger

| Phase | Description | Status | Started | Done |
|---|---|---|---|---|
| 1 | commit-state | | 2026-05-26 | 2026-05-26 |
| 2 | backlog-audit (BACKLOG.md from VISION_VALIDATION) | | 2026-05-26 | 2026-05-26 |
| 3 | parallel-research (3 subagents) | 🟡 | 2026-05-26 | |
| 4 | architect with ADRs (ADR-001..003) | | | |
| 5 | plan in waves (W7–W10) | | | |
| 6 | execute W7 — Spike 006 (real HF model smoke) | | | |
| 7 | execute W8 — Spike 007 (real trace ingestion) | | | |
| 8 | execute W9 — Spike 008 (DiLoCo smoke) | | | |
| 9 | execute W10 — packaging | | | |
| 10 | (Modal-gated) Spike 002a-mini real GPU smoke | | | |
| 11 | cross-model-final-review | | | |
| 12 | update scorecard + push | | | |

Constraints

  • Verify ALL claims against primary sources (Wave 2 lesson — subagent synthesis is not evidence).
  • Tests must pass before commit.
  • Memory L1 is at 99% — write to L2 wiki + L3 fact_store, not L1.
  • Modal budget: $20 hard cap for this loop. Anything more goes to user for approval.
  • Do not mix upload_file with git push; use git push hf master:main only.
  • Commit messages are passed via git commit -F /tmp/<wave>-commit-msg.txt.
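Several of the constraints above are mechanical enough to encode as small guards. The sketch below is one way an agent loop might check the Modal budget cap and build the commit and push commands. The helper names and the wave label format (for example `"w7"`) are hypothetical; only the $20 cap, the `-F /tmp/<wave>-commit-msg.txt` pattern, and the `git push hf master:main` command come from the list.

```python
from pathlib import PurePosixPath
from typing import List

MODAL_BUDGET_CAP_USD = 20.0  # hard cap for this loop, per the constraints

# The only push path allowed by the constraints (no upload_file mixing).
PUSH_COMMAND = ["git", "push", "hf", "master:main"]


def needs_user_approval(
    spent_usd: float,
    next_job_usd: float,
    cap_usd: float = MODAL_BUDGET_CAP_USD,
) -> bool:
    """True if launching the next job would exceed the budget cap."""
    return spent_usd + next_job_usd > cap_usd


def commit_message_path(wave: str) -> PurePosixPath:
    """Path of the commit message file for a wave label (hypothetical format)."""
    return PurePosixPath("/tmp") / f"{wave}-commit-msg.txt"


def commit_command(wave: str) -> List[str]:
    """Build the git commit command that reads its message from a file."""
    return ["git", "commit", "-F", str(commit_message_path(wave))]
```

Returning the commands as argument lists, rather than shell strings, lets the caller pass them straight to `subprocess.run` without quoting issues.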