---
license: agpl-3.0
library_name: pytorch
tags:
- tiny-lm
- goldfish
- transformer
- rope
- swiglu
pipeline_tag: text-generation
base_model: []
---

# GlubLM (36M)

> *the language model that already forgot this sentence*

**GlubLM** is a 36-million-parameter transformer that plays the character of a goldfish with a 10-second memory. Inspired by [GuppyLM](https://github.com/arman-bd/guppylm) by Arman BD and Ted Lasso's meditation on the goldfish as "the happiest animal on earth", GlubLM has a hard 96-token context window: it *physically* cannot remember what was just said.

Try it live: [browser demo](https://den-sec.github.io/glublm/) | [pixel-art desk pet](https://den-sec.github.io/glublm/desk-pet/)

## Architecture

- **Parameters**: 36,055,680 (36.1M; derivation sketched below)
- **Layers**: 8 decoder-only transformer blocks
- **Hidden dim**: 640
- **Attention heads**: 10 (head dim 64)
- **FFN dim**: 1280 (SwiGLU, effective intermediate 2560)
- **Normalization**: RMSNorm
- **Position encoding**: Rotary (RoPE)
- **Vocabulary**: 5,120 byte-level BPE tokens
- **Max context**: 96 tokens (hard cap, the "10-second memory")
- **Weight-tied LM head**
- **No bias terms**
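
The headline parameter count can be reproduced from the numbers above. This is a back-of-envelope sketch, not code from the repo, and it assumes the usual pre-norm layout (two RMSNorm gains per block plus one final RMSNorm); with no biases and a weight-tied LM head, the embedding is the only vocabulary-sized matrix:

```python
# Back-of-envelope check of the 36,055,680 figure; block layout assumed.
vocab, d, n_layers, d_ffn = 5120, 640, 8, 1280

embed = vocab * d              # token embedding, shared with the LM head
attn = 4 * d * d               # Q, K, V, O projections (no biases)
swiglu = 3 * d * d_ffn         # gate + up (the "effective 2560") + down
norms = 2 * d                  # two RMSNorm gains per block
total = embed + n_layers * (attn + swiglu + norms) + d  # + d: final RMSNorm

print(f"{total:,}")  # 36,055,680
```

The sum lands exactly on the stated total, which is good evidence the assumed layout matches the real one.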

## Intended use

This model is a toy. It exists to:

1. Explore the design tension between "small + simple" (GuppyLM's thesis) and "small + modern" (GlubLM's hypothesis)
2. Demonstrate an LLM-generated dataset pipeline using a multi-agent Claude team
3. Be a fun browser demo and a pixel-art desk pet companion

**Do not use GlubLM for anything serious.** It literally forgets within a sentence.

## Training data

Trained on [`DenSec02/glublm-60k-ted`](https://huggingface.co/datasets/DenSec02/glublm-60k-ted), a 60,549-sample dataset of single-turn goldfish conversations generated by a team of four coordinated Claude agents (generator, critic, diversifier, persona-guardian). Composition: the v4 balanced mix (20K poetic + 15K supplement + 5K conversational + 15K forgetful), augmented with the v5.1 empathic/introspective hotfix (1K samples) and the v5.2 multi-anchor self-awareness recovery set (500 samples).

**Explicit exclusions**: no references to football, soccer, coaches, teams, or any Ted Lasso show characters.

## Training

- **Hardware**: NVIDIA RTX 3060 12GB (local)
- **Framework**: PyTorch 2.x, BF16 mixed precision
- **Optimizer**: AdamW (β1=0.9, β2=0.95), weight decay 0.1
- **LR schedule**: cosine with 5% warmup, peak 3e-4 (sketched below)
- **Batch size**: 64
- **Epochs**: 15
- **Dropout**: 0.1 (residual), 0.0 (attention)
- **Gradient clipping**: 1.0
- **Final loss**: 1.1442
- **Wall time**: ~15 minutes
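
A minimal PyTorch sketch of the optimizer and LR schedule above, assuming a generic step-based loop; `model` is a stand-in module and `total_steps` a placeholder, neither taken from the GlubLM codebase:

```python
import math
import torch

model = torch.nn.Linear(640, 640, bias=False)  # stand-in for GlubLM
total_steps = 1_000                            # placeholder step count

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)
warmup_steps = int(0.05 * total_steps)  # 5% linear warmup

def cosine_with_warmup(step: int) -> float:
    # Multiplier on the peak LR: linear ramp up, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, cosine_with_warmup)

# Each step, after loss.backward():
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```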

## Evaluation (v2 cross-model judge)

Dual-judge evaluation using Claude Sonnet 4.6 and Opus 4.7 on a 30-prompt rubric across 4 axes (integer 1-5 scale). Each axis aggregates 30 prompts × 3 seeds × 2 passes = 180 scoring rows per judge.

### Per-axis score (mean)

| Axis | Sonnet 4.6 | Opus 4.7 |
|---|---:|---:|
| Conversational Quality | 4.01 | 4.15 |
| Goldfish Identity | 3.89 | 3.67 |
| Forgetful Trait | 3.80 | 3.81 |
| Length Appropriateness | 4.77 | 4.57 |

### Cross-judge agreement (Cohen's quadratic-weighted kappa)

| Axis | Kappa | Interpretation |
|---|---:|---|
| Conversational Quality | 0.77 | substantial |
| Goldfish Identity | 0.83 | almost perfect |
| Forgetful Trait | 0.86 | almost perfect |
| Length Appropriateness | 0.59 | moderate |

**Interpretation**: Sonnet and Opus agree almost perfectly on 3 of 4 axes, validating that the rubric is consistently interpretable across LLM judges. Opus is systematically ~0.2 points stricter than Sonnet on the Identity axis (stricter rubric application, not judge bias).
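
Quadratic-weighted kappa penalizes disagreements by the square of their distance on the 1-5 scale, so a 4-vs-5 split costs far less than a 2-vs-5 one. A toy sketch with made-up ratings (the real per-row scores live in the eval report linked below):

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative 1-5 ratings only, not actual eval data.
sonnet_scores = [4, 5, 3, 4, 4, 2, 5, 4]
opus_scores   = [4, 4, 3, 4, 5, 2, 5, 3]

kappa = cohen_kappa_score(sonnet_scores, opus_scores, weights="quadratic")
print(f"quadratic-weighted kappa: {kappa:.2f}")
```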

Full methodology + 108-row long-format scores: [`eval/report_crossmodel.md`](https://github.com/Den-Sec/glublm/blob/master/eval/report_crossmodel.md).

## Limitations & biases

- **Hard context limit**: 96 tokens. Inputs longer than a few short sentences will be truncated (see the sketch after this list).
- **Goldfish worldview**: the model genuinely does not understand human abstractions outside the bowl.
- **Dataset bias**: the dataset was generated by Claude (Anthropic), so it inherits Claude's language patterns filtered through the goldfish persona.
- **Single-turn only**: multi-turn memory is a non-goal.
- **English only**.
- **Stochastic and occasionally incoherent**: 36M params on 60K samples is small. Do not expect reliability.
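
Because the cap is hard, a client should drop old tokens from the left before generating. A hypothetical helper, not part of the repo:

```python
MAX_CONTEXT = 96  # GlubLM's hard cap

def clamp_to_context(token_ids: list[int]) -> list[int]:
    """Keep only the most recent tokens the model can still 'remember'."""
    return token_ids[-MAX_CONTEXT:]
```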

## How to use

```python
from glublm.config import ModelConfig
from glublm.model import GlubLM
from glublm.tokenizer import GlubTokenizer
from glublm.inference import generate
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model

# Fetch the tokenizer and weights from the Hub.
tok_path = hf_hub_download("DenSec02/glublm-36m", "tokenizer.json")
weights_path = hf_hub_download("DenSec02/glublm-36m", "model.safetensors")

# Build the model and load the safetensors checkpoint in place.
tok = GlubTokenizer.from_file(tok_path)
cfg = ModelConfig(vocab_size=tok.vocab_size)
model = GlubLM(cfg)
load_model(model, weights_path)

print(generate(model=model, tokenizer=tok, prompt="hello", max_new_tokens=24))
```

Or try it in-browser with zero setup:

- [Chat demo](https://den-sec.github.io/glublm/) (simple web UI)
- [Desk pet companion](https://den-sec.github.io/glublm/desk-pet/) (pixel-art PWA)
- [Colab notebook](https://colab.research.google.com/github/Den-Sec/glublm/blob/master/notebooks/train_colab.ipynb) (train your own goldfish)

## License

AGPL-3.0 - see [LICENSE](https://github.com/Den-Sec/glublm/blob/master/LICENSE).

## Citation

```bibtex
@software{glublm_2026,
  author = {Sepede, Dennis},
  title  = {GlubLM: a 36M goldfish language model with a 10-second memory},
  year   = {2026},
  url    = {https://github.com/Den-Sec/glublm}
}
```