🤗 Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.
Repo: raincandy-u/Rain-100M
Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only
Tokenizer: custom 16k BPE, context length 4096
Architecture: 12 Transformer layers, hidden size 768, 12 attention heads, MLP size 2048, SiLU activation, bf16 weights
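For reference, here is a rough sketch of how the stated hyperparameters could map onto transformers' Qwen3Config. The head dimension and KV-head count are not given above, so the values marked below are my assumptions, not confirmed settings:

```python
# Illustrative sketch only: the stated Rain-100M hyperparameters expressed as a
# transformers Qwen3Config. head_dim and num_key_value_heads are assumptions.
from transformers import Qwen3Config

config = Qwen3Config(
    vocab_size=16_000,             # custom 16k BPE tokenizer
    hidden_size=768,
    intermediate_size=2048,        # MLP size
    num_hidden_layers=12,
    num_attention_heads=12,
    num_key_value_heads=12,        # assumption: plain MHA, no GQA stated in the post
    head_dim=64,                   # assumption: 768 / 12
    hidden_act="silu",
    max_position_embeddings=4096,  # context length
)
print(config)
```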
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
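Getting started should look roughly like this, assuming the repo ships standard transformers-compatible weights and tokenizer files (check the model card for the exact snippet):

```python
# Minimal sketch: load Rain-100M as a plain causal LM and sample a continuation.
# Since it is a raw base model, prompt it with plain text, not a chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "raincandy-u/Rain-100M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```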