Part of the collection LFM2 2.6B Mr. Tic Tac Toe ❌ ⭕: dataset and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL Environments. Free course: https://t.ly/4jIFq
LoRA adapter (rank 8) from the second round of CISPO training for Tic Tac Toe.
This adapter must be loaded on top of the RL round 1 merged model, anakin87/LFM2-2.6B-ttt-rl-merged. A version with the adapter already merged is available as anakin87/LFM2-2.6B-mr-tictactoe.
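
Loading the adapter on top of the round 1 merged model could look roughly like the sketch below. It assumes the `transformers` and `peft` libraries are installed and that this adapter lives in the repo `anakin87/LFM2-2.6B-ttt-rl-2` (an assumption based on the model names in the results table). The function name `load_adapter_model` is hypothetical.

```python
# Base model the adapter was trained on (stated in this model card).
BASE_MODEL_ID = "anakin87/LFM2-2.6B-ttt-rl-merged"
# ASSUMPTION: repo id of this adapter, inferred from the results table.
ADAPTER_ID = "anakin87/LFM2-2.6B-ttt-rl-2"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model, then attach the LoRA adapter to it."""
    # Imported lazily so this file can be imported without the heavy libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wraps the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads both repos on first use. If you do not need the adapter as a separate artifact, loading the already merged model with `AutoModelForCausalLM.from_pretrained` alone is likely simpler.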
This is a checkpoint from 🎓 LLM RL Environments Lil Course, a hands-on course on building RL environments for Language Models, where models learn from rewards, not examples. It walks through the full process of turning a small open model into a specialist that outperforms a large proprietary one on a specific task (Tic Tac Toe).
🤗🕹️ Play against the final model
Evaluation results, 100 games per setting.
| Model vs random opponent | % Wins | % Draws | % Losses | % Follows format | % Games w invalid moves |
|---|---|---|---|---|---|
| LiquidAI/LFM2-2.6B | 40 | 11 | 49 | 27.8 | 40 |
| anakin87/LFM2-2.6B-ttt-sft | 74 | 13 | 13 | 99.8 | 11 |
| anakin87/LFM2-2.6B-ttt-rl | 86 | 12 | 2 | 100 | 1 |
| anakin87/LFM2-2.6B-ttt-rl-2 | 90 | 10 | 0 | 100 | 0 |

| Model vs optimal opponent | % Wins | % Draws | % Losses | % Follows format | % Games w invalid moves |
|---|---|---|---|---|---|
| LiquidAI/LFM2-2.6B | 0 | 11 | 89 | 24.7 | 43 |
| anakin87/LFM2-2.6B-ttt-sft | 0 | 52 | 48 | 99 | 14 |
| anakin87/LFM2-2.6B-ttt-rl | 0 | 85 | 15 | 100 | 1 |
| anakin87/LFM2-2.6B-ttt-rl-2 | 0 | 97 | 3 | 99.8 | 0 |
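
To make the two opponent types concrete, here is a minimal sketch of a random opponent and an optimal (minimax) opponent, plus a tally loop. This is not the course's evaluation code: the board encoding, the function names, and the choice that the evaluated player is `X` and moves first are all assumptions made for illustration. A random policy stands in for the language model.

```python
import random
from collections import Counter
from functools import lru_cache

# All eight three-in-a-row lines on a board stored as a 9-character string.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that mark has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def place(board, i, mark):
    return board[:i] + mark + board[i + 1:]


def other(mark):
    return "O" if mark == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board, to_move):
    """Value for the side to move: +1 forced win, 0 draw, -1 forced loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == to_move else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    # Negamax: my best value is the negation of the opponent's best reply.
    return max(-minimax(place(board, i, to_move), other(to_move)) for i in moves)


def optimal_move(board, mark):
    """Pick a move that leaves the opponent in the worst position."""
    return max(legal_moves(board),
               key=lambda i: -minimax(place(board, i, mark), other(mark)))


def random_move(board, mark):
    return random.choice(legal_moves(board))


def play(x_policy, o_policy):
    """Play one game (X moves first) and return 'X', 'O', or 'draw'."""
    board, mark = " " * 9, "X"
    policies = {"X": x_policy, "O": o_policy}
    while winner(board) is None and legal_moves(board):
        board = place(board, policies[mark](board, mark), mark)
        mark = other(mark)
    return winner(board) or "draw"


def tally(x_policy, o_policy, n_games=100):
    """Count outcomes over n_games, mirroring the 100-games-per-setting setup."""
    return Counter(play(x_policy, o_policy) for _ in range(n_games))
```

For example, `tally(random_move, optimal_move)` never records a win for `X`, because an optimal player cannot be beaten. This matches the 0% win column in the table above: against an optimal opponent, a draw is the best achievable outcome.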
Base model: LiquidAI/LFM2-2.6B