LFM2 2.6B Mr. Tic Tac Toe ❌ ⭕ - an anakin87 Collection

updated 9 days ago

Datasets and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL environments. Free course: https://t.ly/4jIFq
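As a rough illustration of what an RL environment for Tic Tac Toe might involve, here is a minimal sketch in Python. It is not the environment used to train these models. The class name `TicTacToeEnv`, the 0-8 cell indexing, and the reward values (+1 for a win, 0 for a draw or an ongoing game, -1 for an illegal move) are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


class TicTacToeEnv:
    """Hypothetical two-player environment; players alternate, X moves first."""

    def __init__(self) -> None:
        self.board: List[str] = [" "] * 9
        self.current = "X"
        self.done = False

    def legal_moves(self) -> List[int]:
        """Indices of the cells that are still empty."""
        return [i for i, cell in enumerate(self.board) if cell == " "]

    def step(self, move: int) -> Tuple[float, bool]:
        """Play `move` for the current player and return (reward, done).

        The reward is from the perspective of the player who just moved.
        """
        if self.done:
            raise RuntimeError("game is already over")
        if move not in self.legal_moves():
            # Assumed penalty: an illegal move ends the episode with -1.
            self.done = True
            return -1.0, True
        self.board[move] = self.current
        if winner(self.board) == self.current:
            self.done = True
            return 1.0, True
        if not self.legal_moves():
            # Board is full with no winner: a draw.
            self.done = True
            return 0.0, True
        self.current = "O" if self.current == "X" else "X"
        return 0.0, False
```

In a setup like this, a language model would emit a move as text, a parser would map it to a cell index, and the reward from `step` would drive the RL update. Penalizing illegal moves is one common design choice, not necessarily the one used here.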

Mr. Tic Tac Toe ⭕

Running on Zero • Agents • RL • 2

Play Tic Tac Toe against a small RL-tuned model

Note: Play against the model!

anakin87/tictactoe

Viewer • Updated 12 days ago • 200 • 22

Note: Synthetic Tic Tac Toe data for SFT warm-up, generated using gpt-5-mini and an RL environment

anakin87/tictactoe-filtered

Viewer • Updated 12 days ago • 174 • 20

Note: Synthetic Tic Tac Toe data, filtered by removing losing games

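The filtering step described in this note can be sketched in plain Python. The record layout below (a `result` field holding "win", "draw", or "loss" from the model's side) is a hypothetical schema for illustration only; the actual columns of `anakin87/tictactoe` are not shown on this page and may differ.

```python
from typing import Dict, List


def drop_losing_games(games: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only games whose (hypothetical) `result` field is not "loss"."""
    return [g for g in games if g.get("result") != "loss"]


# Toy records standing in for synthetic game transcripts.
sample_games = [
    {"id": "g1", "result": "win"},
    {"id": "g2", "result": "loss"},
    {"id": "g3", "result": "draw"},
]

kept = drop_losing_games(sample_games)
print([g["id"] for g in kept])  # ['g1', 'g3']
```

Keeping draws alongside wins is an assumption here, based on the note mentioning only that losing games were removed.
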
anakin87/LFM2-2.6B-ttt-sft

Text Generation • 3B • Updated 12 days ago • 8

Note: Model after SFT warm-up

anakin87/LFM2-2.6B-ttt-sft

Text Generation • Updated 12 days ago

Note: LoRA adapter after first RL phase

anakin87/LFM2-2.6B-ttt-rl-merged

Text Generation • 3B • Updated 12 days ago • 11

Note: Standalone model after first RL phase

anakin87/LFM2-2.6B-ttt-rl-2

Text Generation • Updated 12 days ago • 9

Note: LoRA adapter after second RL phase

anakin87/LFM2-2.6B-mr-tictactoe

Text Generation • 3B • Updated 12 days ago • 284

Note: Standalone model after second RL phase