Mr. Tic Tac Toe
Play Tic Tac Toe against a small RL-tuned model
Datasets and models for transforming LFM2 2.6B into a Tic Tac Toe master using RL environments. Free course: https://t.ly/4jIFq
Note: Play against the model!
Note: Synthetic Tic Tac Toe data for SFT warm-up, generated using gpt-5-mini and an RL environment
Note: Synthetic Tic Tac Toe data, filtered by removing losing games
Note: Model after SFT warm-up
Note: LoRA adapter after first RL phase
Note: Standalone model after first RL phase
Note: LoRA adapter after second RL phase
Note: Standalone model after second RL phase
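
The filtering step mentioned in the notes (removing losing games from the synthetic data) might look roughly like the sketch below. It is only an illustration: the record layout (a `moves` list of cell indices 0 to 8 with X moving first, and a `player` field naming the side whose moves are being learned) and the helper names `winner` and `drop_losing_games` are assumptions, not the actual dataset schema or pipeline.

```python
from typing import Optional

# The eight winning lines on a 3x3 board whose cells are indexed 0..8, row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(moves: list[int]) -> Optional[str]:
    """Replay the moves (X first, then alternating) and return "X", "O", or None."""
    board: list[Optional[str]] = [None] * 9
    for turn, cell in enumerate(moves):
        mark = "X" if turn % 2 == 0 else "O"
        board[cell] = mark
        # Only the side that just moved can have completed a line.
        for a, b, c in WIN_LINES:
            if board[a] == board[b] == board[c] == mark:
                return mark
    return None  # draw or unfinished game


def drop_losing_games(games: list[dict]) -> list[dict]:
    """Keep games that the learning side (hypothetical `player` field) won or drew."""
    kept = []
    for game in games:
        result = winner(game["moves"])
        if result is None or result == game["player"]:
            kept.append(game)
    return kept


# Hypothetical records, not taken from the real dataset.
games = [
    {"player": "X", "moves": [0, 3, 1, 4, 2]},              # X completes the top row
    {"player": "X", "moves": [0, 3, 1, 4, 8, 5]},           # O completes the middle row
    {"player": "X", "moves": [0, 1, 2, 4, 3, 5, 7, 6, 8]},  # full board, no winner
]
filtered = drop_losing_games(games)
```

Draws are kept here on the assumption that a draw is an acceptable outcome in Tic Tac Toe; a stricter filter could keep wins only.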