# ChessTransformer200M — Latest (most-trained)
This repo always contains the most recently checkpointed model from the ongoing training run. It is not necessarily the highest-accuracy model; for that, see avewright/chess-transformer-200m-v2.
Checkpoints are auto-uploaded every 2000 optimizer steps (~2M positions), so no training progress is lost.
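For reference, a minimal sketch of how such periodic uploads can be wired into a training loop with `huggingface_hub`; the repo id and the `save_checkpoint` stub are illustrative assumptions, not the repo's actual training code:

```python
# Sketch: upload the latest checkpoint to the Hub every N optimizer steps.
# Assumes `huggingface_hub` is installed and HF_TOKEN has write access;
# the repo id and save_checkpoint stub are illustrative assumptions.
from huggingface_hub import HfApi

REPO_ID = "avewright/chess-transformer-200m-latest"  # assumed repo id
UPLOAD_EVERY = 2000  # optimizer steps, ~2M positions per upload

api = HfApi()

def save_checkpoint(path: str) -> None:
    """Stand-in for serializing model weights + config to `path`."""
    ...

def maybe_upload(step: int, path: str = "checkpoint/") -> None:
    # Call once per optimizer step from the training loop.
    if step % UPLOAD_EVERY == 0:
        save_checkpoint(path)
        api.upload_folder(
            folder_path=path,
            repo_id=REPO_ID,
            commit_message=f"auto-upload at step {step}",
        )
```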
## Architecture
- Encoder: FusedBoardEncoder (256d) — 13-token piece-color embeddings
- Backbone: 16-layer Transformer (1024d, 16 heads, FFN 4096, GELU, norm_first)
- Policy Head: SpatialPolicyHead (from×to square features, 512d)
- Value Head: WDL (win/draw/loss) classification
- Total params: ~204M
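A minimal PyTorch sketch of this layout, with module names mirroring the list above; internal details such as the 256d→1024d projection, the square-position term, and the value-head pooling are assumptions, not the repo's actual code:

```python
# Hedged structural sketch of the architecture described above.
# Module boundaries follow the list; internals are assumptions.
import torch
import torch.nn as nn

class FusedBoardEncoder(nn.Module):
    """Embeds each of the 64 squares with one of 13 piece-color tokens
    (6 pieces x 2 colors + empty square) into a 256d vector."""
    def __init__(self, d_embed: int = 256):
        super().__init__()
        self.piece = nn.Embedding(13, d_embed)
        self.square = nn.Embedding(64, d_embed)  # assumed positional term

    def forward(self, board: torch.Tensor) -> torch.Tensor:
        # board: (batch, 64) integer piece-color ids
        sq = torch.arange(64, device=board.device)
        return self.piece(board) + self.square(sq)

class SpatialPolicyHead(nn.Module):
    """Scores moves by pairing from-square and to-square features (512d)."""
    def __init__(self, d_model: int = 1024, d_head: int = 512):
        super().__init__()
        self.from_proj = nn.Linear(d_model, d_head)
        self.to_proj = nn.Linear(d_model, d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 64, d_model) -> (batch, 64, 64) from->to move logits
        f = self.from_proj(x)
        t = self.to_proj(x)
        return f @ t.transpose(1, 2) / (512 ** 0.5)

class ChessTransformer200M(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = FusedBoardEncoder(256)
        self.input_proj = nn.Linear(256, 1024)   # assumed 256d -> 1024d bridge
        layer = nn.TransformerEncoderLayer(
            d_model=1024, nhead=16, dim_feedforward=4096,
            activation="gelu", norm_first=True, batch_first=True,
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=16)
        self.policy_head = SpatialPolicyHead(1024, 512)
        self.value_head = nn.Linear(1024, 3)     # WDL: win/draw/loss logits

    def forward(self, board: torch.Tensor):
        h = self.backbone(self.input_proj(self.encoder(board)))
        policy = self.policy_head(h)             # (batch, 64, 64)
        wdl = self.value_head(h.mean(dim=1))     # pooled value, assumed
        return policy, wdl
```

With these dimensions, the 16-layer backbone alone accounts for roughly 200M parameters (each layer carries ~4.2M attention and ~8.4M FFN weights), consistent with the ~204M total.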
## Training
- Dataset: avewright/chess-positions-lichess-sf (~832M source-sharded positions)
- Base model: avewright/chess-transformer-200m-v2
- Experiment: exp076_continue_v2
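To use a checkpoint locally, the weights can be pulled straight from the Hub; a sketch, assuming the repo id and weights filename below (check the repo's file listing for the actual names):

```python
# Sketch: fetch the newest checkpoint from the Hub for local use.
# Repo id and filename are assumptions; check the repo's file listing.
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="avewright/chess-transformer-200m-latest",  # assumed repo id
    filename="pytorch_model.bin",                       # assumed filename
)
state_dict = torch.load(path, map_location="cpu")
print(f"loaded {len(state_dict)} tensors, "
      f"{sum(t.numel() for t in state_dict.values()) / 1e6:.0f}M params")
```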