ChessTransformer200M — Latest (most-trained)

This repo always contains the most recently checkpointed model from ongoing training. It may not be the highest-accuracy model — for that, see chess-transformer-200m-v2.

A checkpoint is auto-uploaded every 2000 optimizer steps (~2M positions), so training progress is preserved even if a run is interrupted.
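The "~2M positions" figure follows from the upload cadence. As a quick sanity check, assuming a hypothetical effective batch size of 1024 positions per optimizer step (not stated in this card):

```python
# Checkpoint cadence sanity check.
# positions_per_step = 1024 is an ASSUMPTION for illustration; the actual
# effective batch size is not documented in this card.
steps_per_upload = 2000
positions_per_step = 1024  # assumed effective batch size

positions_per_upload = steps_per_upload * positions_per_step
print(positions_per_upload)  # 2048000, i.e. roughly 2M positions per upload
```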

Architecture

  • Encoder: FusedBoardEncoder (256d) — 13-token piece-color embeddings
  • Backbone: 16-layer Transformer (1024d, 16 heads, FFN 4096, GELU, norm_first)
  • Policy Head: SpatialPolicyHead (from×to square features, 512d)
  • Value Head: WDL (win/draw/loss) classification
  • Total params: ~204M
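The ~204M total is consistent with the backbone dimensions listed above. A rough back-of-the-envelope count (a sketch only; the exact total depends on implementation details such as biases and the encoder/head definitions, which are not spelled out here):

```python
# Approximate parameter count for a 16-layer Transformer backbone
# (1024d, FFN 4096), counting standard attention, FFN, and LayerNorm weights.
d_model, n_layers, d_ffn = 1024, 16, 4096

attn = 4 * d_model * d_model + 4 * d_model   # Q/K/V/out projections + biases
ffn = 2 * d_model * d_ffn + d_ffn + d_model  # two linear layers + biases
norms = 2 * 2 * d_model                      # two LayerNorms (weight + bias each)
per_layer = attn + ffn + norms

backbone = n_layers * per_layer
print(f"backbone ~{backbone / 1e6:.1f}M params")  # ~201.5M
```

The encoder embeddings and the policy/value heads account for the remaining few million parameters, bringing the total near the stated ~204M.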

Training

  • Dataset: avewright/chess-positions-lichess-sf (~832M source-sharded positions)
  • Base model: avewright/chess-transformer-200m-v2
  • Experiment: exp076_continue_v2