ChessTransformer200M — Latest (most-trained)

This repo always contains the most recently checkpointed model from ongoing training. It may not be the highest-accuracy model — for that, see chess-transformer-200m-v2.

A checkpoint is auto-uploaded every 2000 optimizer steps (~2M positions), so training progress is preserved even if a run is interrupted.
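The "~2M positions" figure follows from the upload cadence. As a quick sanity check, assuming a hypothetical effective batch size of 1024 positions per optimizer step (not stated in this card):

```python
# Checkpoint cadence sanity check.
# positions_per_step = 1024 is an ASSUMPTION for illustration; the actual
# effective batch size is not documented in this card.
steps_per_upload = 2000
positions_per_step = 1024  # assumed effective batch size

positions_per_upload = steps_per_upload * positions_per_step
print(positions_per_upload)  # 2048000, i.e. roughly 2M positions per upload
```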

Architecture

  • Encoder: FusedBoardEncoder (256d) — 13-token piece-color embeddings
  • Backbone: 16-layer Transformer (1024d, 16 heads, FFN 4096, GELU, norm_first)
  • Policy Head: SpatialPolicyHead (from×to square features, 512d)
  • Value Head: WDL (win/draw/loss) classification
  • Total params: ~204M
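The ~204M total is consistent with the backbone dimensions listed above. A rough back-of-the-envelope count (a sketch only; the exact total depends on implementation details such as biases and the encoder/head definitions, which are not spelled out here):

```python
# Approximate parameter count for a 16-layer Transformer backbone
# (1024d, FFN 4096), counting standard attention, FFN, and LayerNorm weights.
d_model, n_layers, d_ffn = 1024, 16, 4096

attn = 4 * d_model * d_model + 4 * d_model   # Q/K/V/out projections + biases
ffn = 2 * d_model * d_ffn + d_ffn + d_model  # two linear layers + biases
norms = 2 * 2 * d_model                      # two LayerNorms (weight + bias each)
per_layer = attn + ffn + norms

backbone = n_layers * per_layer
print(f"backbone ~{backbone / 1e6:.1f}M params")  # ~201.5M
```

The encoder embeddings and the policy/value heads account for the remaining few million parameters, bringing the total near the stated ~204M.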

Training

  • Dataset: avewright/chess-positions-lichess-sf (~832M source-sharded positions)
  • Base model: avewright/chess-transformer-200m-v2
  • Experiment: exp076_continue_v2