Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Group Tree Optimization (GTO) is a framework designed to address draft policy misalignment in speculative decoding. While standard methods optimize for a single greedy path, GTO aligns training with the actual tree-based decoding policy used during inference. This is achieved through a Draft Tree Reward objective and a stable Group-based Draft Policy Training scheme.
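The misalignment can be made concrete with a toy calculation. The sketch below is a hypothetical illustration (not the official GTO code or its exact reward): it compares the expected number of accepted draft tokens for a single greedy chain versus a small draft tree under a simplified acceptance model, which is the kind of tree-level quantity a Draft Tree Reward targets.

```python
# Hypothetical sketch: each draft node is (p, children), where p is the
# probability that the verifier accepts that drafted token given its prefix
# was accepted. At any node the verifier matches at most one child, so
# sibling probabilities are exclusive and sum to at most 1.

def expected_accepted(children):
    """Expected number of accepted draft tokens for a draft tree."""
    return sum(p * (1.0 + expected_accepted(grand)) for p, grand in children)

# A depth-2 greedy chain: first token accepted w.p. 0.5, its continuation
# accepted w.p. 0.5 after that.
chain = [(0.5, [(0.5, [])])]

# A breadth-2 tree spending the same two-token draft budget on two
# alternative first tokens (accepted w.p. 0.5 and 0.3).
tree = [(0.5, []), (0.3, [])]

print(expected_accepted(chain))  # ≈ 0.75
print(expected_accepted(tree))   # ≈ 0.80
```

With the same draft budget, the tree covers more of the target distribution than the greedy chain, so a draft model trained only for the greedy path undervalues exactly the branches that tree decoding relies on at inference time.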

Performance

GTO achieves state-of-the-art acceleration for LLM inference:

  • 5.6x faster than vanilla autoregressive decoding.
  • 7% faster than previous state-of-the-art methods like EAGLE-3.

Usage

To use this model for accelerated inference, please follow the setup instructions in the official GTO repository.

Inference via Web UI

The codebase provides a web interface for testing the acceleration. After setting up the environment and cloning the repo, you can run:

python -m application.webui --ea-model-path [path of GTO weight] \
    --base-model-path [path of the original model] \
    --model-type [vicuna|llama3|qwen] \
    --total-token [int]

The --total-token flag sets the number of draft tokens. Tuning it for your specific device and model can yield better speedups.

Citation

If you find this work useful, please cite:

@article{hu2025bridging,
  title={Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding},
  author={Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  journal={arXiv preprint arXiv:2509.22134},
  year={2025}
}

Acknowledgements

The implementation is based on the open-source repository of EAGLE. This project has been influenced by many projects in the LLM community, such as HASS and GRIFFIN.
