Will there be a small model for speculative decoding?
#71
by Regrin
Hello!
I care a lot about model inference speed. One of the easiest ways to improve it without sacrificing quality is speculative decoding.
I have a slow computer and would like to try it, but the problem is that there is no small draft model for Gemma 4.
Could you perhaps train a tiny draft model for speculative decoding?
That would be REALLY REALLY REALLY good.
Hello! I'm not sure about Google's plans, but we have trained a speculator model for use with vLLM, which is available here: https://huggingface.co/RedHatAI/gemma-4-31B-it-speculator.eagle3. We intend to keep iterating on it over the next week. Feel free to try it out!
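
For anyone who wants to try it, here is a minimal sketch of how an EAGLE-3 speculator like the one linked above is typically wired up in vLLM via `speculative_config`. The verifier (target) model name and the `num_speculative_tokens` value below are assumptions for illustration; swap in the actual Gemma 4 checkpoint you are running and tune the token count for your hardware.

```python
# Minimal sketch: speculative decoding in vLLM with an EAGLE-3 draft model.
# Assumptions: the verifier model id "google/gemma-4-31b-it" and
# num_speculative_tokens=3 are illustrative, not confirmed values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="google/gemma-4-31b-it",  # assumed verifier (target) model
    speculative_config={
        "method": "eagle3",
        # Draft/speculator model from the link above.
        "model": "RedHatAI/gemma-4-31B-it-speculator.eagle3",
        # Number of tokens the draft proposes per verification step.
        "num_speculative_tokens": 3,
    },
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

Because the verifier accepts or rejects the drafted tokens, output quality matches the base model; the speedup depends on how often the draft's guesses are accepted, so it is worth experimenting with `num_speculative_tokens` on your own workload.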