Fast, lossless LLM inference via dual-view diffusion decoding.
- chiennv/Orthrus-Qwen3-4B
  Text Generation • 5B • Updated • 44 • 3
- chiennv/Orthrus-Qwen3-8B
  Text Generation • 10B • Updated • 937 • 7
- chiennv/Orthrus-Qwen3-1.7B
  Text Generation • 2B • Updated • 72 • 3
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
  Paper • 2605.12825 • Published • 10