Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 6 days ago • 116
Jackrong/Negentropy-claude-opus-4.7-9B-GGUF Image-Text-to-Text • 9B • Updated 5 days ago • 6.22k • 24
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 326
🍎 Qwopus3.6 Collection This collection features the advanced Qwopus3.6 series of multimodal large models, which are fine-tuned from the Qwen3.6 base models with a focus on e • 4 items • Updated 6 days ago • 41
Thinking with Drafting: Optical Decompression via Logical Reconstruction Paper • 2602.11731 • Published Feb 12 • 35