---
license: mit
base_model: Qwen/Qwen3.5-9B
tags:
- hipfire
- amd
- rdna
- quantized
- qwen3.5
library_name: hipfire
---
# Qwen3.5-9B for hipfire

Pre-quantized **Qwen3.5-9B** (DeltaNet hybrid) for [hipfire](https://github.com/Kaden-Schutt/hipfire), a Rust-native LLM inference engine for AMD RDNA GPUs.

Quantized from [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B).
## Files

| File | Quant | Size | Min VRAM | Speed (RX 5700 XT) |
|------|-------|------|----------|--------------------|
| qwen3.5-9b.q4.hfq | HFQ4 | 4.5 GB | 6 GB | 45 tok/s |
| qwen3.5-9b.hfq6.hfq | HFQ6 | 6.8 GB | 8 GB | 37 tok/s |
## Usage

```bash
# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3.5:9b
hipfire run qwen3.5:9b "Hello"
```
## Quantization Formats

- **HFQ4**: 4-bit, 256-weight groups (0.53 B/w). Best speed.
- **HFQ6**: 6-bit, 256-weight groups (0.78 B/w). Best quality, ~15% slower.

Both formats embed the tokenizer and model config.
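The bytes-per-weight figures follow from the group layout: the quantized bits themselves plus a small amount of per-group metadata amortized over each 256-weight group. The sketch below reproduces the published numbers under one assumption: the per-group overhead (~7.7 bytes, e.g. a scale plus a small header) is inferred from the 0.53/0.78 B/w figures, not taken from the HFQ spec.

```rust
/// Storage cost per weight for a group-quantized format:
/// the raw quantized bits plus per-group metadata spread
/// across the group. `group_overhead_bytes` is an assumption
/// inferred from the published B/w numbers, not from the spec.
fn bytes_per_weight(bits: u32, group_size: u32, group_overhead_bytes: f64) -> f64 {
    bits as f64 / 8.0 + group_overhead_bytes / group_size as f64
}

fn main() {
    let overhead = 7.7; // hypothetical per-group metadata, in bytes
    for (name, bits) in [("HFQ4", 4u32), ("HFQ6", 6u32)] {
        let bpw = bytes_per_weight(bits, 256, overhead);
        // Rough total for ~9B parameters, ignoring unquantized tensors.
        let total_gb = 9.0e9 * bpw / 1e9;
        println!("{name}: {bpw:.2} B/w, ~{total_gb:.1} GB of weights");
    }
}
```

With these assumptions, HFQ4 lands at 0.53 B/w and HFQ6 at 0.78 B/w, matching the list above; the totals come out near the file sizes in the table (small differences are expected from unquantized tensors and GB/GiB rounding).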
## About hipfire

hipfire is a Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4), with no Python in the hot path. It runs 9x faster than llama.cpp with ROCm on the same hardware.

- GitHub: [Kaden-Schutt/hipfire](https://github.com/Kaden-Schutt/hipfire)
- All models: [docs/MODELS.md](https://github.com/Kaden-Schutt/hipfire/blob/master/docs/MODELS.md)
## License

The model weights are subject to the original [Qwen license](https://huggingface.co/Qwen/Qwen3.5-9B). The hipfire engine itself is MIT-licensed.