---
license: mit
base_model: Qwen/Qwen3.5-9B
tags:
- hipfire
- amd
- rdna
- quantized
- qwen3.5
library_name: hipfire
---
# Qwen3.5-9B for hipfire
Pre-quantized **Qwen3.5-9B** (DeltaNet hybrid) for [hipfire](https://github.com/Kaden-Schutt/hipfire), a Rust-native LLM inference engine for AMD RDNA GPUs.
Quantized from [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B).
## Files
| File | Quant | Size | Min VRAM | Speed (5700 XT) |
|------|-------|------|----------|-----------------|
| qwen3.5-9b.q4.hfq | HFQ4 | 4.5GB | 6GB | 45 tok/s |
| qwen3.5-9b.hfq6.hfq | HFQ6 | 6.8GB | 8GB | 37 tok/s |
## Usage
```bash
# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash
# Pull and run
hipfire pull qwen3.5:9b
hipfire run qwen3.5:9b "Hello"
```
## Quantization Formats
- **HFQ4**: 4-bit quantization, 256-weight groups (0.53 bytes/weight). Best speed.
- **HFQ6**: 6-bit quantization, 256-weight groups (0.78 bytes/weight). Best quality; ~15% slower than HFQ4.
Both include embedded tokenizer and model config.
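The bytes-per-weight figures follow from the group layout: 256 weights at 4 bits is 128 bytes of packed nibbles, and an assumed 8 bytes of per-group metadata (an f32 scale plus an f32 minimum — the actual HFQ on-disk layout is not documented here) gives (128 + 8) / 256 = 0.53125 bytes/weight, matching the table. A minimal sketch of this style of asymmetric group quantization, under those assumptions:

```rust
// Sketch of 4-bit group quantization with 256-weight groups.
// Assumes per-group metadata of one f32 scale + one f32 min (8 bytes);
// the real HFQ format may differ.

const GROUP: usize = 256;

/// Quantize one group of weights to packed 4-bit codes plus (scale, min).
fn quantize_group(w: &[f32; GROUP]) -> ([u8; GROUP / 2], f32, f32) {
    let min = w.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = w.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 15.0; // 4 bits -> 16 levels
    let mut packed = [0u8; GROUP / 2];
    for i in 0..GROUP / 2 {
        let q = |x: f32| (((x - min) / scale).round() as u8).min(15);
        // Two 4-bit codes per byte: low nibble, then high nibble.
        packed[i] = q(w[2 * i]) | (q(w[2 * i + 1]) << 4);
    }
    (packed, scale, min)
}

/// Reconstruct f32 weights from the packed codes.
fn dequantize_group(packed: &[u8; GROUP / 2], scale: f32, min: f32) -> [f32; GROUP] {
    let mut out = [0.0f32; GROUP];
    for i in 0..GROUP / 2 {
        out[2 * i] = (packed[i] & 0x0F) as f32 * scale + min;
        out[2 * i + 1] = (packed[i] >> 4) as f32 * scale + min;
    }
    out
}

fn main() {
    // Synthetic weights spread across [-1, 1].
    let mut w = [0.0f32; GROUP];
    for i in 0..GROUP {
        w[i] = (i as f32 / (GROUP - 1) as f32) * 2.0 - 1.0;
    }
    let (packed, scale, min) = quantize_group(&w);
    let deq = dequantize_group(&packed, scale, min);

    // Round-trip error is bounded by half a quantization step.
    let max_err = w
        .iter()
        .zip(&deq)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0, f32::max);
    assert!(max_err <= scale / 2.0 + 1e-6);

    // Storage cost: 128 bytes of nibbles + 8 bytes metadata per 256 weights.
    let bytes_per_weight = (GROUP as f32 / 2.0 + 8.0) / GROUP as f32;
    println!("bytes/weight = {bytes_per_weight}"); // 0.53125
}
```

The same arithmetic gives HFQ6's figure: (192 + 8) / 256 = 0.78125 bytes/weight.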
## About hipfire
Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4). No Python in the hot path. 9x faster than llama.cpp+ROCm on the same hardware.
- GitHub: [Kaden-Schutt/hipfire](https://github.com/Kaden-Schutt/hipfire)
- All models: [docs/MODELS.md](https://github.com/Kaden-Schutt/hipfire/blob/master/docs/MODELS.md)
## License
Model weights subject to original [Qwen license](https://huggingface.co/Qwen/Qwen3.5-9B). hipfire engine: MIT.