metadata
license: mit
base_model: Qwen/Qwen3.5-9B
tags:
  - hipfire
  - amd
  - rdna
  - quantized
  - qwen3.5
library_name: hipfire

Qwen3.5-9B for hipfire

Pre-quantized Qwen3.5-9B (DeltaNet hybrid) for hipfire, a Rust-native LLM inference engine for AMD RDNA GPUs.

Quantized from Qwen/Qwen3.5-9B.

Files

| File | Quant | Size | Min VRAM | Speed (5700 XT) |
|---|---|---|---|---|
| qwen3.5-9b.q4.hfq | HFQ4 | 4.5 GB | 6 GB | 45 tok/s |
| qwen3.5-9b.hfq6.hfq | HFQ6 | 6.8 GB | 8 GB | 37 tok/s |
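
As a rough guide to reading the table above, the choice between the two files can be sketched as a small shell helper. `pick_quant` is a hypothetical name and not part of hipfire. It assumes VRAM is given in whole GB and that the higher-quality file is preferred whenever it fits.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not a hipfire command.
# Thresholds are the "Min VRAM" values from the table above.
# ASSUMPTION: prefer HFQ6 (quality) whenever the card meets its minimum.
pick_quant() {
  local vram_gb=$1
  if [ "$vram_gb" -ge 8 ]; then
    echo "qwen3.5-9b.hfq6.hfq"   # HFQ6, 8 GB minimum
  elif [ "$vram_gb" -ge 6 ]; then
    echo "qwen3.5-9b.q4.hfq"     # HFQ4, 6 GB minimum
  else
    echo "none"                  # below the smallest listed minimum
  fi
}

pick_quant 8   # prints qwen3.5-9b.hfq6.hfq
```

On an 8 GB card where throughput matters more than quality, HFQ4 remains the faster option according to the same table.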

Usage

# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3.5:9b
hipfire run qwen3.5:9b "Hello"

Quantization Formats

  • HFQ4: 4-bit, 256-weight groups (0.53 bytes per weight, B/w). Best speed.
  • HFQ6: 6-bit, 256-weight groups (0.78 B/w). Best quality. ~18% slower than HFQ4 on the 5700 XT (37 vs 45 tok/s).

Both include embedded tokenizer and model config.
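
The B/w figures above are consistent with packed weights plus a small amount of per-group metadata. The sketch below reproduces them under the assumption of 8 metadata bytes per 256-weight group (for example, a 4-byte scale and a 4-byte offset). That overhead is inferred from the rounded numbers, not taken from the hipfire format definition, and `bytes_per_weight` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Bytes per weight for a group-quantized format.
# ASSUMPTION: 8 bytes of metadata per group (inferred, not from the hipfire spec).
# Arguments: bits per weight, group size (default 256), metadata bytes (default 8).
bytes_per_weight() {
  local bits=$1 group=${2:-256} meta_bytes=${3:-8}
  LC_ALL=C awk -v b="$bits" -v g="$group" -v m="$meta_bytes" \
    'BEGIN { printf "%.2f\n", (g * b / 8 + m) / g }'
}

bytes_per_weight 4   # prints 0.53  (128 packed bytes + 8 metadata bytes per group)
bytes_per_weight 6   # prints 0.78  (192 packed bytes + 8 metadata bytes per group)
```

Multiplying these figures by the parameter count gives only a ballpark file size, since the files also embed the tokenizer and model config.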

About hipfire

Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4). No Python in the hot path. 9x faster than llama.cpp+ROCm on the same hardware.

License

Model weights are subject to the original Qwen license. The hipfire engine is MIT-licensed.