---
license: mit
base_model: Qwen/Qwen3.5-9B
tags:
- hipfire
- amd
- rdna
- quantized
- qwen3.5
library_name: hipfire
---
# Qwen3.5-9B for hipfire

Pre-quantized **Qwen3.5-9B** (DeltaNet hybrid) for [hipfire](https://github.com/Kaden-Schutt/hipfire), a Rust-native LLM inference engine for AMD RDNA GPUs.

Quantized from [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B).
## Files

| File | Quant | Size | Min VRAM | Speed (RX 5700 XT) |
|------|-------|------|----------|--------------------|
| qwen3.5-9b.q4.hfq | HFQ4 | 4.5 GB | 6 GB | 45 tok/s |
| qwen3.5-9b.hfq6.hfq | HFQ6 | 6.8 GB | 8 GB | 37 tok/s |
## Usage

```bash
# Install hipfire
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash

# Pull and run
hipfire pull qwen3.5:9b
hipfire run qwen3.5:9b "Hello"
```
## Quantization Formats

- **HFQ4**: 4-bit, 256-weight groups (0.53 B/w). Best speed.
- **HFQ6**: 6-bit, 256-weight groups (0.78 B/w). Best quality, ~15% slower.

Both formats embed the tokenizer and model config.
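The bytes-per-weight figures follow from the group layout: the quantized bits themselves plus a small amount of per-group metadata amortized over each 256-weight group. The sketch below reproduces the published numbers under one assumption: the per-group overhead (~7.7 bytes, e.g. a scale plus a small header) is inferred from the 0.53/0.78 B/w figures, not taken from the HFQ spec.

```rust
/// Storage cost per weight for a group-quantized format:
/// the raw quantized bits plus per-group metadata spread
/// across the group. `group_overhead_bytes` is an assumption
/// inferred from the published B/w numbers, not from the spec.
fn bytes_per_weight(bits: u32, group_size: u32, group_overhead_bytes: f64) -> f64 {
    bits as f64 / 8.0 + group_overhead_bytes / group_size as f64
}

fn main() {
    let overhead = 7.7; // hypothetical per-group metadata, in bytes
    for (name, bits) in [("HFQ4", 4u32), ("HFQ6", 6u32)] {
        let bpw = bytes_per_weight(bits, 256, overhead);
        // Rough total for ~9B parameters, ignoring unquantized tensors.
        let total_gb = 9.0e9 * bpw / 1e9;
        println!("{name}: {bpw:.2} B/w, ~{total_gb:.1} GB of weights");
    }
}
```

With these assumptions, HFQ4 lands at 0.53 B/w and HFQ6 at 0.78 B/w, matching the list above; the totals come out near the file sizes in the table (small differences are expected from unquantized tensors and GB/GiB rounding).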
## About hipfire

hipfire is a Rust + HIP inference engine for AMD consumer GPUs (RDNA1–RDNA4), with no Python in the hot path. It runs 9x faster than llama.cpp with ROCm on the same hardware.

- GitHub: [Kaden-Schutt/hipfire](https://github.com/Kaden-Schutt/hipfire)
- All models: [docs/MODELS.md](https://github.com/Kaden-Schutt/hipfire/blob/master/docs/MODELS.md)
## License

The model weights are subject to the original [Qwen license](https://huggingface.co/Qwen/Qwen3.5-9B). The hipfire engine itself is MIT-licensed.