🚀 Quick Start with Docker (Recommended)

You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).

1. Pull the Docker Image

docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest

2. Run the Inference Server

For detailed configuration and usage, visit the GitHub Repository.
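The card doesn't spell out the run command itself; below is a hedged sketch assuming the image ships the standard `llama-server` binary from llama.cpp and that the GGUF weights have been downloaded into the current directory. The image's actual entrypoint, paths, and defaults are assumptions — check the GitHub repository for the authoritative invocation.

```shell
# Serve the model on port 8080 with all layers offloaded to the GPU.
# Assumptions: the image exposes llama-server, and <model>.gguf is the
# filename of the weights you downloaded from this repository.
docker run --gpus all -p 8080:8080 -v "$PWD":/models \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  llama-server -m /models/<model>.gguf \
    --host 0.0.0.0 --port 8080 -ngl 99

# Once the server is up, it answers a basic health check:
curl http://localhost:8080/health
```

`-ngl 99` offloads all layers to the GPU; lower it if you need to split the model between GPU and CPU memory.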


Qwen3.5-2B-DGX-Spark-GGUF

This repository contains GGUF-quantized weights for Qwen3.5-2B, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.

🚀 Key Features

  • Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
  • Quantization: Q4_K_M (4-bit mixed K-quantization) for a strong balance of speed and quality.
  • Base Model Integration: Linked directly to the original Qwen/Qwen3.5-2B.
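To see what 4-bit quantization buys on a 2B model, here is a rough back-of-envelope footprint estimate. The ~4.85 bits/weight figure is a commonly cited average for Q4_K_M (not an exact value — it varies by tensor layout), and the 2B parameter count is taken from this card.

```python
# Rough footprint estimate for Q4_K_M vs. F16 weights.
# Assumption: Q4_K_M averages roughly 4.85 bits per weight
# (mixed 4/6-bit K-quant blocks plus per-block scales).
PARAMS = 2_000_000_000          # 2B parameters, per the model card
BITS_PER_WEIGHT_Q4_K_M = 4.85   # approximate average for Q4_K_M
BITS_PER_WEIGHT_F16 = 16.0      # half-precision baseline

q4_bytes = PARAMS * BITS_PER_WEIGHT_Q4_K_M / 8
f16_bytes = PARAMS * BITS_PER_WEIGHT_F16 / 8

print(f"Q4_K_M: ~{q4_bytes / 1e9:.2f} GB")                  # ~1.21 GB
print(f"F16:    ~{f16_bytes / 1e9:.2f} GB")                 # ~4.00 GB
print(f"Compression vs F16: {f16_bytes / q4_bytes:.1f}x")   # 3.3x
```

At roughly 1.2 GB of weights, the quantized model fits comfortably in GPU memory alongside the KV cache.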

⚖️ License & Attribution

This model is a quantized version of the original Qwen/Qwen3.5-2B and is subject to the Qwen License Agreement.

By using this model, you agree to comply with Alibaba Cloud / Qwen's licensing terms.

📂 Files Included

  • qwen3.5-2b-q4_k_m.gguf: Main model weights.
  • qwen3.5-2b-mmproj-f16.gguf: Multimodal vision projector.
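Both files above use the GGUF container format. As a minimal sketch, the fixed 24-byte GGUF header (magic, version, tensor count, metadata key/value count, all little-endian) can be parsed like this; the demo runs on a synthetic in-memory header, and the tensor/metadata counts shown are made-up placeholders, not this model's real values.

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: 4-byte magic, uint32
    version, uint64 tensor count, uint64 metadata KV count (LE)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Demo on a synthetic header; with a real file you would pass its
# first 24 bytes instead (the counts 291 and 24 are placeholders).
dummy = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(dummy))
# → {'version': 3, 'tensors': 291, 'metadata_kv': 24}
```

The metadata key/value section that follows the header is where loaders such as llama.cpp read the architecture, context length, and tokenizer.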

Created using DGX-Spark-llama.cpp-Bench

📊 Model Details

  • Format: GGUF (4-bit)
  • Model size: 2B params
  • Architecture: qwen35