🚀 Quick Start with Docker (Recommended)

You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).

1. Pull the Docker Image

docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest

2. Run the Inference Server

For detailed configuration and usage, visit the GitHub Repository.
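The card doesn't spell out the run command itself; below is a hedged sketch assuming the image ships the standard `llama-server` binary from llama.cpp and that the GGUF weights have been downloaded into the current directory. The image's actual entrypoint, paths, and defaults are assumptions — check the GitHub repository for the authoritative invocation.

```shell
# Serve the model on port 8080 with all layers offloaded to the GPU.
# Assumptions: the image exposes llama-server, and <model>.gguf is the
# filename of the weights you downloaded from this repository.
docker run --gpus all -p 8080:8080 -v "$PWD":/models \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  llama-server -m /models/<model>.gguf \
    --host 0.0.0.0 --port 8080 -ngl 99

# Once the server is up, it answers a basic health check:
curl http://localhost:8080/health
```

`-ngl 99` offloads all layers to the GPU; lower it if you need to split the model between GPU and CPU memory.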


Qwen3.5-2B-DGX-Spark-GGUF

This repository contains GGUF-quantized weights for Qwen3.5-2B, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.

🚀 Key Features

  • Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
  • Quantization: Q4_K_M (4-bit mixed K-quantization) for a strong balance of speed and quality.
  • Base Model Integration: Linked directly to the original Qwen/Qwen3.5-2B.
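To see what 4-bit quantization buys on a 2B model, here is a rough back-of-envelope footprint estimate. The ~4.85 bits/weight figure is a commonly cited average for Q4_K_M (not an exact value — it varies by tensor layout), and the 2B parameter count is taken from this card.

```python
# Rough footprint estimate for Q4_K_M vs. F16 weights.
# Assumption: Q4_K_M averages roughly 4.85 bits per weight
# (mixed 4/6-bit K-quant blocks plus per-block scales).
PARAMS = 2_000_000_000          # 2B parameters, per the model card
BITS_PER_WEIGHT_Q4_K_M = 4.85   # approximate average for Q4_K_M
BITS_PER_WEIGHT_F16 = 16.0      # half-precision baseline

q4_bytes = PARAMS * BITS_PER_WEIGHT_Q4_K_M / 8
f16_bytes = PARAMS * BITS_PER_WEIGHT_F16 / 8

print(f"Q4_K_M: ~{q4_bytes / 1e9:.2f} GB")                  # ~1.21 GB
print(f"F16:    ~{f16_bytes / 1e9:.2f} GB")                 # ~4.00 GB
print(f"Compression vs F16: {f16_bytes / q4_bytes:.1f}x")   # 3.3x
```

At roughly 1.2 GB of weights, the quantized model fits comfortably in GPU memory alongside the KV cache.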

⚖️ License & Attribution

This model is a quantized version of the original Qwen/Qwen3.5-2B and is subject to the Qwen License Agreement.

By using this model, you agree to comply with Alibaba Cloud / Qwen's licensing terms.

📂 Files Included

  • qwen3.5-2b-q4_k_m.gguf: Main model weights.
  • qwen3.5-2b-mmproj-f16.gguf: Multimodal vision projector.
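Both files above use the GGUF container format. As a minimal sketch, the fixed 24-byte GGUF header (magic, version, tensor count, metadata key/value count, all little-endian) can be parsed like this; the demo runs on a synthetic in-memory header, and the tensor/metadata counts shown are made-up placeholders, not this model's real values.

```python
import struct

def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed 24-byte GGUF header: 4-byte magic, uint32
    version, uint64 tensor count, uint64 metadata KV count (LE)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Demo on a synthetic header; with a real file you would pass its
# first 24 bytes instead (the counts 291 and 24 are placeholders).
dummy = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(dummy))
# → {'version': 3, 'tensors': 291, 'metadata_kv': 24}
```

The metadata key/value section that follows the header is where loaders such as llama.cpp read the architecture, context length, and tokenizer.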

Created using DGX-Spark-llama.cpp-Bench

📊 Model Details

  • Format: GGUF (4-bit)
  • Model size: 2B params
  • Architecture: qwen35