## 🚀 Quick Start with Docker (Recommended)
You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).
### 1. Pull the Docker Image

```shell
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest
```
### 2. Run the Inference Server
For detailed configuration and usage, visit the GitHub Repository.
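A minimal server invocation, assuming the GGUF file has been downloaded into `./models` (this mirrors the full command given in the Quick Start (llama.cpp) section below):

```shell
# Serve the Q4_MXFP4 model; llama-server listens on port 8080 by default.
docker run --gpus all -v $(pwd)/models:/model \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  llama-server -m /model/gpt-oss-20b-q4_mxfp4.gguf -ngl 99 -c 8192
```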
# gpt-oss-20b-DGX-Spark-GGUF
This repository provides GGUF quantized versions of OpenAI's gpt-oss-20b, optimized specifically for NVIDIA Blackwell (DGX Spark) architectures.
These models were converted and quantized using llama.cpp with support for the gpt_oss architecture.
## Model Highlights
- Optimized for Blackwell: Specifically tuned for high-performance inference on NVIDIA DGX Spark (SM120/SM121).
- Flexible Quantization:
  - Q4_MXFP4: 4-bit MXFP4 quantization (recommended for efficiency).
  - Q8_0: 8-bit quantization (recommended for maximum precision).
- MoE Architecture: 21B total parameters with 3.6B active parameters, leveraging Mixture-of-Experts for high efficiency.
- Long Context: Supports up to 131k context length.
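The MoE efficiency claim can be made concrete with quick arithmetic on the numbers above:

```python
# Back-of-the-envelope MoE estimate, using the figures from the
# highlights: 21B total parameters, 3.6B active per token.
total_params = 21e9
active_params = 3.6e9

# Fraction of weights participating in each forward pass; compute cost
# per token scales roughly with this, not with the full parameter count.
active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")  # ~17.1%
```

So each token touches only about a sixth of the network's weights, which is where the "high efficiency" in the bullet above comes from.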
## Quantization Details
| File | Quant Method | Bitrate | Size | Description |
|---|---|---|---|---|
| gpt-oss-20b-q4_mxfp4.gguf | Q4_MXFP4 | 4.5 bpw | ~12 GB | Balanced performance and quality. |
| gpt-oss-20b-q8_0.gguf | Q8_0 | 8.5 bpw | ~22 GB | Standard 8-bit quantization. |
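As a sanity check on the sizes in the table, a quantized file's size is roughly total parameters × bits per weight ÷ 8. A short sketch (decimal GB; ignores GGUF metadata overhead, so real files differ slightly):

```python
# Estimate GGUF file size from parameter count and bits-per-weight (bpw).
PARAMS = 21e9  # gpt-oss-20b total parameter count

def est_size_gb(bpw: float) -> float:
    """Approximate file size in decimal gigabytes."""
    return PARAMS * bpw / 8 / 1e9

print(f"Q4_MXFP4: {est_size_gb(4.5):.1f} GB")  # ~11.8 GB, matching "~12 GB"
print(f"Q8_0:     {est_size_gb(8.5):.1f} GB")  # ~22.3 GB, matching "~22 GB"
```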
## Quick Start (llama.cpp)
To run these models on a DGX Spark system:
1. Pull the optimized Docker image:

```shell
docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest
```

2. Run with `llama-server`:

```shell
docker run --gpus all -v $(pwd)/models:/model \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  llama-server -m /model/gpt-oss-20b-q4_mxfp4.gguf -ngl 99 -c 8192
```
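Once the server is up, it exposes an OpenAI-compatible chat endpoint at `/v1/chat/completions`. A minimal sketch of building a request body for it (the host/port assume the llama-server default of `localhost:8080`; the `model` name is informational, since llama-server serves whichever file it was started with):

```python
import json

def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for POST /v1/chat/completions."""
    payload = {
        "model": "gpt-oss-20b",  # informational; the server loads one model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return json.dumps(payload)

body = build_chat_request("Explain MXFP4 quantization in one sentence.")
print(body)
```

Send the body with any HTTP client, e.g. `curl -X POST http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d "$BODY"`.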
## Original Model Information
This is a quantized version of openai/gpt-oss-20b. Please refer to the original model card for details on training, safety, and benchmarks.
## Citation

```bibtex
@misc{openai2025gptoss120bgptoss20bmodel,
      title={gpt-oss-120b & gpt-oss-20b Model Card},
      author={OpenAI},
      year={2025},
      eprint={2508.10925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.10925},
}
```