πŸš€ Quick Start with Docker (Recommended)

You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).

1. Pull the Docker Image

docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest

2. Run the Inference Server

The exact run command depends on your model path and GPU setup. For detailed configuration and usage, see the project's GitHub repository.
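As a starting point, a typical llama.cpp-style server invocation against this image might look like the sketch below. The mount path, port, and entrypoint flags are assumptions, not the image's documented interface; confirm them against the repository before relying on this.

```shell
# Sketch: serve the Q4_K_M GGUF from a local ./models directory.
# Paths, port, and flags are illustrative assumptions.
docker run --gpus all --rm -p 8080:8080 \
  -v "$PWD/models:/models" \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  -m /models/lfm2.5-1.2b-instruct-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080
```

If the image wraps llama.cpp's llama-server, the running container should expose an OpenAI-compatible HTTP API (e.g. POST /v1/chat/completions on port 8080) that you can query with curl or any OpenAI client.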


LFM2.5-1.2B-Instruct-DGX-Spark-GGUF

This repository contains GGUF-quantized weights for LFM2.5-1.2B-Instruct, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.

πŸš€ Key Features

  • Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
  • Quantization:
    • Q4_K_M: Balanced performance and accuracy.
    • Q8_0: High precision preservation.
  • Base Model Integration: Linked directly to the original LiquidAI/LFM2.5-1.2B-Instruct.

βš–οΈ License & Attribution

This model is a quantized version of the original LiquidAI/LFM2.5-1.2B-Instruct and is subject to its original license.

πŸ“‚ Files Included

  • lfm2.5-1.2b-instruct-q4_k_m.gguf: 4-bit quantized model.
  • lfm2.5-1.2b-instruct-q8_0.gguf: 8-bit quantized model.

Created using DGX-Spark-llama.cpp-Bench

Model details: GGUF format, 1B params, lfm2 architecture.