πŸš€ Quick Start with Docker (Recommended)

You can easily run this model using the DGX-Spark-llama.cpp-Bench inference engine. It's pre-configured for high-performance inference on NVIDIA hardware (especially Blackwell/DGX Spark).

1. Pull the Docker Image

docker pull ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest

2. Run the Inference Server

The exact run command depends on your model path and GPU setup. For detailed configuration and usage, see the project's GitHub repository.
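As a starting point, a typical llama.cpp-style server invocation against this image might look like the sketch below. The mount path, port, and entrypoint flags are assumptions, not the image's documented interface; confirm them against the repository before relying on this.

```shell
# Sketch: serve the Q4_K_M GGUF from a local ./models directory.
# Paths, port, and flags are illustrative assumptions.
docker run --gpus all --rm -p 8080:8080 \
  -v "$PWD/models:/models" \
  ghcr.io/sowilow/dgx-spark-llama.cpp-bench:latest \
  -m /models/lfm2.5-1.2b-instruct-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080
```

If the image wraps llama.cpp's llama-server, the running container should expose an OpenAI-compatible HTTP API (e.g. POST /v1/chat/completions on port 8080) that you can query with curl or any OpenAI client.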


LFM2.5-1.2B-Instruct-DGX-Spark-GGUF

This repository contains GGUF-quantized weights for LFM2.5-1.2B-Instruct, specifically optimized for NVIDIA Blackwell (DGX Spark) hardware.

πŸš€ Key Features

  • Hardware Optimized: Built with CUDA 13.0 and SM121 (Blackwell) native acceleration.
  • Quantization:
    • Q4_K_M: Balanced performance and accuracy.
    • Q8_0: High precision preservation.
  • Base Model Integration: Linked directly to the original LiquidAI/LFM2.5-1.2B-Instruct.

βš–οΈ License & Attribution

This model is a quantized version of the original LiquidAI/LFM2.5-1.2B-Instruct and is subject to its original license.

πŸ“‚ Files Included

  • lfm2.5-1.2b-instruct-q4_k_m.gguf: 4-bit quantized model.
  • lfm2.5-1.2b-instruct-q8_0.gguf: 8-bit quantized model.

Created using DGX-Spark-llama.cpp-Bench

Model details: GGUF format, 1B params, lfm2 architecture.