ibitato/c64-ministral-3-8b-thinking-c64-reasoning-gguf

Overview

GGUF exports of the C64-focused Ministral 3 8B reasoning fine-tune, ready for llama.cpp and Ollama.

Project source code and training pipeline:

Related repositories:

Technical Details

  • Derived from: mistralai/Ministral-3-8B-Reasoning-2512 + project LoRA adaptation
  • Context length in GGUF metadata: 262,144 tokens
  • Architecture in GGUF: mistral3

Training Provenance

  • DAPT checkpoint used: checkpoint-39
  • SFT checkpoint used: checkpoint-153
  • DAPT steps: 39 / 39
  • SFT steps: 153 / 153
  • Data splits: DAPT 408/27/45, SFT 1620/204/190
  • Card generated at (UTC): 2026-03-02T16:47:23.284978+00:00
  • Source git revision: 13fafe7

Included Files

File                                          Size
c64-ministral-3-8b-thinking-c64-F16.gguf      15.82 GiB
c64-ministral-3-8b-thinking-c64-Q4_K_M.gguf    4.84 GiB
c64-ministral-3-8b-thinking-c64-Q6_K.gguf      6.49 GiB
c64-ministral-3-8b-thinking-c64-Q8_0.gguf      8.41 GiB
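As a rough sanity check on the quantization levels, file size divided by parameter count gives approximate bits per weight. A sketch assuming a nominal 8e9 parameter count (the true count differs slightly, so the result is an estimate):

```shell
# Approximate bits per weight for the Q4_K_M file:
# size_in_bits / parameter_count. The 4.84 GiB size is taken from the
# table above; 8e9 params is a nominal approximation of the model size.
bpw=$(awk 'BEGIN { printf "%.2f", 4.84 * 1073741824 * 8 / 8.0e9 }')
echo "Q4_K_M ~ ${bpw} bits/weight"
```

This lands around 5.2 bits/weight, consistent with Q4_K_M mixing 4-bit blocks with higher-precision scales and a few higher-bit tensors.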

Modelfile templates are included for direct Ollama import.
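For reference, a minimal sketch of what such a Modelfile typically contains; the shipped Modelfile.Q4_K_M is authoritative and may set additional parameters or a template:

```
# Hypothetical Modelfile sketch for the Q4_K_M quant (illustrative only).
FROM ./c64-ministral-3-8b-thinking-c64-Q4_K_M.gguf
PARAMETER temperature 0.15
PARAMETER num_ctx 4096
```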

Quick Start

Ollama

ollama create c64-ministral-c64 -f Modelfile.Q4_K_M
ollama create c64-ministral-c64-q6 -f Modelfile.Q6_K
ollama create c64-ministral-c64-q8 -f Modelfile.Q8_0

llama.cpp

llama-cli -m c64-ministral-3-8b-thinking-c64-Q6_K.gguf -ngl 99 -c 4096 -n 256 -p "Explain VIC-II timing."

llama-server (OpenAI-compatible API / GUI reasoning panel)

python3 scripts/prompt_contract.py --model-profile 8b --print-full > .cache/runtime/c64_system_prompt_8b.txt
llama-server \
  -hf ibitato/c64-ministral-3-8b-thinking-c64-reasoning-gguf:F16 \
  --host 0.0.0.0 --port 8080 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget -1 \
  --system-prompt-file .cache/runtime/c64_system_prompt_8b.txt \
  --ctx-size 32768 \
  -ngl 99 \
  --temp 0.15 \
  --threads "$(nproc)" \
  --fit on

Use --reasoning-format none to keep the raw [THINK]...[/THINK] tags inline in the content field instead of having the server split reasoning into a separate field.
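Once llama-server is up, its OpenAI-compatible endpoint can be exercised with curl. A sketch (model name, prompt, and sampling values are illustrative) that builds the request body and validates it locally before sending:

```shell
# Build an illustrative chat-completions request body for the llama-server
# OpenAI-compatible endpoint (model name and prompt are placeholders).
cat > /tmp/c64_request.json <<'EOF'
{
  "model": "c64-ministral",
  "messages": [
    {"role": "user", "content": "Explain VIC-II bad lines."}
  ],
  "temperature": 0.15,
  "max_tokens": 256
}
EOF

# Sanity-check that the payload is valid JSON before sending it.
python3 -m json.tool /tmp/c64_request.json > /dev/null && echo "payload ok"

# Send it to a running server (commented out so the snippet is self-contained):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/c64_request.json
```

With the server's default reasoning format, the response separates the model's reasoning from the final answer; with --reasoning-format none, both arrive together in the message content.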

Reasoning Validation Snapshot

  • Validation status: PASS
  • Source artifacts: results/reasoning_validation/8b/20260302_151302
Metric                          Value
single_think_tag_rate           1.0000
single_balanced_tag_rate        1.0000
single_final_after_think_rate   1.0000
multi_turn_retention_rate       1.0000
format_contract_pass_rate       1.0000
exact_hash_match_rate           1.0000
semantic_similarity_avg         1.0000
crash_or_timeout_rate           0.0000

Reference Throughput (legacy llama-bench data)

No current benchmark CSV was found for this profile, so legacy llama-bench values are shown.

Infrastructure used:

  • Host OS: Fedora Linux 43 (Server Edition)
  • Host kernel: 6.18.8-200.fc43.x86_64
  • CPU: AMD Ryzen AI Max+ 395 (16C/32T)
  • System RAM: 30 GiB
  • GPU: AMD Radeon 8060S (96.00 GiB VRAM visible to PyTorch)
  • Container image: rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1
  • llama.cpp revision: 2afcdb9
  • Benchmark command source: scripts/inference/benchmark_gguf_matrix.sh
Quant    pp256 (tok/s)   tg64 (tok/s)
Q4_K_M         1080.50          33.52
Q6_K            820.06          26.31
Q8_0            404.59          21.20
F16             546.18          10.68

Run bash scripts/inference/benchmark_gguf_matrix.sh --model-profile 8b to refresh this section with current CSV-based metrics.
