ibitato/c64-ministral-3-8b-thinking-c64-reasoning-gguf

Overview

GGUF exports of the C64-focused Ministral 3 8B reasoning fine-tune, ready for llama.cpp and Ollama.

Project source code and training pipeline:

Related repositories:

Technical Details

  • Derived from: mistralai/Ministral-3-8B-Reasoning-2512 + project LoRA adaptation
  • Context length in GGUF metadata: 262,144 tokens
  • Architecture in GGUF: mistral3

Training Provenance

  • DAPT checkpoint used: checkpoint-39
  • SFT checkpoint used: checkpoint-153
  • DAPT steps: 39 / 39
  • SFT steps: 153 / 153
  • Data splits: DAPT 408/27/45, SFT 1620/204/190
  • Card generated at (UTC): 2026-03-02T16:47:23.284978+00:00
  • Source git revision: 13fafe7

Included Files

File                                          Size
c64-ministral-3-8b-thinking-c64-F16.gguf      15.82 GiB
c64-ministral-3-8b-thinking-c64-Q4_K_M.gguf    4.84 GiB
c64-ministral-3-8b-thinking-c64-Q6_K.gguf      6.49 GiB
c64-ministral-3-8b-thinking-c64-Q8_0.gguf      8.41 GiB
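As a rough sanity check on the quantization levels, file size divided by parameter count gives approximate bits per weight. A sketch assuming a nominal 8e9 parameter count (the true count differs slightly, so the result is an estimate):

```shell
# Approximate bits per weight for the Q4_K_M file:
# size_in_bits / parameter_count. The 4.84 GiB size is taken from the
# table above; 8e9 params is a nominal approximation of the model size.
bpw=$(awk 'BEGIN { printf "%.2f", 4.84 * 1073741824 * 8 / 8.0e9 }')
echo "Q4_K_M ~ ${bpw} bits/weight"
```

This lands around 5.2 bits/weight, consistent with Q4_K_M mixing 4-bit blocks with higher-precision scales and a few higher-bit tensors.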

Modelfile templates are included for direct Ollama import.
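For reference, a minimal sketch of what such a Modelfile typically contains; the shipped Modelfile.Q4_K_M is authoritative and may set additional parameters or a template:

```
# Hypothetical Modelfile sketch for the Q4_K_M quant (illustrative only).
FROM ./c64-ministral-3-8b-thinking-c64-Q4_K_M.gguf
PARAMETER temperature 0.15
PARAMETER num_ctx 4096
```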

Quick Start

Ollama

ollama create c64-ministral-c64 -f Modelfile.Q4_K_M
ollama create c64-ministral-c64-q6 -f Modelfile.Q6_K
ollama create c64-ministral-c64-q8 -f Modelfile.Q8_0

llama.cpp

llama-cli -m c64-ministral-3-8b-thinking-c64-Q6_K.gguf -ngl 99 -c 4096 -n 256 -p "Explain VIC-II timing."

llama-server (OpenAI-compatible API / GUI reasoning panel)

python3 scripts/prompt_contract.py --model-profile 8b --print-full > .cache/runtime/c64_system_prompt_8b.txt
llama-server \
  -hf ibitato/c64-ministral-3-8b-thinking-c64-reasoning-gguf:F16 \
  --host 0.0.0.0 --port 8080 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget -1 \
  --system-prompt-file .cache/runtime/c64_system_prompt_8b.txt \
  --ctx-size 32768 \
  -ngl 99 \
  --temp 0.15 \
  --threads "$(nproc)" \
  --fit on

Use --reasoning-format none to keep the raw [THINK]...[/THINK] tags inline in the content field instead of having the server split reasoning into a separate field.
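Once llama-server is up, its OpenAI-compatible endpoint can be exercised with curl. A sketch (model name, prompt, and sampling values are illustrative) that builds the request body and validates it locally before sending:

```shell
# Build an illustrative chat-completions request body for the llama-server
# OpenAI-compatible endpoint (model name and prompt are placeholders).
cat > /tmp/c64_request.json <<'EOF'
{
  "model": "c64-ministral",
  "messages": [
    {"role": "user", "content": "Explain VIC-II bad lines."}
  ],
  "temperature": 0.15,
  "max_tokens": 256
}
EOF

# Sanity-check that the payload is valid JSON before sending it.
python3 -m json.tool /tmp/c64_request.json > /dev/null && echo "payload ok"

# Send it to a running server (commented out so the snippet is self-contained):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/c64_request.json
```

With the server's default reasoning format, the response separates the model's reasoning from the final answer; with --reasoning-format none, both arrive together in the message content.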

Reasoning Validation Snapshot

  • Validation status: PASS
  • Source artifacts: results/reasoning_validation/8b/20260302_151302
Metric                          Value
single_think_tag_rate           1.0000
single_balanced_tag_rate        1.0000
single_final_after_think_rate   1.0000
multi_turn_retention_rate       1.0000
format_contract_pass_rate       1.0000
exact_hash_match_rate           1.0000
semantic_similarity_avg         1.0000
crash_or_timeout_rate           0.0000

Reference Throughput (legacy llama-bench data)

No current benchmark CSV was found for this profile, so legacy llama-bench values are shown.

Infrastructure used:

  • Host OS: Fedora Linux 43 (Server Edition)
  • Host kernel: 6.18.8-200.fc43.x86_64
  • CPU: AMD Ryzen AI Max+ 395 (16C/32T)
  • System RAM: 30 GiB
  • GPU: AMD Radeon 8060S (96.00 GiB VRAM visible to PyTorch)
  • Container image: rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1
  • llama.cpp revision: 2afcdb9
  • Benchmark command source: scripts/inference/benchmark_gguf_matrix.sh
Quant    pp256 (tok/s)   tg64 (tok/s)
Q4_K_M         1080.50          33.52
Q6_K            820.06          26.31
Q8_0            404.59          21.20
F16             546.18          10.68

Run bash scripts/inference/benchmark_gguf_matrix.sh --model-profile 8b to refresh this section with current CSV-based metrics.
