☕ Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

☕ ko-fi.com/djlougen

Harmonic-Hermes-9B-GGUF

GGUF quantizations of Harmonic-Hermes-9B for local inference with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.

Harmonic-Hermes-9B is the Stage 2 agentic fine-tune of Harmonic-9B — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.

Where Harmonic-9B teaches the model how to think, Harmonic-Hermes-9B teaches it how to act — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.

Stage 1 (Harmonic-9B): Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.

Stage 2 (this model): Agentic fine-tune on hermes-agent-traces-filtered — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.

Available Quantizations

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Harmonic-Hermes-9B-F16.gguf | F16 | ~18 GB | Maximum quality, needs 24 GB+ VRAM |
| Harmonic-Hermes-9B-Q8_0.gguf | Q8_0 | ~9.5 GB | Near-lossless, 16 GB VRAM |
| Harmonic-Hermes-9B-Q6_K.gguf | Q6_K | ~6.9 GB | Very high quality, 12 GB VRAM |
| Harmonic-Hermes-9B-Q5_K_M.gguf | Q5_K_M | ~6.1 GB | Best 5-bit for quality |
| Harmonic-Hermes-9B-Q5_K_S.gguf | Q5_K_S | ~5.9 GB | 5-bit, smaller |
| Harmonic-Hermes-9B-Q5_0.gguf | Q5_0 | ~5.9 GB | 5-bit legacy |
| Harmonic-Hermes-9B-Q4_K_M.gguf | Q4_K_M | ~5.3 GB | Best 4-bit for quality |
| Harmonic-Hermes-9B-Q4_K_S.gguf | Q4_K_S | ~5.0 GB | 4-bit, smaller |
| Harmonic-Hermes-9B-Q4_0.gguf | Q4_0 | ~5.0 GB | 4-bit legacy |
| Harmonic-Hermes-9B-IQ4_XS.gguf | IQ4_XS | ~4.9 GB | 4-bit imatrix, smallest 4-bit |
| Harmonic-Hermes-9B-Q3_K_L.gguf | Q3_K_L | ~4.6 GB | Best 3-bit for quality |
| Harmonic-Hermes-9B-Q3_K_M.gguf | Q3_K_M | ~4.4 GB | 3-bit, balanced |
| Harmonic-Hermes-9B-Q3_K_S.gguf | Q3_K_S | ~4.0 GB | 3-bit, smaller |
| Harmonic-Hermes-9B-Q2_K.gguf | Q2_K | ~3.6 GB | Smallest, significant quality loss |

MLX (Apple Silicon)

MLX conversions are available in separate repos:

| Repo | Quant | Size |
|------|-------|------|
| Harmonic-Hermes-9B-MLX-bf16 | BF16 | ~17 GB |
| Harmonic-Hermes-9B-MLX-8bit | 8-bit | ~8.9 GB |
| Harmonic-Hermes-9B-MLX-4bit | 4-bit | ~4.8 GB |

Vision (Multimodal)

This model includes vision projectors for multimodal inference. Use with llama.cpp's --mmproj flag for image understanding tasks.
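
A sketch of a multimodal invocation (the projector and image filenames here are placeholders; check the repo's file list for the actual projector file, and note that recent llama.cpp builds expose multimodal inference through llama-mtmd-cli):

```
./llama-mtmd-cli -m Harmonic-Hermes-9B-Q8_0.gguf \
    --mmproj Harmonic-Hermes-9B-mmproj-F16.gguf \
    --image photo.jpg -p "Describe this image."
```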

How Our Training Data Compares


We ran the same structural quality analysis used for Stage 1 against comparable public agentic datasets. The results show why starting from quality-filtered data matters:

| Metric | Harmonic Traces (ours) | Carnice GLM-5 (kai-os) |
|--------|------------------------|------------------------|
| Rows | 3,679 | 1,627 |
| Source model | Multiple frontier models | GLM-5 via OpenRouter |
| Think block depth | 581 words avg | 40 words avg |
| Self-correction | 63.0% | 29.7% |
| Verification | 95.9% | 63.7% |
| Alternative exploration | 43.7% | 51.3% |
| Valid JSON (all tool calls) | 100% | 100% |
| Tool calls per conversation | 18.5 | 5.4 |
| Messages per conversation | 32.1 | 12.1 |
| Multi-turn (>5 messages) | 97.8% | 89.6% |

The critical gap is reasoning depth: 581 vs 40 words in think blocks. Carnice traces plan briefly then act — the model learns tool-call formatting but not deliberation. Our traces contain 14x deeper reasoning before every action, with nearly universal verification (96% vs 64%) and twice the self-correction rate.

The conversation depth also matters for agent training. Our traces average 32 messages and 18 tool calls per trajectory — complete agentic sessions, not short dispatches. This teaches the model to maintain coherent state across extended multi-step workflows.
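
As a quick check of the "14x" figure, using the averages from the metrics table:

```python
# Back-of-envelope check of the reasoning-depth gap described above.
harmonic_depth = 581  # avg words per think block (Harmonic traces)
carnice_depth = 40    # avg words per think block (Carnice GLM-5 traces)
ratio = harmonic_depth / carnice_depth
print(f"~{ratio:.1f}x deeper reasoning")  # ~14.5x deeper reasoning
```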

Reasoning Flow

Marker density across thinking traces — the filtered set shows tighter, more consistent reasoning structure.

Conversation Structure

Category Distribution

Training data: DJLougen/hermes-agent-traces-filtered

What This Model Does

  • Tool calling / function calling — structured JSON tool use in the Hermes agent format
  • Multi-turn agent workflows — maintains coherent state across extended tool-use conversations
  • Reasoning-grounded decisions — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions

Training Approach

Harmonic-Hermes-9B is a Stage 2 fine-tune of Harmonic-9B, trained on hermes-agent-traces-filtered — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.

The key insight: most agent models are fine-tuned directly from base models or generic instruct tunes. They learn tool-call formatting but not when or why to use tools. By starting from a model that already reasons deeply (Stage 1), the agent behaviors are grounded in genuine multi-step thinking rather than pattern-matched tool invocations.

Usage

Ollama

ollama run DJLougen/Harmonic-Hermes-9B-GGUF
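
If you download a quantization manually instead, you can register it with a minimal Modelfile. This is a sketch: the quant filename and model name are placeholders, and num_ctx matches the 8192-token context listed under Architecture:

```
FROM ./Harmonic-Hermes-9B-Q4_K_M.gguf
PARAMETER num_ctx 8192
```

Then `ollama create harmonic-hermes -f Modelfile` followed by `ollama run harmonic-hermes`.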

llama.cpp

./llama-cli -m Harmonic-Hermes-9B-Q8_0.gguf -p "Use the available tools to..." -n 4096

LM Studio

Download any quantization and load in LM Studio. The model follows standard ChatML formatting.
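
LM Studio applies the chat template for you; if you are driving a raw completion endpoint yourself, a minimal sketch of standard ChatML formatting looks like this (the system prompt content is just an example):

```python
# Minimal ChatML prompt builder (standard ChatML, as noted above).
def chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    out.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(out)

prompt = chatml([
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "What's the weather in Toronto?"},
])
print(prompt)
```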

Reasoning + Tool Use

The model uses <think> blocks for reasoning before acting:

<think>
The user wants to check the weather in Toronto. I have a get_weather tool available.
Let me call it with the right parameters...
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
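
A minimal sketch of parsing this output shape in Python (the tag names follow the example above; your inference runtime may already strip or structure these blocks for you):

```python
import json
import re

# Pull <think> and <tool_call> blocks out of raw model output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_output(text):
    """Return (reasoning strings, decoded tool-call dicts) from model text."""
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    calls = [json.loads(m) for m in TOOL_RE.findall(text)]
    return thoughts, calls

sample = (
    "<think>Need the weather; get_weather is available.</think>\n"
    '<tool_call>\n{"name": "get_weather", '
    '"arguments": {"location": "Toronto, Canada"}}\n</tool_call>'
)
thoughts, calls = parse_output(sample)
print(calls[0]["name"])  # -> get_weather
```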

Intended Use

  • Agentic workflows with tool calling and function execution
  • Multi-turn assistant interactions requiring structured reasoning
  • Local inference as an always-on agent backbone
  • Research into reasoning-grounded agent behavior

Limitations

  • 9B parameter model — not suitable for tasks requiring extensive world knowledge
  • Agent capabilities are shaped by the training data distribution
  • Benchmark evaluation is ongoing

Architecture

  • Base: Harmonic-9B (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
  • Parameters: 9.65B
  • Training: LoRA fine-tuning, merged into base weights
  • Precision: BF16
  • Context: 8192 tokens

License

Apache 2.0, same as the base model. Commercial use is permitted.
