## ☕ Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
# Harmonic-Hermes-9B
Harmonic-Hermes-9B is the Stage 2 agentic fine-tune of Harmonic-9B — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.
Where Harmonic-9B teaches the model how to think, Harmonic-Hermes-9B teaches it how to act — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.
- **Stage 1 — Harmonic-9B**: Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.
- **Stage 2 (this model)**: Agentic fine-tune on tool-calling and agent interaction data. Inherits Stage 1's reasoning depth and adds structured action capabilities.
## What This Model Does
- Tool calling / function calling — structured JSON tool use in the Hermes agent format
- Multi-turn agent workflows — maintains coherent state across extended tool-use conversations
- Reasoning-grounded decisions — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions
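To make the Hermes agent format concrete, here is a minimal sketch of the two JSON shapes involved: the schema a tool is advertised with, and the call object the model emits. The `get_weather` tool and its fields are illustrative assumptions, not taken from this model's actual tool registry.

```python
import json

# Hypothetical tool definition in the common Hermes/OpenAI JSON-schema style.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and country"}
        },
        "required": ["location"],
    },
}

# The model is expected to emit a call as a single JSON object
# naming the tool and its arguments.
tool_call = {"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}

# Both shapes must round-trip as valid JSON, which is what the
# "Valid JSON (all tool calls)" metric below checks.
assert json.loads(json.dumps(tool_call)) == tool_call
```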
## Training Approach

Harmonic-Hermes-9B is a Stage 2 fine-tune of Harmonic-9B, trained on `hermes-agent-traces-filtered` — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.
The key insight: most agent models are fine-tuned directly from base models or generic instruct tunes. They learn tool-call formatting but not when or why to use tools. By starting from a model that already reasons deeply (Stage 1), the agent behaviors are grounded in genuine multi-step thinking rather than pattern-matched tool invocations.
## How Our Training Data Compares

### Quality Comparison

### Metrics Summary
We ran the same structural quality analysis used for Stage 1 against comparable public agentic datasets. The results show why starting from quality-filtered data matters:
| Metric | Harmonic Traces (ours) | Carnice GLM-5 (kai-os) |
|---|---|---|
| Rows | 3,679 | 1,627 |
| Source model | Multiple frontier models | GLM-5 via OpenRouter |
| Think block depth | 581 words avg | 40 words avg |
| Self-correction | 63.0% | 29.7% |
| Verification | 95.9% | 63.7% |
| Alternative exploration | 43.7% | 51.3% |
| Valid JSON (all tool calls) | 100% | 100% |
| Tool calls per conversation | 18.5 | 5.4 |
| Messages per conversation | 32.1 | 12.1 |
| Multi-turn (>5 messages) | 97.8% | 89.6% |
The critical gap is reasoning depth: 581 vs 40 words in think blocks. Carnice traces plan briefly then act — the model learns tool-call formatting but not deliberation. Our traces contain 14x deeper reasoning before every action, with nearly universal verification (96% vs 64%) and twice the self-correction rate.
The conversation depth also matters for agent training. Our traces average 32 messages and 18 tool calls per trajectory — complete agentic sessions, not short dispatches. This teaches the model to maintain coherent state across extended multi-step workflows.
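The structural metrics in the table above can be sketched with a small helper. This is not the actual analysis pipeline; it is a hedged illustration, assuming traces store think blocks and tool calls inline in assistant messages as `<think>…</think>` and `<tool_call>…</tool_call>` tags, and using the card's >5-message threshold for multi-turn.

```python
import json
import re

def trace_metrics(messages):
    """Compute simple structural stats for one agent trace.

    `messages` is a list of {"role", "content"} dicts; the inline
    tag layout is an assumption for illustration.
    """
    think_words, tool_calls, valid_json = 0, 0, True
    for msg in messages:
        text = msg["content"]
        for block in re.findall(r"<think>(.*?)</think>", text, re.DOTALL):
            think_words += len(block.split())
        for call in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL):
            tool_calls += 1
            try:
                json.loads(call)
            except json.JSONDecodeError:
                valid_json = False
    return {
        "messages": len(messages),
        "think_words": think_words,
        "tool_calls": tool_calls,
        "valid_json": valid_json,
        "multi_turn": len(messages) > 5,  # the card's >5-message threshold
    }

trace = [
    {"role": "user", "content": "What's the weather in Toronto?"},
    {"role": "assistant", "content": (
        "<think>The user wants Toronto weather. I have get_weather.</think>"
        '<tool_call>{"name": "get_weather", "arguments": {"location": "Toronto"}}</tool_call>'
    )},
]
print(trace_metrics(trace))
```

Averaging these per-trace stats over a dataset yields the kind of table shown above (think-block depth, tool calls per conversation, multi-turn share).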
### Reasoning Flow

Marker density across thinking traces — the filtered set shows tighter, more consistent reasoning structure.

### Conversation Structure

### Category Distribution

Training data: `DJLougen/hermes-agent-traces-filtered`
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged BF16 weights and matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("DJLougen/Harmonic-Hermes-9B")
tokenizer = AutoTokenizer.from_pretrained("DJLougen/Harmonic-Hermes-9B")
```
### Reasoning + Tool Use

The model uses `<think>` blocks for reasoning before acting:

```
<think>
The user wants to check the weather in Toronto. I have a get_weather tool available.
Let me call it with the right parameters...
</think>
<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
```
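An agent loop needs to split a completion like the one above into its reasoning and its tool calls. Here is a minimal, hedged sketch of such a parser; `parse_agent_output` is a hypothetical helper, not part of this model's tooling, and a real loop would execute each call and feed results back as tool messages.

```python
import json
import re

def parse_agent_output(text):
    """Split a raw completion into reasoning text and parsed tool calls.

    Assumes the <think>/<tool_call> tag layout shown above.
    """
    thinks = re.findall(r"<think>(.*?)</think>", text, re.DOTALL)
    calls = [
        json.loads(c)
        for c in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    ]
    return {"reasoning": "\n".join(t.strip() for t in thinks), "tool_calls": calls}

output = (
    "<think>The user wants the weather in Toronto.</think>\n"
    '<tool_call>{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}</tool_call>'
)
parsed = parse_agent_output(output)
# parsed["tool_calls"] now holds a list of dicts ready to dispatch to real functions.
```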
## Architecture
- Base: Harmonic-9B (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
- Parameters: 9.65B
- Training: LoRA fine-tuning, merged into base weights
- Precision: BF16
- Context: 8192 tokens
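A quick back-of-envelope check on what these specs imply for local inference: 9.65B parameters at BF16 is 2 bytes per parameter in weights alone, before activations and KV cache.

```python
# Weight memory estimate from the listed specs (9.65B params, BF16).
params = 9.65e9
bytes_per_param = 2  # bfloat16 = 16 bits
weight_gib = params * bytes_per_param / 2**30
print(f"{weight_gib:.1f} GiB")  # roughly 18 GiB of weights alone
```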
## Intended Use
- Agentic workflows with tool calling and function execution
- Multi-turn assistant interactions requiring structured reasoning
- Local inference as an always-on agent backbone
- Research into reasoning-grounded agent behavior
## Limitations
- 9B parameter model — not suitable for tasks requiring extensive world knowledge
- Agent capabilities are shaped by the training data distribution
- Benchmark evaluation is ongoing
## License

Apache 2.0 — same as the base model. Commercial use is fully permitted.
## Links
- Stage 1 reasoning backbone: DJLougen/Harmonic-9B
- GGUF quantizations (Stage 1): DJLougen/Harmonic-9B-GGUF