☕ Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

☕ ko-fi.com/djlougen

Harmonic-Hermes-9B-MLX-8bit


8-bit MLX conversion of Harmonic-Hermes-9B for local inference on Apple Silicon with mlx-lm.

| Quantization | Size | Use case |
|---|---|---|
| 8-bit | ~8.9 GB | Near-lossless quality; 16 GB+ unified memory |
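As a sanity check on the size column: an 8-bit quantization stores roughly one byte per parameter, so 9.65B parameters land near the ~8.9 GB figure when reported in GiB. A quick sketch (it ignores the small overhead of quantization scales and any higher-precision layers, which add a few percent):

```python
# Back-of-envelope size for an n-bit quantization:
# bits/8 bytes per parameter, reported in GiB (2^30 bytes).
def quantized_size_gib(n_params: float, bits: int = 8) -> float:
    return n_params * bits / 8 / 2**30

size = quantized_size_gib(9.65e9)
print(f"{size:.2f} GiB")  # close to the ~8.9 GB in the table above
```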

Other formats


Harmonic-Hermes-9B is the Stage 2 agentic fine-tune of Harmonic-9B — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.

Where Harmonic-9B teaches the model how to think, Harmonic-Hermes-9B teaches it how to act — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.

Stage 1 (Harmonic-9B): Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.

Stage 2 (this model): Agentic fine-tune on hermes-agent-traces-filtered — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.

Usage

```bash
pip install mlx-lm

# Generate
mlx_lm.generate --model DJLougen/Harmonic-Hermes-9B-MLX-8bit --prompt "Use the available tools to..."

# Chat
mlx_lm.chat --model DJLougen/Harmonic-Hermes-9B-MLX-8bit
```

Python API

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Harmonic-Hermes-9B-MLX-8bit")
response = generate(model, tokenizer, prompt="Use the available tools to check the weather.", max_tokens=512)
print(response)
```

Reasoning + Tool Use

The model uses <think> blocks for reasoning before acting:

```
<think>
The user wants to check the weather in Toronto. I have a get_weather tool available.
Let me call it with the right parameters...
</think>

<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
```
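On the consuming side, output in this format is straightforward to extract. The regex-based parser below is my own illustration, not part of mlx-lm; a real agent loop would dispatch each parsed call to the matching tool and feed the result back:

```python
import json
import re

# Minimal parser for the Hermes-style output shown above:
# an optional <think>...</think> block followed by <tool_call>
# blocks that each contain one JSON object.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_completion(text: str):
    think = THINK_RE.search(text)
    calls = [json.loads(m) for m in TOOL_CALL_RE.findall(text)]
    return (think.group(1).strip() if think else None), calls

completion = (
    "<think>The user wants the weather in Toronto.</think>\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}\n</tool_call>'
)
reasoning, tool_calls = parse_completion(completion)
print(tool_calls[0]["name"])  # get_weather
```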

How Our Training Data Compares

Quality Comparison

Metrics Summary

| Metric | Harmonic Traces (ours) | Carnice GLM-5 (kai-os) |
|---|---|---|
| Rows | 3,679 | 1,627 |
| Source model | Multiple frontier models | GLM-5 via OpenRouter |
| Think block depth | 581 words avg | 40 words avg |
| Self-correction | 63.0% | 29.7% |
| Verification | 95.9% | 63.7% |
| Alternative exploration | 43.7% | 51.3% |
| Valid JSON (all tool calls) | 100% | 100% |
| Tool calls per conversation | 18.5 | 5.4 |
| Messages per conversation | 32.1 | 12.1 |
| Multi-turn (>5 messages) | 97.8% | 89.6% |
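For illustration, metrics like think-block depth and tool calls per conversation can be computed directly from a trace. The message schema below is a hypothetical stand-in; the dataset's actual format may differ:

```python
import re

# Hypothetical trace: a list of chat messages. Only illustrates how
# the table's metrics (think-block word depth, tool-call count,
# message count) could be measured on one conversation.
trace = [
    {"role": "user", "content": "What's the weather in Toronto?"},
    {"role": "assistant", "content": "<think>I should call get_weather for Toronto.</think>\n"
     '<tool_call>{"name": "get_weather", "arguments": {"location": "Toronto"}}</tool_call>'},
    {"role": "tool", "content": '{"temp_c": -3}'},
    {"role": "assistant", "content": "It's -3 °C in Toronto right now."},
]

assistant_text = "\n".join(m["content"] for m in trace if m["role"] == "assistant")
think_words = sum(len(block.split())
                  for block in re.findall(r"<think>(.*?)</think>", assistant_text, re.DOTALL))
n_tool_calls = len(re.findall(r"<tool_call>", assistant_text))
print(think_words, n_tool_calls, len(trace))
```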

Reasoning Flow

Conversation Structure

Category Distribution

Training data: DJLougen/hermes-agent-traces-filtered

What This Model Does

  • Tool calling / function calling — structured JSON tool use in the Hermes agent format
  • Multi-turn agent workflows — maintains coherent state across extended tool-use conversations
  • Reasoning-grounded decisions — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions

Architecture

  • Base: Harmonic-9B (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
  • Parameters: 9.65B
  • Training: LoRA fine-tuning, merged into base weights
  • Context: 8192 tokens
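The LoRA merge in the last bullet amounts to folding the low-rank update, scaled by alpha/r, into the frozen base weights, so the merged model runs with no adapter overhead at inference. A toy sketch (pure Python, tiny matrices; real merges operate on full weight tensors inside the training framework):

```python
# Merging a LoRA adapter: W' = W + (alpha / r) * B @ A,
# where B is (out, r) and A is (r, in) with small rank r.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha, r):
    delta = matmul(B, A)          # (out, r) @ (r, in) -> (out, in)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (2x2)
B = [[1.0], [0.0]]                # down-projected update, rank r = 1
A = [[0.5, 0.5]]
merged = merge_lora(W, A, B, alpha=2.0, r=1)
print(merged)  # [[2.0, 1.0], [0.0, 1.0]]
```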

License

Apache 2.0, same as the base model. Commercial use is fully permitted.
