## ☕ Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. It's a hobby that got out of hand. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
# Harmonic-Hermes-9B-MLX-8bit
8-bit MLX conversion of Harmonic-Hermes-9B for local inference on Apple Silicon with mlx-lm.
| Quantization | Size | Use Case |
|---|---|---|
| 8-bit | ~8.9 GB | Near-lossless quality, 16GB+ unified memory |
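As a sanity check on the size column: an 8-bit quantization stores roughly one byte per weight, so for a ~9.65B-parameter model the back-of-envelope estimate lands right around the listed figure. The sketch below ignores per-group scale/bias overhead and mixed-precision layers, so treat it as approximate:

```python
# Back-of-envelope size estimate for an 8-bit quantization.
# 9.65B parameters at 8 bits each; per-group quantization scales,
# embeddings, and other overhead are deliberately ignored here.
params = 9.65e9
bits_per_weight = 8

approx_gib = params * bits_per_weight / 8 / 1024**3
print(f"{approx_gib:.2f} GiB")  # in the ballpark of the ~8.9 GB listed above
```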
## Other formats
| Format | Repo |
|---|---|
| GGUF (all quants) | Harmonic-Hermes-9B-GGUF |
| MLX 4-bit | Harmonic-Hermes-9B-MLX-4bit |
| MLX 8-bit | Harmonic-Hermes-9B-MLX-8bit |
| MLX BF16 | Harmonic-Hermes-9B-MLX-bf16 |
| Full weights | Harmonic-Hermes-9B |
Harmonic-Hermes-9B is the Stage 2 agentic fine-tune of Harmonic-9B — a dedicated tool-calling and agent model built on top of a strong reasoning backbone.
Where Harmonic-9B teaches the model how to think, Harmonic-Hermes-9B teaches it how to act — structured tool use, multi-turn agent workflows, and function calling, all grounded in the reasoning depth from Stage 1.
- **Stage 1 — Harmonic-9B:** Heavy reasoning fine-tune on privately generated, structurally validated data. Every row passes strict quality gates. The thinking backbone.
- **Stage 2 (this model):** Agentic fine-tune on `hermes-agent-traces-filtered` — 3,679 structurally validated agent traces with deep reasoning, tool calling, and multi-turn workflows.
## Usage
```bash
pip install mlx-lm

# Generate
mlx_lm.generate --model DJLougen/Harmonic-Hermes-9B-MLX-8bit --prompt "Use the available tools to..."

# Chat
mlx_lm.chat --model DJLougen/Harmonic-Hermes-9B-MLX-8bit
```
### Python API

```python
from mlx_lm import load, generate

model, tokenizer = load("DJLougen/Harmonic-Hermes-9B-MLX-8bit")
response = generate(
    model,
    tokenizer,
    prompt="Use the available tools to check the weather.",
    max_tokens=512,
)
print(response)
```
## Reasoning + Tool Use

The model uses `<think>` blocks for reasoning before acting:

```xml
<think>
The user wants to check the weather in Toronto. I have a get_weather tool available.
Let me call it with the right parameters...
</think>
<tool_call>
{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}
</tool_call>
```
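On the client side, the `<tool_call>` payload is plain JSON, so it can be pulled out of the generated text with a regex and `json.loads`. A minimal sketch — the `extract_tool_calls` helper is ours for illustration, not part of mlx-lm:

```python
import json
import re

def extract_tool_calls(text):
    """Return the parsed JSON payload of each <tool_call>...</tool_call> block."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(m) for m in re.findall(pattern, text, re.DOTALL)]

# Example model output in the format shown above
output = (
    "<think>\nThe user wants the weather in Toronto.\n</think>\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}\n'
    "</tool_call>"
)

for call in extract_tool_calls(output):
    print(call["name"], call["arguments"])
```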
## How Our Training Data Compares
### Quality Comparison

#### Metrics Summary
| Metric | Harmonic Traces (ours) | Carnice GLM-5 (kai-os) |
|---|---|---|
| Rows | 3,679 | 1,627 |
| Source model | Multiple frontier models | GLM-5 via OpenRouter |
| Think block depth | 581 words avg | 40 words avg |
| Self-correction | 63.0% | 29.7% |
| Verification | 95.9% | 63.7% |
| Alternative exploration | 43.7% | 51.3% |
| Valid JSON (all tool calls) | 100% | 100% |
| Tool calls per conversation | 18.5 | 5.4 |
| Messages per conversation | 32.1 | 12.1 |
| Multi-turn (>5 messages) | 97.8% | 89.6% |
*(Charts not reproduced here: Reasoning Flow, Conversation Structure, Category Distribution.)*
**Training data:** `DJLougen/hermes-agent-traces-filtered`
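For orientation, a single trace in the dataset is a standard chat-messages conversation, with tool results fed back as `tool`-role messages between assistant turns. The schematic below follows the common Hermes agent convention; the role names and exact schema are assumptions here, so check the dataset card for the authoritative layout:

```python
# Schematic slice of one agent trace in chat-messages form.
# Structure follows the common Hermes agent convention; the actual
# dataset schema may differ in detail.
trace = {
    "messages": [
        {"role": "system", "content": "You are an agent with access to a get_weather tool."},
        {"role": "user", "content": "What's the weather in Toronto?"},
        {
            "role": "assistant",
            "content": (
                "<think>\nI have a get_weather tool; call it for Toronto.\n</think>\n"
                "<tool_call>\n"
                '{"name": "get_weather", "arguments": {"location": "Toronto, Canada"}}\n'
                "</tool_call>"
            ),
        },
        # Tool result is injected back into the conversation for the next turn
        {"role": "tool", "content": '{"temperature_c": -4, "conditions": "snow"}'},
        {"role": "assistant", "content": "It's -4 °C and snowing in Toronto right now."},
    ]
}

print(len(trace["messages"]), "messages")
```

The filtered traces average far more turns than this (32.1 messages and 18.5 tool calls per conversation, per the table above); this is just the minimal repeating unit.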
## What This Model Does
- Tool calling / function calling — structured JSON tool use in the Hermes agent format
- Multi-turn agent workflows — maintains coherent state across extended tool-use conversations
- Reasoning-grounded decisions — inherits Harmonic-9B's self-correction, verification, and exploration before committing to actions
## Architecture
- Base: Harmonic-9B (Stage 1 reasoning fine-tune of Qwen 3.5 9B)
- Parameters: 9.65B
- Training: LoRA fine-tuning, merged into base weights
- Context: 8192 tokens
## License
Apache 2.0, same as the base model. Commercial use is fully permitted.