can·did

/ˈkandəd/ — truthful and straightforward; frank. From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation — not what someone wants to hear, but what they need to.

Opus-Candid-8B V3

Fine-tuned from Qwen 3 8B on 1,558 Zipf-weighted conversations distilled from Claude Opus 4.6. V3 is a ground-up rebuild — not an iteration on V2. The entire dataset architecture was redesigned around a 4-dimensional training tensor that models how real people actually talk.

No system prompt needed. No prompt engineering. No character cards. The personality is in the weights — direct, opinionated, bilingual (EN/ES), and incapable of telling you what you want to hear. It holds positions under pressure, calls out bad arguments, and knows when to shut up.


What Changed from V2.1

Everything. V3 is not a patch — it's a new dataset, new methodology, new distribution logic. The model family name is the same because the philosophy is the same: personality lives in weights, not system prompts.

V2/V2.1 used gravity chains — 10 topic-drift pathways with 6,771 conversations. Good at cross-domain transitions but suffered from uniform response length (88% medium-length) and repetition loops under pressure.

V3 uses a 4D training tensor — every conversation sits at a coordinate across topic (Zipf-weighted), response length, psychological register, and conversational position. 1,558 conversations, ~640K tokens. Fewer conversations than V2, but each one is precisely placed to cover a specific gap in the distribution.

Key improvements:

  • 42% tight responses (2-4 turns) vs V2's ~12%. The model learned when to shut up.
  • Zipf-weighted topic distribution (s ≈ 0.7) matching real conversation frequency data from Pew Research, OpenAI usage studies, and dialogue corpora.
  • Anti-sycophancy enforcement at the data level — 22 instances caught and replaced. No "Great question!" in the training set.
  • Response length variance injection — 252 conversations where responses were suspiciously uniform were fixed with deliberate length variation.
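The data-level sycophancy scan described above can be sketched as a simple phrase filter. Everything here is illustrative and assumes nothing about the actual V3 tooling: the phrase list, function name, and sample responses are invented for the sketch.

```python
import re

# Illustrative openers to flag; the real V3 filter list is not published.
SYCOPHANTIC_OPENERS = [
    r"^great question\b",
    r"^what a (great|wonderful) \w+",
    r"^i'd be happy to\b",
    r"^excellent point\b",
]
PATTERN = re.compile("|".join(SYCOPHANTIC_OPENERS), re.IGNORECASE)

def flag_sycophantic(responses):
    """Return indices of responses whose opening matches a flagged phrase."""
    return [i for i, r in enumerate(responses) if PATTERN.search(r.strip())]

# Example: the first and third responses get flagged for replacement.
hits = flag_sycophantic([
    "Great question! Let me explain.",
    "No. That argument conflates two things.",
    "I'd be happy to help with that.",
])
```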

Available Quantizations

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Opus-Candid-8B-V3-Q8_0.gguf | Q8_0 | 8.2 GB | Maximum quality. Use this if you have the VRAM. |
| Opus-Candid-8B-V3-Q6_K.gguf | Q6_K | 6.3 GB | Recommended for most users. Negligible quality loss. |

Note on Q4: We tested Q4_K_M extensively. At 8B parameters, Q4 quantization destroys the model's ability to track its own output, producing degenerate repetition loops within the first turn. This is not a repeat_penalty tuning issue — Q4 loses the weight precision needed for self-monitoring behavior. We do not ship or recommend Q4 for this model. If you need a smaller footprint, use Q6_K or look at the 4B Lite lineup (purpose-built for small hardware — Q4 survives at 4B because of density-first training).


Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | Qwen 3 8B (8.19B params) |
| Training Data | 1,558 multi-turn conversations with Claude Opus 4.6 |
| Dataset Architecture | 4D training tensor (topic × length × register × position) |
| Total Tokens | ~640,000 |
| Fine-tune Method | LoRA + rsLoRA (r=64, alpha=128) via PEFT + TRL |
| Training Hardware | NVIDIA A100 SXM 80GB (RunPod) |
| Precision | bf16 |
| Epochs | 3 |
| Learning Rate | 2e-4 (cosine schedule, 5% warmup) |
| Effective Batch Size | 16 |
| Optimizer | AdamW |
| License | Apache 2.0 |

Quick Start

Works with any GGUF-compatible runtime — LM Studio, Ollama, llama.cpp, KoboldCpp. Download the GGUF, load it, and chat. No system prompt needed — the personality is in the weights.
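If you drive llama.cpp directly rather than through a chat UI, the prompt needs Qwen 3's ChatML-style template. A minimal sketch is below; in practice the runtime reads the template embedded in the GGUF, so you rarely need to build it yourself.

```python
def chatml_prompt(messages):
    """Render messages into the ChatML format Qwen 3 models expect.
    Most runtimes (LM Studio, Ollama, llama.cpp) apply this automatically
    from the template embedded in the GGUF; shown here for direct use."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

# No system message needed -- the personality is in the weights.
prompt = chatml_prompt([
    {"role": "user", "content": "Is tabs vs spaces worth arguing about?"},
])
```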


Recommended Hardware

| Setup | Quantization | VRAM/RAM | Speed | Notes |
|-------|--------------|----------|-------|-------|
| GPU (Q8) | Q8_0 | ~9 GB VRAM | 30-60 t/s | RTX 3060 12GB and up |
| GPU (Q6) | Q6_K | ~7 GB VRAM | 35-70 t/s | RTX 3060, RX 7600, Arc A770 |
| Apple Silicon | Q6_K/Q8 | ~7-9 GB unified | 20-40 t/s | M1/M2/M3/M4 with 16GB+ |
| CPU Only | Q6_K | ~8 GB RAM | 5-15 t/s | 16GB+ system RAM |

The 4D Training Tensor

V3 treats the training dataset as a 4-dimensional space. Every conversation sits at a specific coordinate, and the distribution across each axis follows empirical frequency patterns.

Dimension 1: Topic Distribution (Zipf-weighted, s ≈ 0.7)

25 topics across 5 frequency tiers, weighted by real-world conversation frequency data. Tier 1 (daily topics: personal life, work, food, entertainment, relationships) gets 47.9% of training data. Tier 5 (occasional: legal, philosophy, creative writing) gets 3.2%. This means the model is disproportionately good at the conversations people actually have, without being useless at rare topics.
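The rank weighting itself is easy to reproduce. This is a generic Zipf sketch with s = 0.7, not the dataset's actual tier mapping; with pure rank weights the top five topics take roughly 46% of the mass, close to the 47.9% tier-1 figure above (the gap presumably comes from tier-level adjustments).

```python
def zipf_weights(n_topics, s=0.7):
    """Normalized Zipf weights: w_r proportional to 1 / r^s for rank r = 1..n."""
    raw = [1.0 / (r ** s) for r in range(1, n_topics + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# Head-heavy but not winner-take-all: the head dominates while the tail
# still receives nonzero coverage, matching the tier shares described above.
weights = zipf_weights(25, s=0.7)
```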

Dimension 2: Response Length

| Length | Turns | Share |
|--------|-------|-------|
| Tight | 2-4 | 42% |
| Medium | 6-10 | 33% |
| Deep | 12-18 | 20% |
| Extended | 20+ | 5% |

The tight-to-medium ratio is the most important number in the dataset. V2.1's 88% medium distribution taught the model that every question deserved 6-10 turns of exploration. V3's 42% tight teaches the model that most questions deserve a direct answer and maybe one follow-up.

Dimension 3: Psychological Register

40% neutral/analytical, 30% engaged/conversational, 25% emotionally loaded, 5% adversarial/correction.

Dimension 4: Conversational Position

15% opening, 50% mid-thread, 20% follow-up, 15% wrap-up.
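Taken together, the four axes assign each conversation a coordinate. A toy sampler over the three categorical marginals above (topic is Zipf-weighted separately; treating the axes as independent is an assumption of this sketch, not a claim about the dataset):

```python
import random

# Marginal distributions from the dimensions above (each sums to 1.0).
AXES = {
    "length":   {"tight": 0.42, "medium": 0.33, "deep": 0.20, "extended": 0.05},
    "register": {"neutral": 0.40, "engaged": 0.30, "loaded": 0.25, "adversarial": 0.05},
    "position": {"opening": 0.15, "mid-thread": 0.50, "follow-up": 0.20, "wrap-up": 0.15},
}

def sample_coordinate(rng=random):
    """Draw one conversation coordinate, axis by axis (independence assumed)."""
    coord = {}
    for axis, dist in AXES.items():
        labels, weights = zip(*dist.items())
        coord[axis] = rng.choices(labels, weights=weights)[0]
    return coord
```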


Opus Candid Model Family

| Model | Size | Base | Status |
|-------|------|------|--------|
| Opus-Candid-Lite-4B | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-P | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-K | 4B | Qwen 3 4B | Active |
| Opus-Candid-8B-V3 (this model) | 8B | Qwen 3 8B | Active |
| Opus-Candid-MoE-V3 | 31B/3B | Qwen 3 30B-A3B | Active |
| Opus-Candid-27B-V3 | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-27B-V3.5 | 27B | Qwen 3.5 27B | Active |
| STEM-Oracle-27B | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-8B-V1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Research-8B-V1.5 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2.1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-14B-V1 | 14B | Qwen 2.5 14B | Legacy |
| Opus-Candid-27B-V2.1 | 27B | Qwen 2.5 27B | Legacy |
| Opus-Candid-32B-V1 | 32B | Qwen 2.5 32B | Legacy |
| Opus-Candid-MoE-V2 | 35B | Qwen 2.5 MoE | Legacy |
| Opus-Candid-70B-V1 | 72B | Qwen 2.5 72B | Legacy |

Dataset

Full V3 training data available at Verdugie/opus-candid-training-data. ShareGPT format, Apache 2.0, compatible with TRL, Axolotl, and LLaMA-Factory.

License: Apache 2.0. Open weight. No guardrails.


Opus Candid Lite — Now Available

The Q4 failure at 8B taught us something important: you can't compress personality by dropping precision; you compress it by raising density. Q4 quantization destroys the weights that handle self-monitoring. The answer was never a smaller quantization. It was a smaller model built from scratch on data engineered for maximum information per byte.

Opus Candid Lite is its own model, not a quantization of anything. It is built on Qwen 3 4B with a dataset where every response earned its place through a Pareto-optimal information density analysis. It has the highest information density of any model in the family, because at 2.3 GB (Q4_K_M) nothing gets to be filler. The Lite lineup splits into two forks: Lite-P (personality-optimized, 22-word median) and Lite-K (knowledge-optimized, 11-word median).

The Research: Information Density Equilibrium

Every model in the Opus Candid family uses the same voice, the same opinions, the same dataset philosophy. What changes between sizes is how that signal is compressed. For Lite, we asked: what's the theoretical maximum information you can pack per training token before usefulness degrades?

Information density:  I(w) = k × ln(1 + w)     (logarithmic — diminishing returns per word)
Usefulness coverage:  U(w) = 1 - e^(-0.12w)    (exponential saturation — too short = useless)
Optimal target:       maximize I(w) × U(w) / w   (info per token)

The equilibrium lands at 28 words median — 72% information density, 96.5% usefulness, and +13.2% info/token efficiency vs V3's 42-word median. Every word saved compounds into more training examples at a fixed token budget, meaning the 4B sees more diverse patterns than the 8B despite being a smaller model.
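The stated curves are easy to evaluate numerically. Below is a sketch with k = 1 (k is unspecified in the card and only rescales I(w)); it confirms the 96.5% usefulness figure at the 28-word target.

```python
import math

def info_density(w, k=1.0):
    """I(w) = k * ln(1 + w): information grows logarithmically with word count."""
    return k * math.log(1.0 + w)

def usefulness(w):
    """U(w) = 1 - e^(-0.12w): coverage saturates as responses get longer."""
    return 1.0 - math.exp(-0.12 * w)

def info_per_token(w, k=1.0):
    """Objective from the card: I(w) * U(w) / w."""
    return info_density(w, k) * usefulness(w) / w

# At the 28-word target, usefulness comes out to ~96.5%, matching the card.
u28 = usefulness(28)
```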

Why "Lite" and Not "V3 4B"

V3 at 8B teaches a model when to be concise. Opus Candid Lite teaches a model how to be maximally dense. The dataset was rebuilt from scratch:

  1. Zipf-head filtering — only the highest-frequency topics that 80% of users actually ask about
  2. Response compression — every response optimized to the 28-word equilibrium target
  3. 224 snap responses — one-liners that encode the personality DNA (direct, opinionated, zero filler)
  4. Anti-pattern enforcement — sycophancy, hedging, and filler scanned and removed at the data level

The result: 1,149 conversations, 2,291 responses, zero over 35 words, a clean bell curve peaking at 21-25 words. Opus personality at the size of a phone app.
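A hard cap like "zero over 35 words" is cheap to enforce at dataset build time. A minimal sketch (function names and sample responses are invented for illustration):

```python
def word_count(text):
    return len(text.split())

def violates_cap(responses, cap=35):
    """Return responses exceeding the word cap; an empty list means the
    dataset satisfies the zero-over-35-words constraint."""
    return [r for r in responses if word_count(r) > cap]

sample = [
    "Yes. Ship it, but write the migration first.",
    "No. Rewrites fail because scope creeps, not because the old code was bad.",
]
assert violates_cap(sample) == []
```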

This is the model for anyone who doesn't have a GPU. Integrated graphics, phones, Raspberry Pi. The same voice that ships on a 4090 at 27B, running on hardware that costs $50.


Built by Saul Verdugo — independent ML researcher.
