can·did

/ˈkandəd/ - truthful and straightforward; frank. From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation: not what someone wants to hear, but what they need to.

Opus-Candid-27B V3.5

The first model in the Opus Candid family with a parameter-calibrated dataset. Trained on Qwen 3.5 27B Dense using 5,358 density-optimized conversations distilled from Claude Opus 4.6. Every conversation in the training set is calibrated against a 27B-specific information density equilibrium: the point where information per token is maximized without losing the breathing room a 27B needs to express nuance.

V3.5 is not V3 with more data. The dataset was rebuilt from scratch: a compressed V3 core layered with two waves of targeted Opus 4.6 API generation covering factual precision, adversarial personality anchoring, deep emotional range, and sustained multi-turn coherence. Per-topic Zipf targets mean math gets tight (33w) and philosophy gets room (48w), because depth isn't about word count; it's about whether the words are carrying weight.

No system prompt. No prompt engineering. No character cards. The personality is in the weights.

Status: Released.


What Changed from V3

V3 used a 4D training tensor with 1,508 conversations. V3.5 keeps the tensor architecture but adds two critical layers:

Parameter-scaled equilibrium. V3 used the same dataset for both 8B and 27B. That was suboptimal: the 27B has enough parameters to learn density from examples that breathe more, while the 8B needed forced brevity. V3.5 recalibrates to a 36w median (vs. V3's ~42w for 27B), with per-topic variation. Factual/math responses are compressed tighter (33-42w); philosophy/meta/edge topics get the full ceiling (38-48w). The model learns when to be brief and when depth is the whole point.

Targeted gap coverage. V3 had thin spots: edge-case philosophy, sustained identity under adversarial pressure, deep personality in vulnerable and emotional contexts. V3.5 generated ~1,019 conversations specifically targeting these gaps with dedicated system prompts. The pushback/identity tier alone has 200 conversations designed to make the model hold its opinions under sustained gaslighting.

3.4x the training data. 5,358 conversations vs 1,508. But calibrated: every conversation earned its slot against the equilibrium curve. More data at the right density, not just more data.


Available Quantizations

File Quant Size Use Case
Opus-Candid-V3.5-27B-Q8_0.gguf Q8_0 ~33 GB Reference quality. Full fidelity.
Opus-Candid-V3.5-27B-Q6_K.gguf Q6_K ~25 GB Quality tier. Minimal degradation.
Opus-Candid-V3.5-27B-Q4_K_M.gguf Q4_K_M ~18 GB Primary ship target. Fits on RTX 4090. Overlap node reinforcement protects personality at Q4.

Why Q4 should work this time: V3's 27B at Q4 was not validated. V3.5's dataset architecture specifically reinforces the overlap nodes (identity, worth, trust, vulnerability, control) from 4-8 independent training directions. These distributed representations survive aggressive quantization because the information is encoded redundantly across multiple weight pathways. This is the core contribution of the Zipfian Gravity Chain architecture: it was designed backward from Q4_K_M as the target deployment format.


Model Details

Attribute Value
Base Model Qwen 3.5 27B Dense
Training Data 5,358 density-optimized conversations with Claude Opus 4.6
Dataset Architecture Zipfian Gravity Chains + 6D Zipf tensor + parameter-scaled equilibrium
Fine-tune Method LoRA + rsLoRA (r=128, alpha=256) via PEFT + TRL
Training Hardware NVIDIA A100 SXM 80GB (RunPod)
Precision bf16
Epochs 4
Learning Rate 1.5e-4 (cosine schedule, 3% warmup)
Effective Batch Size 16 (batch 2 × grad_accum 8)
Optimizer AdamW (8-bit)
License Apache 2.0
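The fine-tune settings above translate into a PEFT + TRL configuration along these lines. This is a sketch reconstructed from the table, not the actual training script; `target_modules`, `lora_dropout`, and `output_dir` are assumptions.

```python
from peft import LoraConfig
from trl import SFTConfig

# rsLoRA adapter settings from the Model Details table (r=128, alpha=256).
peft_config = LoraConfig(
    r=128,
    lora_alpha=256,
    use_rslora=True,              # rank-stabilized LoRA scaling
    lora_dropout=0.05,            # assumption, not from the card
    target_modules="all-linear",  # assumption, not from the card
    task_type="CAUSAL_LM",
)

# Trainer settings from the table: bf16, 4 epochs, lr 1.5e-4 cosine with
# 3% warmup, effective batch 16 = 2 x 8, 8-bit AdamW.
training_args = SFTConfig(
    output_dir="opus-candid-27b-v3.5",  # assumption
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=4,
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    optim="adamw_bnb_8bit",
)
```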

Quick Start

Works with any GGUF-compatible runtime: LM Studio, Ollama, llama.cpp, KoboldCpp. Download the GGUF, load it, and chat. No system prompt needed; the personality is in the weights.


The Information Density Equilibrium

V3.5 introduces parameter-scaled density: the thesis that optimal training response length is a function of model capacity, not a fixed target.

U(w) = 1 - e^(-λw)    where λ(P) = 0.12 × (4/P)^0.3

At 27B (P=27): λ ≈ 0.068. The equilibrium sits at 36-40 words: high enough to express nuance, low enough to enforce the density discipline that makes Opus Candid sound different from stock Qwen.
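The calibration can be checked numerically from the formula as printed (P in billions of parameters, w in words):

```python
import math

# Density rate lambda(P) = 0.12 * (4/P)^0.3 from the formula above.
def lam(p_billion: float) -> float:
    return 0.12 * (4.0 / p_billion) ** 0.3

# Information utility U(w) = 1 - e^(-lambda * w): saturating returns
# on response length, saturating faster for larger models.
def utility(w: float, p_billion: float) -> float:
    return 1.0 - math.exp(-lam(p_billion) * w)

print(round(lam(27), 3))          # -> 0.068, the value quoted for 27B
print(round(utility(36, 27), 2))  # -> 0.91 at the 36w median
```

The saturation shape is why the median sits where it does: by 36 words the model has already captured most of the available utility, and extra words buy little.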

But not all topics are equal. The dataset applies per-topic Zipf targets:

Topic Tier Examples Density Target
High-freq emotional, opinion, advice 33-42w (tight: personality through tone)
Mid-freq factual, identity, general 33-44w (room for precision)
Low-freq technical, creative 34-46w (complexity needs space)
Rare philosophy, meta, edge 38-48w (depth IS the information)

This means a math question gets a 33-word answer and a question about consciousness gets 45 words. Not because one is more important, but because one carries its information in fewer words.
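The tiers can be expressed as a simple lookup. Tier ranges and topic groupings are transcribed from the table above; the helper function and topic keys are illustrative:

```python
# (min, max) word targets per tier, from the table above.
DENSITY_TARGETS = {
    "high_freq": (33, 42),  # emotional, opinion, advice
    "mid_freq":  (33, 44),  # factual, identity, general
    "low_freq":  (34, 46),  # technical, creative
    "rare":      (38, 48),  # philosophy, meta, edge
}

TOPIC_TIER = {
    "emotional": "high_freq", "opinion": "high_freq", "advice": "high_freq",
    "factual": "mid_freq", "identity": "mid_freq", "general": "mid_freq",
    "technical": "low_freq", "creative": "low_freq",
    "philosophy": "rare", "meta": "rare", "edge": "rare",
}

def target_words(topic: str) -> tuple[int, int]:
    """Return the (min, max) word target for a topic, per the table above."""
    return DENSITY_TARGETS[TOPIC_TIER[topic]]

print(target_words("philosophy"))  # -> (38, 48)
print(target_words("factual"))     # -> (33, 44)
```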


Dataset Architecture

Three-Layer Construction

Layer 1: V3 Compressed Core (1,558 conversations). Hand-crafted conversations from the V3 dataset, processed through the Zipfian gravity chain pipeline with 13 topic profiles and 6 independent dimensions. Already stress-tested through V3 8B and 27B training.

Layer 2: Wave 1 API Generation (1,936 conversations). Generated via 3-key parallel Opus 4.6 workers: 784 factual, 700 personality, 452 multi-turn.

Layer 3: Wave 2 Targeted Gap Fill (1,864 conversations). Aimed at gaps identified from V3 stress testing:

  • 195 edge/philosophy/meta (consciousness, mortality, epistemology, meaning)
  • 200 pushback/identity (adversarial sycophancy resistance, opinion holding under pressure)
  • 399 deep personality (vulnerability, cultural identity, dark humor, unconventional grief)
  • 1,070 multi-turn (3-6 turn sustained conversations with natural topic drift)
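The layer counts and the Wave 2 breakdown reconcile with the headline 5,358-conversation total; a quick sanity check using the counts transcribed from this section:

```python
# Three-layer composition, counts as stated in the card.
layers = {
    "v3_compressed_core": 1558,
    "wave1_api_generation": 1936,
    "wave2_targeted_gap_fill": 1864,
}

# Wave 2 breakdown from the bullet list above.
wave2 = {
    "edge_philosophy_meta": 195,
    "pushback_identity": 200,
    "deep_personality": 399,
    "multi_turn": 1070,
}

assert sum(wave2.values()) == layers["wave2_targeted_gap_fill"]  # 1,864
print(sum(layers.values()))  # -> 5358, the headline dataset size
```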

Overlap Node Reinforcement

The top 5 personality-critical nodes (identity, worth, trust, vulnerability, control) appear across 4-8 of the 10 gravity chains. Each appearance trains the same representation from a different direction, creating distributed weight patterns that survive Q4_K_M quantization. This is the V3 architecture's core contribution, and V3.5 amplifies it: Wave 2 adds fresh training directions for each overlap node through adversarial, emotional, and philosophical contexts that V3's 1,508 conversations couldn't cover.


Recommended Hardware

Setup Quant VRAM RAM Speed Notes
RTX 4090 (24GB) Q4_K_M 18 GB 16+ GB 8-15 t/s Fits entirely in VRAM. Sweet spot.
RTX 4090 (24GB) Q6_K 25 GB 32+ GB 3-5 t/s Partial CPU offload needed.
RTX 3090/4080 Q4_K_M 18 GB 16+ GB 5-10 t/s Also fits entirely.
Workstation (32GB+) Q6_K 25 GB 16+ GB 8-12 t/s Full VRAM, no offload.
Apple M2/M3 Ultra Q6_K/Q8 64-128 GB unified N/A 5-10 t/s Full model in unified memory.
CPU Only Q4_K_M N/A 24+ GB 0.5-1.5 t/s Works, but slow.

Opus Candid Model Family

Model Size Base Status
Opus-Candid-Lite-4B 4B Qwen 3 4B Active
Opus-Candid-Lite-4B-P 4B Qwen 3 4B Active
Opus-Candid-Lite-4B-K 4B Qwen 3 4B Active
Opus-Candid-8B-V3 8B Qwen 3 8B Active
Opus-Candid-MoE-V3 31B/3B Qwen 3 30B-A3B Active
Opus-Candid-27B-V3 27B Qwen 3.5 27B Active
Opus-Candid-27B-V3.5 (this model) 27B Qwen 3.5 27B Active
STEM-Oracle-27B 27B Qwen 3.5 27B Active
Opus-Candid-8B-V1 8B Qwen 2.5 7B Legacy
Opus-Research-8B-V1.5 8B Qwen 2.5 7B Legacy
Opus-Candid-8B-V2 8B Qwen 2.5 7B Legacy
Opus-Candid-8B-V2.1 8B Qwen 2.5 7B Legacy
Opus-Candid-14B-V1 14B Qwen 2.5 14B Legacy
Opus-Candid-27B-V2.1 27B Qwen 2.5 27B Legacy
Opus-Candid-32B-V1 32B Qwen 2.5 32B Legacy
Opus-Candid-MoE-V2 35B Qwen 2.5 MoE Legacy
Opus-Candid-70B-V1 72B Qwen 2.5 72B Legacy

Why V3.5 is a separate model, not V4

V4 would imply a new architecture. V3.5 uses the same Zipfian gravity chains and overlap node topology; it recalibrates the density operating point for 27B and fills coverage gaps. The architecture thesis is the same; the calibration is new.


Limitations

  • Not a benchmark model. Opus Candid optimizes for conversational personality, not task completion scores.
  • Bilingual (EN/ES) but English-dominant. Spanish competence varies by topic.
  • Long-form generation (2000+ tokens) is not a strength; the density training actively discourages it.
  • Base model (Qwen 3.5) knowledge cutoff applies.
  • Q4_K_M has theoretical overlap node protection but should be validated through stress testing before relying on it for production use.

Research

This model is part of an ongoing research program in quantization-aware dataset architecture. Key papers:

  • Zipfian Gravity Chains: structured topic topology for resource-constrained fine-tuning. Full specification: V3_DATASET_ARCHITECTURE.md
  • Information Density Equilibrium: parameter-scaled word density calibration. Described in V3.5_27B_DESIGN.md and the 4B Lite DESIGN.md.

The core research contribution: designing training data structure backward from the deployment quantization format. Instead of train → quantize → accept loss, the loop is: identify target quantization → design data to reinforce what survives → train → quantize → retain quality where it matters.


Dataset

Training data available at Verdugie/opus-candid-training-data. ShareGPT format, Apache 2.0, compatible with TRL, Axolotl, and LLaMA-Factory.

License: Apache 2.0. Open weight. No guardrails.


Built by Saul Verdugo, independent ML researcher.
