Qwen3.5-122B-A10B-Opus-Reasoning-GGUF

The first Claude Opus-distilled reasoning fine-tune of Qwen3.5-122B at full scale. Enhanced multi-step reasoning, analytical depth, and uncensored output — trained where competitors can't reach.

122B total parameters, 10B active per token (Mixture-of-Experts). LoRA fine-tuned on 12,840 Claude Opus 4.6 reasoning traces. 7 quantization levels from Q2_K to BF16.

Forged on 8×H200 SXM5 | 1.1TB VRAM


Why This Model

| | Base Qwen3.5-122B | Jackrong (27B) | TIMTEH (this) |
|---|---|---|---|
| Scale | 122B / 10B active | 27B dense | 122B / 10B active |
| Training data | Base alignment | Opus distillation | Opus distillation |
| Reasoning quality | Standard | Enhanced (small scale) | Enhanced (full MoE scale) |
| Uncensored | No | N/A | Yes |
| Hardware required to train | Any | Consumer GPU | 8×H200 (1.1TB VRAM) |

Nobody else has fine-tuned Qwen3.5-122B on Opus reasoning data. Jackrong stopped at 27B because they don't have the hardware. We do.


Quantizations

| Quant | File | Size | BPW | RAM Required | Use Case |
|---|---|---|---|---|---|
| BF16 | ...-BF16.gguf | 228 GB | 16.0 | ~235 GB | Maximum quality, reference |
| Q8_0 | ...-Q8_0.gguf | 121 GB | 8.5 | ~125 GB | Near-lossless, high-VRAM setups |
| Q6_K | ...-Q6_K.gguf | 94 GB | 6.6 | ~98 GB | Excellent quality |
| Q5_K_M | ...-Q5_K_M.gguf | 81 GB | 5.7 | ~85 GB | Great balance |
| Q4_K_M | ...-Q4_K_M.gguf | 70 GB | 4.9 | ~74 GB | Recommended — best quality/size |
| Q3_K_M | ...-Q3_K_M.gguf | 55 GB | 3.9 | ~58 GB | Fits 2×48GB GPUs |
| Q2_K | ...-Q2_K.gguf | 42 GB | 2.9 | ~45 GB | Single 48GB GPU |
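The listed sizes follow directly from the BPW column — file size ≈ total parameters × bits-per-weight / 8, reported in GiB. A quick sanity check (the 122.1B figure comes from the table below; small deviations are expected since BPW already averages metadata and mixed-precision tensors):

```python
# Rough GGUF size check: bytes = params * BPW / 8, reported in GiB.
def estimated_size_gib(total_params: float, bpw: float) -> float:
    """Estimate GGUF file size in GiB from parameter count and bits-per-weight."""
    return total_params * bpw / 8 / 2**30

for name, bpw in [("Q2_K", 2.9), ("Q4_K_M", 4.9), ("Q8_0", 8.5)]:
    print(f"{name}: ~{estimated_size_gib(122.1e9, bpw):.0f} GiB")
# Q4_K_M: ~70 GiB and Q8_0: ~121 GiB match the table above.
```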

Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-122B-A10B |
| Method | LoRA (r=64, alpha=128, dropout=0.05) |
| Trainable Parameters | 66.8M / 122.1B (0.05%) |
| Training Samples | 12,840 |
| Epochs | 2 |
| Steps | 1,266 |
| Final Avg Loss | 0.1502 |
| Training Time | 6 hours 34 minutes |
| Hardware | 8× NVIDIA H200 SXM5 (141GB HBM3e each, NVLink 478 GB/s) |
| Precision | BF16 (full, no quantized training) |
| Effective Batch Size | 64 |
| Learning Rate | Cosine schedule, peak 2e-4 |
| Max Sequence Length | 4096 |
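For reference, the cosine schedule above can be sketched as plain cosine decay from the 2e-4 peak to ~0 over the 1,266 steps. Any warmup phase isn't stated in the table, so this sketch omits it:

```python
import math

PEAK_LR = 2e-4      # peak learning rate from the table
TOTAL_STEPS = 1266  # optimizer steps from the table

def cosine_lr(step: int) -> float:
    """Learning rate at a given step under plain cosine decay (no warmup)."""
    progress = step / TOTAL_STEPS
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * progress))

# cosine_lr(0) == 2e-4, cosine_lr(633) ≈ 1e-4 (halfway), cosine_lr(1266) ≈ 0
```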

Training Datasets

| Dataset | Samples | Source |
|---|---|---|
| opus-10000x | 9,633 | Claude Opus 4.6 reasoning traces (10K filtered) |
| opus-3000x | 2,326 | Claude Opus 4.6 reasoning traces (3K filtered) |
| reasoning-700x | 633 | Qwen3.5 reasoning samples |
| high-reasoning-250x | 250 | High-quality Opus reasoning (curated) |

Architecture

  • Type: Qwen3_5MoeForCausalLM (Mixture-of-Experts)
  • Total Parameters: 122.1B
  • Active Parameters: ~10B per token
  • Hidden Size: 3,072
  • Layers: 48
  • Attention Heads: 32 (GQA)
  • Experts: 256 routed + shared expert, 10 active per token
  • Context Length: 131,072 tokens (default), extensible to 262K
  • Vocab Size: 248,320
  • Thinking Mode: Supports <think> tags for explicit chain-of-thought
  • License: Apache 2.0
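The "10 active per token" figure comes from top-k expert routing: a small router scores all 256 routed experts for each token and only the top 10 execute. A toy sketch of that selection step (illustrative router scores, not the model's; normalizing the softmax over only the selected experts is the common MoE convention and an assumption here):

```python
import math

NUM_EXPERTS, TOP_K = 256, 10  # routed experts and active count from the list above

def route(router_logits: list[float]) -> list[tuple[int, float]]:
    """Return (expert_id, weight) for the top-k experts,
    softmax-normalized over the selected experts only."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    exp_scores = [math.exp(router_logits[i]) for i in top]
    total = sum(exp_scores)
    return [(i, s / total) for i, s in zip(top, exp_scores)]

logits = [math.sin(i * 0.37) for i in range(NUM_EXPERTS)]  # stand-in scores
selected = route(logits)  # 10 experts run for this token; the other 246 are skipped
```

This sparsity is why a 122B model has roughly the per-token compute cost of a 10B dense model (the always-on shared expert adds a small constant on top).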

Usage

llama.cpp

# Recommended: Q4_K_M
./llama-cli -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
  -p "Analyze the following problem step by step:" \
  -n 2048 --temp 0.7 --top-p 0.9

# Server mode
./llama-server -m Qwen3.5-122B-A10B-Opus-Reasoning-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0 -c 65536

Ollama

ollama run timteh673/Qwen3.5-122B-A10B-Opus-Reasoning

LM Studio

Download the GGUF file and load it in LM Studio. Thinking and non-thinking modes are both supported via the enable_thinking flag in the chat template.

Open WebUI / SillyTavern

Point your backend to a llama.cpp server running any quant. Full OpenAI-compatible API at /v1/chat/completions.
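A minimal sketch of the request body such a backend sends, assuming the llama-server example above (localhost:8080). The `model` field is typically ignored by llama.cpp but kept for client compatibility; the `chat_template_kwargs` passthrough for toggling thinking is supported by recent llama-server builds — drop it if your build rejects unknown fields:

```python
import json

def build_chat_request(user_msg: str, thinking: bool = True) -> dict:
    """Build an OpenAI-style payload for POST /v1/chat/completions."""
    return {
        "model": "Qwen3.5-122B-A10B-Opus-Reasoning",
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": 0.7,   # reasoning-task setting from the table below
        "top_p": 0.9,
        "max_tokens": 2048,
        # Assumption: your llama-server build forwards chat_template_kwargs
        # to the chat template; otherwise control thinking in the prompt.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

payload = json.dumps(build_chat_request("Analyze the following problem step by step: ..."))
```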

Recommended Settings

| Setting | Value | Notes |
|---|---|---|
| Temperature | 0.6–0.7 | Reasoning tasks |
| Temperature | 0.8–1.0 | Creative tasks |
| Top-P | 0.9 | |
| Min-P | 0.05 | Good alternative to Top-P |
| Context | 32K+ | Supports up to 131K |
| Thinking | Enabled | Use enable_thinking=True for best results |
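What Min-P 0.05 means in practice, sketched: keep every token whose probability is at least 5% of the most likely token's probability, then renormalize. Unlike Top-P, the cutoff scales with the model's confidence (illustrative distribution below, not model output):

```python
def min_p_filter(probs: dict[str, float], min_p: float = 0.05) -> dict[str, float]:
    """Drop tokens below min_p * p(max), then renormalize the survivors."""
    cutoff = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= cutoff}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {"the": 0.60, "a": 0.25, "zebra": 0.10, "qux": 0.04, "zzz": 0.01}
filtered = min_p_filter(probs)  # cutoff = 0.05 * 0.60 = 0.03, so only "zzz" is dropped
```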

What's Different From Base

  • Enhanced reasoning chains — trained on 12,840 Opus-quality multi-step analytical traces
  • Better instruction following — deeper engagement with complex prompts
  • Uncensored — no refusal training, responds to all prompts
  • MoE efficiency — only 10B params active per token despite 122B total
  • Thinking mode — native <think> tag support for explicit chain-of-thought
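When thinking mode is on, client code usually wants to separate the chain-of-thought from the final answer. A small sketch, assuming the output starts with a single well-formed `<think>...</think>` block as described above:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is '' when no think block is present."""
    m = THINK_RE.search(text)
    if not m:
        return "", text
    return m.group(1).strip(), text[m.end():]

reasoning, answer = split_thinking("<think>2 + 2 = 4</think>The answer is 4.")
# reasoning == "2 + 2 = 4", answer == "The answer is 4."
```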

Pipeline

Qwen3.5-122B-A10B (base)
  → LoRA fine-tune (r=64, 12,840 Opus traces, 8×H200, 6.5h)
  → Merge adapter into base weights
  → Convert to BF16 GGUF (llama.cpp, 879 tensors)
  → Quantize: Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, Q2_K

All steps executed natively in BF16 — no quantized training, no optimization hacks. When you have 1.1TB VRAM, you use it.


Model Provenance

  • Base: Qwen/Qwen3.5-122B-A10B (Apache 2.0)
  • Training Framework: transformers + PEFT + TRL (raw, no wrappers)
  • Quantization: llama.cpp (build 8c60b8a)
  • Hardware: 8×NVIDIA H200 SXM5 (IBM Cloud, 1.1TB VRAM total)

Also From TIMTEH

| Model | Status | Description |
|---|---|---|
| Qwen3.5-397B-A17B-Uncensored-GGUF | ✅ Live | Abliterated 397B MoE — 7 quants |
| Mistral-Small-4-119B-Uncensored-GGUF | ✅ Live | First TIMTEH release — 7 quants |
| Nemotron-3-Super-120B-A12B-Uncensored-GGUF | ✅ Live | Benchmarked — 7 quants |
| Qwen3.5-397B Opus-Reasoning | 🔥 Training | Stage 2 fine-tune (same technique, 397B scale) |

⚠️ Disclaimer

This model has been fine-tuned on uncensored reasoning data. It may generate content that is harmful, offensive, or inappropriate. Users are solely responsible for ensuring their use complies with applicable laws and ethical standards. Intended for research, testing, and controlled environments.


☕ Support This Work

Running 8×H200 GPUs isn't free. Every donation directly funds more open-weight model releases, better abliteration techniques, and pushing the frontier of what's possible with open models.

Buy Me A Coffee

Buy Me a Coffee QR Code

💎 Crypto Donations

| Currency | Address |
|---|---|
| BTC | bc1p4q7vpwucvww2y3x4nhps4y4vekye8uwm9re5a0kx8l6u5nky5ucszm2qhh |
| ETH | 0xe5Aa16E53b141D42458ABeEDb00a157c3Fea2108 |
| SOL | 9CXwjG1mm9uLkxRevdMQiF61cr6TNHSiWtFRHmUEgzkG |

🏢 Enterprise & Custom Models

Need a custom 120B+ model aligned to your proprietary data? TIMTEH provides bespoke enterprise fine-tuning, abliteration, and deployment on 8×H200 SXM5.

  • Custom fine-tuning on your data (up to 400B+ parameters)
  • Private CARE abliteration (Phase 2 technique)
  • Deployment architecture consulting (tensor parallelism, speculative decoding)
  • Bespoke distillation datasets

📧 Contact: tim@timlex.co


Part of the TIMTEH Cognitive Preservation Foundry — surgical capability preservation at scale. ⚡ Forged on 8×NVIDIA H200 SXM5 | 1.1TB VRAM
