GGUF Files for qwen3.5-moe-0.87B-d0.8B

These are the GGUF files for kshitijthakkar/qwen3.5-moe-0.87B-d0.8B.

Based on my own testing, this GGUF appears to be broken. Feel free to try it out yourself, and please confirm or deny whether it's broken in the Community tab.

Downloads

| Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | I-quant, preferable over Q3_K_S |
| Download | IQ3_M | I-quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | I-quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: good mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | f16 | Full precision; don't bother, use a quant |
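To choose between the quants above, a rough rule of thumb is file size ≈ parameter count × bits-per-weight / 8. The sketch below estimates sizes for this 0.85B model and picks the largest quant fitting a memory budget. The bits-per-weight figures are approximate llama.cpp averages (an assumption, not exact for this file); `largest_quant_fitting` is a hypothetical helper, not part of any tool.

```python
# Approximate average bits-per-weight for common llama.cpp quant types
# (assumption: real values vary slightly with tensor layout).
APPROX_BPW = {
    "Q2_K": 2.6, "Q3_K_S": 3.5, "IQ3_S": 3.4, "IQ3_M": 3.7, "Q3_K_M": 3.9,
    "IQ4_XS": 4.3, "Q4_K_S": 4.6, "Q4_K_M": 4.8,
    "Q5_K_S": 5.5, "Q5_K_M": 5.7, "Q6_K": 6.6, "f16": 16.0,
}

TOTAL_PARAMS = 854_386_752  # from the model details table below

def approx_size_gb(quant: str, n_params: int = TOTAL_PARAMS) -> float:
    """Estimate GGUF file size in GB for a given quantization."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

def largest_quant_fitting(budget_gb: float) -> str:
    """Pick the highest-bpw quant whose estimated size fits the budget."""
    fitting = [q for q in APPROX_BPW if approx_size_gb(q) <= budget_gb]
    return max(fitting, key=lambda q: APPROX_BPW[q])
```

At this size even f16 is only about 1.7 GB, so the quant choice matters less here than for larger models.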

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet, usually for models I deem interesting and wish to try out.

If any quants you'd like are missing, or you want a public model converted, you can request either in the Community tab. For questions about the model itself, please refer to the original model repo.

You can find more info about me and what I do here.

Qwen3.5 MoE 0.85B (from Qwen3.5-0.8B)

A Qwen3.5 Mixture-of-Experts model created via dual-source weight transfer.

Model Details

| Property | Value |
|---|---|
| Total Parameters | 854,386,752 (0.85B) |
| Active Parameters | 677,439,552 (0.68B) |
| Architecture | Qwen3.5 Hybrid MoE |
| Experts | 8 routed + 1 shared, top-2 routing |
| Hidden Size | 1024 |
| Layers | 24 (hybrid: DeltaNet + full attention) |
| Attention | GQA, 8 query / 2 KV heads, head_dim = 256 |
| Context | 262,144 tokens |
| Vocab | 248,320 |
| Dtype | bfloat16 |
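The total and active parameter counts above are internally consistent if the gap between them is exactly the 6 unused routed experts per layer. The sketch below checks that, assuming SwiGLU experts (gate/up/down projections, i.e. 3 weight matrices of `hidden × d_ff` each); the inferred `d_ff` is an assumption, not stated in the card.

```python
# Sanity-check the parameter table: inactive params should equal the
# (8 - 2) = 6 routed experts per layer that a token never touches.
TOTAL = 854_386_752
ACTIVE = 677_439_552
LAYERS, HIDDEN = 24, 1024
ROUTED, SHARED, TOP_K = 8, 1, 2

inactive = TOTAL - ACTIVE                       # weights of unused experts
per_expert = inactive // ((ROUTED - TOP_K) * LAYERS)
d_ff = per_expert // (3 * HIDDEN)               # SwiGLU: 3 weight matrices

# Fraction of expert FFN parameters active per token: (2 + 1) of (8 + 1).
active_frac = (TOP_K + SHARED) / (ROUTED + SHARED)
```

The numbers work out to about 1.23M parameters per routed expert per layer, and an active FFN fraction of 1/3, matching the design note below.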

Design

Total MoE FFN parameters are approximately equal to the dense model's FFN parameters. The speed benefit comes from sparsity: only the top-2 routed experts plus the shared expert are active per token (~1/3 of total FFN parameters).

Most weights are pre-trained (backbone from dense model, experts from 35B-A3B). Only the MoE dimension resize introduces noise, making this model suitable for fine-tuning at nominal cost.
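The sparse forward pass described above can be sketched as follows: the router scores all 8 experts, only the top-2 are evaluated (with softmax-renormalized weights), and the shared expert always runs. This is a minimal illustration with random placeholder router weights, not the model's actual implementation.

```python
import numpy as np

# Top-2-of-8 routing plus an always-on shared expert, for a single token.
rng = np.random.default_rng(0)
HIDDEN, N_EXPERTS, TOP_K = 1024, 8, 2

# Placeholder router weights (assumption: real weights come from 35B-A3B).
router_w = rng.standard_normal((HIDDEN, N_EXPERTS)) * 0.02

def moe_forward(x, experts, shared):
    logits = x @ router_w                        # one score per expert
    top = np.argsort(logits)[-TOP_K:]            # indices of the top-2
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over selected only
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out + shared(x)                       # shared expert always active
```

Only 2 of the 8 routed expert FFNs are ever evaluated per token, which is where the claimed speedup over an equally sized dense FFN comes from.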

Weight Transfer Sources

| Component | Source | Strategy |
|---|---|---|
| Embeddings, LM head | Qwen/Qwen3.5-0.8B | Exact copy |
| Attention (Q/K/V/O, norms) | Qwen/Qwen3.5-0.8B | Exact copy |
| DeltaNet (linear attention) | Qwen/Qwen3.5-0.8B | Exact copy |
| Vision encoder | Qwen/Qwen3.5-0.8B | Exact copy |
| Layer norms | Qwen/Qwen3.5-0.8B | Exact copy |
| Routed experts | Qwen3.5-35B-A3B | Slice 256 → 8, bilinear resize |
| Shared expert | Qwen3.5-35B-A3B | Bilinear resize |
| Router | Qwen3.5-35B-A3B | Slice + resize |
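The "bilinear resize" in the table treats a 2D weight matrix like an image and linearly interpolates it to the target shape along each axis. A minimal NumPy sketch of that technique (an illustration of the idea, not the actual conversion script used for this model):

```python
import numpy as np

def bilinear_resize(w: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Linearly interpolate a 2D weight matrix to (rows, cols)."""
    src_r = np.linspace(0.0, 1.0, w.shape[0])
    src_c = np.linspace(0.0, 1.0, w.shape[1])
    tgt_r = np.linspace(0.0, 1.0, rows)
    tgt_c = np.linspace(0.0, 1.0, cols)
    # Interpolate along axis 0 (rows), one column at a time...
    tmp = np.stack([np.interp(tgt_r, src_r, w[:, j])
                    for j in range(w.shape[1])], axis=1)
    # ...then along axis 1 (columns), one row at a time.
    return np.stack([np.interp(tgt_c, src_c, tmp[i])
                     for i in range(rows)], axis=0)
```

Resizing pre-trained expert weights this way preserves their large-scale structure, which is why the card describes the resize step as the only source of added noise.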

License

Apache 2.0 (following source models)

Downloads last month: 2,551
Model size: 1B params
Architecture: qwen35moe
