# GGUF Files for qwen3.5-moe-0.87B-d0.8B
These are the GGUF files for kshitijthakkar/qwen3.5-moe-0.87B-d0.8B.
In my own testing, this GGUF appears to be broken. Feel free to try it yourself, and please confirm or deny whether it works in the Community tab.
## Downloads
| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: Perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | f16 | Full precision, don't bother; use a quant |
## Note from Flexan
I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet, usually for models I deem interesting and wish to try out.
If a quant you'd like is missing, or you want another public model converted, you can request it in the Community tab. For questions about the model itself, please refer to the original model repo.
You can find more info about me and what I do here.
# Qwen3.5 MoE 0.85B (from Qwen3.5-0.8B)
A Qwen3.5 Mixture-of-Experts model created via dual-source weight transfer:
- Backbone (attention, embeddings, vision, norms): from Qwen/Qwen3.5-0.8B
- MoE experts (routed + shared): from Qwen/Qwen3.5-35B-A3B (sliced 256->8 experts, bilinear resized)
## Model Details
| Property | Value |
|---|---|
| Total Parameters | 854,386,752 (0.85B) |
| Active Parameters | 677,439,552 (0.68B) |
| Architecture | Qwen3.5 Hybrid MoE |
| Experts | 8 routed + 1 shared, top-2 |
| Hidden Size | 1024 |
| Layers | 24 (hybrid: DeltaNet + full attention) |
| Attention | GQA 8Q / 2KV, head_dim=256 |
| Context | 262,144 tokens |
| Vocab | 248,320 |
| Dtype | bfloat16 |
## Design
Total MoE FFN parameters are approximately equal to the dense model's FFN parameters. The speed benefit comes from sparsity: only the top-2 routed experts plus the shared expert are active per token (~1/3 of the total FFN parameters).
Most weights are pre-trained (backbone from the dense model, experts from 35B-A3B). Only the MoE dimension resize introduces noise, which makes this model suitable for fine-tuning at minimal cost.
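The active-parameter fraction above can be sanity-checked with a minimal sketch, assuming 8 routed experts with top-2 routing plus 1 always-on shared expert, all with the same intermediate size (an illustrative simplification):

```python
# Sketch: fraction of MoE FFN parameters that run for a single token,
# assuming all experts share the same intermediate size.

def active_ffn_fraction(n_routed: int, top_k: int, n_shared: int) -> float:
    """Fraction of stored expert FFN weights used per token."""
    total = n_routed + n_shared   # all expert FFNs stored in the model
    active = top_k + n_shared     # experts actually executed per token
    return active / total

frac = active_ffn_fraction(n_routed=8, top_k=2, n_shared=1)
print(f"{frac:.3f}")  # 3 of 9 experts run per token -> 0.333
```

This matches the "~1/3 of total FFN" figure: 2 routed + 1 shared out of 9 stored experts.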
## Weight Transfer Sources
| Component | Source | Strategy |
|---|---|---|
| Embeddings, LM Head | Qwen/Qwen3.5-0.8B | Exact copy |
| Attention (Q/K/V/O, norms) | Qwen/Qwen3.5-0.8B | Exact copy |
| DeltaNet (linear attention) | Qwen/Qwen3.5-0.8B | Exact copy |
| Vision encoder | Qwen/Qwen3.5-0.8B | Exact copy |
| Layer norms | Qwen/Qwen3.5-0.8B | Exact copy |
| Routed experts | Qwen3.5-35B-A3B | Slice 256->8, bilinear resize |
| Shared expert | Qwen3.5-35B-A3B | Bilinear resize |
| Router | Qwen3.5-35B-A3B | Slice + resize |
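The "bilinear resize" strategy in the table can be illustrated roughly as follows. This is a generic bilinear-interpolation sketch over a 2D weight matrix, not the author's actual conversion code, and the shapes used are hypothetical:

```python
import numpy as np

def bilinear_resize(w: np.ndarray, out_rows: int, out_cols: int) -> np.ndarray:
    """Resize a 2D weight matrix with bilinear interpolation."""
    in_rows, in_cols = w.shape
    # Map each output coordinate back to a fractional input coordinate.
    r = np.linspace(0, in_rows - 1, out_rows)
    c = np.linspace(0, in_cols - 1, out_cols)
    r0 = np.floor(r).astype(int); r1 = np.minimum(r0 + 1, in_rows - 1)
    c0 = np.floor(c).astype(int); c1 = np.minimum(c0 + 1, in_cols - 1)
    fr = (r - r0)[:, None]  # fractional row offsets, broadcast over columns
    fc = (c - c0)[None, :]  # fractional column offsets, broadcast over rows
    # Blend the four surrounding source entries for each output entry.
    top = w[np.ix_(r0, c0)] * (1 - fc) + w[np.ix_(r0, c1)] * fc
    bot = w[np.ix_(r1, c0)] * (1 - fc) + w[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

# e.g. shrink a larger expert projection down to the small model's dims
src = np.random.randn(2048, 768)       # hypothetical source expert shape
dst = bilinear_resize(src, 1024, 512)  # hypothetical target expert shape
print(dst.shape)  # (1024, 512)
```

Interpolating along both axes preserves the coarse structure of the source expert weights at the smaller dimensions, which is why the resized experts still need some fine-tuning to remove the interpolation noise.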
## License
Apache 2.0 (following source models)