GGUF Files for qwen3.5-moe-0.87B-d0.8B

These are the GGUF files for kshitijthakkar/qwen3.5-moe-0.87B-d0.8B.

Based on my own testing, this GGUF appears to be broken. Feel free to try it out yourself, and please confirm or deny whether it's broken in the Community tab.

Downloads

| Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | I-quant, preferable over Q3_K_S |
| Download | IQ3_M | I-quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | I-quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: good mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | f16 | Full precision; don't bother, use a quant |
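To choose between the quants above, a rough rule of thumb is file size ≈ parameter count × bits-per-weight / 8. The sketch below estimates sizes for this 0.85B model and picks the largest quant fitting a memory budget. The bits-per-weight figures are approximate llama.cpp averages (an assumption, not exact for this file); `largest_quant_fitting` is a hypothetical helper, not part of any tool.

```python
# Approximate average bits-per-weight for common llama.cpp quant types
# (assumption: real values vary slightly with tensor layout).
APPROX_BPW = {
    "Q2_K": 2.6, "Q3_K_S": 3.5, "IQ3_S": 3.4, "IQ3_M": 3.7, "Q3_K_M": 3.9,
    "IQ4_XS": 4.3, "Q4_K_S": 4.6, "Q4_K_M": 4.8,
    "Q5_K_S": 5.5, "Q5_K_M": 5.7, "Q6_K": 6.6, "f16": 16.0,
}

TOTAL_PARAMS = 854_386_752  # from the model details table below

def approx_size_gb(quant: str, n_params: int = TOTAL_PARAMS) -> float:
    """Estimate GGUF file size in GB for a given quantization."""
    return n_params * APPROX_BPW[quant] / 8 / 1e9

def largest_quant_fitting(budget_gb: float) -> str:
    """Pick the highest-bpw quant whose estimated size fits the budget."""
    fitting = [q for q in APPROX_BPW if approx_size_gb(q) <= budget_gb]
    return max(fitting, key=lambda q: APPROX_BPW[q])
```

At this size even f16 is only about 1.7 GB, so the quant choice matters less here than for larger models.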

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet, usually for models I deem interesting and wish to try out.

If any quants you'd like are missing, or you want a public model converted, you can request either in the Community tab. For questions about the model itself, please refer to the original model repo.

You can find more info about me and what I do here.

Qwen3.5 MoE 0.85B (from Qwen3.5-0.8B)

A Qwen3.5 Mixture-of-Experts model created via dual-source weight transfer.

Model Details

| Property | Value |
|---|---|
| Total Parameters | 854,386,752 (0.85B) |
| Active Parameters | 677,439,552 (0.68B) |
| Architecture | Qwen3.5 Hybrid MoE |
| Experts | 8 routed + 1 shared, top-2 routing |
| Hidden Size | 1024 |
| Layers | 24 (hybrid: DeltaNet + full attention) |
| Attention | GQA, 8 query / 2 KV heads, head_dim = 256 |
| Context | 262,144 tokens |
| Vocab | 248,320 |
| Dtype | bfloat16 |
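The total and active parameter counts above are internally consistent if the gap between them is exactly the 6 unused routed experts per layer. The sketch below checks that, assuming SwiGLU experts (gate/up/down projections, i.e. 3 weight matrices of `hidden × d_ff` each); the inferred `d_ff` is an assumption, not stated in the card.

```python
# Sanity-check the parameter table: inactive params should equal the
# (8 - 2) = 6 routed experts per layer that a token never touches.
TOTAL = 854_386_752
ACTIVE = 677_439_552
LAYERS, HIDDEN = 24, 1024
ROUTED, SHARED, TOP_K = 8, 1, 2

inactive = TOTAL - ACTIVE                       # weights of unused experts
per_expert = inactive // ((ROUTED - TOP_K) * LAYERS)
d_ff = per_expert // (3 * HIDDEN)               # SwiGLU: 3 weight matrices

# Fraction of expert FFN parameters active per token: (2 + 1) of (8 + 1).
active_frac = (TOP_K + SHARED) / (ROUTED + SHARED)
```

The numbers work out to about 1.23M parameters per routed expert per layer, and an active FFN fraction of 1/3, matching the design note below.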

Design

Total MoE FFN parameters are approximately equal to the dense model's FFN parameters. The speed benefit comes from sparsity: only the top-2 routed experts plus the shared expert are active per token (~1/3 of total FFN parameters).

Most weights are pre-trained (backbone from dense model, experts from 35B-A3B). Only the MoE dimension resize introduces noise, making this model suitable for fine-tuning at nominal cost.
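The sparse forward pass described above can be sketched as follows: the router scores all 8 experts, only the top-2 are evaluated (with softmax-renormalized weights), and the shared expert always runs. This is a minimal illustration with random placeholder router weights, not the model's actual implementation.

```python
import numpy as np

# Top-2-of-8 routing plus an always-on shared expert, for a single token.
rng = np.random.default_rng(0)
HIDDEN, N_EXPERTS, TOP_K = 1024, 8, 2

# Placeholder router weights (assumption: real weights come from 35B-A3B).
router_w = rng.standard_normal((HIDDEN, N_EXPERTS)) * 0.02

def moe_forward(x, experts, shared):
    logits = x @ router_w                        # one score per expert
    top = np.argsort(logits)[-TOP_K:]            # indices of the top-2
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over selected only
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out + shared(x)                       # shared expert always active
```

Only 2 of the 8 routed expert FFNs are ever evaluated per token, which is where the claimed speedup over an equally sized dense FFN comes from.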

Weight Transfer Sources

| Component | Source | Strategy |
|---|---|---|
| Embeddings, LM head | Qwen/Qwen3.5-0.8B | Exact copy |
| Attention (Q/K/V/O, norms) | Qwen/Qwen3.5-0.8B | Exact copy |
| DeltaNet (linear attention) | Qwen/Qwen3.5-0.8B | Exact copy |
| Vision encoder | Qwen/Qwen3.5-0.8B | Exact copy |
| Layer norms | Qwen/Qwen3.5-0.8B | Exact copy |
| Routed experts | Qwen3.5-35B-A3B | Slice 256 → 8, bilinear resize |
| Shared expert | Qwen3.5-35B-A3B | Bilinear resize |
| Router | Qwen3.5-35B-A3B | Slice + resize |
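The "bilinear resize" in the table treats a 2D weight matrix like an image and linearly interpolates it to the target shape along each axis. A minimal NumPy sketch of that technique (an illustration of the idea, not the actual conversion script used for this model):

```python
import numpy as np

def bilinear_resize(w: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Linearly interpolate a 2D weight matrix to (rows, cols)."""
    src_r = np.linspace(0.0, 1.0, w.shape[0])
    src_c = np.linspace(0.0, 1.0, w.shape[1])
    tgt_r = np.linspace(0.0, 1.0, rows)
    tgt_c = np.linspace(0.0, 1.0, cols)
    # Interpolate along axis 0 (rows), one column at a time...
    tmp = np.stack([np.interp(tgt_r, src_r, w[:, j])
                    for j in range(w.shape[1])], axis=1)
    # ...then along axis 1 (columns), one row at a time.
    return np.stack([np.interp(tgt_c, src_c, tmp[i])
                     for i in range(rows)], axis=0)
```

Resizing pre-trained expert weights this way preserves their large-scale structure, which is why the card describes the resize step as the only source of added noise.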

License

Apache 2.0 (following source models)

Downloads last month: 2,551
Model size: 1B params
Architecture: qwen35moe
