Important Note

For the IQ_KS quants, DO NOT use mainline llama.cpp, Ollama, or anything built on the mainline llama.cpp backend; use ik_llama.cpp instead.

The Q_K_M quants are fine, though.

Still uploading BTW!

Quantized using ik_llama.cpp commit 6ea7f32.

Calibration data by Bartowski. Thank you, legends!


Perplexity test using Wikitext-2 test.raw

  • BF16 - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2671 +/- 0.04039
  • Q6_K - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2376 +/- 0.04001
  • Q5_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2564 +/- 0.04021
  • Q4_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2901 +/- 0.04049
  • IQ4_KS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2921 +/- 0.04055
  • Q3_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.4269 +/- 0.04165
  • IQ3_KS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.4566 +/- 0.04177
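For reference, the "Final estimate" numbers above are perplexity: exp of the mean negative log-likelihood per token over the test chunks. A minimal sketch of the calculation (plain Python with made-up log-probabilities, not the actual llama-perplexity code):

```python
import math

def perplexity(logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(logprobs) / len(logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities: a model that assigns
# p = 0.5 to every token has perplexity ~2, i.e. on average it is
# choosing between two equally likely tokens.
logprobs = [math.log(0.5)] * 4096
print(perplexity(logprobs))  # ~2.0
```

Lower is better, so a ~0.03 gap between BF16 and Q4_K_M over 72 chunks of Wikitext-2 is a very small quality loss.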

Holy shit!! Why is it so low though?! That's practically lossless!! (Might have gotten butchered in creative writing, tho.)


Note: this quant is not coherent (perhaps usable as a draft model? Or maybe it could work with a proper system prompt? Haven't tried instruct either):

  • IQ2_XS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 7.3814 +/- 0.04912 (even with a custom quant recipe)

Dunno what's going on; somehow the Q5_K_M (and Q6_K) perplexity is lower than BF16. Need to investigate.
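One mundane explanation worth checking first: the +/- values are standard errors, and the BF16 vs Q5_K_M gap is much smaller than the combined uncertainty, so the "quant beats BF16" result may not be statistically significant at all. A quick check using the numbers from the table above:

```python
import math

# Final PPL estimates and standard errors from the Wikitext-2 run above.
bf16, se_bf16 = 6.2671, 0.04039
q5km, se_q5km = 6.2564, 0.04021

diff = bf16 - q5km                                # ~0.0107
combined_se = math.sqrt(se_bf16**2 + se_q5km**2)  # ~0.0570

# The gap is well under one combined standard error:
print(diff < combined_se)  # → True
```

So on 72 chunks the two runs are statistically indistinguishable; a longer test set would be needed before concluding the quant is genuinely better.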

Okay, so I think... pruning noise via imatrix calibration makes the model less uncertain in its word choices, which means... yes, the model is more deterministic, perhaps less creative? Unconfirmed, but more focused?? I have no idea!

So in theory you could make your own custom calibration data that works almost like a LoRA, except instead of adding data you're only preserving the behavior more aligned with your goals and discarding the rest.
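That intuition can be illustrated with a toy version of importance-weighted quantization (a deliberate simplification, NOT ik_llama.cpp's actual imatrix algorithm, and all the numbers below are hypothetical): when choosing a scale for a block of weights, rounding error on high-importance weights counts more, so the chosen scale preserves the weights the calibration data exercised at the expense of the rest.

```python
def quantize_block(weights, importance, scale):
    """Toy round-to-nearest int4-style quantization at a given scale."""
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    deq = [qi * scale for qi in q]
    # Importance-weighted squared reconstruction error.
    err = sum(imp * (w - d) ** 2 for w, imp, d in zip(weights, importance, deq))
    return deq, err

def best_scale(weights, importance, candidates):
    """Pick the candidate scale minimizing importance-weighted error."""
    return min(candidates, key=lambda s: quantize_block(weights, importance, s)[1])

weights = [0.9, -0.3, 0.05, 0.7]
uniform = [1.0, 1.0, 1.0, 1.0]
# Calibration says the first weight matters most (hypothetical imatrix values):
focused = [10.0, 0.1, 0.1, 1.0]

candidates = [0.05 + 0.01 * i for i in range(20)]
s_uniform = best_scale(weights, uniform, candidates)
s_focused = best_scale(weights, focused, candidates)
print(s_uniform, s_focused)
```

Swapping in different importance values can change which scale wins, which is the sense in which calibration data steers what the quant keeps.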

Maybe I'm wrong; perhaps it's related to QAT or something... no idea!

Format: GGUF
Model size: 27B params
Architecture: qwen35

Model tree for h34v7/Jackrong-Qwopus3.5-27B-v3-GGUF

Base model: Qwen/Qwen3.5-27B (this model is one of 23 quantizations)