Important Note
For the IQ_KS quants, DO NOT use mainline llama.cpp, Ollama, or anything that uses the mainline llama.cpp backend; use ik_llama.cpp instead.
The Q_K_M quants are fine, though.
Still uploading BTW!
Quantized using ik_llama.cpp commit 6ea7f32.
Calibration data by Bartowski, thank you legends!
Perplexity tested on Wikitext-2 test.raw:
- BF16 - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2671 +/- 0.04039
- Q6_K - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2376 +/- 0.04001
- Q5_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2564 +/- 0.04021
- Q4_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2901 +/- 0.04049
- IQ4_KS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.2921 +/- 0.04055
- Q3_K_M - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.4269 +/- 0.04165
- IQ3_KS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 6.4566 +/- 0.04177
Holy shit!! Why is it so low though?! That's practically lossless!! (Might've gotten butchered on creative tasks, tho.)
Note: this quant is not coherent (perhaps usable as a draft model? Or maybe it could work with a proper system prompt? Haven't tried instruct mode either):
- IQ2_XS - Final estimate: PPL over 72 chunks for n_ctx=4096 = 7.3814 +/- 0.04912 (even with a custom quant recipe)
Dunno what's going on: somehow the Q5_K_M perplexity is lower than BF16. Need to investigate, though the gap (0.0107) is well inside the +/- 0.04 error bars, so it may just be measurement noise.
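A quick sanity check on that: comparing each quant's PPL delta against BF16 to the combined standard error of the two measurements (numbers copied from the table above; this is a rough error-bar overlap check, not rigorous statistics):

```python
# PPL results from the table above: (ppl, stderr) per quant.
results = {
    "BF16":   (6.2671, 0.04039),
    "Q6_K":   (6.2376, 0.04001),
    "Q5_K_M": (6.2564, 0.04021),
    "Q4_K_M": (6.2901, 0.04049),
    "IQ4_KS": (6.2921, 0.04055),
    "Q3_K_M": (6.4269, 0.04165),
    "IQ3_KS": (6.4566, 0.04177),
}

bf16_ppl, bf16_err = results["BF16"]
for name, (ppl, err) in results.items():
    if name == "BF16":
        continue
    delta = ppl - bf16_ppl
    # Standard error of the difference: errors add in quadrature.
    combined = (bf16_err**2 + err**2) ** 0.5
    verdict = "significant" if abs(delta) > combined else "within noise"
    print(f"{name}: delta={delta:+.4f}, combined_err={combined:.4f} -> {verdict}")
```

By this check, everything down to IQ4_KS (including the "better than BF16" Q5_K_M and Q6_K results) lands within the noise, while the 3-bit quants show a real degradation.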
Okay, so I think... pruning noise via imatrix calibration makes the model less uncertain when picking tokens, which means... yes, the model is more deterministic, perhaps less creative? Unconfirmed, but maybe more focused?? I have no idea!
So in theory you could make your own custom calibration data that works almost like a LoRA, except instead of adding data you're only keeping what's more aligned with your goals and discarding the rest.
Maybe I'm wrong, perhaps it's related to QAT or something... no idea!