Why not quantize the matrices Wq, Wk, Wv, Wo?

#5
by BeetSoup - opened

The weights matching `model.language_model.layers.\d+.self_attn.[qkvo]_proj.weight` make up 28.08% of the model's parameters. By keeping them in bf16, you make the model average only about 8 bits per weight (bpw).
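To see why, here is a back-of-envelope sketch of the average bpw: the unquantized attention projections contribute 16 bits each, and the remaining weights contribute whatever bit-width they are quantized to. The 4-bit figure for the quantized portion is an assumption for illustration, not taken from this model card.

```python
def avg_bpw(bf16_fraction: float, quant_bits: float) -> float:
    """Average bits per weight when a fraction of parameters stays in bf16
    (16 bits) and the rest is quantized to `quant_bits` bits."""
    return bf16_fraction * 16.0 + (1.0 - bf16_fraction) * quant_bits

# 28.08% of parameters kept in bf16, rest assumed 4-bit (illustrative).
print(avg_bpw(0.2808, 4.0))  # ≈ 7.37 bpw
```

So even a 4-bit quantization of the rest of the model lands well above 4 bpw overall, which is why keeping these projections in bf16 dominates the size budget.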
