Why not quantize the matrices Wq, Wk, Wv, Wo?

#5
by BeetSoup - opened

The weights matching `model.language_model.layers.\d+.self_attn.[qkvo]_proj.weight` make up 28.08% of the model's parameters. By keeping them in bf16, you make the model average only about 8 bits per weight (bpw).
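To see why, here is a back-of-envelope sketch of the average bpw: the unquantized attention projections contribute 16 bits each, and the remaining weights contribute whatever bit-width they are quantized to. The 4-bit figure for the quantized portion is an assumption for illustration, not taken from this model card.

```python
def avg_bpw(bf16_fraction: float, quant_bits: float) -> float:
    """Average bits per weight when a fraction of parameters stays in bf16
    (16 bits) and the rest is quantized to `quant_bits` bits."""
    return bf16_fraction * 16.0 + (1.0 - bf16_fraction) * quant_bits

# 28.08% of parameters kept in bf16, rest assumed 4-bit (illustrative).
print(avg_bpw(0.2808, 4.0))  # ≈ 7.37 bpw
```

So even a 4-bit quantization of the rest of the model lands well above 4 bpw overall, which is why keeping these projections in bf16 dominates the size budget.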
