vLLM error
#2
by huang123chuan - opened
ValueError: The output_size of gate's and up's weight = 704 is not divisible by weight quantization block_n = 128.
I am also getting a similar error. I am trying to load the weights on 2 GPUs.
Got the same error.
(EngineCore pid=95) ERROR 04-06 10:05:58 [core.py:1108] ValueError: The output_size of gate's and up's weight = 704 is not divisible by weight quantization block_n = 128.
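For anyone reading the message: it is a plain divisibility check, and 704 is not a multiple of 128 (704 = 5 × 128 + 64). Below is a minimal sketch of that arithmetic. It assumes 704 is the per-GPU slice of a gate/up projection split across 2 GPUs, so the unsplit size would be 1408. That 1408 figure is a guess, not something reported in this thread, and the function names are hypothetical helpers, not part of vLLM.

```python
# Hypothetical helpers illustrating the divisibility check from the error text.
# Assumption: the gate/up output size is split evenly across tensor-parallel ranks.

def shard_is_block_aligned(intermediate_size: int, tp_size: int, block_n: int = 128) -> bool:
    """Return True if each rank's slice is a whole number of quantization blocks."""
    if intermediate_size % tp_size != 0:
        return False
    per_rank = intermediate_size // tp_size
    return per_rank % block_n == 0


def valid_tp_sizes(intermediate_size: int, block_n: int = 128, max_tp: int = 8) -> list[int]:
    """List the tensor-parallel sizes up to max_tp that keep every slice block-aligned."""
    return [
        tp
        for tp in range(1, max_tp + 1)
        if shard_is_block_aligned(intermediate_size, tp, block_n)
    ]


if __name__ == "__main__":
    assumed_full_size = 1408  # guess: 704 * 2, not confirmed by the thread
    print(704 % 128)                                     # remainder reported by the check
    print(shard_is_block_aligned(assumed_full_size, 2))  # the 2-GPU case
    print(valid_tp_sizes(assumed_full_size))             # sizes that would pass under this assumption
```

If the assumption holds, the split across 2 GPUs is what breaks the alignment, which matches the reports above of the error appearing when loading on 2 GPUs.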