Quantization
#5
by NukeNotNull - opened
What quantization does the chat.z.ai site use?
It's definitely got to be lower than FP8, or they're using a really aggressive KV-cache quantization.
Model performance just completely degrades after 100k tokens on their platform, same with GLM-5, yet other providers don't have the same issue.
z.ai still hasn't addressed this at all.
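For anyone wondering why a provider might cut corners on the KV cache at 100k+ tokens, the memory math is rough. Here's a back-of-the-envelope sketch; the layer/head dimensions below are hypothetical placeholders, not GLM's actual config:

```python
# Rough KV-cache memory estimate at long context, to show why providers
# are tempted to quantize the cache. Dimensions are HYPOTHETICAL, chosen
# to be roughly typical of a large model with grouped-query attention.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value):
    # 2x for keys and values; one entry per layer / KV head / position / dim
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

LAYERS, KV_HEADS, HEAD_DIM, CTX = 92, 8, 128, 100_000  # hypothetical dims

for name, nbytes in [("FP16", 2), ("FP8", 1), ("Q4", 0.5)]:
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX, nbytes) / 2**30
    print(f"{name}: {gib:.1f} GiB per sequence")
```

At 100k context the cache is tens of GiB per sequence at FP16, so quantizing it to 8 or 4 bits is a huge cost saving for a hosted service, but it's also exactly the kind of change that shows up as degradation only at long context.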
Stop whining already. I'm glad that it's OSS! And it's a really awesome model (I'm using unsloth's quantization).