Quantization
#5
by NukeNotNull - opened
What quantization does the chat.z.ai site use?
It's definitely got to be lower than FP8, or they're using a really aggressive KV-cache quantization.
Model performance just completely degrades after 100k tokens on their platform, same with GLM-5, yet other providers don't have the same issue.
z.ai still hasn't addressed this at all.
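For anyone wondering why a provider might cut corners on the KV cache at 100k+ tokens, the memory math is rough. Here's a back-of-the-envelope sketch; the layer/head dimensions below are hypothetical placeholders, not GLM's actual config:

```python
# Rough KV-cache memory estimate at long context, to show why providers
# are tempted to quantize the cache. Dimensions are HYPOTHETICAL, chosen
# to be roughly typical of a large model with grouped-query attention.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value):
    # 2x for keys and values; one entry per layer / KV head / position / dim
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

LAYERS, KV_HEADS, HEAD_DIM, CTX = 92, 8, 128, 100_000  # hypothetical dims

for name, nbytes in [("FP16", 2), ("FP8", 1), ("Q4", 0.5)]:
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX, nbytes) / 2**30
    print(f"{name}: {gib:.1f} GiB per sequence")
```

At 100k context the cache is tens of GiB per sequence at FP16, so quantizing it to 8 or 4 bits is a huge cost saving for a hosted service, but it's also exactly the kind of change that shows up as degradation only at long context.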
Stop whining already. I'm glad that it's OSS! And it's a really awesome model (I'm using unsloth's quantization).