glm-5-381-reap-w3a16

This repository contains the W3A16 AutoRound quantization of the 50% REAP-pruned GLM-5 checkpoint.

Checkpoint

  • Base model: zai-org/GLM-5 (GLM-5 family)
  • Architecture: GlmMoeDsaForCausalLM
  • Total parameters: 381,464,351,232
  • Source pruning: refusal_contrast_reap, compression ratio 0.50, seed 42, router renormalization enabled
  • Quantization method: AutoRound (reproduction sketch after this list)
  • Quantization scheme: W3A16
  • Group size: 128
  • Calibration dataset: NeelNanda/pile-10k
  • Calibration samples: 128
  • Sequence length: 1024
  • Iterations per block: 50
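
A minimal reproduction sketch, assuming the open-source auto-round Python API; the exact invocation for this run is only recorded in quant.log, and the checkpoint path below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base = "path/to/glm5-reap-50pct-bf16"  # placeholder: the pruned BF16 source checkpoint

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

# Settings mirror the list above: W3A16, group size 128, 128 calibration
# samples from NeelNanda/pile-10k, sequence length 1024, 50 iterations per block.
autoround = AutoRound(
    model,
    tokenizer,
    bits=3,
    group_size=128,
    dataset="NeelNanda/pile-10k",
    nsamples=128,
    seqlen=1024,
    iters=50,
)
autoround.quantize()
autoround.save_quantized("output/glm5-reap-w3a16", format="auto_round")
```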

Output

  • Saved model shards: 29
  • Quantized tensors: 29,571 of 29,659 (the 88 exceptions are listed under Intentionally Unquantized)
  • Quantization config file: quantization_config.json (see the loading sketch below)
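
A hypothetical loading sketch, assuming transformers with the auto-round integration installed (pip install auto-round), so the quantization config is picked up from the checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "0xSero/glm-5-381-reap-w3a16"

# trust_remote_code may be required for the GlmMoeDsaForCausalLM architecture.
model = AutoModelForCausalLM.from_pretrained(
    repo, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```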

Intentionally Unquantized

  • lm_head
  • model.layers.[0-2].mlp.down_proj
  • model.layers.[0-2].mlp.gate_proj
  • model.layers.[0-2].mlp.up_proj
  • model.layers.[0-77].self_attn.indexer.weights_proj (see the sketch after this list)
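
These exceptions total 88 modules (1 + 9 + 78), which matches the 88 unquantized tensors implied by the counts above. A hypothetical sketch of how they could be expressed through AutoRound's per-layer layer_config, assuming bits=16 leaves a module in full precision (the mechanism actually used for this run is not recorded here):

```python
# Build a per-layer override map; module names follow this checkpoint's layout.
layer_config = {"lm_head": {"bits": 16}}

for i in range(3):  # dense MLP projections in the first three layers
    for proj in ("down_proj", "gate_proj", "up_proj"):
        layer_config[f"model.layers.{i}.mlp.{proj}"] = {"bits": 16}

for i in range(78):  # indexer weights_proj in all 78 layers
    layer_config[f"model.layers.{i}.self_attn.indexer.weights_proj"] = {"bits": 16}

# Passed as AutoRound(..., layer_config=layer_config) in the reproduction sketch above.
```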

Provenance

  • Quantized artifact path: /data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/output/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50-w3g128
  • Quantization log: /data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/quant.log

Notes

  • The source checkpoint for this quantization is the BF16 50% REAP GLM-5 artifact.
  • AutoRound reported a total tuning time of 4549.26 s (roughly 76 minutes).