# glm-5-381-reap-w3a16
This repository contains the W3A16 AutoRound quantization of the 50% REAP-pruned GLM-5 checkpoint.
## Checkpoint

- Base family: GLM-5
- Architecture: `GlmMoeDsaForCausalLM`
- Total parameters: 381,464,351,232
- Source prune: `refusal_contrast_reap`, compression ratio 0.50, seed 42, router renormalization enabled
- Quantization method: AutoRound
- Quantization scheme: W3A16
- Group size: 128
- Calibration dataset: `NeelNanda/pile-10k`
- Calibration samples: 128
- Sequence length: 1024
- Iterations per block: 50
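The W3A16 g128 settings above imply a rough effective bit-width per weight. The sketch below assumes one fp16 scale and one fp16 zero point stored per 128-weight group (an assumption about the packing; AutoRound's on-disk layout may differ) and ignores the intentionally unquantized tensors, so it is a lower-bound estimate:

```python
# Rough storage estimate for 3-bit weights with group size 128.
BITS = 3
GROUP_SIZE = 128
OVERHEAD_BITS_PER_GROUP = 32  # assumption: fp16 scale + fp16 zero point
TOTAL_PARAMS = 381_464_351_232

effective_bits = BITS + OVERHEAD_BITS_PER_GROUP / GROUP_SIZE  # 3.25
approx_bytes = TOTAL_PARAMS * effective_bits / 8
print(f"{effective_bits:.2f} bits/weight, ~{approx_bytes / 1e9:.0f} GB")
```

At roughly 3.25 bits per weight the quantized weights come to about 155 GB, compared with roughly 763 GB for the BF16 source checkpoint.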
## Output

- Saved model shards: 29
- Quantized tensors: 29,571 / 29,659
- Quantization config file: `quantization_config.json`
## Intentionally Unquantized

- `lm_head`
- `model.layers.[0-2].mlp.down_proj`
- `model.layers.[0-2].mlp.gate_proj`
- `model.layers.[0-2].mlp.up_proj`
- `model.layers.[0-77].self_attn.indexer.weights_proj`
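The bracketed ranges in the skip list expand to 88 concrete tensor names, which matches the gap between quantized and total tensors reported above (29,659 − 29,571 = 88). A small illustrative helper (`expand_pattern` and `is_unquantized` are hypothetical names, not part of this repository) can expand the patterns and check a tensor name against them:

```python
import re

# Skip list as stated in this model card.
UNQUANTIZED_PATTERNS = [
    "lm_head",
    "model.layers.[0-2].mlp.down_proj",
    "model.layers.[0-2].mlp.gate_proj",
    "model.layers.[0-2].mlp.up_proj",
    "model.layers.[0-77].self_attn.indexer.weights_proj",
]

def expand_pattern(pattern: str) -> list[str]:
    """Expand a bracketed layer range like '[0-2]' into concrete names."""
    m = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if not m:
        return [pattern]
    lo, hi = int(m.group(1)), int(m.group(2))
    return [pattern.replace(m.group(0), str(i)) for i in range(lo, hi + 1)]

def is_unquantized(name: str) -> bool:
    """True if a tensor name falls in the card's skip list."""
    return any(name in expand_pattern(p) for p in UNQUANTIZED_PATTERNS)
```

For example, `is_unquantized("model.layers.2.mlp.up_proj")` is true while layer 3's MLP projections are quantized.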
## Provenance

- Quantized artifact path: `/data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/output/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50-w3g128`
- Quantization log: `/data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/quant.log`
## Notes

- The source checkpoint for this quantization is the BF16 50% REAP GLM-5 artifact.
- AutoRound reported a total tuning time of 4549.26 s.
## Model Tree

- Repository: `0xSero/glm-5-381-reap-w3a16`
- Base model: `zai-org/GLM-5`