glm-5-381-reap-w3a16

This repository contains the W3A16 AutoRound quantization of the 50% REAP-pruned GLM-5 checkpoint.

Checkpoint

  • Base model: zai-org/GLM-5 (GLM-5 family)
  • Architecture: GlmMoeDsaForCausalLM
  • Total parameters: 381,464,351,232
  • Source pruning: refusal_contrast_reap, compression ratio 0.50, seed 42, router renormalization enabled
  • Quantization method: AutoRound (reproduction sketch after this list)
  • Quantization scheme: W3A16
  • Group size: 128
  • Calibration dataset: NeelNanda/pile-10k
  • Calibration samples: 128
  • Sequence length: 1024
  • Iterations per block: 50
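
A minimal reproduction sketch, assuming the open-source auto-round Python API; the exact invocation for this run is only recorded in quant.log, and the checkpoint path below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

base = "path/to/glm5-reap-50pct-bf16"  # placeholder: the pruned BF16 source checkpoint

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

# Settings mirror the list above: W3A16, group size 128, 128 calibration
# samples from NeelNanda/pile-10k, sequence length 1024, 50 iterations per block.
autoround = AutoRound(
    model,
    tokenizer,
    bits=3,
    group_size=128,
    dataset="NeelNanda/pile-10k",
    nsamples=128,
    seqlen=1024,
    iters=50,
)
autoround.quantize()
autoround.save_quantized("output/glm5-reap-w3a16", format="auto_round")
```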

Output

  • Saved model shards: 29
  • Quantized tensors: 29,571 of 29,659 (the 88 exceptions are listed under Intentionally Unquantized)
  • Quantization config file: quantization_config.json (see the loading sketch below)
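
A hypothetical loading sketch, assuming transformers with the auto-round integration installed (pip install auto-round), so the quantization config is picked up from the checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "0xSero/glm-5-381-reap-w3a16"

# trust_remote_code may be required for the GlmMoeDsaForCausalLM architecture.
model = AutoModelForCausalLM.from_pretrained(
    repo, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```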

Intentionally Unquantized

  • lm_head
  • model.layers.[0-2].mlp.down_proj
  • model.layers.[0-2].mlp.gate_proj
  • model.layers.[0-2].mlp.up_proj
  • model.layers.[0-77].self_attn.indexer.weights_proj (see the sketch after this list)
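
These exceptions total 88 modules (1 + 9 + 78), which matches the 88 unquantized tensors implied by the counts above. A hypothetical sketch of how they could be expressed through AutoRound's per-layer layer_config, assuming bits=16 leaves a module in full precision (the mechanism actually used for this run is not recorded here):

```python
# Build a per-layer override map; module names follow this checkpoint's layout.
layer_config = {"lm_head": {"bits": 16}}

for i in range(3):  # dense MLP projections in the first three layers
    for proj in ("down_proj", "gate_proj", "up_proj"):
        layer_config[f"model.layers.{i}.mlp.{proj}"] = {"bits": 16}

for i in range(78):  # indexer weights_proj in all 78 layers
    layer_config[f"model.layers.{i}.self_attn.indexer.weights_proj"] = {"bits": 16}

# Passed as AutoRound(..., layer_config=layer_config) in the reproduction sketch above.
```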

Provenance

  • Quantized artifact path: /data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/output/layerwise_refusal_contrast_reap-renorm_true-seed_42-0.50-w3g128
  • Quantization log: /data0/external_research/glm5-autoround/full/glm5-reap-50pct-w3a16-pile10k-20260405T182123Z/quant.log

Notes

  • The source checkpoint for this quantization is the BF16 50% REAP GLM-5 artifact.
  • AutoRound reported a total tuning time of 4549.26 s (roughly 76 minutes).