Gemma4 Prometheus fixes

This repo documents the local patches that made the Gemma4 + Prometheus + GPTQ pipeline work on this machine.

What was fixed

  • Prometheus adapter targeting now resolves exact module paths instead of suffix matches.
  • Prometheus steering now dequantizes to FP16 on CUDA by default to avoid VRAM blowups.
  • GPTQModel now recognizes gemma4, uses a Gemma4 module tree, tolerates missing v_proj, and refreshes rotary position_embeddings per layer.
  • Quantization scheduling is free-memory-aware and does not rely on naive round-robin GPU assignment.
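The first fix above boils down to matching targeted modules by their full dotted path rather than by name suffix, so a target like `q_proj` cannot accidentally hit the same projection in every layer. A minimal sketch of the difference, using plain module-name strings (the function and names here are illustrative, not the actual patch):

```python
def resolve_targets(module_names, targets, exact=True):
    """Return the module names selected by `targets`.

    exact=True  -> a target must equal the full dotted path
    exact=False -> a target matches any module whose path ends with it
    """
    if exact:
        wanted = set(targets)
        return [name for name in module_names if name in wanted]
    return [
        name for name in module_names
        if any(name == t or name.endswith("." + t) for t in targets)
    ]


modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.1.self_attn.q_proj",
]

# Suffix matching hits q_proj in every layer...
print(resolve_targets(modules, ["q_proj"], exact=False))
# ...while exact matching selects only the one named module.
print(resolve_targets(modules, ["model.layers.0.self_attn.q_proj"], exact=True))
```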
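The scheduling change can be illustrated with a simple free-memory-aware placement policy. This is a hypothetical sketch, not the patch itself: the helper name and the byte figures are made up, and the real code would query the CUDA runtime for free memory rather than take a dict.

```python
def pick_device(free_bytes_by_gpu, needed_bytes):
    """Choose the GPU with the most free memory that still fits `needed_bytes`.

    free_bytes_by_gpu: dict mapping GPU index -> free bytes
    Returns the chosen GPU index, or None if no GPU fits.
    Contrast with round-robin assignment, which ignores actual free memory
    and can place a large layer on an already-full device.
    """
    candidates = [(free, idx) for idx, free in free_bytes_by_gpu.items()
                  if free >= needed_bytes]
    if not candidates:
        return None
    # Largest free memory first; ties broken by lowest GPU index.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return candidates[0][1]


# Hypothetical snapshot: GPU 0 is nearly full, GPU 1 has headroom.
free = {0: 2 * 2**30, 1: 20 * 2**30}
print(pick_device(free, 8 * 2**30))
```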

Related repos

  • Workflow and reproduction scripts: groxaxo/gemma4-prometheus-workflow
  • Merged model: groxaxo/gemma4-prometheus-merged
  • GPTQ model: groxaxo/gemma4-prometheus-gptq-4bit

Reproduce

Use the workflow repo with the conda environment described there. The workflow README includes the exact commands used for:

  1. Prometheus text-only inference
  2. merged-model export
  3. GPTQ quantization