# Gemma4 31B Abliterated Multimodal AWQ8

This repository contains a compressed-tensors AWQ W8A16 checkpoint for `shreyan35/gemma4-31b-abliterated-multimodal`.
## What is included

- `config.json` with the compressed-tensors quantization config
- `model.safetensors`
- `tokenizer.json` and `tokenizer_config.json`
- `processor_config.json`
- `chat_template.jinja`
- `generation_config.json`
- `recipe.yaml`
## Quantization summary

- Format: compressed-tensors
- Method: AWQ
- Weight bits: 8
- Activation bits: 16
- Group size: 32
- Weights: symmetric
- Observer: mse
- Duo scaling: enabled
- Excluded from quantization: vision tower, multimodal projector/`embed_vision`, and `lm_head`
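To make the summary concrete, the sketch below illustrates the arithmetic of symmetric 8-bit quantization over one group of 32 weights, as listed above. It is only an illustration of the scheme (per-group scale, zero-point of 0, activations left unquantized), not the compressed-tensors kernel itself, and the helper names are invented for this example.

```python
import numpy as np

def quantize_group(w, bits=8):
    """Symmetric per-group quantization: one scale per group, zero-point 0."""
    qmax = 2 ** (bits - 1) - 1            # 127 for signed 8-bit
    scale = np.abs(w).max() / qmax        # symmetric: scale from max magnitude
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_group(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(32,)).astype(np.float32)   # one group of 32 weights

q, scale = quantize_group(w)
w_hat = dequantize_group(q, scale)
err = np.abs(w - w_hat).max()                    # rounding error <= scale / 2
print(q.dtype, float(err))
```

With group size 32, each group of weights stores 32 int8 values plus one 16-bit scale, which is where the W8A16 memory savings come from.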
## Recommended serving

Use vLLM for inference.
Tested runtime:

- torch 2.10.0+cu128
- transformers 5.5.1
- compressed-tensors 0.14.0.1
- vllm 0.19.0

```shell
pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0"
```
```shell
vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \
  --trust-remote-code \
  --dtype auto \
  --max-model-len 6144 \
  --served-model-name gemma4-31b-abliterated-multimodal-awq8
```
For local image inputs, add:

```shell
--allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}'
```
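With local media enabled, an image can be referenced from a chat message roughly as sketched below. The `file://` URL form follows vLLM's OpenAI-compatible API for local media; the image path is illustrative and must live under the directory passed to `--allowed-local-media-path`.

```python
import json

# Hypothetical multimodal user message for the OpenAI-compatible endpoint;
# the image file must sit under the --allowed-local-media-path directory.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": "file:///path/to/images/example.png"},
        },
    ],
}
print(json.dumps(message, indent=2))
```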
For text-only long-context serving:

```shell
--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240
```
## Transformers fallback

The checkpoint also loads through Transformers with `trust_remote_code=True`:
```python
from transformers import AutoProcessor, AutoModelForImageTextToText

repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)
```
## Notes
- This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules.
- If you only need a local OpenAI-compatible endpoint, point clients at `http://127.0.0.1:1234/v1` after starting vLLM.
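A minimal client sketch against that endpoint, using only the standard library. It assumes the server from the serve command above is running and listening on port 1234 (vLLM would need `--port 1234` for that address); the actual HTTP call is left commented out so the snippet stands on its own.

```python
import json
from urllib import request

# Build an OpenAI-compatible chat completion request for the local endpoint.
payload = {
    "model": "gemma4-31b-abliterated-multimodal-awq8",  # served model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
req = request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```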
## Base model

- `google/gemma-4-31B-it`