Gemma4 31B Abliterated Multimodal AWQ8

This repository contains a compressed-tensors AWQ W8A16 checkpoint for shreyan35/gemma4-31b-abliterated-multimodal.

What is included

  • config.json with the compressed-tensors quantization config
  • model.safetensors
  • tokenizer.json and tokenizer_config.json
  • processor_config.json
  • chat_template.jinja
  • generation_config.json
  • recipe.yaml

Quantization summary

  • Format: compressed-tensors
  • Method: AWQ
  • Weight bits: 8
  • Activation bits: 16
  • Group size: 32
  • Weights: symmetric
  • Observer: mse
  • Duo scaling: enabled
  • Excluded from quantization: vision tower, multimodal projector/embed_vision, and lm_head
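As a back-of-the-envelope check, the summary above implies roughly 8.5 effective bits per quantized weight: 8 bits for the value plus one 16-bit scale shared across each group of 32 (symmetric quantization, so no zero points). A minimal sketch, treating the 31B parameter count as a rough assumption that ignores the unquantized vision tower, projector, and lm_head:

```python
# Approximate memory footprint of W8A16 weights with group size 32.
# Symmetric quantization stores scales only (no zero points).
weight_bits = 8
scale_bits = 16
group_size = 32

effective_bits = weight_bits + scale_bits / group_size  # 8.5 bits per weight

params = 31e9  # rough total; excluded modules stay in higher precision
quantized_gib = params * effective_bits / 8 / 1024**3
bf16_gib = params * 16 / 8 / 1024**3

print(f"effective bits/weight: {effective_bits}")
print(f"approx. weights: {quantized_gib:.1f} GiB quantized vs {bf16_gib:.1f} GiB BF16")
```

The grouped scales add only about 6% overhead on top of the raw 8-bit weights, which is why the checkpoint lands near half the BF16 size.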

Recommended serving

Use vLLM for inference.

Tested runtime:

  • torch 2.10.0+cu128
  • transformers 5.5.1
  • compressed-tensors 0.14.0.1
  • vllm 0.19.0
Install the tested versions:

pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0"

Start the server:

vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \
  --trust-remote-code \
  --dtype auto \
  --max-model-len 6144 \
  --served-model-name gemma4-31b-abliterated-multimodal-awq8

For local image inputs, add:

--allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}'

For text-only long-context serving:

--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240
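Against the running server, image requests use the standard OpenAI chat-completions format with an `image_url` content part. The sketch below only builds the request body; the model name matches `--served-model-name` above, the PNG bytes are a placeholder, and sending the request is left to your client of choice:

```python
import base64
import json

# Placeholder bytes standing in for a real PNG file. With
# --allowed-local-media-path you could pass a file:// URL instead of base64.
fake_png = base64.b64encode(b"\x89PNG-placeholder").decode()

body = {
    "model": "gemma4-31b-abliterated-multimodal-awq8",  # matches --served-model-name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_png}"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

# POST this JSON to <server>/v1/chat/completions
print(json.dumps(body)[:60])
```

Note `--limit-mm-per-prompt '{"image":1}'` caps each prompt at a single image part like this one.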

Transformers fallback

The checkpoint also loads with trust_remote_code=True through Transformers:

from transformers import AutoProcessor, AutoModelForImageTextToText

repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)
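From there, generation follows the usual image-text-to-text flow. The message structure below is the generic Transformers multimodal chat format; the exact content keys can vary by processor version, and the image URL is a placeholder, so treat this as a sketch:

```python
# Generic Transformers multimodal chat messages (sketch; content keys may
# differ slightly depending on the processor version).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
            {"type": "text", "text": "What is in this picture?"},
        ],
    }
]

# With processor and model loaded as above:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# print(processor.decode(out[0], skip_special_tokens=True))
print(messages[0]["role"])
```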

Notes

  • This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules.
  • If you only need a local OpenAI-compatible endpoint, start vLLM with --port 1234 and point clients at http://127.0.0.1:1234/v1 (vLLM defaults to port 8000).
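A minimal text-only client against that local endpoint, using only the standard library (any OpenAI-compatible client works; the send itself is commented out since it requires the server to be running):

```python
import json
from urllib import request

# Endpoint from the note above; adjust the port if you started vLLM differently.
url = "http://127.0.0.1:1234/v1/chat/completions"
body = {
    "model": "gemma4-31b-abliterated-multimodal-awq8",
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = request.Request(
    url,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)

# resp = request.urlopen(req)  # requires the vLLM server to be up
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(req.full_url)
```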