Gemma4 31B Abliterated Multimodal AWQ8

This repository contains a compressed-tensors AWQ W8A16 checkpoint for shreyan35/gemma4-31b-abliterated-multimodal.

What is included

  • config.json with the compressed-tensors quantization config
  • model.safetensors
  • tokenizer.json and tokenizer_config.json
  • processor_config.json
  • chat_template.jinja
  • generation_config.json
  • recipe.yaml

Quantization summary

  • Format: compressed-tensors
  • Method: AWQ
  • Weight bits: 8
  • Activation bits: 16
  • Group size: 32
  • Weights: symmetric
  • Observer: mse
  • Duo scaling: enabled
  • Excluded from quantization: vision tower, multimodal projector/embed_vision, and lm_head
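As a back-of-the-envelope check, the summary above implies roughly 8.5 effective bits per quantized weight: 8 bits for the value plus one 16-bit scale shared across each group of 32 (symmetric quantization, so no zero points). A minimal sketch, treating the 31B parameter count as a rough assumption that ignores the unquantized vision tower, projector, and lm_head:

```python
# Approximate memory footprint of W8A16 weights with group size 32.
# Symmetric quantization stores scales only (no zero points).
weight_bits = 8
scale_bits = 16
group_size = 32

effective_bits = weight_bits + scale_bits / group_size  # 8.5 bits per weight

params = 31e9  # rough total; excluded modules stay in higher precision
quantized_gib = params * effective_bits / 8 / 1024**3
bf16_gib = params * 16 / 8 / 1024**3

print(f"effective bits/weight: {effective_bits}")
print(f"approx. weights: {quantized_gib:.1f} GiB quantized vs {bf16_gib:.1f} GiB BF16")
```

The grouped scales add only about 6% overhead on top of the raw 8-bit weights, which is why the checkpoint lands near half the BF16 size.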

Recommended serving

Use vLLM for inference.

Tested runtime:

  • torch 2.10.0+cu128
  • transformers 5.5.1
  • compressed-tensors 0.14.0.1
  • vllm 0.19.0
Install the tested versions:

pip install -U "torch==2.10.0" "transformers==5.5.1" "compressed-tensors==0.14.0.1" "vllm==0.19.0"

Start the server:

vllm serve groxaxo/gemma4-31b-abliterated-multimodal-awq8 \
  --trust-remote-code \
  --dtype auto \
  --max-model-len 6144 \
  --served-model-name gemma4-31b-abliterated-multimodal-awq8

For local image inputs, add:

--allowed-local-media-path /path/to/images --limit-mm-per-prompt '{"image":1}'

For text-only long-context serving:

--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}' --skip-mm-profiling --mm-processor-cache-gb 0 --max-model-len 10240
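Against the running server, image requests use the standard OpenAI chat-completions format with an `image_url` content part. The sketch below only builds the request body; the model name matches `--served-model-name` above, the PNG bytes are a placeholder, and sending the request is left to your client of choice:

```python
import base64
import json

# Placeholder bytes standing in for a real PNG file. With
# --allowed-local-media-path you could pass a file:// URL instead of base64.
fake_png = base64.b64encode(b"\x89PNG-placeholder").decode()

body = {
    "model": "gemma4-31b-abliterated-multimodal-awq8",  # matches --served-model-name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{fake_png}"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

# POST this JSON to <server>/v1/chat/completions
print(json.dumps(body)[:60])
```

Note `--limit-mm-per-prompt '{"image":1}'` caps each prompt at a single image part like this one.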

Transformers fallback

The checkpoint also loads with trust_remote_code=True through Transformers:

from transformers import AutoProcessor, AutoModelForImageTextToText

repo_id = "groxaxo/gemma4-31b-abliterated-multimodal-awq8"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)
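From there, generation follows the usual image-text-to-text flow. The message structure below is the generic Transformers multimodal chat format; the exact content keys can vary by processor version, and the image URL is a placeholder, so treat this as a sketch:

```python
# Generic Transformers multimodal chat messages (sketch; content keys may
# differ slightly depending on the processor version).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
            {"type": "text", "text": "What is in this picture?"},
        ],
    }
]

# With processor and model loaded as above:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# print(processor.decode(out[0], skip_special_tokens=True))
print(messages[0]["role"])
```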

Notes

  • This checkpoint was built to preserve multimodal capability while avoiding quantization of the vision tower and projector modules.
  • If you only need a local OpenAI-compatible endpoint, start vLLM with --port 1234 and point clients at http://127.0.0.1:1234/v1 (vLLM defaults to port 8000).
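A minimal text-only client against that local endpoint, using only the standard library (any OpenAI-compatible client works; the send itself is commented out since it requires the server to be running):

```python
import json
from urllib import request

# Endpoint from the note above; adjust the port if you started vLLM differently.
url = "http://127.0.0.1:1234/v1/chat/completions"
body = {
    "model": "gemma4-31b-abliterated-multimodal-awq8",
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = request.Request(
    url,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)

# resp = request.urlopen(req)  # requires the vLLM server to be up
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
print(req.full_url)
```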