Gemma-4-26B-A4B-it — 14GB (MLX)

A mixed-precision quantized version of google/gemma-4-26B-A4B-it, optimized by baa.ai using a proprietary Black Sheep AI method.
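
The exact Black Sheep AI recipe is proprietary, so the sketch below only illustrates the general idea of mixed-precision quantization in MLX: a per-layer predicate picks the bit width, keeping sensitive layers at higher precision. It assumes mlx-lm's `quant_predicate` hook on `convert` (available in recent mlx-lm releases); the layer selection shown is a hypothetical example, not baa.ai's method.

```python
from mlx_lm import convert

# Hypothetical per-layer policy: NOT the proprietary Black Sheep AI recipe.
# Returning a dict overrides bits/group size for that layer; returning True
# quantizes it with the defaults passed to convert().
def mixed_precision_predicate(path, module, config):
    # Keep embeddings and the output head at 6-bit (illustrative choice)
    if "embed" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 64}
    # Everything else at 4-bit
    return {"bits": 4, "group_size": 64}

convert(
    "google/gemma-4-26B-A4B-it",           # source BF16 checkpoint
    mlx_path="gemma-mixed-precision-mlx",  # output directory (hypothetical)
    quantize=True,
    quant_predicate=mixed_precision_predicate,
)
```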

Metrics

| Metric | Value |
|---|---|
| Size | 13.1 GB |
| MMLU | 78.27% |
| MMLU vs BF16 | 96.9% of BF16 |
| MMLU vs Uniform 4-bit | +0.39 pp |

All RAM Variants (all beat Uniform 4-bit)

| Model | Size | MMLU | MMLU % of BF16 |
|---|---|---|---|
| BF16 | 51.6 GB | 80.80% | 100% |
| Uniform 4-bit | ~14 GB | 77.88% | 96.4% |
| RAM 14GB | ~14 GB | 78.27% | 96.9% |
| RAM 18GB | ~18 GB | 80.02% | 99.0% |
| RAM 20GB ★ | ~20 GB | 80.70% | 99.9% |
| RAM 22GB | ~22 GB | 80.41% | 99.5% |
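
The "MMLU % of BF16" column is each variant's MMLU score divided by the BF16 score, and the headline "+0.39 pp" is this variant's gap to uniform 4-bit. A minimal check, using only the scores from the table above:

```python
BF16_MMLU = 80.80
UNIFORM_4BIT_MMLU = 77.88

variants = {
    "Uniform 4-bit": 77.88,
    "RAM 14GB": 78.27,
    "RAM 18GB": 80.02,
    "RAM 20GB": 80.70,
    "RAM 22GB": 80.41,
}

for name, mmlu in variants.items():
    retention = mmlu / BF16_MMLU          # fraction of BF16 quality retained
    delta_pp = mmlu - UNIFORM_4BIT_MMLU   # percentage-point gap to uniform 4-bit
    print(f"{name}: {retention:.1%} of BF16, {delta_pp:+.2f} pp vs uniform 4-bit")
```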

Usage

```python
from mlx_lm import load, generate

# Download the quantized weights and tokenizer from the Hub (~14 GB)
model, tokenizer = load("baa-ai/Gemma-4-26B-A4B-it-RAM-14GB-MLX")

# Generate a completion for a plain text prompt
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```
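
Since this is an instruction-tuned (-it) model, chat-style prompts should go through the tokenizer's chat template first. This follows the standard mlx-lm pattern; the message content and max_tokens here are illustrative:

```python
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/Gemma-4-26B-A4B-it-RAM-14GB-MLX")

# Wrap the user turn in the model's chat template before generating
messages = [{"role": "user", "content": "Explain mixed-precision quantization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```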

Quantized by baa.ai
