# Gemma 4 Collection

RAM optimised Gemma 4 models by baa.ai (5 items).
A mixed-precision quantized version of google/gemma-4-26B-A4B-it, optimised by baa.ai using a proprietary Black Sheep AI method.
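The Black Sheep AI recipe itself is proprietary and not described here. As a generic illustration only, the core idea behind mixed-precision quantization is that wider integer grids lose less information, so sensitive layers can be kept at higher bit widths while the rest drop to 4-bit to save memory. A minimal sketch of that trade-off (plain symmetric round-to-grid quantization, with made-up example weights):

```python
# Generic sketch of symmetric integer quantization. The actual Black Sheep AI
# method is proprietary; this only illustrates why bit width matters.

def quantize_dequantize(weights, bits):
    """Snap weights to a signed integer grid of the given bit width."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical layer weights, purely for illustration.
weights = [0.91, -0.44, 0.07, 0.63, -0.88, 0.25, -0.12, 0.50]

err4 = mse(weights, quantize_dequantize(weights, 4))
err8 = mse(weights, quantize_dequantize(weights, 8))

# A mixed-precision scheme spends extra bits where the error hurts most.
print(f"4-bit MSE: {err4:.6f}  8-bit MSE: {err8:.6f}")
```

The reconstruction error shrinks rapidly with each extra bit, which is why selectively promoting a few layers above 4-bit can recover most of the BF16 quality at a modest size cost.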
## This model

| Metric | Value |
|---|---|
| Size | 13.1 GB |
| MMLU | 78.27% |
| MMLU vs BF16 | 96.9% of BF16 |
| MMLU vs Uniform 4-bit | +0.39pp |
## Collection comparison

| Model | Size | MMLU | MMLU % of BF16 |
|---|---|---|---|
| BF16 | 51.6 GB | 80.80% | 100% |
| Uniform 4-bit | ~14 GB | 77.88% | 96.4% |
| RAM 14GB | ~14 GB | 78.27% | 96.9% |
| RAM 18GB | ~18 GB | 80.02% | 99.0% |
| RAM 20GB ★ | ~20 GB | 80.70% | 99.9% |
| RAM 22GB | ~22 GB | 80.41% | 99.5% |
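The "MMLU % of BF16" column is simply each variant's MMLU score as a fraction of the BF16 baseline, which can be recomputed from the table:

```python
# Recomputing the "MMLU % of BF16" column from the MMLU scores above.
bf16 = 80.80
variants = {
    "Uniform 4-bit": 77.88,
    "RAM 14GB": 78.27,
    "RAM 18GB": 80.02,
    "RAM 20GB": 80.70,
    "RAM 22GB": 80.41,
}

recovery = {name: round(100 * mmlu / bf16, 1) for name, mmlu in variants.items()}
for name, pct in recovery.items():
    print(f"{name}: {pct}% of BF16")
```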
## Usage

Requires the `mlx-lm` package (`pip install mlx-lm`):

```python
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/Gemma-4-26B-A4B-it-RAM-14GB-MLX")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```
Quantized (4-bit, mixed precision) by baa.ai from the base model google/gemma-4-26B-A4B-it.