# baa-ai/Llama-3.3-70B-Instruct-RAM-50GB-MLX

Part of the SWAN collection of quantized Llama 3.1 and 3.3 70B Instruct models for MLX.
Mixed-precision quantized version of meta-llama/Llama-3.3-70B-Instruct, optimised by baa.ai using a proprietary Black Sheep AI method. Bit-widths are allocated per tensor via sensitivity analysis and budget-constrained optimisation; no calibration data is required.
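The allocation method itself is proprietary, so the following is only a generic sketch of what budget-constrained per-tensor bit allocation can look like: spend a global bit budget greedily on the tensors whose quantization error is estimated to hurt the model most. All names and the greedy strategy here are illustrative assumptions, not the baa.ai implementation.

```python
# Hypothetical sketch only: the baa.ai allocation method is proprietary.
# Generic idea: upgrade the most sensitive tensors to wider bit-widths
# until the size-weighted average hits the target budget.
import heapq

def allocate_bits(sensitivity, sizes, avg_budget, choices=(3, 4, 5, 6, 8)):
    """sensitivity: tensor name -> estimated error from quantizing it.
    sizes: tensor name -> number of weights in the tensor.
    avg_budget: target size-weighted average bits per weight (e.g. 5.6).
    Returns: tensor name -> assigned bit-width."""
    total_weights = sum(sizes.values())
    budget = avg_budget * total_weights           # total bits available
    alloc = {name: choices[0] for name in sizes}  # start at the narrowest width
    used = sum(alloc[n] * sizes[n] for n in sizes)
    # Max-heap on sensitivity: most sensitive tensors are upgraded first.
    heap = [(-s, name, 1) for name, s in sensitivity.items()]
    heapq.heapify(heap)
    while heap:
        neg_s, name, idx = heapq.heappop(heap)
        if idx >= len(choices):
            continue                              # already at the widest option
        extra = (choices[idx] - alloc[name]) * sizes[name]
        if used + extra <= budget:                # upgrade only if it fits
            used += extra
            alloc[name] = choices[idx]
            heapq.heappush(heap, (neg_s, name, idx + 1))
    return alloc
```

For example, with widths `(4, 6, 8)`, a 5.5-bit budget, and two equally sized tensors, the more sensitive tensor ends up at 6 bits while the other stays at 4, keeping the average under budget.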
| Metric | Value |
|---|---|
| Size | 46 GB |
| Average bits per weight | 5.6 |
| WikiText-2 perplexity (median) | 4.3544 |
| MMLU (relative to BF16) | 94.8% |
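The reported size is consistent with the average bit-width. A quick back-of-the-envelope check, assuming roughly 70.6B parameters (the commonly cited count for Llama 70B) and ignoring quantization metadata such as scales and zero-points:

```python
# Back-of-the-envelope size check (assumes ~70.6e9 parameters;
# ignores scales/zero-points and non-weight overhead).
params = 70.6e9
avg_bits = 5.6
size_gib = params * avg_bits / 8 / 2**30
print(f"{size_gib:.1f} GiB")  # ~46, matching the reported 46 GB
```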
Use with `mlx-lm`:

```python
from mlx_lm import load, generate

model, tokenizer = load("baa-ai/Llama-3.3-70B-Instruct-RAM-50GB-MLX")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```