Parakeet TDT 0.6B v3 (MLX, Encoder INT8)
NVIDIA Parakeet TDT v3 with encoder-only INT8 quantization, the recommended variant for most users: zero WER degradation, ~30% faster, and 58% less peak memory than BF16.
See also:
- sonic-speech/parakeet-tdt-0.6b-v3 (full BF16 reference)
- sonic-speech/parakeet-tdt-0.6b-v3-int4 (encoder INT4, lite)
Why INT8?
| Metric | BF16 | INT8 | Change |
|---|---|---|---|
| WER (LibriSpeech) | 0.82% | 0.82% | None |
| WER (TED-LIUM) | 15.1% | 15.1% | None |
| RTFx | 73x | 95x | +30% |
| Peak Memory | 3,002 MB | 1,268 MB | -58% |
| Weight Size | 1,254 MB | 755 MB | -40% |
Benchmarked on Apple M3 Max (64 GB), macOS Sequoia 15.7.3, MLX 0.30.4.
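The percentage deltas in the table follow directly from the raw numbers; a quick sanity check (values copied from the table above):

```python
# Raw benchmark figures from the table (BF16 vs. encoder-INT8).
bf16 = {"rtfx": 73, "peak_mb": 3002, "weights_mb": 1254}
int8 = {"rtfx": 95, "peak_mb": 1268, "weights_mb": 755}

speedup = round((int8["rtfx"] / bf16["rtfx"] - 1) * 100)        # throughput gain, %
mem_cut = round((1 - int8["peak_mb"] / bf16["peak_mb"]) * 100)  # peak-memory reduction, %
size_cut = round((1 - int8["weights_mb"] / bf16["weights_mb"]) * 100)  # weight-size reduction, %

print(speedup, mem_cut, size_cut)  # -> 30 58 40
```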
Quantization Details
Only the Conformer encoder (~85% of parameters) is quantized to INT8 (group_size=64). The decoder and joint network remain in BF16, preserving precision for token generation.
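To illustrate what group-wise quantization with group_size=64 means, here is a minimal pure-Python sketch of a *symmetric* per-group scheme (MLX's actual `mlx.nn.quantize` uses an affine scheme with a per-group scale and bias; this simplification is for intuition only):

```python
def quantize_group(weights, group_size=64, bits=8):
    """Group-wise symmetric quantization sketch: every `group_size`
    consecutive weights share one scale, so quantization error is
    bounded per group rather than per tensor."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / qmax or 1.0  # avoid zero scale
        scales.append(scale)
        q.extend(round(w / scale) for w in group)
    return q, scales

def dequantize_group(q, scales, group_size=64):
    """Recover approximate weights: each int is rescaled by its group's scale."""
    return [qi * scales[i // group_size] for i, qi in enumerate(q)]
```

The smaller the group, the tighter the error bound but the more scale metadata is stored, which is why the quantized weights shrink by ~40% rather than the naive 4x of 16-bit to 8-bit.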
Usage
Install:

```
pip install parakeet-mlx
```

Then:

```python
import mlx.core as mx
from parakeet import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int8", dtype=mx.bfloat16)
result = model.transcribe("audio.wav")
```
Origin
Quantized from sonic-speech/parakeet-tdt-0.6b-v3 using mlx.nn.quantize (encoder-only, 8-bit, group_size=64).
Part of the Sonic Speech model collection.
Model tree for sonic-speech/parakeet-tdt-0.6b-v3-int8
- Base model: nvidia/parakeet-tdt-0.6b-v3
- Finetuned: sonic-speech/parakeet-tdt-0.6b-v3
- Quantized: sonic-speech/parakeet-tdt-0.6b-v3-int8 (this model)