Parakeet TDT 0.6B V2 (MLX, Encoder INT8)
Mixed-precision variant: the encoder is quantized to INT8, while the decoder and joint network remain in BF16. Optimized for Apple Silicon, with a reduced memory footprint.
| Metric | BF16 Reference | This model (INT8 encoder) |
|---|---|---|
| Peak memory | ~3 GB | ~1.5 GB |
| Weight size | ~1.2 GB | ~660 MB |
| WER impact | baseline | +0.1-0.2% expected |
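As a rough sanity check on the weight sizes in the table, the sketch below estimates storage from the nominal 0.6B parameter count in the model name and the INT8, group_size=64 settings listed under Quantization Details. It assumes each group of 64 weights also stores one 16-bit scale and one 16-bit bias (the `meta_bits=32` default). The `quantized_fraction` parameter is a hypothetical knob, since this card does not state how the parameters are split between encoder and decoder.

```python
def quantized_bits_per_weight(bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Average bits stored per weight: payload plus per-group scale and bias.

    meta_bits=32 assumes one 16-bit scale and one 16-bit bias per group.
    """
    return bits + meta_bits / group_size


def estimate_size_bytes(
    n_params: float,
    quantized_fraction: float,  # hypothetical: share of parameters that get quantized
    bits: int = 8,
    group_size: int = 64,
    full_bits: int = 16,  # BF16 for everything left unquantized
) -> float:
    """Estimate total weight storage for a partially quantized model."""
    quantized = n_params * quantized_fraction
    unquantized = n_params * (1.0 - quantized_fraction)
    q_bytes = quantized * quantized_bits_per_weight(bits, group_size) / 8
    f_bytes = unquantized * full_bits / 8
    return q_bytes + f_bytes


N_PARAMS = 0.6e9  # nominal count taken from the model name, not a measured value

bf16_bytes = estimate_size_bytes(N_PARAMS, quantized_fraction=0.0)
int8_all_bytes = estimate_size_bytes(N_PARAMS, quantized_fraction=1.0)

print(f"BF16 everywhere:  {bf16_bytes / 1e9:.2f} GB")
print(f"INT8 everywhere:  {int8_all_bytes / 1e6:.1f} MB")
```

Under these assumptions, BF16 storage comes to about 1.2 GB, matching the reference column, and quantizing every parameter would give roughly 640 MB. The ~660 MB reported above is consistent with the decoder, joint network, and any unquantized layers staying in BF16, though this is an estimate rather than a breakdown of the actual checkpoint.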
Usage
from parakeet import from_pretrained
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2-int8")
result = model.transcribe("audio.wav")
Quantization Details
- Encoder: INT8, group_size=64
- Decoder (PredictNetwork): BF16 (accuracy-critical)
- Joint network: BF16 (accuracy-critical)
- Method: Post-training quantization via mlx.nn.quantize
- Source: sonic-speech/parakeet-tdt-0.6b-v2
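To make the group_size=64 setting concrete, here is a minimal pure-Python sketch of group-wise affine quantization, the general technique that a group size refers to: each group of 64 weights gets its own scale and bias, and each weight is stored as an 8-bit integer. This is an illustration only. It does not reproduce MLX's kernels, rounding details, or packed storage format, and the function names are hypothetical.

```python
import random


def quantize_group(weights, bits=8):
    """Affine-quantize one group so that w is approximately scale * q + bias."""
    levels = (1 << bits) - 1  # 255 for INT8
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: every code becomes 0 and bias carries the value
    # Round to the nearest code and clamp into the representable range.
    q = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo


def dequantize_group(q, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * v + bias for v in q]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into fixed-size groups and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(128)]  # toy weights, two groups of 64

groups = quantize_row(row)
restored = [w for q, scale, bias in groups for w in dequantize_group(q, scale, bias)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_err:.2e}")
```

Because the scale is fitted per group rather than per tensor, the rounding error for each weight is bounded by half of that group's scale, which is one reason small groups tend to preserve accuracy at the cost of a little extra metadata.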
Model tree for sonic-speech/parakeet-tdt-0.6b-v2-int8
- Base model: nvidia/parakeet-tdt-0.6b-v2
- Fine-tuned: sonic-speech/parakeet-tdt-0.6b-v2