Parakeet TDT 0.6B V2 (MLX, Encoder INT4)
Mixed-precision variant: encoder quantized to INT4, decoder + joint network remain BF16. Optimized for Apple Silicon with reduced memory footprint.
| Metric | BF16 Reference | This (INT4) |
|---|---|---|
| Peak memory | ~3GB | ~1.0GB |
| Weight size | ~1.2GB | ~330MB |
| WER impact | baseline | +0.3-0.5% expected |
Usage
from parakeet import from_pretrained
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2-int4")
result = model.transcribe("audio.wav")
Quantization Details
- Encoder: INT4, group_size=64
- Decoder (PredictNetwork): BF16 (accuracy-critical)
- Joint network: BF16 (accuracy-critical)
- Method: Post-training quantization via
mlx.nn.quantize - Source:
sonic-speech/parakeet-tdt-0.6b-v2
- Downloads last month
- 16
Hardware compatibility
Log In to add your hardware
Quantized
Model tree for sonic-speech/parakeet-tdt-0.6b-v2-int4
Base model
nvidia/parakeet-tdt-0.6b-v2 Finetuned
sonic-speech/parakeet-tdt-0.6b-v2