# Parakeet TDT 0.6B V2 (MLX, Encoder INT8)

Mixed-precision variant of Parakeet TDT 0.6B V2: the encoder is quantized to INT8, while the decoder and joint network remain in BF16. Optimized for Apple Silicon, with a roughly halved memory footprint.

| Metric | BF16 reference | This model (INT8) |
|---|---|---|
| Peak memory | ~3 GB | ~1.5 GB |
| Weight size | ~1.2 GB | ~660 MB |
| WER impact | baseline | +0.1–0.2% expected |
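The weight-size numbers above can be sanity-checked with back-of-envelope arithmetic. The encoder/decoder parameter split below is an assumption for illustration (the encoder dominates a FastConformer-style model, but the exact split may differ):

```python
# ASSUMPTION: of the ~0.6B parameters, roughly 0.55B sit in the
# encoder and ~0.05B in the decoder + joint network.
ENCODER_PARAMS = 0.55e9
OTHER_PARAMS = 0.05e9

# BF16 stores 2 bytes per parameter.
bf16_bytes = (ENCODER_PARAMS + OTHER_PARAMS) * 2

# INT8 with group_size=64 stores 1 byte per weight plus a scale and
# bias (2 bytes each, fp16) for every 64-weight group.
int8_bytes_per_param = 1 + (2 + 2) / 64
mixed_bytes = ENCODER_PARAMS * int8_bytes_per_param + OTHER_PARAMS * 2

print(f"BF16:  ~{bf16_bytes / 1e9:.2f} GB")
print(f"Mixed: ~{mixed_bytes / 1e6:.0f} MB")
```

This lands in the same ballpark as the ~1.2 GB / ~660 MB figures in the table; the remaining gap depends on the true parameter split and per-group metadata layout.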

## Usage

```python
from parakeet import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2-int8")
result = model.transcribe("audio.wav")
```

## Quantization Details

- **Encoder:** INT8, `group_size=64`
- **Decoder (PredictNetwork):** BF16 (accuracy-critical)
- **Joint network:** BF16 (accuracy-critical)
- **Method:** post-training quantization via `mlx.nn.quantize`
- **Source model:** `sonic-speech/parakeet-tdt-0.6b-v2`
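For intuition, here is a minimal NumPy sketch of group-wise affine quantization like the scheme `mlx.nn.quantize` applies to the encoder; this is illustrative, not MLX's actual kernel:

```python
import numpy as np

def quantize_groupwise(w, group_size=64, bits=8):
    """Quantize each run of `group_size` weights with its own
    scale/offset, which keeps error low for outlier-heavy layers."""
    flat = w.reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-equal groups
    q = np.round((flat - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_groupwise(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale, lo = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, lo, w.shape)
err = np.abs(w - w_hat).max()
print(f"max abs error: {err:.4f}")  # small relative to the weight scale
```

In MLX itself, post-training quantization is a single call; `mlx.nn.quantize` also accepts a predicate so selected modules (here the decoder and joint network) can be left unquantized.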
