Parakeet TDT 0.6B V2 (MLX, Encoder INT4)

Mixed-precision variant: encoder quantized to INT4, decoder + joint network remain BF16. Optimized for Apple Silicon with reduced memory footprint.

Metric BF16 Reference This (INT4)
Peak memory ~3GB ~1.0GB
Weight size ~1.2GB ~330MB
WER impact baseline +0.3-0.5% expected

Usage

from parakeet import from_pretrained

model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v2-int4")
result = model.transcribe("audio.wav")

Quantization Details

  • Encoder: INT4, group_size=64
  • Decoder (PredictNetwork): BF16 (accuracy-critical)
  • Joint network: BF16 (accuracy-critical)
  • Method: Post-training quantization via mlx.nn.quantize
  • Source: sonic-speech/parakeet-tdt-0.6b-v2
Downloads last month
16
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for sonic-speech/parakeet-tdt-0.6b-v2-int4

Finetuned
(2)
this model