Parakeet-TDT-ExecuTorch-Metal

Pre-exported ExecuTorch .pte file for Parakeet TDT 0.6B, built for the Metal backend (Apple GPU) with fpa4w quantization (4-bit weights, floating-point activations). It provides fast speech-to-text with word-level timestamps, GPU-accelerated on Apple Silicon Macs.

For the XNNPACK (CPU) variant, see Parakeet-TDT-ExecuTorch-XNNPACK.

Installation

git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
make parakeet-metal

Download

pip install huggingface_hub
huggingface-cli download younghan-meta/Parakeet-TDT-ExecuTorch-Metal --local-dir ~/parakeet_metal
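
Before running, it can help to confirm the download is complete. The sketch below is a hypothetical helper, not part of ExecuTorch or huggingface_hub. It assumes the repository contains the three files that the run command in this README refers to (model.pte, tokenizer.model, poem.wav) and that they were placed in ~/parakeet_metal.

```python
from pathlib import Path

# File names taken from the run command in this README (assumed to be
# everything the runner needs; the repository may contain more).
EXPECTED_FILES = ("model.pte", "tokenizer.model", "poem.wav")


def missing_files(model_dir="~/parakeet_metal"):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all expected files present")
```

If anything is reported missing, re-running the download command above is the simplest fix.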

Run

# Run from the ~/executorch checkout (the runner path below is relative to it)
DYLD_LIBRARY_PATH=/usr/lib:$(brew --prefix libomp)/lib \
  cmake-out/examples/models/parakeet/parakeet_runner \
    --model_path ~/parakeet_metal/model.pte \
    --tokenizer_path ~/parakeet_metal/tokenizer.model \
    --audio_path ~/parakeet_metal/poem.wav

Optional flags:

--timestamps <mode>: timestamp granularity, one of none|token|word|segment|all (default: segment)
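
To call the runner from a script, the command above can be assembled programmatically. This is a minimal sketch under stated assumptions: build_runner_cmd and transcribe are hypothetical helper names, the flags are only those shown in this README, the runner is assumed to live under ~/executorch, and the transcription is assumed to appear on standard output.

```python
import os
import subprocess
from pathlib import Path

# Modes listed for --timestamps in this README.
TIMESTAMP_MODES = ("none", "token", "word", "segment", "all")

# Runner path as used in the run command, relative to the executorch checkout.
RUNNER = "cmake-out/examples/models/parakeet/parakeet_runner"


def build_runner_cmd(model_dir, audio_path, timestamps="segment"):
    """Return the argument list for parakeet_runner (hypothetical helper)."""
    if timestamps not in TIMESTAMP_MODES:
        raise ValueError(f"unknown timestamps mode: {timestamps!r}")
    model_dir = Path(model_dir).expanduser()
    return [
        RUNNER,
        "--model_path", str(model_dir / "model.pte"),
        "--tokenizer_path", str(model_dir / "tokenizer.model"),
        "--audio_path", str(Path(audio_path).expanduser()),
        "--timestamps", timestamps,
    ]


def transcribe(model_dir, audio_path, timestamps="segment",
               repo="~/executorch", libomp_lib=None):
    """Run the runner and return whatever it writes to stdout.

    repo is an assumption: the directory that contains cmake-out/.
    libomp_lib is the directory printed by `brew --prefix libomp` plus "/lib".
    """
    env = dict(os.environ)
    if libomp_lib:
        env["DYLD_LIBRARY_PATH"] = f"/usr/lib:{libomp_lib}"
    cmd = build_runner_cmd(model_dir, audio_path, timestamps)
    result = subprocess.run(
        cmd, cwd=Path(repo).expanduser(), env=env,
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, transcribe("~/parakeet_metal", "~/parakeet_metal/poem.wav", timestamps="word") would request word-level timestamps. Validating the mode before launching avoids starting the runner with a value it does not list.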

Export Command

pip install "nemo_toolkit[asr]"
python examples/models/parakeet/export_parakeet_tdt.py \
    --backend metal \
    --qlinear_encoder fpa4w --qlinear_encoder_group_size 32 \
    --qlinear fpa4w --qlinear_group_size 32 \
    --output-dir ./parakeet_metal_quantized

Metal fpa4w quantization requires torchao built with experimental MPS ops:

EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh
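
To illustrate what "4-bit weight, group size 32" means in the export command, here is a minimal pure-Python sketch of group-wise weight quantization. It is not the torchao implementation: the asymmetric min/max scheme, the function names, and the unpacked integer storage are all assumptions made for illustration. Activations are left in floating point, which is the "fp activation" half of fpa4w.

```python
def quantize_group(weights, bits=4):
    """Quantize one group of floats to unsigned `bits`-bit integers.

    Assumed scheme: asymmetric min/max, one (scale, offset) pair per group.
    """
    qmax = (1 << bits) - 1                 # 15 for 4-bit weights
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0        # fall back to 1.0 for constant groups
    q = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo


def dequantize_group(q, scale, lo):
    """Recover approximate float weights from the quantized group."""
    return [lo + scale * v for v in q]


def quantize_row(row, group_size=32):
    """Split a weight row into groups and quantize each independently."""
    return [
        quantize_group(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]
```

In this scheme each group of 32 weights shares one scale and offset, so the rounding error per weight is bounded by half of that group's scale. Smaller groups track local weight ranges more closely at the cost of storing more scale and offset values.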

More Info

Base model: nvidia/parakeet-tdt-0.6b-v2. This model is a quantized version of it.