Qwen3-ASR-1.7B - Core AI
Qwen3-ASR-1.7B speech-to-text converted for Apple Core AI, running on-device (iPhone + Mac).
The zoo's first ASR model: an AuT audio encoder feeding a Qwen3 decoder on the pipelined engine
(audio embeddings bound to one static input buffer; output format {lang}<asr_text>{text}). Clips of ≤30 s,
52 languages, automatic language detection.
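As an illustration of the {lang}<asr_text>{text} output format, here is a minimal sketch of splitting a raw decoder string into its two fields. `ParsedTranscript` and `parseASROutput` are hypothetical names, not part of CoreAIKit, and the sketch assumes the text before the `<asr_text>` marker is the language field verbatim.

```swift
import Foundation

/// Hypothetical result type for illustration; not CoreAIKit's API.
struct ParsedTranscript: Equatable {
    let language: String
    let text: String
}

/// Splits raw decoder output of the assumed form `{lang}<asr_text>{text}`.
/// Returns nil when the `<asr_text>` marker is missing.
func parseASROutput(_ raw: String) -> ParsedTranscript? {
    let marker = "<asr_text>"
    guard let range = raw.range(of: marker) else { return nil }
    // Everything before the marker is treated as the language field (assumption).
    let language = raw[..<range.lowerBound]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    // Everything after the marker is the transcript.
    let text = raw[range.upperBound...]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return ParsedTranscript(language: language, text: text)
}
```

KitASRModel presumably does this split internally, since transcribe returns (language, text); the sketch is only useful for inspecting raw decoder output.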
Driven by CoreAIKit KitASRModel:
let asr = try await KitASRModel(model: .qwen3ASR1_7B)
let r = try await asr.transcribe(samples: pcm16kMono) // -> (language, text)
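Since clips are limited to 30 s, longer recordings need to be split before transcription. The sketch below is one naive way to do that, not something CoreAIKit is known to provide: `chunkSamples` is a hypothetical helper, the 16 kHz rate is inferred from the `pcm16kMono` name, and `[Float]` samples are an assumption.

```swift
/// Splits mono samples into consecutive, non-overlapping windows of at most
/// `maxSeconds`. Hypothetical helper; the 16 kHz default is an assumption
/// inferred from the `pcm16kMono` variable name.
func chunkSamples(_ samples: [Float],
                  sampleRate: Int = 16_000,
                  maxSeconds: Int = 30) -> [[Float]] {
    let window = sampleRate * maxSeconds  // 480_000 samples at the defaults
    guard window > 0, !samples.isEmpty else { return [] }
    return stride(from: 0, to: samples.count, by: window).map { start in
        // The last window may be shorter than `window`.
        Array(samples[start..<min(start + window, samples.count)])
    }
}
```

Each chunk could then be passed to asr.transcribe(samples:) in turn, assuming it accepts this sample type. Fixed-size cuts can split a word across two chunks, so cutting at silences would likely give cleaner transcripts.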
Layout: gpu-pipelined/ holds the decoder bundle (*_decode_int8hu_n390_s1, int8) and the paired
AuT encoder (*_audio_encoder_fp16_k30, fp16). The same bundles are used on iOS and macOS.
App: coreai-audio (Transcribe tab: pick Qwen3-ASR or Whisper large-v3-turbo). Card: zoo/qwen3-asr.md.