Arabic transcription quality and dialect handling in production

#17
by O96a - opened

The 14-language support with Arabic included is compelling for low-resource multilingual ASR pipelines. I noticed the model uses a conformer encoder, and I'm curious how this compares to a Whisper-style encoder-decoder for dialectal Arabic specifically. In my experience with Sudanese/Egyptian Arabic transcription, Whisper tends to struggle with code-switching and with dialectal variants that diverge significantly from MSA. Has Cohere benchmarked performance across Arabic dialects, or is the training data primarily MSA? The long-form chunking with automatic reassembly is a practical choice, and the RTFx numbers in the 55-minute earnings call example are impressive. For production deployment, have you observed any degradation patterns when audio contains significant background noise or overlapping speakers? I'm interested in testing this against some Sudanese Arabic news clips I have access to.
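A dialect comparison like the one described above could be scored with word error rate (WER) grouped by dialect, alongside RTFx taken as audio duration divided by processing time. The following is only a minimal sketch under stated assumptions: the function names (`edit_distance`, `wer_by_dialect`, `rtfx`) and the dialect labels are hypothetical, the transcripts are assumed to be pre-normalized whitespace-separated text, and nothing here reflects how Cohere or Whisper evaluate their models.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer_by_dialect(samples):
    """Aggregate WER per dialect label.

    samples: iterable of (dialect, reference, hypothesis) strings.
    Returns {dialect: total word errors / total reference words}.
    Assumption: text is already normalized (diacritics, punctuation).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for dialect, ref, hyp in samples:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[dialect] += edit_distance(ref_tokens, hyp_tokens)
        words[dialect] += len(ref_tokens)
    # Skip dialects with no reference words to avoid dividing by zero.
    return {d: errors[d] / words[d] for d in words if words[d] > 0}


def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: seconds of audio per second of compute."""
    return audio_seconds / processing_seconds
```

For Arabic in particular, the normalization step matters: whether diacritics, hamza variants, and Latin-script code-switched words are normalized before scoring can change the measured WER noticeably, so any comparison between models should use the same normalization for both.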
