Cohere Transcribe 2B (CoreML + ONNX Hybrid)

Hybrid CoreML encoder + ONNX decoder for Cohere Transcribe, optimized for inference on Apple Silicon. The encoder runs on the Neural Engine (ANE) via CoreML; the decoder runs on the CPU with ONNX Runtime, using a KV cache.
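Petal does this in Swift. To inspect the two halves outside Petal, the Python sketch below loads each one with the placement described above. It assumes coremltools 7 or later on macOS and the onnxruntime package. The constants and helper names are hypothetical. The sketch does not call `predict` or `run`, because this card does not document the models' input names or shapes.

```python
import os

# Paths follow the repository layout listed under "Files".
ENCODER_PATH = os.path.join("coreml", "cohere_encoder.mlmodelc")
DECODER_PATH = os.path.join("onnx", "decoder_model_merged_q4f16.onnx")


def load_encoder(path=ENCODER_PATH):
    """Load the compiled CoreML encoder, restricted to CPU + Neural Engine.

    Assumes coremltools >= 7 (CompiledMLModel) running on macOS.
    """
    import coremltools as ct  # lazy import: the other helpers work without it

    return ct.models.CompiledMLModel(path, compute_units=ct.ComputeUnit.CPU_AND_NE)


def load_decoder(path=DECODER_PATH):
    """Load the ONNX decoder on the CPU execution provider.

    ONNX Runtime resolves the external weights (.onnx_data) relative to the
    .onnx file's directory, so both files must stay side by side.
    """
    import onnxruntime as ort  # lazy import

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])


def describe_inputs(session):
    """Return (name, shape, type) for every input of an ONNX Runtime session."""
    return [(i.name, i.shape, i.type) for i in session.get_inputs()]
```

Printing `describe_inputs(load_decoder())` shows the decoder's actual input names and shapes. Reading them this way is safer than guessing.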

Cohere Transcribe is a 2B-parameter encoder-decoder ASR model. At the time of writing it ranked #1 on the Open ASR Leaderboard with a 5.42% average WER, ahead of Whisper Large v3, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B.

Features

  • 14 languages: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Greek, Arabic, Japanese, Chinese, Vietnamese, Korean
  • #1 on the Open ASR Leaderboard (5.42% average WER)
  • ANE-accelerated encoder via CoreML (1.9B params on the Neural Engine)
  • KV-cached decoder via ONNX Runtime (153M params, q4f16: 4-bit weights, FP16 elsewhere)
  • Long-form audio: native 35-second chunking
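The sketch below shows fixed-size chunking. It assumes 16 kHz mono samples and plain, non-overlapping 35-second windows. The card does not say whether the real pipeline overlaps chunks or aligns boundaries to silence, and `chunk_audio` is a hypothetical helper.

```python
def chunk_audio(samples, sample_rate=16000, chunk_seconds=35.0):
    """Split a mono sample sequence into consecutive, non-overlapping chunks.

    Every chunk holds chunk_seconds of audio except possibly the last,
    which carries whatever remains.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = int(round(sample_rate * chunk_seconds))  # 560,000 samples at the defaults
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
```

Each chunk would be transcribed on its own and the texts joined. A hard cut can split a word, which is why long-form pipelines often add overlap or cut at pauses.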

Specifications

Property            Value
Total parameters    2.07B
Encoder             CoreML, FP16 (3.5 GB)
Decoder             ONNX, q4f16 (98 MB)
Projection          Float32 (5 MB)
Total download      ~3.6 GB
License             Apache 2.0
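If the table's sizes are read as binary units (GiB, MiB), they follow from parameter count times bit width. The check below uses the card's rounded parameter counts and counts raw weights only:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def weight_bytes(num_params, bits_per_param):
    """Raw storage for num_params weights at the given bit width (no metadata)."""
    return num_params * bits_per_param // 8


encoder = weight_bytes(1_900_000_000, 16)          # FP16 encoder, rounded 1.9B params
projection = weight_bytes(1280 * 1024 + 1024, 32)  # float32 weight + bias
decoder_4bit = weight_bytes(153_000_000, 4)        # 4-bit weights only, no scales

print(f"encoder    ~{encoder / GiB:.2f} GiB")
print(f"projection ~{projection / MiB:.2f} MiB")
print(f"decoder    ~{decoder_4bit / MiB:.1f} MiB of 4-bit weights")
```

The encoder comes to about 3.54 GiB and the projection to 5.00 MiB, matching the table. The 4-bit decoder weights alone are about 73 MiB. The remaining gap to the 98 MB file presumably comes from quantization scales and tensors not stored at 4 bits; the card does not break this down.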

Files

coreml/
  cohere_encoder.mlmodelc/    # CoreML encoder (ANE-optimized, FP16)
onnx/
  decoder_model_merged_q4f16.onnx       # ONNX decoder graph (weights stored in .onnx_data)
  decoder_model_merged_q4f16.onnx_data  # ONNX decoder weights (q4f16)
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
decoder_proj_weight.bin     # Encoder→decoder projection weight (1280→1024)
decoder_proj_bias.bin       # Encoder→decoder projection bias (1024)
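The card does not document the format of the projection files. The sketch below assumes each `.bin` is a headerless, little-endian float32 array. It also assumes the weight is stored row-major as (1024, 1280), the layout of a PyTorch `nn.Linear` weight. Under those assumptions the files should be 5,242,880 bytes (weight) and 4,096 bytes (bias). The element-count check catches a wrong dtype or size, but not a transposed layout. The helper names are hypothetical.

```python
import sys
from array import array

D_ENC, D_DEC = 1280, 1024  # encoder and decoder widths from the model card


def read_f32(path, expected_count):
    """Read an ASSUMED headerless little-endian float32 file into array('f')."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if sys.byteorder == "big":
        values.byteswap()  # file assumed little-endian; convert to native order
    if len(values) != expected_count:
        raise ValueError(f"{path}: expected {expected_count} floats, got {len(values)}")
    return values


def load_projection(weight_path="decoder_proj_weight.bin",
                    bias_path="decoder_proj_bias.bin",
                    d_in=D_ENC, d_out=D_DEC):
    """Return (weight, bias); weight is ASSUMED row-major with shape (d_out, d_in)."""
    weight = read_f32(weight_path, d_out * d_in)
    bias = read_f32(bias_path, d_out)
    return weight, bias


def project(frame, weight, bias):
    """y = W x + b for one encoder frame (length d_in) -> length d_out."""
    d_in = len(frame)
    return [
        bias[o] + sum(weight[o * d_in + i] * frame[i] for i in range(d_in))
        for o in range(len(bias))
    ]
```

In Petal this step is a single matrix multiply through Accelerate. The pure-Python loop in `project` only spells out the math per encoder frame and is far too slow for real use.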

Usage

This model is designed for use with Petal, a macOS menu bar app for local-first audio transcription.

Architecture:

  1. Audio → mel spectrogram (128 mel bins, 16 kHz, NeMo-style preprocessing)
  2. CoreML encoder (Fast-Conformer, 48 layers, d=1280) → encoder hidden states
  3. Projection layer (1280→1024) applied in Swift via Accelerate
  4. ONNX decoder (Transformer, 8 layers, d=1024) with KV cache → token IDs
  5. SentencePiece tokenizer → text
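The sketch below traces tensor shapes for a clip of a given length, to make the data flow concrete. It assumes a 10 ms hop (160 samples) with a centered STFT. It also assumes Fast-Conformer's usual 8× subsampling from three stride-2 convolutions (kernel 3, padding 1). The actual values live in `preprocessor_config.json` and `config.json` and may differ.

```python
SAMPLE_RATE = 16_000  # Hz, from the model card
HOP = 160             # ASSUMED 10 ms hop; check preprocessor_config.json
N_MELS = 128          # from the model card
D_ENC, D_DEC = 1280, 1024


def mel_frames(num_samples, hop=HOP):
    """Frame count of a centered STFT (even n_fft, padded n_fft // 2 per side)."""
    return 1 + num_samples // hop


def encoder_frames(n_mel_frames, num_stride2_layers=3):
    """Length after ASSUMED stride-2 convs (kernel 3, padding 1): ceil(L / 2) each."""
    length = n_mel_frames
    for _ in range(num_stride2_layers):
        length = (length + 1) // 2
    return length


def pipeline_shapes(seconds):
    """Shapes through steps 1-3 of the architecture list for one clip."""
    t_mel = mel_frames(int(round(seconds * SAMPLE_RATE)))
    t_enc = encoder_frames(t_mel)
    return {
        "mel": (N_MELS, t_mel),
        "encoder_out": (t_enc, D_ENC),
        "projected": (t_enc, D_DEC),
    }
```

Under these assumptions, a full 35-second chunk gives 3,501 mel frames and 438 encoder frames. Each frame is projected from 1280 to 1024 dimensions before the decoder attends to it.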

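Step 4 is ordinary autoregressive decoding. The KV cache means that after the first call, only the newest token is fed in, while keys and values for earlier positions are reused. The sketch below shows greedy decoding in that style. `step` is a hypothetical wrapper around one ONNX Runtime call, and the prompt token IDs and end-of-sequence ID must come from `generation_config.json` and the tokenizer.

```python
def greedy_decode(step, prompt_ids, eos_id, max_new_tokens=256):
    """Greedy autoregressive decoding with a KV cache.

    `step(input_ids, past)` is a HYPOTHETICAL wrapper around one decoder call.
    It receives only tokens not yet in the cache plus the cache itself, and
    returns (logits for the last position, updated cache).
    max_new_tokens is an arbitrary cap, not a value from this card.
    """
    past = None                  # empty cache: the first call sees the whole prompt
    input_ids = list(prompt_ids)
    output = []
    for _ in range(max_new_tokens):
        logits, past = step(input_ids, past)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        output.append(next_id)
        input_ids = [next_id]    # warm cache: feed only the newest token
    return output
```

Without the cache, every step would re-run the decoder over the whole growing sequence. The cache turns that quadratic work into one new position per step.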
Performance on Apple Silicon:

  • Model warmup: ~0.2 s (with the compiled CoreML model cached)
  • Transcription: ~2 s for 4 s of audio
  • Encoder runs on ANE, decoder on CPU with KV cache
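The figures above correspond to a real-time factor (processing time divided by audio duration) of about 2 / 4 = 0.5. The card does not name the chip used. The hypothetical helper below reproduces the measurement on other hardware. Run one untimed transcription first so that warmup is excluded.

```python
import time


def real_time_factor(transcribe, audio, audio_seconds, clock=time.perf_counter):
    """Run `transcribe(audio)` once; return (text, elapsed / audio duration).

    An RTF below 1 means faster than real time. `transcribe` is a hypothetical
    callable; `clock` is injectable so the measurement can be tested.
    """
    start = clock()
    text = transcribe(audio)
    elapsed = clock() - start
    return text, elapsed / audio_seconds
```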

License

Apache 2.0 — original model by Cohere.

