# Kokoro-82M CoreML INT8
End-to-end CoreML export of hexgrad/Kokoro-82M with INT8 k-means palettization, optimized for Apple Neural Engine. Requires iOS 18+ / macOS 15+.
A single `kokoro_5s.mlmodelc` runs the full pipeline (BERT → duration prediction → fixed-shape alignment → prosody → decoder) in one CoreML call. G2P (grapheme-to-phoneme) is a separate pair of CoreML models.
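If you want to bypass speech-swift and load the compiled models yourself, plain CoreML is enough. A minimal sketch (the directory path is a placeholder, and the models' input/output feature names are not shown here):

```swift
import CoreML

// Minimal sketch: load the pre-compiled models directly with plain CoreML.
// speech-swift does this internally; the directory path is a placeholder.
func loadKokoroModels(from dir: URL) throws -> (kokoro: MLModel, g2pEncoder: MLModel, g2pDecoder: MLModel) {
    let config = MLModelConfiguration()
    config.computeUnits = .all  // let CoreML schedule work on the Neural Engine where possible

    let kokoro = try MLModel(contentsOf: dir.appendingPathComponent("kokoro_5s.mlmodelc"),
                             configuration: config)
    let g2pEncoder = try MLModel(contentsOf: dir.appendingPathComponent("G2PEncoder.mlmodelc"),
                                 configuration: config)
    let g2pDecoder = try MLModel(contentsOf: dir.appendingPathComponent("G2PDecoder.mlmodelc"),
                                 configuration: config)
    return (kokoro, g2pEncoder, g2pDecoder)
}
```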
## Model
| Parameter | Value |
|---|---|
| Parameters | 82M |
| Precision | INT8 k-means palettization |
| Max audio length | 5 s (200 frames @ 40 fps) |
| Sample rate | 24 kHz |
| Style dimension | 256 |
| Max phonemes per pass | 128 |
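The fixed shapes above translate into a few hard limits for callers. A rough sketch of the derived constants and phoneme padding (the pad token ID and the helper function are illustrative, not part of the shipped pipeline):

```swift
// Derived from the table above: 200 frames at 40 fps = 5 s of audio,
// and 24 000 Hz / 40 fps = 600 samples per frame, i.e. at most 120 000 samples per pass.
enum KokoroShapes {
    static let sampleRate = 24_000
    static let framesPerSecond = 40
    static let maxFrames = 200                                  // 5 s ceiling per CoreML call
    static let samplesPerFrame = sampleRate / framesPerSecond   // 600
    static let maxSamples = maxFrames * samplesPerFrame         // 120_000
    static let maxPhonemes = 128
}

// Illustrative only: pad or truncate a phoneme ID sequence to the fixed 128-slot input.
// The actual pad token ID comes from vocab_index.json and is an assumption here.
func fitPhonemes(_ ids: [Int32], padID: Int32 = 0) -> [Int32] {
    let clipped = Array(ids.prefix(KokoroShapes.maxPhonemes))
    return clipped + Array(repeating: padID, count: KokoroShapes.maxPhonemes - clipped.count)
}
```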
## Files
| File | Size | Description |
|---|---|---|
| `kokoro_5s.mlmodelc` | 83 MB | Pre-compiled end-to-end model; loads directly on-device |
| `G2PEncoder.mlmodelc` | 0.7 MB | Grapheme-to-phoneme encoder |
| `G2PDecoder.mlmodelc` | 0.8 MB | Grapheme-to-phoneme decoder |
| `voices/` | 0.5 MB | 54 preset voice embeddings (10 languages) |
| `vocab_index.json` | 4 KB | Phoneme vocabulary |
| `g2p_vocab.json` | 4 KB | G2P vocabulary |
| `us_gold.json`, `us_silver.json` | 6 MB | English pronunciation dictionaries |
| `pipeline_config.json` | 4 KB | Swift pipeline config |
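One plausible way the dictionaries fit into G2P is a dictionary-first lookup with the neural encoder/decoder as fallback. The sketch below assumes `us_gold.json` and `us_silver.json` map lowercased words to phoneme strings, which is an assumption about their format rather than documented behavior:

```swift
import Foundation

// Sketch of a dictionary-first G2P lookup. Assumption: the JSON files map
// lowercased words to phoneme strings; the real format may differ.
func loadDictionary(_ url: URL) throws -> [String: String] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([String: String].self, from: data)
}

func phonemes(for word: String,
              gold: [String: String],
              silver: [String: String]) -> String? {
    let key = word.lowercased()
    // Prefer the curated "gold" entries, fall back to "silver",
    // and return nil so the caller can fall back to the neural G2P models.
    return gold[key] ?? silver[key]
}
```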
## Quality
Measured against the FP16 reference export on a 1-second test utterance (af_heart voice, 14 phonemes), using the same CoreML inference path:
| Metric | Value |
|---|---|
| Predicted duration Δ | 0 frames |
| Output sample count | identical |
| Log-spec distance | 0.42 (difference close to inaudible) |
| SI-SDR (waveform) | +0.01 dB |
| Size vs FP16 | −74% (83 MB vs 310 MB) |
Because CoreML k-means palettization is not deterministic (scikit-learn's k-means is unseeded), different exports land at different losses. This checkpoint was selected as the best of multiple export runs.
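For reference, the SI-SDR figure follows the standard scale-invariant definition; a minimal sketch of that metric (not the evaluation script used for this card):

```swift
import Foundation

// Scale-invariant SDR between a reference waveform and an estimate, in dB.
// Standard definition: project the estimate onto the reference, then compare
// the energy of that projection to the energy of the residual.
func siSDR(reference: [Float], estimate: [Float]) -> Float {
    precondition(reference.count == estimate.count, "waveforms must be the same length")
    let dot = zip(estimate, reference).map { $0 * $1 }.reduce(0, +)
    let refEnergy = reference.map { $0 * $0 }.reduce(0, +)
    let scale = dot / refEnergy
    let target = reference.map { scale * $0 }          // scaled projection onto the reference
    let noise = zip(estimate, target).map { $0 - $1 }  // residual
    let targetEnergy = target.map { $0 * $0 }.reduce(0, +)
    let noiseEnergy = noise.map { $0 * $0 }.reduce(0, +)
    return 10 * log10(targetEnergy / noiseEnergy)
}
```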
## Voices
54 preset voices across 10 languages: English (US/UK), Spanish, French, Hindi, Italian, Japanese, Korean, Portuguese, Chinese.
## Usage
Add speech-swift to your `Package.swift` dependencies:

```swift
.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
```
Then synthesize:

```swift
import KokoroTTS

let tts = try await KokoroTTSModel.fromPretrained(
    modelId: "aufklarer/Kokoro-82M-CoreML-INT8"
)
let audio = try await tts.synthesize(
    "Hello world, this is a Kokoro test.",
    voice: "af_heart"
)
```
CLI:

```bash
swift run audio kokoro "Hello world" --voice af_heart --output out.wav
```
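To keep the result, you can write the samples out as a 24 kHz mono WAV with AVFoundation. This assumes `synthesize` returns an array of Float samples, which is an assumption about the speech-swift API rather than a documented guarantee:

```swift
import AVFoundation

// Assumption: `samples` is mono Float32 PCM at 24 kHz, matching the model's output rate.
func writeWAV(_ samples: [Float], to url: URL, sampleRate: Double = 24_000) throws {
    let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                               sampleRate: sampleRate,
                               channels: 1,
                               interleaved: false)!
    let file = try AVAudioFile(forWriting: url, settings: format.settings)
    let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                  frameCapacity: AVAudioFrameCount(samples.count))!
    buffer.frameLength = AVAudioFrameCount(samples.count)
    for (i, sample) in samples.enumerated() {
        buffer.floatChannelData![0][i] = sample   // copy into the single mono channel
    }
    try file.write(from: buffer)
}

// e.g. try writeWAV(audio, to: URL(fileURLWithPath: "out.wav"))
```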
## Source
- Base model: hexgrad/Kokoro-82M (Apache-2.0)
- Dictionaries and G2P: Apache-2.0
## License
- Model weights: Apache-2.0
- CoreML conversion: Apache-2.0
## Links
- speech-swift – Apple SDK
- soniqo.audio – website
- MLX vs CoreML on Apple Silicon – a practical guide (related blog post)
- soniqo.audio/blog – blog