Svara-TTS v1: MLX bfloat16

Parent model: kenpath/svara-tts-v1, which hosts the full upstream weights, model card, training data, and evaluation. All credit for the model itself goes to the Kenpath team. This repo contains only an MLX-format conversion for inference on Apple Silicon.

Orpheus base: canopylabs/3b-hi-ft-research_release, Canopy Labs' Orpheus Hindi research release, from which Svara was fine-tuned.

Full-precision (bfloat16) MLX port of kenpath/svara-tts-v1, an autoregressive multilingual text-to-speech model for 19 Indian languages in the Orpheus / SNAC family. It keeps the same numerical precision as upstream and is repackaged in MLX-native format (~6.6 GB of sharded safetensors).

For smaller memory footprints, use the 4-bit or 8-bit quantized variants linked below.

Built for mlx-audio on Apple Silicon.

Usage

Requires mlx-audio with TTS extras:

pip install "mlx-audio[tts]"

Python

import numpy as np
import soundfile as sf
import mlx.core as mx
from mlx_audio.tts.utils import load_model

model = load_model("mlx-community/svara-tts-v1")

chunks = []
for result in model.generate(
    text="नमस्ते, आप कैसे हैं? मैं ठीक हूँ।",
    voice="Hindi (Female)",
    temperature=0.75,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_tokens=1200,
):
    chunks.append(result.audio)

audio = mx.concatenate(chunks, axis=0)
sf.write("hello_hi.wav", np.asarray(audio), model.sample_rate)  # 24 kHz

CLI

mlx_audio.tts.generate \
    --model mlx-community/svara-tts-v1 \
    --text "नमस्ते, आप कैसे हैं?" \
    --voice "Hindi (Female)" \
    --temperature 0.75 \
    --top_p 0.9

Voices

Use a string of the form "<Language Name> (<Gender>)":

| Language | Voices |
|---|---|
| Hindi | Hindi (Male), Hindi (Female) |
| Bengali | Bengali (Male), Bengali (Female) |
| Marathi | Marathi (Male), Marathi (Female) |
| Telugu | Telugu (Male), Telugu (Female) |
| Kannada | Kannada (Male), Kannada (Female) |
| Tamil | Tamil (Male), Tamil (Female) |
| Malayalam | Malayalam (Male), Malayalam (Female) |
| Gujarati | Gujarati (Male), Gujarati (Female) |
| Punjabi | Punjabi (Male), Punjabi (Female) |
| Assamese | Assamese (Male), Assamese (Female) |
| Bhojpuri | Bhojpuri (Male), Bhojpuri (Female) |
| Magahi | Magahi (Male), Magahi (Female) |
| Maithili | Maithili (Male), Maithili (Female) |
| Chhattisgarhi | Chhattisgarhi (Male), Chhattisgarhi (Female) |
| Bodo | Bodo (Male), Bodo (Female) |
| Dogri | Dogri (Male), Dogri (Female) |
| Nepali | Nepali (Male), Nepali (Female) |
| Sanskrit | Sanskrit (Male), Sanskrit (Female) |
| English (Indian) | English (Indian) (Male), English (Indian) (Female) |

Total: 38 voices across 19 languages.
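
If you build voice strings programmatically, a small helper can catch typos before they reach the model. The sketch below is only an illustration: `LANGUAGES`, `GENDERS`, `voice_name`, and `ALL_VOICES` are hypothetical names that are not part of mlx-audio, and the language list is simply copied from the table above.

```python
# Languages copied from the voice table above.
# All names in this snippet are hypothetical helpers, not mlx-audio APIs.
LANGUAGES = [
    "Hindi", "Bengali", "Marathi", "Telugu", "Kannada", "Tamil",
    "Malayalam", "Gujarati", "Punjabi", "Assamese", "Bhojpuri", "Magahi",
    "Maithili", "Chhattisgarhi", "Bodo", "Dogri", "Nepali", "Sanskrit",
    "English (Indian)",
]
GENDERS = ("Male", "Female")


def voice_name(language: str, gender: str) -> str:
    """Return '<Language Name> (<Gender>)', raising ValueError on unknown input."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if gender not in GENDERS:
        raise ValueError(f"unsupported gender: {gender!r}")
    return f"{language} ({gender})"


# Every voice string accepted by the table: 19 languages x 2 genders.
ALL_VOICES = [voice_name(lang, g) for lang in LANGUAGES for g in GENDERS]
```

The result of `voice_name("Tamil", "Male")` can be passed directly as the `voice` argument shown in the usage examples.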

Sampling Recommendations

The upstream svara-tts-inference repo uses these defaults; they're a good starting point:

| Parameter | Value |
|---|---|
| temperature | 0.75 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.1 |
| max_tokens | 1200–2048 |
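
One way to keep these values in a single place is a small defaults dictionary that is merged with per-call overrides. This is a minimal sketch: `SAMPLING_DEFAULTS` and `sampling_kwargs` are hypothetical names introduced here, and `max_tokens` is set to the lower end of the recommended range as an arbitrary choice.

```python
# Recommended defaults from the table above.
# SAMPLING_DEFAULTS and sampling_kwargs are hypothetical helpers, not mlx-audio APIs.
SAMPLING_DEFAULTS = {
    "temperature": 0.75,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_tokens": 1200,  # lower end of the 1200-2048 range (assumed choice)
}


def sampling_kwargs(**overrides):
    """Return the defaults merged with overrides, rejecting unknown keys."""
    unknown = set(overrides) - set(SAMPLING_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown sampling parameters: {sorted(unknown)}")
    return {**SAMPLING_DEFAULTS, **overrides}


# Intended use, following the generate() call shown in the Usage section:
#   model.generate(text=..., voice=..., **sampling_kwargs(max_tokens=2048))
```

Rejecting unknown keys is a design choice here: it surfaces a misspelled parameter name immediately rather than passing it silently to the model.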

Architecture

  • Backbone: Llama-3.2-3B (fine-tuned from canopylabs/3b-hi-ft-research_release, Canopy's Orpheus Hindi base).
  • Codec: SNAC 24 kHz, a 3-level hierarchical RVQ that emits 7 codes per frame (1 coarse, 2 medium, 4 fine); each frame covers roughly 85 ms of audio (2048 samples at 24 kHz). Loaded automatically by mlx-audio.
  • Output: 24 kHz mono PCM.
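
To make the "7 codes per frame" layout concrete, the sketch below splits a flat code stream into the three SNAC levels. It assumes the interleaving used by the Orpheus reference decoder (positions 0 / 1, 4 / 2, 3, 5, 6 within each frame); mlx-audio performs this step internally and may implement it differently. The function name `split_frames` is hypothetical, and the per-position token-id offsets that the reference decoder also subtracts are omitted here.

```python
def split_frames(codes):
    """Split a flat list of codes (7 per frame) into coarse, medium, fine levels.

    Assumed per-frame layout (Orpheus reference decoder):
      position 0          -> coarse level (1 code)
      positions 1, 4      -> medium level (2 codes)
      positions 2, 3, 5, 6 -> fine level  (4 codes)
    """
    usable = len(codes) - len(codes) % 7  # drop a trailing partial frame
    coarse, medium, fine = [], [], []
    for start in range(0, usable, 7):
        frame = codes[start:start + 7]
        coarse.append(frame[0])
        medium.extend([frame[1], frame[4]])
        fine.extend([frame[2], frame[3], frame[5], frame[6]])
    return coarse, medium, fine


# Two frames of placeholder codes, numbered by position for readability.
coarse, medium, fine = split_frames(list(range(14)))
```

Each level ends up with 1, 2, and 4 entries per frame respectively, which is the hierarchy the codec decodes back into a 24 kHz waveform.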

Other Quants

License

Apache 2.0; see the base model card for full details.
