Svara-TTS v1 - MLX 8-bit
- Parent model: kenpath/svara-tts-v1 (full upstream weights, model card, training data, and evaluation). All credit for the model itself goes to the Kenpath team. This repo only contains an MLX-format quantization for inference on Apple Silicon.
- Orpheus base: canopylabs/3b-hi-ft-research_release, Canopy Labs' Orpheus Hindi research release, which Svara was fine-tuned from.
An 8-bit MLX-quantized port of kenpath/svara-tts-v1, an autoregressive multilingual text-to-speech model for 19 Indian languages in the Orpheus / SNAC family. Quantized at ~8.5 bits per weight (q-bits=8, q-group-size=64), which shrinks the weights from 13.2 GB (bf16) to **3.5 GB**. Use this variant when you want quality close to bf16 with a much smaller memory footprint.
Built for mlx-audio on Apple Silicon.
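The ~8.5 bits-per-weight figure comes from the per-group overhead of group-wise quantization. Below is a minimal sketch of that arithmetic. It assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias next to the 8-bit values (the usual layout for MLX affine quantization, but an assumption here). It also assumes roughly 3.3 billion parameters, a number back-computed from the ~6.6 GB bf16 MLX size listed under Other Quants. The helper names are hypothetical, and the estimate ignores any layers left unquantized.

```python
def effective_bits_per_weight(q_bits, group_size, scale_bits=16, bias_bits=16):
    """Bits stored per weight, including the per-group scale and bias (assumed 16-bit each)."""
    return q_bits + (scale_bits + bias_bits) / group_size


def estimated_size_gb(num_params, bits_per_weight):
    """Approximate weight size in GB (10^9 bytes) for a given bits-per-weight figure."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: ~3.3e9 parameters, derived from 6.6 GB of bf16 weights at 2 bytes each.
NUM_PARAMS = 6.6e9 / 2

bits_8 = effective_bits_per_weight(q_bits=8, group_size=64)
bits_4 = effective_bits_per_weight(q_bits=4, group_size=64)

print(bits_8)                                          # 8.5
print(round(estimated_size_gb(NUM_PARAMS, bits_8), 1)) # 3.5
print(round(estimated_size_gb(NUM_PARAMS, bits_4), 1)) # 1.9
```

Under these assumptions the estimates line up with the 3.5 GB size of this repo and the ~1.9 GB size quoted for the 4-bit variant.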
Usage
Requires mlx-audio with TTS extras:
pip install "mlx-audio[tts]"
Python
import numpy as np
import soundfile as sf
import mlx.core as mx
from mlx_audio.tts.utils import load_model

# Download (on first use) and load the 8-bit MLX weights.
model = load_model("mlx-community/svara-tts-v1-8bit")

# generate() yields results incrementally; collect the audio from each one.
chunks = []
for result in model.generate(
    text="नमस्ते, आप कैसे हैं? मैं ठीक हूँ।",  # "Hello, how are you? I am fine."
    voice="Hindi (Female)",
    temperature=0.75,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_tokens=1200,
):
    chunks.append(result.audio)

# Join the chunks into one waveform and write it out as a WAV file.
audio = mx.concatenate(chunks, axis=0)
sf.write("hello_hi.wav", np.asarray(audio), model.sample_rate)  # 24 kHz
CLI
mlx_audio.tts.generate \
  --model mlx-community/svara-tts-v1-8bit \
  --text "नमस्ते, आप कैसे हैं?" \
  --voice "Hindi (Female)" \
  --temperature 0.75 \
  --top_p 0.9
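For batch scripting it can help to assemble the same command from Python. The sketch below builds the argument list using only the flags shown in the CLI example. `build_tts_command` is a hypothetical helper, not part of mlx-audio. Passing the arguments as a list avoids shell-quoting problems with voice names that contain spaces and parentheses.

```python
import shlex


def build_tts_command(text, voice, model="mlx-community/svara-tts-v1-8bit",
                      temperature=0.75, top_p=0.9):
    """Return the CLI invocation as an argument list (hypothetical helper)."""
    return [
        "mlx_audio.tts.generate",
        "--model", model,
        "--text", text,
        "--voice", voice,
        "--temperature", str(temperature),
        "--top_p", str(top_p),
    ]


cmd = build_tts_command("Hello, how are you?", "English (Indian) (Female)")

# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(cmd))

# To actually run it (requires mlx-audio on Apple Silicon):
#   import subprocess
#   subprocess.run(cmd, check=True)
```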
Voices
Use a string of the form "<Language Name> (<Gender>)":
| Language | Voices |
|---|---|
| Hindi | Hindi (Male), Hindi (Female) |
| Bengali | Bengali (Male), Bengali (Female) |
| Marathi | Marathi (Male), Marathi (Female) |
| Telugu | Telugu (Male), Telugu (Female) |
| Kannada | Kannada (Male), Kannada (Female) |
| Tamil | Tamil (Male), Tamil (Female) |
| Malayalam | Malayalam (Male), Malayalam (Female) |
| Gujarati | Gujarati (Male), Gujarati (Female) |
| Punjabi | Punjabi (Male), Punjabi (Female) |
| Assamese | Assamese (Male), Assamese (Female) |
| Bhojpuri | Bhojpuri (Male), Bhojpuri (Female) |
| Magahi | Magahi (Male), Magahi (Female) |
| Maithili | Maithili (Male), Maithili (Female) |
| Chhattisgarhi | Chhattisgarhi (Male), Chhattisgarhi (Female) |
| Bodo | Bodo (Male), Bodo (Female) |
| Dogri | Dogri (Male), Dogri (Female) |
| Nepali | Nepali (Male), Nepali (Female) |
| Sanskrit | Sanskrit (Male), Sanskrit (Female) |
| English (Indian) | English (Indian) (Male), English (Indian) (Female) |
Total: 38 voices across 19 languages.
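Because every voice name follows the same pattern, the full list can be generated rather than typed by hand. The following is a small sketch built from the table above. `voice_name` and `ALL_VOICES` are hypothetical names for illustration and are not provided by mlx-audio.

```python
LANGUAGES = [
    "Hindi", "Bengali", "Marathi", "Telugu", "Kannada", "Tamil", "Malayalam",
    "Gujarati", "Punjabi", "Assamese", "Bhojpuri", "Magahi", "Maithili",
    "Chhattisgarhi", "Bodo", "Dogri", "Nepali", "Sanskrit", "English (Indian)",
]
GENDERS = ("Male", "Female")


def voice_name(language, gender):
    """Build a voice string of the form "<Language Name> (<Gender>)"."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if gender not in GENDERS:
        raise ValueError(f"unknown gender: {gender!r}")
    return f"{language} ({gender})"


# Every language has one male and one female voice.
ALL_VOICES = [voice_name(lang, g) for lang in LANGUAGES for g in GENDERS]

print(len(LANGUAGES), len(ALL_VOICES))           # 19 38
print(voice_name("English (Indian)", "Female"))  # English (Indian) (Female)
```

Validating the name up front gives a clearer error than passing a misspelled voice string to the model.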
Sampling Recommendations
The upstream svara-tts-inference repo uses these defaults; they're a good starting point:
| Parameter | Value |
|---|---|
| temperature | 0.75 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.1 |
| max_tokens | 1200-2048 |
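One way to keep these values in a single place is a small defaults dictionary that is unpacked into `model.generate(...)`. This is a sketch only: `SAMPLING_DEFAULTS` and `sampling_kwargs` are hypothetical names, and `max_tokens` is set to the lower end of the recommended range.

```python
SAMPLING_DEFAULTS = {
    "temperature": 0.75,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_tokens": 1200,  # raise toward 2048 for longer utterances
}


def sampling_kwargs(**overrides):
    """Return the recommended defaults with selected values overridden."""
    unknown = set(overrides) - set(SAMPLING_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampling parameters: {sorted(unknown)}")
    return {**SAMPLING_DEFAULTS, **overrides}


long_form = sampling_kwargs(max_tokens=2048)
print(long_form["max_tokens"], long_form["temperature"])  # 2048 0.75

# Intended use with the Python example from the Usage section:
#   for result in model.generate(text=..., voice="Hindi (Female)", **long_form):
#       ...
```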
Architecture
- Backbone: Llama-3.2-3B (fine-tuned from canopylabs/3b-hi-ft-research_release, Canopy's Orpheus Hindi base).
- Codec: SNAC 24 kHz, a 3-level hierarchical RVQ with 7 codes per ~85 ms frame. Loaded automatically by mlx-audio.
- Output: 24 kHz mono PCM.
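The 7 codes per frame follow from the 3-level hierarchy: each level runs at twice the rate of the one above it, so one frame carries 1 + 2 + 4 = 7 codes. The sketch below regroups a flat token stream into its three levels. The interleaving order used here (coarse, mid, fine, fine, mid, fine, fine) is the one commonly seen in Orpheus-style decoders and is an assumption, not something this card specifies. mlx-audio performs this step internally, and `split_frame` / `split_frames` are hypothetical helpers for illustration.

```python
CODES_PER_FRAME = 7

# Assumed positions of each level's codes inside one 7-code frame.
LEVEL_POSITIONS = ((0,), (1, 4), (2, 3, 5, 6))


def split_frame(frame):
    """Split one 7-code frame into (coarse, mid, fine) code lists."""
    if len(frame) != CODES_PER_FRAME:
        raise ValueError(f"expected {CODES_PER_FRAME} codes, got {len(frame)}")
    return tuple([frame[i] for i in positions] for positions in LEVEL_POSITIONS)


def split_frames(codes):
    """Regroup a flat code sequence into three per-level lists.

    Trailing codes that do not fill a complete frame are dropped.
    """
    usable = len(codes) - len(codes) % CODES_PER_FRAME
    levels = ([], [], [])
    for start in range(0, usable, CODES_PER_FRAME):
        parts = split_frame(codes[start:start + CODES_PER_FRAME])
        for level, part in zip(levels, parts):
            level.extend(part)
    return levels


coarse, mid, fine = split_frames(list(range(15)))  # 2 full frames + 1 leftover code
print(coarse)  # [0, 7]
print(mid)     # [1, 4, 8, 11]
print(fine)    # [2, 3, 5, 6, 9, 10, 12, 13]
```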
Other Quants
- bf16 MLX: mlx-community/svara-tts-v1 (~6.6 GB)
- 4-bit MLX: mlx-community/svara-tts-v1-4bit (~1.9 GB)
- bf16 source: kenpath/svara-tts-v1 (~13.2 GB)
License
Apache 2.0. See the base model card for full details.