Svara-TTS v1: MLX 8-bit

Parent model: kenpath/svara-tts-v1, which hosts the full upstream weights, model card, training data, and evaluation. All credit for the model itself goes to the Kenpath team. This repo contains only an MLX-format quantization for inference on Apple Silicon.

Orpheus base: canopylabs/3b-hi-ft-research_release, Canopy Labs' Orpheus Hindi research release, from which Svara was fine-tuned.

An 8-bit MLX-quantized port of kenpath/svara-tts-v1, an autoregressive multilingual text-to-speech model for 19 Indian languages in the Orpheus / SNAC family. The weights are quantized at ~8.5 bits per weight (q-bits=8, q-group-size=64), which shrinks the model from 13.2 GB in bf16 to **3.5 GB**. Use this variant when you want quality close to bf16 with a smaller memory footprint.
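The ~8.5 bits-per-weight figure can be reproduced with a little arithmetic. The sketch below assumes that each group of 64 weights stores one 16-bit scale and one 16-bit bias next to its 8-bit values; that layout is an assumption for illustration, not something this card states, and the helper name is hypothetical.

```python
def effective_bits_per_weight(q_bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Average storage per weight for group-wise quantization.

    Assumes every group carries one scale and one bias, each `meta_bits`
    wide (an illustrative assumption about the on-disk layout).
    """
    overhead = 2 * meta_bits / group_size  # scale + bias, amortized over the group
    return q_bits + overhead


bpw = effective_bits_per_weight(q_bits=8, group_size=64)
print(bpw)  # 8.5

# Rough parameter count implied by a 3.5 GB file, treating 1 GB as 1e9 bytes
# and ignoring any tensors that are left unquantized.
approx_params = 3.5e9 * 8 / bpw
print(round(approx_params / 1e9, 2))  # 3.29
```

Under these assumptions the file size corresponds to roughly 3.3 billion weights, which is in line with a 3B-class backbone.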

Built for mlx-audio on Apple Silicon.

Usage

Requires mlx-audio with TTS extras:

pip install "mlx-audio[tts]"

Python

import numpy as np
import soundfile as sf
import mlx.core as mx
from mlx_audio.tts.utils import load_model

model = load_model("mlx-community/svara-tts-v1-8bit")

chunks = []
for result in model.generate(
    text="नमस्ते, आप कैसे हैं? मैं ठीक हूँ।",
    voice="Hindi (Female)",
    temperature=0.75,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_tokens=1200,
):
    chunks.append(result.audio)

audio = mx.concatenate(chunks, axis=0)
sf.write("hello_hi.wav", np.asarray(audio), model.sample_rate)  # 24 kHz

CLI

mlx_audio.tts.generate \
    --model mlx-community/svara-tts-v1-8bit \
    --text "नमस्ते, आप कैसे हैं?" \
    --voice "Hindi (Female)" \
    --temperature 0.75 \
    --top_p 0.9

Voices

Use a string of the form "<Language Name> (<Gender>)":

| Language | Voices |
| --- | --- |
| Hindi | Hindi (Male), Hindi (Female) |
| Bengali | Bengali (Male), Bengali (Female) |
| Marathi | Marathi (Male), Marathi (Female) |
| Telugu | Telugu (Male), Telugu (Female) |
| Kannada | Kannada (Male), Kannada (Female) |
| Tamil | Tamil (Male), Tamil (Female) |
| Malayalam | Malayalam (Male), Malayalam (Female) |
| Gujarati | Gujarati (Male), Gujarati (Female) |
| Punjabi | Punjabi (Male), Punjabi (Female) |
| Assamese | Assamese (Male), Assamese (Female) |
| Bhojpuri | Bhojpuri (Male), Bhojpuri (Female) |
| Magahi | Magahi (Male), Magahi (Female) |
| Maithili | Maithili (Male), Maithili (Female) |
| Chhattisgarhi | Chhattisgarhi (Male), Chhattisgarhi (Female) |
| Bodo | Bodo (Male), Bodo (Female) |
| Dogri | Dogri (Male), Dogri (Female) |
| Nepali | Nepali (Male), Nepali (Female) |
| Sanskrit | Sanskrit (Male), Sanskrit (Female) |
| English (Indian) | English (Indian) (Male), English (Indian) (Female) |

Total: 38 voices across 19 languages.
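Because every voice string follows the same pattern, it can be convenient to build and validate them in code instead of typing them by hand. The following is a minimal sketch; `LANGUAGES`, `GENDERS`, and `voice_name` are hypothetical helper names, not part of mlx-audio.

```python
# Languages listed in the voice table above.
LANGUAGES = [
    "Hindi", "Bengali", "Marathi", "Telugu", "Kannada", "Tamil",
    "Malayalam", "Gujarati", "Punjabi", "Assamese", "Bhojpuri", "Magahi",
    "Maithili", "Chhattisgarhi", "Bodo", "Dogri", "Nepali", "Sanskrit",
    "English (Indian)",
]
GENDERS = ("Male", "Female")


def voice_name(language: str, gender: str) -> str:
    """Return a voice string of the form '<Language Name> (<Gender>)'."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if gender not in GENDERS:
        raise ValueError(f"unsupported gender: {gender!r}")
    return f"{language} ({gender})"


ALL_VOICES = [voice_name(lang, g) for lang in LANGUAGES for g in GENDERS]
print(len(ALL_VOICES))                           # 38
print(voice_name("English (Indian)", "Female"))  # English (Indian) (Female)
```

The returned string can be passed as the `voice` argument shown in the Python usage example.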

Sampling Recommendations

The upstream svara-tts-inference repo uses these defaults; they're a good starting point:

| Parameter | Value |
| --- | --- |
| temperature | 0.75 |
| top_p | 0.9 |
| top_k | 40 |
| repetition_penalty | 1.1 |
| max_tokens | 1200–2048 |
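One way to keep these values in a single place is a small defaults dictionary that is unpacked into `model.generate(...)`. This is only a sketch: `SAMPLING_DEFAULTS` and `sampling_kwargs` are hypothetical names, and the keys simply mirror the keyword arguments used in the Python usage example.

```python
# Recommended starting values from the table above (max_tokens at its lower bound).
SAMPLING_DEFAULTS = {
    "temperature": 0.75,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_tokens": 1200,
}


def sampling_kwargs(**overrides):
    """Merge per-call overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(SAMPLING_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown sampling parameters: {sorted(unknown)}")
    return {**SAMPLING_DEFAULTS, **overrides}


# Longer passages may need the upper end of the max_tokens range.
kwargs = sampling_kwargs(max_tokens=2048)
print(kwargs["max_tokens"], kwargs["temperature"])  # 2048 0.75
```

The result would then be used as `model.generate(text=..., voice=..., **kwargs)`.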

Architecture

  • Backbone: Llama-3.2-3B (fine-tuned from canopylabs/3b-hi-ft-research_release, Canopy's Orpheus Hindi base).
  • Codec: SNAC 24 kHz, a 3-level hierarchical RVQ that emits 7 codes per frame (one frame covers roughly 85 ms of audio). Loaded automatically by mlx-audio.
  • Output: 24 kHz mono PCM.
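To make the "7 codes per frame, 3 levels" layout concrete, the sketch below splits a flat code sequence into coarse, mid, and fine streams holding 1, 2, and 4 codes per frame. The position-to-level ordering is an assumption modeled on the public Orpheus reference decoder, and any per-position token offsets are assumed to have been removed already; mlx-audio performs this step internally, so `split_frames` is purely illustrative.

```python
def split_frames(codes: list[int]) -> tuple[list[int], list[int], list[int]]:
    """De-interleave 7-code frames into three codebook levels (1 + 2 + 4 codes).

    Assumed per-frame ordering (not confirmed by this card):
    [coarse, mid, fine, fine, mid, fine, fine]
    """
    if len(codes) % 7 != 0:
        raise ValueError("code sequence length must be a multiple of 7")
    coarse, mid, fine = [], [], []
    for start in range(0, len(codes), 7):
        frame = codes[start:start + 7]
        coarse.append(frame[0])
        mid.extend([frame[1], frame[4]])
        fine.extend([frame[2], frame[3], frame[5], frame[6]])
    return coarse, mid, fine


coarse, mid, fine = split_frames(list(range(14)))  # two dummy frames
print(len(coarse), len(mid), len(fine))  # 2 4 8
```

Each level runs at twice the rate of the one before it, which is why a frame contributes one coarse code, two mid codes, and four fine codes.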

Other Quants

License

Apache 2.0; see the base model card for full details.
