Chatterbox Arabic Fine-tuned TTS
This is an Arabic-focused fine-tune of the ResembleAI/chatterbox multilingual model, trained with LoRA (Low-Rank Adaptation).
Model Description
This model has been specifically fine-tuned to improve Arabic language text-to-speech synthesis quality, including:
- Enhanced Arabic pronunciation and phonetics
- Better handling of Arabic diacritics (tashkeel, تشكيل)
- Improved intonation for Arabic speech patterns
- Support for Modern Standard Arabic (MSA) and common dialects
- Natural-sounding Arabic voice generation
Key Features
- High-quality Arabic speech synthesis
- Zero-shot voice cloning for Arabic speakers
- Fast inference (real-time capable)
- Emotion/expression control
- Supports both Arabic and English (bilingual)
Installation
pip install chatterbox-tts torch torchaudio huggingface_hub
Quick Start
Basic Arabic TTS
import torch
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
from huggingface_hub import hf_hub_download
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load base multilingual model
model = ChatterboxMultilingualTTS.from_pretrained(device=device)
# Download and apply Arabic fine-tuned weights
t3_path = hf_hub_download(
    repo_id="YOUR-USERNAME/chatterbox-arabic-finetuned",
    filename="t3_cfg.pt",
)
t3_state = torch.load(t3_path, map_location="cpu")
model.t3.load_state_dict(t3_state)
# Generate Arabic speech
# "Welcome to the text-to-speech model enhanced for the Arabic language"
arabic_text = "مرحباً بك في نموذج تحويل النص إلى كلام المحسّن للغة العربية"
wav = model.generate(arabic_text, language_id="ar")
ta.save("arabic_output.wav", wav, model.sr)
Load All Fine-tuned Components
import torch
from chatterbox.mtl_tts import ChatterboxMultilingualTTS, Conditionals
from huggingface_hub import hf_hub_download
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load base model
model = ChatterboxMultilingualTTS.from_pretrained(device=device)
# Download all fine-tuned weights
repo_id = "YOUR-USERNAME/chatterbox-arabic-finetuned"
t3_path = hf_hub_download(repo_id=repo_id, filename="t3_cfg.pt")
conds_path = hf_hub_download(repo_id=repo_id, filename="conds.pt")
s3gen_path = hf_hub_download(repo_id=repo_id, filename="s3gen.pt")
ve_path = hf_hub_download(repo_id=repo_id, filename="ve.pt")
# Load all components
model.t3.load_state_dict(torch.load(t3_path, map_location="cpu"))
# Assumption: model.conds is a Conditionals dataclass rather than an nn.Module,
# so it is restored with Conditionals.load instead of load_state_dict
model.conds = Conditionals.load(conds_path).to(device)
model.s3gen.load_state_dict(torch.load(s3gen_path, map_location="cpu"))
model.ve.load_state_dict(torch.load(ve_path, map_location="cpu"))
# Generate
# "This is a test of the enhanced model"
arabic_text = "هذا اختبار للنموذج المحسّن"
wav = model.generate(arabic_text, language_id="ar")
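Since `load_state_dict` is strict by default and raises on mismatched keys, it can help to compare a checkpoint's keys with the target module's keys before loading. The following is a minimal sketch that works on plain key collections; `diff_state_dict_keys` is a hypothetical helper name, not part of chatterbox or PyTorch.

```python
def diff_state_dict_keys(expected_keys, loaded_keys):
    """Return (missing, unexpected) key lists, each sorted alphabetically.

    missing:    keys the module expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the module does not expect
    """
    expected, loaded = set(expected_keys), set(loaded_keys)
    missing = sorted(expected - loaded)
    unexpected = sorted(loaded - expected)
    return missing, unexpected


# Toy example with made-up key names
missing, unexpected = diff_state_dict_keys(
    ["layer.weight", "layer.bias"],
    ["layer.weight", "extra.weight"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, one would presumably pass `model.t3.state_dict().keys()` as the expected keys and the keys of the loaded checkpoint dictionary as the loaded keys.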
Advanced Usage: Voice Cloning
import torch
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS
from huggingface_hub import hf_hub_download
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model with fine-tuned weights
model = ChatterboxMultilingualTTS.from_pretrained(device=device)
t3_path = hf_hub_download(
    repo_id="YOUR-USERNAME/chatterbox-arabic-finetuned",
    filename="t3_cfg.pt",
)
model.t3.load_state_dict(torch.load(t3_path, map_location="cpu"))
# Generate with reference audio (voice cloning)
# "Peace be upon you, and the mercy and blessings of God"
arabic_text = "السلام عليكم ورحمة الله وبركاته"
reference_audio = "path/to/arabic_speaker.wav"  # 6+ seconds recommended
wav = model.generate(
    arabic_text,
    language_id="ar",
    audio_prompt_path=reference_audio,
    exaggeration=0.5,  # Control expressiveness (0.0-2.0)
    cfg_weight=0.5,  # Control adherence to prompt (0.0-1.0)
)
ta.save("arabic_cloned_voice.wav", wav, model.sr)
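The 6-second recommendation for reference audio can be checked before calling `generate`. The sketch below uses only the standard-library `wave` module, so it assumes the reference clip is an uncompressed PCM WAV file; the helper names are hypothetical.

```python
import wave

MIN_REFERENCE_SECONDS = 6.0  # recommendation stated in this model card


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_reference_long_enough(path, min_seconds=MIN_REFERENCE_SECONDS):
    """True if the clip is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

A caller might warn or fall back to the default voice when `is_reference_long_enough(reference_audio)` returns False. Other formats such as MP3 or FLAC would need a different reader.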
Text with Diacritics (Tashkeel)
# Assumes `model` is already loaded as in Quick Start
# The model handles Arabic text with or without diacritics
# Both strings read "Welcome to the world of artificial intelligence"
text_with_tashkeel = "مَرْحَباً بِكَ فِي عَالَمِ الذَّكَاءِ الاصْطِنَاعِيِّ"
text_without_tashkeel = "مرحبا بك في عالم الذكاء الاصطناعي"
# Both work well
wav1 = model.generate(text_with_tashkeel, language_id="ar")
wav2 = model.generate(text_without_tashkeel, language_id="ar")
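To compare the two input styles on the same sentence, the undiacritized form can be derived from the diacritized one. This is a minimal sketch that removes the Arabic combining marks in the Unicode range U+064B to U+0652 (tanween, short vowels, shadda, sukun); `strip_tashkeel` is a hypothetical helper, not part of chatterbox.

```python
import re

# Arabic diacritics: U+064B (fathatan) through U+0652 (sukun)
TASHKEEL = re.compile("[\u064B-\u0652]")


def strip_tashkeel(text):
    """Remove tashkeel marks, leaving base letters and spacing untouched."""
    return TASHKEEL.sub("", text)


print(strip_tashkeel("مَرْحَباً بِكَ"))  # prints: مرحبا بك
```

The range is deliberately narrow: it leaves less common marks such as the superscript alef (U+0670) in place.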
Parameters
exaggeration (0.0-2.0): Controls speech expressiveness
- 0.25: More monotone, robotic
- 0.5: Natural (default)
- 1.0-2.0: More dramatic and expressive
cfg_weight (0.0-1.0): Controls adherence to reference audio
- 0.3: Faster pacing
- 0.5: Balanced (default)
- 0.7+: More similar to reference
temperature (0.05-5.0): Controls randomness
- Lower: More consistent
- Higher: More variation
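The ranges listed above can be enforced before generation. The following sketch clamps user-supplied values into the documented ranges; `PARAM_RANGES` and `clamp_generation_params` are hypothetical names introduced for illustration.

```python
# Ranges as documented in this model card
PARAM_RANGES = {
    "exaggeration": (0.0, 2.0),
    "cfg_weight": (0.0, 1.0),
    "temperature": (0.05, 5.0),
}


def clamp_generation_params(**params):
    """Clamp each known parameter into its documented range."""
    clamped = {}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise KeyError(f"unknown parameter: {name}")
        low, high = PARAM_RANGES[name]
        clamped[name] = min(max(value, low), high)
    return clamped


print(clamp_generation_params(exaggeration=2.5, cfg_weight=0.5))
```

The result could then be unpacked into a call such as `model.generate(text, language_id="ar", **clamped)`. Clamping silently is a design choice; raising an error on out-of-range values may suit stricter pipelines better.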
Training Details
- Base Model: ResembleAI/chatterbox multilingual
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Target Language: Arabic (العربية)
- Training Dataset: [Add your dataset info - e.g., "Arabic speech corpus with X hours"]
- Training Duration: [Add training time/epochs]
- Hardware: [Add GPU info if relevant]
Use Cases
- Arabic audiobook narration
- Arabic virtual assistants and voice agents
- Arabic e-learning content
- Arabic accessibility tools
- Dubbing and voice-over for Arabic content
- Arabic language learning applications
Model Files
- t3_cfg.pt - Text-to-speech transformer (main component) - 2.1 GB
- conds.pt - Conditioning model - 107 KB
- s3gen.pt - Speech generation model - 1.06 GB
- ve.pt - Voice encoder - [size]
- tokenizer.json - Tokenizer configuration
Supported Languages
While this model is optimized for Arabic, it maintains support for:
- Arabic (ar) - Primary focus
- English (en) - Secondary support
Example Outputs
Modern Standard Arabic (MSA):
# "Artificial intelligence is changing the world around us in ways we never imagined before"
text = "الذكاء الاصطناعي يغير العالم من حولنا بطرق لم نتخيلها من قبل"
Common Phrases:
greetings = [
    "السلام عليكم ورحمة الله وبركاته",  # Peace be upon you, and the mercy and blessings of God
    "صباح الخير",  # Good morning
    "مساء الخير",  # Good evening
    "أهلاً وسهلاً",  # Welcome
    "كيف حالك؟",  # How are you?
]
Numbers and Dates:
# "Today is the fifteenth of January, two thousand twenty-six"
text = "اليوم هو الخامس عشر من يناير عام ألفين وستة وعشرين"
Limitations
- Works best with Modern Standard Arabic (MSA)
- Dialectal Arabic may have varying quality depending on training data
- Very long sentences (>200 words) should be split for best results
- Reference audio for voice cloning should be clear and 6+ seconds long
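The advice to split very long inputs can be sketched as a small chunker that breaks text at sentence boundaries and keeps each chunk within a word limit. This is an illustration under assumptions: `split_text` is a hypothetical helper, sentence boundaries are detected only from `.`, `!`, `?` and the Arabic question mark (U+061F), and words are counted by whitespace.

```python
import re

# Split on whitespace that follows Latin or Arabic sentence-ending punctuation
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_text(text, max_words=200):
    """Split text into chunks of at most max_words words each."""
    chunks, current = [], []
    for sentence in SENTENCE_END.split(text.strip()):
        words = sentence.split()
        # Hard-wrap a single sentence that exceeds the limit on its own
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk when adding this sentence would exceed the limit
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk could then be synthesized separately and the resulting waveforms concatenated. Whether chunk boundaries produce audible seams depends on the model and is not something this sketch addresses.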
Citation
If you use this model, please cite the original Chatterbox work:
@misc{chatterboxtts2025,
author = {{Resemble AI}},
title = {{Chatterbox-TTS}},
year = {2025},
howpublished = {\url{https://github.com/resemble-ai/chatterbox}},
note = {GitHub repository}
}
License
This model inherits the MIT license from the base Chatterbox model.
Acknowledgments
- Thanks to ResembleAI for the base Chatterbox model
- [Add any dataset credits or collaborators]
Contact
[Add your contact info or leave blank]
For issues or questions, please open an issue on the model repository.
Note: This model includes Resemble AI's Perth watermarking for generated audio.