Qwen3-ASR Uzbek (v1)
A fine-tuned version of Qwen/Qwen3-ASR-1.7B for Uzbek speech-to-text.
Model Details
- Base model: Qwen/Qwen3-ASR-1.7B
- Language: Uzbek (uz)
- Training data: ~104K samples from FLEURS + Uzbek Speech Corpus
- Training: 3 epochs, lr=2e-5, batch_size=32, gradient_accumulation=4
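To turn these hyperparameters into an optimizer-step budget, here is a back-of-the-envelope sketch. It assumes `batch_size=32` is the micro-batch on a single device, which the card does not state, and takes the exact sample count from the Training Datasets list. How the final partial batch is handled depends on the trainer, so treat the step counts as approximate.

```python
import math

# Assumption: batch_size=32 is the per-step micro-batch on ONE device
# (the card does not say how many GPUs were used).
TOTAL_SAMPLES = 2_943 + 100_767  # FLEURS uz_uz + Uzbek Speech Corpus
MICRO_BATCH = 32
GRAD_ACCUM = 4
EPOCHS = 3

# Samples consumed per optimizer update.
effective_batch = MICRO_BATCH * GRAD_ACCUM
# Round up so a final partial batch counts as one step (trainer-dependent).
steps_per_epoch = math.ceil(TOTAL_SAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"samples={TOTAL_SAMPLES}, effective batch={effective_batch}")
print(f"steps/epoch={steps_per_epoch}, total steps={total_steps}")
```

Under this assumption, the run takes about 811 optimizer steps per epoch, or roughly 2,400 in total. With N GPUs the effective batch grows by a factor of N and the step count shrinks by the same factor.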
Usage
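```python
import librosa
import torch
from qwen_asr import Qwen3ASRModel

# Load the fine-tuned checkpoint together with the forced aligner used for timestamps.
model = Qwen3ASRModel.from_pretrained(
    "Gearnode/qwen3-asr-uzbek",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    max_new_tokens=448,
    forced_aligner="Qwen/Qwen3-ForcedAligner-0.6B",
    forced_aligner_kwargs=dict(dtype=torch.bfloat16, device_map="cuda:0"),
)

# Load audio as a 16 kHz mono float array ("sample.wav" is a placeholder path).
audio_array, _ = librosa.load("sample.wav", sr=16000, mono=True)

results = model.transcribe(
    audio=[(audio_array, 16000)],  # list of (waveform, sample_rate) pairs
    language=["Uzbek"],
    return_time_stamps=True,       # timestamps come from the forced aligner
)
print(results[0].text)
```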
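The example above passes each clip as a `(waveform, 16000)` pair. If your audio is stereo or recorded at another rate, you can convert it first with a minimal NumPy-only helper like the one below. `to_mono_16k` is a hypothetical name and not part of `qwen_asr`. It assumes multichannel input is shaped `(samples, channels)`, as returned by `soundfile.read`. Linear interpolation applies no anti-aliasing filter, so a proper resampler such as `librosa.resample` is the better choice for production use.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate paired with each waveform in the usage example


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz via linear interpolation (crude sketch)."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # assumed layout: (samples, channels)
        audio = audio.mean(axis=1)
    if sr == TARGET_SR:
        return audio
    # Keep the clip duration fixed while changing the number of samples.
    n_out = int(round(len(audio) / sr * TARGET_SR))
    t_in = np.arange(len(audio)) / sr
    t_out = np.arange(n_out) / TARGET_SR
    # np.interp returns float64, so cast back to float32.
    return np.interp(t_out, t_in, audio).astype(np.float32)
```

For example, one second of 48 kHz stereo comes back as a 16,000-sample mono array ready for `model.transcribe`.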
Training Datasets
- Google FLEURS uz_uz (2,943 samples)
- Uzbek Speech Corpus (100,767 samples)
Evaluation
| Model | Uzbek quality (qualitative) | Notes |
|---|---|---|
| This model | Best | Occasional repetition, removed by post-processing |
| Meta MMS (mms-1b-all) | Passable | Use language code `uzb-script_latin` |
| Whisper large-v3 | Poor | Falls into hallucination loops on Uzbek |
| Base Qwen3-ASR | Poor | Output mixes languages |
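The card does not describe the post-processing it uses for repetition. One common approach is to collapse word n-grams that repeat back-to-back. The function below, `collapse_repeats`, is a hypothetical sketch of that technique and should not be read as the author's method. Its `max_ngram` and `max_repeats` parameters are illustrative choices.

```python
def collapse_repeats(text: str, max_ngram: int = 4, max_repeats: int = 1) -> str:
    """Truncate runs of an identical word n-gram to at most `max_repeats` copies.

    Hypothetical sketch: n-grams of length max_ngram down to 1 are checked in
    turn, longest first, so "a b a b a b" collapses as the bigram "a b".
    """
    words = text.split()
    for n in range(max_ngram, 0, -1):
        out = []
        i = 0
        while i < len(words):
            chunk = words[i:i + n]
            count = 1
            if len(chunk) == n:
                # Count how many copies of `chunk` follow each other from position i.
                while words[i + count * n:i + (count + 1) * n] == chunk:
                    count += 1
            if count > max_repeats:
                out.extend(chunk * max_repeats)  # keep only the allowed copies
                i += count * n                   # skip the whole repeated run
            else:
                out.append(words[i])             # no long run here; slide by one word
                i += 1
        words = out
    return " ".join(words)
```

With the default `max_repeats=1`, `collapse_repeats("bu yaxshi bu yaxshi bu yaxshi kun")` returns `"bu yaxshi kun"`. The default also collapses deliberate emphatic doubling such as "juda juda", so `max_repeats=2` is the safer setting when that matters.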
License
Apache 2.0 (same as base model)