# Whisper Small - Hindi ASR (LoRA Fine-Tuned)

## Model Details

### Model Description
This repository contains a parameter-efficient fine-tuned (PEFT) version of OpenAI's whisper-small model, optimized specifically for Hindi Automatic Speech Recognition (ASR). The model was trained using Low-Rank Adaptation (LoRA) combined with 8-bit quantization, enabling highly efficient training and inference while leaving the base model's weights frozen.
- Developed by: Aagrim Rautela
- Model type: Automatic Speech Recognition (ASR)
- Language: Hindi (hi)
- Finetuned from model: openai/whisper-small
- Training methodology: LoRA (Low-Rank Adaptation) via Hugging Face `peft` and `bitsandbytes`
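As a rough illustration of what LoRA does (this is not the training code used for this model), the adapter learns a low-rank update `ΔW = (α/r)·B·A` that is added to a frozen weight matrix `W`; only the small `A` and `B` matrices are trained. A minimal NumPy sketch, with all dimensions and values chosen arbitrarily for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2    # full hidden dimension and LoRA rank (r << d)
alpha = 4      # LoRA scaling hyperparameter

W = rng.normal(size=(d, d))          # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / r.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B initialized to zero, the adapted layer matches the frozen layer exactly,
# so training starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` (here `2*8 + 8*2 = 32` values versus `64` in `W`, and far fewer in realistic dimensions) are trained, the adapter checkpoint stays small and the 8-bit base model is never modified.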
## Evaluation and Results
The model's performance was evaluated against the standard pretrained whisper-small baseline to measure improvements in Hindi transcription accuracy.
### Testing Data
Empirical evaluation was conducted using the Hindi (hi_in) test split of the standardized FLEURS dataset.
### Metrics
Performance is measured using Word Error Rate (WER), where a lower percentage indicates higher accuracy.
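WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The evaluation here presumably used a standard library such as `jiwer` or `evaluate`; the following self-contained sketch just illustrates the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution (0 if match)
    return d[-1] / len(ref)

# One substituted word out of three reference words -> WER of 1/3.
print(wer("a b c", "a x c"))  # 0.3333...
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why high baseline values like 69.13% are plausible for an unadapted model on Hindi.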
### Results Summary

| Model | Dataset | Word Error Rate (WER) |
|---|---|---|
| Baseline (openai/whisper-small) | FLEURS (Hindi Test Split) | 69.13% |
| Fine-Tuned Model (LoRA) | FLEURS (Hindi Test Split) | 44.45% |
The fine-tuning process yielded an absolute WER reduction of 24.68 percentage points, demonstrating a significant improvement in the model's ability to accurately transcribe spoken Hindi compared to the pretrained baseline.
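For context, the 24.68-point figure is the absolute difference between the two WER values; in relative terms the fine-tuned model eliminates roughly a third of the baseline's errors:

```python
baseline_wer = 69.13
finetuned_wer = 44.45

absolute_reduction = baseline_wer - finetuned_wer             # in percentage points
relative_reduction = 100 * absolute_reduction / baseline_wer  # share of baseline errors removed

print(f"Absolute reduction: {absolute_reduction:.2f} points")  # 24.68 points
print(f"Relative reduction: {relative_reduction:.1f}%")        # 35.7%
```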
## How to Get Started with the Model
Because this model utilizes LoRA adapters, you must load the base model in 8-bit precision and attach the adapter weights. Use the following Python code to run inference:
```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the processor
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="hindi", task="transcribe"
)

# 2. Load the base model in 8-bit
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",
    quantization_config=bnb_config,
    device_map="auto",
)

# 3. Attach the LoRA adapters
model_id = "levanell/whisper-small-hi-ft"
model = PeftModel.from_pretrained(base_model, model_id)

# 4. Run inference (example)
# audio_array = ...  # load your 16 kHz mono audio array here
# inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features.to("cuda")
# with torch.no_grad():
#     generated_ids = model.generate(input_features=inputs, language="hindi", task="transcribe")
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# print(transcription)
```