# Whisper Small - Hindi ASR (LoRA Fine-Tuned)

## Model Details

### Model Description
This repository contains a parameter-efficient fine-tuned (PEFT) version of OpenAI's whisper-small model, optimized specifically for Hindi Automatic Speech Recognition (ASR). The model was trained using Low-Rank Adaptation (LoRA) combined with 8-bit quantization, enabling highly efficient training and inference while leaving the base model's weights frozen.
- Developed by: Aagrim Rautela
- Model type: Automatic Speech Recognition (ASR)
- Language: Hindi (hi)
- Finetuned from model: openai/whisper-small
- Training methodology: LoRA (Low-Rank Adaptation) via Hugging Face `peft` and `bitsandbytes`
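As a rough illustration of what LoRA does (this is not the training code used for this model), the adapter learns a low-rank update `ΔW = (α/r)·B·A` that is added to a frozen weight matrix `W`; only the small `A` and `B` matrices are trained. A minimal NumPy sketch, with all dimensions and values chosen arbitrarily for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2    # full hidden dimension and LoRA rank (r << d)
alpha = 4      # LoRA scaling hyperparameter

W = rng.normal(size=(d, d))          # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / r.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B initialized to zero, the adapted layer matches the frozen layer exactly,
# so training starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` (here `2*8 + 8*2 = 32` values versus `64` in `W`, and far fewer in realistic dimensions) are trained, the adapter checkpoint stays small and the 8-bit base model is never modified.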
## Evaluation and Results
The model's performance was evaluated against the standard pretrained whisper-small baseline to measure improvements in Hindi transcription accuracy.
### Testing Data
Empirical evaluation was conducted using the Hindi (hi_in) test split of the standardized FLEURS dataset.
### Metrics
Performance is measured using Word Error Rate (WER), where a lower percentage indicates higher accuracy.
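WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The evaluation here presumably used a standard library such as `jiwer` or `evaluate`; the following self-contained sketch just illustrates the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming Levenshtein distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,        # deletion
                                   d[j - 1] + 1,    # insertion
                                   prev + (r != h)) # substitution (0 if match)
    return d[-1] / len(ref)

# One substituted word out of three reference words -> WER of 1/3.
print(wer("a b c", "a x c"))  # 0.3333...
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why high baseline values like 69.13% are plausible for an unadapted model on Hindi.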
### Results Summary

| Model | Dataset | Word Error Rate (WER) |
|---|---|---|
| Baseline (openai/whisper-small) | FLEURS (Hindi Test Split) | 69.13% |
| Fine-Tuned Model (LoRA) | FLEURS (Hindi Test Split) | 44.45% |
The fine-tuning process yielded an absolute WER reduction of 24.68 percentage points, demonstrating a significant improvement in the model's ability to accurately transcribe spoken Hindi compared to the pretrained baseline.
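For context, the 24.68-point figure is the absolute difference between the two WER values; in relative terms the fine-tuned model eliminates roughly a third of the baseline's errors:

```python
baseline_wer = 69.13
finetuned_wer = 44.45

absolute_reduction = baseline_wer - finetuned_wer             # in percentage points
relative_reduction = 100 * absolute_reduction / baseline_wer  # share of baseline errors removed

print(f"Absolute reduction: {absolute_reduction:.2f} points")  # 24.68 points
print(f"Relative reduction: {relative_reduction:.1f}%")        # 35.7%
```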
## How to Get Started with the Model
Because this model utilizes LoRA adapters, you must load the base model in 8-bit precision and attach the adapter weights. Use the following Python code to run inference:
```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the processor
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="hindi", task="transcribe"
)

# 2. Load the base model in 8-bit
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",
    quantization_config=bnb_config,
    device_map="auto",
)

# 3. Attach the LoRA adapters
model_id = "levanell/whisper-small-hi-ft"
model = PeftModel.from_pretrained(base_model, model_id)

# 4. Run inference (example)
# audio_array = ...  # load your 16 kHz mono audio array here
# inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features.to("cuda")
# with torch.no_grad():
#     generated_ids = model.generate(input_features=inputs, language="hindi", task="transcribe")
# transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# print(transcription)
```