# Nepali NER with LoRA and Hindi Transfer
The first parameter-efficient Named Entity Recognition (NER) model for Nepali, trained with Low-Rank Adaptation (LoRA) and Hindi cross-lingual transfer.
## Model Performance
| Model | F1 | Trainable Params | Training Time |
|---|---|---|---|
| mBERT Full FT (Yadav 2024) | 86.2% | 177M | 12 hr |
| This model | 79.51% | 0.6M | 12 min |
99.7% fewer parameters than full fine-tuning.
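The 0.6M figure can be sanity-checked from the LoRA configuration reported below (rank 16, query/value targets). The sketch assumes mBERT's standard architecture (hidden size 768, 12 layers) and excludes the small token-classification head:

```python
# Back-of-the-envelope check of the 0.6M trainable-parameter figure.
# Assumes LoRA adapters on the query and value projections of every
# mBERT layer, with the hyperparameters listed in this card.
hidden_size = 768       # mBERT hidden dimension (assumed, standard for BERT-base)
num_layers = 12         # mBERT transformer layers
rank = 16               # LoRA rank
targets_per_layer = 2   # query and value projections

# Each adapted module adds two low-rank matrices: A (r x d) and B (d x r).
params_per_module = 2 * rank * hidden_size
lora_params = num_layers * targets_per_layer * params_per_module
print(f"LoRA parameters: {lora_params:,}")      # 589,824 ~ 0.6M

full_ft_params = 177_000_000  # full fine-tuning, from the table above
reduction = 1 - lora_params / full_ft_params
print(f"Parameter reduction: {reduction:.1%}")  # ~99.7%
```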
## Model Details
- Developed by: Avinash Gautam, Smarika Ghimire
- Institution: Pokhara University, Nepal
- Model type: Token Classification (NER)
- Language: Nepali (ne), Hindi (hi)
- License: Apache 2.0
- Base model: google-bert/bert-base-multilingual-cased
- Fine-tuning method: LoRA (Low-Rank Adaptation)
## Uses
### Direct Use
Nepali Named Entity Recognition — identifies Person, Location, and Organization entities in Nepali text.
### Example Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import PeftModel
import torch

# Load the base model and apply the LoRA adapter
base_model = AutoModelForTokenClassification.from_pretrained(
    "google-bert/bert-base-multilingual-cased",
    num_labels=7,  # O + B/I tags for PER, LOC, ORG
)
model = PeftModel.from_pretrained(base_model, "a-proton/nepali-ner-lora")
tokenizer = AutoTokenizer.from_pretrained("a-proton/nepali-ner-lora")
model.eval()

# Predict: "Ram went to Pokhara University"
text = "राम पोखरा विश्वविद्यालय गए"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)
print(predictions)
```
## Training Details
### Training Data
- Nepali: EBIQUITY dataset (3,288 training sentences)
- Hindi: WikiANN Hindi NER (6,000 sentences) for cross-lingual transfer
### Training Configuration
- Base model: bert-base-multilingual-cased
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Target modules: query, value
- Optimizer: AdamW
- Learning rate: 2e-4
- Batch size: 16
- Epochs: 5
- Hardware: NVIDIA Tesla T4 (Kaggle)
- Training time: ~12 minutes
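As a sketch, these hyperparameters map onto a `peft` `LoraConfig` roughly as follows (a reconstruction from the listed values, not the authors' training script):

```python
# Hypothetical reconstruction of the adapter configuration from the
# hyperparameters listed above; not the authors' original code.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,       # token classification (NER)
    r=16,                               # LoRA rank
    lora_alpha=32,                      # scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projections
)
```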
## Evaluation Results
Evaluated on the EBIQUITY test set (329 sentences):

| Entity | Precision | Recall | F1 |
|---|---|---|---|
| PER | 81% | 84% | 82% |
| LOC | 78% | 78% | 78% |
| ORG | 77% | 78% | 78% |
| Overall | 78.77% | 80.26% | 79.51% |
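The overall F1 is internally consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# Sanity check: F1 as the harmonic mean of the overall precision and recall.
precision, recall = 0.7877, 0.8026
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # ~0.7951, matching the table
```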
### Key Findings
- Hindi transfer improves Nepali NER by +0.92% F1
- Combined training beats sequential training by +3.5% F1
- LoRA rank r=16 outperforms r=32 by +6.8% F1 in low-resource settings
- 99.7% parameter reduction with only 6.7% F1 drop vs full fine-tuning
## Limitations
- Trained on EBIQUITY dataset only (3,606 sentences total)
- Supports PER, LOC, ORG entity types only
- Performance may vary on domains different from news text
## Citation
Paper currently under review. Citation will be added upon publication.
## Code
GitHub: https://github.com/a-proton/nepali-ner-lora
## Framework Versions
- PEFT 0.10.0
- Transformers 4.40.0
- PyTorch 2.0+