# Nepali NER with LoRA and Hindi Transfer
The first parameter-efficient Named Entity Recognition (NER) model for Nepali, trained with Low-Rank Adaptation (LoRA) and Hindi cross-lingual transfer.
## Model Performance
| Model | F1 | Trainable Params | Training Time |
|---|---|---|---|
| mBERT Full FT (Yadav 2024) | 86.2% | 177M | 12 hr |
| This model | 79.51% | 0.6M | 12 min |
99.7% fewer parameters than full fine-tuning.
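The 0.6M figure can be sanity-checked from the LoRA configuration reported below (rank 16, query/value targets). The sketch assumes mBERT's standard architecture (hidden size 768, 12 layers) and excludes the small token-classification head:

```python
# Back-of-the-envelope check of the 0.6M trainable-parameter figure.
# Assumes LoRA adapters on the query and value projections of every
# mBERT layer, with the hyperparameters listed in this card.
hidden_size = 768       # mBERT hidden dimension (assumed, standard for BERT-base)
num_layers = 12         # mBERT transformer layers
rank = 16               # LoRA rank
targets_per_layer = 2   # query and value projections

# Each adapted module adds two low-rank matrices: A (r x d) and B (d x r).
params_per_module = 2 * rank * hidden_size
lora_params = num_layers * targets_per_layer * params_per_module
print(f"LoRA parameters: {lora_params:,}")      # 589,824 ~ 0.6M

full_ft_params = 177_000_000  # full fine-tuning, from the table above
reduction = 1 - lora_params / full_ft_params
print(f"Parameter reduction: {reduction:.1%}")  # ~99.7%
```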
## Model Details
- Developed by: Avinash Gautam, Smarika Ghimire
- Institution: Pokhara University, Nepal
- Model type: Token Classification (NER)
- Language: Nepali (ne), Hindi (hi)
- License: Apache 2.0
- Base model: google-bert/bert-base-multilingual-cased
- Fine-tuning method: LoRA (Low-Rank Adaptation)
## Uses
### Direct Use
Nepali Named Entity Recognition — identifies Person, Location, and Organization entities in Nepali text.
### Example Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import PeftModel
import torch

# Load the base model and apply the LoRA adapter
base_model = AutoModelForTokenClassification.from_pretrained(
    "google-bert/bert-base-multilingual-cased",
    num_labels=7,  # O + B/I tags for PER, LOC, ORG
)
model = PeftModel.from_pretrained(base_model, "a-proton/nepali-ner-lora")
tokenizer = AutoTokenizer.from_pretrained("a-proton/nepali-ner-lora")
model.eval()

# Predict: "Ram went to Pokhara University"
text = "राम पोखरा विश्वविद्यालय गए"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)
print(predictions)
```
## Training Details
### Training Data
- Nepali: EBIQUITY dataset (3,288 training sentences)
- Hindi: WikiANN Hindi NER (6,000 sentences) for cross-lingual transfer
### Training Configuration
- Base model: bert-base-multilingual-cased
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Target modules: query, value
- Optimizer: AdamW
- Learning rate: 2e-4
- Batch size: 16
- Epochs: 5
- Hardware: NVIDIA Tesla T4 (Kaggle)
- Training time: ~12 minutes
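As a sketch, these hyperparameters map onto a `peft` `LoraConfig` roughly as follows (a reconstruction from the listed values, not the authors' training script):

```python
# Hypothetical reconstruction of the adapter configuration from the
# hyperparameters listed above; not the authors' original code.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,       # token classification (NER)
    r=16,                               # LoRA rank
    lora_alpha=32,                      # scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT attention projections
)
```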
## Evaluation Results
Evaluated on the EBIQUITY test set (329 sentences):

| Entity | Precision | Recall | F1 |
|---|---|---|---|
| PER | 81% | 84% | 82% |
| LOC | 78% | 78% | 78% |
| ORG | 77% | 78% | 78% |
| Overall | 78.77% | 80.26% | 79.51% |
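The overall F1 is internally consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
# Sanity check: F1 as the harmonic mean of the overall precision and recall.
precision, recall = 0.7877, 0.8026
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # ~0.7951, matching the table
```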
### Key Findings
- Hindi transfer improves Nepali NER by +0.92% F1
- Combined training beats sequential training by +3.5% F1
- LoRA rank r=16 outperforms r=32 by +6.8% F1 in low-resource settings
- 99.7% parameter reduction with only 6.7% F1 drop vs full fine-tuning
## Limitations
- Trained on EBIQUITY dataset only (3,606 sentences total)
- Supports PER, LOC, ORG entity types only
- Performance may vary on domains different from news text
## Citation
Paper currently under review. Citation will be added upon publication.
## Code
GitHub: https://github.com/a-proton/nepali-ner-lora
## Framework Versions
- PEFT 0.10.0
- Transformers 4.40.0
- PyTorch 2.0+