Nepali NER with LoRA and Hindi Transfer

The first parameter-efficient Named Entity Recognition (NER) model for Nepali, built with Low-Rank Adaptation (LoRA) and cross-lingual transfer from Hindi.

Model Performance

Model                        F1       Params   Time
mBERT Full FT (Yadav 2024)   86.2%    177M     12 hr
This model                   79.51%   0.6M     12 min

99.7% fewer parameters than full fine-tuning.
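The headline numbers above can be sanity-checked with simple arithmetic (all figures come from the table and the Key Findings section below):

```python
full_ft_params = 177e6   # mBERT full fine-tuning, from the table above
lora_params = 0.6e6      # trainable LoRA parameters

# Parameter reduction relative to full fine-tuning
reduction = 100 * (1 - lora_params / full_ft_params)
print(f"{reduction:.1f}% fewer parameters")   # 99.7% fewer parameters

# F1 gap vs. full fine-tuning
f1_drop = 86.2 - 79.51
print(f"{f1_drop:.1f} F1 points behind full fine-tuning")
```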

Model Details

  • Developed by: Avinash Gautam, Smarika Ghimire
  • Institution: Pokhara University, Nepal
  • Model type: Token Classification (NER)
  • Language: Nepali (ne), Hindi (hi)
  • License: Apache 2.0
  • Base model: google-bert/bert-base-multilingual-cased
  • Fine-tuning method: LoRA (Low-Rank Adaptation)

Uses

Direct Use

Nepali Named Entity Recognition — identifies Person, Location, and Organization entities in Nepali text.
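With three entity types (PER, LOC, ORG) and the usual BIO tagging scheme, the model's 7 labels would look like the mapping below. This exact id-to-label assignment is an assumption for illustration; the authoritative mapping is in the model's `config.json`.

```python
# Assumed BIO label set for 3 entity types (7 labels total).
# Verify against the model's config.json before relying on the order.
id2label = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-LOC", 4: "I-LOC",
    5: "B-ORG", 6: "I-ORG",
}
print(len(id2label))  # 7, matching num_labels=7 in the usage example
```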

Example Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
from peft import PeftModel
import torch

# Load model
base_model = AutoModelForTokenClassification.from_pretrained(
    "google-bert/bert-base-multilingual-cased",
    num_labels=7
)
model = PeftModel.from_pretrained(base_model, "a-proton/nepali-ner-lora")
tokenizer = AutoTokenizer.from_pretrained("a-proton/nepali-ner-lora")

# Predict
text = "राम पोखरा विश्वविद्यालय गए"  # "Ram went to Pokhara University"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

# Map predicted label ids to tag names for each token
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]
for token, label in zip(tokens, labels):
    print(token, label)

Training Details

Training Data

  • Nepali: EBIQUITY dataset (3,288 training sentences)
  • Hindi: WikiANN Hindi NER (6,000 sentences) for cross-lingual transfer

Training Configuration

  • Base model: bert-base-multilingual-cased
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.1
  • Target modules: query, value
  • Optimizer: AdamW
  • Learning rate: 2e-4
  • Batch size: 16
  • Epochs: 5
  • Hardware: NVIDIA Tesla T4 (Kaggle)
  • Training time: ~12 minutes
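The ~0.6M trainable-parameter figure follows from this configuration: a rank-r LoRA adapter on a weight matrix adds r × (d_in + d_out) parameters, applied here to the query and value projections of every BERT layer. A back-of-the-envelope check, using standard mBERT dimensions (assumed, not stated in this card):

```python
# Standard bert-base-multilingual-cased dimensions (assumed, not from the card)
num_layers = 12
hidden = 768
r = 16               # LoRA rank from the configuration above
target_modules = 2   # query and value projections per layer

# Each adapted matrix gets A (r x d_in) and B (d_out x r): r * (d_in + d_out) params
params_per_matrix = r * (hidden + hidden)
total = num_layers * target_modules * params_per_matrix
print(total)  # 589824 trainable adapter parameters, i.e. ~0.6M
```

The token-classification head (768 × 7 weights plus biases) adds a few thousand more trainable parameters on top of this, which does not change the ~0.6M total.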

Evaluation Results

Evaluated on the EBIQUITY test set (329 sentences):

Entity    Precision   Recall   F1
PER       81%         84%      82%
LOC       78%         78%      78%
ORG       77%         78%      78%
Overall   78.77%      80.26%   79.51%
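The overall F1 is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
precision = 78.77
recall = 80.26

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 79.51
```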

Key Findings

  1. Hindi transfer improves Nepali NER by +0.92% F1
  2. Combined training beats sequential training by +3.5% F1
  3. LoRA rank r=16 outperforms r=32 by +6.8% F1 in low-resource settings
  4. 99.7% parameter reduction with only 6.7% F1 drop vs full fine-tuning

Limitations

  • Trained on EBIQUITY dataset only (3,606 sentences total)
  • Supports PER, LOC, ORG entity types only
  • Performance may vary on domains different from news text

Citation

Paper currently under review. Citation will be added upon publication.

Code

GitHub: https://github.com/a-proton/nepali-ner-lora

Framework Versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • PyTorch 2.0+