PARSeq Malayalam OCR
This repository contains a PARSeq checkpoint trained for Malayalam OCR.
Model details
- Architecture: PARSeq
- Framework: PyTorch Lightning checkpoint
- Checkpoint file: checkpoints/last.ckpt
- Charset config: configs/charset/malayalam.yaml
- Training data source: magles/malayalam-synthetic-ocr-datsetthh
- Training environment: NVIDIA A40 with mixed precision
Important note
This is a Lightning .ckpt checkpoint, not a native Hugging Face Transformers model. Use it with the original PARSeq codebase for inference or further fine-tuning.
Load for inference
from strhub.models.parseq.system import PARSeq
model = PARSeq.load_from_checkpoint("checkpoints/last.ckpt")
model.eval()
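Once the checkpoint is loaded, recognizing text in a single cropped image might look roughly like the sketch below. It assumes the upstream PARSeq codebase (the strhub package), PyTorch, and Pillow are installed, and that the preprocessing helper SceneTextDataModule.get_transform and the model.tokenizer.decode call behave as in the upstream PARSeq README. The function name read_text and the image path are hypothetical.

```python
def read_text(image_path, ckpt_path="checkpoints/last.ckpt"):
    """Return the predicted string for one cropped text-line image.

    Hypothetical helper: imports are deferred so the function can be defined
    even where the PARSeq dependencies are not installed.
    """
    import torch
    from PIL import Image
    from strhub.data.module import SceneTextDataModule
    from strhub.models.parseq.system import PARSeq

    # Load on CPU so the sketch also works on machines without a GPU.
    model = PARSeq.load_from_checkpoint(ckpt_path, map_location="cpu")
    model.eval()

    # Resize and normalize to the image size stored in the checkpoint hparams.
    transform = SceneTextDataModule.get_transform(model.hparams.img_size)
    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # shape: (1, C, H, W)

    with torch.inference_mode():
        probs = model(batch).softmax(-1)

    # decode returns one label and one confidence tensor per batch item.
    labels, _confidences = model.tokenizer.decode(probs)
    return labels[0]
```

A call such as read_text("sample_word.png") would return the predicted string for that image. The model expects cropped word or line images, not full pages.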
Continue fine-tuning
python train.py \
charset=malayalam \
dataset=malayalam \
data.root_dir=data \
data.train_dir=YOUR_LMDB_DIR \
data.normalize_unicode=false \
trainer.accelerator=gpu \
trainer.devices=1 \
ckpt_path=checkpoints/last.ckpt
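The data.normalize_unicode=false override in the command above matters for Malayalam. The sketch below assumes that the upstream loader's normalization is an NFKD decomposition followed by an ASCII-only encode, which is worth verifying against your copy of the codebase. Under that assumption, enabling the option would erase every Malayalam label. The function name ascii_normalize is hypothetical and only mimics that assumed behavior.

```python
import unicodedata


def ascii_normalize(label):
    """Mimic the ASSUMED upstream normalization: NFKD, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", label)
    return decomposed.encode("ascii", "ignore").decode()


malayalam_label = "മലയാളം"
latin_label = "café"

# Malayalam code points have no ASCII equivalent, so nothing survives.
print(repr(ascii_normalize(malayalam_label)))
# Latin text only loses its diacritics.
print(repr(ascii_normalize(latin_label)))
```

With labels reduced to empty strings, the model would have no Malayalam targets to learn from, so the flag should stay false for this charset.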
Notes
- Validation in the referenced run used a very small validation split, so those metrics should not be treated as definitive.
- This checkpoint is best treated as a starting point for further evaluation and fine-tuning, not as a finished model.
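For the further evaluation mentioned above, one option is to score predictions on a larger held-out set with exact-match accuracy and mean normalized edit distance. The sketch below uses only the standard library. It is not necessarily how the referenced run computed its metrics, and the function names edit_distance and evaluate are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(predictions, references):
    """Return exact-match accuracy and mean normalized edit distance."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        raise ValueError("nothing to evaluate")
    exact = 0
    ned_total = 0.0
    for pred, ref in zip(predictions, references):
        exact += pred == ref
        longest = max(len(pred), len(ref))
        # Two empty strings count as a perfect match (distance 0).
        ned_total += edit_distance(pred, ref) / longest if longest else 0.0
    n = len(references)
    return {"accuracy": exact / n, "mean_ned": ned_total / n}
```

Distances here are counted per Unicode code point. For Malayalam, a single visible character can span several code points, so a grapheme-level score would differ from this one.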