PARSeq Malayalam OCR

This repository contains a PARSeq checkpoint trained for Malayalam OCR.

Model details

Architecture: PARSeq
Framework: PyTorch Lightning checkpoint
Checkpoint file: checkpoints/last.ckpt
Charset config: configs/charset/malayalam.yaml
Training data source: magles/malayalam-synthetic-ocr-datsetthh
Training environment: NVIDIA A40 with mixed precision

Important note

This is a Lightning .ckpt checkpoint, not a native Hugging Face Transformers model. Use it with the original PARSeq codebase for inference or further fine-tuning.

Load for inference

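# Requires the original PARSeq codebase (the strhub package) to be importable.
from strhub.models.parseq.system import PARSeq

# Restore the weights and hyperparameters stored in the Lightning checkpoint.
model = PARSeq.load_from_checkpoint("checkpoints/last.ckpt")
model.eval()  # switch to inference mode (disables dropout)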
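
Once the checkpoint is loaded, recognizing a single word image could look roughly like the sketch below. It follows the usage pattern from the original PARSeq codebase (SceneTextDataModule.get_transform, model.hparams.img_size, model.tokenizer.decode). The helper names recognize and recognize_file and the image path are hypothetical, and loading on CPU via map_location is an assumption made for portability.

```python
def recognize(model, transform, image):
    """Return the decoded text for one image (expects a PIL image in real use)."""
    batch = transform(image).unsqueeze(0)        # add a batch dimension
    probs = model(batch).softmax(-1)             # per-position character distributions
    labels, _confidences = model.tokenizer.decode(probs)
    return labels[0]


def recognize_file(image_path, ckpt_path="checkpoints/last.ckpt"):
    """Hypothetical convenience wrapper: load the checkpoint and read one image."""
    # Heavy dependencies are imported here so the helper above stays importable
    # on machines without torch or the PARSeq codebase installed.
    import torch
    from PIL import Image
    from strhub.data.module import SceneTextDataModule
    from strhub.models.parseq.system import PARSeq

    # Assumption: CPU inference; drop map_location to keep the saved device.
    model = PARSeq.load_from_checkpoint(ckpt_path, map_location="cpu").eval()
    transform = SceneTextDataModule.get_transform(model.hparams.img_size)
    image = Image.open(image_path).convert("RGB")
    with torch.inference_mode():
        return recognize(model, transform, image)
```

Calling recognize_file("word.png") with a cropped word image should return the predicted string. The model expects tightly cropped text regions, not full pages.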

Continue fine-tuning

python train.py \
  charset=malayalam \
  dataset=malayalam \
  data.root_dir=data \
  data.train_dir=YOUR_LMDB_DIR \
  data.normalize_unicode=false \
  trainer.accelerator=gpu \
  trainer.devices=1 \
  ckpt_path=checkpoints/last.ckpt
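
The data.normalize_unicode=false flag matters for Malayalam. As a minimal sketch, assuming the PARSeq data loader's normalization amounts to NFKD normalization followed by ASCII folding (check this against your copy of the codebase), the snippet below shows what that operation does to a Latin label and to a Malayalam one. The function name ascii_fold is hypothetical.

```python
import unicodedata


def ascii_fold(label: str) -> str:
    """NFKD-normalize, then drop every character that is not ASCII."""
    return unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode()


latin = ascii_fold("café")        # the accent is stripped, the letters survive
malayalam = ascii_fold("മലയാളം")   # no ASCII equivalents, so nothing survives

print(repr(latin), repr(malayalam))
```

Under that assumption, leaving normalization enabled would turn every Malayalam label into an empty string, which is why the command above disables it.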

Notes

The referenced training run used a very small validation split, so its validation metrics should not be treated as definitive.
This checkpoint is best treated as a starting point for further evaluation and fine-tuning.
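
To illustrate why a small validation split gives weak evidence, the sketch below computes an approximate 95% Wilson score interval for an accuracy estimate. The counts are hypothetical and are not results from this checkpoint. The function name wilson_interval is likewise illustrative.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, not results from this checkpoint.
small = wilson_interval(45, 50)       # 90% accuracy on 50 samples
large = wilson_interval(4500, 5000)   # 90% accuracy on 5000 samples

print(f"n=50:   {small[0]:.3f} to {small[1]:.3f}")
print(f"n=5000: {large[0]:.3f} to {large[1]:.3f}")
```

The same observed accuracy yields a much wider interval on the smaller split, so re-evaluating on a larger held-out set is advisable before drawing conclusions.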
