STR-Lite

STR-Lite is an ultra-lightweight scene text recognition (STR) model that combines Masked Autoencoder (MAE) pretraining of a ViT-Tiny encoder with a one-layer autoregressive decoder for text generation. With about 6M parameters, it reaches a weighted average accuracy of 93.82% on six common STR benchmarks while staying small enough for efficient real-world deployment.

Model Architecture

Component    Details
Backbone     ViT-Tiny (embed=192, depth=12, heads=12)
Decoder      1-layer autoregressive transformer (embed=192, heads=12)
Input size   32 × 128 (H × W)
Patch size   4 × 8
Parameters   ~6M
Precision    bfloat16
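With these sizes, each 32 × 128 input is split into a grid of non-overlapping 4 × 8 patches, so the encoder processes 128 patch tokens, and each attention head works on 192 / 12 = 16 dimensions. The sketch below only reproduces that arithmetic. It assumes the patch size is also given as H × W, that inputs have 3 RGB channels, and that no extra class token is counted. None of these details are stated in the table. The helper `patch_grid` is hypothetical.

```python
def patch_grid(img_h, img_w, patch_h, patch_w):
    """Return (rows, cols, num_patches) for a non-overlapping patch grid."""
    if img_h % patch_h or img_w % patch_w:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = img_h // patch_h, img_w // patch_w
    return rows, cols, rows * cols


# Sizes from the architecture table (patch size assumed to be H x W).
rows, cols, num_patches = patch_grid(32, 128, 4, 8)
patch_values = 4 * 8 * 3  # raw values per patch, assuming 3-channel RGB input
head_dim = 192 // 12      # per-head width of the 192-dim embedding
print(rows, cols, num_patches, patch_values, head_dim)  # 8 16 128 96 16
```

Each flattened patch of 96 values would then be linearly projected to the 192-dimensional embedding before entering the ViT-Tiny blocks.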

Training

Stage 1: MAE Pretraining

  • Dataset: U14M-Unlabeled
  • Epochs: 40
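In MAE pretraining, most image patches are hidden, and the model learns to reconstruct them from the visible ones. Below is a minimal sketch of the random masking step over STR-Lite's 128 patch tokens. The 75% mask ratio is the default from the original MAE recipe and is only an assumption here, because this card does not state STR-Lite's ratio. The function `random_masking` is illustrative and is not taken from the repository.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets, MAE-style.

    mask_ratio=0.75 follows the original MAE default; STR-Lite's actual
    ratio is an assumption.
    """
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)                 # random permutation of patch indices
    visible = sorted(order[:num_keep])  # patches the encoder actually sees
    masked = sorted(order[num_keep:])   # patches the model must reconstruct
    return visible, masked


visible, masked = random_masking(128, seed=0)
print(len(visible), len(masked))  # 32 96
```

Because the encoder only processes the visible quarter of the tokens, pretraining is much cheaper than running the full sequence through the encoder.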

Stage 2: Fine-tuning

  • Dataset: U14M-L-Filtered
  • Epochs: 20, Batch size: 256, Learning rate: 1e-3, Weight decay: 0.01
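Autoregressive decoders are usually fine-tuned with teacher forcing. The decoder receives the ground-truth characters shifted right by one position and predicts the next character at each step. The sketch below builds one such input/target pair. The special-token ids (`BOS`, `EOS`, `PAD`), the `max_len` parameter, and the helper name are hypothetical and do not come from the STR-Lite code.

```python
# Hypothetical special-token ids; STR-Lite's real vocabulary is not documented here.
BOS, EOS, PAD = 0, 1, 2


def make_teacher_forcing_pair(label_ids, max_len):
    """Build (decoder_input, target), both of length max_len.

    The decoder sees [BOS, c1, ..., cn, EOS, PAD...] and is trained to
    predict the same sequence shifted left by one: [c1, ..., cn, EOS, PAD...].
    """
    label = list(label_ids)[: max_len - 1]      # truncate, leaving room for EOS
    seq = [BOS] + label + [EOS]
    seq += [PAD] * (max_len + 1 - len(seq))     # pad to max_len + 1 tokens
    return seq[:-1], seq[1:]                    # input and shifted target


print(make_teacher_forcing_pair([10, 11, 12], max_len=6))
# ([0, 10, 11, 12, 1, 2], [10, 11, 12, 1, 2, 2])
```

In a typical setup, target positions holding `PAD` are excluded from the cross-entropy loss.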

Checkpoints

Model          Description               Epochs   Acc      Download
MAE ViT-Tiny   Pretrained encoder only   40       -        pretrain/checkpoint-last.pth
STRLite        Fully fine-tuned model    20       93.82%   finetune/checkpoint-best.pth

Acc is the weighted average accuracy over the six common STR benchmarks listed under Results.

Results

Common STR Benchmarks

Subset          w/ pretrain (%)   w/o pretrain (%)
CUTE80          95.83             94.79
IC13            96.85             96.50
IC15            86.80             86.25
IIIT5k          96.97             96.47
SVT             95.36             94.90
SVTP            92.40             89.77
Weighted avg.   93.82             93.12
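The "Weighted avg." row most likely weights each subset by its number of test images. The sketch below shows that computation. The subset sizes (CUTE80 288, IC13 857, IC15 1811, IIIT5k 3000, SVT 647, SVTP 645) are the commonly used test-set sizes and are assumed here, since the card does not list them. They do reproduce both reported averages.

```python
# Assumed subset sizes (commonly used test-set sizes; not stated in this card).
SIZES = {"CUTE80": 288, "IC13": 857, "IC15": 1811,
         "IIIT5k": 3000, "SVT": 647, "SVTP": 645}

# Per-subset accuracies (%) copied from the table above.
WITH_PRETRAIN = {"CUTE80": 95.83, "IC13": 96.85, "IC15": 86.80,
                 "IIIT5k": 96.97, "SVT": 95.36, "SVTP": 92.40}
WITHOUT_PRETRAIN = {"CUTE80": 94.79, "IC13": 96.50, "IC15": 86.25,
                    "IIIT5k": 96.47, "SVT": 94.90, "SVTP": 89.77}


def weighted_avg(acc, sizes):
    """Average accuracy weighted by the number of images in each subset."""
    total = sum(sizes[name] for name in acc)
    return sum(acc[name] * sizes[name] for name in acc) / total


print(f"{weighted_avg(WITH_PRETRAIN, SIZES):.2f}")     # 93.82
print(f"{weighted_avg(WITHOUT_PRETRAIN, SIZES):.2f}")  # 93.12
```

Weighting by size means the large IIIT5k and IC15 sets dominate the average. This explains why the average sits below most individual subsets: IC15, the hardest subset, is the second largest.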

U14M Benchmarks

Subset           w/ pretrain (%)   w/o pretrain (%)
artistic         67.78             62.11
contextless      78.95             77.43
curve            82.19             78.97
general          81.07             79.96
multi-oriented   82.91             78.57
multi-words      76.72             74.31
salient          78.17             75.33
Weighted avg.    81.03             79.88

Usage

Download and evaluate:

# Clone the repository
git clone https://github.com/balaboom123/STR-Lite
cd STR-Lite

# Download the fine-tuned checkpoint (hf_hub_download returns the local file path)
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'finetune/checkpoint-best.pth'))")

# Evaluate on an LMDB test set
python eval.py \
  resume="$path" \
  test_data_path='[/path/to/lmdb_test]'
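STR results such as those in the tables above are usually reported as word accuracy, where a prediction counts only if the whole word matches the label. The sketch below illustrates that metric. The case-insensitive, letters-and-digits-only normalization is a common benchmark convention and is an assumption here. It may not match exactly what `eval.py` does. Both helper functions are hypothetical.

```python
import re


def normalize(text):
    # Assumed protocol: lowercase, keep only ASCII letters and digits.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(preds, labels):
    """Percentage of predictions that exactly match their label after normalization."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty prediction and label lists")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, labels))
    return 100.0 * correct / len(labels)


# Two of the three predictions match after normalization, so this is about 66.7.
print(word_accuracy(["Hello", "w0rld", "STR-Lite"], ["hello", "world", "strlite"]))
```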

Fine-tune from MAE pretrained weights:

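# Download the MAE-pretrained encoder weights
path=$(python -c "from huggingface_hub import hf_hub_download; print(hf_hub_download('balaboom123/STRLite', 'pretrain/checkpoint-last.pth'))")

python main_finetune.py \
  train_data_path='[/path/to/lmdb_train]' \
  val_data_path='[/path/to/lmdb_val]' \
  pretrained_mae="$path"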

See the GitHub repo for full installation and dataset preparation instructions.
