STR-Lite
STR-Lite is a lightweight scene text recognition (STR) model that pairs a ViT-Tiny encoder, pretrained as a Masked Autoencoder (MAE), with a single-layer autoregressive decoder that generates the text. With about 6M parameters, it reaches 93.82% weighted-average accuracy on six common STR benchmarks, which keeps it small enough for resource-constrained deployment.
- GitHub: balaboom123/STR-Lite
- Author: Kuanwei Chen
- License: MIT
Model Architecture
| Component | Details |
|---|---|
| Backbone | ViT-Tiny (embed=192, depth=12, heads=12) |
| Decoder | 1-layer autoregressive transformer (embed=192, heads=12) |
| Input size | 32 × 128 (H × W) |
| Patch size | 4 × 8 |
| Parameters | ~6M |
| Precision | bfloat16 |
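As a quick sanity check on the table, the sketch below derives the patch grid, the encoder sequence length, and the per-head width from the listed sizes, then makes a rough estimate of the encoder's transformer-block parameters. The `EncoderConfig` name and the `mlp_ratio=4` default are assumptions for illustration (the table does not state the MLP width), so the parameter figure is a ballpark and not the repository's exact count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values taken from the architecture table.
    img_h: int = 32
    img_w: int = 128
    patch_h: int = 4
    patch_w: int = 8
    embed_dim: int = 192
    depth: int = 12
    num_heads: int = 12
    # Assumption: standard ViT MLP expansion; not stated in the table.
    mlp_ratio: int = 4


def patch_grid(cfg: EncoderConfig) -> tuple[int, int]:
    """Number of patches along height and width."""
    return cfg.img_h // cfg.patch_h, cfg.img_w // cfg.patch_w


def head_dim(cfg: EncoderConfig) -> int:
    """Width of each attention head."""
    return cfg.embed_dim // cfg.num_heads


def block_params(cfg: EncoderConfig) -> int:
    """Parameters in one standard pre-norm ViT block (weights and biases)."""
    d = cfg.embed_dim
    hidden = cfg.mlp_ratio * d
    attn = (d * 3 * d + 3 * d) + (d * d + d)        # fused QKV + output projection
    mlp = (d * hidden + hidden) + (hidden * d + d)  # two linear layers
    norms = 2 * (2 * d)                             # two LayerNorms (scale + shift)
    return attn + mlp + norms


cfg = EncoderConfig()
rows, cols = patch_grid(cfg)
num_patches = rows * cols
encoder_block_params = cfg.depth * block_params(cfg)

print(rows, cols, num_patches)   # 8 16 128
print(head_dim(cfg))             # 16
print(encoder_block_params)      # 5338368
```

Under these assumptions the twelve encoder blocks account for roughly 5.3M parameters, which is consistent with the ~6M total once the patch embedding, positional embeddings, and the one-layer decoder are added.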
Training
Stage 1: MAE Pretraining
- Dataset: U14M-Unlabeled
- Epochs: 40
Stage 2: Fine-tuning
- Dataset: U14M-L-Filtered
- Epochs: 20, Batch: 256, LR: 1e-3, Weight decay: 0.01
Checkpoints
| Model | Description | Epochs | Acc (weighted avg., common benchmarks) | Download |
|---|---|---|---|---|
| MAE ViT-Tiny | Pretrained encoder only | 40 | n/a | pretrain/checkpoint-last.pth |
| STRLite | Full fine-tuned model | 20 | 93.82% | finetune/checkpoint-best.pth |
Results
Common STR Benchmarks
| Subset | Acc. w/ pretrain (%) | Acc. w/o pretrain (%) |
|---|---|---|
| CUTE80 | 95.83 | 94.79 |
| IC13 | 96.85 | 96.50 |
| IC15 | 86.80 | 86.25 |
| IIIT5k | 96.97 | 96.47 |
| SVT | 95.36 | 94.90 |
| SVTP | 92.40 | 89.77 |
| Weighted avg. | 93.82 | 93.12 |
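The weighted average is a sample-count-weighted mean of the per-subset accuracies. The sketch below reproduces it. The test-set sizes in `SAMPLE_COUNTS` are an assumption (the commonly used sizes for these six benchmarks); they are not stated in this document, although they do reproduce both reported averages.

```python
# Assumed test-set sizes (commonly used splits); not given in this document.
SAMPLE_COUNTS = {
    "CUTE80": 288,
    "IC13": 857,
    "IC15": 1811,
    "IIIT5k": 3000,
    "SVT": 647,
    "SVTP": 645,
}

# Per-subset accuracies (%) copied from the table above.
WITH_PRETRAIN = {
    "CUTE80": 95.83, "IC13": 96.85, "IC15": 86.80,
    "IIIT5k": 96.97, "SVT": 95.36, "SVTP": 92.40,
}
WITHOUT_PRETRAIN = {
    "CUTE80": 94.79, "IC13": 96.50, "IC15": 86.25,
    "IIIT5k": 96.47, "SVT": 94.90, "SVTP": 89.77,
}


def weighted_average(acc: dict[str, float], counts: dict[str, int]) -> float:
    """Mean accuracy weighted by the number of samples in each subset."""
    total = sum(counts[name] for name in acc)
    return sum(acc[name] * counts[name] for name in acc) / total


avg_with = weighted_average(WITH_PRETRAIN, SAMPLE_COUNTS)
avg_without = weighted_average(WITHOUT_PRETRAIN, SAMPLE_COUNTS)
print(round(avg_with, 2))     # 93.82
print(round(avg_without, 2))  # 93.12
```

Because IIIT5k and IC15 hold most of the samples, they dominate the average; the larger gain on SVTP (92.40 vs. 89.77) moves the overall number only slightly.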
U14M Benchmarks
| Subset | Acc. w/ pretrain (%) | Acc. w/o pretrain (%) |
|---|---|---|
| artistic | 67.78 | 62.11 |
| contextless | 78.95 | 77.43 |
| curve | 82.19 | 78.97 |
| general | 81.07 | 79.96 |
| multi oriented | 82.91 | 78.57 |
| multi words | 76.72 | 74.31 |
| salient | 78.17 | 75.33 |
| Weighted avg. | 81.03 | 79.88 |
Usage
Download and evaluate:
git clone https://github.com/balaboom123/STR-Lite
cd STR-Lite
# Download the fine-tuned checkpoint and store its local path in a shell variable
path=$(python -c 'from huggingface_hub import hf_hub_download; print(hf_hub_download("balaboom123/STRLite", "finetune/checkpoint-best.pth"))')
# Evaluate
python eval.py \
resume="$path" \
test_data_path='[/path/to/lmdb_test]'
Fine-tune from MAE pretrained weights:
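# Download the MAE-pretrained encoder and store its local path in a shell variable
path=$(python -c 'from huggingface_hub import hf_hub_download; print(hf_hub_download("balaboom123/STRLite", "pretrain/checkpoint-last.pth"))')
# Fine-tune
python main_finetune.py \
train_data_path='[/path/to/lmdb_train]' \
val_data_path='[/path/to/lmdb_val]' \
pretrained_mae="$path"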
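The download and evaluation steps above can also be driven from a single Python script, which avoids passing the checkpoint path between Python and the shell. This is a minimal sketch: `build_command` and `run_eval` are hypothetical helpers, not part of the repository, and the sketch assumes `eval.py` accepts `key=value` overrides exactly as in the commands above.

```python
import subprocess
import sys

REPO_ID = "balaboom123/STRLite"


def build_command(script: str, **overrides: str) -> list[str]:
    """Return an argv list: interpreter, script, then key=value overrides."""
    return [sys.executable, script] + [f"{k}={v}" for k, v in overrides.items()]


def run_eval(test_data_path: str) -> subprocess.CompletedProcess:
    """Download the fine-tuned checkpoint and run eval.py on one LMDB test set."""
    # Imported lazily so build_command works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    ckpt = hf_hub_download(REPO_ID, "finetune/checkpoint-best.pth")
    cmd = build_command(
        "eval.py",
        resume=ckpt,
        test_data_path=f"[{test_data_path}]",  # list syntax, as in the command above
    )
    return subprocess.run(cmd, check=True)


# Example (run from the root of the cloned STR-Lite repository):
# run_eval("/path/to/lmdb_test")
```

Passing the arguments as a list sidesteps shell quoting of the bracketed dataset path. Fine-tuning could be launched the same way by building a command for `main_finetune.py` with `train_data_path`, `val_data_path`, and `pretrained_mae` overrides.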
See the GitHub repo for full installation and dataset preparation instructions.