Kiri OCR Model
Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
Key Features
- Lightweight: Compact model optimized for speed and efficiency
- Bilingual: Native support for English and Khmer (including mixed text)
- Document Processing: Automatic text line and word detection
- Hybrid Decoding: CTC + Attention decoder with language model fusion
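The model targets mixed Khmer/English lines, so downstream code may need to know which script each part of a recognized string belongs to, for example to send segments to different spell-checkers. The helper below is not part of kiri-ocr. It is a standalone sketch that relies only on the Unicode block ranges for Khmer (U+1780-U+17FF) and Khmer Symbols (U+19E0-U+19FF). The names `script_of` and `split_runs` are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def script_of(ch: str) -> str:
    cp = ord(ch)
    # Khmer block U+1780-U+17FF and Khmer Symbols block U+19E0-U+19FF
    if 0x1780 <= cp <= 0x17FF or 0x19E0 <= cp <= 0x19FF:
        return "khmer"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # spaces, ASCII digits, punctuation, anything else


def split_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs of consecutive same-script characters."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


print(split_runs("Hello សួស្តី"))
# [('latin', 'Hello'), ('other', ' '), ('khmer', 'សួស្តី')]
```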
Architecture
| Component | Details |
|---|---|
| Type | Transformer Encoder-Decoder with CTC |
| Encoder | 4 layers, 8 heads, 256 dim, 1024 FFN |
| Decoder | 3 layers, 8 heads, 256 dim, 1024 FFN |
| CNN Backbone | ConvStem (4 conv layers with BatchNorm + SiLU) |
| Decoding | Beam search with CTC fusion + LM fusion |
| Input Size | 48 × 640 px (height × width) |
| Framework | PyTorch |
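The dimensions above allow a quick back-of-the-envelope check of the "Lightweight" claim. The sketch below estimates the parameters in the encoder and decoder stacks, assuming each layer follows the standard PyTorch `nn.TransformerEncoderLayer` / `nn.TransformerDecoderLayer` layout (multi-head attention with biases, a two-layer feed-forward block, and LayerNorms). It ignores the ConvStem, embeddings, and output/CTC heads, so it approximates the transformer stacks only, not the size of the released checkpoint.

```python
def mha_params(d: int) -> int:
    # Q, K, V input projections (3*d*d weights + 3*d biases) plus output projection (d*d + d)
    return 4 * d * d + 4 * d


def ffn_params(d: int, ffn: int) -> int:
    # Linear(d -> ffn) and Linear(ffn -> d), both with bias
    return (d * ffn + ffn) + (ffn * d + d)


def layernorm_params(d: int) -> int:
    return 2 * d  # scale + shift


def encoder_layer_params(d: int, ffn: int) -> int:
    # self-attention + FFN + 2 LayerNorms
    return mha_params(d) + ffn_params(d, ffn) + 2 * layernorm_params(d)


def decoder_layer_params(d: int, ffn: int) -> int:
    # self-attention + cross-attention + FFN + 3 LayerNorms
    return 2 * mha_params(d) + ffn_params(d, ffn) + 3 * layernorm_params(d)


# Figures from the table: 4 encoder / 3 decoder layers, 256 dim, 1024 FFN.
# The head count (8) splits the projections but does not change their size.
encoder = 4 * encoder_layer_params(256, 1024)
decoder = 3 * decoder_layer_params(256, 1024)
print(encoder, decoder, encoder + decoder)  # 3159040 3160320 6319360
```

Under these assumptions the two stacks hold roughly 6.3 million parameters.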
Model Diagram
```text
Input Image (48×640)
        ↓
ConvStem (CNN)
        ↓
2D Positional Encoding
        ↓
Transformer Encoder (4L)
        ↓
   ┌────┴────┐
   ↓         ↓
CTC Head   Transformer Decoder (3L)
   ↓         ↓
   └────┬────┘
        ↓
Beam Search + CTC Fusion + LM Fusion
        ↓
Output Text
```
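The CTC head emits one class distribution per encoder time step. The simplest way to turn that into text is greedy (best-path) decoding: take the most likely class per frame, collapse consecutive repeats, then drop blanks. Kiri OCR itself uses beam search with CTC and LM fusion, so this sketch only illustrates the CTC collapse rule. It assumes the blank symbol is index 0, which may not match the layout of `vocab.json`.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(frame_log_probs: Sequence[Sequence[float]],
                      vocab: Sequence[str], blank: int = 0) -> str:
    # 1. Best class per time step (argmax over each frame's scores)
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_log_probs]
    # 2. Collapse consecutive repeats, e.g. [k, k, -, i] -> [k, -, i]
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop blanks and map indices to characters
    return "".join(vocab[idx] for idx in collapsed if idx != blank)


# Toy example: 4-symbol vocabulary with the blank at index 0 (an assumption).
vocab = ["<blank>", "k", "i", "r"]
path = [1, 1, 0, 2, 3, 3, 0, 2]
frames = [[0.0 if j == i else -5.0 for j in range(len(vocab))] for i in path]
print(ctc_greedy_decode(frames, vocab))  # kiri
```

The blank between two identical symbols is what lets CTC emit a doubled character: a path of `[1, 0, 1]` decodes to `"kk"`, while `[1, 1]` decodes to `"k"`.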
Dataset
The model is trained on the mrrtmob/khmer_english_ocr_image_line dataset, containing 12 million synthetic images of Khmer and English text lines.
Usage
Installation
```bash
pip install kiri-ocr
```
Python API
```python
from kiri_ocr import OCR

# Initialize (downloads the model from Hugging Face automatically)
ocr = OCR()

# Extract text from a document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
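Each result exposes `text` and `confidence`, which is enough to filter out low-confidence output. A minimal sketch, assuming `confidence` is a fraction in [0, 1] (as the `:.2%` formatting suggests). The `confident_lines` helper and the 0.8 threshold are illustrative, not part of the library.

```python
from types import SimpleNamespace
from typing import Iterable, List


def confident_lines(results: Iterable, threshold: float = 0.8) -> List[str]:
    """Return the text of every result whose confidence (a 0-1 fraction) is >= threshold."""
    return [r.text for r in results if r.confidence >= threshold]


# Stand-in objects with the same two attributes used above; with the real library,
# pass the `results` returned by ocr.extract_text(...) instead.
demo = [
    SimpleNamespace(text="Invoice No. 42", confidence=0.97),
    SimpleNamespace(text="???", confidence=0.31),
]
print(confident_lines(demo))  # ['Invoice No. 42']
```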
CLI Tool
```bash
# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```
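To process a folder of images, one option is to call the CLI once per file from Python. The sketch below uses only the `predict` subcommand and `--output` flag shown above. Whether `kiri-ocr predict` accepts several files or a directory directly is not documented here, so it loops instead. `build_predict_cmd` and `predict_folder` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_predict_cmd(image: Path, output_dir: Optional[Path] = None) -> List[str]:
    # Only the documented subcommand and flag are used.
    cmd = ["kiri-ocr", "predict", str(image)]
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    return cmd


def predict_folder(folder: Path, output_dir: Path, pattern: str = "*.jpg") -> None:
    if shutil.which("kiri-ocr") is None:
        raise RuntimeError("kiri-ocr CLI not found on PATH; run `pip install kiri-ocr` first")
    for image in sorted(folder.glob(pattern)):
        # check=True stops the batch on the first failing image
        subprocess.run(build_predict_cmd(image, output_dir), check=True)
```

For example, `predict_folder(Path("scans"), Path("results"))` runs the CLI on every `.jpg` in `scans/`.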
Benchmarks
Results on synthetic test images (10 popular fonts):
Configuration
Default inference parameters:
| Parameter | Value | Description |
|---|---|---|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
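The sketch below shows how these four parameters could interact in a beam search with shallow fusion. It assumes a joint CTC/attention scoring rule in the style of ESPnet, `(1 - ctc_fusion_alpha) * log p_att + ctc_fusion_alpha * log p_ctc + lm_fusion_alpha * log p_lm`. Kiri OCR's exact formula is not documented here. The CTC term is simplified to a per-step scorer (a full implementation would use CTC prefix scoring), and `DecodeConfig` and `fused_beam_search` are hypothetical names, not kiri-ocr API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer maps a decoded prefix (tuple of token ids) to next-token log-probabilities.
Scorer = Callable[[Tuple[int, ...]], Sequence[float]]


@dataclass
class DecodeConfig:
    """Hypothetical container mirroring the default inference parameters above."""
    beam_width: int = 4
    ctc_fusion_alpha: float = 0.5
    lm_fusion_alpha: float = 0.35
    max_length: int = 260


def fused_beam_search(att: Scorer, ctc: Scorer, lm: Scorer, eos: int,
                      cfg: Optional[DecodeConfig] = None) -> Tuple[int, ...]:
    cfg = cfg or DecodeConfig()
    beams: List[Tuple[float, Tuple[int, ...]]] = [(0.0, ())]
    finished: List[Tuple[float, Tuple[int, ...]]] = []
    for _ in range(cfg.max_length):
        candidates = []
        for score, prefix in beams:
            att_lp, ctc_lp, lm_lp = att(prefix), ctc(prefix), lm(prefix)
            for tok in range(len(att_lp)):
                # Assumed fusion rule: interpolate attention and CTC, add the weighted LM score.
                step = ((1 - cfg.ctc_fusion_alpha) * att_lp[tok]
                        + cfg.ctc_fusion_alpha * ctc_lp[tok]
                        + cfg.lm_fusion_alpha * lm_lp[tok])
                candidates.append((score + step, prefix + (tok,)))
        # Keep the beam_width best expansions; those ending in EOS are complete.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beams = []
        for score, seq in candidates[:cfg.beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    _, best = max(finished or beams, key=lambda item: item[0])
    return best[:-1] if best and best[-1] == eos else best


# Toy scorers over 3 tokens (0 = EOS); the same scorer stands in for attention and CTC.
def att(prefix):
    return [-2.3, -0.5, -1.2] if not prefix else [-0.36, -1.6, -2.3]


def lm(prefix):
    return [-5.0, -5.0, -0.01] if not prefix else [-1.1, -1.1, -1.1]


print(fused_beam_search(att, att, lm, eos=0))                                   # (2,)
print(fused_beam_search(att, att, lm, eos=0, cfg=DecodeConfig(lm_fusion_alpha=0.0)))  # (1,)
```

In the toy run, the LM prior flips the result from token 1 to token 2. The sketch applies no length normalization, so it favors shorter hypotheses; practical decoders often add a length penalty.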
Model Files
```text
kiri-ocr/
├── config.json        # Model configuration
├── vocab.json         # Character vocabulary
├── model.safetensors  # Model weights
└── README.md          # This file
```
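`model.safetensors` uses the safetensors layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw data. This makes it possible to inspect the weights without PyTorch. A minimal standard-library sketch (the function names are hypothetical):

```python
import json
import math
import struct
from pathlib import Path
from typing import Dict, Union


def read_safetensors_header(path: Union[str, Path]) -> Dict[str, dict]:
    """Parse only the JSON header of a .safetensors file (no tensor data is loaded)."""
    with open(path, "rb") as fh:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    return header


def count_parameters(header: Dict[str, dict]) -> int:
    # Each entry has "dtype", "shape" and "data_offsets"; math.prod([]) == 1 for scalars.
    return sum(math.prod(info["shape"]) for info in header.values())
```

For example, `count_parameters(read_safetensors_header("model.safetensors"))` returns the total number of stored tensor elements, including BatchNorm running statistics if they were saved.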
Links
- GitHub: github.com/mrrtmob/kiri-ocr
- Dataset: mrrtmob/khmer_english_ocr_image_line
- PyPI: pypi.org/project/kiri-ocr
Citation
```bibtex
@software{kiri_ocr,
  author = {mrrtmob},
  title  = {Kiri OCR: Lightweight OCR for English and Khmer},
  year   = {2026},
  url    = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```
License
This model is released under the Apache 2.0 License.