LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
Paper: arXiv 2601.14251
This model is a fine-tuned version of lightonai/LightOnOCR-2-1B-base specifically trained for line-level OCR of Old Church Slavonic (OCS) manuscripts.
This is a line-level model: it expects cropped line images as input, not full pages. Each image should contain a single line of text.
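Since the model expects single-line crops, pages must be segmented into lines first. A minimal sketch of that preparation step, assuming you already have line bounding boxes from a layout/segmentation tool (the page and coordinates below are placeholders for illustration):

```python
from PIL import Image

# Stand-in for a scanned manuscript page; replace with Image.open("page.jpg")
page = Image.new("RGB", (1200, 1600), "white")

# Hypothetical line bounding boxes as (left, top, right, bottom) tuples,
# e.g. produced by a line-segmentation model
line_boxes = [(50, 100, 1150, 160), (50, 170, 1150, 230)]

# Crop each line into its own image, ready to feed to the OCR model
line_images = [page.crop(box) for box in line_boxes]
for i, line in enumerate(line_images):
    print(i, line.size)  # each crop covers exactly one line of text
```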
Evaluated on 50 samples from the test set:
| Metric | Base Model | Finetuned | Change |
|---|---|---|---|
| CER (%) | 141.34 | 5.04 | −136.30 |
| WER (%) | 112.34 | 27.85 | −84.49 |
| Perfect Matches | 0 | 15 | +15 |

Lower CER/WER is better; a higher perfect-match count is better. (CER above 100% is possible when the hypothesis contains more erroneous characters than the reference has characters.)
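For reference, CER is character-level edit distance divided by reference length. A minimal sketch of that computation using a standard dynamic-programming Levenshtein distance; this is a generic illustration, not the exact evaluation script used for the numbers above:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

# One substituted character out of five: CER = 20%
print(f"{cer('лоуна', 'лоунъ') * 100:.2f}")
```

WER is the same computation over whitespace-separated tokens instead of characters.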
| # | Ground Truth | Base Model | Finetuned |
|---|---|---|---|
| 1 | дьници въсиꙗють рєчє сл҃нцє ꙗко лоуна∙ ꙗ... | $\text{A} \text{B} \text{C} \text{D} \te... | ✓ дьници въсиꙗють рєчє сл҃нцє ꙗко лоуна∙ ꙗ... |
| 2 | свѣтьлость тврьдьнаѧ∙ и провѣды чл҃чє нє... | СКУТДЛАСТЬТЕРДЕНАД. НПРОВЕДЫ ТЛТНЕЖВЕ | свѣтолость тврьдьнаѧ∙ и провѣды чл҃чє нє... |
| 3 | риѥ б҃ь∙жажєлицємь маломь въ жатвю∙ свѣт... | ОНИКЬ. ЖАЖЕЛНЦЕЛЬДАПОЛЪВЪЖАТБЮ. СЕВЬ | риѥ б҃ь∙ жа жє лицємь маломь въ жатвю∙ с... |
| 4 | лы дасть ꙁарѧ сиѧти ис тѣлєсє∙ да оть ви... | лы дасть ꙁа рѧси ѧти истѣлє сє∙ да оть в... | |
| 5 | ихь вѣроуѥмо боудєть и даѥмоѥ∙ часть бо ... | НЬВЕРОВНЯЛО РОУДЬ ГЪНДА НПЛОЮ. ТАСТЬ БОП... | ихь вѣроуѥмо боудєть и да ѥмоѥ∙ часть бо... |
✓ = exact match
```bash
# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
```
```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-old-church-slavonic-line"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_line_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)
print(transcription)
# Example output: дьници въсиꙗють рєчє сл҃нцє ꙗко лоуна∙ ꙗко
```
```python
from datasets import load_dataset

# Load the first ten lines of the dataset
dataset = load_dataset("wjbmattingly/old-church-slavonic-line", split="train[:10]")

# Build one prompt per image and process the whole batch at once
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
predictions = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
```
If you use this model, please cite:
```bibtex
@misc{lightonocr2_ocs_2026,
  title        = {LightOnOCR Fine-tuned for Old Church Slavonic},
  author       = {William Mattingly},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-old-church-slavonic-line}}
}
```
And the original LightOnOCR paper:
```bibtex
@misc{lightonocr2_2026,
  title        = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author       = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year         = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
```
Base model: lightonai/LightOnOCR-2-1B-base