Hittite Cuneiform OCR (Hitit Çivi Yazısı OCR)

An end-to-end detection + classification pipeline for Hittite (and, more broadly, Akkadian/Mesopotamian) cuneiform tablets. The repository combines a YOLO-based sign detector with a classifier ensemble built on several modern vision backbones (DINOv3, ConvNeXt-V2, SigLIP, EVA-02, Swin-V2). On top of these it layers further methods: a language model (KN 5-gram), pair-head, aux-head, conformal prediction, k-NN rerank, ArcFace, SupCon, DRW, LDAM, and Focal-SAM.

Goal: 198-class Hittite ABZ sign classification (val n=3570, tablet-wise fold). Baseline: DINOv3-ViT-B v2 EMA = 0.634 → solo record: ConvNeXt-V2-L v13b EMA = 0.913 (+27.9 points).

📊 Main Results (validation top-1, tablet_view_fold=0)

Model	val_top1 (EMA)	Notes
v13b ConvNeXt-V2-L	0.913 🏆	Solo record
v12 DINOv3 Ultimate	0.908
v13c DINOv3-B	0.877
v13a DINOv3-L	0.857
v14b SigLIP2	n/a (linear probe)	Zero-shot / head
Fused ensemble (5-model)	0.9031	+ KN 5-gram LM + fuse_final
Selective @ τ=0.9	0.9646	At reduced coverage
Baseline (v2 DINOv3-B EMA)	0.634	Reference

The ensemble is built from probability-level fusion + the KN 5-gram language model + pair-head + aux-head + conformal prediction (details: code/PAPER_METHODS.md, code/RESULTS_SUMMARY.md).
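As a rough illustration of probability-level fusion, the sketch below takes a weighted average of per-model class-probability vectors. It is a minimal sketch, not the repository's implementation: the function names `fuse_probs` and `argmax` and the weight values are hypothetical, and in the repo the weights are learned (fuse_final_probs.pt) rather than set by hand.

```python
def fuse_probs(model_probs, weights):
    """Weighted average of per-model class-probability vectors.

    model_probs: list of probability vectors, one per model, same length.
    weights: one non-negative weight per model (hypothetical values here).
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(model_probs[0])
    fused = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for c, p in enumerate(probs):
            # Normalise weights so the fused vector still sums to 1.
            fused[c] += (w / total) * p
    return fused


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    fused = fuse_probs([[0.6, 0.4], [0.2, 0.8]], weights=[1.0, 3.0])
    print(fused, argmax(fused))
```

Because the weights are normalised inside the function, only their ratios matter; a model with weight 3.0 counts three times as much as one with weight 1.0.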

🏗️ Architecture

Detection (first stage)

YOLO11 / YOLOv8 (Ultralytics), tablet-only stratified split (detection_tablets/)
COCO format pipeline (convert_to_coco.py)
927 real tablets from 28k labeled images + maicubeda single-sign crops
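The point of a tablet-level split is that every crop from a given tablet lands on the same side of the train/val boundary. The sketch below shows one simple way to get that property with a stable hash of the tablet id. It is an assumption-laden illustration: the record schema (`tablet_id` key), the function names, and the hash-based assignment are all hypothetical, and unlike the repo's split it does no class stratification.

```python
import hashlib


def tablet_fold(tablet_id, n_folds=5):
    """Map a tablet id to a fold deterministically via a stable hash."""
    digest = hashlib.md5(tablet_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def split_by_tablet(records, val_fold=0, n_folds=5):
    """Split records so each tablet is entirely in train or entirely in val.

    records: iterable of dicts with a 'tablet_id' key (hypothetical schema).
    """
    train, val = [], []
    for rec in records:
        in_val = tablet_fold(rec["tablet_id"], n_folds) == val_fold
        (val if in_val else train).append(rec)
    return train, val


if __name__ == "__main__":
    demo = [{"tablet_id": f"T{i % 7}", "sign": i} for i in range(50)]
    train, val = split_by_tablet(demo)
    shared = {r["tablet_id"] for r in train} & {r["tablet_id"] for r in val}
    print(len(train), len(val), "shared tablets:", len(shared))
```

Hashing the id rather than shuffling keeps the assignment reproducible across runs and machines without storing a fold file.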

Classification (main stage)

Backbones: DINOv3-ViT-B/L/14 (ImageNet & self-distill), ConvNeXt-V2-L, SigLIP2-SO400M, EVA-02-L @448, Swin-V2-L
Heads: CE + label-smoothing, LDAM, ArcFace, Prototype-CE, SupCon aux
Optimization: AdamW, DRW (class-balanced weight), SAM / Focal-SAM, EMA, SWA
Adaptation: LoRA partial unfreeze, cRT (classifier re-training), linear probe
Test-time: TTA (x5), MC-Dropout, TSC (temperature + logit scale), selective prediction
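All headline numbers in this README are reported on EMA weights, so a short sketch of the update rule may help. The class below keeps an exponential moving average of a dict of scalar parameters. It is a toy illustration on plain floats: the class name `EMA`, the decay value, and the dict-of-scalars representation are assumptions, not the trainer's actual code, which operates on model tensors.

```python
class EMA:
    """Exponential moving average over a dict of scalar parameters."""

    def __init__(self, params, decay=0.999):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        # The shadow copy starts equal to the initial parameters.
        self.shadow = dict(params)

    def update(self, params):
        """shadow <- decay * shadow + (1 - decay) * current value."""
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * value


if __name__ == "__main__":
    ema = EMA({"w": 0.0}, decay=0.9)
    for _ in range(3):
        ema.update({"w": 1.0})  # pretend the optimizer moved w to 1.0
    print(ema.shadow["w"])
```

The shadow weights lag behind the raw weights and smooth out step-to-step noise, which is why evaluation is typically run on the shadow copy.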

Post-processing & fusion

Ensemble: probs-level fusion (learned weights), swa/tta variants
LM rescoring: KN 5-gram sign-language model (text corpus 97,641 rec)
Pair-head: bigram co-occurrence head (+1.29 pt)
Aux-head: sign-class ontology
k-NN rerank (v4): on DINOv3 features
Conformal: coverage-aware selective prediction
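To make the LM rescoring step concrete, here is a deliberately simplified sketch. It swaps the Kneser-Ney 5-gram model for an add-one smoothed bigram model, and combines classifier and LM scores log-linearly. Everything in it is an assumption for illustration: the class `BigramLM`, the function `rescore`, the `lm_weight` value, and the toy sign sequences are hypothetical and not taken from the repo.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over sign labels (stand-in for KN 5-gram)."""

    def __init__(self, sequences, vocab):
        self.vocab = list(vocab)
        self.bigrams = Counter()
        self.context_counts = Counter()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, prev, cur):
        num = self.bigrams[(prev, cur)] + 1
        den = self.context_counts[prev] + len(self.vocab)
        return math.log(num / den)


def rescore(prev_sign, class_probs, lm, lm_weight=0.5):
    """Pick the sign maximising log p_classifier + lm_weight * log p_lm."""
    best, best_score = None, -math.inf
    for sign, p in class_probs.items():
        score = math.log(max(p, 1e-12)) + lm_weight * lm.log_prob(prev_sign, sign)
        if score > best_score:
            best, best_score = sign, score
    return best


if __name__ == "__main__":
    corpus = [["a", "na"], ["a", "na"], ["a", "na"], ["a", "ni"]]
    lm = BigramLM(corpus, vocab=["a", "na", "ni"])
    probs = {"na": 0.45, "ni": 0.55}  # classifier slightly prefers "ni"
    print(rescore("a", probs, lm, lm_weight=0.0), rescore("a", probs, lm, lm_weight=1.0))
```

With `lm_weight=0.0` the classifier's choice stands; raising the weight lets frequent sign sequences in the text corpus override a narrow visual preference.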

Data augmentation & pseudo-labeling

Stratified folds (40 unseen class fix), Tablet-LOO split
Elastic + color jitter classical tail-aug
DataDream diffusion tail-aug (SDXL img2img)
ProtoSnap (ICLR'25 ControlNet cuneiform synth), from 925 prototypes
Cleanlab confident learning (2 iter: 744 → 943 noisy)
Unsup cluster anchor (key contribution): K=400, 36,177 pseudo-labels
Ensemble-confidence pseudo-label, soft pseudo-label (Roll-with-Punches)
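Ensemble-confidence pseudo-labeling can be sketched as follows: average the class probabilities of several models for each unlabeled sample and keep only samples whose averaged top probability clears a threshold. This is a minimal sketch under assumptions; the function name `select_pseudo_labels` and the 0.95 threshold are hypothetical and not the values used in the repo.

```python
def select_pseudo_labels(ensemble_probs, threshold=0.95):
    """Return (sample_index, class_index, confidence) for confident samples.

    ensemble_probs: for each unlabeled sample, a list of per-model
    probability vectors over the same classes.
    """
    selected = []
    for i, per_model in enumerate(ensemble_probs):
        n_models = len(per_model)
        n_classes = len(per_model[0])
        # Mean probability per class across the ensemble members.
        mean = [sum(m[c] for m in per_model) / n_models for c in range(n_classes)]
        cls = max(range(n_classes), key=mean.__getitem__)
        if mean[cls] >= threshold:
            selected.append((i, cls, mean[cls]))
    return selected


if __name__ == "__main__":
    unlabeled = [
        [[0.98, 0.02], [0.96, 0.04]],  # models agree confidently
        [[0.90, 0.10], [0.20, 0.80]],  # models disagree
    ]
    print(select_pseudo_labels(unlabeled))
```

Averaging before thresholding means a sample is only kept when the models agree; a single overconfident model cannot push a disputed sample over the bar.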

📁 Repository Contents

code/                               Code only (2.1 MB)
├── src/
│   ├── train_classification.py     Main classification trainer (CE/LDAM/ArcFace/SupCon/…)
│   ├── train_yolo*.py              Detection trainer
│   ├── inference.py                E2E inference (detect → crop → classify → LM)
│   ├── benchmark.py                Evaluation
│   ├── seq2seq/                    Image → Latin transliteration (T5 + ViT encoder)
│   ├── lm/                         KN 5-gram + token-level LM
│   ├── enhancements/               40+ modules: arcface, tail-aug, cleanlab, protosnap,
│   │                               unsup_cluster_anchor, active_learning, …
│   ├── preprocessing/              Folds, stratification, COCO conversion
│   ├── tlhdig_integration/         TLHdig corpus connector
│   └── analysis/                   Error analysis, confusion grids
├── configs/                        data.yaml, detection.yaml, classification.yaml
├── scripts/
│   ├── pipeline_h100/              SLURM scripts for the H100 cluster
│   ├── pipeline_p5/                Alternative GPU pipeline
│   └── train_yolo11_slurm.sh       Detection job
└── *.md                            PIPELINE, METHODOLOGY, BEYOND_95, TLHDIG_EXPLOITATION,
                                    PAPER_METHODS, RESULTS_SUMMARY

weights/                            Selected record weights (~5.1 GB)
├── v13b_convnextv2l/best_ema.pt    751 MB - 🏆 solo record 0.913
├── v12_dinov3_ultimate/best_ema.pt 1.2 GB - 0.908
├── v13a_dinov3l/best_ema.pt        1.2 GB - 0.857
├── v13c_dinov3b/best_ema.pt        328 MB - 0.877
├── v14b_siglip/best_ema.pt         1.6 GB - SigLIP2 head
└── fuse/
    ├── fuse_final_probs.pt         Ensemble probability-fusion weights
    └── fuse_final_lm.pt            KN 5-gram LM (final)

The full checkpoint inventory (50+ models, linear probe, cRT, pair/aux/arcface heads, distill students, posthoc) is not included in this repo; it can be uploaded on request.

🚀 Quick Start

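from huggingface_hub import hf_hub_download
import torch

# Download the solo-record checkpoint from the Hugging Face Hub (cached locally).
ckpt_path = hf_hub_download(
    repo_id="savastakan/hitit-cuneiform-ocr",
    filename="weights/v13b_convnextv2l/best_ema.pt",
)
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
# Available keys: state["model"], state["cfg"], state["class_to_idx"], state["ema_shadow"]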

For full inference, use code/src/inference.py (detector + classifier + LM fusion).
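Once the checkpoint dict is loaded, a small helper can select the EMA weights and invert the label mapping. The sketch below relies only on the key names listed in the Quick Start comment. It assumes, without confirmation from the repo, that `ema_shadow` maps parameter names to values in the same way as `model`, and that `class_to_idx` maps sign labels to integer indices. The helper name `summarize_checkpoint` and the demo values are hypothetical.

```python
def summarize_checkpoint(state):
    """Pick EMA weights when present and build an index -> label lookup.

    Assumes state["ema_shadow"] (if present) and state["model"] are both
    mappings from parameter name to value, and state["class_to_idx"] maps
    label -> integer index.
    """
    weights = state.get("ema_shadow")
    if weights is None:
        # Fall back to the raw (non-averaged) weights.
        weights = state["model"]
    idx_to_class = {idx: name for name, idx in state["class_to_idx"].items()}
    return weights, idx_to_class


if __name__ == "__main__":
    # Stand-in dict with the same key names; real checkpoints hold tensors.
    demo_state = {
        "model": {"head.weight": 0.0},
        "ema_shadow": {"head.weight": 0.1},
        "class_to_idx": {"ABZ001": 0, "ABZ002": 1},
    }
    weights, idx_to_class = summarize_checkpoint(demo_state)
    print(weights, idx_to_class[1])
```

How the selected weights are loaded into a model depends on the architecture stored in `state["cfg"]`, which is why code/src/inference.py remains the reference path.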

⚙️ Technical Details

Python 3.12, PyTorch 2.x, timm, Ultralytics YOLO, Transformers
Training on an H100/H200 cluster via SLURM (ai-tools-kolyoz-1.0 env)
Average single-backbone fine-tuning time: 6–12 hours (NVIDIA H100)
Total submissions: ~50 SLURM jobs over a 13+ hour session
Class count: 198 Hittite ABZ signs

📚 Data Sources

The data is not included in this repo (size + licensing). 12+ sources were merged:

TLHdig — Hittite digital corpus (text-level)
Mainz Cuneiform — annotated tablet photos
Maicube CDA — single-sign crops
EBL LMU — Electronic Babylonian Library
cuneiML — synthetic cuneiform dataset
HPM, Cuka (empty/tarball), Santakku (ProtoSnap prototypes)
v12 integrated: 194,809 records; text corpus: 97,641 records

🧪 Evaluation

Fold: tablet_view_fold=0, tablet-wise (no leakage at the tablet level)
Randsplit: an 80/20 random split is also used as a cross-check
Metrics: top-1, top-5, selective @τ, per-class macro-F1
Conformal: selective accuracy of 0.9646 at coverage ≥ 0.9
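The metrics listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code in benchmark.py: the function names are hypothetical, and the macro-F1 convention used here (a class with no predictions and no ground-truth samples scores 0.0) is an assumption that may differ from the repo's.

```python
def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for p, y in zip(probs, labels):
        top = sorted(range(len(p)), key=p.__getitem__, reverse=True)[:k]
        hits += y in top
    return hits / len(labels)


def selective_accuracy(probs, labels, tau):
    """Accuracy on samples whose top probability is >= tau, plus coverage."""
    kept = correct = 0
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        if p[pred] >= tau:
            kept += 1
            correct += pred == y
    coverage = kept / len(labels)
    accuracy = correct / kept if kept else float("nan")
    return accuracy, coverage


def macro_f1(preds, labels, n_classes):
    """Unweighted mean of per-class F1 (absent classes count as 0.0)."""
    f1s = []
    for c in range(n_classes):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


if __name__ == "__main__":
    probs = [[0.95, 0.03, 0.02], [0.5, 0.4, 0.1], [0.2, 0.7, 0.1], [0.35, 0.25, 0.4]]
    labels = [0, 1, 1, 0]
    print(top_k_accuracy(probs, labels, k=1), selective_accuracy(probs, labels, tau=0.6))
```

Selective accuracy always has to be read together with coverage: raising τ discards low-confidence samples, so accuracy on the kept samples rises while the fraction of samples answered falls.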

For the detailed results table, see code/RESULTS_SUMMARY.md; for method details, see code/PAPER_METHODS.md.

📖 Citation

@misc{hitit-cuneiform-ocr-2026,
  author       = {Savaş Takan},
  title        = {Hitit Çivi Yazısı OCR — Detection + Classification Ensemble},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/savastakan/hitit-cuneiform-ocr}},
  note         = {Solo record: ConvNeXt-V2-L EMA=0.913 on 198-class Hittite ABZ}
}

📄 License

Code: Apache-2.0
Weights: Apache-2.0 (for the checkpoints shared in this repo)
Pretrained backbones: subject to their own license terms
- DINOv3 (Meta), ConvNeXt-V2 (Meta/Facebook), SigLIP2 (Google), EVA-02, YOLO11 (Ultralytics AGPL-3.0)
Dataset: separate; each source's own license applies

🙏 Acknowledgments

TAU-VAILab, for ProtoSnap
Meta AI Research, for DINOv3 and ConvNeXt-V2
The Mainz & LMU teams, for their digital cuneiform corpora
Ultralytics, for YOLO
The ARF Kolyoz / H100 cluster operations team

Last updated: 2026-04-22
