en-ckb Marian model

This repository contains a raw Marian NMT model for English (en) -> Sorani Kurdish (ckb) trained from the local en-ckb directory of the OPUS-MT training workspace.

Model summary

  • Direction: en -> ckb
  • Architecture: Marian transformer
  • Subword setup: SentencePiece spm4k-spm4k
  • Primary uploaded checkpoint: best-chrf
  • Training dataset selection: InterdialectCorpus, Tatoeba, wikimedia, tico-19, navinaananthan_kurdish_sorani_parallel_corpus
  • Validation set: openlanguagedata_flores_plus
  • Test set recipe: openlanguagedata_flores_plus

Best validation metrics seen in training logs

  • BLEU: 14.2475 at epoch 42 / update 55000
  • chrF: 45.1146 at epoch 44 / update 58000
  • Perplexity: 8.6557 at epoch 31 / update 40000

Files

  • translate_with_marian.py: standalone inference helper for downloaded snapshots
  • curated-floresdev.spm4k-spm4k.vocab.yml: Marian vocabulary
  • opus.src.spm4k-model: source SentencePiece model
  • opus.trg.spm4k-model: target SentencePiece model
  • Decoder config(s) and checkpoint(s):
      • best-chrf: curated-floresdev.spm4k-spm4k.transformer.model1.npz.best-chrf.npz

Usage

This is a raw Marian model, not a Hugging Face Transformers conversion. Running it requires the marian-decoder and spm_encode binaries to be available locally (on your PATH or passed explicitly, as shown further below).

Example:

python translate_with_marian.py input.txt -o output.txt --checkpoint best-chrf

You can also point to custom binaries:

python translate_with_marian.py input.txt -o output.txt \
  --marian-decoder /path/to/marian-decoder \
  --spm-encode /path/to/spm_encode
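For reference, the pipeline the helper script wraps can be sketched as: spm_encode segments the input with the source SentencePiece model, marian-decoder translates the pieces, and the SentencePiece markers are stripped from the output. The sketch below is illustrative, not the actual contents of translate_with_marian.py; the file names come from this repository, while the `translate` and `build_decoder_cmd` helpers are hypothetical.

```python
import subprocess
from pathlib import Path

def build_decoder_cmd(model, vocab, marian_decoder="marian-decoder"):
    """Assemble the marian-decoder invocation. Marian takes the model via -m
    and one vocabulary per side via -v; this setup shares a single Marian
    vocab file for source and target."""
    return [marian_decoder, "-m", str(model), "-v", str(vocab), str(vocab)]

def translate(lines, snapshot_dir,
              marian_decoder="marian-decoder", spm_encode="spm_encode"):
    """Encode with the source SentencePiece model, decode with Marian,
    then undo the SentencePiece segmentation."""
    snapshot = Path(snapshot_dir)
    # Step 1: segment the raw input into SentencePiece pieces.
    encoded = subprocess.run(
        [spm_encode,
         f"--model={snapshot / 'opus.src.spm4k-model'}",
         "--output_format=piece"],
        input="\n".join(lines), capture_output=True, text=True, check=True,
    )
    # Step 2: translate; marian-decoder reads stdin and writes stdout.
    decoded = subprocess.run(
        build_decoder_cmd(
            snapshot / "curated-floresdev.spm4k-spm4k.transformer.model1.npz.best-chrf.npz",
            snapshot / "curated-floresdev.spm4k-spm4k.vocab.yml",
            marian_decoder,
        ),
        input=encoded.stdout, capture_output=True, text=True, check=True,
    )
    # Step 3: join pieces and turn the "▁" word-boundary marker into spaces.
    return [line.replace(" ", "").replace("\u2581", " ").strip()
            for line in decoded.stdout.splitlines()]
```

If you only need the decoder command for a custom wrapper, `build_decoder_cmd` is the part to reuse; the rest is plain subprocess plumbing.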

Notes

  • The decoder configs in this repo were rewritten to use relative paths so they work from a downloaded Hub snapshot.
  • Review dataset and license compatibility before redistributing the model publicly.