en-ckb Marian model

This repository contains a raw Marian NMT model for English (en) -> Sorani Kurdish (ckb) trained from the local en-ckb directory of the OPUS-MT training workspace.

Model summary

  • Direction: en -> ckb
  • Architecture: Marian transformer
  • Subword setup: SentencePiece spm4k-spm4k
  • Primary uploaded checkpoint: best-chrf
  • Training dataset selection: InterdialectCorpus, Tatoeba, wikimedia, tico-19, navinaananthan_kurdish_sorani_parallel_corpus
  • Validation set: openlanguagedata_flores_plus
  • Test set recipe: openlanguagedata_flores_plus

Best validation metrics seen in training logs

  • BLEU: 14.2475 at epoch 42 / update 55000
  • chrF: 45.1146 at epoch 44 / update 58000
  • Perplexity: 8.6557 at epoch 31 / update 40000

Files

  • translate_with_marian.py: standalone inference helper for downloaded snapshots
  • curated-floresdev.spm4k-spm4k.vocab.yml: Marian vocabulary
  • opus.src.spm4k-model: source SentencePiece model
  • opus.trg.spm4k-model: target SentencePiece model
  • Decoder config(s) and checkpoint(s):
      • best-chrf: curated-floresdev.spm4k-spm4k.transformer.model1.npz.best-chrf.npz

Usage

This is a raw Marian model, not a Hugging Face Transformers conversion. Running it requires the marian-decoder and spm_encode binaries to be available locally (on your PATH or passed explicitly, as shown further below).

Example:

python translate_with_marian.py input.txt -o output.txt --checkpoint best-chrf

You can also point to custom binaries:

python translate_with_marian.py input.txt -o output.txt \
  --marian-decoder /path/to/marian-decoder \
  --spm-encode /path/to/spm_encode
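For reference, the pipeline the helper script wraps can be sketched as: spm_encode segments the input with the source SentencePiece model, marian-decoder translates the pieces, and the SentencePiece markers are stripped from the output. The sketch below is illustrative, not the actual contents of translate_with_marian.py; the file names come from this repository, while the `translate` and `build_decoder_cmd` helpers are hypothetical.

```python
import subprocess
from pathlib import Path

def build_decoder_cmd(model, vocab, marian_decoder="marian-decoder"):
    """Assemble the marian-decoder invocation. Marian takes the model via -m
    and one vocabulary per side via -v; this setup shares a single Marian
    vocab file for source and target."""
    return [marian_decoder, "-m", str(model), "-v", str(vocab), str(vocab)]

def translate(lines, snapshot_dir,
              marian_decoder="marian-decoder", spm_encode="spm_encode"):
    """Encode with the source SentencePiece model, decode with Marian,
    then undo the SentencePiece segmentation."""
    snapshot = Path(snapshot_dir)
    # Step 1: segment the raw input into SentencePiece pieces.
    encoded = subprocess.run(
        [spm_encode,
         f"--model={snapshot / 'opus.src.spm4k-model'}",
         "--output_format=piece"],
        input="\n".join(lines), capture_output=True, text=True, check=True,
    )
    # Step 2: translate; marian-decoder reads stdin and writes stdout.
    decoded = subprocess.run(
        build_decoder_cmd(
            snapshot / "curated-floresdev.spm4k-spm4k.transformer.model1.npz.best-chrf.npz",
            snapshot / "curated-floresdev.spm4k-spm4k.vocab.yml",
            marian_decoder,
        ),
        input=encoded.stdout, capture_output=True, text=True, check=True,
    )
    # Step 3: join pieces and turn the "▁" word-boundary marker into spaces.
    return [line.replace(" ", "").replace("\u2581", " ").strip()
            for line in decoded.stdout.splitlines()]
```

If you only need the decoder command for a custom wrapper, `build_decoder_cmd` is the part to reuse; the rest is plain subprocess plumbing.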

Notes

  • The decoder configs in this repo were rewritten to use relative paths so they work from a downloaded Hub snapshot.
  • Review dataset and license compatibility before redistributing the model publicly.