aya-enes-B4

An English -> Spanish translation model derived from CohereForAI/aya-expanse-8b (32 layers, 8B parameters).

Recipe

Baseline: the full 32-layer Aya-Expanse 8B, fine-tuned with LoRA and knowledge distillation from Aya-Expanse 32B.

  • Number of transformer layers: 32 (of the original 32)
  • Layers removed: none
  • Pruning method: none
  • Fine-tuning: LoRA (r=16, alpha=32), 3 epochs on News Commentary v18 en-es
  • Distillation: synthetic translations from Aya-Expanse 32B, filtered to COMET >= 0.7
  • Precision: fp16
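The distillation filtering step above can be sketched as follows. This is an illustrative helper, not the project's actual pipeline; `score_fn` stands in for a COMET quality scorer (e.g. wmt22-comet-da scores computed over the teacher's outputs):

```python
from typing import Callable

def build_distillation_set(
    sources: list[str],
    teacher_outputs: list[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.7,
) -> list[tuple[str, str]]:
    """Keep only (source, teacher translation) pairs whose quality
    estimate meets the COMET >= 0.7 threshold from the recipe."""
    return [
        (src, hyp)
        for src, hyp in zip(sources, teacher_outputs)
        if score_fn(src, hyp) >= threshold
    ]

# Stand-in scorer for illustration only; the real recipe scores the
# teacher's translations with a COMET model instead.
def toy_score(src: str, hyp: str) -> float:
    return 0.9 if hyp.endswith(".") else 0.4

sources = ["Hello.", "Good night", "Thank you."]
teacher = ["Hola.", "Buenas noches", "Gracias."]
kept = build_distillation_set(sources, teacher, toy_score)
# kept retains only the pairs scoring >= 0.7
```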

Evaluation

Evaluated on 500 held-out News Commentary v18 en-es sentences.

Metric                  Value
COMET (wmt22-comet-da)  0.8930
chrF++                  68.08
BLEU                    47.35

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "adrianMT56/aya-enes-B4"
tokenizer = AutoTokenizer.from_pretrained(name)
# Load in fp16; add device_map="auto" (requires accelerate) to place it on GPU.
model = AutoModelForCausalLM.from_pretrained(name, dtype=torch.float16)

prompt = ("Translate the following English text to Spanish.\n\n"
          "English: The quick brown fox jumps over the lazy dog.\n"
          "Spanish:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; the model continues the prompt with the Spanish translation.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

CPU users can omit dtype=torch.float16 (the model then loads in float32) or keep fp16 at the cost of some throughput. For GPTQ 4-bit conversion, see the project's scripts/quantize_to_gptq.py.
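The prompt template in the snippet above can be wrapped in a small helper for batch use. This is a hypothetical convenience function, mirroring the template exactly:

```python
def build_prompt(english_text: str) -> str:
    """Format an English sentence with the translation prompt template
    used in the usage example above."""
    return (
        "Translate the following English text to Spanish.\n\n"
        f"English: {english_text}\n"
        "Spanish:"
    )

# The model is expected to continue the string after "Spanish:".
prompt = build_prompt("The quick brown fox jumps over the lazy dog.")
```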

Reproducibility

This checkpoint was produced by the pipeline at https://github.com/adrianMT56/attention_lp. See README.md in that repo for the full training recipe and evaluation scripts.
