Gemma-2 DPO Fine-tuned Model

  • Developed by: Phantomcloak19
  • License: Apache-2.0
  • Base model: unsloth/gemma-2-2b-bnb-4bit
  • Training framework: Unsloth + TRL (DPO)

This Gemma-2 (2B) model was fine-tuned with Direct Preference Optimization (DPO) on the Unified Hallucination Benchmark to reduce hallucinations and improve factual consistency.
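For intuition, DPO trains the policy to widen its log-probability margin between the preferred and rejected response relative to a frozen reference model. The sketch below computes the standard DPO loss for a single preference pair in pure Python; the numbers are illustrative, not taken from this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of each full response under
    the policy and the frozen reference model. beta controls how far
    the policy is allowed to drift from the reference.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) written in a numerically stable form
    return math.log1p(math.exp(-logits))

# Policy prefers the chosen response more strongly than the
# reference does, so the loss drops below log(2) ~= 0.693.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))  # ~0.598
```

When both models assign identical margins the loss sits exactly at log(2); training pushes it lower by increasing the policy's preference margin.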

The model was trained ~2× faster using Unsloth, enabling efficient low-VRAM fine-tuning.
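A minimal usage sketch with the `transformers` library is shown below. The model id comes from this card; the prompt and generation settings are illustrative assumptions, and running it requires downloading the weights.

```python
# Hedged inference sketch; assumes network access and enough VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phantomcloak19/gemma-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "In what year did Apollo 11 land on the Moon?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding keeps the check on factual consistency deterministic.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```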

