Gemma-2 DPO Fine-tuned Model

  • Developed by: Phantomcloak19
  • License: Apache-2.0
  • Base model: unsloth/gemma-2-2b-bnb-4bit
  • Training framework: Unsloth + TRL (DPO)

This Gemma-2 (2B) model was fine-tuned with Direct Preference Optimization (DPO) on the Unified Hallucination Benchmark to reduce hallucinations and improve factual consistency.
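For intuition, DPO trains the policy to widen its log-probability margin between the preferred and rejected response relative to a frozen reference model. The sketch below computes the standard DPO loss for a single preference pair in pure Python; the numbers are illustrative, not taken from this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of each full response under
    the policy and the frozen reference model. beta controls how far
    the policy is allowed to drift from the reference.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) written in a numerically stable form
    return math.log1p(math.exp(-logits))

# Policy prefers the chosen response more strongly than the
# reference does, so the loss drops below log(2) ~= 0.693.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))  # ~0.598
```

When both models assign identical margins the loss sits exactly at log(2); training pushes it lower by increasing the policy's preference margin.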

The model was trained ~2× faster using Unsloth, enabling efficient low-VRAM fine-tuning.
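A minimal usage sketch with the `transformers` library is shown below. The model id comes from this card; the prompt and generation settings are illustrative assumptions, and running it requires downloading the weights.

```python
# Hedged inference sketch; assumes network access and enough VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phantomcloak19/gemma-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "In what year did Apollo 11 land on the Moon?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding keeps the check on factual consistency deterministic.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```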

