Self-Fulfilling (Mis)alignment: Post-Trained Models
Collection
This collection gathers models that have undergone Direct Preference Optimization (DPO). We also share the earlier instruction checkpoints, but we recommend using the DPO models.
22 items