ThinkTwice-Olmo3-7B-Instruct

This model is fine-tuned from allenai/Olmo-3-7B-Instruct using the ThinkTwice framework.

Paper: ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement (arXiv: 2604.01591)

Code: https://github.com/CSSLab/ThinkTwice

Overview

ThinkTwice is a simple two-phase GRPO-based framework that jointly trains LLMs to (1) solve reasoning problems and (2) refine their own solutions. In each pair of training steps, the model is first optimized on solving a reasoning problem, then optimized on refining its own solution to the same problem — using the same binary correctness reward in both phases, with no critique annotations or refinement-specific supervision required.

ThinkTwice reveals an implicit rectify-then-fortify curriculum: early in training, refinement predominantly corrects errors; as the model improves, it naturally shifts toward preserving (fortifying) already-correct solutions, without any explicit curriculum scheduling.
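As a rough sketch, one solve-then-refine training pair looks like the following. This is our paraphrase, not the authors' implementation: `sample_solutions`, `sample_refinements`, and `grpo_update` are placeholders for a standard GRPO sampling-and-update pipeline, and the exact answer-matching logic is an assumption.

```python
def binary_reward(answer: str, gold: str) -> int:
    """The same reward is used in both phases: 1 if the final answer
    matches the reference, else 0 (exact match here is illustrative)."""
    return int(answer.strip() == gold.strip())


def think_twice_pair(model, problem, gold,
                     sample_solutions, sample_refinements, grpo_update):
    """One pair of ThinkTwice training steps (sketch)."""
    # Phase 1 (solve): sample candidate solutions to the problem and
    # apply a GRPO update against their binary correctness rewards.
    solutions = sample_solutions(model, problem)
    grpo_update(model, solutions,
                [binary_reward(s, gold) for s in solutions])

    # Phase 2 (refine): condition on one of the model's own solutions
    # and apply a GRPO update against the refined answers' correctness.
    # Same binary reward; no critique annotations are needed.
    refinements = sample_refinements(model, problem, solutions[0])
    grpo_update(model, refinements,
                [binary_reward(r, gold) for r in refinements])
```

Because refinement is rewarded with the same signal as solving, "leave a correct solution alone" and "fix a wrong one" are both rewarded behaviors, which is what produces the rectify-then-fortify dynamic described above.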

Usage

This model supports both direct solving and self-refinement. Use it in two passes:

  1. Solve: prompt the model with the problem to get an initial answer.
  2. Self-Refine: prompt the model with the problem + its initial solution to get a refined answer.
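The two passes can be sketched with `transformers` as below. The prompt wording in `build_refine_messages` and the generation settings are illustrative assumptions, not the exact templates used in the paper — see the repository for the evaluated prompts.

```python
def build_solve_messages(problem: str) -> list:
    """Pass 1: prompt the model with the problem alone."""
    return [{"role": "user", "content": problem}]


def build_refine_messages(problem: str, initial_solution: str) -> list:
    """Pass 2: prompt with the problem plus the model's own first attempt.
    The wording here is a placeholder, not the paper's template."""
    return [{"role": "user",
             "content": (f"Problem:\n{problem}\n\n"
                         f"Initial solution:\n{initial_solution}\n\n"
                         "Review the initial solution and give a refined answer.")}]


def run_two_pass(problem: str,
                 model_id: str = "difanjiao/ThinkTwice-Olmo3-7B-Instruct") -> str:
    """Run solve -> self-refine with the released checkpoint (needs a GPU)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto")

    def chat(messages):
        inputs = tok.apply_chat_template(
            messages, add_generation_prompt=True,
            return_tensors="pt").to(model.device)
        out = model.generate(inputs, max_new_tokens=1024)
        return tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)

    initial = chat(build_solve_messages(problem))            # Pass 1: solve
    return chat(build_refine_messages(problem, initial))     # Pass 2: refine
```

Because the model was trained on both phases with the same reward, the second pass can safely be applied even when the first answer is already correct.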

See the GitHub repository for full usage instructions and evaluation scripts.

Citation

@article{jiao2026thinktwice,
  title={ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement},
  author={Jiao, Difan and Wen, Qianfeng and Yang, Blair and Tang, Zhenwei and Anderson, Ashton},
  journal={arXiv preprint arXiv:2604.01591},
  year={2026}
}
Model weights: 7B parameters, BF16, Safetensors format.