OPDLM Collection Data and checkpoints for Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation • 15 items • Updated 14 days ago • 2
divelab/combined_gsm8k_math_dataset_dapo_math_17k_Qwen3-4B_ntokens2048_sft Viewer • Updated Mar 21 • 186k • 31
jacob-helwig/Qwen2.5-0.5B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Updated Sep 13, 2025
jacob-helwig/Qwen2.5-1.5B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Updated Sep 13, 2025
jacob-helwig/Qwen2.5-7B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Updated Sep 13, 2025
jacob-helwig/dive7_Qwen2.5-3B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 242k • Updated Sep 13, 2025 • 1