RyanYr/pg_sais-dapo_shuffled-offline-grpo_qwen2.5-math-1.5B_kl_bl0_matheval Updated about 1 month ago • 60
RyanYr/pg_sais-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_kl_matheval Updated about 1 month ago • 55
RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Updated about 1 month ago • 267
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Updated about 1 month ago • 51
RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Updated about 1 month ago • 49
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_piref_kl_matheval Updated about 1 month ago • 48
RyanYr/pg_trajis-dapo_shuffled-offline-grpo_qwen2.5-math-1.5B_piref_matheval Updated about 1 month ago • 56
RyanYr/pg-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_piref_kl_matheval Updated about 1 month ago • 51
RyanYr/pg_sais-dapo_shuffled-01_offline-grpo_qwen2.5-math-1.5B_kl_matheval Updated about 1 month ago • 49
RyanYr/pg-dapo_shuffled-0_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Updated about 1 month ago • 260
RyanYr/pg_sais-dapo_shuffled-offline-grpo_qwen2.5-math-1.5B_piref_matheval Updated about 1 month ago • 118
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_nokl_matheval Viewer • Updated about 1 month ago • 1.55k • 34
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_kl_matheval Viewer • Updated about 1 month ago • 1.55k • 33
RyanYr/pg-dapo_shuffled-10_offline-grpo_qwen2.5-math-1.5B_piref_kl_behavior_matheval Viewer • Updated about 1 month ago • 1.55k • 16
RyanYr/pg-dapo_shuffled-01_offline-pg-dapo-qwen3-4B-Base-mbs128-n4_kl_matheval Viewer • Updated May 2 • 18.6k • 269