arxiv:2502.18293
Taneesh Gupta
gupta-tanish
AI & ML interests
Post-Training @MicrosoftResearch
Organizations
None yet
models 26
gupta-tanish/llama-off-policy-qwq-10k-perturbation-iter1
Text Generation • 8B • Updated
• 2
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-1.0-iteration2
Text Generation • 8B • Updated
• 1
gupta-tanish/llama-3-8b-instruct-refa-budget_length-256-lamda-20.0-iteration1
Text Generation • 8B • Updated
• 1
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-1.0-eos-increase-iteration2-lamda-0.1
Text Generation • 8B • Updated
• 1
gupta-tanish/llama-3-8b-instruct-refa-lr-1e-6-beta10-gamma4-lambda-0.1-eos-increase-iteration2
Text Generation • 8B • Updated
• 1
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.001-lr-1e-6-iteration1
Text Generation • 8B • Updated
• 3
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.01-lr-1e-6-iteration1
Text Generation • 8B • Updated
• 2
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-0.1-lr-1e-6-iteration1
Text Generation • 8B • Updated
• 2
gupta-tanish/llama3-8b-instruct-refa-eos-increase-lamda-1.0-lr-1e-6-iteration1
Text Generation • 8B • Updated
• 2
gupta-tanish/mistral-7b-instruct-refa-iteration2
Text Generation • 7B • Updated
• 7
datasets 188
gupta-tanish/llama3-8b-instruct-on-policy-std-4
• Updated
• 60.8k • 9
gupta-tanish/llama3-8b-instruct-on-policy-with-stats
• Updated
• 60.8k • 22
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-2.0
• Updated
• 60.8k • 14
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-1.0
• Updated
• 60.8k • 13
gupta-tanish/llama3-8b-instruct-on-policy-noise-std-0.5
• Updated
• 60.8k • 16
gupta-tanish/llama3-8b-instruct-on-policy-GRM
• Updated
• 61.1k • 31
gupta-tanish/llama3-8b-instruct-on-policy-ArmoRM
• Updated
• 61.8k • 15
gupta-tanish/llama3-8b-instruct-on-policy-PairRM
• Updated
• 61.8k • 5
gupta-tanish/Qwen2.5-math-1.5B-Instruct_method_cpo_iteration_5
• Updated
• 2.17k • 4
gupta-tanish/Qwen2.5-math-1.5B-Instruct_method_cpo_iteration_4
• Updated
• 2.11k • 4