prometheus-eval

university

Recent Activity

seungone authored a paper 33 minutes ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

seungone authored a paper 39 minutes ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

amphora submitted a paper 1 day ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

prometheus-eval's Spaces (2)

BiGGen Bench Leaderboard

Display model performance leaderboard

README