prometheus-eval
university
AI & ML interests
None defined yet.
Recent Activity
seungone authored a paper 33 minutes ago
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation
seungone authored a paper 39 minutes ago
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
amphora submitted a paper 1 day ago
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
Team members: 56
prometheus-eval's Spaces: 2
BiGGen Bench Leaderboard 😻 (Running, Agents, 16 likes)
Display model performance leaderboard
README 🐨 (Running)