Leaderboards and Evaluations

The Hub contains leaderboards and evaluations for machine learning models, including LLMs, chatbots, and more. There are three types of leaderboards:

  • Eval Results come from official benchmark datasets such as GPQA and MMLU-Pro, or from other datasets used in academic papers. When results are published in a model repository, the scores are shown on the model page (see the sketch after this list).
  • Community Managed Leaderboards live on Spaces and are maintained by community members for specific use cases.
  • Open LLM Leaderboard was a project curated by the Hugging Face team to evaluate and rank open-source LLMs and chatbots, and to provide reproducible scores that separate marketing fluff from actual progress in the field.
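
Eval Results are stored in the `model-index` section of a model card's YAML metadata. As a minimal sketch of reading them programmatically with the `huggingface_hub` library (the repository id below is only an illustrative example):

```python
from huggingface_hub import ModelCard

# Load the model card of a repository on the Hub
# (this repo id is only an illustrative example).
card = ModelCard.load("HuggingFaceH4/zephyr-7b-beta")

# `eval_results` holds the scores published in the card's
# `model-index` metadata, if the repository declares any.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```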

Eval Results
