ResearchGym: Evaluating Language Model Agents on Real-World AI Research • Paper • 2602.15112 • Published 9 days ago • 20
Jais-2-Family • Collection • The 2nd generation of the Jais Large Language Models Family • 4 items • Updated 5 days ago • 13
Article • OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments • 14 days ago • 29
Article • Community Evals: Because we're done trusting black-box leaderboards over the community • 22 days ago • 79