Nathan Habib PRO
AI & ML interests
Evals
Recent Activity
New activity on InternScience/ResearchClawBench (about 22 hours ago): Benchmark allow-list request for ResearchClawBench
New activity on North-ML1/CodeBench-30 (about 22 hours ago): Request to enable Benchmark status for CodeBench-30 leaderboard
Upvoted a paper (about 23 hours ago): CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
benchmarks
RULER Datasets Falcon-H1-3B-Base
RULER Datasets
lighteval/RULER-131072-Falcon-H1-3B-Base • 6.5k • 123
lighteval/RULER-65536-Falcon-H1-3B-Base • 6.5k • 143
lighteval/RULER-32768-Falcon-H1-3B-Base • 6.5k • 59
lighteval/RULER-16384-Falcon-H1-3B-Base • 6.5k • 174
RULER Datasets Llama3-Instruct
RULER Datasets
RULER Datasets Qwen2.5-Instruct
RULER Datasets
RULER Datasets Qwen-3-Instruct
RULER Datasets
RULER Datasets Qwen-3
RULER Datasets
agents
Agents resources
All the resources I found or used when getting up to speed with agents.