CU-Benchmarks
visualwebbench/VisualWebBench
rootsautomation/RICO-ScreenQA
rootsautomation/ScreenSpot
osunlp/Multimodal-Mind2Web
xlangai/ubuntu_osworld_file_cache
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • arXiv:2409.08264
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Paper • arXiv:2405.14573