CUGA Agent
🤖 Configurable Generalist Agent, leader in AppWorld Benchmark
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
Paper • 2603.28407 • Published • 70
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
Paper • 2604.04323 • Published • 41
David (PRO)
AustinOS
AI & ML interests
yes
Recent Activity
Updated a collection 22 days ago: good
Updated a collection 22 days ago: good
Liked a model 22 days ago: deepseek-ai/DeepSeek-V3.2
Organizations
None yet