Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

updated a Space 8 days ago

TIGER-Lab/ClawBench

updated a dataset 8 days ago

TIGER-Lab/ClawBenchV2Trace

updated a dataset 8 days ago

NAIL-Group/ClawBenchV2Trace

upvoted a paper 16 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published 24 days ago • 9

upvoted 4 collections 21 days ago

eval-papers-collection

8 items • Updated Apr 13 • 1

Reading list

5 items • Updated 22 days ago • 1

Papers

4 items • Updated Apr 28 • 1

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 21 days ago • 1

upvoted 2 papers 24 days ago

Dr. Bench: A Multidimensional Evaluation for Deep Research Agents, from Answers to Reports

Paper • 2510.02190 • Published Jan 29 • 20

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Paper • 2605.05242 • Published 30 days ago • 119

upvoted a paper 27 days ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Paper • 2604.28185 • Published Apr 30 • 90

upvoted 2 papers about 1 month ago

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

Paper • 2503.10582 • Published Mar 13, 2025 • 25

upvoted 4 collections about 1 month ago

Vision

38 items • Updated 3 days ago • 2

Saved

5 items • Updated Apr 10 • 1

Paper

133 items • Updated Apr 23 • 2

Video understanding

53 items • Updated 3 days ago • 5

upvoted 2 collections about 2 months ago

tanosi

3 items • Updated Apr 13 • 1

To read

226 items • Updated Apr 23 • 5

upvoted 3 papers about 2 months ago

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 263

Watch Before You Answer: Learning from Visually Grounded Post-Training

Paper • 2604.05117 • Published Apr 6 • 36

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 236

upvoted a collection about 2 months ago

AI Paper of the Day

A collection of papers that I think are interesting, one added each day • 640 items • Updated 7 days ago • 97