Chaoyou Fu

BradyFU

https://bradyfu.github.io/

AI & ML interests

Multimodal LLMs

Recent Activity

upvoted a paper 1 day ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

authored a paper 1 day ago

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

authored a paper 1 day ago

A Survey on Multimodal Large Language Models

authored 10 papers 1 day ago

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Paper • 2411.00774 • Published Nov 1, 2024

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published Jan 3, 2025 • 47

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Paper • 2502.05177 • Published Feb 7, 2025 • 2

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Paper • 2505.03739 • Published May 6, 2025 • 9

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published 7 days ago • 30

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 4 days ago • 213

submitted a paper to Daily Papers 1 day ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 4 days ago • 213

submitted a paper to Daily Papers 16 days ago

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Paper • 2603.22285 • Published 17 days ago • 49

authored a paper 29 days ago

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Paper • 2603.06577 • Published Mar 6 • 48

submitted a paper to Daily Papers 29 days ago

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Paper • 2603.06577 • Published Mar 6 • 48

authored a paper over 1 year ago

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Paper • 2412.09283 • Published Dec 12, 2024 • 19

authored a paper almost 2 years ago

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26

authored a paper over 2 years ago

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Paper • 2312.12436 • Published Dec 19, 2023 • 15
