LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence • Paper • 2605.25979 • Published 3 days ago • 17
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini • Paper • 2605.27295 • Published 2 days ago • 6
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published 2 days ago • 91
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion • Paper • 2605.23902 • Published 6 days ago • 36
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling • Paper • 2605.13301 • Published 15 days ago • 157
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization • Paper • 2605.10780 • Published 16 days ago • 33
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture • Paper • 2605.12500 • Published 16 days ago • 187
δ-mem: Efficient Online Memory for Large Language Models • Paper • 2605.12357 • Published 16 days ago • 122
Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions • Paper • 2604.23774 • Published 29 days ago • 17
Let ViT Speak: Generative Language-Image Pre-training • Paper • 2605.00809 • Published 27 days ago • 33
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons • Paper • 2604.28130 • Published 28 days ago • 22