VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting Paper • 2603.14659 • Published 26 days ago • 6
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning Paper • 2506.03525 • Published Jun 4, 2025 • 6
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos Paper • 2512.01707 • Published Dec 1, 2025 • 8
EgoLCD: Egocentric Video Generation with Long Context Diffusion Paper • 2512.04515 • Published Dec 4, 2025 • 6
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 50
PRInTS: Reward Modeling for Long-Horizon Information Seeking Paper • 2511.19314 • Published Nov 24, 2025 • 8
StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published Oct 10, 2025 • 52
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Paper • 2504.13143 • Published Apr 17, 2025 • 7