Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts Paper • 2601.22156 • Published 2 days ago • 5
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning Paper • 2601.22069 • Published 2 days ago • 7
LoL: Longer than Longer, Scaling Video Generation to Hour Paper • 2601.16914 • Published 8 days ago • 15
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 2 days ago • 54
Towards Pixel-Level VLM Perception via Simple Points Prediction Paper • 2601.19228 • Published 4 days ago • 15
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection Paper • 2601.19375 • Published 4 days ago • 5
Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models Paper • 2601.19834 • Published 4 days ago • 24
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking Paper • 2601.17645 • Published 7 days ago • 22
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security Paper • 2601.18491 • Published 5 days ago • 120
One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment Paper • 2601.18731 • Published 5 days ago • 7
IVRA: Improving Visual-Token Relations for Robot Action Policy with Training-Free Hint-Based Guidance Paper • 2601.16207 • Published 9 days ago • 7
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts Paper • 2601.17111 • Published 8 days ago • 5