The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 11 days ago • 195
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published about 1 month ago • 35
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 24 days ago • 43
Unified Latents (UL): How to train your latents Paper • 2602.17270 • Published 18 days ago • 57
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published 19 days ago • 12
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 20 days ago • 26
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 25 days ago • 20
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 24 days ago • 54
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published Jan 6 • 159
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26
lovis93/next-scene-qwen-image-lora-2509 Image-to-Image • Updated Oct 21, 2025 • 35.7k • 586
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 168