- Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
  Paper • 2604.08537 • Published • 9
- Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
  Paper • 2502.07408 • Published • 58
- Hierarchical Codec Diffusion for Video-to-Speech Generation
  Paper • 2604.15923 • Published • 2
- ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
  Paper • 2604.16254 • Published • 3
Collections including paper arxiv:2604.11804
- kai-os/gemma4-31b-Opus-4.6-reasoning
  Text Generation • Updated • 503 • 160
- nvidia/Gemma-4-31B-IT-NVFP4
  Text Generation • 21B • Updated • 1.98M • 434
- OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
  Paper • 2604.11804 • Published • 70
- moonshotai/Kimi-K2.6
  Image-Text-to-Text • 1.1T • Updated • 591k • 1.16k
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
  Paper • 2503.14734 • Published • 7
- Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
  Paper • 2401.02117 • Published • 33
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 158
- Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
  Paper • 2506.16035 • Published • 89
- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
  Paper • 2603.25746 • Published • 155
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
  Paper • 2603.27027 • Published • 143
- Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
  Paper • 2603.25716 • Published • 156
- LongCat-Next: Lexicalizing Modalities as Discrete Tokens
  Paper • 2603.27538 • Published • 145
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13