- Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
  Paper • 2604.08537 • Published • 9
- Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
  Paper • 2502.07408 • Published • 58
- Hierarchical Codec Diffusion for Video-to-Speech Generation
  Paper • 2604.15923 • Published • 2
- ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
  Paper • 2604.16254 • Published • 3
Collections including paper arxiv:2604.11804
- kai-os/gemma4-31b-Opus-4.6-reasoning
  Text Generation • Updated • 503 • 160
- nvidia/Gemma-4-31B-IT-NVFP4
  Text Generation • 21B • Updated • 1.98M • 434
- OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
  Paper • 2604.11804 • Published • 70
- moonshotai/Kimi-K2.6
  Image-Text-to-Text • 1.1T • Updated • 591k • 1.16k
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
  Paper • 2503.14734 • Published • 7
- Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
  Paper • 2401.02117 • Published • 33
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 158
- Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
  Paper • 2506.16035 • Published • 89
- Wolf: Captioning Everything with a World Summarization Framework
  Paper • 2407.18908 • Published • 32
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens
  Paper • 2407.19985 • Published • 37
- TPDiff: Temporal Pyramid Video Diffusion Model
  Paper • 2503.09566 • Published • 45
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
  Paper • 2506.07464 • Published • 14
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
  Paper • 2603.25746 • Published • 155
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
  Paper • 2603.27027 • Published • 143
- Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
  Paper • 2603.25716 • Published • 156
- LongCat-Next: Lexicalizing Modalities as Discrete Tokens
  Paper • 2603.27538 • Published • 145
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13