Collections

Collections including paper arxiv:2603.15031

WTF GENIUS PAPERS

Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are about improving tiny models.

about 11 hours ago

Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published Aug 27, 2025 • 27
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Paper • 2601.06431 • Published Jan 10 • 12
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published Jan 14 • 63

Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published 23 days ago • 136
Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176
MOSS-TTS Technical Report

Paper • 2603.18090 • Published 23 days ago • 11
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Paper • 2603.23516 • Published Mar 6 • 46

Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published about 1 month ago • 150
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Paper • 2603.12228 • Published 28 days ago • 12
Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 50
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

Paper • 2410.16144 • Published Oct 21, 2024 • 5

Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176

Model_Architecture

Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176

Frontier Research Papers

Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

Paper • 2510.19338 • Published Oct 22, 2025 • 117
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30, 2025 • 132

The Last Prism • Corpus

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
Attention Residuals

Paper • 2603.15031 • Published 25 days ago • 176
Mixture-of-Depths Attention

Paper • 2603.15619 • Published 24 days ago • 79
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Paper • 2603.15557 • Published 24 days ago • 28

about 19 hours ago

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 102
Qwen3-Coder-Next Technical Report

Paper • 2603.00729 • Published Feb 28 • 64
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use

Paper • 2603.03205 • Published Mar 3 • 13
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Paper • 2602.23166 • Published Feb 26 • 45
