Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories Paper • 2606.03979 • Published 2 days ago • 20 • 7
Nested Learning: The Illusion of Deep Learning Architectures Paper • 2512.24695 • Published Dec 31, 2025 • 46
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
ATLAS: Learning to Optimally Memorize the Context at Test Time Paper • 2505.23735 • Published May 29, 2025 • 23 • 3
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published Apr 17, 2025 • 21 • 4
Best of Both Worlds: Advantages of Hybrid Graph Sequence Models Paper • 2411.15671 • Published Nov 23, 2024 • 8 • 2
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19, 2024 • 18
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models Paper • 2406.04320 • Published Jun 6, 2024 • 10
CAT-Walk: Inductive Hypergraph Learning via Set Walks Paper • 2306.11147 • Published Jun 19, 2023 • 1