SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving Paper • 2505.23932 • Published May 29, 2025
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models Paper • 2602.04163 • Published 27 days ago
The Art of Efficient Reasoning: Data, Reward, and Optimization Paper • 2602.20945 • Published 6 days ago
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection Paper • 2601.09195 • Published Jan 14
Revisiting Model Interpolation for Efficient Reasoning Paper • 2510.10977 • Published Oct 13, 2025
Timber: Training-free Instruct Model Refining with Base via Effective Rank Paper • 2509.23595 • Published Sep 28, 2025
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation Paper • 2501.12976 • Published Jan 22, 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? Paper • 2505.15929 • Published May 21, 2025
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models Paper • 2411.06839 • Published Nov 11, 2024
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities Paper • 2212.06385 • Published Dec 13, 2022
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer Paper • 2304.05659 • Published Apr 12, 2023
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast Paper • 2405.14507 • Published May 23, 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models Paper • 2404.02657 • Published Apr 3, 2024
Weight-Inherited Distillation for Task-Agnostic BERT Compression Paper • 2305.09098 • Published May 16, 2023