Tianyu Pang

P2333

https://p2333.github.io/

AI & ML interests

Machine Learning

Organizations

None yet

Recent Activity

upvoted a paper 3 days ago

SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning

Paper • 2602.13515 • Published 12 days ago • 43

upvoted a paper 21 days ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published 21 days ago • 34

upvoted a paper 3 months ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 105

upvoted a paper 4 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 129

upvoted 5 papers 5 months ago

Imperceptible Jailbreaking against Large Language Models

Paper • 2510.05025 • Published Oct 6, 2025 • 34

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 176

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 90

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

upvoted 2 papers 6 months ago

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1, 2025 • 79

upvoted 2 collections 6 months ago

Perception Encoder

17 items • Updated Jul 11, 2025 • 78

DINOv3

DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21, 2025 • 515

upvoted 7 papers 9 months ago

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28, 2025 • 29

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27, 2025 • 26

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Paper • 2505.21494 • Published May 27, 2025 • 8

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26, 2025 • 24

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Paper • 2505.15141 • Published May 21, 2025 • 4

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22, 2025 • 42

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19, 2025 • 36