Xueqing Peng

Xueqing

AI & ML interests

None yet

Recent Activity

upvoted a paper about 23 hours ago

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Paper • 2603.01571 • Published 3 days ago • 29

upvoted a paper 1 day ago

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Paper • 2603.01562 • Published 3 days ago • 48

upvoted a paper 8 days ago

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Paper • 2602.16990 • Published 14 days ago • 11

upvoted a paper 29 days ago

Ebisu: Benchmarking Large Language Models in Japanese Finance

Paper • 2602.01479 • Published Feb 1 • 17

upvoted a paper about 1 month ago

When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

Paper • 2510.11695 • Published Oct 13, 2025 • 3

upvoted 3 papers about 2 months ago

All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Paper • 2601.04160 • Published Jan 7 • 4

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

Paper • 2601.05403 • Published Jan 8 • 10

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Paper • 2601.03425 • Published Jan 6 • 16

upvoted 2 papers 3 months ago

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

Paper • 2512.09636 • Published Dec 10, 2025 • 26

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 72

upvoted a paper 5 months ago

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

Paper • 2510.08886 • Published Oct 10, 2025 • 20

upvoted a paper 7 months ago

Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference

Paper • 2508.04586 • Published Aug 6, 2025 • 12

upvoted 2 papers 9 months ago

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16, 2025 • 93

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Paper • 2505.20650 • Published May 27, 2025 • 17

upvoted a collection 10 months ago

MultiFinBen

Collection • 4 items • Updated May 16, 2025 • 4

upvoted a paper 10 months ago

XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision

Paper • 2505.11336 • Published May 16, 2025 • 7

upvoted 2 papers 11 months ago

JudgeLRM: Large Reasoning Models as a Judge

Paper • 2504.00050 • Published Mar 31, 2025 • 62

FinAudio: A Benchmark for Audio Large Language Models in Financial Applications

Paper • 2503.20990 • Published Mar 26, 2025 • 19

upvoted an article 12 months ago

Plutus: Pioneering Greek Financial AI in a Global Context

Article • Feb 27, 2025

upvoted a collection about 1 year ago

Plutus: Benchmarking Greek Financial LLMs