Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published 3 days ago • 29
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 3 days ago • 48
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation Paper • 2602.16990 • Published 14 days ago • 11
Ebisu: Benchmarking Large Language Models in Japanese Finance Paper • 2602.01479 • Published Feb 1 • 17
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents Paper • 2510.11695 • Published Oct 13, 2025 • 3
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection Paper • 2601.04160 • Published Jan 7 • 4
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection Paper • 2601.05403 • Published Jan 8 • 10
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published Jan 6 • 16
MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment Paper • 2512.09636 • Published Dec 10, 2025 • 26
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published Dec 2, 2025 • 72
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs Paper • 2510.08886 • Published Oct 10, 2025 • 20
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference Paper • 2508.04586 • Published Aug 6, 2025 • 12
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16, 2025 • 93
FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information Paper • 2505.20650 • Published May 27, 2025 • 17
XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision Paper • 2505.11336 • Published May 16, 2025 • 7
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications Paper • 2503.20990 • Published Mar 26, 2025 • 19