LLM4Math - a shuoxing Collection

updated Nov 17, 2025

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Paper • 2510.04721 • Published Oct 6, 2025
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Paper • 2505.02735 • Published May 5, 2025
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Paper • 2504.18428 • Published Apr 25, 2025
MathConstruct: Challenging LLM Reasoning with Constructive Proofs

Paper • 2502.10197 • Published Feb 14, 2025
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Paper • 2407.03203 • Published Jul 3, 2024
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Paper • 2503.21934 • Published Mar 27, 2025
Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9, 2025
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Paper • 2505.05758 • Published May 9, 2025
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

Paper • 2405.12209 • Published May 20, 2024
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Paper • 2505.23851 • Published May 28, 2025
Theorem Prover as a Judge for Synthetic Data Generation

Paper • 2502.13137 • Published Feb 18, 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

Paper • 2505.23754 • Published May 29, 2025
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published May 23, 2024
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

Paper • 2511.11134 • Published Nov 14, 2025